UniCAD: A Unified Benchmark and Universal Model for Multi-Modal Multi-Task CAD
2026-06-03 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition; Artificial Intelligence
AI summary
The authors created UniCAD, a new all-in-one benchmark for multi-modal CAD learning, that is, for teaching computers to understand and create 3D engineering models from different types of input such as text, images, sketches, and point clouds. They also developed UniCAD-MLLM, a multi-modal large language model that handles several CAD tasks in one system: reconstructing CAD models from point clouds, generating them from text or images, and answering questions about them. Their experiments show that it outperforms both task-specific and multi-task baselines. They plan to release the dataset, code, and pretrained models to support further research.
Computer-Aided Design, 3D modeling, multi-modal learning, multi-task learning, benchmark dataset, large language model, point clouds, CAD reconstruction, CAD generation, question answering
Authors
Jingyuan Chen, Sheng Jin, Haopeng Sun, Wentao Liu, Chen Qian
Abstract
Computer-Aided Design (CAD) underpins modern engineering and manufacturing by enabling the creation of precise, editable 3D models. However, CAD research typically studies tasks in isolation, and multi-modal, multi-task learning for CAD is hindered by the absence of a unified benchmark. To address this gap, we introduce UniCAD, a comprehensive benchmark for multi-modal CAD learning that covers point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering across diverse input modalities. Alongside the benchmark, we present UniCAD-MLLM, a universal multi-modal large language model that ingests text, images, sketches, and point clouds and performs these heterogeneous tasks in an end-to-end fashion within a single framework. Extensive experiments on the UniCAD and Fusion360 benchmarks demonstrate that UniCAD-MLLM achieves state-of-the-art performance across all tasks, outperforming existing task-specific and multi-task baselines. We will release the dataset, code, and pretrained models to accelerate future research.
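The abstract describes one benchmark and one model that span heterogeneous tasks and input modalities. The sketch below is a minimal illustration, assuming a text-serialized target and one placeholder token per input modality. It shows how such tasks could share a single record type and a single instruction format, so that one model can serve all of them. The names `UnifiedSample`, `build_prompt`, the `<task>` tag format, and the rule that question-answering samples carry a question are hypothetical and are not taken from the paper. Only the task families and modalities come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Task(Enum):
    # The three task families named in the abstract.
    POINT_TO_CAD = "point-to-CAD reconstruction"
    TEXT_IMAGE_TO_CAD = "text/image-to-CAD generation"
    CAD_QA = "CAD question answering"


class Modality(Enum):
    # The input modalities the abstract says UniCAD-MLLM ingests.
    TEXT = "text"
    IMAGE = "image"
    SKETCH = "sketch"
    POINT_CLOUD = "point_cloud"


@dataclass
class UnifiedSample:
    """Hypothetical record type: one schema shared by every task."""
    task: Task
    inputs: dict                    # maps Modality -> raw payload (points, pixels, text, ...)
    target: str                     # hypothetical: serialized CAD output or answer text
    question: Optional[str] = None  # only meaningful for CAD_QA


def validate(sample: UnifiedSample) -> None:
    # Assumed sanity checks, not rules stated by the paper.
    if not sample.inputs:
        raise ValueError("sample has no inputs")
    if sample.task is Task.CAD_QA and not sample.question:
        raise ValueError("CAD_QA samples need a question")


def build_prompt(sample: UnifiedSample) -> str:
    """Render any sample as one instruction string for a single shared model."""
    validate(sample)
    parts = [f"<task>{sample.task.value}</task>"]
    # Placeholder tokens stand in for modality-encoder outputs;
    # sorting by name gives a deterministic order.
    for modality in sorted(sample.inputs, key=lambda m: m.value):
        parts.append(f"<{modality.value}>")
    if sample.question:
        parts.append(sample.question)
    return " ".join(parts)
```

For example, a reconstruction sample built as `UnifiedSample(Task.POINT_TO_CAD, {Modality.POINT_CLOUD: [(0.0, 0.0, 0.0)]}, target="")` renders as `<task>point-to-CAD reconstruction</task> <point_cloud>`. Routing every task through one prompt format, instead of through separate task-specific heads, is the kind of design that lets a single end-to-end model cover all task types. The actual UniCAD-MLLM interface may differ.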