UniCAD: A Unified Benchmark and Universal Model for Multi-Modal Multi-Task CAD

2026-06-03 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence

AI summary

The authors created UniCAD, a unified benchmark for teaching computers to understand and create 3D CAD models from different kinds of input, such as text, images, sketches, and point clouds. They also developed UniCAD-MLLM, a single multi-modal model that handles several CAD tasks (reconstruction, generation, and question answering) across these input types. Their experiments show that it outperforms both task-specific and multi-task baselines. They plan to release their data, code, and models to support further research.

Computer-Aided Design, 3D modeling, multi-modal learning, multi-task learning, benchmark dataset, large language model, point clouds, CAD reconstruction, CAD generation, question answering

Authors

Jingyuan Chen, Sheng Jin, Haopeng Sun, Wentao Liu, Chen Qian

Abstract

Computer-Aided Design (CAD) underpins modern engineering and manufacturing by enabling the creation of precise, editable 3D models. However, CAD research typically studies tasks in isolation, and multi-modal, multi-task learning for CAD is hindered by the absence of a unified benchmark. To address this gap, we introduce UniCAD, a comprehensive benchmark for multi-modal CAD learning that covers point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering across diverse input modalities. Alongside the benchmark, we present UniCAD-MLLM, a universal multi-modal large language model that ingests text, images, sketches, and point clouds and performs these heterogeneous tasks in an end-to-end fashion within a single framework. Extensive experiments on the UniCAD and Fusion360 benchmarks demonstrate that UniCAD-MLLM achieves state-of-the-art performance across all tasks, outperforming existing task-specific and multi-task baselines. We will release the dataset, code, and pretrained models to accelerate future research.
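The following minimal, hypothetical Python sketch shows how one benchmark can span several tasks and input modalities. It is not the UniCAD data format, and none of its names come from the paper. `Task`, `Modality`, `ALLOWED_INPUTS`, `UnifiedSample` and `group_by_task` are illustrative assumptions. So are the task-to-modality pairing and the plain-string targets.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class Task(Enum):
    """Task families named in the abstract (identifiers are hypothetical)."""
    POINT_TO_CAD = "point_to_cad"  # CAD reconstruction from a point cloud
    TEXT_TO_CAD = "text_to_cad"    # CAD generation from text
    IMAGE_TO_CAD = "image_to_cad"  # CAD generation from an image
    CAD_QA = "cad_qa"              # question answering about CAD


class Modality(Enum):
    """Input modalities the abstract says UniCAD-MLLM ingests."""
    TEXT = "text"
    IMAGE = "image"
    SKETCH = "sketch"
    POINT_CLOUD = "point_cloud"


# Assumed task-to-modality compatibility table; the paper's pairing may differ.
ALLOWED_INPUTS: Dict[Task, frozenset] = {
    Task.POINT_TO_CAD: frozenset({Modality.POINT_CLOUD}),
    Task.TEXT_TO_CAD: frozenset({Modality.TEXT}),
    Task.IMAGE_TO_CAD: frozenset({Modality.IMAGE, Modality.SKETCH}),
    Task.CAD_QA: frozenset(Modality),  # assumed: any modality may accompany a question
}


@dataclass
class UnifiedSample:
    """One hypothetical benchmark record: a task tag, its inputs, and a target.

    The target stands in for, e.g., a serialized CAD construction sequence
    (reconstruction/generation) or an answer string (QA); format assumed.
    """
    task: Task
    inputs: Dict[Modality, Any]
    target: str

    def validate(self) -> None:
        # Reject records with no inputs or with modalities the task does not accept.
        if not self.inputs:
            raise ValueError(f"{self.task.value}: no inputs given")
        unsupported = set(self.inputs) - ALLOWED_INPUTS[self.task]
        if unsupported:
            names = sorted(m.value for m in unsupported)
            raise ValueError(f"{self.task.value}: unsupported inputs {names}")


def group_by_task(samples: List[UnifiedSample]) -> Dict[Task, List[UnifiedSample]]:
    """Validate every sample and bucket it by task, so a single model can
    consume the mixed stream while results are still reported per task."""
    buckets: Dict[Task, List[UnifiedSample]] = defaultdict(list)
    for sample in samples:
        sample.validate()
        buckets[sample.task].append(sample)
    return dict(buckets)


if __name__ == "__main__":
    demo = [
        UnifiedSample(Task.TEXT_TO_CAD, {Modality.TEXT: "a plate with four holes"}, "<cad seq>"),
        UnifiedSample(Task.POINT_TO_CAD, {Modality.POINT_CLOUD: [[0.0, 0.0, 0.0]]}, "<cad seq>"),
        UnifiedSample(Task.IMAGE_TO_CAD, {Modality.SKETCH: "sketch.png"}, "<cad seq>"),
    ]
    for task, items in group_by_task(demo).items():
        print(task.value, len(items))
```

Keeping every task in one validated stream reflects the abstract's point that a single framework handles heterogeneous tasks. The per-task buckets still allow task-specific evaluation.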
