Arko-T: A Foundation Model for Text-to-Structured 3D Generation

2026-06-29 • Machine Learning

AI summary

The authors created Arko-T, a 4B-parameter model that turns plain-text descriptions into editable 3D design files instead of shapes that can only be rendered. Unlike earlier text-to-3D systems, Arko-T produces executable parametric CAD programs that can be changed and customized later. Every stage of the pipeline is built to preserve the design's features, parameters, and construction logic. When tested against seven frontier large language models on 12 metrics, Arko-T scored best on 8 and second-best on 3 more, at roughly one-tenth the per-benchmark cost. This suggests that training models specifically for design tasks can be effective even at moderate scale.

Text-to-3D, Parametric CAD, Executable design, Natural language processing, Code normalization, Design state, Large Language Models, CAD programs, Data curation, Benchmarking

Authors

Liang Wang, Zhaoyang Xi, Zekai Xiang, Heng Meng, Qishan Zhang, Pingyi Zhou, Jin Liu, Litao Chen

Abstract

Text-to-3D systems can now synthesize a mechanical part from a single sentence, yet the result is a shape to render, not a design to edit. We present Arko-T, a 4B-parameter text-to-design model that maps natural-language intent directly into executable, parametric CAD programs. Rather than optimizing for code executability alone, Arko-T aligns every stage of the pipeline to a formal notion of design state, so that data curation, code normalization, and execution-grounded supervision all work to preserve the features, parameters, and construction logic that make a CAD artifact editable. Benchmarked against seven frontier LLMs across 12 metrics, Arko-T attains the best score on 8 and the second-best on 3 more, at roughly one-tenth the per-benchmark cost. The results suggest that targeted design-level training at moderate scale can match frontier general-purpose models on structured CAD generation.
