KDH-CAD: Knowledge-data hybrid CAD learning under data scarcity

2026-06-01 | Graphics

Graphics, Machine Learning
AI summary

The authors address the scarcity of CAD (computer-aided design) data by combining a small amount of real, labeled CAD data with knowledge from textbooks and tutorials and from large pretrained foundation models. Their approach, called KDH-CAD, uses structured domain knowledge to fill in CAD concepts that are missing or weakly expressed in these models, then calibrates those concepts with a few labeled examples, without fine-tuning the foundation model itself. They tested this on real-world mechanical part classification and showed that the method performs as well as or better than state-of-the-art methods that typically need about an order of magnitude more data. This suggests the approach can learn CAD tasks effectively even when data is scarce.

Computer-Aided Design (CAD), Deep Learning, Data Scarcity, Foundation Models, Domain Knowledge, Knowledge Completion, Mechanical Part Classification, Pretrained Models, Low-Data Learning
Authors
Ziqin Gao, Zhijie Yang, Qiang Zou
Abstract
Deep learning in computer-aided design (CAD) remains fundamentally constrained by data scarcity: authentic CAD data is difficult to collect at scale, while synthetic data may not faithfully reflect real design practice. Rather than pursuing ever-larger CAD datasets, this paper treats CAD learning as a knowledge completion and calibration problem. It introduces KDH-CAD, a knowledge-data hybrid framework that integrates pretrained knowledge in foundation models, structured domain knowledge from textbooks and tutorials, and a very small amount of labeled CAD data. Domain knowledge is used to elicit and complete CAD-relevant concepts that are weakly expressed or under-represented in pretrained foundation models, while labeled CAD data calibrates these concepts in the latent space to account for task-specific geometric variability, without fine-tuning the foundation model. Experiments on real-world mechanical part classification show that KDH-CAD achieves strong performance in low-data regimes, reaching 92.6% accuracy with only 250 training samples and 95.8% with 1,000 samples, and continuing to improve with additional data. This matches or exceeds state-of-the-art performance that typically requires an order of magnitude more data. These results suggest that combining pretrained foundation models with structured domain knowledge can substantially reduce reliance on large-scale CAD datasets, providing a principled and practical direction for data-efficient CAD learning.
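The abstract does not specify how concepts are calibrated in the latent space, so the following is only a toy illustration of the general idea, not the authors' method. It assumes that a frozen foundation model has already produced embedding vectors, that each class has a "knowledge prototype" derived from domain text, and that calibration is a simple blend of that prototype with the mean embedding of the few labeled samples. The function names, the blending weight `alpha`, and the class labels `gear` and `bracket` are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def calibrate_prototypes(knowledge_protos, features, labels, alpha=0.5):
    """Blend each knowledge-derived prototype with the mean labeled embedding.

    Assumption: embeddings come from a frozen model; no model weights change.
    alpha = 0 keeps pure knowledge prototypes, alpha = 1 uses only the data.
    """
    calibrated = {}
    for cls, proto in knowledge_protos.items():
        members = [f for f, y in zip(features, labels) if y == cls]
        p = normalize(proto)
        if not members:
            # No labeled samples for this class: fall back to knowledge only.
            calibrated[cls] = p
            continue
        dim = len(proto)
        mean = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        m = normalize(mean)
        blended = [(1 - alpha) * p[i] + alpha * m[i] for i in range(dim)]
        calibrated[cls] = normalize(blended)
    return calibrated

def classify(feature, protos):
    """Return the class whose prototype has the highest cosine similarity."""
    f = normalize(feature)
    return max(protos, key=lambda c: sum(a * b for a, b in zip(f, protos[c])))

# Toy 2-D example with made-up embeddings and hypothetical class names.
knowledge_protos = {"gear": [1.0, 0.0], "bracket": [0.0, 1.0]}
features = [[0.9, 0.3], [0.8, 0.4], [0.5, 0.9], [0.4, 0.8]]
labels = ["gear", "gear", "bracket", "bracket"]

protos = calibrate_prototypes(knowledge_protos, features, labels)
print(classify([0.85, 0.35], protos))
print(classify([0.3, 0.9], protos))
```

In this sketch only the small set of prototypes is adjusted, which mirrors the abstract's point that labeled data calibrates concepts while the foundation model stays fixed. A real system would replace the hand-written vectors with embeddings from an actual pretrained model.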