ManiTwin: Scaling Data-Generation-Ready Digital Object Dataset to 100K

2026-03-17 • Robotics

Robotics, Artificial Intelligence, Graphics, Machine Learning, Software Engineering

AI summary

The authors created ManiTwin, a method that turns a single photo into a detailed 3D model ready for robot simulation and learning. They used this to build ManiTwin-100K, a big collection of 100,000 annotated 3D objects with properties like physical traits and functional details. Their work helps generate diverse and useful data for teaching robots to handle objects in simulation. Experiments show that ManiTwin makes it easier and faster to create useful 3D assets for robotic tasks and related research like random scene setup and visual question answering.

robotic manipulation, 3D asset generation, simulation, semantic annotation, physical properties, functional annotations, data generation, policy learning, scene synthesis, visual question answering

Authors

Kaixuan Wang, Tianxing Chen, Jiawei Liu, Honghao Su, Shaolong Zhu, Minxuan Wang, Zixuan Li, Yue Chen, Huan-ang Gao, Yusen Qin, Jiawei Wang, Qixuan Zhang, Lan Xu, Jingyi Yu, Yao Mu, Ping Luo

Abstract

Learning in simulation provides a useful foundation for scaling robotic manipulation capabilities. However, this paradigm often suffers from a lack of data-generation-ready digital assets, in both scale and diversity. In this work, we present ManiTwin, an automated and efficient pipeline for generating data-generation-ready digital object twins. Our pipeline transforms a single image into a simulation-ready and semantically annotated 3D asset, enabling large-scale robotic manipulation data generation. Using this pipeline, we construct ManiTwin-100K, a dataset containing 100K high-quality annotated 3D assets. Each asset is equipped with physical properties, language descriptions, functional annotations, and verified manipulation proposals. Experiments demonstrate that ManiTwin provides an efficient asset synthesis and annotation workflow, and that ManiTwin-100K offers high-quality and diverse assets for manipulation data generation, random scene synthesis, and VQA data generation, establishing a strong foundation for scalable simulation data synthesis and policy learning. Our webpage is available at https://manitwin.github.io/.
