TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches
2026-04-10 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Robotics
AI summary
The authors created TouchAnything, a method to guess the 3D shape of objects by combining a few touch points with visual knowledge from a large pretrained image model. Because touching only gives limited local info, they use this image model as a guide to fill in the gaps and make a good guess of the whole shape. Their approach works better than earlier methods and can handle many different objects, even ones it hasn't seen before. This helps in tasks where vision alone is not enough to understand an object's shape.
3D reconstruction, tactile sensing, diffusion model, geometric prior, optimization, object geometry estimation, robotic manipulation, vision and touch fusion, open-world reconstruction, pretrained models
Authors
Langzhe Gu, Hung-Jui Huang, Mohamad Qadri, Michael Kaess, Wenzhen Yuan
Abstract
Accurate object geometry estimation is essential for many downstream tasks, including robotic manipulation and physical interaction. Although vision is the dominant modality for shape perception, it becomes unreliable under occlusions or challenging lighting conditions. In such scenarios, tactile sensing provides direct geometric information through physical contact. However, reconstructing global 3D geometry from sparse local touches alone is fundamentally underconstrained. We present TouchAnything, a framework that leverages a pretrained large-scale 2D vision diffusion model as a semantic and geometric prior for 3D reconstruction from sparse tactile measurements. Unlike prior work that trains category-specific reconstruction networks or learns diffusion models directly from tactile data, we transfer the geometric knowledge encoded in pretrained visual diffusion models to the tactile domain. Given sparse contact constraints and a coarse class-level description of the object, we formulate reconstruction as an optimization problem that enforces tactile consistency while guiding solutions toward shapes consistent with the diffusion prior. Our method reconstructs accurate geometries from only a few touches, outperforms existing baselines, and enables open-world 3D reconstruction of previously unseen object instances. Our project page is https://grange007.github.io/touchanything .
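The optimization described in the abstract, a tactile-consistency term balanced against a prior term, can be illustrated with a toy sketch. This is not the authors' implementation: the shape is reduced to a sphere (center and radius), the diffusion prior is replaced by a hand-written quadratic penalty that pulls the radius toward a class-level guess, and the names `fit_shape`, `prior_radius`, and `lam` are hypothetical. It only shows the structure of the objective, in which sparse contact points constrain the surface locally while a prior term resolves what the touches leave open.

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Signed distance from each point to the surface of a sphere."""
    return np.linalg.norm(points - center, axis=1) - radius


def fit_shape(contacts, prior_radius, lam=0.1, lr=0.1, steps=2000):
    """Gradient descent on: mean(sdf(contact)^2) + lam * (radius - prior_radius)^2.

    The first term is the tactile-consistency loss (contacts should lie on the
    surface). The second term is a STAND-IN for the diffusion prior, which in
    the paper comes from a pretrained 2D diffusion model, not a quadratic.
    """
    center = contacts.mean(axis=0)
    radius = float(np.linalg.norm(contacts - center, axis=1).mean())
    for _ in range(steps):
        diff = contacts - center
        dist = np.linalg.norm(diff, axis=1) + 1e-12  # avoid division by zero
        sdf = dist - radius
        # d(sdf_i)/d(center) = -(p_i - center) / ||p_i - center||
        grad_center = (2.0 * sdf[:, None] * (-diff / dist[:, None])).mean(axis=0)
        # d(sdf_i)/d(radius) = -1, plus the gradient of the prior penalty
        grad_radius = (-2.0 * sdf).mean() + 2.0 * lam * (radius - prior_radius)
        center = center - lr * grad_center
        radius = radius - lr * grad_radius
    return center, radius


# Five simulated touches on a sphere of radius 0.5 (hypothetical test data).
true_center = np.array([0.1, -0.2, 0.3])
directions = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
])
contacts = true_center + 0.5 * directions

center, radius = fit_shape(contacts, prior_radius=0.5)
print("estimated center:", center)
print("estimated radius:", radius)
```

In the paper's setting the shape representation is far richer than a sphere and the prior comes from a pretrained diffusion model conditioned on a coarse class description; the weight `lam` here simply stands for whatever balance the method strikes between the two terms.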