Trans2Occ: Voxel Occupancy Estimation and Grasp for Transparent Objects from Simulation to Reality

2026-06-01

Robotics
AI summary

The authors developed a method that helps robots perceive and pick up transparent objects from a single ordinary RGB image, rather than from depth sensors, which are unreliable on surfaces that refract and reflect light. They trained their system on simulated data covering many different materials and lighting conditions. The method predicts the 3D shape of objects by estimating which voxels of space are occupied, and the robot uses this estimate to decide how to grasp them. Tests show the approach works in simulation and on real robots without fine-tuning on real data. This makes it a practical way to handle transparent items in robotics.

transparent objects, robotic perception, voxel occupancy, single-view RGB, simulation training, 3D reconstruction, robot grasping, domain transfer
Authors
Yixuan Yang, Sha Zhang, Rui Li, Zhenfei Yin, Xinzhu Ma, Yiran Qin, Lei Bai, Xudong Xu, Shilin Shan, Wangmeng Zuo, Yanyong Zhang, Wanli Ouyang, Feng Zheng, Shixiang Tang, Dongzhan Zhou
Abstract
Transparent objects remain challenging for robotic perception due to unreliable depth sensing caused by refraction and reflection. While prior approaches rely on multi-view reconstruction or depth completion, they are often difficult to scale or deploy in real-world robotic systems. In this paper, we present a practical framework for transparent object perception and manipulation based on single-view RGB input. Our approach predicts voxel-space occupancy directly from a single image, providing a geometry-aware representation that supports downstream robotic grasping. To enable large-scale training, we construct a simulation pipeline that generates paired RGB images and voxel occupancy annotations under diverse materials and lighting conditions. We demonstrate that the predicted occupancy representation is robust to domain shifts and transfers effectively from simulation to real-world robotic setups without fine-tuning. A simple rule-based grasping strategy built on top of the occupancy further achieves reliable grasp performance on transparent objects. Extensive experiments in both simulation and real-world environments show that our framework provides accurate 3D understanding and enables practical manipulation of transparent objects. These results suggest that single-view occupancy prediction offers a scalable and effective solution for transparent object perception in robotics.
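The abstract does not specify the grasping rule, so the following is only a minimal sketch of how a rule-based grasp could be derived from a predicted occupancy grid. It is not the authors' method. The function name `grasp_from_occupancy`, the grid layout (axes ordered x, y, z with z pointing up), the voxel size, the probability threshold, and the gripper opening limit are all assumptions made for illustration. The rule used here places the grasp at the centroid of the occupied voxels and closes the gripper along the narrowest horizontal direction found by a 2D principal component analysis.

```python
import numpy as np


def grasp_from_occupancy(occ, voxel_size=0.01, origin=(0.0, 0.0, 0.0),
                         threshold=0.5, max_opening=0.08):
    """Hypothetical rule-based grasp from a voxel occupancy grid.

    occ: (X, Y, Z) array of occupancy probabilities, z assumed to point up.
    Returns a dict with grasp position, closing direction, and required
    gripper width in metres, or None if no feasible grasp is found.
    """
    idx = np.argwhere(occ > threshold)
    if idx.shape[0] == 0:
        return None  # nothing predicted as occupied

    # Metric coordinates of the occupied voxel centres.
    pts = np.asarray(origin, dtype=float) + (idx + 0.5) * voxel_size
    center = pts.mean(axis=0)

    # PCA in the horizontal plane: the eigenvector with the smallest
    # eigenvalue is the direction in which the object is thinnest.
    xy = pts[:, :2] - center[:2]
    cov = xy.T @ xy / len(xy)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    closing_dir = eigvecs[:, 0]

    # Object extent along the closing direction, padded by one voxel
    # because the points are voxel centres rather than voxel faces.
    proj = xy @ closing_dir
    width = float(proj.max() - proj.min() + voxel_size)
    if width > max_opening:
        return None  # object too wide for the assumed gripper

    return {
        "position": center,
        "closing_dir": np.append(closing_dir, 0.0),
        "width": width,
    }


# Usage: a synthetic box, 2 x 6 x 4 voxels, standing on the grid floor.
occ = np.zeros((10, 10, 10))
occ[4:6, 2:8, 0:4] = 1.0
grasp = grasp_from_occupancy(occ)
print(grasp)
```

For the synthetic box, the narrowest horizontal extent is along x, so the sketch closes the gripper along that axis. A real system would also need collision checks against the table and neighbouring objects, which this sketch omits.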