Humanoid-OmniOcc: Stereo-Based Full-View Occupancy Dataset for Embodied AI

2026-06-22 • Robotics

Robotics, Computer Vision and Pattern Recognition

AI summary

The authors created a new dataset called Humanoid-OmniOcc to help humanoid robots understand their surroundings in indoor spaces. Unlike previous occupancy datasets, which are mostly built for self-driving cars, this one is captured with panoramic stereo cameras. It follows a Real2Sim2Real loop: real sensor specifications drive physically accurate simulation, the simulation generates annotated training data, and models trained on that data are evaluated directly on real-world captures. Their proposed model, also named Humanoid-OmniOcc, uses stereo depth priors to lift 2D images into 3D occupancy and outperforms monocular baselines in both simulated and real-world tests.

Occupancy Prediction, Voxel, Panoramic Stereo, Humanoid Robots, Simulation-to-Real, 3D Reconstruction, Depth Priors, Embodied Perception, Indoor Mapping, Robot Navigation

Authors

Xianda Guo, Bohao Zhang, Chenwei Huang, Shiyuan Chen, Ruilin Wang, Yiqun Duan, Cong Yang, Qin Zou, Wei Sui

Abstract

Occupancy prediction at voxel-level granularity is essential for safe robotic navigation and interaction in complex environments. Existing occupancy datasets, however, are predominantly designed for autonomous driving with vehicle-centric biases (forward-facing cameras, far-field geometry, and static road priors), limiting their applicability to embodied humanoid perception. We present Humanoid-OmniOcc, a large-scale panoramic stereo-based occupancy dataset tailored for humanoid robots. The dataset encompasses 15 diverse simulated indoor scenes and 5 real-world environments, yielding over 155K samples with broad scene and style diversity. Importantly, the dataset is designed around a Real2Sim2Real closed-loop paradigm: real sensor specifications drive physically accurate simulation, simulation produces large-scale annotated training data, and models trained in simulation are directly evaluated on real-world captures, enabling iterative refinement of the sim-to-real pipeline. We further propose the Humanoid Surround Stereo-guided Occupancy model (Humanoid-OmniOcc), which exploits robust depth priors for accurate 2D-to-3D lifting. Extensive experiments show that Humanoid-OmniOcc consistently outperforms monocular baselines and generalizes well to both unseen simulated test scenes and real-world environments, validating the effectiveness of the Real2Sim2Real design. Code and data will be available upon acceptance at https://d-robotics-ai-lab.github.io/humanoid-omniocc.
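
The depth-guided 2D-to-3D lifting mentioned in the abstract can be illustrated with a minimal geometric sketch. This is not the authors' model, which is learned and not described in detail here. It only shows how stereo disparity yields metric depth and how that depth can be binned into a voxel occupancy grid. The function names, the rectified pinhole camera assumption (panoramic rigs may need a different projection model), and all parameter values are assumptions made for illustration.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a rectified stereo disparity map (pixels) to depth (metres).

    Uses the standard relation depth = focal * baseline / disparity.
    Pixels with non-positive disparity are treated as invalid and set to 0.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.zeros_like(disparity)
    np.divide(focal_px * baseline_m, disparity, out=depth, where=disparity > 0)
    return depth


def depth_to_occupancy(depth, fx, fy, cx, cy,
                       voxel_size=0.1,
                       grid_min=(-2.0, -2.0, 0.0),
                       grid_shape=(40, 40, 40)):
    """Back-project a depth map and mark every voxel hit by a 3D point.

    Assumes a pinhole camera with intrinsics (fx, fy, cx, cy) and a grid
    expressed in the camera frame (hypothetical layout for this sketch).
    """
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    valid = depth > 0                      # skip pixels without depth
    z = depth[valid]
    x = (u[valid] - cx) * z / fx           # pinhole back-projection
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)

    # Map metric coordinates to integer voxel indices.
    idx = np.floor((points - np.asarray(grid_min)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)

    occupancy = np.zeros(grid_shape, dtype=bool)
    occupancy[tuple(idx[inside].T)] = True
    return occupancy


if __name__ == "__main__":
    disparity = np.full((4, 4), 2.0)
    depth = disparity_to_depth(disparity, focal_px=4.0, baseline_m=0.5)
    occ = depth_to_occupancy(depth, fx=4.0, fy=4.0, cx=2.0, cy=2.0,
                             voxel_size=0.5, grid_shape=(8, 8, 8))
    print("occupied voxels:", int(occ.sum()))
```

A grid built this way records only the surfaces the cameras see. Occupancy prediction models go further and infer occluded voxels, which is why annotated training data such as that described above is needed.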
