Hierarchical Object Representation for Spatial Robot Perception: Points, Meshes, and Superquadrics

2026-06-01

Robotics
AI summary

The authors focus on improving how robots understand the shapes of objects around them by representing each object in a 3D scene at several levels of detail. They build a system that starts from raw RGB-D sensor images and creates layered models of objects, ending with simple analytical shapes called superquadrics that make calculations such as collision checks easier. The representation supports detailed object reconstruction, re-localization and map alignment, and collision checking for navigation in both indoor and outdoor areas. They tested their approach on real robot data and report that their superquadric-based map alignment outperforms ROMAN, the previous state-of-the-art object-based method. The work is aimed at making robot navigation safer and more reliable in cluttered environments.

Hierarchical 3D Scene Graphs, superquadrics, RGB-D images, 3D object reconstruction, robot navigation, map alignment, collision checking, open-set object scenes, robot autonomy, dense 3D meshes
Authors
Ceng Zhang, Wan Su, Mohamed Samshad, Gregory S. Chirikjian, Rajat Talak
Abstract
Hierarchical 3D Scene Graphs (3DSG) have emerged as an actionable and scalable representation for long-term autonomy, incorporating metric, semantic, and topological information about the scene. However, the geometric representation of objects in 3DSG has been overlooked: most methods use simplified geometric models such as partial point clouds or 3D bounding boxes. In this work, we introduce a hierarchical object representation that can be leveraged for high-fidelity object-level reconstruction, robust object-based re-localization or map alignment, and efficient, analytical collision checking for safe robot navigation planning in dense and cluttered environments. The representation is organized into four distinct layers that progressively abstract the scene from raw sensor data, through dense 3D meshes, to analytical primitives such as superquadrics, which provide a sparse, analytical representation of object geometry. We develop a pipeline that builds the hierarchical object representation from an RGB-D image stream captured by a robot, and demonstrate it on real-world open-set object scenes. Extensive experiments across diverse datasets, including HOPE, ReplicaCAD, Kimera-Multi, and the NUS Campus Dataset collected using a Unitree B2 robot, validate our pipeline in both indoor and outdoor environments. We show that our superquadric-based map alignment method outperforms ROMAN, the current state-of-the-art object-based map alignment method. Our code can be found at https://github.com/perceptica-robotics/Hickory.
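
To illustrate why superquadrics make collision checking analytical, the sketch below evaluates the standard superquadric inside-outside function for a query point. It is not taken from the authors' code: the function names, argument layout, and the restriction to the primitive's canonical frame (origin-centered, axis-aligned) are assumptions made for illustration. A point expressed in the world frame would first have to be transformed into this frame using the object's pose.

```python
def superquadric_inside_outside(point, scale, eps):
    """Standard superquadric inside-outside function F.

    point: (x, y, z) in the primitive's canonical frame (assumed here).
    scale: semi-axis lengths (a1, a2, a3), all > 0.
    eps:   shape exponents (e1, e2), both > 0.
    F < 1 means inside, F == 1 on the surface, F > 1 outside.
    """
    x, y, z = point
    a1, a2, a3 = scale
    e1, e2 = eps
    # Combine the x and y terms first, then blend with the z term.
    xy_term = abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)
    return xy_term ** (e2 / e1) + abs(z / a3) ** (2.0 / e1)


def point_inside(point, scale, eps):
    """Hypothetical helper: True if the point is inside or on the surface."""
    return superquadric_inside_outside(point, scale, eps) <= 1.0


if __name__ == "__main__":
    corner = (0.9, 0.9, 0.9)
    unit = (1.0, 1.0, 1.0)
    # e1 = e2 = 1 gives an ellipsoid (here a unit sphere): the corner is outside.
    print(point_inside(corner, unit, (1.0, 1.0)))
    # Small exponents give a box-like shape: the same corner is inside.
    print(point_inside(corner, unit, (0.1, 0.1)))
```

Because the test is a single closed-form expression in five shape parameters, it needs no mesh traversal or nearest-neighbor search, which is the property the abstract refers to as a sparse and analytical representation.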