OP3DSG: Open-Vocabulary Part-Aware 3D Scene Graph Generation for Real-World Environments

2026-06-29

Computer Vision and Pattern Recognition, Robotics
AI summary

The authors present OP3DSG, a system for generating open-vocabulary 3D scene graphs that capture not only objects but also their interactive parts, the spatial and functional relations among them, and their affordances. Unlike previous methods that focus mainly on whole objects, the approach preserves small, interaction-relevant parts by combining object-part knowledge-guided detection with part-aware 3D fusion. To build the relational graph, it starts from a prior graph initialized from geometry and refines it with a large language model, which reduces spurious relations and keeps construction efficient. To evaluate the system, the authors introduce UniGraph3D, a benchmark for part-aware perception and multi-level relational reasoning. Their experiments show that OP3DSG outperforms existing methods and can serve as a perception backbone for real-world robotics tasks.

3D scene graphs, open-vocabulary generation, object-part detection, spatial relations, functional relations, affordances, 3D fusion, large language models (LLM), robotics perception, benchmark dataset
Authors
Yirum Kim, Ue-Hwan Kim
Abstract
3D scene graphs (3DSGs) provide a compact and structured abstraction of 3D environments. Although advances in foundation models have enabled open-vocabulary 3DSG generation, existing approaches remain object-centric and encode limited relational information, which restricts their applicability in real-world scenarios that require fine-grained understanding. We propose OP3DSG, an open-vocabulary part-aware 3DSG generation framework that constructs unified graphs jointly modeling objects, interactive parts, spatial relations, functional relations, and affordances. OP3DSG integrates object-part knowledge-guided detection with part-aware 3D fusion to preserve small and interaction-relevant components, and employs a geometry-initialized prior graph with LLM-based refinement to reduce spurious relational predictions while enabling efficient graph construction. To systematically evaluate unified 3D scene graph construction, we introduce UniGraph3D, a benchmark designed for part-aware perception and multi-level relational reasoning. Experimental results show that OP3DSG achieves state-of-the-art performance and demonstrate its effectiveness as a perception backbone in diverse real-world robotics tasks.
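
The idea of a geometry-initialized prior graph followed by refinement can be illustrated with a minimal Python sketch. Nothing below is taken from the OP3DSG implementation: the `Node` and `Edge` types, the distance threshold `max_dist`, the "near" relation, and the rule-based `rule_based_judge` (a stand-in for an LLM call) are all assumptions chosen for illustration. The sketch only shows why a geometric prior may help: the judge is queried on nearby pairs alone, so distant pairs can neither cost a query nor produce a spurious relation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    node_id: str
    label: str
    center: tuple[float, float, float]  # 3D centroid (hypothetical units: metres)
    parent: Optional[str] = None        # owning object id if this node is a part
    affordances: tuple[str, ...] = ()


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    kind: str  # "part_of", "spatial", or "functional"


def build_prior_graph(nodes: list[Node], max_dist: float) -> list[Edge]:
    """Propose edges from geometry alone: part-of links plus 'near' pairs."""
    edges = [Edge(n.node_id, n.parent, "part of", "part_of")
             for n in nodes if n.parent is not None]
    for a, b in combinations(nodes, 2):
        if a.parent == b.node_id or b.parent == a.node_id:
            continue  # this pair is already linked by a part-of edge
        if math.dist(a.center, b.center) <= max_dist:
            edges.append(Edge(a.node_id, b.node_id, "near", "spatial"))
    return edges


# Assumed interface: returns a functional relation label, or None for no relation.
Judge = Callable[[Node, Node], Optional[str]]


def refine(nodes: list[Node], prior: list[Edge], judge: Judge) -> list[Edge]:
    """Query the judge only on geometry-proposed pairs and add functional edges."""
    by_id = {n.node_id: n for n in nodes}
    refined = list(prior)
    for e in prior:
        if e.kind != "spatial":
            continue
        # Functional relations are directed, so both orders are checked.
        for src, dst in ((e.source, e.target), (e.target, e.source)):
            relation = judge(by_id[src], by_id[dst])
            if relation is not None:
                refined.append(Edge(src, dst, relation, "functional"))
    return refined


def rule_based_judge(a: Node, b: Node) -> Optional[str]:
    """Placeholder for an LLM call; a real system would prompt a model here."""
    if a.label == "light switch" and b.label == "lamp":
        return "controls"
    return None


# Toy scene with made-up coordinates.
nodes = [
    Node("cabinet", "cabinet", (0.0, 0.0, 0.5)),
    Node("handle", "handle", (0.3, 0.0, 0.6), parent="cabinet", affordances=("pull",)),
    Node("switch", "light switch", (2.0, 0.0, 1.2), affordances=("press",)),
    Node("lamp", "lamp", (2.5, 0.0, 2.0)),
    Node("sofa", "sofa", (10.0, 10.0, 0.4)),
]
prior = build_prior_graph(nodes, max_dist=1.5)
graph = refine(nodes, prior, rule_based_judge)
```

In this toy scene the judge is consulted for a single candidate pair (the switch and the lamp) instead of all ten node pairs, and the sofa, which is far from everything else, never enters the graph's edge set. How the actual system forms its prior and prompts the language model is not specified in the abstract.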