Embedding Semantic Risk into Distance Fields and CBFs for Online Monocular Safe Control
2026-06-01 • Robotics
AI summary
The authors present a system that helps autonomous or teleoperated robots avoid obstacles safely using a single camera, taking into account what kind of object each obstacle is. Instead of treating all obstacles the same, the method assigns each object class a risk level and enlarges the safety margin around riskier classes. It combines 3-D scene reconstruction with semantic segmentation to build a distance map (an ESDF) in which higher-risk objects occupy a larger region. The controller queries this map online to decide how to move, so navigation accounts for both geometry and object type. Simulation and hardware tests showed the system running online at 10-20 Hz in both autonomous and human-teleoperated navigation.
Control Barrier Function (CBF), Euclidean Signed Distance Field (ESDF), monocular RGB video, semantic segmentation, SLAM (Simultaneous Localization and Mapping), safe navigation, teleoperation, 3-D geometry reconstruction, semantic risk, online perception-to-control
Authors
Dawei Zhang, Nuo Chen, Shuo Liu, Roberto Tron, Zhiwen Fan
Abstract
We propose an online monocular perception-to-control framework that embeds semantic risk into the distance field used by Control Barrier Function (CBF)-based safe navigation and teleoperation. Many perception-based safety filters assign the same distance-based safety margin to all mapped obstacles or use semantics only as a downstream controller adjustment, rather than encoding semantic risk in the spatial representation. Our framework instead reasons online about obstacle geometry and class-dependent risk by embedding semantic information directly into the Euclidean Signed Distance Field (ESDF). This design encodes semantic risk before control optimization, so high-risk objects exert a larger spatial influence in the safety field while retaining efficient ESDF queries at runtime. Specifically, a foundation-model-based SLAM front end reconstructs dense 3-D geometry from monocular RGB video, while per-frame semantic segmentation provides pixel-level class labels that are fused into the reconstructed geometry. The resulting geometric-semantic representation is then converted into an ESDF, where semantic labels identify safety-relevant regions and impose class-dependent inflation before field computation. The semantic-aware ESDF provides the local distance values and spatial derivatives required by the CBF controller, while class-dependent gains further regulate the controller response. Extensive simulation and hardware experiments demonstrate online operation at 10-20 Hz and semantic-aware safe behavior in both teleoperation and autonomous navigation.
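The idea of class-dependent inflation followed by a CBF query can be illustrated with a minimal 2-D sketch. This is not the authors' implementation. It assumes obstacles are labeled points rather than a fused SLAM map, a brute-force distance query rather than a voxel ESDF, single-integrator dynamics, and a barrier h(p) = d(p) - margin. The names `CLASS_PARAMS`, `semantic_distance`, `gradient`, and `cbf_filter`, and all numeric values, are hypothetical.

```python
import math

# Hypothetical per-class table: inflation radius (m) and CBF gain.
# The values are illustrative assumptions, not taken from the paper.
CLASS_PARAMS = {
    "box":    {"inflation": 0.0, "alpha": 2.0},
    "person": {"inflation": 0.5, "alpha": 0.5},
}


def semantic_distance(p, obstacles):
    """Distance from point p to the class-inflated obstacle set.

    obstacles is a list of ((x, y), class_name). Each point is treated as a
    disc whose radius is the inflation of its class, so the field is the
    distance to the union of those discs (negative inside a disc).
    Returns (distance, class of the closest inflated obstacle).
    """
    best_d, best_cls = math.inf, None
    for (ox, oy), cls in obstacles:
        d = math.hypot(p[0] - ox, p[1] - oy) - CLASS_PARAMS[cls]["inflation"]
        if d < best_d:
            best_d, best_cls = d, cls
    return best_d, best_cls


def gradient(p, obstacles, eps=1e-4):
    """Central-difference gradient of the distance field at p."""
    gx = (semantic_distance((p[0] + eps, p[1]), obstacles)[0]
          - semantic_distance((p[0] - eps, p[1]), obstacles)[0]) / (2 * eps)
    gy = (semantic_distance((p[0], p[1] + eps), obstacles)[0]
          - semantic_distance((p[0], p[1] - eps), obstacles)[0]) / (2 * eps)
    return gx, gy


def cbf_filter(p, u_nom, obstacles, margin=0.2):
    """Minimally modify u_nom so that grad_h . u + alpha * h >= 0.

    Assumes p_dot = u and h(p) = d(p) - margin. With a single affine
    constraint, the usual CBF quadratic program reduces to a closed-form
    projection of u_nom onto the constraint half-space.
    """
    d, cls = semantic_distance(p, obstacles)
    h = d - margin
    alpha = CLASS_PARAMS[cls]["alpha"]  # class-dependent gain
    gx, gy = gradient(p, obstacles)
    slack = gx * u_nom[0] + gy * u_nom[1] + alpha * h
    norm2 = gx * gx + gy * gy
    if slack >= 0 or norm2 == 0:
        return u_nom  # nominal command already satisfies the constraint
    scale = slack / norm2
    return (u_nom[0] - scale * gx, u_nom[1] - scale * gy)


# Same geometry (1 m in front of the robot), different semantic class.
obstacles = [((2.0, 0.0), "box"), ((2.0, 3.0), "person")]
u_box = cbf_filter((1.0, 0.0), (1.0, 0.0), obstacles)
u_person = cbf_filter((1.0, 3.0), (1.0, 0.0), obstacles)
```

With these assumed parameters, the command toward the box passes through unchanged, while the approach speed toward the person is capped at roughly alpha * h = 0.5 * 0.3 = 0.15 m/s. The inflation acts in the field itself, before the controller runs, and the gain acts afterward, which mirrors the two semantic mechanisms named in the abstract.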