CylindTrack: Depth-Aware Cylindrical Motion Modeling for Panoramic Multi-Object Tracking
2026-06-29 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Robotics
AI summary
The authors developed CylindTrack, a method for tracking many objects in video from panoramic cameras, which capture a full 360° view. They address two challenges: objects crossing the left and right image edges, and unreliable per-frame depth estimates. Depth-Temporal Trajectory Modeling filters each object's depth over time, and the tracker is made aware of the panorama's spherical geometry. The approach keeps object positions smooth over time and handles the wrap-around motion specific to panoramic images, which helps preserve object identities and keep trajectories continuous in these wide-angle videos.
Multi-Object Tracking (MOT), Panoramic cameras, Equirectangular projection, Depth estimation, Trajectory modeling, Temporal filtering, Spherical geometry, Motion prediction, Identity preservation, Tracking-by-detection
Authors
Buyin Deng, Kai Luo, Lingxin Huang, Xinqi Liu, Fei Cheng, Hang Zheng, Liming Yin, Kailun Yang
Abstract
Multi-Object Tracking (MOT) is a core capability for embodied perception, and panoramic cameras are attractive for embodied systems because their 360° field of view reduces blind spots and keeps surrounding targets observable for longer durations. However, panoramic MOT is not a straightforward extension of perspective MOT. In equirectangular panoramic videos, the horizontal image domain is periodic rather than Euclidean, which breaks planar motion assumptions and makes IoU-based association unreliable near the 0°/360° seam. Meanwhile, large-FoV scenes often contain more objects, stronger scale variation, and more frequent interactions, making online association particularly sensitive to unstable frame-wise depth cues. To address these issues, we propose CylindTrack, a depth-aware cylindrical tracking-by-detection framework for panoramic MOT. CylindTrack first introduces Depth-Temporal Trajectory Modeling (DTM), which promotes instance depth from an isolated frame-wise cue to a temporally filtered trajectory-level state. To improve the reliability of depth observations, we further develop Spherical Spatio-Temporal Consistency Learning (SSTC), which combines a Temporal Mixer and Spherical Geometry-aware Attention to enhance temporal coherence and panoramic geometric alignment in depth-aware representations. Finally, we design a Topology-Aware Cylindrical Motion Model (TCMM) that lifts horizontal motion into a continuous angular state space and performs seam-consistent motion prediction and association in the periodic panoramic domain. By jointly modeling trajectory-level depth consistency and panoramic topology, CylindTrack improves identity preservation and trajectory continuity in challenging panoramic scenes. The source code will be released at https://github.com/warriordby/CylindTrack.
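The seam problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' TCMM implementation; it only shows, under stated assumptions, why planar IoU fails at the 0°/360° seam and how treating the horizontal axis as periodic repairs it. The function names, the (cx, cy, w, h) pixel box format, and the constant angular velocity prediction step are all hypothetical choices made for illustration, and the overlap formula assumes each box is narrower than half the image width.

```python
import math

TWO_PI = 2.0 * math.pi

def pixel_to_azimuth(x, width):
    """Map a horizontal pixel coordinate to an azimuth in [0, 2*pi)."""
    return (x % width) / width * TWO_PI

def wrapped_diff(a, b):
    """Signed shortest angular difference a - b, in [-pi, pi)."""
    return (a - b + math.pi) % TWO_PI - math.pi

def predict_azimuth(theta, omega, dt=1.0):
    """Constant angular velocity prediction, wrapped back into [0, 2*pi).
    (Assumed motion model for illustration only.)"""
    return (theta + omega * dt) % TWO_PI

def _overlap_1d(dist, len_a, len_b):
    """Overlap of two intervals whose centers are `dist` apart."""
    return max(0.0, min(len_a, len_b, (len_a + len_b) / 2.0 - dist))

def planar_iou(box_a, box_b):
    """Standard IoU for (cx, cy, w, h) boxes on a flat image plane."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter = _overlap_1d(abs(ax - bx), aw, bw) * _overlap_1d(abs(ay - by), ah, bh)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def seam_aware_iou(box_a, box_b, width):
    """IoU where the horizontal axis wraps around with period `width`."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Shortest horizontal center distance on the cylinder.
    dx = abs((ax - bx + width / 2.0) % width - width / 2.0)
    inter = _overlap_1d(dx, aw, bw) * _overlap_1d(abs(ay - by), ah, bh)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

if __name__ == "__main__":
    W = 1000  # assumed equirectangular image width in pixels
    track = (995.0, 100.0, 20.0, 40.0)      # just left of the seam
    detection = (5.0, 100.0, 20.0, 40.0)    # just right of the seam
    print("planar IoU:    ", planar_iou(track, detection))
    print("seam-aware IoU:", seam_aware_iou(track, detection, W))
```

In this example the two boxes are only 10 pixels apart on the cylinder but 990 pixels apart on the flat image, so the planar IoU is zero while the periodic version reports a substantial overlap. The same wrapped difference would be the natural residual for a motion filter whose state is an angle rather than a pixel column.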