SUMO: Segment and Track Any Motion with Nonlinear State Space Models

2026-06-29 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence

AI summary

The authors present SUMO, a zero-shot, training-free method for tracking and segmenting moving objects in videos. It combines a nonlinear motion model inspired by robotics with vision-based segmentation to handle complex, nonlinear object motion. A Selective Unscented Filter scores and fuses predictions from multiple sources to estimate the most plausible object state over time, and a memory selection mechanism evaluates the reliability of memory frames. The authors report state-of-the-art performance on both visual object tracking and moving object segmentation.

Visual Object Tracking, Moving Object Segmentation, State Space Model, Nonlinear Dynamics, Unscented Filter, Zero-shot learning, Object tracking, Segmentation, Temporal object dynamics, Memory selection mechanism

Authors

Kexin Tian, Sixu Li, Keshu Wu, Yang Zhou, Zhengzhong Tu

Abstract

Visual Object Tracking (VOT) and Moving Object Segmentation (MOS) are two fundamental tasks in computer vision that involve both spatial and temporal object dynamics. Existing methods rely predominantly on visual cues and thus often falter in real-world scenarios where object motions are inherently complex and nonlinear. To address this limitation, we propose SUMO, a zero-shot, training-free, unified framework integrating nonlinear dynamics with vision-based segmentation for accurate and consistent VOT and MOS. Specifically, we develop a nonlinear State Space Model (SSM) inspired by robotics principles to capture the complex object dynamics. Building on this model, we propose a Selective Unscented Filter (SUF) for accurate state estimation, which features a joint scoring mechanism and dynamically fuses multi-source predictions to identify the most plausible object state over time. Furthermore, we apply a memory selection mechanism to evaluate the reliability of memory frames. Our extensive experimental results show that SUMO achieves state-of-the-art performance on both VOT and MOS tasks.
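
For readers unfamiliar with unscented filtering, the following is a minimal sketch of the generic unscented prediction step that filters of this family build on. It is not the authors' SUF: the joint scoring, multi-source fusion, and memory selection described above are not shown. The unicycle-style motion model, the state layout, the function names, and the kappa default are all assumptions made for illustration.

```python
import numpy as np


def sigma_points(mean, cov, kappa=1.0):
    """Return the 2n+1 sigma points and weights of the basic unscented transform."""
    n = mean.shape[0]
    # Lower-triangular factor L with L @ L.T == (n + kappa) * cov.
    root = np.linalg.cholesky((n + kappa) * cov)
    # Row i of root.T is column i of L, i.e. one spread direction.
    points = np.vstack([mean, mean + root.T, mean - root.T])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return points, weights


def unscented_predict(mean, cov, transition, process_noise, kappa=1.0):
    """Propagate a Gaussian belief through a nonlinear transition function."""
    points, weights = sigma_points(mean, cov, kappa)
    propagated = np.array([transition(p) for p in points])
    new_mean = weights @ propagated
    centered = propagated - new_mean
    new_cov = (weights[:, None] * centered).T @ centered + process_noise
    return new_mean, new_cov


def unicycle_step(state, dt=1.0):
    """Hypothetical nonlinear motion model (assumption, not from the paper).

    State layout: [x, y, heading, speed, turn_rate].
    """
    x, y, heading, speed, turn_rate = state
    return np.array([
        x + speed * np.cos(heading) * dt,
        y + speed * np.sin(heading) * dt,
        heading + turn_rate * dt,
        speed,
        turn_rate,
    ])


if __name__ == "__main__":
    mean = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
    cov = np.eye(5) * 0.01
    noise = np.eye(5) * 1e-4
    pred_mean, pred_cov = unscented_predict(mean, cov, unicycle_step, noise)
    print(pred_mean)
    print(np.diag(pred_cov))
```

The sketch averages the heading component directly, which is only reasonable when the sigma points stay far from the angle wrap-around; a practical tracker would need to handle that case and would add a measurement update step.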
