EdgeDAM: Real-time Object Tracking for Mobile Devices

2026-03-05 • Computer Vision and Pattern Recognition

AI summary

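The authors propose EdgeDAM, a lightweight single-object tracker for devices with limited computing power, such as smartphones. Rather than relying on segmentation masks, which are computationally expensive, it tracks bounding boxes and keeps two memory buffers: one holding recent, temporally consistent target positions and one holding visually similar distractors, so that those distractors are penalized when the tracker tries to recover the target. A confidence-driven switching step turns on detection and memory-guided re-identification when the tracker becomes unreliable, for example during occlusion, while a held-box mechanism temporarily freezes and expands the box estimate to keep distractors from contaminating it. On five benchmarks the method is reported to be more robust under occlusion and fast motion while staying real-time, reaching 88.2% accuracy on the DiDi dataset and 25 FPS on an iPhone 15.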

Single-object tracking • Edge devices • Distractor-aware memory • Bounding-box tracking • Occlusion • Real-time performance • Lightweight tracker • Re-identification • Temporal consistency • Detection-guided tracking

Authors

Syed Muhammad Raza, Syed Murtaza Hussain Abidi, Khawar Islam, Muhammad Ibrahim, Ajmal Saeed Mian

Abstract

Single-object tracking (SOT) on edge devices is a critical computer vision task, requiring accurate and continuous target localization across video frames under occlusion, distractor interference, and fast motion. However, recent state-of-the-art distractor-aware memory mechanisms are largely built on segmentation-based trackers and rely on mask prediction and attention-driven memory updates, which introduce substantial computational overhead and limit real-time deployment on resource-constrained hardware; meanwhile, lightweight trackers sustain high throughput but are prone to drift when visually similar distractors appear. To address these challenges, we propose EdgeDAM, a lightweight detection-guided tracking framework that reformulates distractor-aware memory for bounding-box tracking under strict edge constraints. EdgeDAM introduces two key strategies: (1) Dual-Buffer Distractor-Aware Memory (DAM), which integrates a Recent-Aware Memory to preserve temporally consistent target hypotheses and a Distractor-Resolving Memory to explicitly store hard negative candidates and penalize their re-selection during recovery; and (2) Confidence-Driven Switching with Held-Box Stabilization, where tracker reliability and temporal consistency criteria adaptively activate detection and memory-guided re-identification during occlusion, while a held-box mechanism temporarily freezes and expands the estimate to suppress distractor contamination. Extensive experiments on five benchmarks, including the distractor-focused DiDi dataset, demonstrate improved robustness under occlusion and fast motion while maintaining real-time performance on mobile devices, achieving 88.2% accuracy on DiDi and 25 FPS on an iPhone 15. Code will be released.
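
The two strategies named in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation, which has not been released at the time of this listing. Every name in it (DualBufferMemory, step, penalty, conf_thresh, expand_scale, the 0.3 overlap cutoff) is hypothetical, and box overlap (IoU) is used as a stand-in similarity measure, since the abstract does not specify how candidates are compared with memory.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def expand(box: Box, scale: float) -> Box:
    """Scale a box about its centre (the 'expand' part of a held box)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    hw, hh = (box[2] - box[0]) * scale / 2, (box[3] - box[1]) * scale / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)


@dataclass
class DualBufferMemory:
    # Assumption: both buffers are short FIFO queues of boxes.
    recent: Deque[Box] = field(default_factory=lambda: deque(maxlen=5))
    distractors: Deque[Box] = field(default_factory=lambda: deque(maxlen=5))
    penalty: float = 0.5  # hypothetical weight on distractor similarity

    def score(self, cand: Box) -> float:
        """Agreement with recent target boxes minus a distractor penalty."""
        support = max((iou(cand, b) for b in self.recent), default=0.0)
        conflict = max((iou(cand, b) for b in self.distractors), default=0.0)
        return support - self.penalty * conflict

    def reidentify(self, candidates: List[Box]) -> Optional[Box]:
        """Pick the best-scoring detection, or None if there are none."""
        return max(candidates, key=self.score, default=None)


def step(memory: DualBufferMemory, tracker_box: Box, tracker_conf: float,
         detections: List[Box], conf_thresh: float = 0.5,
         expand_scale: float = 1.2) -> Tuple[Box, str]:
    """One frame of confidence-driven switching; returns (box, mode)."""
    if tracker_conf >= conf_thresh:
        # Reliable frame: trust the tracker and update both buffers.
        memory.recent.append(tracker_box)
        for d in detections:
            if iou(d, tracker_box) < 0.3:  # hypothetical cutoff for "not the target"
                memory.distractors.append(d)
        return tracker_box, "track"
    # Unreliable frame: fall back to detection plus memory-guided re-identification.
    best = memory.reidentify(detections)
    if best is not None and memory.score(best) > 0.0:
        return best, "reid"
    # Nothing trustworthy: hold the last reliable box, expanded, without
    # writing to memory so distractors cannot contaminate it.
    base = memory.recent[-1] if memory.recent else tracker_box
    return expand(base, expand_scale), "hold"
```

In this sketch a reliable frame updates both buffers, an unreliable frame first tries to recover the target from the detections, and the tracker holds an enlarged copy of the last reliable box when every candidate looks more like a stored distractor than like the target. A real system would likely compare appearance features rather than box overlap alone.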
