DeCoDrift: Stabilizing Decoder Coupling in Closed-Loop Foundation Segmentation

2026-05-25 • Computer Vision and Pattern Recognition

AI summary

The authors studied how popular segmentation models like SAM work when they are used repeatedly on the same data, feeding each new result back as input. They found a problem called "decoder coupling drift," where the model's attention shifts away from the target object over time, causing more errors. By analyzing internal behaviors of the model, the authors designed a method called DeCoDrift that improves the model's focus and stability during these repeated steps without needing extra training. Their work shows that looking inside the model can help fix problems in how these systems work over multiple iterations.

Segment Anything Model (SAM), segmentation, mask decoder, cross-attention, decoder coupling drift, iterative prompting, temporal consistency, proximal anchoring, closed-loop dynamical system, attention stability

Authors

H. M. Shadman Tabib, Md. Shamsuzzoha Bayzid, M Sohel Rahman

Abstract

Foundation segmentation models such as Segment Anything Model (SAM) are now routinely used in iterative pipelines, where each predicted mask is fed back as the next prompt. This practice turns segmentation into a closed-loop dynamical process, yet the decoder-level behavior of these systems remains largely unexamined. We show that this feedback loop can induce a previously overlooked failure mode, decoder coupling drift, in which the mask decoder's cross-attention progressively loses alignment with the target object, causing errors to accumulate across iterations. We study this phenomenon by instrumenting SAM's mask decoder and deriving ground-truth-free measures of prompt-image coupling, attention stability, and temporal consistency. On volumetric electron microscopy data, these decoder-internal signals reveal that standard iterative prompting systematically degrades attention alignment and temporal coherence relative to oracle-anchored feedback. We then formalize iterative prompting as a discrete-time dynamical system and show how proximal anchoring reduces error amplification in the feedback loop. Building on this analysis, we introduce DeCoDrift, a training-free inference-time stabilization framework that constrains prompt updates and preserves decoder coupling across iterations. Across extensive experiments, DeCoDrift consistently improves attention stability, temporal coherence, and segmentation quality over standard iterative prompting, without retraining or ground-truth supervision. More broadly, our results show that decoder-internal dynamics are not merely diagnostic: they provide actionable signals for stabilizing foundation segmentation models in closed-loop use.
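The abstract does not give the dynamical-system formulation, so the following is only an illustrative toy sketch of the general idea, not the authors' method. It assumes a prompt represented as a point in the plane, a linear feedback map with a hypothetical error gain `gain` greater than 1 (each iteration amplifies the prompt's offset from the target), and a textbook quadratic proximal step that pulls every update toward a fixed anchor prompt with weight `lam`. All names (`proximal_step`, `simulate`, `gain`, `lam`) are invented for this example.

```python
import numpy as np


def proximal_step(raw_update, anchor, lam):
    """Closed-form minimizer of ||p - raw_update||^2 + lam * ||p - anchor||^2.

    With lam = 0 this returns the raw update unchanged (plain feedback).
    Larger lam keeps the new prompt closer to the anchor.
    """
    return (raw_update + lam * anchor) / (1.0 + lam)


def simulate(gain, lam, steps, p0, target):
    """Iterate a toy closed loop and return the prompt error at every step.

    Assumed toy dynamics (not taken from the paper): the raw next prompt is
    target + gain * (p_t - target), so the offset from the target is scaled
    by `gain` at each iteration. The anchor is the initial prompt p0.
    """
    p = np.asarray(p0, dtype=float)
    target = np.asarray(target, dtype=float)
    anchor = p.copy()
    errors = [np.linalg.norm(p - target)]
    for _ in range(steps):
        raw = target + gain * (p - target)   # feedback amplifies the offset
        p = proximal_step(raw, anchor, lam)  # optional pull toward the anchor
        errors.append(np.linalg.norm(p - target))
    return np.array(errors)


if __name__ == "__main__":
    p0, target = [1.0, -0.5], [0.0, 0.0]
    plain = simulate(gain=1.1, lam=0.0, steps=20, p0=p0, target=target)
    anchored = simulate(gain=1.1, lam=0.5, steps=20, p0=p0, target=target)
    print("final error, plain feedback:   ", plain[-1])
    print("final error, anchored feedback:", anchored[-1])
```

In this toy model the per-step error recursion is e_{t+1} = (gain * e_t + lam * e_anchor) / (1 + lam), so the loop is contractive whenever gain < 1 + lam, and the error settles at a bounded value set by the anchor's own offset instead of growing geometrically. The real decoder is neither linear nor scalar; the sketch only illustrates why an anchoring term can limit error amplification in a feedback loop.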
