ReCoVR: Closing the Loop in Interactive Composed Video Retrieval

2026-05-11

Information Retrieval
AI summary

The authors focus on improving video search systems where a user starts with a video and then uses text instructions to find a better match, doing this over multiple steps rather than just once. They found that current systems only handle one step and don’t check whether the search is going off track. To fix this, they created ReCoVR, a system that not only uses user feedback but also looks back at past searches to adjust and improve future results. Their tests showed that ReCoVR works better than earlier methods on popular video search tests.

composed video retrieval, interactive retrieval, multi-turn search, user feedback, dual-pathway architecture, reflexive perception, retrieval trajectory, R@1 metric, WebVid-CoVR dataset
Authors
Bingqing Zhang, Yi Zhang, Zhuo Cao, Yang Li, Xue Li, Jiajun Liu, Sen Wang
Abstract
Composed video retrieval (CoVR) searches for target videos using a reference video and a modification text, but existing methods are restricted to a single interaction round and cannot support the progressive nature of real-world visual search. To bridge this gap, we first formalize interactive composed video retrieval, a multi-turn extension of CoVR, where users progressively refine their search intent through natural-language feedback across turns. Adapting existing interactive retrieval methods to this setting reveals two structural weaknesses: reliance on a single retrieval channel and an open-loop retrieval design that consumes user feedback but does not diagnose whether its own retrieval trajectory is drifting or stagnating. To address these limitations, we propose ReCoVR (Reflexive Composed Video Retrieval), a dual-pathway architecture built on reflexive perception, where the system treats its retrieval history as diagnostic evidence alongside user feedback. Specifically, an Intent Pathway routes heterogeneous feedback to complementary retrieval channels, while a Reflection Pathway performs trajectory-level reflection to monitor result evolution and correct retrieval errors across turns. Experiments on multiple benchmarks show that ReCoVR consistently outperforms interactive baselines, notably achieving 74.30% R@1 after just one interactive round on the WebVid-CoVR-Test dataset.
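The closed-loop behavior the abstract describes — folding user feedback into the query each turn while also inspecting the retrieval history for drift or stagnation — can be illustrated with a minimal sketch. Note this is purely a conceptual toy, not the paper's actual method: all names (`SearchState`, `interactive_round`, `is_stagnating`), the vector-averaging feedback fusion, and the repeated-top-1 stagnation check are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SearchState:
    query: list[float]                                  # current composed query embedding
    history: list[int] = field(default_factory=list)    # top-1 result id per turn

def retrieve(query: list[float], corpus: list[list[float]]) -> int:
    """Return the index of the corpus vector with the highest dot-product score."""
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in corpus]
    return max(range(len(corpus)), key=scores.__getitem__)

def is_stagnating(history: list[int], window: int = 2) -> bool:
    """Trajectory-level reflection: flag when the top result stops changing."""
    return len(history) >= window and len(set(history[-window:])) == 1

def interactive_round(state: SearchState, feedback_vec: list[float],
                      corpus: list[list[float]], alpha: float = 0.5) -> tuple[int, bool]:
    """One turn: blend user feedback into the query, retrieve, then reflect."""
    state.query = [(1 - alpha) * q + alpha * f
                   for q, f in zip(state.query, feedback_vec)]
    top1 = retrieve(state.query, corpus)
    state.history.append(top1)
    # A closed-loop system would correct course here (e.g. reroute feedback
    # to a different retrieval channel); this sketch only raises the flag.
    return top1, is_stagnating(state.history)
```

An open-loop baseline would stop after updating the query; the point of the reflection step is that the system also consumes its own history, so a stalled trajectory becomes a diagnosable signal rather than silently repeated results.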