Cross-Resolution Semantic Transfer for Robust Text-to-Image Retrieval in Low-Resolution Surveillance

2026-06-29 • Computer Vision and Pattern Recognition

AI summary

The authors study how to find people in images from text descriptions when the images vary in resolution, from sharp to heavily degraded. They identify two linked problems caused by mixed image quality: degraded image regions become unreliable evidence for matching fine-grained text, and similarity rankings become unstable when a gallery mixes resolutions. To address both, they design a method that estimates which parts of an image are reliable, uses the text to recover discriminative cues, and transfers the neighborhood structure of high-resolution images to stabilize low-resolution rankings. Their experiments show better person search results on three datasets, especially for very low-resolution images, without hurting performance on high-resolution images.

Text-to-image person re-identification, Resolution variance, Cross-resolution retrieval, CLIP, Ranking distribution drift, Evidence reliability collapse, Semantic transfer, CUHK-PEDES, mAP, Rank-1 accuracy

Authors

Wenjie Qian, Bin Yang, Xiao Wang, Wenke Huang, Ling Mei, Xin Xu, Mang Ye

Abstract

Text-to-image person re-identification (TIPR) retrieves target persons using natural language descriptions. However, existing methods largely overlook resolution variance in real-world surveillance. We characterize cross-resolution TIPR through two coupled failure modes: Evidence Reliability Collapse (ERC), where degraded visual tokens become unreliable for grounding fine-grained text, and Ranking Distribution Drift (RDD), where mixed-resolution galleries distort similarity neighborhoods and destabilize retrieval rankings. To address these failure modes, we propose Cross-Resolution Semantic Transfer (CRST), a CLIP-style framework with three modules: resolution-conditioned reasoning, text-guided refinement, and CR-RDA. Resolution-conditioned reasoning estimates token reliability to suppress corrupted evidence. Text-guided refinement injects semantic priors to recover discriminative cues. CR-RDA transfers high-resolution (HR) neighborhood geometry to stabilize low-resolution (LR) ranking under mixed resolutions. Experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid show that CRST improves ultra-low-resolution Rank-1 and mAP on average by 5.7% and 5.3%, respectively, while stabilizing mixed-resolution retrieval without sacrificing high-resolution accuracy. The code will be made publicly available.
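
The abstract reports results as Rank-1 and mAP. As a rough illustration of what these two retrieval metrics measure, the sketch below computes them from a text-to-image similarity matrix using their standard definitions. It is not the authors' evaluation code: the function name `rank1_and_map`, the argument layout, and the toy scores are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def rank1_and_map(
    similarity: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (Rank-1, mAP) for text queries against an image gallery.

    similarity[i][j] is the score between text query i and gallery image j
    (higher means more similar). The name and signature are hypothetical.
    """
    hits = 0
    ap_sum = 0.0
    evaluated = 0
    for scores, qid in zip(similarity, query_ids):
        # Rank gallery images from most to least similar to this query.
        order: List[int] = sorted(
            range(len(scores)), key=lambda j: scores[j], reverse=True
        )
        matches = [gallery_ids[j] == qid for j in order]
        n_pos = sum(matches)
        if n_pos == 0:
            continue  # skip queries whose identity is absent from the gallery
        evaluated += 1
        hits += int(matches[0])  # Rank-1: is the top result a correct match?
        found = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precision_sum += found / rank  # precision at each correct hit
        ap_sum += precision_sum / n_pos  # average precision for this query
    if evaluated == 0:
        raise ValueError("no query has a matching identity in the gallery")
    return hits / evaluated, ap_sum / evaluated


# Toy example: two text queries, three gallery images (identities 1, 2, 1).
toy_similarity = [[0.9, 0.2, 0.6], [0.8, 0.5, 0.1]]
rank1, mean_ap = rank1_and_map(toy_similarity, [1, 2], [1, 2, 1])
print(rank1, mean_ap)  # 0.5 0.75
```

In the toy example the first query ranks both of its correct images first (average precision 1.0), while the second query places its only correct image at rank 2 (average precision 0.5), giving Rank-1 of 0.5 and mAP of 0.75.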
