Multimodal LLM-Empowered Re-Ranking for Generalizable Person Re-Identification
2026-06-15 • Computer Vision and Pattern Recognition
AI summary
The authors studied how to improve recognizing people across different camera systems when the system hasn't seen data from those cameras before. Instead of just training better models, they focused on improving the way results are refined after an initial guess, called re-ranking. They used a large multimodal language model, adapted to this task, to create a new distance measure that works better across different environments. This method helps fix the issues caused by differences between datasets and boosts performance without changing existing models. Their experiments show consistent improvements on various benchmarks.
Person Re-Identification, Domain Generalization, Re-ranking, Multimodal Large Language Models, Distance Metric, Domain Gap, Fine-tuning, Encoder, Performance Benchmark
Authors
Jiachen Li, Xiaojin Gong
Abstract
Domain Generalizable (DG) person re-identification (Re-ID) has attracted growing research interest due to its potential for deployment in unseen real-world scenarios. Most existing approaches address DG Re-ID by training domain-generalizable encoders but overlook possible refinements at the inference stage. In contrast, this work explores an alternative direction: improving re-ranking at inference time to enhance DG Re-ID. Conventional re-ranking methods typically rely on neighborhood-based distances to refine the initial ranking list, and therefore depend on the features produced by the Re-ID encoder. However, these methods degrade on target domains because the encoder does not generalize well enough to produce reliable feature distances in unseen scenarios. Inspired by the remarkable generalization capabilities of recent Multimodal Large Language Models (MLLMs), we propose an MLLM-empowered distance metric to improve re-ranking in DG Re-ID. Specifically, we first adapt an MLLM to Re-ID data through supervised fine-tuning, which incorporates a domain-agnostic prompt and a query-candidate hard mining scheme. The adapted MLLM is then used to compute a $μ$-distance during inference, which is robust to the domain gap and significantly enhances subsequent re-ranking performance. Our approach is model-agnostic and can be seamlessly integrated into previous re-ranking frameworks. Extensive experiments demonstrate that our approach consistently yields substantial performance improvements across multiple DG Re-ID benchmarks. The code for this work will be released soon at https://github.com/RikoLi/MUSE.
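
To make the general idea of re-ranking with an externally computed distance concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical callable `pair_distance` that stands in for a model-derived distance between the query and a gallery candidate, and an assumed linear blend controlled by `weight`; the abstract does not specify how the $μ$-distance is computed or combined with existing re-ranking frameworks.

```python
from typing import Callable, List, Sequence


def rerank_top_k(
    initial_dist: Sequence[float],
    pair_distance: Callable[[int], float],
    k: int = 3,
    weight: float = 0.5,
) -> List[int]:
    """Re-rank the k nearest gallery candidates for a single query.

    initial_dist[i] is the encoder distance from the query to gallery item i.
    pair_distance(i) is a hypothetical stand-in for a model-derived distance
    between the query and gallery item i, assumed to lie in [0, 1].
    The linear blend below is an illustrative assumption, not the paper's rule.
    """
    # Initial ranking: gallery indices sorted by encoder distance (ascending).
    order = sorted(range(len(initial_dist)), key=lambda i: initial_dist[i])
    head, tail = order[:k], order[k:]
    if not head:
        return order

    # Min-max normalise encoder distances within the head so both terms
    # share a comparable [0, 1] scale before they are blended.
    lo = min(initial_dist[i] for i in head)
    hi = max(initial_dist[i] for i in head)
    span = (hi - lo) or 1.0

    def fused(i: int) -> float:
        normalised = (initial_dist[i] - lo) / span
        return (1.0 - weight) * normalised + weight * pair_distance(i)

    # Only the head is re-ordered; the tail keeps its initial order.
    return sorted(head, key=fused) + tail


# Toy example with made-up numbers (not results from the paper).
initial = [0.30, 0.10, 0.20, 0.90]
toy_pair = {0: 0.05, 1: 0.90, 2: 0.50}
ranking = rerank_top_k(initial, lambda i: toy_pair[i], k=3, weight=0.8)
```

Restricting the expensive pairwise scoring to the top-k candidates is a common design choice when the scorer is a large model, since it is called k times per query rather than once per gallery item.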