LangRetrieval: Language-Guided Self-Evolving Satellite-to-Radar Retrieval via CSI-Driven Reward
2026-06-08 • Multimedia
AI summary
The authors address the challenge of estimating ground radar precipitation from geostationary satellite images, which is difficult because similar satellite observations can correspond to very different rain conditions on the ground. They propose LangRetrieval, a method that uses structured, language-like descriptions of the weather scene to reduce this ambiguity. The descriptions are fed into a generative retrieval model through cross-attention, and the component that produces them is then refined with reinforcement learning, using retrieval accuracy as the reward, so that the descriptions evolve toward those that yield better rainfall estimates.
Satellite-to-radar retrieval, Geostationary satellite, Precipitation monitoring, Conditional flow matching, Meteorological semantics, Cross-attention, Group Relative Policy Optimization, Critical Success Index, Convective storms
Authors
Chunlei Shi, Junming Hou, Yi-Lin Wei, Jiong Wang, Yecheng Zhang, Yichao Dong, Wenqi Ren, Dan Niu
Abstract
Satellite-to-radar (S2R) retrieval estimates ground radar precipitation from geostationary satellite observations, providing a critical solution for precipitation monitoring in radar-sparse regions. However, S2R retrieval is intrinsically ill-posed: similar cloud-top radiances can correspond to distinct precipitation regimes, storm organizations, and surface intensities, making it difficult to uniquely determine the underlying meteorological state from local spectral cues alone. Meteorological semantics offer complementary scene-level information that can help resolve this ambiguity. Yet existing static semantic conditioning is often insufficient, as externally predefined semantics cannot adapt to dynamic convective scenes or align with retrieval objectives. To this end, we propose LangRetrieval, a language-guided conditional flow matching (CFM) framework that establishes a closed-loop optimization mechanism between meteorological semantics and retrieval accuracy. Specifically, LangRetrieval consists of two core components: (i) Semantic Warm-up: structured meteorological attributes are injected into the CFM backbone through cross-attention conditioning, enabling continuous semantic guidance throughout the generation trajectory; and (ii) Self-Evolving Semantic Optimization: a lightweight attribute policy is first initialized from vision-language model annotations and subsequently refined via Group Relative Policy Optimization (GRPO) using multi-threshold Critical Success Index (CSI) rewards, enabling semantic generation to evolve directly toward improved retrieval accuracy.
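The abstract does not state the exact reward formula, the thresholds, or the group size, so the following is only a minimal sketch of how a multi-threshold CSI reward and GRPO-style group-relative advantages might be computed. The CSI definition (hits divided by hits plus misses plus false alarms) is standard. The threshold values, the equal weighting across thresholds, the zero score for empty events, and the function names `csi`, `multi_threshold_csi_reward`, and `group_relative_advantages` are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def csi(pred, target, threshold):
    """Critical Success Index: hits / (hits + misses + false alarms)."""
    p = np.asarray(pred) >= threshold
    t = np.asarray(target) >= threshold
    hits = np.sum(p & t)
    misses = np.sum(~p & t)
    false_alarms = np.sum(p & ~t)
    denom = hits + misses + false_alarms
    # Assumption: when neither field exceeds the threshold, score 0.0.
    return float(hits / denom) if denom > 0 else 0.0


def multi_threshold_csi_reward(pred, target, thresholds=(20.0, 30.0, 40.0)):
    """Average CSI over several intensity thresholds.

    The thresholds and the equal weighting are hypothetical placeholders.
    """
    return float(np.mean([csi(pred, target, th) for th in thresholds]))


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled candidates.

    Each candidate's reward is centered on the group mean and scaled by
    the group standard deviation, so no learned value function is needed.
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)


# Usage: score several retrievals for one scene, each produced under a
# different sampled attribute set, then convert the rewards to advantages.
target = np.array([[10.0, 35.0], [35.0, 45.0]])
candidates = [
    np.array([[10.0, 25.0], [35.0, 45.0]]),
    np.array([[10.0, 35.0], [35.0, 45.0]]),
    np.array([[25.0, 10.0], [10.0, 10.0]]),
]
rewards = [multi_threshold_csi_reward(c, target) for c in candidates]
advantages = group_relative_advantages(rewards)
```

In this sketch, attribute sets whose retrievals score above the group average receive positive advantages and would be reinforced by the policy update, which is one way to read the closed loop between semantics and retrieval accuracy described in the abstract.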