HDRAgent: An Agentic Framework for Multi-Exposure HDR Imaging

2026-06-08 • Computer Vision and Pattern Recognition

AI summary

The authors propose HDRAgent, a system for creating high-quality HDR images from multiple exposures, especially in scenes with heavy motion. Unlike older methods that apply one fixed reconstruction pipeline, their system adapts its strategy to the scene, using a contextual knowledge matching module that retrieves relevant past cases and tool knowledge for the current scene. They also include a feedback loop that records quality assessments and artifact diagnoses to improve later decisions, and a way to handle extreme motion by reconstructing unreliable parts of the non-reference frames under guidance from the reference frame. Experiments show the method reduces ghosting and local artifacts while matching or exceeding existing methods in objective performance and visual quality.

HDR imaging, multi-exposure, ghosting artifacts, large language models, contextual knowledge matching, adaptive reconstruction, alignment methods, feedback mechanism, dynamic scenes, generative alignment

Authors

Weiyu Zhou, Tao Hu, Yijian Wang, Xiaogang Xu, Ruixing Wang, Qingsen Yan

Abstract

Most existing multi-exposure HDR methods follow a fixed feed-forward reconstruction paradigm, making them prone to ghosting artifacts in complex dynamic scenes. To address this issue, we propose HDRAgent, the first agent-driven framework for HDR imaging, which adaptively selects reconstruction strategies according to the current scene conditions. Specifically, to provide scene-specific prior knowledge, we introduce a fine-grained contextual knowledge matching (FCM) module. This module leverages multimodal large language model (MLLM)-derived scene perception to retrieve relevant historical cases and tool knowledge, organizing them into structured evidence for MLLM-based adaptive tool scheduling. In addition, we propose a perception-distortion feedback mechanism that transforms post-execution quality assessment and artifact diagnosis into structured feedback, which is accumulated in historical memory to help subsequent contextual knowledge refinement and strategy selection. Furthermore, considering that extreme motion can invalidate alignment methods, we design an agent-guided generative alignment strategy that uses MLLM-based dynamic-region parsing to reconstruct unreliable contents in non-reference frames under reference-frame guidance. Experiments demonstrate that HDRAgent effectively reduces ghosting and local artifacts while achieving competitive or superior objective performance and visual quality.
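
The retrieve, schedule, and feed-back loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `Case`, `Memory`, and `select_tool`, the scene tags, and the tool names are all hypothetical, and Jaccard similarity over tags stands in for the MLLM-based matching and scheduling, which the abstract does not specify in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """One past run (hypothetical record layout, not from the paper)."""
    tags: frozenset   # scene descriptors, e.g. {"large_motion", "night"}
    tool: str         # reconstruction tool that was executed
    score: float      # post-execution quality score, higher is better


@dataclass
class Memory:
    """Historical memory that accumulates structured feedback."""
    cases: list = field(default_factory=list)

    def retrieve(self, tags, k=2):
        """Return up to k past cases ranked by Jaccard similarity of tags."""
        def jaccard(a, b):
            union = a | b
            return len(a & b) / len(union) if union else 0.0
        ranked = sorted(self.cases, key=lambda c: jaccard(tags, c.tags),
                        reverse=True)
        return ranked[:k]

    def add_feedback(self, tags, tool, score):
        """Store the outcome of a run so later selections can use it."""
        self.cases.append(Case(frozenset(tags), tool, score))


def select_tool(memory, tags, tools, default):
    """Pick the tool with the best mean score among the retrieved cases.

    Falls back to `default` when no retrieved case used an available tool.
    """
    scores = {}
    for case in memory.retrieve(frozenset(tags)):
        if case.tool in tools:
            scores.setdefault(case.tool, []).append(case.score)
    if not scores:
        return default
    return max(scores, key=lambda t: sum(scores[t]) / len(scores[t]))
```

In this sketch, each completed reconstruction would call `add_feedback` with its scene tags, the tool used, and its quality score, so that later calls to `select_tool` for similar scenes favor tools that previously scored well.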
