AerialMetric: Benchmarking and Adapting UAV Monocular Metric Depth Estimation in the Real World
2026-06-29 • Computer Vision and Pattern Recognition
AI summary
The authors focus on teaching computers to estimate how far away things are using single aerial drone images, which is harder than for regular street or indoor photos. They created a new dataset called AerialMetric that includes a mix of real and synthetic images from drones, covering a variety of situations and viewpoints. Using this data, they tested current depth estimation models and improved them specifically for aerial views by fine-tuning. Their work offers a new benchmark that better matches the challenges of estimating distance from drone images. They also provide their dataset, code, and trained models for others to use.
monocular depth estimation, UAV imagery, aerial viewpoints, metric depth, photogrammetry, dataset benchmark, synthetic scenes, fine-tuning, domain adaptation, camera parameters
Authors
Zhongqiang Song, Guanying Chen, Yuqi Zhang, Yin Zou, Chuanyu Fu, Zhiyuan Yuan, Chuan Huang, Shuguang Cui, Xiaochun Cao
Abstract
This paper addresses the problem of monocular metric depth estimation in aerial UAV imagery. Although recent data-driven methods have achieved remarkable progress in ground-level scenarios, models trained primarily on street-view and indoor datasets exhibit significant domain gaps when applied to aerial viewpoints. To address this gap, we introduce AerialMetric, a benchmark dataset designed to evaluate and facilitate the adaptation of monocular metric depth estimation under UAV aerial viewpoints. The dataset consists of four complementary subsets collected from different sources, jointly covering real-world photogrammetry data, controlled aerial acquisition settings, photorealistic synthetic scenes, and in-the-wild Internet imagery. In total, AerialMetric provides 52K real-world and 16K synthetic image-depth pairs with reliable metric ground truth. Based on this dataset, we conduct systematic evaluations of existing state-of-the-art models under aerial settings and investigate the impact of viewpoint, altitude, and camera parameters on metric depth prediction. In addition, by fine-tuning a representative metric depth model on our dataset, we establish a comprehensive aerial benchmark and achieve state-of-the-art performance across diverse aerial imagery. Our dataset, code, and model weights are publicly available at https://kuieless.github.io/AerialMetric-ECCV2026-page/.
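The abstract does not state which error measures the benchmark reports. As an illustration only, the sketch below computes three measures widely used in the metric depth literature (absolute relative error, RMSE, and the δ < 1.25 accuracy) over valid ground-truth pixels. The function name `depth_metrics`, the `min_depth` validity threshold, and the flattened-list input are assumptions made for this example, not part of the AerialMetric code release.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3):
    """Compute AbsRel, RMSE, and delta<1.25 over valid pixels.

    pred, gt: flat sequences of depths in metres (hypothetical input format).
    Pixels whose ground truth is not above min_depth are treated as invalid.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > min_depth]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Mean absolute error relative to the true depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # Root mean squared error in metres.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # Fraction of pixels whose prediction is within a factor of 1.25
    # of the ground truth; non-positive predictions always fail.
    def within(p, g):
        return p > 0 and max(p / g, g / p) < 1.25

    delta1 = sum(1 for p, g in pairs if within(p, g)) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


# Example: four pixels, one of which has no valid ground truth (0.0).
metrics = depth_metrics([55.0, 90.0, 30.0, 120.0], [50.0, 100.0, 0.0, 120.0])
print(metrics)
```

Because these measures are computed on unscaled predictions, they penalize errors in absolute scale, which is the property that distinguishes metric depth evaluation from scale-invariant relative depth evaluation.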