Location Prior Generation via Multi-Source Urban Data Fusion for Low-Altitude Air Mobility

2026-05-25 • Computer Vision and Pattern Recognition

AI summary

The authors address the problem that most buildings in global map databases lack height data, which drones and other low-altitude aircraft need as pre-computed 3D scene geometry. Their system, LPGF, combines Sentinel-2 satellite imagery, UAV telemetry, vehicle GPS trajectories, and OpenStreetMap footprints, and assigns each building a height from the best available source: an explicit height tag, a floor count, or a default for the building type. An optional shadow-based estimation module runs only when a four-criterion quality gate passes; otherwise the pipeline falls back to the tiered heights. In tests on Milan data, the gate flagged two imaging failure modes, and the type-default heights reached a mean absolute error of 3.07 m against manual floor counts (n=15).

building height, 3D urban data, Sentinel-2 imagery, UAV telemetry, GPS trajectories, OpenStreetMap, shadow-based height estimation, data fusion, mean absolute error

Authors

Xiang Xie, Xiaonan Liu

Abstract

Building height, the third dimension (3D) of urban spatial data, is absent in over 95% of structures in global geospatial databases. For the emerging low-altitude economy, this data gap forces each aerial platform to rely on real-time onboard sensing rather than pre-computed 3D scene geometry. We present the Location Prior Generation Framework (LPGF), a multi-source data fusion pipeline that integrates Sentinel-2 imagery, UAV telemetry, vehicle GPS trajectories, and OpenStreetMap footprints into structured, reusable urban location priors. LPGF assigns building heights through a three-tier priority hierarchy: (1) explicit OSM height tags where available, (2) floor count multiplied by 3.2 m per story where recorded, and (3) building-type default heights otherwise, yielding a worst-case error of approximately 5.5 m. An optional shadow-based height estimation module (SHEM) is activated only when a four-criterion quality gate is satisfied; when any criterion fails, the pipeline routes to structured fallback. On the MiTra A50 Milan dataset, the quality gate correctly identified two imaging failure modes: sub-pixel shadows at 10 m GSD and ground shadow merging at 0.93 m GSD, producing a consistent 27-building prior in both cases. Tier 3 type-default heights were validated against manual floor counts (n=15), achieving MAE=3.07 m within the 5.0 m uncertainty bound. The framework demonstrates that structured, quality-gated fusion of universally available data streams can bootstrap 3D scene coverage for low-altitude urban operations.
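
The three-tier priority hierarchy described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: only the 3.2 m per story factor comes from the abstract, while the `Footprint` class, the function names, and every value in `TYPE_DEFAULT_HEIGHT_M` and `FALLBACK_HEIGHT_M` are hypothetical placeholders, since the abstract does not list the type-default heights.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

METERS_PER_STORY = 3.2  # per-story height stated in the abstract

# Hypothetical placeholder defaults; the abstract does not give these values.
TYPE_DEFAULT_HEIGHT_M = {
    "residential": 12.0,
    "commercial": 15.0,
    "industrial": 8.0,
}
FALLBACK_HEIGHT_M = 10.0  # hypothetical default for unlisted building types


@dataclass
class Footprint:
    """Minimal stand-in for an OSM building footprint (illustrative only)."""
    building_type: str
    height_tag_m: Optional[float] = None  # explicit height tag, if present
    levels: Optional[int] = None          # recorded floor count, if present


def assign_height(fp: Footprint) -> Tuple[float, int]:
    """Return (height in meters, tier used) following the priority order."""
    # Tier 1: an explicit height tag takes precedence.
    if fp.height_tag_m is not None:
        return fp.height_tag_m, 1
    # Tier 2: floor count multiplied by the per-story height.
    if fp.levels is not None:
        return fp.levels * METERS_PER_STORY, 2
    # Tier 3: building-type default, with a generic fallback.
    return TYPE_DEFAULT_HEIGHT_M.get(fp.building_type, FALLBACK_HEIGHT_M), 3


def mean_absolute_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Mean absolute error between two equal-length height lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
```

Returning the tier alongside the height keeps the provenance of each value visible, so a downstream consumer could attach a wider uncertainty to Tier 3 heights than to tagged ones. A validation like the one reported for Tier 3 would compare the defaults against reference heights using `mean_absolute_error`.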
