Zero-Parameter Geometric Gating for Temporally Stable Low-Altitude UAV Video Semantic Segmentation

2026-06-08 · Computer Vision and Pattern Recognition

AI summary

The authors address temporal consistency in video semantic segmentation for low-flying drones, where dense optical flow produces spatially structured noise in the large planar regions that dominate aerial imagery. They propose a simple, non-learned gate that uses RANSAC homography inlier statistics to decide, region by region, whether to warp with a homography (a planar-surface model) or with optical flow (dense motion estimates). The two warps are then fused by Semantic Similarity Propagation (SSP). The gate itself has no trainable parameters; the only trainable component is the 211K-parameter SSP fusion layer on top of a frozen backbone. On synthetic UAVid, the method improves mIoU by 4.24 to 4.91% over the base models. Their analysis also shows that optical-flow errors cluster spatially in planar regions and that routing those regions through the homography raises temporal consistency from 62% to 92% where the homography is valid.

Video Semantic Segmentation, Low-altitude UAV, Temporal Consistency, Optical Flow, Homography, RANSAC, Semantic Similarity Propagation, mIoU, Spatial Autocorrelation, Spearman Correlation
Authors
Jingpu Yang, Fengxian Ji, Zhengzhao Lai, Juanfan Wu, Mingxuan Cui, Yufeng Wang
Abstract
Video semantic segmentation for low-altitude UAVs requires temporal consistency, yet dense optical flow introduces spatially structured noise in the planar regions that dominate aerial imagery. We propose a zero-parameter geometric gate that uses RANSAC homography inlier ratios on a $16\times16$ spatial grid to route each region to either a homography warp or an optical-flow warp before fusion via Semantic Similarity Propagation (SSP). The gate requires no learned parameters, only a median-threshold binary decision on RANSAC statistics; the only trainable component is the SSP fusion layer, which adds 211K parameters to a frozen backbone. On synthetic UAVid, the method achieves a +4.24--4.91\% mIoU improvement over base models across two architectures (SegFormer-b2 and Hiera-S+UPerNet). Mechanism diagnostics reveal that flow residuals in planar regions are spatially autocorrelated (Moran's I = 0.32, $p < 0.001$) and predict boundary instability (Spearman $\rho = 0.66$), and that rigidification raises temporal consistency from 62\% to 92\% (+29.5pp) in homography-valid regions.
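
The gating step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it only shows one plausible reading of "a median-threshold binary decision on RANSAC statistics" over a $16\times16$ grid. The function names (`cell_inlier_ratios`, `median_gate`, `route_warps`) are hypothetical, and several details are assumptions: a single global homography whose inlier mask is binned per cell, cells with no matches treated as ratio 0, a strict `>` comparison against the median, and nearest-neighbour upsampling of the gate to pixel resolution.

```python
import numpy as np

GRID = 16  # grid size stated in the abstract


def cell_inlier_ratios(points, inlier_mask, height, width, grid=GRID):
    """Per-cell RANSAC inlier ratio (assumed definition: inliers / matches per cell).

    points:      (N, 2) array of (x, y) keypoint coordinates in pixels.
    inlier_mask: (N,) or (N, 1) array, nonzero where a match is a RANSAC inlier
                 (for example, the mask returned by cv2.findHomography).
    """
    points = np.asarray(points, dtype=float)
    inliers = np.asarray(inlier_mask).ravel().astype(bool).astype(float)
    # Map each keypoint to its grid cell.
    cols = np.clip((points[:, 0] * grid / width).astype(int), 0, grid - 1)
    rows = np.clip((points[:, 1] * grid / height).astype(int), 0, grid - 1)
    flat = rows * grid + cols
    total = np.bincount(flat, minlength=grid * grid).astype(float)
    hits = np.bincount(flat, weights=inliers, minlength=grid * grid)
    # Assumption: cells without any match get ratio 0 (routed to optical flow).
    ratios = np.divide(hits, total, out=np.zeros_like(total), where=total > 0)
    return ratios.reshape(grid, grid)


def median_gate(ratios):
    """Binary gate: True where a cell is routed to the homography warp.

    Assumption: cells strictly above the grid median count as homography-valid.
    """
    return ratios > np.median(ratios)


def route_warps(gate, homography_warped, flow_warped):
    """Pick, per pixel, between two pre-warped (H, W, C) maps using the cell gate."""
    h, w = homography_warped.shape[:2]
    grid = gate.shape[0]
    # Nearest-neighbour upsampling: look up the cell that contains each pixel.
    row_idx = np.minimum(np.arange(h) * grid // h, grid - 1)
    col_idx = np.minimum(np.arange(w) * grid // w, grid - 1)
    mask = gate[np.ix_(row_idx, col_idx)]
    return np.where(mask[..., None], homography_warped, flow_warped)
```

In this sketch the gate has nothing to train: the threshold is recomputed from each frame pair's own inlier statistics. The routed map would then be passed to the SSP fusion layer, which is outside the scope of the example.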