UAV-OVO: Out-of-Viewpoint Generalization in UAV Action Recognition

2026-05-25 · Computer Vision and Pattern Recognition

AI summary

The authors highlight a problem in drone (UAV) action recognition: models trained on videos filmed from low camera angles struggle to recognize the same actions filmed from high angles. They built a benchmark, UAV-OVO, that measures this viewpoint shift directly by splitting training and test videos according to camera angle. Their experiments show that current models often fail to generalize across viewpoints. To narrow the gap, they propose LATER, which first fine-tunes the model with LoRA and then, at test time, re-centers features to remove viewpoint-induced drift while keeping the feature directions that carry the action's meaning.
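Splitting videos by camera angle can be sketched as a thresholding rule on a per-video view score. This is a minimal sketch, not the benchmark's code: it assumes each video arrives as a hypothetical `(video_id, label, view_score)` tuple in which a larger score means a steeper (higher-depression) view, and the thresholds `low_max` and `high_min`, like both function names, are illustrative. Videos whose score falls strictly between the thresholds form the abstract's view-isolation band and are dropped. Low-depression videos are divided into training and in-distribution (ID) test pools, high-depression videos form the out-of-distribution (OOD) pool, and both test sets are then subsampled to the same per-class counts.

```python
import random
from collections import defaultdict


def split_by_view(videos, low_max, high_min, id_test_frac=0.2, seed=0):
    """Assign videos to train / ID-test / OOD-test pools by view score.

    videos: list of (video_id, label, view_score) tuples (hypothetical format).
    Scores <= low_max count as low-depression, scores >= high_min as
    high-depression; scores strictly in between fall in the isolation band
    and are discarded.
    """
    if low_max >= high_min:
        raise ValueError("isolation band needs low_max < high_min")
    low = [v for v in videos if v[2] <= low_max]
    high = [v for v in videos if v[2] >= high_min]
    rng = random.Random(seed)
    rng.shuffle(low)  # random train / ID-test division of the low-depression pool
    n_id = int(len(low) * id_test_frac)
    id_pool, train = low[:n_id], low[n_id:]
    return train, id_pool, high


def match_class_distribution(id_pool, ood_pool, seed=0):
    """Subsample both pools so every shared class has equal counts in ID and OOD."""
    rng = random.Random(seed)
    by_id, by_ood = defaultdict(list), defaultdict(list)
    for v in id_pool:
        by_id[v[1]].append(v)
    for v in ood_pool:
        by_ood[v[1]].append(v)
    id_test, ood_test = [], []
    # Only classes present in both pools can be matched.
    for label in sorted(set(by_id) & set(by_ood)):
        k = min(len(by_id[label]), len(by_ood[label]))
        id_test += rng.sample(by_id[label], k)
        ood_test += rng.sample(by_ood[label], k)
    return id_test, ood_test
```

Dropping the band, rather than splitting at a single threshold, keeps near-boundary videos from landing on both sides of the shift, and matching per-class counts means an ID/OOD accuracy difference is not driven by label imbalance.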

UAV action recognition, deployment shift, low-depression viewpoint, high-depression viewpoint, out-of-distribution testing, LoRA (Low-Rank Adaptation), feature re-centering, viewpoint generalization, semantic anchor, video recognition
Authors
Yu Xia, Zhengbo Zhang, Shuaihu Zhang, Zhigang Tu
Abstract
UAV action recognition faces a deployment shift that standard benchmarks often obscure: a model trained on UAV footage captured from low-depression viewpoints may be required to recognize the same action classes from high-depression viewpoints. While the action labels remain unchanged, this shift alters body visibility, motion projection, and scene context, encouraging models to rely on viewpoint-specific shortcuts. We introduce UAV-OVO, an Out-of-Viewpoint generalization benchmark for UAV action recognition. UAV-OVO derives view scores from uncalibrated videos, uses a view-isolation band to assign low-depression videos to the training and in-distribution test splits while reserving high-depression videos for out-of-distribution testing, and constructs ID/OOD test sets matched by class distribution so that performance differences reflect viewpoint shift rather than label imbalance. Across representative video recognizers, UAV-OVO reveals a substantial ID/OOD gap: models that fit the low-depression training distribution well often fail to transfer to held-out high-depression views, exposing viewpoint shortcuts hidden by aggregate accuracy. We further propose LATER, LoRA-Anchored Test-time Re-centering, which first adapts the recognizer with Low-Rank Adaptation (LoRA) and then uses the learned LoRA subspace as a semantic anchor for online feature re-centering. Specifically, LATER projects target-domain displacement onto the orthogonal complement of the LoRA subspace before re-centering features, reducing viewpoint-induced drift while preserving task-relevant semantics. Together, UAV-OVO and LATER provide a controlled testbed and a practical adaptation method for viewpoint-robust UAV video understanding.
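The re-centering step of LATER can be illustrated with a small NumPy sketch. It rests on assumptions the abstract does not spell out. The "LoRA subspace" is taken to be the column space of a LoRA up-projection matrix `B` (shape `d x r`, assumed full column rank) in a `d`-dimensional feature space. The "target-domain displacement" is taken to be the gap between an exponential moving average of test-time feature means and a stored source-domain mean. The class name `OrthogonalRecentering` and the `momentum` parameter are hypothetical. Only the part of the displacement orthogonal to span(`B`) is subtracted, so shifts along the adapted directions are left intact.

```python
import numpy as np


class OrthogonalRecentering:
    """Hypothetical online re-centering in the spirit of LATER (not the authors' code).

    source_mean: (d,) mean feature on source (low-depression) data.
    B: (d, r) LoRA up-projection; its column space is treated as the
       semantic anchor subspace (an assumption).
    """

    def __init__(self, source_mean, B, momentum=0.9):
        self.mu_s = np.asarray(source_mean, dtype=float)
        self.mu_t = self.mu_s.copy()  # target running mean starts at the source mean
        # Reduced QR gives an orthonormal basis U (d x r) for span(B).
        U, _ = np.linalg.qr(np.asarray(B, dtype=float))
        # Projector onto the orthogonal complement of the LoRA subspace.
        self.P_perp = np.eye(self.mu_s.shape[0]) - U @ U.T
        self.momentum = momentum

    def __call__(self, feats):
        feats = np.asarray(feats, dtype=float)  # (n, d) batch of target features
        # Exponential moving average of target-domain feature means.
        self.mu_t = self.momentum * self.mu_t + (1.0 - self.momentum) * feats.mean(axis=0)
        # Keep only the displacement component outside span(B).
        drift = self.P_perp @ (self.mu_t - self.mu_s)
        return feats - drift  # broadcast the correction over the batch
```

Because `B` is fixed after LoRA adaptation, the projector is computed once, and each incoming batch costs one mean update and one matrix-vector product. For example, with `B` spanning the first axis, `momentum=0.0`, and a zero source mean, a batch whose mean drifts to `[3, 4, 0, 0]` is shifted back to mean `[3, 0, 0, 0]`: the drift along the first axis is kept and the rest is removed.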