A VideoMAE-v2 Approach to Zero-Shot Traffic Accident Anticipation

2026-06-08 · Computer Vision and Pattern Recognition

AI summary

The authors address early prediction of traffic accidents from dashcam video in a zero-shot setting, where no training data from the target domain is available. They train only on a public driving-accident dataset with binary labels and pair a VideoMAE-v2 backbone with a per-frame prediction head. The system slides a window over the video and estimates collision risk for every frame. The method placed 2nd in the 2026 CVPR@AUTOPILOT Zero-Shot Accident Anticipation competition.

Traffic accident anticipation, Zero-shot learning, Dashcam video, VideoMAE-v2, Temporal risk estimation, Sliding-window protocol, Binary-labelled dataset, CVPR, Accident prediction
Authors
Siyuan Li, Xiaoyang Bi, Mengshi Qi
Abstract
Traffic accident anticipation -- predicting the likelihood of an imminent collision at every frame of a dashcam video -- is safety-critical yet difficult to scale, because collecting in-domain annotated accident footage for every deployment scenario is prohibitively expensive. We study this task under a zero-shot setting where no target-domain training data is available: the model must learn exclusively from a publicly available binary-labelled driving-accident dataset and generalise to unseen dashcam footage. We propose a framework that bridges the gap between the frame-level temporal risk estimation task and coarsely labelled binary accident datasets by coupling a VideoMAE-v2 backbone with a per-frame prediction head under a sliding-window protocol. Our method achieves 2nd place in the 2026 CVPR@AUTOPILOT Zero-Shot Accident Anticipation competition. Code is available at https://github.com/TimeSouth/zero-shot-taa-solution.
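
The abstract does not spell out the sliding-window protocol, so the following is only a minimal sketch of one plausible reading: a fixed-length window moves over the video, a scorer returns one risk value per frame in the window, and overlapping predictions are averaged. The function `score_window` is a hypothetical stand-in for the VideoMAE-v2 backbone plus per-frame head, and the window length (16), the stride (4), and the averaging rule are all assumptions rather than details taken from the paper.

```python
from typing import Callable, List, Sequence


def sliding_window_risk(
    frames: Sequence,
    score_window: Callable[[Sequence], List[float]],
    window: int = 16,   # assumed clip length, not taken from the paper
    stride: int = 4,    # assumed step between consecutive windows
) -> List[float]:
    """Average overlapping per-frame window scores into one risk per frame."""
    n = len(frames)
    if n == 0:
        return []
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    window = min(window, n)       # short videos fit in a single window
    stride = min(stride, window)  # avoid gaps between consecutive windows

    # Window start indices; add a final window so the tail frames are covered.
    starts = list(range(0, n - window + 1, stride))
    if starts[-1] != n - window:
        starts.append(n - window)

    sums = [0.0] * n
    counts = [0] * n
    for start in starts:
        # score_window is a hypothetical model call returning one score
        # per frame of the window it receives.
        scores = score_window(frames[start:start + window])
        for offset, score in enumerate(scores):
            sums[start + offset] += score
            counts[start + offset] += 1

    # Every frame is covered by at least one window, so counts are non-zero.
    return [total / count for total, count in zip(sums, counts)]


if __name__ == "__main__":
    # Dummy scorer for illustration only: a constant risk for every frame.
    dummy_frames = list(range(40))
    risk = sliding_window_risk(dummy_frames, lambda clip: [0.5] * len(clip))
    print(len(risk), risk[:3])
```

Averaging is only one way to merge overlapping windows; taking the score of the most recent window would be an equally reasonable choice for a strictly causal setting.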