Early Warning Signals for OpenVLA Failure under Visual Distribution Shift

2026-06-29 | Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors studied how a combined vision, language, and action model (OpenVLA) performs when its visual inputs are altered, for example by partial occlusion. They found that the model's internal activations during a task can be used to predict whether the task will soon fail, and that among the layers they tested this signal is strongest at layer 16. A monitor fitted on occluded rollouts still performs better than chance under camera jitter, a different visual disruption, without being refitted. However, the authors caution that the results do not establish a causal mechanism, generalization to held-out tasks, or a system that can recover from failures automatically.

Vision Language Action models, OpenVLA, linear decoding, AUROC, AUPRC, occlusion, manipulation rollouts, policy activations, failure prediction, layer-wise analysis
Authors
Dipesh Tharu Mahato, Rachel Ren
Abstract
Vision-Language-Action models combine perception, language grounding, and control in a single policy, but their failures are hard to diagnose once visual conditions shift. We test whether OpenVLA feedforward activations contain linearly decodable information about near-term task failure in LIBERO manipulation rollouts. The policy is fixed throughout. We log internal activations during execution and fit lightweight monitors after the rollouts are collected. Occlusion is the main controlled stress test. It reduces OpenVLA success from $57\%$ to $17\%$ over $100$ episodes per condition. Under this shift, a logistic probe at layer 16 reaches AUROC $0.972$ and AUPRC $0.352$ for predicting failure within a $15$-step horizon. It outperforms both a mean-difference direction and an action-disagreement baseline. A sparse layer sweep finds uneven decodability across depth: layer 16 is strongest among the tested layers, layer 8 remains informative, and layer 10 is weaker. To check whether the monitor is just an occlusion detector, we also evaluate color shift and camera jitter without refitting. Color shift produces no failures in this setting, so it is a benign control rather than a failure benchmark. Camera jitter does induce failures, and the occlusion-trained monitor remains above random. The result is deliberately limited: OpenVLA internal states contain failure-relevant structure under controlled perceptual shift, but these experiments do not establish a causal mechanism, generalization to held-out tasks, or a deployable recovery system.
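The monitoring setup in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' code. It shows one plausible way to build "failure within a horizon" labels, score activations along a mean difference direction (the simpler baseline named in the abstract, not the logistic probe), and compute AUROC and AUPRC. The function names, the labeling convention (a step is positive if the failure occurs at most `horizon` steps later), and the use of average precision for AUPRC are all assumptions made for illustration.

```python
def horizon_labels(episode_len, failure_step, horizon=15):
    """Per-step labels: 1 if the episode fails within `horizon` steps of t.

    Assumed convention: failure_step is the index of the failing step,
    or None for a successful episode (all labels are then 0).
    """
    if failure_step is None:
        return [0] * episode_len
    return [1 if 0 <= failure_step - t <= horizon else 0
            for t in range(episode_len)]


def mean_difference_direction(acts, labels):
    """Direction from the negative-class mean to the positive-class mean."""
    pos = [a for a, y in zip(acts, labels) if y == 1]
    neg = [a for a, y in zip(acts, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    dim = len(acts[0])
    mu_pos = [sum(a[d] for a in pos) / len(pos) for d in range(dim)]
    mu_neg = [sum(a[d] for a in neg) / len(neg) for d in range(dim)]
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def project(acts, w):
    """Scalar monitor score: dot product of each activation with w."""
    return [sum(x * wi for x, wi in zip(a, w)) for a in acts]


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """AUPRC estimated as average precision over the ranked list.

    Tied scores are ordered by their original position, which is a
    simplification relative to threshold-based implementations.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


if __name__ == "__main__":
    # Toy 2-d "activations"; real ones would be logged from a policy layer.
    acts = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
    labels = [1, 1, 0, 0]
    w = mean_difference_direction(acts, labels)
    scores = project(acts, w)
    print(auroc(scores, labels), average_precision(scores, labels))
```

Reporting AUPRC alongside AUROC is useful when positive steps are rare, since a monitor can rank well overall while precision among its top-scored steps remains modest.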