Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement

2026-06-16 • Robotics

Robotics, Artificial Intelligence

AI summary

The authors suggest a new system called VERITAS for robots to get better at tasks while they work. They use a pre-trained robot brain to suggest actions and a separate visual checker to judge those actions without extra training. This setup helps robots improve what they do right away and also helps create better lessons for the robot to learn from later. Their tests show the system helps robots improve almost as well as when taught by experts, but without needing humans to step in.

generalist robot policy, inference-time, policy steering, gradient-free verification, pre-trained model, self-improvement, offline policy fine-tuning, rollouts, robot learning, demonstration data

Authors

Mingtong Zhang, Dhruv Shah

Abstract

Robots deployed in the real world should learn from their experience and improve over time. This requires a mechanism for practicing and learning from feedback. In this paper, we propose VERITAS, a generator-verifier framework for generalist robot policies that supports inference-time policy steering and self-improvement. We use a pre-trained generalist robot policy as a "generator" and pair it with a gradient-free "visual verifier" that evaluates actions at inference time. This framework enables inference-time steering that improves policy performance without additional training. We demonstrate that inference-time verification consistently outperforms vanilla generalists without training on additional demonstration data. Additionally, we demonstrate that the verified rollouts provide effective supervision for offline policy improvement: policies fine-tuned on verified self-generated trajectories achieve consistent performance gains. Notably, we find that post-training with verified rollouts achieves efficiency comparable to that of expert demonstrations, while requiring no human intervention. Our results highlight inference-time verification as a practical and scalable mechanism for improving robotic policies during deployment.
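
The generator-verifier loop described in the abstract can be illustrated with a minimal best-of-N sketch. This is not the VERITAS implementation: the abstract does not say how candidates are sampled, how the visual verifier scores them, or how rollouts are selected for fine-tuning. The names `steer_action`, `filter_rollouts`, `toy_generator`, and `toy_verifier` are hypothetical, and the scalar observation and action stand in for images and robot commands.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a real system would use image observations
# and robot action chunks instead of plain floats.
Observation = float
Action = float


def steer_action(
    observation: Observation,
    generator: Callable[[Observation], Action],
    verifier: Callable[[Observation, Action], float],
    num_candidates: int = 8,
) -> Tuple[Action, float]:
    """Sample candidates from the generator and keep the best-scoring one.

    No gradients are computed and no weights are updated, so the policy
    itself is unchanged (an assumed reading of "gradient-free" steering).
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [generator(observation) for _ in range(num_candidates)]
    scores = [verifier(observation, action) for action in candidates]
    best_index = max(range(num_candidates), key=lambda i: scores[i])
    return candidates[best_index], scores[best_index]


def filter_rollouts(
    rollouts: List[Tuple[str, float]], threshold: float
) -> List[str]:
    """Keep trajectories whose verifier score meets the threshold.

    The kept trajectories would serve as fine-tuning data; thresholding
    is an assumed selection rule, not one stated in the abstract.
    """
    return [trajectory for trajectory, score in rollouts if score >= threshold]


# Toy generator: proposes a noisy action around the observation.
_rng = random.Random(0)


def toy_generator(observation: Observation) -> Action:
    return observation + _rng.uniform(-1.0, 1.0)


# Toy verifier: higher score means the action is closer to the observation.
def toy_verifier(observation: Observation, action: Action) -> float:
    return -abs(action - observation)


if __name__ == "__main__":
    action, score = steer_action(0.5, toy_generator, toy_verifier, num_candidates=16)
    print(f"selected action={action:.3f} score={score:.3f}")
```

Raising `num_candidates` trades extra inference-time compute for a better expected verifier score, which is the general idea behind steering without retraining.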
