Assistron: Bayesian Shared Autonomy with Off-the-shelf Vision-Language-Action Models

2026-06-22

Robotics
AI summary

The authors introduce Assistron, a system that helps people with daily tasks by using an off-the-shelf vision-language-action model. It carries out large reaching motions on its own in response to voice commands and asks the user to step in during tricky, contact-rich steps. This saves effort without fine-tuning the underlying model, which keeps the model general. Tests show that Assistron achieves higher task success rates than fully autonomous baselines and demands less user effort than direct teleoperation.

shared autonomy, vision-language-action models, robot manipulation, human-robot interaction, macro-reaching trajectories, phase-aware interaction detection, flow matching guidance, catastrophic forgetting, teleoperation
Authors
Pinhao Song, Ze Fu, Yutong Hu, Renaud Detry
Abstract
We propose Assistron, a shared autonomy model that leverages Vision-Language-Action (VLA) models to assist users in daily activities. Our approach is grounded in two core principles: (1) minimizing human cognitive and physical effort by leveraging VLA-driven autonomy for macro-movements, and (2) prioritizing human intervention specifically at critical failure points. Driven by the user's verbal commands, Assistron uses the VLA to autonomously execute macro-reaching trajectories, saving the user effort. In contact-rich interactions where VLAs tend to fail, Assistron employs a phase-aware interaction detection mechanism and prompts the user to intervene; the user's input then adjusts the VLA's action generation via flow matching guidance. Critically, our formulation eliminates the need for VLA fine-tuning, protecting its broad behavioral priors from catastrophic forgetting and ensuring the model does not become a narrow specialist. We validate our approach on a comprehensive multi-task scene recovery benchmark encompassing diverse daily manipulation skills. Empirical results demonstrate that Assistron significantly improves task success rates over purely autonomous baselines while significantly reducing human cognitive and physical workload compared to traditional teleoperation, offering a scalable, smooth, and effortless paradigm for assistive manipulation. The code is available at https://github.com/mousecpn/Assistron.git.
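The abstract does not spell out how user input steers a frozen flow-matching policy, so the following is only a toy sketch of the general idea, not the authors' formulation. It assumes a hypothetical velocity field `toy_velocity` standing in for the VLA's action head, and a simple endpoint-blending form of guidance controlled by a hypothetical `weight` parameter. No model weights are changed; only the sampling loop is modified.

```python
import numpy as np


def toy_velocity(x, t, prior_target):
    """Hypothetical stand-in for a learned flow-matching velocity field.

    On the straight-line path x_t = (1 - t) * x_0 + t * x_1, this field
    always points at the action the (frozen) policy would produce.
    """
    return (prior_target - x) / (1.0 - t)


def sample_action(x0, prior_target, user_action=None, weight=0.0, steps=50):
    """Euler-integrate the flow from noise x0 to an action.

    If user_action is given, the endpoint predicted at each step is pulled
    toward it by `weight` in [0, 1] (assumed guidance rule for illustration).
    """
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays in [0, 1 - dt], so 1 - t is never zero
        v = toy_velocity(x, t, prior_target)
        if user_action is not None:
            # Endpoint the policy currently predicts from (x, t).
            x1_hat = x + (1.0 - t) * v
            # Shift that endpoint toward the user's input.
            x1_guided = x1_hat + weight * (user_action - x1_hat)
            # Velocity that reaches the shifted endpoint instead.
            v = (x1_guided - x) / (1.0 - t)
        x = x + dt * v
    return x


rng = np.random.default_rng(0)
noise = rng.standard_normal(3)
prior = np.array([0.3, 0.0, 0.2])   # action the toy policy prefers
user = np.array([0.3, 0.1, 0.2])    # correction supplied by the user

autonomous = sample_action(noise, prior)
shared = sample_action(noise, prior, user_action=user, weight=0.5)
print(autonomous, shared)
```

With `weight=0.0` the sampler reproduces the policy's own action, and with `weight=1.0` it follows the user entirely; intermediate values blend the two, which is one simple way to read "adjusting the VLA's action generation" without fine-tuning.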