TORL-VLA: Tactile Guided Online Reinforcement Learning for Contact-Rich Manipulation
2026-06-08 • Robotics
AI summary
The authors developed TORL-VLA, a method that helps robots improve at tasks that depend on a sense of touch, such as opening latches or handling eggs. Their system uses touch data to predict the robot's actions and the forces it will encounter, and it keeps refining those actions while the robot is working. To keep learning stable when humans step in to help, they use a critic that does not credit the robot's earlier actions for success achieved after a human intervention. In real-robot tests, their approach achieved higher success rates and better efficiency within a time limit than other methods on complex tasks that involve contact.
Vision-Language-Action models, tactile feedback, reinforcement learning, robotic manipulation, policy refinement, online adaptation, wrench prediction, human intervention, contact-rich tasks, robot learning
Authors
Huaihang Zheng, Yi Yang, Kai Ma, Shenglin Xu, Tian Xie, Guozheng Li, Xiangyu Wang, Yiren Ma, Si Liu, Yinian Mao, Baoxu Liu
Abstract
Vision-Language-Action (VLA) models have become a powerful framework for robotic manipulation, and recent studies have introduced tactile or force feedback into VLAs to address contact-rich tasks. However, these models are typically deployed as offline policies. When contact conditions shift away from the training distribution, the policy cannot adapt online, leading to problems such as inappropriate contact forces and inefficient retries. We therefore propose TORL-VLA, a tactile-guided online reinforcement learning framework that couples tactile feedback with policy refinement for contact-rich manipulation. Our method introduces a tactile-derived wrench-aware VLA that predicts reference actions and future wrench sequences, while a lightweight online RL module refines the reference actions. To stabilize learning from a mix of exploratory policy-generated data and human-intervention data, we introduce an intervention-censored critic that prevents post-intervention success from being wrongly credited to the policy-generated actions preceding the intervention. Real-robot experiments on long-horizon contact-rich tasks, including latch manipulation, coffee-cup placement, and egg handling, show that TORL-VLA improves success rates at both the subtask and full-task levels, as well as time-bounded execution efficiency, over strong baselines.
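The abstract does not give the formulation of the intervention-censored critic, so the following is only a minimal sketch of one plausible reading: when computing discounted return targets for the critic, the accumulated return is reset at each point where control passes from the policy to a human. The function name `censored_returns`, the per-step `intervened` flag, and the choice of Monte Carlo returns are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def censored_returns(rewards, intervened, gamma=0.99):
    """Discounted returns that are cut at policy-to-human handovers.

    Hypothetical helper: `intervened[t]` is True when a human controlled
    step t. Rewards earned after a handover are not propagated back to
    the policy-generated steps that preceded it.
    """
    rewards = np.asarray(rewards, dtype=float)
    intervened = np.asarray(intervened, dtype=bool)
    returns = np.zeros_like(rewards)
    running = 0.0
    # Accumulate the return backwards through the episode.
    for t in range(len(rewards) - 1, -1, -1):
        handover = (
            t + 1 < len(rewards) and intervened[t + 1] and not intervened[t]
        )
        if handover:
            # Censor: drop everything accumulated after the intervention.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Policy acts for two steps, then a human takes over and succeeds.
rewards = [0.0, 0.0, 0.0, 1.0]
intervened = [False, False, True, True]
print(censored_returns(rewards, intervened, gamma=0.5))
# Without any intervention the success reward reaches every step.
print(censored_returns(rewards, [False] * 4, gamma=0.5))
```

In this toy episode the two policy-generated steps receive a target of 0 under censoring, whereas the uncensored targets would assign them 0.125 and 0.25 for a success the human produced. A practical critic would more likely bootstrap from a learned value estimate than use full Monte Carlo returns; that choice is left open here.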