HiL-ResRL: A Model-Agnostic Finetuning Adapter via Human-in-the-loop Residual Reinforcement Learning
2026-06-22 • Robotics
AI summary
The authors address a common weakness of robot learning models trained to imitate human demonstrations: small errors compound, so the models perform poorly in real-world settings. They propose a fine-tuning method that works across many types of vision-language-action (VLA) systems. The approach trains a residual (correction) policy that adjusts the base model's suboptimal actions, and it uses human guidance to keep exploration safe and training efficient. In experiments on real robots, the average success rate exceeds 95% after only 1.5 hours of online training. The method could help move robot models from the lab into industrial use.
Generative imitation learning • Behavior cloning • Distributional shift • Vision-language-action (VLA) models • Fine-tuning • Residual policy • Human-in-the-loop • Reinforcement learning • Robotic manipulation
Authors
Jingyi Liu, Zhaohong Mai, ShunSen He, Hang Ren, Chao Wang, Shunbo Zhou, XiaoDong Wu, Heng Zhang
Abstract
Recent advancements in generative imitation learning have significantly propelled the field of robotic manipulation. However, the majority of existing models rely heavily on Behavior Cloning (BC), a paradigm that suffers from compounding errors and distributional shift. Consequently, the efficacy of these models in practical industrial deployments remains limited. To address these challenges, we introduce a novel, plug-and-play fine-tuning pipeline designed to facilitate the robust deployment of Vision-Language-Action (VLA) models in real-world environments. In contrast to contemporary reinforcement learning (RL) fine-tuning strategies, which are often constrained by specific model architectures, our proposed framework is model-agnostic and adaptable to a diverse range of VLA models. We conceptualize VLA-generated actions as a unified interface, upon which we train a residual policy. This policy is designed to rectify suboptimal actions and address the distributional shift inherent in imitation learning. Additionally, we incorporate human-in-the-loop guidance to ensure safe exploration and maximize training efficiency. We conduct experiments directly in real-world robotic settings. The results demonstrate that within only 1.5 hours of real-world online RL training, the average success rate exceeds 95% on real robots. Our work presents a practical solution for deploying behavior cloning models in industrial scenarios.
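The idea of treating the VLA output as a fixed action interface and learning a correction on top of it can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `compose_action`, the additive form of the correction, the `residual_scale` bound, the action limits, and the rule that a human action fully overrides the policy are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import List, Optional, Sequence


def compose_action(
    base_action: Sequence[float],
    residual: Sequence[float],
    human_action: Optional[Sequence[float]] = None,
    residual_scale: float = 0.25,  # hypothetical bound on the correction size
    low: float = -1.0,             # hypothetical lower action limit
    high: float = 1.0,             # hypothetical upper action limit
) -> List[float]:
    """Return the action to execute for one control step.

    base_action:  output of the frozen base (e.g. VLA) policy
    residual:     output of the learned residual policy
    human_action: operator command, if the human is intervening this step
    """
    def clip(x: float) -> float:
        return min(high, max(low, float(x)))

    if human_action is not None:
        # Assumed takeover rule: the human command replaces the policy action.
        return [clip(a) for a in human_action]

    if len(base_action) != len(residual):
        raise ValueError("base action and residual must have the same dimension")

    # Additive correction: the base action is shifted by a scaled residual,
    # then clipped to the action limits.
    return [clip(b + residual_scale * r) for b, r in zip(base_action, residual)]


if __name__ == "__main__":
    base = [0.5, -0.25, 0.875]   # made-up base-policy action
    delta = [1.0, -1.0, 1.0]     # made-up residual-policy output
    print(compose_action(base, delta))
    print(compose_action(base, delta, human_action=[0.0, 0.0, 0.0]))
```

Because the residual only sees and modifies the action vector, a composition of this kind does not depend on the internals of the base model, which is one way to read the model-agnostic claim above.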