How to Mitigate the Distribution Shift Problem in Robotics Control: A Robust and Adaptive Approach Based on Offline to Online Imitation Learning
2026-05-25 • Robotics
AI summary
The authors address a problem in imitation learning known as distribution shift, where a learning agent struggles to choose the right actions in situations it did not see during training. They propose a two-step approach. First, offline learning uses extra (supplementary) demonstrations to broaden the range of states and actions the agent learns from. Then, during online use, the system detects when it encounters unfamiliar situations and adapts by learning from its new experiences on its own. Tests in simulated MuJoCo environments show that their method handles unfamiliar states better and adapts more effectively than previous approaches.
imitation learning, distribution shift, offline learning, online learning, state-action coverage, discriminator, self-supervised learning, MuJoCo environments, policy adaptation
Authors
Hyung-Suk Yoon, Seung-Woo Seo
Abstract
Distribution shift in imitation learning refers to the problem that the agent cannot plan proper actions for states that were not visited during training. This problem can be largely attributed to the inherently narrow state-action coverage that expert demonstrations provide relative to the full environment. In this paper, we propose a robust offline to adaptive online imitation learning framework that handles the distribution shift problem in a lifelong, multi-phase scheme. In the offline learning phase, we leverage supplementary demonstrations to broaden the state-action coverage of the policy, using a discriminator to train the policy effectively on these supplementary demonstrations and thereby enhancing the robustness of the policy to distribution shift. In the subsequent online inference phase, our framework detects the occurrence of distribution shift and conducts self-supervised imitation learning from online experiences to adapt the policy to the online environments. Through extensive evaluations in MuJoCo environments, we demonstrate that our method exhibits better robustness to distribution shift and better adaptation to online environments than the baseline algorithms, indicating the superior performance of our framework against distribution shift.
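The abstract does not say how the discriminator enters the offline objective or how distribution shift is detected online, so the following is only a minimal, hypothetical sketch of one common way these two ideas can be realized, not the authors' method. It assumes (1) a logistic-regression discriminator d(s, a) trained to separate expert pairs (label 1) from supplementary pairs (label 0), whose output turns each supplementary pair into a clipped density-ratio style weight d / (1 - d) inside a weighted behavior-cloning loss, and (2) a nearest-neighbour distance test against the offline states as a stand-in shift detector. All function names, the clipping value, and the threshold are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (state, action)


def _features(pair: Pair) -> List[float]:
    # Hypothetical feature map: concatenate state and action, append a bias term.
    state, action = pair
    return [*state, *action, 1.0]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score(w: List[float], pair: Pair) -> float:
    # Discriminator output d(s, a): estimated probability that the pair is expert-like.
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, _features(pair))))


def train_discriminator(expert: List[Pair], supplementary: List[Pair],
                        lr: float = 0.5, epochs: int = 500) -> List[float]:
    # Logistic regression fitted by full-batch gradient descent on the mean log-loss:
    # label 1 for expert pairs, 0 for supplementary pairs.
    data = [(p, 1.0) for p in expert] + [(p, 0.0) for p in supplementary]
    w = [0.0] * len(_features(data[0][0]))
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for pair, label in data:
            err = _score(w, pair) - label
            for i, xi in enumerate(_features(pair)):
                grad[i] += err * xi
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def supplementary_weights(w: List[float], supplementary: List[Pair],
                          clip: float = 10.0) -> List[float]:
    # Density-ratio style weight d / (1 - d) for each supplementary pair, clipped to [0, clip].
    weights = []
    for pair in supplementary:
        d = _score(w, pair)
        weights.append(min(d / max(1.0 - d, 1e-8), clip))
    return weights


def weighted_bc_loss(policy: Callable[[Vector], Vector], expert: List[Pair],
                     supplementary: List[Pair], weights: List[float]) -> float:
    # Weighted mean squared action error; expert pairs always get weight 1.
    weighted = [(p, 1.0) for p in expert] + list(zip(supplementary, weights))
    total = sum(wt * sum((pa - aa) ** 2 for pa, aa in zip(policy(s), a))
                for (s, a), wt in weighted)
    return total / sum(wt for _, wt in weighted)


def is_shifted(state: Vector, reference_states: List[Vector], threshold: float) -> bool:
    # Hypothetical shift detector: flag a state whose nearest offline state is farther than threshold.
    return min(math.dist(state, r) for r in reference_states) > threshold


if __name__ == "__main__":
    # Toy 1-D task: the expert always outputs action 1.0.
    expert = [((s,), (1.0,)) for s in (0.0, 0.5, 1.0)]
    # Same state, one expert-like and one poor supplementary action.
    supplementary = [((0.5,), (1.0,)), ((0.5,), (-1.0,))]
    w = train_discriminator(expert, supplementary)
    print(supplementary_weights(w, supplementary))  # expert-like pair gets the larger weight
    print(is_shifted((3.0,), [s for s, _ in expert], threshold=0.5))  # True
```

In this sketch, supplementary pairs that the discriminator finds hard to tell apart from expert data receive larger weights, so they widen the policy's state-action coverage without letting poor actions dominate the loss. A learned density model or an ensemble-disagreement score could replace the nearest-neighbour test; which signal the authors actually use is not stated in the abstract.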