World Action Models Enable Continual Imitation Learning with Recurrent Generative Replays
2026-06-25 • Robotics
Robotics • Computer Vision and Pattern Recognition
AI summary
The authors developed a method called REGEN that helps robots retain previously learned tasks without keeping the original human demonstrations. It uses a World Action Model (WAM), which predicts both future robot actions and future visual observations, to generate synthetic practice trajectories for earlier tasks. Rehearsing on these trajectories helps the robot avoid forgetting previous skills while learning new ones. In simulated and real-world manipulation tests, REGEN reduced forgetting by up to 50% relative to sequential fine-tuning and came close to replay methods that need stored real data. The authors also studied what limits generated replay, finding that long-horizon visual degradation and mismatch between actions and observations are the main challenges.
World Action Models • Recurrent Generative Replay • Continual Imitation Learning • Catastrophic Forgetting • Robot Policy • Pseudo-replay Trajectories • Experience Replay • Visual Observation Prediction • Simulated Robot Manipulation • Action-Observation Consistency
Authors
Manish Kumar Govind, Dominick Reilly, Smit Patel, Hieu Le, Srijan Das
Abstract
Going beyond predicting robot actions, World Action Models (WAMs) can also generate future visual observations. We build on this generative capability to propose Recurrent Generative Replay (REGEN), a continual imitation learning framework that synthesizes pseudo-replay trajectories, enabling a robot policy to rehearse previously learned tasks without storing their original human demonstrations. During continual adaptation, REGEN recursively queries the WAM to synthesize pseudo-replay trajectories conditioned only on prior task instructions and current-task observations. Experiments in both simulation and real-world manipulation settings show that REGEN reduces catastrophic forgetting by up to $50\%$ relative to sequential fine-tuning, while approaching the performance of privileged experience replay methods that require access to real replay data. Finally, we analyze the factors limiting generated replay, identifying long-horizon visual degradation and action-observation inconsistency as the primary bottlenecks. Our results establish WAMs as a promising foundation for continual robot learning without stored demonstrations.
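The recursive querying described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `WAM` callable interface, the `Step` record, the fixed `horizon`, and the `toy_wam` stand-in are all hypothetical names introduced here. The only details taken from the abstract are that generation is conditioned on a prior task instruction and a current-task observation, and that the model is queried recursively so each predicted observation feeds the next query.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Observation = List[float]
Action = List[float]

# Hypothetical WAM interface (an assumption for this sketch):
# (instruction, observation) -> (predicted action, predicted next observation)
WAM = Callable[[str, Observation], Tuple[Action, Observation]]


@dataclass
class Step:
    """One pseudo-replay sample: what the policy would rehearse on."""
    instruction: str
    observation: Observation
    action: Action


def generate_replay(wam: WAM, instruction: str,
                    start_obs: Observation, horizon: int) -> List[Step]:
    """Roll the WAM forward recursively from a current-task observation."""
    trajectory: List[Step] = []
    obs = start_obs
    for _ in range(horizon):
        action, next_obs = wam(instruction, obs)
        trajectory.append(Step(instruction, obs, action))
        obs = next_obs  # feed the generated observation back into the model
    return trajectory


def build_replay_buffer(wam: WAM, prior_instructions: List[str],
                        current_observations: List[Observation],
                        horizon: int) -> List[Step]:
    """Generate pseudo-replay for every (prior instruction, current observation) pair."""
    buffer: List[Step] = []
    for instruction in prior_instructions:
        for start_obs in current_observations:
            buffer.extend(generate_replay(wam, instruction, start_obs, horizon))
    return buffer


def toy_wam(instruction: str, obs: Observation) -> Tuple[Action, Observation]:
    """Stand-in model for demonstration only; a real WAM would be a learned network."""
    action = [float(len(instruction))]
    next_obs = [x + 1.0 for x in obs]
    return action, next_obs


trajectory = generate_replay(toy_wam, "pick up cup", [0.0], horizon=3)
replay_buffer = build_replay_buffer(
    toy_wam, ["pick up cup", "open drawer"], [[0.0], [5.0]], horizon=3
)
```

Because each step consumes the previous step's generated observation, prediction errors can compound as the horizon grows, which is consistent with the long-horizon visual degradation the abstract names as a bottleneck. How the resulting buffer is mixed with current-task data during fine-tuning is not specified in the abstract and is left out of this sketch.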