Domain Adaptation with Adaptive Imagination for Visual Reinforcement Learning under Limited Target Data
2026-06-29 • Artificial Intelligence
AI summary
The authors address a problem where robots trained in simulation struggle when moved to the real world, especially when using camera images. They created AIDA, a system that helps the robot learn better from very limited real-world data by imagining reliable scenarios based on what it already knows. AIDA avoids using unrealistic imagined situations and uses a method to check if its imagined states match back to the original states, improving learning. Their experiments showed that this approach helps robots perform better on various control tasks without needing much real-world interaction.
Reinforcement Learning, Sim-to-Real Transfer, Domain Adaptation, Vision-Based Control, State-Distribution Shift, Imagination Rollouts, Self-Consistency Loss, MuJoCo, Gymnasium-Robotics, Distribution-Shift-Aware Discriminator
Authors
Hyunwoo Park, Sang-Hyun Lee
Abstract
Sim-to-real transfer remains a major obstacle for reinforcement learning (RL), especially for vision-based control, where image observations exacerbate the state-distribution shift between simulation and the real world. Domain adaptation (DA) is a promising remedy for this challenge. Prior sim-to-real DA works have demonstrated encouraging results, yet these approaches typically assume substantially more target data than is available in practice, and their performance degrades significantly when the target data budget is reduced. To address this challenge, we propose AIDA (Adaptive Imagination for Domain Adaptation), a domain adaptation framework for visual reinforcement learning that handles sim-to-real transfer under scarce target data without requiring additional interaction with the target environment. Our key idea is adaptive imagination: generating reliable, semantically meaningful imagination rollouts to augment limited target data. Specifically, AIDA employs a distribution-shift-aware discriminator that truncates rollouts when imagined transitions drift into low-confidence regions, so that only reliable transitions contribute to the augmentation. On these reliable transitions, AIDA introduces a self-consistency loss that cycles through state -> image observation -> state, penalizing discrepancies between the original and reconstructed states. This provides additional adaptation signals beyond the scarce target data. Our experiments demonstrate that adaptive imagination effectively truncates unreliable rollouts. By enforcing the self-consistency loss on the resulting reliable transitions, AIDA learns semantically meaningful state representations and outperforms baselines across five MuJoCo tasks and two Gymnasium-Robotics tasks.
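The two mechanisms named in the abstract, discriminator-based truncation and the state -> image observation -> state cycle, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `truncate_rollout` and `self_consistency_loss`, the callables `confidence`, `render`, and `encode`, the fixed `threshold`, and the squared-error form of the loss are all assumptions made for illustration, and the toy functions at the bottom stand in for learned networks.

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Image = Sequence[float]


def truncate_rollout(
    states: List[State],
    confidence: Callable[[State], float],
    threshold: float = 0.5,
) -> List[State]:
    """Keep the rollout prefix before the first low-confidence imagined state.

    `confidence` is a hypothetical stand-in for a discriminator score;
    the cut-off rule and `threshold` are assumptions, not the paper's.
    """
    kept: List[State] = []
    for state in states:
        if confidence(state) < threshold:
            break  # everything after a drifted state is discarded
        kept.append(state)
    return kept


def self_consistency_loss(
    states: List[State],
    render: Callable[[State], Image],
    encode: Callable[[Image], State],
) -> float:
    """Mean squared error between each state and encode(render(state)).

    `render` (state -> image) and `encode` (image -> state) are
    hypothetical placeholders for learned models.
    """
    total, count = 0.0, 0
    for state in states:
        reconstructed = encode(render(state))
        total += sum((a - b) ** 2 for a, b in zip(state, reconstructed))
        count += len(state)
    return total / count if count else 0.0


# Toy stand-ins, chosen only so the example runs end to end.
def toy_confidence(state: State) -> float:
    return 1.0 if abs(state[0]) < 1.0 else 0.0


def toy_render(state: State) -> Image:
    return [2.0 * x for x in state]


def toy_encode(image: Image) -> State:
    return [x / 2.0 + 0.1 for x in image]  # deliberately biased by 0.1


rollout = [[0.0], [0.5], [2.0], [0.1]]
reliable = truncate_rollout(rollout, toy_confidence)
print(reliable)  # [[0.0], [0.5]]
print(self_consistency_loss(reliable, toy_render, toy_encode))
```

In this sketch the state `[0.1]` is dropped even though it would pass the confidence check on its own, because it follows a state that drifted out of the trusted region; truncating at the first failure, rather than filtering states individually, is the design choice being illustrated.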