Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation

2026-06-03 · Robotics

AI summary

The authors studied how well world models, which predict how a robot’s environment will evolve, generalize when that environment varies. They trained models under different levels of environmental randomness on a simulated drone navigation task and then deployed them on a real drone, including a challenging run in which the drone received only a brief window of sensor input and then flew on its internal predictions alone. Models that generalized well across environments during self-supervised pretraining also worked in the real world, even on difficult tasks, whereas the model that scored best in simulated policy evaluation did not. The authors also found that the two factors with the greatest effect on model quality are the size of the model’s discrete latent representation and the length of the training sequences.

world models, DreamerV3, self-supervised learning, reinforcement learning, sim-to-real transfer, quadrotor navigation, latent space, training sequence length, generative model, environmental variability
Authors
Luca Zanatta, Grzegorz Malczyk, Kostas Alexis
Abstract
World models, learned generative models that predict how an environment evolves, have become a promising tool for sample-efficient robot learning. Yet how robust they are to environmental variability remains poorly understood. To address this, we conduct a systematic study using vision-based quadrotor navigation as a testbed problem, training DreamerV3-based world models under varying levels of environmental randomness and evaluating them across all levels through cross-environment validation, spanning both Self-Supervised Learning (SSL) pretraining and Reinforcement Learning (RL) fine-tuning. We then deploy all world models and associated navigation policies on a real quadrotor in unseen environments, including an open-loop run where the model receives just 2.5 s of real sensory input before all sensors are cut off, leaving the system to navigate entirely in imagination over a 12 m traverse. Our results show that world model robustness during SSL pretraining is a strong predictor of sim-to-real transfer: every model that generalized well in cross-environment SSL validation deployed successfully in the real world, passing through gaps as narrow as 0.67 m, whereas the model that dominated simulation policy evaluation failed on the real platform. We further identify (a) the discrete latent size and (b) the training-sequence length as the dominant factors governing world model quality.