BadWorld: Adversarial Attacks on World Models

2026-06-15 · Computer Vision and Pattern Recognition

AI summary

The authors study visual world models (VWMs), which predict future video frames from a single context image and a sequence of user actions. They introduce BadWorld, an attack framework that tests how easily these models can be fooled by small perturbations to the input image that are visually indistinguishable from the original but disrupt the model's predictions. The method does not need ground-truth future videos and works even when future user actions are unknown. The attacks cause severe failures in the models' outputs, including incomplete denoising, structural collapse, and control inconsistency. This suggests that VWMs are risky to deploy in safety-critical applications, although the same mechanism could also be used to protect privacy.

Visual World Models, Adversarial Attacks, Self-Supervised Learning, Denoising Dynamics, Bi-level Optimization, Trajectory-Adaptive Attack, Control-Agnostic Perturbations, Future Video Prediction, Structural Fragility, Privacy Protection
Authors
Linghui Shen, Mingyue Cui, Xingyi Yang
Abstract
Visual world models (VWMs) synthesize interactive, action-conditioned rollouts from a single context image. However, it remains an open question how robust these models are to adversarial perturbations. Standard adversarial attacks fail to assess this vulnerability because attackers lack ground-truth future videos and cannot predict subsequent user controls. We introduce BadWorld, a label-free adversarial framework tailored for autoregressive VWMs that systematically overcomes both constraints. First, to bypass the need for future supervision, we propose a self-supervised velocity attack that directly disrupts the early denoising dynamics of the model. Second, to ensure the attack generalizes across unpredictable user actions, we formulate a trajectory-adaptive bi-level optimization that actively mines hard control sequences to forge control-agnostic perturbations. Evaluated on representative VWMs with continuous and discrete controls, BadWorld exposes severe structural fragility. Visually indistinguishable adversarial images reliably trigger catastrophic degradation in future rollouts, leading to incomplete denoising, structural collapse, and control inconsistency. These findings reveal critical risks for deploying VWMs in safety-critical systems while highlighting a practical mechanism for privacy protection.
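The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no equations, so everything below is an assumption made for illustration. A per-control map `tanh(W x)` stands in for the model's early denoising velocity, the self-supervised objective is taken to be the squared change in that velocity (no future frames needed), and "mining hard controls" is read as a max-min loop that always attacks the control the current perturbation disrupts least. All function and variable names are hypothetical.

```python
import math
import random


def velocity(W, x):
    """Toy stand-in for one early denoising step: v = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def deviation(W, x, delta):
    """Squared change in predicted velocity caused by perturbing x with delta."""
    clean = velocity(W, x)
    adv = velocity(W, [xi + di for xi, di in zip(x, delta)])
    return sum((a - c) ** 2 for a, c in zip(adv, clean))


def deviation_grad(W, x, delta):
    """Analytic gradient of deviation() with respect to delta."""
    clean = velocity(W, x)
    adv = velocity(W, [xi + di for xi, di in zip(x, delta)])
    grad = [0.0] * len(x)
    for row, a, c in zip(W, adv, clean):
        coeff = 2.0 * (a - c) * (1.0 - a * a)  # chain rule through tanh
        for j, w in enumerate(row):
            grad[j] += coeff * w
    return grad


def worst_case(models, x, delta):
    """Deviation under the control that the perturbation disrupts least."""
    return min(deviation(W, x, delta) for W in models.values())


def control_agnostic_attack(models, x, eps=0.05, step=0.01, iters=50, seed=0):
    """Assumed max-min loop: maximize the worst-case deviation over controls."""
    rng = random.Random(seed)
    # Random start: the gradient is exactly zero at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in x]
    best_delta, best_score = list(delta), worst_case(models, x, delta)
    for _ in range(iters):
        # Inner step: mine the "hard" control (smallest current deviation).
        hard = min(models, key=lambda a: deviation(models[a], x, delta))
        # Outer step: signed-gradient ascent, projected onto the L-inf ball.
        grad = deviation_grad(models[hard], x, delta)
        delta = [
            max(-eps, min(eps, d + step * ((g > 0) - (g < 0))))
            for d, g in zip(delta, grad)
        ]
        score = worst_case(models, x, delta)
        if score > best_score:
            best_delta, best_score = list(delta), score
    return best_delta, best_score


# Hypothetical setup: three discrete controls, each with its own random 4x4 map.
rng = random.Random(1)
models = {
    action: [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
    for action in ("left", "right", "forward")
}
x = [rng.uniform(0, 1) for _ in range(4)]
delta, score = control_agnostic_attack(models, x)
```

Here `delta` stays within the `eps` budget in every coordinate, and `score` is the velocity deviation under the least-affected control, so a high value means the perturbation works regardless of which control is chosen. A real attack would replace the toy maps with the VWM's denoiser, use automatic differentiation, and also clip the perturbed image to the valid pixel range.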