BRICKS-WM: Building Reusability via Interface Composition Kinetics for Structured World Models
2026-06-15 • Machine Learning
AI summary
The authors address a problem in model-based reinforcement learning: current methods entangle agent and environment behavior in a single latent dynamics model, which makes it hard to reuse parts of that model. They introduce BRICKS-WM, which splits the world model into two parts, one for the agent and one for the background environment, connected through a learned interface. This separation keeps the background dynamics independent of the agent, so a trained background module can be frozen and reused when the agent changes. When trained from scratch, their approach achieves control performance comparable to strong monolithic baselines while supporting modularity and reuse.
Model-Based Reinforcement Learning, Latent Dynamics, Modular Models, World Models, Continuous Control, Latent Interface, Agent Dynamics, Background Dynamics
Authors
Shaowei Zhang, Jiahan Cao, Xunlan Zhou, Shenghua Wan, De-Chuan Zhan
Abstract
Model-based Reinforcement Learning (MBRL) has achieved remarkable success in continuous control by leveraging latent world models. However, prevailing approaches typically rely on monolithic latent dynamics, entangling agent and environment dynamics into a single coupled process. This coupling severely limits reusability: altering the agent necessitates retraining the entire world model from scratch, even if the environment remains constant. To address this, we introduce BRICKS-WM (Building Reusability via Interface Composition Kinetics for Structured World Models), a framework for the modular assembly of structured world models. Driven by the insight that the physical world is composed of independent entities, we posit that global dynamics can be modeled as a composition of distinct dynamical modules interacting via latent interfaces. As a minimal instantiation, we factorize the latent state space into an actuated Agent module and an external Background module, bridged by a learned latent interface. Unlike prior object-centric methods that prioritize visual segmentation, BRICKS-WM enforces a functional separation in transition dynamics, ensuring that the background dynamics remain agnostic to the agent's dynamics. Empirically, BRICKS-WM achieves control performance comparable to strong monolithic baselines when trained from scratch, and enables the reuse of frozen background dynamics across agents.
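To make the factorization concrete, the following is a minimal illustrative sketch in NumPy, not the authors' implementation. It assumes single-layer tanh transitions with random, untrained weights, and it assumes the latent interface is a fixed-width vector computed from the agent latent. All names (`BackgroundModule`, `AgentModule`, `world_step`) and all dimensions are hypothetical. The point it illustrates is structural: the background transition sees the agent only through the interface vector, so the same background module can be paired with agents whose latent and action sizes differ.

```python
import numpy as np


def init_linear(rng, n_in, n_out):
    # Random weights stand in for learned parameters (assumption: no training here).
    return {"W": rng.normal(0.0, 0.1, (n_in, n_out)), "b": np.zeros(n_out)}


def linear_tanh(params, x):
    return np.tanh(x @ params["W"] + params["b"])


class BackgroundModule:
    """Background transition: depends on the agent only via the interface vector u."""

    def __init__(self, rng, bg_dim, iface_dim):
        self.params = init_linear(rng, bg_dim + iface_dim, bg_dim)

    def step(self, z_bg, u):
        return linear_tanh(self.params, np.concatenate([z_bg, u]))


class AgentModule:
    """Actuated agent transition plus its side of the latent interface."""

    def __init__(self, rng, ag_dim, act_dim, bg_dim, iface_dim):
        self.dyn = init_linear(rng, ag_dim + act_dim + iface_dim, ag_dim)
        self.to_bg = init_linear(rng, ag_dim, iface_dim)    # agent -> background
        self.from_bg = init_linear(rng, bg_dim, iface_dim)  # background -> agent

    def interface(self, z_ag):
        return linear_tanh(self.to_bg, z_ag)

    def step(self, z_ag, action, z_bg):
        v = linear_tanh(self.from_bg, z_bg)
        return linear_tanh(self.dyn, np.concatenate([z_ag, action, v]))


def world_step(background, agent, z_bg, z_ag, action):
    # Both modules read the current latents, then advance one step.
    u = agent.interface(z_ag)
    return background.step(z_bg, u), agent.step(z_ag, action, z_bg)


rng = np.random.default_rng(0)
BG_DIM, IFACE_DIM = 8, 3
background = BackgroundModule(rng, BG_DIM, IFACE_DIM)

# Two hypothetical agents with different latent and action sizes share one background.
agent_a = AgentModule(rng, ag_dim=4, act_dim=2, bg_dim=BG_DIM, iface_dim=IFACE_DIM)
agent_b = AgentModule(rng, ag_dim=6, act_dim=3, bg_dim=BG_DIM, iface_dim=IFACE_DIM)

z_bg0 = np.zeros(BG_DIM)
bg_next_a, ag_next_a = world_step(background, agent_a, z_bg0, np.ones(4), np.ones(2))
bg_next_b, ag_next_b = world_step(background, agent_b, z_bg0, np.ones(6), np.ones(3))
```

In this sketch, reuse amounts to keeping `background.params` fixed and fitting only the new agent's parameters, including its two interface projections. How the paper actually trains the interface and enforces the separation is not specified in the abstract.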