Learning Action-Conditional and Object-Centric Gaussian Splatting World Models for Rigid Objects

2026-06-01 | Robotics

Robotics, Computer Vision and Pattern Recognition, Machine Learning
AI summary

The authors present MRO-GWM, a world model that lets robots predict how rigid objects will move in 3D in response to actions. Each object is represented as a set of Gaussians, which lets the model handle arbitrary shapes and scenes with multiple objects. A spatio-temporal transformer predicts future rigid body motion from a history of object Gaussians and the planned future actions. Because the model is trained on multi-view reconstructions, it has to cope with objects that are only partially observed due to occlusion. The model was evaluated on synthetic scenes with household objects and used for model-predictive control of non-prehensile manipulation in simulation.

world model, rigid body dynamics, Gaussian representation, transformer architecture, 3D object modeling, canonical frame, partial observation, model predictive control, non-prehensile manipulation, robot end effector
Authors
Jens U. Kreber, Lukas Mack, Joerg Stueckler
Abstract
World models enable intelligent agents to predict the consequences of their actions on the environment. In this paper, we propose the Multi Rigid Object Gaussian World Model (MRO-GWM), a novel model that learns action-conditional dynamics of rigid objects in 3D. By representing the scene with object-centric Gaussians, we can model arbitrary object shapes and multi-object scenes. We develop a novel spatio-temporal transformer architecture that predicts future rigid body motion from a history of object Gaussians and future actions. Objects are represented by their Gaussians in a canonical frame, which allows object motion to be described as a rigid body transformation. Our model is trained on reconstructions from multiple viewpoints, which requires the model to handle partial observations of objects due to occlusions. We analyze the prediction performance of our approach on synthetic datasets composed of typical household objects with multi-object dynamics and interactions with a robot end effector. We also evaluate our model in model-predictive control for non-prehensile manipulation in simulation.
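
To illustrate what describing object motion as a rigid body transformation of canonical-frame Gaussians means, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `transform_gaussians` and `rot_z`, the array layout, and the example values are assumptions made for illustration. It relies only on the standard fact that under x' = R x + t, a Gaussian's mean maps to R μ + t and its covariance to R Σ Rᵀ.

```python
import numpy as np

def transform_gaussians(means, covs, R, t):
    """Move canonical-frame Gaussians by the rigid transform x' = R @ x + t.

    means: (N, 3) Gaussian centers in the object's canonical frame
    covs:  (N, 3, 3) Gaussian covariances in the canonical frame
    R:     (3, 3) rotation matrix, t: (3,) translation
    (Hypothetical helper; the layout is an assumption, not the paper's.)
    """
    new_means = means @ R.T + t   # row-wise R @ mu + t
    new_covs = R @ covs @ R.T     # R @ Sigma @ R^T, broadcast over N
    return new_means, new_covs

def rot_z(angle):
    """Rotation by `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Two illustrative Gaussians of one object in its canonical frame.
means = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
covs = np.stack([np.diag([0.04, 0.01, 0.01]), np.diag([0.01, 0.04, 0.01])])

# A single pose (R, t) moves every Gaussian of the object consistently.
R = rot_z(np.pi / 2)
t = np.array([0.5, 0.0, 0.2])
world_means, world_covs = transform_gaussians(means, covs, R, t)
```

Because the shape lives in the canonical Gaussians, a predicted future state of an object reduces to one pose per time step, regardless of how many Gaussians describe the object.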