ATM: Action-Consistency Transfer Matrix for Diagnosing and Improving Latent World Models
2026-06-08 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics
AI summary
The authors introduce ATM, a tool for quickly checking whether a learned world model captures how actions affect the world, without slow, simulator-based planning tests. ATM examines how actions relate to changes in the model's internal representations, revealing where the model works well and where it fails. The method is much faster than traditional evaluation (more than 100x in the authors' setup) and helps select better models. The authors also propose using this action awareness as a training signal to make models better at planning tasks.
latent world model, action semantics, transition dynamics, planning, CEM planner, representation learning, model evaluation, transfer matrix, goal-conditioned planning, post-hoc probe
Authors
Jiaheng Chen
Abstract
Latent world models are increasingly used for control and goal-conditioned planning, yet assessing whether their learned representations are useful for planning usually requires slow, planner-coupled simulator evaluation with the cross-entropy method (CEM) or similar planners. Such evaluation is black-box, and its cost depends on model complexity: under the same protocol, different world models may require anywhere from minutes to hours per checkpoint. In this work, we propose ATM, an Action-Consistency Transfer Matrix for diagnosing whether latent transitions preserve the action semantics relevant to planning. ATM uses lightweight post-hoc probes to compare the action information carried by real encoded transitions with that carried by model-predicted transitions. The result is an interpretable matrix that reveals representation quality, inconsistencies between the real and predicted transition domains, and failure modes, all without simulator rollouts. The matrix can also be collapsed into a single screening score for ranking checkpoints, model variants, and different world models within a task. When the true gap in planning success between two models is non-trivial, ATM ranks the pair with high reliability, and it replaces minutes to hours of CEM evaluation with seconds of transition analysis, a speedup of more than 100x in our setup. We further introduce AITS, showing that action identifiability is not only a diagnostic but also a useful training signal that improves downstream planning without changing the planner.
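The abstract does not specify the probe family, the transition features, or how the matrix is collapsed into a score, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes continuous actions, uses closed-form ridge-regression probes on concatenated latent pairs (z_t, z_{t+1}), and treats the matrix as a 2×2 train-domain × test-domain grid over real and predicted transitions, where each entry is the held-out R^2 of action recovery. The screening score (the real→predicted entry) and the `min_gap` threshold in the pairwise check are hypothetical choices, and all function names (`transfer_matrix`, `screening_score`, `pairwise_ranking_accuracy`) are illustrative.

```python
# Hypothetical sketch of an action-consistency transfer matrix; the probe type,
# features, and score collapse are assumptions, not the paper's method.
import itertools

import numpy as np


def _with_bias(features):
    """Append a constant column so the linear probe has an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_ridge_probe(features, actions, lam=1e-3):
    """Closed-form ridge regression probe mapping features (N, F) to actions (N, K)."""
    X = _with_bias(features)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ actions)


def probe_r2(weights, features, actions):
    """R^2 of the probe's action predictions (1.0 = actions fully recoverable)."""
    resid = actions - _with_bias(features) @ weights
    centered = actions - actions.mean(axis=0)
    return 1.0 - (resid ** 2).sum() / (centered ** 2).sum()


def transfer_matrix(z_t, z_next_real, z_next_pred, actions, train_frac=0.5):
    """Entry [i, j]: R^2 of a probe trained on domain i and tested on domain j.

    Domain 0 = real encoded transitions (z_t, z_{t+1}).
    Domain 1 = model-predicted transitions (z_t, z_hat_{t+1}).
    """
    domains = [np.hstack([z_t, z_next_real]), np.hstack([z_t, z_next_pred])]
    n_train = int(train_frac * len(actions))
    M = np.zeros((2, 2))
    for i, train_feats in enumerate(domains):
        w = fit_ridge_probe(train_feats[:n_train], actions[:n_train])
        for j, test_feats in enumerate(domains):
            # Score on held-out transitions only.
            M[i, j] = probe_r2(w, test_feats[n_train:], actions[n_train:])
    return M


def screening_score(M):
    """Hypothetical collapse: probe fit on real transitions, applied to predicted ones."""
    return M[0, 1]


def pairwise_ranking_accuracy(scores, success, min_gap=0.05):
    """Share of model pairs ordered by score the same way as by true success rate,
    ignoring pairs whose success gap is below min_gap (a hypothetical threshold)."""
    agree, total = 0, 0
    for a, b in itertools.combinations(range(len(scores)), 2):
        gap = success[a] - success[b]
        if abs(gap) < min_gap:
            continue  # near-ties in true planning success are not counted
        total += 1
        agree += int((scores[a] - scores[b]) * gap > 0)
    return agree / total if total else float("nan")


# Synthetic demo: actions shift the latent linearly; one predictor tracks the
# real transition, the other ignores the action entirely.
rng = np.random.default_rng(0)
N, d, k = 2000, 16, 4
z_t = rng.normal(size=(N, d))
actions = rng.normal(size=(N, k))
B = rng.normal(size=(k, d))
z_next_real = z_t + actions @ B + 0.1 * rng.normal(size=(N, d))
pred_good = z_next_real + 0.05 * rng.normal(size=(N, d))
pred_bad = z_t + rng.normal(size=(N, d))

M_good = transfer_matrix(z_t, z_next_real, pred_good, actions)
M_bad = transfer_matrix(z_t, z_next_real, pred_bad, actions)
print(np.round(M_good, 3))
print(np.round(M_bad, 3))
```

On this synthetic data the action-aware predictor should yield a real→predicted entry close to 1, while the action-blind predictor should yield one near or below 0, which is the kind of transition-domain inconsistency the matrix is meant to expose. A linear probe keeps the check cheap, in the spirit of the "lightweight post-hoc probes" described above; an actual reproduction would need to follow the paper's own feature, probe, and score definitions.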