Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

2026-03-05

Computer Vision and Pattern Recognition · Artificial Intelligence · Robotics
AI summary

The authors focus on improving world models, which are tools that predict how environments change when actions are taken, useful for tasks like planning and learning. They point out that existing methods use many pieces (tokens) to represent observations, making real-time decision-making very slow. To fix this, the authors created CompACT, a new tokenizer that compresses observations into as few as 8 tokens, speeding up planning without losing important details. Their approach shows much faster planning speeds while still performing well, making world models more practical for real-world uses.

world models · latent representations · tokenizer · discrete tokens · environment dynamics · action-conditioned models · decision-time planning · policy learning · real-time control · compression
Authors
Dongwon Kim, Gawon Seo, Jinsung Lee, Minsu Cho, Suha Kwak
Abstract
World models provide a powerful framework for simulating environment dynamics conditioned on actions or instructions, enabling downstream tasks such as action planning or policy learning. Recent approaches leverage world models as learned simulators, but their application to decision-time planning remains computationally prohibitive for real-time control. A key bottleneck lies in latent representations: conventional tokenizers encode each observation into hundreds of tokens, making planning both slow and resource-intensive. To address this, we propose CompACT, a discrete tokenizer that compresses each observation into as few as 8 tokens, drastically reducing computational cost while preserving essential information for planning. An action-conditioned world model equipped with the CompACT tokenizer achieves competitive planning performance with orders-of-magnitude faster planning, offering a practical step toward real-world deployment of world models.
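To make the token budget concrete, the sketch below shows the general shape of a discrete tokenizer that maps one observation to 8 integer tokens via vector quantization. This is not CompACT's actual architecture (the paper does not spell it out here); the encoder, codebook size, and latent dimension are all illustrative assumptions, with a fixed random projection standing in for learned weights.

```python
import numpy as np

# Hypothetical sketch of an 8-token discrete tokenizer; all shapes and
# names below are assumptions, not the paper's actual design.
rng = np.random.default_rng(0)

NUM_TOKENS = 8        # tokens per observation (the paper's headline budget)
CODEBOOK_SIZE = 512   # assumed vocabulary of discrete codes
LATENT_DIM = 64       # assumed latent channel dimension

codebook = rng.normal(size=(CODEBOOK_SIZE, LATENT_DIM))
proj = rng.normal(size=(3, LATENT_DIM))  # stand-in for a learned encoder


def encode(observation: np.ndarray) -> np.ndarray:
    """Map an HxWxC observation to NUM_TOKENS latent vectors."""
    h, w, c = observation.shape
    flat = observation.reshape(h * w, c)
    # Stand-in encoder: average-pool pixels into 8 groups, then project
    # each pooled vector into the latent space.
    groups = np.array_split(flat, NUM_TOKENS)
    pooled = np.stack([g.mean(axis=0) for g in groups])  # (8, C)
    return pooled @ proj                                 # (8, LATENT_DIM)


def quantize(latents: np.ndarray) -> np.ndarray:
    """Assign each latent vector to its nearest codebook entry."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)                          # (8,) integer tokens


obs = rng.normal(size=(64, 64, 3))
tokens = quantize(encode(obs))
print(tokens.shape)  # (8,) -- 8 tokens instead of hundreds per observation
```

The payoff for planning is that a transformer-based world model rolling out H steps attends over roughly 8·H observation tokens instead of hundreds·H, which is where the claimed orders-of-magnitude speedup in decision-time planning would come from.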