Kairos: A Native World Model Stack for Physical AI

2026-06-15, Artificial Intelligence

Artificial Intelligence, Computer Vision and Pattern Recognition
AI summary

The authors present Kairos, a world model stack designed to help robots and AI systems understand and operate in the physical world efficiently over long horizons. Kairos learns from diverse data such as open-world videos, human behavioral data, and robot interactions, using a curriculum-style pre-training method that builds knowledge step by step. It also uses a hybrid attention system to retain important information across different time scales while limiting how errors accumulate as it predicts future states. Finally, Kairos is designed for low-latency operation on server and consumer-grade hardware, making it practical for real-world applications. The reported experiments indicate that Kairos offers a strong trade-off between capability and efficiency compared to other systems.

world model, physical AI, cross-embodiment data curriculum, pre-training, temporal attention, error accumulation, state propagation, deployment optimization, long-horizon prediction, robot interactions
Authors
Kairos Team, Fei Wang, Shan You, Qiming Zhang, Tao Huang, Zuoyi Fu, Zhisheng Zheng, Yunlong Xi, Feng Lv, Xiaoming Wu, Zeyu Liu, Cong Wan, Pu Li, Ruiqing Yang, Xiaoou Li, Wei Wang, Kangkang Zhu, Yuwei Zhang, Shi Fu, Xiaoning Wu, Xuzeng Fan, Dacheng Tao, Xiaogang Wang
Abstract
World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constraints. We introduce Kairos, a native world model stack designed around these requirements. (1) Kairos learns the world by pioneering a Native Pre-training Paradigm governed by a Cross-Embodiment Data Curriculum, which organizes open-world videos, human behavioral data, and robot interactions into a progressive developmental pathway. (2) Kairos maintains the world by unifying world understanding, generation, and prediction within a Native Unified Architecture equipped with Hybrid Linear Temporal Attention, where sliding-window attention captures local dynamics, dilated sliding-window attention captures mid-range dependencies, and gated linear attention maintains persistent global memory. We establish formal theoretical bounds demonstrating that this temporal factorization strictly limits error accumulation, mathematically guaranteeing state propagation across extended horizons. (3) Kairos runs the world by incorporating a Deployment-Aware System Co-Design to support low-latency rollout generation on server and consumer-grade hardware for real-world observation-action-feedback loops. Experiments on embodied world-model, long-horizon, and action-policy benchmarks show that Kairos achieves top-level performance while offering a strong efficiency-capability trade-off. Together, these results position Kairos as a cohesive operational foundation for future self-evolving physical intelligence.
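
To make the three temporal scales named in the abstract concrete, the following is a minimal NumPy sketch of one possible reading, not the authors' implementation. The function names, the scalar per-step `gate`, the window and dilation values, and the equal-weight averaging of the three branches are all assumptions made for illustration; the abstract does not specify how the branches are parameterized or combined.

```python
import numpy as np

def sliding_window_mask(T, window, dilation=1):
    """Causal mask: query t sees keys t, t-d, ..., t-(window-1)*d (d = dilation)."""
    idx = np.arange(T)
    offset = idx[:, None] - idx[None, :]  # query index minus key index
    return (offset >= 0) & (offset < window * dilation) & (offset % dilation == 0)

def masked_softmax_attention(q, k, v, mask):
    """Standard scaled dot-product attention restricted to positions where mask is True."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def gated_linear_attention(q, k, v, gate):
    """Recurrent linear attention: S_t = gate_t * S_{t-1} + k_t v_t^T, o_t = q_t S_t.

    Assumption: gate is one scalar in [0, 1] per time step (hypothetical simplification).
    """
    T, d_k = q.shape
    state = np.zeros((d_k, v.shape[-1]))  # fixed-size memory, independent of T
    out = np.empty((T, v.shape[-1]))
    for t in range(T):
        state = gate[t] * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out

def hybrid_temporal_attention(q, k, v, gate, window=4, dilation=4):
    """Combine local, mid-range, and global branches (equal weights are an assumption)."""
    T = q.shape[0]
    local = masked_softmax_attention(q, k, v, sliding_window_mask(T, window))
    mid = masked_softmax_attention(q, k, v, sliding_window_mask(T, window, dilation))
    glob = gated_linear_attention(q, k, v, gate)
    return (local + mid + glob) / 3.0

# Usage: 16 time steps, 8-dimensional keys/queries/values.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
gate = np.full(16, 0.9)  # constant decay, chosen only for the example
y = hybrid_temporal_attention(q, k, v, gate)
```

In this sketch the two windowed branches cost a bounded number of keys per query, and the gated linear branch carries a fixed-size state matrix forward, so per-step memory does not grow with sequence length. That property is what makes such a factorization attractive for long rollouts, though the actual Kairos design may differ in every detail.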