FF-JEPA: Long-Horizon Planning in World Models with Latent Planners

2026-06-08 · Artificial Intelligence

AI summary

The authors propose FF-JEPA, a method that lets agents plan long sequences of actions without needing an image of the goal. It combines two models: an action-conditioned forward model that predicts the next state, and an action-free latent planner that predicts the next subgoal from the current state. Splitting a long trajectory into a sequence of short optimization problems keeps planning tractable for complex tasks. Preliminary results on PushT indicate that FF-JEPA avoids the long-horizon collapse seen in flat world models.

Joint Embedding Predictive Architectures, forward dynamics model, latent space planning, Cross-Entropy Method, hierarchical planning, subgoal prediction, long-horizon planning, goal-free planning
Authors
Sergi Masip, Jonathan Swinnen, Yutong Hu, Renaud Detry, Tinne Tuytelaars
Abstract
Joint Embedding Predictive Architectures (JEPAs) have shown promising world modeling capabilities, enabling planning in latent space by optimizing action trajectories using methods like the Cross-Entropy Method (CEM). These methods are, however, too computationally expensive and ineffective for long-horizon planning. Furthermore, these methods typically require an explicit image of the goal state, which is not always possible in real-world tasks. In this work, we tackle these limitations by proposing Forward-Forward-JEPA (FF-JEPA), a hierarchical approach leveraging two forward dynamics models. Alongside a standard action-conditioned forward model, we introduce an action-free latent planner that predicts the next subgoal given the current state. This approach removes the need for goal images and enables long-horizon planning by decomposing complex trajectories into a sequence of tractable, short-term optimization problems. Preliminary results on PushT demonstrate that FF-JEPA successfully overcomes flat world models' long-horizon collapse, highlighting this approach as a promising direction for goal-free planning.
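
The decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D latent, the additive `forward_model`, the hand-written `latent_planner` that steps toward a fixed `TARGET`, and all hyperparameters are assumptions chosen only to make the example runnable. In FF-JEPA both models would be learned networks operating on JEPA embeddings. The sketch shows the general pattern: an action-free planner proposes the next subgoal, and a short-horizon Cross-Entropy Method (CEM) search finds actions that reach it under the action-conditioned model.

```python
import math
import random

TARGET = (5.0, 5.0)  # hypothetical task end state, known only to the toy planner


def forward_model(z, a):
    """Toy action-conditioned dynamics (assumption): the action shifts the latent."""
    return (z[0] + a[0], z[1] + a[1])


def latent_planner(z, step=1.0):
    """Toy action-free planner (assumption): next subgoal is one step toward TARGET."""
    dx, dy = TARGET[0] - z[0], TARGET[1] - z[1]
    dist = math.hypot(dx, dy)
    if dist <= step:
        return TARGET
    return (z[0] + step * dx / dist, z[1] + step * dy / dist)


def rollout(z, actions):
    """Apply a sequence of actions through the forward model."""
    for a in actions:
        z = forward_model(z, a)
    return z


def unflatten(flat):
    """Turn a flat list [ax0, ay0, ax1, ay1, ...] into a list of 2-D actions."""
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def cem_plan(z0, subgoal, rng, horizon=3, pop=64, n_elite=8, iters=5, a_max=0.5):
    """Short-horizon CEM: refit a diagonal Gaussian over action sequences to the elites."""
    n = horizon * 2
    mean, std = [0.0] * n, [a_max] * n

    def cost(flat):
        # Distance between the predicted final latent and the subgoal.
        return math.dist(rollout(z0, unflatten(flat)), subgoal)

    for _ in range(iters):
        # Sample candidate action sequences and clip them to the action bounds.
        samples = [
            [max(-a_max, min(a_max, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(pop)
        ]
        elites = sorted(samples, key=cost)[:n_elite]
        columns = list(zip(*elites))
        mean = [sum(col) / n_elite for col in columns]
        std = [
            math.sqrt(sum((x - m) ** 2 for x in col) / n_elite) + 1e-6
            for col, m in zip(columns, mean)
        ]
    return unflatten(mean)


def plan_hierarchically(z0, n_subgoals=12, seed=0):
    """Alternate subgoal prediction and short CEM searches; no goal image is supplied."""
    rng = random.Random(seed)
    z = z0
    for _ in range(n_subgoals):
        subgoal = latent_planner(z)          # action-free prediction
        actions = cem_plan(z, subgoal, rng)  # short, tractable optimization
        z = rollout(z, actions)              # advance with the forward model
    return z


if __name__ == "__main__":
    z_final = plan_hierarchically((0.0, 0.0))
    print("final latent:", z_final, "distance to target:", math.dist(z_final, TARGET))
```

Each CEM call here searches over only three actions, so its cost stays fixed no matter how far away the end state is. A flat planner would have to optimize the whole action sequence at once.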