Predictive Objectives Discard Exogenous Control-Relevant Features: A Controlled Mechanistic Study

2026-06-29
Machine Learning

AI summary

The authors study JEPA-style objectives, which learn representations by predicting future internal states (latents). They find that these objectives can drop features that matter for control but lie outside the agent's influence, even when those features are easy to encode, because the objective rewards temporal predictability rather than control-relevance. Comparing six objectives, they show that the reward-free ones leave such a feature near chance accuracy, while a variant that includes reward information retains it. The fix needs little reward data (as little as 2% of transitions labeled) and holds across two environments and latent dimensions from 16 to 1024. They also show that the JEPA latent achieves only a small fraction of the class separation that a supervised reference attains.

JEPA, latent representation, control-relevance, temporal predictability, reward-grounded learning, inverse dynamics, bisimulation, feature controllability, predictive objectives, self-supervised learning
Authors
Ayan Pendharkar
Abstract
Joint-embedding predictive (JEPA-style) objectives learn representations by predicting future latents. In doing so they can discard features that are exogenous (uncontrollable by the agent) yet control-relevant, even when those features are trivially encodable. This occurs because the objective optimizes temporal predictability rather than control-relevance. We isolate this failure mode in a controlled 2×2 experimental design that varies feature controllability and relevance independently, using a predictability knob that decouples a feature's temporal predictability from its control-relevance. We compare six objectives (reconstruction, JEPA, action-conditioned JEPA, controllability-based JEPA, inverse dynamics under a random policy, and reward-grounded JEPA) and observe that all evaluated reward-free objectives leave the exogenous control-relevant feature near chance accuracy, while the reward-grounded variant retains it selectively. The remedy is label-efficient and robust: as little as 2% of reward-labeled transitions recovers the feature, the effect holds across two environments with different surface forms, and it persists across latent dimensions from 16 to 1024. When we compare the learned latent geometry against the prediction of bisimulation theory, the JEPA latent realizes only a small fraction of the class separation that a supervised reference attains.
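
The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' experimental setup. The function names, the ±1 feature encoding, the one-dimensional linear latent predictor, the toy reward, and the `persist_prob` parameter (standing in for the predictability knob) are all assumptions made for illustration. The sketch shows that an exogenous feature with low temporal predictability adds irreducible latent-prediction error when it is encoded and none when it is dropped, even though the reward depends on it.

```python
import numpy as np


def simulate_exogenous_feature(n_steps, persist_prob, seed=0):
    """Hypothetical exogenous binary feature with values in {-1, +1}.

    The feature ignores the agent's actions. At each step it keeps its
    previous value with probability `persist_prob` (the assumed
    "predictability knob") and is otherwise resampled uniformly.
    """
    rng = np.random.default_rng(seed)
    e = np.empty(n_steps)
    e[0] = rng.choice([-1.0, 1.0])
    for t in range(1, n_steps):
        if rng.random() < persist_prob:
            e[t] = e[t - 1]
        else:
            e[t] = rng.choice([-1.0, 1.0])
    return e


def reward(action, e_t):
    """Toy reward (assumption): 1 if the binary action matches the feature's sign."""
    return 1.0 if action == int(e_t > 0) else 0.0


def latent_prediction_loss(e, weight):
    """One-step latent prediction error when the latent is z_t = weight * e_t.

    Fits the least-squares linear predictor z_{t+1} ~ slope * z_t and
    returns its mean squared error. A weight of 0 means the encoder
    drops the feature entirely, which makes the loss exactly 0.
    """
    if weight == 0.0:
        return 0.0
    z_now, z_next = weight * e[:-1], weight * e[1:]
    slope = np.dot(z_now, z_next) / np.dot(z_now, z_now)
    return float(np.mean((z_next - slope * z_now) ** 2))


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 1.0):
        e = simulate_exogenous_feature(20000, persist_prob=p)
        kept = latent_prediction_loss(e, weight=1.0)
        dropped = latent_prediction_loss(e, weight=0.0)
        print(f"persist_prob={p:.1f}  loss if encoded={kept:.3f}  loss if dropped={dropped:.3f}")
```

In this toy, the reward depends on the feature at every setting of `persist_prob`, but the prediction loss penalizes encoding it whenever `persist_prob` is below 1. Real JEPA-style objectives include mechanisms that prevent full collapse, so the sketch only indicates the per-feature pressure a purely predictive loss would exert, not the behavior of any particular method evaluated in the paper.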