Representation Learning for Spatiotemporal Physical Systems

2026-03-13

Machine Learning · Computer Vision and Pattern Recognition
AI summary

The authors studied how machine learning models understand physical systems that change over time, focusing not just on predicting the next frame but on recovering the system's underlying physical rules. Training models to predict future frames is computationally expensive and prone to compounding errors, so they tested whether other learning methods might work better for scientific tasks. Their experiments showed that some general self-supervised learning techniques, especially those that predict in a compressed latent space such as JEPAs, performed better than methods that optimize exact pixel values. This suggests that learning deeper, more abstract features may better capture the physics of the system.

spatiotemporal systems, machine learning, self-supervised learning, physical parameters, next-frame prediction, autoregressive rollout, joint embedding predictive architectures (JEPAs), latent space, pixel-level objectives
Authors
Helen Qu, Rudy Morel, Michael McCabe, Alberto Bietti, François Lanusse, Shirley Ho, Yann LeCun
Abstract
Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate emulator for the system's evolution in time. However, these emulators are computationally expensive to train and are subject to performance pitfalls, such as compounding errors during autoregressive rollout. In this work, we take a different perspective and look at scientific tasks further downstream of predicting the next frame, such as estimation of a system's governing physical parameters. Accuracy on these tasks offers a uniquely quantifiable glimpse into the physical relevance of the representations of these models. We evaluate the effectiveness of general-purpose self-supervised methods in learning physics-grounded representations that are useful for downstream scientific tasks. Surprisingly, we find that not all methods designed for physical modeling outperform generic self-supervised learning methods on these tasks, and methods that learn in the latent space (e.g., joint embedding predictive architectures, or JEPAs) outperform those optimizing pixel-level prediction objectives. Code is available at https://github.com/helenqu/physical-representation-learning.
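To make the downstream evaluation concrete, here is a minimal sketch of the kind of probe the abstract describes: fitting a linear regressor from frozen representations to a system's governing physical parameter. The toy single-mode diffusion data, the random-feature map standing in for a pretrained encoder, and all names below are illustrative assumptions, not the paper's actual setup or code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy spatiotemporal data (hypothetical): 1-D single-mode heat-equation
# snapshots u(x, t) = exp(-D t) * sin(x) + noise, where D is the governing
# diffusion coefficient we want to recover.
n_systems, n_points = 200, 32
x = np.linspace(0, 2 * np.pi, n_points)
diffusion = rng.uniform(0.1, 2.0, size=n_systems)  # parameter to estimate
t = 1.0
fields = (np.exp(-diffusion[:, None] * t) * np.sin(x)
          + 0.01 * rng.normal(size=(n_systems, n_points)))

# Stand-in for a frozen pretrained encoder: a fixed random-feature map.
W = rng.normal(size=(n_points, 64))
reps = np.tanh(fields @ W)

# Linear probe: ridge regression from representations to the parameter.
train, test = slice(0, 150), slice(150, None)
A = reps[train]
lam = 1e-3
w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]),
                    A.T @ diffusion[train])

# Held-out R^2 quantifies how much physics the representation retains.
pred = reps[test] @ w
resid = np.sum((pred - diffusion[test]) ** 2)
total = np.sum((diffusion[test] - diffusion[test].mean()) ** 2)
r2 = 1 - resid / total
print(f"probe R^2 on held-out systems: {r2:.3f}")
```

The held-out R² of such a probe is the kind of "uniquely quantifiable" score the abstract refers to: a representation that preserves the physics supports accurate parameter recovery even through a simple linear readout.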