Bridging the Simulation-to-Experiment Gap with Generative Models using Adversarial Distribution Alignment
2026-04-01 • Machine Learning
AI summary
The authors address the simulation-to-experiment gap: simulations of physical systems expose the full state but rely on computational approximations, while experimental measurements reflect the real system but only partially observe its state. They propose a distribution alignment framework, instantiated for the physical sciences as Adversarial Distribution Alignment (ADA), that first pre-trains a generative model on fully observed but imperfect simulation data, then aligns it with partial but real experimental observations. They prove that the method recovers the target observable distribution, even with multiple, potentially correlated observables, and validate it on synthetic, molecular, and experimental protein data.
simulation-to-experiment gap, generative model, distribution alignment, Boltzmann distribution, adversarial training, physical simulations, partial observability, protein modeling, data-driven modeling
Authors
Kai Nelson, Tobias Kreiman, Sergey Levine, Aditi S. Krishnapriyan
Abstract
A fundamental challenge in science and engineering is the simulation-to-experiment gap. While we often possess prior knowledge of physical laws, these physical laws can be too difficult to solve exactly for complex systems. Such systems are commonly modeled using simulators, which impose computational approximations. Meanwhile, experimental measurements more faithfully represent the real world, but experimental data typically consists of observations that only partially reflect the system's full underlying state. We propose a data-driven distribution alignment framework that bridges this simulation-to-experiment gap by pre-training a generative model on fully observed (but imperfect) simulation data, then aligning it with partial (but real) observations of experimental data. While our method is domain-agnostic, we ground our approach in the physical sciences by introducing Adversarial Distribution Alignment (ADA). This method aligns a generative model of atomic positions -- initially trained on a simulated Boltzmann distribution -- with the distribution of experimental observations. We prove that our method recovers the target observable distribution, even with multiple, potentially correlated observables. We also empirically validate our framework on synthetic, molecular, and experimental protein data, demonstrating that it can align generative models with diverse observables. Our code is available at https://kaityrusnelson.com/ada/.
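The two-stage idea in the abstract (pre-train on full simulation data, then align adversarially with partial observations) can be illustrated with a toy sketch. This is not the authors' ADA implementation: it assumes a two-dimensional Gaussian generator with a learnable mean, a hypothetical observable that measures only the first coordinate, and a plain logistic-regression discriminator trained with a standard GAN-style objective. All function names, data, and hyperparameters below are illustrative assumptions.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def observable(x):
    """Hypothetical partial observable: only the first coordinate is measured."""
    return x[:, 0]


def sample(mu, n, rng):
    """Toy generator: unit-variance Gaussian centred at the learnable mean mu."""
    return mu + rng.standard_normal((n, mu.shape[0]))


def pretrain(sim_data):
    """Stage 1: fit the toy generator to fully observed simulation data."""
    return sim_data.mean(axis=0)


def align(mu, exp_obs, rng, outer_steps=300, disc_steps=20,
          lr_d=0.5, lr_g=0.2, batch=512):
    """Stage 2: adversarially align the generator's observable with exp_obs."""
    mu = mu.copy()
    w, b = 0.0, 0.0  # logistic discriminator D(y) = sigmoid(w * y + b)
    for _ in range(outer_steps):
        fake = observable(sample(mu, batch, rng))
        real = rng.choice(exp_obs, size=batch)
        for _ in range(disc_steps):
            d_real = sigmoid(w * real + b)
            d_fake = sigmoid(w * fake + b)
            # Gradient ascent on mean(log D(real)) + mean(log(1 - D(fake))).
            w += lr_d * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
            b += lr_d * (np.mean(1 - d_real) - np.mean(d_fake))
        # Generator step: ascend mean(log D(h(x))) (non-saturating loss).
        # With x0 = mu0 + z, d/dmu0 log sigmoid(w * x0 + b) = (1 - D) * w.
        fake = observable(sample(mu, batch, rng))
        d_fake = sigmoid(w * fake + b)
        mu[0] += lr_g * np.mean((1 - d_fake) * w)
    return mu


rng = np.random.default_rng(0)
# "Simulation": full 2-D states, but the first coordinate's mean is off (assumed).
sim_data = rng.standard_normal((5000, 2)) + np.array([0.0, -1.0])
# "Experiment": only the first coordinate is observed, with true mean 1.5 (assumed).
exp_obs = 1.5 + rng.standard_normal(5000)

pretrained = pretrain(sim_data)
aligned = align(pretrained, exp_obs, rng)
print("pretrained mean:", pretrained)
print("aligned mean:   ", aligned)
```

In this toy setting only the observed coordinate receives a gradient from the discriminator, so the unobserved coordinate keeps the value learned from simulation while the observed one moves toward the experimental distribution. A realistic setting would replace the Gaussian with a generative model over atomic positions and the first-coordinate readout with experimentally measured observables.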