When Do Local Score Models Extrapolate Across Size? A Diagnostic Theory and Benchmark
2026-06-08 • Machine Learning
AI summary
The authors study size transfer: training a generative model on small systems and evaluating it on larger ones. They find that building locality into the architecture is not enough for stable extrapolation. What matters is whether the Gaussian-smoothed score is itself quasi-local, which depends on how strongly the underlying distribution mixes spatially. They prove a size-uniform comparison theorem for local marginals under reverse diffusion and introduce a white-box benchmark, Finite-Depth Local Flow (FDLF), to test the mechanism. In their experiments, models extrapolate reliably to larger systems under spatial mixing, and size transfer fails when mixing weakens.
generative modeling, size transfer, architectural locality, quasi-locality, Gaussian smoothing, Tweedie's formula, reverse diffusion, spatial mixing, receptive field, score function
Authors
Wenjie Xi
Abstract
Scientific generative modeling often requires size transfer, where models trained on small systems are evaluated on larger ones. While translation-invariant architectures enable this evaluation, we show that architectural locality alone does not guarantee stable size extrapolation. Instead, stable extrapolation is governed by the quasi-locality of the Gaussian-smoothed score. Through Tweedie's formula, far-away perturbations can influence local score components via posterior covariance, meaning a local model succeeds only if its receptive field covers the smoothed score's response range. We formalize this mechanism, proving a size-uniform comparison theorem for local marginals under reverse diffusion. We also introduce Finite-Depth Local Flow (FDLF), a white-box diagnostic benchmark with exact scores, densities, and controllable response ranges. Empirically, we validate the interplay between spatial mixing, smoothed-score quasi-locality, and model receptive fields. Under spatial mixing, the smoothed score remains quasi-local relative to the receptive field, enabling stable extrapolation. Conversely, when spatial mixing weakens, the score's locality rapidly degrades, causing size transfer to fail.
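The Tweedie mechanism in the abstract can be illustrated on a toy model. For x = x0 + sigma * eps, the second-order form of Tweedie's formula gives the Jacobian of the smoothed score as Cov[x0 | x] / sigma^4 - I / sigma^2, so a perturbation at site j moves score component i in proportion to the posterior covariance between the two sites. The sketch below is not FDLF and not the authors' code. It assumes a Gaussian chain with covariance rho^|i-j|, chosen only because its smoothed score and posterior covariance have closed forms, and every function name is hypothetical. It checks the identity numerically and measures how far the score's response extends from the center site.

```python
# Toy illustration (assumption: Gaussian chain, not the paper's FDLF benchmark).
# Prior: x0 ~ N(0, Sigma), Sigma[i][j] = rho**|i-j|. Noisy: x = x0 + sigma * eps.

def ar1_cov(n, rho):
    """Covariance of a chain whose correlations decay as rho**distance."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def analyze(n, rho, sigma):
    """Return (score Jacobian, posterior covariance) for the toy chain."""
    cov = ar1_cov(n, rho)
    s2 = sigma ** 2
    smoothed = [[cov[i][j] + (s2 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    inv = inverse(smoothed)
    # Smoothed density is N(0, Sigma + sigma^2 I), so score(x) = -inv @ x.
    jac = [[-v for v in row] for row in inv]
    # Gaussian conditioning: Cov[x0 | x] = Sigma - Sigma inv Sigma.
    csc = matmul(matmul(cov, inv), cov)
    post = [[cov[i][j] - csc[i][j] for j in range(n)] for i in range(n)]
    return jac, post


def tweedie_gap(jac, post, sigma):
    """Max deviation from: Jacobian = Cov[x0|x] / sigma^4 - I / sigma^2."""
    n = len(jac)
    s2 = sigma ** 2
    return max(
        abs(jac[i][j] - (post[i][j] / s2 ** 2 - (1.0 / s2 if i == j else 0.0)))
        for i in range(n) for j in range(n)
    )


def response_range(jac, tol):
    """Smallest radius r around the center site such that Jacobian entries
    farther than r carry at most a fraction tol of the row's absolute mass."""
    n = len(jac)
    i = n // 2
    total = sum(abs(v) for v in jac[i])
    for r in range(n):
        tail = sum(abs(jac[i][j]) for j in range(n) if abs(i - j) > r)
        if tail <= tol * total:
            return r
    return n - 1


if __name__ == "__main__":
    for rho in (0.3, 0.95):  # small rho: fast decay of correlations
        jac, post = analyze(41, rho, 1.0)
        print(rho, tweedie_gap(jac, post, 1.0), response_range(jac, 1e-2))
```

In this toy chain, a larger rho (slower decay of correlations) yields a longer response range, so a model with a fixed receptive field would cover the score's dependence at rho = 0.3 but could miss part of it at rho = 0.95. The Gaussian case only illustrates the direction of the effect; it says nothing about the paper's theorem or its benchmark results.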