Scalable Physics-Inspired Transformers for Spin Glasses
2026-06-22 • Machine Learning
AI summary
The authors designed a new transformer model inspired by physics to better study complicated spin glass systems, which are important in physics and optimization. They improved how the model pays attention to key parts of the data and used faster sampling methods to speed up simulations on large systems significantly. Their approach allows detailed analysis of important quantities like probability distributions and free energies in models where previous machine-learning methods struggled. This work helps scientists simulate bigger and more complex spin glasses efficiently on a single GPU.
Boltzmann distribution, spin glass, transformer, sparse attention, positional embeddings, FlashAttention, Sherrington-Kirkpatrick model, Edwards-Anderson model, variational autoregressive networks, statistical mechanics
Authors
Lu Zhong, Wenli Duan, Jing Liu, Pan Zhang, Ying Tang
Abstract
Efficient sampling of the Boltzmann distribution in frustrated spin glasses is central to statistical mechanics and combinatorial optimization. Despite advances in machine-learning-based approaches, two issues persist: limited understanding of why variational models fail to benefit from increased scale, unlike the monotonic scaling law of large language models; and high computational cost on large systems that negates advantages over classical sampling methods. Here, we develop a physics-inspired transformer with interpretable sparse attention and spin-tailored positional embeddings to address these challenges. By further leveraging FlashAttention for parallel ancestral sampling, it achieves up to two orders of magnitude speedup over vanilla variational autoregressive networks, enabling neural-network simulations of spin-glass systems to unprecedented sizes on a single GPU. It can resolve full probability distributions, free energies, and overlap statistics across temperatures, for Sherrington-Kirkpatrick and 2D or 3D Edwards-Anderson models, where existing machine-learning methods encounter limitations at certain temperatures. This framework thus establishes a scalable paradigm for frustrated spin-glass systems.
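To make the terms "ancestral sampling" and "free energy" in the abstract concrete, here is a minimal NumPy sketch of a variational autoregressive model on a small Sherrington-Kirkpatrick instance. It is not the authors' transformer: the conditional model (a single linear layer with weights `w` and biases `b`), the coupling convention J_ij ~ N(0, 1/N), and all function names are assumptions chosen for illustration. The sketch draws spins one site at a time from q(s_i | s_<i) and estimates the variational free energy F_q = E_q[E(s) + T log q(s)], which upper-bounds the true free energy.

```python
import itertools
import numpy as np


def sk_couplings(n, rng):
    """Symmetric Gaussian couplings with variance 1/n and zero diagonal (assumed convention)."""
    a = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
    upper = np.triu(a, 1)
    return upper + upper.T


def energy(s, j):
    """E(s) = -sum_{i<j} J_ij s_i s_j for a batch of configurations of shape (batch, n)."""
    return -0.5 * np.einsum("bi,ij,bj->b", s, j, s)


def log_q(s, w, b):
    """log q(s) = sum_i log q(s_i | s_<i); w must be strictly lower triangular."""
    h = s @ w.T + b  # h_i = b_i + sum_{j<i} w_ij s_j
    # q(s_i = +1 | s_<i) = sigmoid(2 h_i), so log q(s_i | s_<i) = -log(1 + exp(-2 s_i h_i))
    return -np.logaddexp(0.0, -2.0 * s * h).sum(axis=1)


def ancestral_sample(w, b, batch, rng):
    """Draw spins site by site, each conditioned on the sites already sampled."""
    n = b.size
    s = np.zeros((batch, n))
    for i in range(n):
        h = s[:, :i] @ w[i, :i] + b[i]
        p_up = 1.0 / (1.0 + np.exp(-2.0 * h))
        s[:, i] = np.where(rng.random(batch) < p_up, 1.0, -1.0)
    return s


def variational_free_energy(s, j, w, b, temperature):
    """Monte Carlo estimate of F_q = E_q[E(s) + T log q(s)] from samples s ~ q."""
    return np.mean(energy(s, j) + temperature * log_q(s, w, b))


def exact_free_energy(j, temperature):
    """F = -T log Z by brute-force enumeration; feasible only for small n."""
    n = j.shape[0]
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    return -temperature * np.logaddexp.reduce(-energy(states, j) / temperature)
```

In a variational setup, `w` and `b` would be trained to lower `variational_free_energy`; the sequential loop in `ancestral_sample` is the step whose cost grows with system size, which is the bottleneck the abstract says is addressed with FlashAttention-based parallel sampling.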