Towards Physical Intuitions for Alignment Dynamics: A Case Study With Randomness Crystallization

2026-06-29

Computation and Language
AI summary

The authors argue that the way language models change during post-training can be better understood using ideas from physics, specifically thermodynamic phase transitions. They compare post-training to crystallization: the pretrained model can be prompted into many distinct sampling distributions (like a liquid), supervised finetuning collapses behavior onto a single distribution already present in the pretrained model (like nucleation around a seed), and reinforcement learning then redistributes probability within that collapsed distribution while largely keeping it on the same options. They propose metrics for verifying the transitions between these stages and validate the idea across a range of randomness tasks, such as random number generation. The framing is meant to help explain where alignment-induced structure comes from, why it converges where it does, and what alignment methods cannot change.

language models, alignment, thermodynamic phase transition, crystallization, entropy, supervised finetuning, reinforcement learning, sampling distributions, model dynamics, probability redistribution
Authors
Kunal Samanta, Ari Holtzman, Peter West
Abstract
The alignment of language models is typically studied through the lens of capability benchmarks, but the dynamics of how models change during post-training remain poorly understood. We argue that the physical sciences, and thermodynamic phase-transition theory in particular, offer a principled and underexplored vocabulary for reasoning about these dynamics. As a case study, we instantiate this position through the lens of material crystallization, a well-studied thermodynamic phase transition. For tasks like random number generation, this breaks into three phases: (1) the high-entropy liquid phase in the pretrained model, in which many distinct sampling distributions can be elicited by prompting; (2) the nucleation phase caused by supervised finetuning, in which behavior collapses onto a single seed distribution present in the pretrained LLM; and (3) a settling phase in which reinforcement learning techniques redistribute probability within the collapsed distribution but largely keep it concentrated on the same options as the seed distribution. We propose intuitive metrics to verify the transitions between these phases, and validate the idea across a range of random tasks. Crystallization is one instance of a broader class of physical frameworks we believe alignment research should import to answer questions about where alignment-induced structure comes from, why it converges where it does, and what it fundamentally cannot change.
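
The abstract does not spell out the proposed metrics, so the following is only an illustrative sketch of how the three phases might be told apart on a random number generation task. It assumes, hypothetically, that samples from each model stage are available as plain Python lists. The function names and the choice of Shannon entropy, Jaccard support overlap, and total variation distance are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def empirical_distribution(samples):
    """Turn a list of sampled outputs into a dict of relative frequencies."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {outcome: count / total for outcome, count in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits; a high value suggests the 'liquid' phase."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def support_overlap(p, q):
    """Jaccard overlap of the outcomes that receive nonzero probability.

    A value near 1.0 means both distributions sit on the same options.
    """
    support_p = {k for k, v in p.items() if v > 0}
    support_q = {k for k, v in q.items() if v > 0}
    if not support_p and not support_q:
        return 1.0
    return len(support_p & support_q) / len(support_p | support_q)


def total_variation(p, q):
    """Total variation distance: how much probability mass was moved."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in outcomes)


if __name__ == "__main__":
    # Hypothetical toy samples for "pick a number from 1 to 10".
    pretrained = empirical_distribution(list(range(1, 11)))
    after_sft = empirical_distribution([7, 7, 7, 7, 3, 3, 7, 7])
    after_rl = empirical_distribution([7, 7, 3, 3, 7, 3, 3, 7])

    print("entropy (pretrained):", shannon_entropy(pretrained))
    print("entropy (after SFT): ", shannon_entropy(after_sft))
    print("support overlap SFT vs RL:", support_overlap(after_sft, after_rl))
    print("total variation SFT vs RL:", total_variation(after_sft, after_rl))
```

Under this reading, a sharp entropy drop between the pretrained and finetuned stages would correspond to nucleation. A support overlap near 1.0 together with a nonzero total variation distance between the finetuned and reinforcement-learned stages would correspond to settling: probability is redistributed, but over the same options.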