Annotations Mitigate Post-Training Mode Collapse

2026-05-11

Computation and Language
AI summary

The authors show that fine-tuning models to follow instructions often reduces the semantic diversity of their outputs, and that this loss worsens as models scale. They introduce annotation-anchored training, a method that preserves the variety learned during pretraining by using semantic annotations as guides: annotations are sampled at inference time and used to anchor generation. This approach lets models maintain diversity while still following instructions well, and their results show much less diversity loss than standard fine-tuning approaches.

post-training, supervised fine-tuning, semantic diversity, mode collapse, pretraining distribution, annotation-anchored training, semantic annotations, instruction following, model scaling, diversity collapse
Authors
Jacob Mitchell Springer, Madhu Advani, Lukas Aichberger, Arwen Bradley, Eran Malach, Omid Saremi, Sinead Williamson, Preetum Nakkiran, Etai Littwin, Aditi Raghunathan
Abstract
Post-training (via supervised fine-tuning) improves instruction-following, but often induces semantic mode collapse by biasing models toward low-entropy fine-tuning data at the expense of the high-entropy pretraining distribution. Crucially, we find this trade-off worsens with scale. To close this semantic diversity gap, we propose annotation-anchored training, a principled method that enables models to adopt the preference-following behaviors of post-training without sacrificing the inherent diversity of pretraining. Our approach is simple: we pretrain on documents paired with semantic annotations, inducing a rich annotation distribution that reflects the full breadth of pretraining data, and we preserve this distribution during post-training. This lets us sample diverse annotations at inference time and use them as anchors to guide generation, effectively transferring pretraining's semantic richness into post-trained models. We find that models trained with annotation-anchored training can attain $6 \times$ less diversity collapse than models trained with SFT, and improve with scale.
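The abstract describes pairing pretraining documents with semantic annotations and then, at inference time, sampling an annotation to anchor generation. The sketch below illustrates that data-formatting and anchoring idea only; all names (`ANNOTATION_TAG`, `make_training_example`, `sample_anchor`, `build_prompt`) and the uniform draw from a fixed annotation pool are illustrative assumptions, not the paper's actual implementation, which samples from the model's learned annotation distribution.

```python
import random

# Hypothetical delimiter tokens separating the annotation from the document.
ANNOTATION_TAG = "<annotation>"
DOC_TAG = "<doc>"

def make_training_example(annotation: str, document: str) -> str:
    """Pair a document with a semantic annotation so the model learns the
    joint annotation/document distribution during pretraining (sketch)."""
    return f"{ANNOTATION_TAG} {annotation} {DOC_TAG} {document}"

def sample_anchor(annotation_pool: list[str], rng: random.Random) -> str:
    """At inference, sample an annotation; a uniform draw from a small pool
    stands in here for sampling from the learned annotation distribution."""
    return rng.choice(annotation_pool)

def build_prompt(instruction: str, anchor: str) -> str:
    """Condition generation on the instruction plus the sampled annotation,
    steering the output toward that semantic mode."""
    return f"{instruction}\n{ANNOTATION_TAG} {anchor} {DOC_TAG}"

if __name__ == "__main__":
    rng = random.Random(0)
    pool = ["formal news report", "first-person anecdote", "technical how-to"]
    # Each call can draw a different anchor, yielding semantically
    # diverse generations for the same instruction.
    prompt = build_prompt("Write about remote work.", sample_anchor(pool, rng))
    print(prompt)
```

Because the anchor is resampled per generation, repeated calls with the same instruction are steered toward different semantic modes, which is the mechanism the abstract credits for transferring pretraining diversity into the post-trained model.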