Embarrassingly Simple Self-Distillation Improves Code Generation

2026-04-01 · Computation and Language

AI summary

The authors show that a large language model can get better at writing code by teaching itself using just its own generated outputs, without needing extra feedback or complex training methods. They do this by generating sample solutions and then fine-tuning the model on those samples, a process they call simple self-distillation (SSD). This method improves performance on coding problems, especially harder ones, and works across different model sizes and types. The authors explain that SSD helps the model focus more on accurate code parts while still exploring useful variations. Overall, SSD offers a straightforward way to enhance code generation after initial training.

Large Language Model · Code Generation · Self-Distillation · Fine-Tuning · Pass@1 · Token Distribution · Temperature Sampling · Model Scaling · Exploration vs Precision · LiveCodeBench
Authors
Ruixiang Zhang, Richard He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang
Abstract
Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems, and it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters while preserving useful diversity where exploration matters. Taken together, SSD offers a complementary post-training direction for improving LLM code generation.
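The sampling step the abstract mentions ("certain temperature and truncation configurations") combines temperature scaling with nucleus (top-p) truncation of the token distribution. The sketch below illustrates that decoding configuration on a toy logits vector; it is not the authors' code, and the parameter values shown are assumptions for illustration only.

```python
import math

def truncated_distribution(logits, temperature=1.0, top_p=0.9):
    """Illustrative temperature + nucleus (top-p) sampling distribution.

    Returns the renormalized probabilities of the kept tokens,
    keyed by their index in `logits`.
    """
    # Temperature scaling: divide logits by T before the softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Nucleus truncation: keep the smallest set of highest-probability
    # tokens whose cumulative mass reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Example: with these (assumed) logits, the lowest-probability token
# falls outside the nucleus and is truncated away.
dist = truncated_distribution([2.0, 1.0, 0.1, -1.0], temperature=1.0, top_p=0.9)
```

In SSD, solutions sampled under such a configuration are then used as ordinary supervised fine-tuning data; the paper's analysis attributes the gains to how this reshapes the resulting token distributions, sharpening them where precision matters while keeping diversity where exploration helps.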