Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin

2026-06-08 · Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors explain why post-training quantization (PTQ), a fast way to compress a trained neural network, can fail sharply at very low bitwidths, while the more expensive quantization-aware training (QAT) can often recover the lost accuracy. They use a geometric picture in which full-precision training follows a low-loss river inside a wider valley: a neighborhood of the river forms a nearly flat basin, and the loss rises sharply outside it. When the quantization grid is about as coarse as the basin is wide, PTQ can select a quantized point outside the basin, causing large errors even though low-loss quantized points exist nearby. QAT steers the model back into the basin because its gradients are evaluated at the quantized weights and therefore sense the valley wall. The authors prove this recovery under local assumptions and corroborate it with experiments.

post-training quantization, quantization-aware training, neural network quantization, loss landscape, gradient descent, straight-through estimator, Hessian, low-bitwidth quantization, model compression
Authors
Hanyang Li, Jianhao Ma, Ying Cui
Abstract
Post-training quantization (PTQ) converts a trained full-precision model into low-bit weights without task-level retraining, while quantization-aware training (QAT) incorporates quantization into the training loop. Although PTQ is efficient and often accurate at moderate bitwidths, it can fail sharply at aggressive bitwidths; QAT is more expensive but can often recover the lost accuracy. We propose a unified geometric framework that explains both PTQ failure and QAT recovery. We model full-precision training as following a low-loss "river" inside a wider "valley": a normal neighborhood of the river forms a nearly flat "basin", while leaving this basin incurs a sharp loss increase. When the quantization grid is comparable to the basin width, local PTQ objectives, including rounding and Hessian-based second-order reconstruction, can select a high-loss deployed quantized point outside the basin even when nearby low-loss quantized points exist. In this regime, straight-through-estimator-based QAT has a useful bias: it evaluates gradients at the deployed quantized weights while updating latent full-precision weights, causing the gradient to sense the valley wall and acquire an inward component that steers subsequent quantized iterates back into the basin. We formalize this mechanism through a local landscape model, construct a geometric PTQ failure mode, and prove finite-time QAT recovery under local quantizer-compatibility assumptions. Experiments across vision and language models under multiple neural-network quantization schemes corroborate the predicted basin-crossing failure of PTQ and the corresponding recovery mechanism of QAT.
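
The mechanism described in the abstract can be illustrated with a toy one-dimensional cross-section taken normal to the river. This is a minimal sketch, not the authors' landscape model or experimental setup: the basin edges `LO` and `HI`, the wall stiffness `K`, the grid spacing `DELTA`, the learning rate, and all function names are assumptions chosen only so that the basin width (0.65) is comparable to the grid spacing (1.0).

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
DELTA = 1.0            # spacing of the uniform quantization grid
LO, HI = 0.95, 1.60    # edges of the flat basin along the cross-section
K = 10.0               # stiffness of the quadratic walls outside the basin


def quantize(w):
    """Round to the nearest point of the grid {..., 0, DELTA, 2*DELTA, ...}."""
    return DELTA * math.floor(w / DELTA + 0.5)


def loss(w):
    """Zero inside the basin [LO, HI], quadratic wall outside it."""
    d = max(LO - w, 0.0, w - HI)   # distance by which w lies outside the basin
    return 0.5 * K * d * d


def grad(w):
    """Derivative of loss(): zero in the basin, nonzero on a wall."""
    if w > HI:
        return K * (w - HI)
    if w < LO:
        return K * (w - LO)
    return 0.0


def ste_qat(w_latent, lr=0.01, steps=20):
    """STE-style loop: gradient evaluated at quantize(w), applied to latent w."""
    history = []
    for _ in range(steps):
        w_q = quantize(w_latent)                 # deployed quantized weight
        history.append((w_latent, w_q, loss(w_q)))
        # Straight-through estimator: treat d(w_q)/d(w_latent) as 1.
        w_latent -= lr * grad(w_q)
    return w_latent, history


w_fp = 1.55                        # full-precision weight, inside the basin
w_ptq = quantize(w_fp)             # rounding selects 2.0, outside the basin
w_final, history = ste_qat(w_fp)   # latent weight after the STE loop
w_qat = quantize(w_final)          # deployed weight after the STE loop

print("PTQ:", w_ptq, loss(w_ptq))
print("QAT:", w_qat, loss(w_qat))
```

In this sketch, rounding maps the full-precision weight 1.55 to the grid point 2.0, which lies on the wall, even though the grid point 1.0 lies inside the basin. The gradient at the full-precision weight is zero, so it gives no indication of the nearby wall. The STE loop instead evaluates the gradient at 2.0, where the wall is felt, and pushes the latent weight inward until it rounds to 1.0. After that the gradient at the quantized weight vanishes and the iterate stays put.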