JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

2026-05-25 • Machine Learning

AI summary

The authors introduce JacQuant, a method for training low-bit models more reliably than the common Straight-Through Estimator (STE), which often makes training brittle near quantization bin boundaries. JacQuant learns an inexpensive diagonal or block-diagonal surrogate of the network's local sensitivity to parameter changes and uses it to stabilize and accelerate training. The authors report that, on large language models quantized to 2 bits or fewer, JacQuant reaches higher accuracy than STE-based training at negligible extra computational cost. The method works with existing weight and activation quantizers without changing them, and comes with convergence guarantees for code-preserving training phases.

Quantization-aware training, Straight-Through Estimator, Low-bit quantization, Large language models, Gradient estimation, Sensitivity analysis, Variance-reduced optimization, Non-convex optimization, Convergence guarantees

Authors

Kai Yi, Vignesh Vivekraja, Harshit Khaitan, Steven Li

Abstract

Quantization-aware training (QAT) is widely deployed but typically relies on the Straight-Through Estimator (STE), which passes gradients through non-differentiable quantizers by fiat. This often makes training brittle near bin boundaries and weakly aligned with the actual behavior of the low-precision model. We introduce JacQuant, a QAT framework that learns a lightweight surrogate of the model's local sensitivity to parameter changes and uses it to stabilize and accelerate training within standard variance-reduced optimizers. The surrogate is inexpensive (diagonal or block-diagonal), data-driven, and compatible with common weight and activation quantizers. For code-preserving training phases, we prove convergence for non-convex objectives and obtain linear rates under a PL condition, and we relate the learned sensitivity to end-to-end output fidelity via a simple calibration argument. Across LLM benchmarks at $\leq 2$ bits, JacQuant consistently reaches higher accuracy than STE-based QAT, and runtime analyses on several models show that the added cost remains negligible under practical group sizes. The method is drop-in and requires no changes to the forward quantizers; our empirical claims are scoped to ultra-low-bit LLM QAT.
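
For readers unfamiliar with the STE baseline that the abstract contrasts against, the following is a minimal Python sketch of a 2-bit uniform quantizer, its true (finite-difference) gradient, and the gradient an STE would report instead. It illustrates only the baseline, not JacQuant's learned surrogate, whose details are not given on this page. The function names, the step size `scale=0.5`, and the symmetric signed code range are assumptions chosen for illustration.

```python
def code_range(bits):
    """Signed integer code range, e.g. [-2, 1] for 2 bits (assumed layout)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def quantize(w, scale=0.5, bits=2):
    """Uniform quantizer: round w/scale to an integer code, clamp, rescale."""
    qmin, qmax = code_range(bits)
    code = max(qmin, min(qmax, round(w / scale)))
    return code * scale


def true_grad(w, scale=0.5, bits=2, eps=1e-6):
    """Central finite difference of the quantizer.

    The quantizer is piecewise constant, so this is zero everywhere
    except within eps of a bin boundary.
    """
    hi = quantize(w + eps, scale, bits)
    lo = quantize(w - eps, scale, bits)
    return (hi - lo) / (2 * eps)


def ste_grad(w, scale=0.5, bits=2):
    """STE backward rule: pretend the quantizer is the identity inside the
    representable range and constant outside it."""
    qmin, qmax = code_range(bits)
    return 1.0 if qmin * scale <= w <= qmax * scale else 0.0


if __name__ == "__main__":
    # 0.24 and 0.26 sit on opposite sides of the bin boundary at 0.25.
    for w in (0.24, 0.26, 0.30, 0.90):
        print(w, quantize(w), true_grad(w), ste_grad(w))
```

In this sketch, moving a weight from 0.24 to 0.26 changes the quantized value by a full bin, while the STE reports the same unit gradient on both sides of the boundary and the true gradient is zero on both sides. This gap between the reported gradient and the quantized model's actual behavior is the brittleness near bin boundaries that the abstract refers to.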
