Shift-and-Sum Quantization for Visual Autoregressive Models

2026-06-15 · Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Machine Learning
AI summary

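The authors study how to make visual autoregressive (VAR) models, which generate images step by step, run more efficiently using post-training quantization (PTQ), a technique that compresses a trained network using only a small set of calibration data. They identify two main problems: large reconstruction errors in quantized attention-value products, especially at coarse scales where high attention scores are more frequent, and a mismatch between how often codebook entries are sampled in the limited calibration data and their predicted probabilities. To address these, the authors propose a shift-and-sum quantization method that aggregates quantized results from symmetrically shifted duplicates of value tokens, and a resampling strategy that aligns the sampling frequencies of codebook entries in the calibration data with their predicted probabilities. Their approach improves results on class-conditional image generation, inpainting, outpainting, and class-conditional editing across different VAR architectures.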

post-training quantization, visual autoregressive models, attention mechanism, quantization errors, codebook entries, calibration data, image generation, sampling strategy
Authors
Jaehyeon Moon, Bumsub Ham
Abstract
Post-training quantization (PTQ) enables efficient deployment of deep networks using a small set of data. Its application to visual autoregressive models (VAR), however, remains relatively unexplored. We identify two key challenges for applying PTQ to VAR: (i) large reconstruction errors in attention-value products, especially at coarse scales where high attention scores occur more frequently; and (ii) a discrepancy between the sampling frequencies of codebook entries and their predicted probabilities due to limited calibration data. To address these challenges, we propose a PTQ framework tailored for VAR. First, we introduce a shift-and-sum quantization method that reduces reconstruction errors by aggregating quantized results from symmetrically shifted duplicates of value tokens. Second, we present a resampling strategy for calibration data that aligns sampling frequencies of codebook entries with their predicted probabilities. Experiments on class-conditional image generation, inpainting, outpainting, and class-conditional editing show consistent improvements across VAR architectures, establishing a new state of the art in PTQ for VAR.
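
The abstract gives no formulas for shift-and-sum quantization, so the following is only a minimal illustrative sketch of the general idea (quantize two symmetrically shifted copies of the value tokens and aggregate the results), not the authors' algorithm. The uniform min-max quantizer, the quarter-step shift, the averaging rule, and all names (`uniform_quantize`, `shift_and_sum`, `bits`) are assumptions made for this example.

```python
import numpy as np

def uniform_quantize(x, lo, step, levels):
    """Round x onto the uniform grid lo + k * step, with k in [0, levels - 1]."""
    k = np.clip(np.round((x - lo) / step), 0, levels - 1)
    return lo + k * step

def shift_and_sum(v, bits=4):
    """Quantize two symmetrically shifted copies of v and average them.

    Assumption: a per-tensor min-max grid and a shift of a quarter step.
    With this choice the averaged result lies on a half-step grid, so the
    worst-case rounding error drops from step / 2 to step / 4.
    """
    levels = 2 ** bits
    lo = v.min()
    step = (v.max() - lo) / (levels - 1)
    shift = step / 4  # hypothetical shift size, chosen for illustration
    q_plus = uniform_quantize(v + shift, lo, step, levels)
    q_minus = uniform_quantize(v - shift, lo, step, levels)
    v_plain = uniform_quantize(v, lo, step, levels)  # baseline, no shift
    return 0.5 * (q_plus + q_minus), v_plain, step

# Toy attention-value product: 4 queries attending over 16 value tokens.
rng = np.random.default_rng(0)
v = rng.normal(size=(16, 8))
logits = 3.0 * rng.normal(size=(4, 16))
attn = np.exp(logits - logits.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)  # softmax rows sum to 1

v_ss, v_plain, step = shift_and_sum(v, bits=4)
err_plain = np.abs(attn @ v - attn @ v_plain).max()
err_ss = np.abs(attn @ v - attn @ v_ss).max()
print(f"max product error, plain quantization: {err_plain:.4f}")
print(f"max product error, shift-and-sum:      {err_ss:.4f}")
```

Because each attention row is a convex combination, the error of the attention-value product is bounded by the largest elementwise error in the quantized values, which is why tightening the value-token error also tightens the product error in this toy setting. In a real low-bit pipeline the two shifted copies would be stored and multiplied as integers, and the two products summed; that detail is omitted here.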