SVD-Surgeon: Optimal Singular-Value Surgery for Large Language Model Compression

2026-06-22 • Machine Learning

Machine Learning, Computation and Language

AI summary

The authors present SVD-Surgeon, a way to make large language models smaller and faster without retraining. It applies Optimal Brain Surgeon (OBS), a classic second-order pruning technique, to the singular values produced by singular value decomposition (SVD). OBS decides which singular values to remove and adjusts the remaining ones to compensate. This helps keep the model accurate after compression, and the method can be layered on top of existing SVD compression methods. On OPT models and LLaMA 2-7B, it gave a better trade-off between compression and perplexity.

large language models, singular value decomposition, model compression, Optimal Brain Surgeon, singular values, model pruning, perplexity, SVD-LLM, OPT model, LLaMA

Authors

Mahmoud Safari, Frank Hutter

Abstract

Large language models (LLMs) achieve remarkable performance across a wide range of tasks, but their deployment is constrained by substantial memory and compute requirements. Low-rank compression via singular value decomposition (SVD) is an effective remedy, but existing methods focus on how to factorize and which components to keep. We introduce SVD-Surgeon, a training-free method that brings the Optimal Brain Surgeon (OBS) framework to the singular-value basis. Treating each singular value as a parameter, it computes a closed-form update of the retained singular values that compensates, to second order in the model loss, for those removed by truncation. The same analysis yields a saliency for choosing which values to prune. As it operates directly on the singular-value factorization, SVD-Surgeon can be layered on top of existing SVD compressors. Applied to SVD-LLM, a leading SVD-based method, it improves the perplexity-compression trade-off on the OPT family and LLaMA 2-7B without any retraining.
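The abstract does not give the update formulas or say how the curvature is estimated. The following is a minimal sketch of the standard OBS closed-form update and saliency applied to a vector of singular values s. It assumes a given symmetric positive-definite matrix H (the curvature of the loss with respect to s) and a zero gradient at the current point. For a removed index q, the saliency is s_q^2 / (2 [H^-1]_qq) and the compensating update is delta_s = -(s_q / [H^-1]_qq) H^-1 e_q. The function name `obs_truncate_singular_values`, the greedy one-at-a-time selection, and the damping term `damp` are hypothetical illustrative choices, not the paper's implementation.

```python
import numpy as np


def obs_truncate_singular_values(s, H, k, damp=1e-8):
    """Greedily zero out k singular values with OBS-style compensation (illustrative sketch).

    s    : (r,) singular values, treated as the free parameters.
    H    : (r, r) symmetric positive-definite curvature of the loss w.r.t. s
           (assumed to be given; how to estimate it is outside this sketch).
    k    : number of singular values to remove.
    damp : small diagonal damping added before inversion (hypothetical choice).

    Returns the compensated singular values (pruned entries exactly 0) and the
    total loss increase predicted by the quadratic (second-order) model.
    """
    s = np.asarray(s, dtype=float).copy()
    H = np.asarray(H, dtype=float)
    alive = list(range(s.size))  # indices of singular values still kept
    total_increase = 0.0
    for _ in range(k):
        idx = np.array(alive)
        # Inverse curvature restricted to the surviving singular values.
        H_inv = np.linalg.inv(H[np.ix_(idx, idx)] + damp * np.eye(idx.size))
        d = np.diag(H_inv)
        # OBS saliency: loss increase if s_q is zeroed and the rest re-optimized.
        saliency = s[idx] ** 2 / (2.0 * d)
        q = int(np.argmin(saliency))
        # Closed-form compensating update of every surviving singular value.
        s[idx] -= (s[idx[q]] / d[q]) * H_inv[:, q]
        s[idx[q]] = 0.0  # analytically zero already; this removes round-off
        total_increase += saliency[q]
        alive.remove(idx[q])
    return s, total_increase


# With identity curvature there is no coupling, so this reduces to plain truncation.
s_new, inc = obs_truncate_singular_values(np.array([3.0, 1.0, 2.0]), np.eye(3), 1)
print(s_new, inc)
```

With a diagonal H the update leaves the other values unchanged and the saliency ranks values by h_i * s_i^2, so compensation only appears when H couples different singular values. Under the quadratic model, removing values one at a time from the reduced Hessian ends at the same point as removing the final set jointly. Re-inverting H at every step costs O(r^3) per step, which keeps the sketch readable but would be replaced by an inverse downdate in practice. The compressed weight would then be rebuilt from the kept singular vectors and the updated values.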
