MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models

2026-06-03 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence

AI summary

The authors observe that standard post-training quantization methods struggle to compress omni-modal large language models to 4-bit precision, because different modalities produce very different value distributions, including distinct outlier patterns. They propose MorphoQuant, which absorbs long-tailed outliers into channel-wise biases and co-optimizes the quantization grid with the bias mask so that the quantization settings match each modality's characteristics. On Qwen2.5-Omni, their W4A4 model reaches 76.63% on ScienceQA, outperforming existing W4A4 methods and surpassing the W4A16 baseline.

Post-Training Quantization, 4-bit Quantization, Large Language Models, Multi-modal Models, Outlier Detection, Quantization Grid, Bias Compensation, Distribution Heterogeneity, Cross-modal Morphology, ScienceQA Benchmark

Authors

Yue Wu, Changyuan Wang, Zixuan Wang, Shilin Ma, Yansong Tang

Abstract

Conventional Post-Training Quantization (PTQ) methods struggle with 4-bit Omni-modal Large Language Models (OLLMs) due to the extreme distribution heterogeneity and disparate outlier patterns across modalities. To address this, we propose MorphoQuant, a modality-aware PTQ framework engineered to preserve cross-modal morphology and mitigate outlier loss. Specifically, we introduce Distribution-Aware Bias Compensation (DABC), which selectively absorbs long-tailed outliers into channel-wise biases. This mechanism safeguards outlier magnitudes while maintaining high-precision discretization for dense inliers across diverse modal distributions. Complementing this, we propose Morphology-Directed Quantization Function Optimization (MDQFO) to co-optimize the quantization grid with the bias mask, ensuring fine-grained alignment across modalities. Extensive evaluations on Qwen2.5-Omni across benchmarks such as MMMU and Video-MME demonstrate our approach's superiority. Notably, our W4A4 model achieves 76.63% on ScienceQA, significantly outperforming SOTA W4A4 methods and surpassing the W4A16 baseline, which demonstrates the strong accuracy-efficiency trade-off of our framework.
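
The general idea behind absorbing outliers into channel-wise biases can be illustrated with a small NumPy sketch. It is not the paper's DABC algorithm: the selection rule in `split_outlier_bias`, the `threshold` parameter, the per-tensor symmetric quantizer, and the synthetic activations are all assumptions made for illustration. The sketch only shows why removing a large per-channel offset before 4-bit quantization leaves a finer grid for the dense inliers.

```python
import numpy as np


def quantize_uniform(x, bits=4):
    """Symmetric per-tensor uniform quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    scale = np.abs(x).max() / qmax
    if scale == 0:
        return np.zeros_like(x)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale


def split_outlier_bias(x, threshold=3.0):
    """Hypothetical selection rule (an assumption, not taken from the paper).

    x has shape (tokens, channels). A channel is masked when its mean is
    larger in magnitude than `threshold` times the median per-channel
    standard deviation; masked channels have their mean moved into `bias`.
    """
    channel_mean = x.mean(axis=0)
    typical_std = np.median(x.std(axis=0))
    mask = np.abs(channel_mean) > threshold * typical_std
    bias = np.where(mask, channel_mean, 0.0)
    return x - bias, bias, mask


def quantize_with_bias(x, bits=4, threshold=3.0):
    """Quantize the residual only; the bias stays in full precision."""
    residual, bias, mask = split_outlier_bias(x, threshold)
    return quantize_uniform(residual, bits) + bias, mask


# Synthetic activations: 16 channels of unit-variance noise, one shifted far out.
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 16))
acts[:, 3] += 40.0

plain = quantize_uniform(acts)
compensated, mask = quantize_with_bias(acts)
print("masked channels:", np.flatnonzero(mask))
print("MSE plain:      ", np.mean((acts - plain) ** 2))
print("MSE compensated:", np.mean((acts - compensated) ** 2))
```

In this toy setting the single shifted channel dictates the quantization scale for the whole tensor, so the inliers in the other channels collapse onto very few grid points. After the offset is removed, the same 4-bit grid covers a much narrower range. For a linear layer, the identity (x - b)W + bW = xW means the removed offset can be folded into the layer's bias term rather than being quantized.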
