Learned JPEG Compression for DNN Vision
2026-06-15 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition
AI summaryⓘ
The authors developed a method called J4D to improve JPEG image compression specifically for use by deep neural networks rather than humans. They created a way to adjust JPEG settings by making the compression process mathematically smooth and measurable, allowing optimization through training. This approach helps keep file sizes small while making sure the neural networks can still understand the images well. Their tests show J4D works better than standard JPEG compression for various networks and datasets. They also demonstrate the possibility of finding JPEG settings that work well across different neural network types.
JPEGlossy compressiondeep neural networksdifferentiable quantizationentropycompression ratebackpropagationprobabilistic quantizationimage encodingoptimization
Authors
Kaixiang Zheng, Ahmed H. Salamah, Siyu Chen, En-Hui Yang
Abstract
JPEG, a lossy image compression technique designed for human viewers, has maintained its dominance for decades. However, in the era of artificial intelligence (AI), a substantial portion of image data, often compressed by JPEG, is and will continue to be consumed by deep neural networks (DNNs) instead of humans, thus creating a need to optimize JPEG for DNN inference performance. To this end, we propose learned JPEG compression for DNN vision (J4D), a novel training framework for determining JPEG encoding parameters to minimize compression rate while maximizing DNN inference performance. The major challenge of solving this optimization problem lies in representing the JPEG codec and compression rate in closed form. By incorporating a differentiable soft quantizer based on a probabilistic quantization scheme, we not only obtain a differentiable proxy for the JPEG codec, but are also able to compute the entropy of the coded source analytically, which is a close estimate of the actual compression rate. Equipped with both the differentiable JPEG codec and the information-theoretic rate estimator, we are then able to solve the aforementioned optimization problem with backpropagation. After training, the learned encoding parameters will be subsequently used in actual JPEG encoding based on probabilistic quantization. Extensive experimental results across multiple datasets and DNN architectures demonstrate that J4D consistently and significantly outperforms the default JPEG and other competitive JPEG codecs optimized for DNNs. Notably, compared to the default JPEG, J4D achieves an increase in accuracy by as much as 11.60% at the same rate, or a reduction of compression rate up to 80.05% at the same accuracy. Additionally, with the help of J4D, we show the potential to design universal JPEG encoding parameters for various DNN architectures for the first time.