HASTE: A Framework for Training-Free, Dynamic, and Steerable Compression of Pre-Trained Convolutional Neural Networks

2026-06-29 · Computer Vision and Pattern Recognition

AI summary

The authors propose HASTE, a plug-and-play convolution module that makes large pre-trained CNNs cheaper to run on resource-constrained devices without any additional training or data. At inference time, it uses locality-sensitive hashing to find similar channels in the model's feature maps and merges them, which reduces the computation in each convolution. In experiments on CIFAR-10 and ImageNet, it cut the FLOPs of a ResNet34 on CIFAR-10 by 46.2% with only a 1.25% drop in accuracy. The approach compresses models on the fly, which can help when deploying CNNs on smaller devices.

Convolutional Neural Networks, Dynamic Execution, Locality-Sensitive Hashing, Model Compression, Feature Maps, Inference, FLOPs, Channel Merging, ResNet34, Token Merging
Authors
Lukas Meiner, Jens Mehnert, Alexandru Paul Condurache
Abstract
Deploying large convolutional neural networks (CNNs) on resource-constrained devices is challenging due to their high computational cost. While dynamic execution methods are promising, existing approaches for CNNs typically require specialized training or fine-tuning, limiting their effectiveness when applied to pre-trained models and requiring data access. To address this gap, we propose HASTE (Hashing for Tractable Efficiency), a plug-and-play convolution module that enables training-free, dynamic compression of large pre-trained CNNs. At inference time, HASTE uses locality-sensitive hashing to identify and merge redundant channels of latent feature maps on a patch-wise basis. This process simultaneously compresses the depth of both input features and their corresponding filters, resulting in computationally cheaper convolutions. We conduct extensive experiments on CIFAR-10 and ImageNet across a range of architectures, demonstrating a 46.2% FLOPs reduction in a ResNet34 on CIFAR-10 with only a 1.25% drop in accuracy, without any retraining. We support our claims with comprehensive ablation studies that validate our core design choices, an analysis of the method's properties and limitations, and a discussion that connects our channel merging scheme to the conceptually related task of token merging in Vision Transformers. Our results demonstrate that HASTE provides an effective solution for steerable compression of pre-trained CNNs at runtime, opening new possibilities for the deployment of efficient deep learning methods.
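
The patch-wise channel merging described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes random-hyperplane (sign) hashing as the locality-sensitive hash, and it assumes that channels sharing a hash code are averaged while their filter slices are summed. The function names `lsh_codes`, `merge_patch`, and `conv_at_patch`, and the parameters `num_hyperplanes` and `seed`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def lsh_codes(vectors, num_hyperplanes, rng):
    """Hash each row with random-hyperplane (sign) LSH; one integer code per row."""
    planes = rng.standard_normal((vectors.shape[1], num_hyperplanes))
    bits = (vectors @ planes) > 0  # (C, L) sign pattern per channel
    powers = 1 << np.arange(num_hyperplanes, dtype=np.int64)
    return bits.astype(np.int64) @ powers


def merge_patch(patch, filters, num_hyperplanes=8, seed=0):
    """Merge channels of one input patch that share a hash code.

    patch:   (C, k, k) input patch
    filters: (M, C, k, k) convolution filters
    Returns a merged patch (C', k, k) and merged filters (M, C', k, k), C' <= C.
    Assumption (not confirmed by the source): inputs are averaged per group,
    and the corresponding filter slices are summed per group.
    """
    rng = np.random.default_rng(seed)
    C, k, _ = patch.shape
    M = filters.shape[0]
    flat = patch.reshape(C, -1)

    codes = lsh_codes(flat, num_hyperplanes, rng)
    _, group = np.unique(codes, return_inverse=True)
    group = group.reshape(-1)
    n_groups = int(group.max()) + 1

    # One-hot assignment matrix: assign[g, c] = 1 if channel c is in group g.
    assign = np.zeros((n_groups, C))
    assign[group, np.arange(C)] = 1.0

    merged_patch = (assign @ flat) / assign.sum(axis=1, keepdims=True)
    merged_filters = np.einsum("gc,mcd->mgd", assign, filters.reshape(M, C, -1))
    return (merged_patch.reshape(n_groups, k, k),
            merged_filters.reshape(M, n_groups, k, k))


def conv_at_patch(patch, filters):
    """Convolution output at a single spatial location: one value per filter."""
    return np.einsum("ckl,mckl->m", patch, filters)
```

Under these assumptions, the merged convolution is exact when the merged channels are identical, since summing the filter slices of a group and applying them to the shared channel gives the same result as applying each slice separately. It is an approximation when the channels are merely similar. The multiply-accumulate count for the patch drops from M·C·k² to M·C'·k², not counting the cost of hashing and merging itself.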