OUIDecay: Adaptive Layer-wise Weight Decay for CNNs Using Online Activation Patterns

2026-05-11 · Machine Learning

AI summary

The authors propose OUIDecay, a weight decay scheduler for convolutional neural networks that sets a separate decay value for each layer and updates it over time. Instead of applying the same weight decay everywhere, the method monitors each layer during training with the Overfitting-Underfitting Indicator (OUI), a metric computed from activation patterns, and periodically rescales that layer's decay relative to the other layers. This lets the scheduler decide how much regularization each layer needs without using validation data. In their experiments, OUIDecay achieved the best mean best-validation loss in 7 of the 8 evaluated settings compared with fixed and gradient-based adaptive decay, while remaining lightweight enough for online use.

weight decay, convolutional neural networks, regularization, Overfitting-Underfitting Indicator, activation patterns, adaptive weight decay, validation loss, EfficientNet, ResNet, DenseNet
Authors
Alberto Fernández-Hernández, Jose I. Mestre, Cristian Pérez-Corral, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí
Abstract
Weight decay remains one of the most widely used regularization mechanisms for training convolutional neural networks, yet it is still commonly applied as a fixed coefficient shared by all layers throughout training. This uniform treatment ignores that different layers may follow different structural dynamics and therefore may require different regularization strengths. In this work, we propose OUIDecay, an adaptive layer-wise and time-dependent weight decay scheduler for CNNs driven by the Overfitting-Underfitting Indicator (OUI), an activation-based metric previously shown to provide early information about regularization quality. OUIDecay uses a lightweight batch-based formulation of OUI to monitor the structural behavior of each layer online and periodically rescales its weight decay relative to the other layers in the network. Unlike gradient-based adaptive decay methods, our approach relies on functional information extracted from activation patterns and does not require validation data. Experiments on EfficientNet-B0 with Stanford Cars, ResNet50 with Food101, DenseNet121 with CIFAR100, and MobileNetV2 with CIFAR10 show that OUIDecay achieves the best mean best-validation-loss in 7 out of 8 evaluated settings. These results indicate that activation-driven weight decay adaptation is a practical and effective alternative to fixed decay and gradient-based adaptive decay, while keeping the method lightweight and suitable for online use.
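The abstract does not give the OUI formula or the rescaling rule, so the following is only a rough sketch of the general idea of activation-driven, layer-wise decay adaptation. Everything in it is an assumption made for illustration: `batch_pattern_score` uses the mean normalized Hamming distance between binary activation patterns as a stand-in for a batch-based OUI, and `rescale_decays` applies a hypothetical exponential update relative to the network-wide mean score, with a `direction` parameter because the abstract does not say whether a high score should raise or lower a layer's decay.

```python
import math
from itertools import combinations


def batch_pattern_score(patterns):
    """Mean normalized Hamming distance over all pairs of binary activation
    patterns in one batch. ASSUMPTION: a stand-in for a batch-based OUI,
    not the paper's exact formula."""
    pairs = list(combinations(patterns, 2))
    if not pairs:
        return 0.0
    width = len(patterns[0])
    total = sum(sum(a != b for a, b in zip(p, q)) / width for p, q in pairs)
    return total / len(pairs)


def rescale_decays(decays, scores, rate=1.0, direction=1.0, lo=1e-6, hi=1e-2):
    """HYPOTHETICAL relative update: multiply each layer's decay by
    exp(direction * rate * (score - mean score)), then clip to [lo, hi]."""
    mean_score = sum(scores.values()) / len(scores)
    updated = {}
    for name, wd in decays.items():
        factor = math.exp(direction * rate * (scores[name] - mean_score))
        updated[name] = min(hi, max(lo, wd * factor))
    return updated


# Toy batch: binary (active / inactive) patterns per layer, 3 samples each.
batch_patterns = {
    "conv1": [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0]],  # identical patterns
    "conv2": [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]],  # diverse patterns
}
decays = {"conv1": 5e-4, "conv2": 5e-4}
scores = {name: batch_pattern_score(p) for name, p in batch_patterns.items()}
decays = rescale_decays(decays, scores)
print(scores)
print(decays)
```

In a real training loop, the patterns would come from each layer's activations on the current mini-batch, and the updated values would be written into the optimizer's per-layer weight decay settings every few steps. With `direction=1.0`, the layer with more diverse patterns ends up with the larger decay; flipping the sign reverses that, and which choice matches OUIDecay is not determined by the abstract.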