Upper Bounds on the Generalization Error of Deep Learning Models via Local Robustness and Stability

2026-06-15

Machine Learning, Artificial Intelligence
AI summary

The authors study how well deep learning models generalize, that is, how accurately they perform on unseen data, using robustness-based generalization bounds. They point out that existing bounds are too loose to be useful in practice, in part because they treat robustness as a single global quantity that is the same everywhere in the input space. Their approach divides the input space into sub-regions and scales the robustness term by the number of stable and unstable samples in each one, which yields tighter upper bounds on the true error. On models trained on ImageNet, their bounds track actual model performance more closely than previous methods.

generalization, robustness, generalization bounds, 0-1 loss, deep learning, ImageNet, error rates, data-dependent, model-dependent, sub-regions
Authors
Abdul-Rauf Nuhu, Parham M. Kebria, Vahid Hemmati, Mahmoud N. Mahmoud, Edward Tunstel, Abdollah Homaifar
Abstract
Generalization is a critical property of data-driven models, particularly deep learning models deployed in safety-critical applications. Robustness-based generalization bounds have gained attention as a principled way to link robustness properties to generalization performance, often in a data-dependent manner. However, most existing bounds suffer from vacuousness in practical settings, yielding loose upper bounds that greatly exceed the actual error rates and limiting their usefulness for real-world evaluation. While this issue is often attributed to the uncertainty term, a substantial part of the problem originates from the robustness term itself, particularly for the 0-1 loss. Existing approaches typically treat the robustness term as a global measure, ignoring its variation across different sub-regions of the input space. In this work, we propose a generalization bound that addresses this limitation by scaling the robustness term according to the number of stable and unstable samples within each sub-region. Our bounds incorporate both data- and model-dependent factors while maintaining practical relevance (yielding tighter upper bounds on true error). Experiments on models trained on the ImageNet dataset show that our bounds remain consistently non-vacuous and achieve the tightest estimates among existing methods, closely aligning with empirical performance across a range of robust deep neural networks.
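
To make the contrast between a global and a sub-region view of the robustness term concrete, the following is a minimal Python sketch. It is an illustration only and is not the bound proposed in the paper, whose exact form is not given in the abstract. The function name `robustness_terms`, the sub-region assignment `cell_ids`, the per-sample `stable` flags, and both count-weighted variants are assumptions introduced here for illustration.

```python
import numpy as np


def robustness_terms(cell_ids, stable, num_cells):
    """Compare a global 0-1 robustness term with two count-weighted variants.

    Illustrative only; this is NOT the paper's bound.

    cell_ids : integer sub-region index of each sample (hypothetical partition)
    stable   : True where a sample is counted as stable (hypothetical test,
               e.g. the prediction does not change under a small perturbation)
    """
    cell_ids = np.asarray(cell_ids, dtype=int)
    stable = np.asarray(stable, dtype=bool)
    n = cell_ids.size

    # Per-sub-region sample counts and unstable-sample counts.
    total = np.bincount(cell_ids, minlength=num_cells)
    unstable = np.bincount(
        cell_ids, weights=(~stable).astype(float), minlength=num_cells
    )

    # Global view: under 0-1 loss, a single unstable sample anywhere
    # pushes the term to its maximum value of 1.
    global_term = float(unstable.sum() > 0)

    # Sub-region view: a region contributes only if it holds an unstable
    # sample, weighted by that region's share of the data.
    cell_weighted = float(np.sum((total / n) * (unstable > 0)))

    # Sample view: only the unstable samples themselves contribute.
    sample_weighted = float(unstable.sum() / n)

    return {
        "global": global_term,
        "cell_weighted": cell_weighted,
        "sample_weighted": sample_weighted,
    }


# Six samples in three sub-regions, with one unstable sample in region 1.
terms = robustness_terms(
    cell_ids=[0, 0, 0, 1, 1, 2],
    stable=[True, True, True, True, False, True],
    num_cells=3,
)
print(terms)
```

In this toy case the global term is 1, the sub-region-weighted term is 1/3 (two of six samples lie in the only region containing an unstable sample), and the sample-weighted term is 1/6. This shows how counting stable and unstable samples per sub-region can shrink a term that a global treatment saturates.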