Rethinking Evaluation Paradigms in IBP-based Certified Training

2026-06-01

Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition
AI summary

The authors study how to make deep neural networks provably robust against adversarial perturbations, small input changes crafted to cause wrong predictions. They point out that previous studies usually report a single hyperparameter configuration per method, which can be misleading because there is a trade-off between accuracy on unperturbed data (natural accuracy) and provable robustness (certified accuracy). To address this, the authors use automated multi-objective hyperparameter optimisation to find, for each training method, a set of configurations that balance the two goals optimally. Comparing the resulting Pareto fronts, they find that previously reported configurations were often undertuned, that the improvements attributed to newer methods are smaller than assumed, and that some methods have complementary strengths that had not been reported before.

deep neural networks, adversarial perturbations, neural network verification, certified training, natural accuracy, certified accuracy, Pareto front, multi-objective optimisation, hyperparameter tuning, robustness guarantees
Authors
Konstantin Kaulen, Hadar Shavit, Holger H. Hoos
Abstract
Deep neural networks achieve strong performance on many supervised learning tasks but remain vulnerable to adversarial perturbations. Neural network verification provides mathematically rigorous robustness guarantees, yet at substantial computational cost. To mitigate this, certified training techniques optimise for verifiable robustness during training, typically inducing a trade-off between natural and certified accuracy controlled by method-specific hyperparameters. Because these metrics are inherently conflicting, the common practice of reporting a single configuration is problematic: it can mislead conclusions about overall performance and prevents unbiased assessments of the state of the art. We address this by evaluating certified training methods via Pareto front comparisons over the natural–certified accuracy trade-off. To enable fair, method-agnostic comparisons, we perform efficient automated multi-objective hyperparameter optimisation to identify a set of Pareto-optimal configurations for each method. This approach often uncovers substantial undertuning in previously reported configurations, yielding superior performance and establishing a new state of the art. Leveraging these fronts, we present the first comprehensive multi-objective comparison of certified training approaches, showing that prior advancements are less pronounced than assumed and revealing previously unreported performance complementarities.
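
To make the notion of a Pareto front over natural and certified accuracy concrete, the following is a minimal sketch, not the authors' implementation. It assumes each hyperparameter configuration has already been evaluated and reduced to a pair (natural accuracy, certified accuracy), both to be maximised. The function name `pareto_front` and the numbers in `configs` are hypothetical and chosen purely for illustration; they are not results from the paper.

```python
from typing import List, Tuple

# One evaluated configuration: (natural accuracy, certified accuracy).
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points when both objectives are maximised.

    A point is dominated if another point is at least as good in both
    objectives and strictly better in at least one.
    """
    # Sort by natural accuracy (descending); break ties by certified accuracy.
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    front: List[Point] = []
    best_certified = float("-inf")
    for natural, certified in ordered:
        # Every earlier point has natural accuracy >= this one, so this point
        # survives only if it strictly improves on certified accuracy.
        if certified > best_certified:
            front.append((natural, certified))
            best_certified = certified
    return front


# Hypothetical evaluation results for five configurations of one method.
configs = [(0.80, 0.50), (0.75, 0.55), (0.78, 0.48), (0.70, 0.60), (0.72, 0.52)]
front = pareto_front(configs)
print(front)
```

In this toy data, (0.78, 0.48) and (0.72, 0.52) are dropped because another configuration is better in both metrics. Comparing two methods then means comparing their whole fronts rather than one hand-picked point from each.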