Sensitivity as a Double-Edged Sword: A Trade-off Between Discriminability and Adversarial Robustness

2026-06-01 · Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition · Machine Learning
AI summary

The authors find that the standard fully connected (FC) classifiers used in neural networks are highly sensitive to small malicious input changes, which makes them vulnerable to adversarial attacks. Simpler classifiers based on $\ell_2$ distance are more robust but less accurate. To balance these traits, the authors design Hybrid Prototype Mixing (HPM). HPM combines the strengths of both by mixing stable dataset-wide prototypes (class reference points) with dynamic batch-based ones. The method's dynamic design, which relies on a Straight-Through Estimator, can hide gradients from standard attacks and make robustness hard to measure reliably. For this reason, the authors also introduce a new evaluation protocol, the Mixed Surrogate Attack (MSA). Their experiments show that adding this module to existing strong adversarially trained models improves robustness with only minimal fine-tuning.

neural networks, adversarial perturbations, fully connected classifier, l2 distance classifier, Hybrid Prototype Mixing, prototype, Straight-Through Estimator, gradient obfuscation, AutoAttack, adversarial robustness
Authors
Kai Wang
Abstract
Modern neural networks are highly susceptible to adversarial perturbations. In this work, we identify that part of this vulnerability stems from the sensitivity of the widely used fully connected (FC) classifiers to such perturbations. In contrast, simple $\ell_2$ distance-based classifiers exhibit significantly greater robustness. We provide a thorough theoretical and empirical analysis showing a trade-off: the high sensitivity of FC classifiers makes them discriminative, but it also makes them vulnerable. Conversely, the insensitivity of $\ell_2$-classifiers grants robustness but limits performance. Motivated by this trade-off, we propose a novel $\ell_2$-reclassifier based on a Hybrid Prototype Mixing (HPM) framework. It retains the discriminative power of FC classifiers while leveraging the robustness of $\ell_2$ distance. HPM yields $\ell_2$-distance-based predictions by fusing two types of prototypes: (1) stable, dataset-level prototypes updated via an exponential moving average (EMA), and (2) dynamic, batch-level prototypes generated from the FC classifier's predictions using a Straight-Through Estimator (STE). However, this dynamic, STE-based architecture introduces significant challenges for evaluation, such as gradient obfuscation and forward discontinuity. To address this, we propose a new, rigorous evaluation protocol, the Mixed Surrogate Attack (MSA). MSA uses multiple surrogates together with the powerful AutoAttack to ensure a fair and reliable assessment. Extensive experiments demonstrate that our lightweight, plug-and-play module, with minimal fine-tuning, effectively enhances the adversarial robustness of various existing state-of-the-art (SOTA) adversarially trained models.
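The abstract describes HPM only at a high level. The pure-Python sketch below illustrates the forward pass of one plausible reading of it. It is not the authors' implementation, and every function and parameter name in it (`ema_update`, `batch_prototypes`, `hybrid_predict`, `momentum`, `alpha`) is hypothetical.

The sketch makes the following assumptions:
- Dataset-level prototypes are EMA-updated class means of features, computed with ground-truth labels.
- Batch-level prototypes are class means over samples grouped by the FC classifier's argmax prediction. A class with no predicted samples falls back to its dataset prototype.
- The two prototype types are fused with a fixed convex weight `alpha`.
- The $\ell_2$ logits are negative squared distances to the fused prototypes.

Because an STE only changes gradients, a forward-only sketch simply computes the hard assignment.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def fc_logits(z: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Standard FC classifier: one affine score per class."""
    return [sum(w_i * z_i for w_i, z_i in zip(w, z)) + b for w, b in zip(weights, bias)]


def argmax(xs: Vector) -> int:
    # Ties resolve to the lowest index, since max() returns the first maximum.
    return max(range(len(xs)), key=lambda i: xs[i])


def mean(vectors: Matrix) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ema_update(protos: Matrix, feats: Matrix, labels: List[int], momentum: float = 0.9) -> Matrix:
    """Hypothetical dataset-level prototypes: EMA of per-class feature means.

    Assumes ground-truth labels are available (i.e. training time).
    """
    new = [p[:] for p in protos]
    for c in range(len(protos)):
        members = [f for f, y in zip(feats, labels) if y == c]
        if members:  # classes absent from this batch keep their previous prototype
            m = mean(members)
            new[c] = [momentum * p + (1 - momentum) * x for p, x in zip(protos[c], m)]
    return new


def batch_prototypes(feats: Matrix, fc_preds: List[int], fallback: Matrix) -> Matrix:
    """Hypothetical batch-level prototypes from the FC classifier's hard predictions.

    In an autodiff framework, an STE would use this hard (argmax) assignment in the
    forward pass and substitute the gradient of a soft assignment (e.g. softmax)
    in the backward pass; only the forward value is computed here.
    """
    protos = []
    for c in range(len(fallback)):
        members = [f for f, p in zip(feats, fc_preds) if p == c]
        protos.append(mean(members) if members else fallback[c][:])
    return protos


def l2_logits(z: Vector, protos: Matrix) -> Vector:
    """l2-classifier: negative squared distance to each class prototype."""
    return [-sum((zi - pi) ** 2 for zi, pi in zip(z, p)) for p in protos]


def hybrid_predict(feats: Matrix, weights: Matrix, bias: Vector,
                   dataset_protos: Matrix, alpha: float = 0.5) -> List[int]:
    """Fuse dataset- and batch-level prototypes, then classify by l2 distance."""
    preds = [argmax(fc_logits(z, weights, bias)) for z in feats]
    batch_protos = batch_prototypes(feats, preds, dataset_protos)
    mixed = [[alpha * d + (1 - alpha) * b for d, b in zip(dp, bp)]
             for dp, bp in zip(dataset_protos, batch_protos)]
    return [argmax(l2_logits(z, mixed)) for z in feats]


if __name__ == "__main__":
    weights, bias = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    feats = [[2.0, 0.1], [0.2, 1.5], [0.9, 0.3]]
    protos = ema_update([[1.0, 0.0], [0.0, 1.0]], feats, [0, 1, 0], momentum=0.9)
    print(hybrid_predict(feats, weights, bias, protos))  # prints [0, 1, 0]
```

The final decision in this sketch comes from $\ell_2$ distances, not directly from the FC scores. The FC classifier only influences which samples form each batch prototype. This reflects one way to read "retains the discriminative power of FC classifiers while leveraging the robustness of $\ell_2$ distance." How the paper actually weights or schedules the two prototype types is not stated in the abstract.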