Detecting Adversarial Evasion Attacks Against Autoencoder-Based Network Intrusion Detection Systems
2026-07-01 • Cryptography and Security
AI summary
The authors study attacks where bad network traffic is changed in a sneaky way so it looks normal but fools a detection system based on images of packet data. They propose two new methods to catch these sneaky attacks: one looks for unusual patterns in image error areas, and the other checks consistency of time gaps between packets. Testing on traffic from IoT devices shows their methods detect almost all attacks very accurately. Their work suggests combining image-based and packet-based checks can help stop these tricky attacks on network intrusion detectors.
evasion attacks, machine learning, network intrusion detection system (NIDS), adversarial examples, PANDA framework, autoencoder, inter-arrival time, masked FGSM, IoT traffic, reconstruction error
Authors
Niklas Bunzel, Ashim Siwakoti
Abstract
Evasion attacks deliberately manipulate input to an ML-based system to produce an incorrect prediction while the manipulated input still appears benign. The PANDA framework has demonstrated that adversarial examples developed for the vision domain can be transferred to the network domain by converting packet sequences into invertible grayscale images, enabling gradient-based attacks such as masked FGSM against autoencoder-based network intrusion detection systems (NIDS). These attacks manipulate the NIDS anomaly score without altering the underlying attack semantics, leaving defenders without a straightforward way to distinguish between benign flows and carefully perturbed malicious traffic. In this paper, we propose two complementary detectors: the Residual Localisation Detector (RLD), which tracks the spatial concentration of reconstruction errors in the inter-arrival time feature region in image space; and the Feature-Space Perturbation Consistency (FPC) Detector, which operates directly on packet-level inter-arrival time features in packet-feature space. We evaluate both detectors on benign, malicious, and adversarial traffic from multiple IoT devices in the UQ-IoT dataset. Both detectors achieve near-perfect detection performance (TNR, TPR, precision, recall, and F1-score $\geq 0.99$) against adversarial examples across the evaluated IoT traffic. Our results indicate that integrating reconstruction-based scoring with perturbation consistency checks, in both image space and packet-feature space, offers a practical defence against emerging PANDA-style adversarial attacks on NIDS.
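The abstract does not say how the RLD measures the "spatial concentration" of reconstruction errors. One plausible reading, offered here only as an illustrative sketch, is the fraction of the total squared reconstruction error that falls inside the inter-arrival time region of the image. The function name `residual_localisation_score`, the boolean `region_mask`, and the choice of squared error are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def residual_localisation_score(image, reconstruction, region_mask, eps=1e-12):
    """Share of the squared reconstruction error that lies inside region_mask.

    Hypothetical helper: the paper's actual RLD statistic may differ.
    Returns a value in [0, 1]; 0.0 if the total error is (numerically) zero.
    """
    image = np.asarray(image, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    mask = np.asarray(region_mask, dtype=bool)
    if image.shape != reconstruction.shape or image.shape != mask.shape:
        raise ValueError("image, reconstruction and region_mask must share a shape")

    # Per-pixel squared residual between the input and the autoencoder output.
    residual = (image - reconstruction) ** 2
    total = residual.sum()
    if total <= eps:
        return 0.0
    # Error mass inside the assumed inter-arrival time region, relative to all error.
    return float(residual[mask].sum() / total)


# Toy 4x4 "packet image"; column 0 stands in for the inter-arrival time region.
iat_mask = np.zeros((4, 4), dtype=bool)
iat_mask[:, 0] = True

original = np.ones((4, 4))

# Error spread evenly over the image: the score equals the mask's area share.
uniform_score = residual_localisation_score(original, np.zeros((4, 4)), iat_mask)

# Error confined to the masked column: the score reaches its maximum.
localised_recon = original.copy()
localised_recon[:, 0] = 0.0
localised_score = residual_localisation_score(original, localised_recon, iat_mask)
```

Under this reading, a perturbation confined to the inter-arrival time features would push the score toward 1, while unperturbed traffic would be expected to spread its residual more evenly. A detection threshold would have to be calibrated on held-out benign and malicious traffic, and is not specified here.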