Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders
2026-04-30 • Machine Learning
AI summary
The authors study how quantum machine learning models, which use quantum computing to classify images, can be fooled by sneaky changes in the images called adversarial attacks. Instead of training the model with these tricky images, which can be hard or cause problems, they propose using a quantum autoencoder to clean or 'purify' the images before classification. Their method also gives a way to tell when an image might still be suspicious even after cleaning. Tests show their approach improves accuracy a lot when facing these attacks.
quantum machine learning, adversarial attacks, variational quantum classifiers, quantum autoencoder, adversarial training, image classification, adversarial perturbations, model robustness, confidence metric
Authors
Emma Andrews, Sahan Sanjaya, Prabhat Mishra
Abstract
Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, such as by the insertion of carefully crafted noise, the model can be misled into making mistakes. Quantum machine learning models are also vulnerable to such adversarial attacks, especially in image classification using variational quantum classifiers. While there are promising defenses against these adversarial perturbations, such as training with adversarial samples, they face practical limitations. For example, they are not applicable in scenarios where adversarial samples are unavailable for training, and training on such samples can overfit the model to one type of attack. In this paper, we propose an adversarial-training-free defense framework that utilizes a quantum autoencoder to purify adversarial samples through reconstruction. Moreover, our defense framework provides a confidence metric to identify potentially adversarial samples that cannot be purified by the quantum autoencoder. Extensive evaluation demonstrates that our defense framework can significantly outperform state-of-the-art defenses in prediction accuracy (up to 68%) under adversarial attacks.
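To make the purification pipeline concrete, here is a minimal PennyLane sketch of the idea described in the abstract. Everything specific in it is an assumption for illustration, not the authors' implementation: a four-qubit toy input, angle-encoded features, a `StronglyEntanglingLayers` ansatz as the encoder (with its adjoint as the decoder), untrained placeholder weights, and the probability that the "trash" qubits land in |00⟩ as a stand-in for the paper's confidence metric. The paper's actual architecture, encoding, and metric may differ.

```python
import numpy as np
import pennylane as qml

N_QUBITS = 4   # data qubits for a toy 4-feature "image" (illustrative only)
TRASH = [2, 3] # qubits the encoder is trained to disentangle into |0>
N_LAYERS = 2

# No fixed wire count: PennyLane may add auxiliary wires when it
# defers the mid-circuit reset used below.
dev = qml.device("default.qubit")

@qml.qnode(dev)
def purify(x, weights):
    """Purification pass: encode, reset the trash qubits, decode.

    `weights` stand for a quantum autoencoder trained on clean data;
    training is not shown here.
    """
    qml.AngleEmbedding(x, wires=range(N_QUBITS))
    qml.StronglyEntanglingLayers(weights, wires=range(N_QUBITS))  # encoder V
    for w in TRASH:
        qml.measure(w, reset=True)  # discard the trash subsystem
    qml.adjoint(qml.StronglyEntanglingLayers)(
        weights, wires=range(N_QUBITS)
    )  # decoder V†
    # Purified features that a downstream variational classifier would consume.
    return [qml.expval(qml.PauliZ(w)) for w in range(N_QUBITS)]

@qml.qnode(dev)
def trash_overlap(x, weights):
    """Probability that the trash qubits are |00> after encoding.

    Clean samples reconstruct well, so this probability is high; a low
    value flags a sample the autoencoder cannot purify. It is used here
    as a stand-in for the paper's confidence metric.
    """
    qml.AngleEmbedding(x, wires=range(N_QUBITS))
    qml.StronglyEntanglingLayers(weights, wires=range(N_QUBITS))
    return qml.probs(wires=TRASH)

# Untrained placeholder weights; a real defense would fit these by
# maximizing trash_overlap on clean training samples.
shape = qml.StronglyEntanglingLayers.shape(n_layers=N_LAYERS, n_wires=N_QUBITS)
weights = np.random.default_rng(0).normal(size=shape)

x = np.array([0.1, 0.7, 0.3, 0.5])          # toy (possibly perturbed) input
confidence = trash_overlap(x, weights)[0]   # P(trash == |00>)
features = purify(x, weights)
print(f"confidence = {confidence:.3f}")
print("purified features:", np.round(features, 3))
```

In this sketch, thresholding `confidence` would flag samples that remain suspicious after purification, matching the role the abstract assigns to the confidence metric; the threshold itself would be chosen on clean validation data.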