Scalable Malware Family Classification Using Quantum Kernel Based Machine Learning
2026-06-15 • Cryptography and Security
AI summary
The authors address the difficulty of accurately identifying malware families (groups of related malicious programs). The task grows harder as new samples become more structurally similar and more heavily obfuscated. They propose a method that combines a quantum-computing technique with machine learning to classify malware into families. The approach extracts compact but informative features from executable files and uses a quantum kernel, a similarity measure computed with quantum circuits, to capture complex patterns among families. On a dataset of 18,836 samples from 23 malware families, the method reached 80.88% accuracy and outperformed classical baselines. By approximating the kernel, it can still learn from all training samples efficiently. The results suggest that quantum kernels can improve malware family classification.
Malware classification; Quantum kernel; Machine learning; Linear Discriminant Analysis; Parameterized quantum circuits; Nyström approximation; Ridge regression; Kernel methods; Multiclass classification
Authors
Ratun Rahman, Hassan Jalil Hadi, Christopher Gabriel Pedraza Pohlenz, Ali Shoker
Abstract
Malware family classification is a key challenge in cybersecurity: accurate family labels enable threat attribution, analysis of attack operations, and the formulation of effective defense strategies. Emerging malware samples are increasingly structurally similar and obfuscated. This makes accurate multiclass classification difficult for traditional machine learning models, especially when they are deployed at scale. In this research, we propose a scalable Quantum Kernel-based Machine Learning (QKML) framework for malware family classification that addresses both accuracy and efficiency constraints. The framework extracts structural features from executable files and applies a supervised Linear Discriminant Analysis (LDA) projection to produce a compact, class-aware representation suited to quantum processing. Nonlinear relationships among malware families are captured by a fidelity-based quantum kernel built from parameterized quantum circuits. We use the Nyström method to obtain a low-rank approximation of this kernel. The approximation supports multiclass classification via ridge regression and lets the model learn from all available training samples without incurring the quadratic cost of constructing the full kernel matrix. In an experimental evaluation on a large-scale malware dataset of 18,836 samples across 23 malware families, the proposed model achieves 80.88% accuracy and outperforms classical machine learning baselines under identical feature sets and data splits. These findings suggest that scalable quantum-kernel-based machine learning can offer measurable performance advantages for real-world malware family classification tasks.
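The pipeline in the abstract (LDA projection, fidelity quantum kernel, Nyström low-rank approximation, ridge-regression classifier) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The quantum circuit is simulated classically as a statevector. The circuit itself (RY angle encoding followed by a CZ chain, repeated twice) is a hypothetical choice, as are the helper names (`lda_fit`, `feature_state`, `fidelity_kernel`, `fit_nystrom_ridge`, `predict`), the landmark count, the ridge strength, the angle range, and the synthetic three-class data standing in for malware features.

```python
import numpy as np


def lda_fit(X, y, n_components):
    """Fisher LDA: directions maximizing between-class vs within-class scatter."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve Sb w = lambda Sw w by whitening Sw = L L^T (small ridge for stability).
    L = np.linalg.cholesky(Sw + 1e-6 * np.eye(d))
    Linv = np.linalg.inv(L)
    _, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)  # eigenvalues ascending
    return Linv.T @ evecs[:, ::-1][:, :n_components]  # top directions first


def apply_ry(state, theta, q, n):
    """Apply RY(theta) to qubit q of an n-qubit statevector."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    gate = np.array([[c, -s], [s, c]])
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)


def apply_cz(state, a, b, n):
    """Apply CZ between qubits a and b (phase -1 where both bits are 1)."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[a], idx[b] = 1, 1
    psi[tuple(idx)] *= -1
    return psi.reshape(-1)


def feature_state(x, reps=2):
    """Hypothetical feature map: reps x (RY(x_q) on every qubit, then a CZ chain)."""
    n = len(x)
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for _ in range(reps):
        for q in range(n):
            state = apply_ry(state, x[q], q, n)
        for q in range(n - 1):
            state = apply_cz(state, q, q + 1, n)
    return state


def fidelity_kernel(A, B):
    """Fidelity kernel k(a, b) = |<psi(a)|psi(b)>|^2, simulated classically."""
    SA = np.array([feature_state(a) for a in A])
    SB = np.array([feature_state(b) for b in B])
    return np.abs(SA.conj() @ SB.T) ** 2


def fit_nystrom_ridge(X, y, n_landmarks=20, lam=1e-3, seed=0):
    """Nystrom features from m landmarks, then multiclass ridge on one-hot targets."""
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=n_landmarks, replace=False)]
    evals, evecs = np.linalg.eigh(fidelity_kernel(Z, Z))
    keep = evals > 1e-8 * evals.max()  # drop numerically null directions
    proj = evecs[:, keep] / np.sqrt(evals[keep])
    Phi = fidelity_kernel(X, Z) @ proj  # n x r features; no n x n kernel is built
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)
    W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ Y)
    return Z, proj, W, classes


def predict(model, Xq):
    Z, proj, W, classes = model
    scores = fidelity_kernel(Xq, Z) @ proj @ W
    return classes[np.argmax(scores, axis=1)]


# Synthetic stand-in data (NOT the paper's dataset): 3 Gaussian "families" in 10-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 1.0, size=(60, 10)) for m in 4.0 * np.eye(10)[:3]])
y = np.repeat(np.arange(3), 60)
perm = rng.permutation(len(y))
train, test = perm[:120], perm[120:]

W_lda = lda_fit(X[train], y[train], n_components=2)  # at most C - 1 = 2 dims -> 2 qubits
P_train, P_test = X[train] @ W_lda, X[test] @ W_lda
lo, hi = P_train.min(axis=0), P_train.max(axis=0)


def to_angle(P):
    # Assumed range [0, pi/2]: this circuit maps (0, 0) and (pi, pi) to the same state.
    return np.clip((P - lo) / (hi - lo), 0, 1) * (np.pi / 2)


A_train, A_test = to_angle(P_train), to_angle(P_test)
model = fit_nystrom_ridge(A_train, y[train], n_landmarks=20)
acc = np.mean(predict(model, A_test) == y[test])
print(f"test accuracy: {acc:.3f}")
```

The design choice that matters for scale is in `fit_nystrom_ridge`. Only the n x m kernel between training points and landmarks is evaluated, so the number of circuit fidelities grows as n·m rather than n². The Nyström features reproduce the kernel exactly on the landmarks, up to the dropped near-zero eigenvalues.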