A Hybrid Approach For Malware Classification Using Secondary Features Fusion
2026-06-02 • Cryptography and Security
Cryptography and Security; Artificial Intelligence; Machine Learning
AI summary
The authors address the challenge of not only detecting malware but also identifying the family or group each sample belongs to, which helps defenders respond to threats more effectively. Their method combines different kinds of malware features, such as API calls and n-gram patterns in the code, and then uses voting among several algorithms to improve accuracy. They evaluated the method on a dataset provided by Microsoft and reported strong results compared with existing methods (an AUC of 0.989 and an accuracy of 99.72%). The work aims to make malware detection tools more useful by classifying detected malware into families rather than only flagging it.
malware detection, malware classification, API calls, n-grams, feature fusion, feature selection, algorithm fusion, binary classification, multi-class classification, AUC
Authors
Raja Khurram Shahzad, Muhammad Mustaqeem, Haroon Elahi
Abstract
The number of malware samples, whether variants of known malware or novel malware, is increasing rapidly, making malware detection and mitigation a complex problem. One way to improve mitigation is to automate both malware detection and malware family classification. However, traditional malware detection methods cannot classify detected malware into their respective families, which hinders effective mitigation. This paper therefore proposes a method that automates malware detection and classifies the detected malware into their respective families. The method extracts relevant malware features, such as API calls and fixed- and variable-length n-grams, selects among them with a customized feature selection method, and then fuses the selected features. For the predictive model, a voting-based approach is proposed for algorithm fusion. The method is evaluated on the dataset provided by Microsoft using both binary and multi-class classification, and the results are compared with state-of-the-art methods. The experimental results indicate the effectiveness and efficiency of the proposed approach, with an AUC of 0.989, an accuracy of 99.72%, and a log loss of 0.01.
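The abstract lists fixed- and variable-length n-grams as features but does not define them. The Python sketch below assumes the input is a list of tokens (for example hex byte values, opcode mnemonics, or API-call names; the tokenization is an assumption). It reads "variable-length" as counting every n-gram length in a range [n_min, n_max]. `ngram_counts` is a hypothetical helper, not the authors' code.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n_min: int, n_max: int) -> Counter:
    """Count every contiguous n-gram whose length is in n_min..n_max (inclusive).

    n_min == n_max gives fixed-length n-grams; a wider range gives the
    (assumed) variable-length variant. Each n-gram is keyed as a
    space-joined string of its tokens.
    """
    if n_min < 1 or n_max < n_min:
        raise ValueError("require 1 <= n_min <= n_max")
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts
```

For example, `ngram_counts(["push", "mov", "push", "mov"], 2, 2)` counts `"push mov"` twice and `"mov push"` once. The same helper applies unchanged to a sequence of API-call names.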
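The abstract does not describe the customized feature selection method. The sketch below therefore swaps in a plain stand-in: it keeps the k features that occur in the most training samples (document frequency), breaking ties alphabetically. It then performs feature fusion as a simple concatenation of the selected per-source count vectors. The functions `top_k_by_document_frequency` and `fuse`, and the source names `"api"` and `"ngram"`, are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def top_k_by_document_frequency(samples: List[Counter], k: int) -> List[str]:
    """Stand-in selector (not the paper's method): keep the k features
    present in the most samples, ties broken alphabetically."""
    df: Counter = Counter()
    for sample in samples:
        # Passing the keys view counts each feature at most once per sample.
        df.update(sample.keys())
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in ranked[:k]]


def fuse(sample_sources: Dict[str, Counter],
         vocabularies: Dict[str, List[str]]) -> List[int]:
    """Feature fusion by concatenation: for each source (in sorted name
    order), emit the counts of its selected features; a missing source
    contributes zeros."""
    vector: List[int] = []
    for source in sorted(vocabularies):
        counts = sample_sources.get(source, Counter())
        vector.extend(counts.get(feature, 0) for feature in vocabularies[source])
    return vector
```

Selecting features per source before concatenating keeps one large source (typically n-grams) from crowding out a smaller one (API calls). Whether the authors select before or after fusion beyond what the abstract states is not known.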
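Voting-based algorithm fusion can be illustrated independently of the underlying classifiers, which the abstract does not name. This sketch assumes each model has already produced either hard labels or class-probability vectors. It shows two common voting rules: hard majority voting and soft (probability-averaging) voting. The abstract does not say which rule the authors use, and the tie-breaking rule here is an assumption.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard voting. predictions[m][i] is model m's label for sample i.
    On a tie, the tied label predicted by the earliest model wins
    (a hypothetical tie-break rule)."""
    if not predictions:
        return []
    fused = []
    for i in range(len(predictions[0])):
        votes = [model_preds[i] for model_preds in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        fused.append(next(label for label in votes if counts[label] == best))
    return fused


def soft_vote(probabilities: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Soft voting. probabilities[m][i][c] is model m's probability of
    class c for sample i; average over models, then take the argmax."""
    n_models = len(probabilities)
    fused = []
    for per_model in zip(*probabilities):  # one tuple of rows per sample
        avg = [sum(col) / n_models for col in zip(*per_model)]
        fused.append(max(range(len(avg)), key=avg.__getitem__))
    return fused
```

With three models predicting `[[0, 1, 2], [0, 2, 2], [1, 2, 0]]`, hard voting yields `[0, 2, 2]`. Soft voting can also use each model's confidence, not just its top label.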
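The reported metrics (AUC, accuracy, and log loss) have standard definitions. The sketch below computes them in plain Python under the following assumptions:

* Multi-class log loss clips each true-class probability to [eps, 1 - eps] with eps = 1e-15. The constant is an assumption, and rows are not renormalized.
* AUC is shown only for the binary case. It is computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_log_loss(y_true: Sequence[int],
                        probs: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean of -log(p_true) over samples, with p_true clipped to
    [eps, 1 - eps] so that log(0) never occurs."""
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs where the
    positive scores higher; ties count 1/2. O(P * N), fine for a sketch."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))
```

For instance, `binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` is 0.75, because three of the four positive/negative pairs are ranked correctly. A uniform two-class prediction gives a log loss of ln 2 ≈ 0.693. This shows how small a log loss of 0.01 is by comparison.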