A Multi-task Mixture of Experts Framework for Malware Classification, Packing Detection, and Family Attribution
2026-06-29 • Cryptography and Security
Cryptography and Security, Artificial Intelligence
AI summary
The authors address the challenge of detecting malware, which is difficult because malware comes in many forms and can be hidden or altered. They propose a system in which multiple specialized 'expert' models work together on three tasks: identifying the malware family, detecting whether a file is packed, and distinguishing malware from benign software. They test several setups and find that their best model, a Multi-Gate Mixture of Experts, reaches a combined detection rate of 0.9744 and holds up better when malware samples are mutated to confuse detectors. Dividing the work among expert models in this way supports more reliable and scalable malware detection tools.
Malware Classification, Mixture of Experts (MoE), Portable Executable (PE), EMBER Features, Malware Packing, Adversarial Settings, Multi-Task Learning, Task-Specific Routing, Malware Detection, Distribution Shift
Authors
Jithin S., Roshin Sleeba C., Anvin Mariya P. B., Asmitha K. A., Vinod P., Serena Nicolazzo, Antonino Nocera
Abstract
Malware classification remains a challenging problem due to the inherent heterogeneity of malware, the presence of packed binaries, and the diverse distribution of malware families. Traditional single-model detection mechanisms often fail to generalize across such diverse data, leading to degraded performance, particularly on obfuscated and rare malware samples. In this work, we propose a unified multi-task malware analysis framework based on Mixture of Experts (MoE) architectures. The proposed system is evaluated on two input representations: high-dimensional EMBER feature sets and raw 1D byte arrays extracted from Portable Executable (PE) files. It simultaneously performs three tasks: malware family classification, packed versus unpacked detection, and malware versus benign identification. By decomposing the problem into specialized expert networks and employing adaptive gating mechanisms, the model enables effective task-specific learning while maintaining overall scalability. We investigate multiple architectural variants, including Homogeneous MoE, Heterogeneous MoE, and Multi-Gate MoE (MMoE). Performance is evaluated in both standard and adversarial settings using original and mutated samples. The results show that the Multi-Gate MoE model achieves the best performance, reaching a combined detection rate of 0.9744 with a failure rate of only $2.56\%$. Moreover, this configuration exhibits improved robustness under mutation-induced distribution shifts. Our findings highlight the effectiveness of expert specialization and task-specific routing in handling complex malware distributions, making the proposed framework a promising direction for scalable and resilient malware detection systems.
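To make the idea of task-specific routing concrete, the following is a minimal sketch of a generic multi-gate MoE forward pass, not the authors' implementation. It assumes shared experts that are single linear layers with ReLU, one softmax gate per task, and one linear head per task. The class name `MultiGateMoE`, the toy dimensions, the random untrained weights, and the number of family classes are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a (rows x len(x)) weight matrix by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rng, rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MultiGateMoE:
    """Shared experts, with one gate and one output head per task (assumed layout)."""

    def __init__(self, in_dim, hidden_dim, num_experts, task_classes, seed=0):
        rng = random.Random(seed)
        # Experts are shared by every task.
        self.experts = [random_matrix(rng, hidden_dim, in_dim)
                        for _ in range(num_experts)]
        # Each task owns a gate that scores the experts for a given input.
        self.gates = {task: random_matrix(rng, num_experts, in_dim)
                      for task in task_classes}
        # Each task owns a head that maps the fused features to class logits.
        self.heads = {task: random_matrix(rng, n_classes, hidden_dim)
                      for task, n_classes in task_classes.items()}

    def forward(self, x):
        # Every expert processes the same input (linear layer followed by ReLU).
        expert_out = [[max(0.0, h) for h in linear(w, x)] for w in self.experts]
        hidden_dim = len(expert_out[0])
        outputs = {}
        for task, gate in self.gates.items():
            # Task-specific mixture weights over the shared experts.
            mix = softmax(linear(gate, x))
            # Weighted sum of expert outputs for this task.
            fused = [sum(g * out[j] for g, out in zip(mix, expert_out))
                     for j in range(hidden_dim)]
            outputs[task] = softmax(linear(self.heads[task], fused))
        return outputs


# Toy usage: an 8-dimensional input stands in for a real feature vector.
model = MultiGateMoE(
    in_dim=8,
    hidden_dim=4,
    num_experts=3,
    task_classes={"family": 5, "packed": 2, "malicious": 2},  # hypothetical sizes
)
probs = model.forward([0.1 * i for i in range(8)])
print({task: len(p) for task, p in probs.items()})
```

Because each task has its own gate, the three tasks can weight the same experts differently for the same input, which is the property the abstract refers to as task-specific routing. A single shared gate would force all tasks to use one mixture.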