Traffic-CBM: A Structurally Interpretable Multimodal Framework for Encrypted Traffic Classification
2026-06-29 • Computer Vision and Pattern Recognition
AI summary
The authors created Traffic-CBM, a new method to classify encrypted internet traffic that can explain its decisions better than previous methods. Instead of mixing all data into a hard-to-understand format, they organize different types of traffic information into clear groups called concepts. These concepts represent summaries of traffic features like flow statistics or packet data, making it easier to see what evidence influenced a prediction. Tests show Traffic-CBM performs well while also giving clearer explanations.
Encrypted traffic classification, Flow statistics, Packet sequences, Byte-level representations, Multimodal learning, Concept representation, Temporal encoding, Hierarchical models, Interpretability, Traffic evidence
Authors
Honglei Jin, Wenshuo Chen, Shaofeng Liang, Haozhe Jia, Menshuo Zhao, Shuxu Jin, Songning Lai, Yutao Yue
Abstract
Encrypted traffic classification has achieved strong performance, but its decision process remains difficult to interpret. Existing methods usually fuse flow statistics, packet sequences, and byte-level representations into opaque latent features, making it unclear which type of evidence actually drives a prediction. In this paper, we propose Traffic-CBM, a structurally interpretable multimodal framework for encrypted traffic classification. Instead of directly fusing heterogeneous traffic signals into a black-box representation, Traffic-CBM organizes them into a unified hierarchical concept space. These concepts are not manually annotated semantic attributes; rather, they are scalar evidence summaries constrained by predefined traffic evidence groups. Specifically, grouped flow statistics are mapped to statistical concepts, dedicated temporal encoders learn temporal concepts from disjoint feature subspaces, and byte-level evidence is organized into packet-level and cross-packet concepts. This design turns heterogeneous traffic evidence into an explicit concept representation and makes each level of traffic evidence easier to analyze. We evaluate Traffic-CBM on multiple encrypted traffic benchmarks. Results show that it achieves competitive and balanced classification performance while providing a clearer structural interpretation interface than conventional end-to-end fusion models. Further analyses suggest that the learned concept space is actively used in the prediction process and yields a structural explanation of multimodal traffic evidence.
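To make the idea of "scalar evidence summaries constrained by predefined traffic evidence groups" concrete, the following is a minimal sketch of the statistical-concept branch only. It is not the authors' implementation: the group names, feature indices, sigmoid squashing, random weights, and linear classification head are all assumptions chosen for illustration, and the temporal and byte-level branches are omitted.

```python
import math
import random

# Hypothetical evidence groups: names and feature indices are illustrative only.
STAT_GROUPS = {
    "packet_size_stats": [0, 1, 2],
    "inter_arrival_stats": [3, 4],
    "flow_volume_stats": [5, 6, 7],
}


def init_weights(groups, n_classes, seed=0):
    """Draw random weights; a real model would learn these by training."""
    rng = random.Random(seed)
    concept_w = {
        name: [rng.uniform(-1, 1) for _ in idx] for name, idx in groups.items()
    }
    class_w = [[rng.uniform(-1, 1) for _ in groups] for _ in range(n_classes)]
    return concept_w, class_w


def concepts_from_features(x, groups, concept_w):
    """Map each feature group to one scalar concept in (0, 1).

    Each concept only sees the features of its own group, which is what
    makes the bottleneck structurally constrained.
    """
    concepts = {}
    for name, idx in groups.items():
        z = sum(w * x[i] for w, i in zip(concept_w[name], idx))
        concepts[name] = 1.0 / (1.0 + math.exp(-z))  # assumed sigmoid squashing
    return concepts


def predict(concepts, class_w):
    """Classify from concepts alone and expose per-concept contributions."""
    names = list(concepts)
    contributions = [
        [w * concepts[n] for w, n in zip(row, names)] for row in class_w
    ]
    logits = [sum(row) for row in contributions]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs, contributions


concept_w, class_w = init_weights(STAT_GROUPS, n_classes=3)
x = [0.5, 1.2, -0.3, 0.8, 0.1, 2.0, -1.0, 0.4]  # made-up flow statistics
concepts = concepts_from_features(x, STAT_GROUPS, concept_w)
probs, contributions = predict(concepts, class_w)
```

Because the classifier in this sketch reads only the concept scalars, `contributions[k]` lists how much each evidence group adds to the logit of class `k`, which is the kind of structural readout the abstract describes.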