CAN We Trust Your Results? A Cross-Dataset Study of Automotive IDS Evaluation

2026-06-29 | Cryptography and Security

Cryptography and Security, Machine Learning
AI summary

The authors note that modern cars are highly connected, so their internal communication needs strong protection. Many intrusion detection systems (IDS) have been proposed for the in-vehicle Controller Area Network (CAN) bus, but evaluating them is difficult because studies use different experimental setups and data. To address this, the authors built a framework that evaluates IDS methods consistently across seven publicly available datasets. Their experiments with five conceptually different IDS approaches show that detection performance can change substantially from one dataset to another, which underlines the need for consistent, cross-dataset testing to understand how these methods behave in different environments.

Intrusion Detection System (IDS), Controller Area Network (CAN) bus, In-vehicle communication, Benchmarking framework, Cross-dataset evaluation, Cybersecurity, Automotive networks, Malicious activity detection, Experimental setup, Dataset variability
Authors
Beatrix Koltai, Gergely Acs, Andras Gazdag
Abstract
The increasing connectivity of modern vehicles has made securing in-vehicle communication networks a critical challenge. Intrusion Detection Systems (IDS) have been widely studied as a defense mechanism for detecting malicious activities on the Controller Area Network (CAN) bus. However, the evaluation of CAN IDS methods remains difficult due to inconsistencies in experimental setups and the lack of standardized benchmarking frameworks. As a result, reported performance often depends on dataset-specific characteristics and may not reflect how detection methods behave in different environments. This work introduces a benchmarking framework for consistent evaluation of CAN IDSs across multiple datasets. Using the proposed framework, we integrate seven publicly available CAN IDS datasets collected under different experimental conditions and perform cross-dataset evaluation of five conceptually different IDS approaches. Our results highlight how detection performance can vary significantly across datasets, demonstrating the importance of cross-dataset benchmarking for assessing the robustness and generalization capabilities of CAN IDS methods.
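
To make the idea of cross-dataset evaluation concrete, the following minimal sketch fits a detector on one dataset and scores it on every dataset, producing a train-by-test F1 matrix. It is an illustration only, not the authors' framework: the two synthetic traces, the toy inter-arrival-time detector, and every name in the code (`make_dataset`, `fit`, `predict`, `cross_dataset_f1`) are assumptions introduced here, and the sketch assumes that attack labels are available so that training can use benign frames only.

```python
import random
from collections import defaultdict


def make_dataset(periods, inject_id, n_frames=200, seed=0):
    """Synthetic CAN trace (hypothetical): list of (timestamp, can_id, is_attack)."""
    rng = random.Random(seed)
    frames = []
    for can_id, period in periods.items():
        t = 0.0
        for _ in range(n_frames):
            t += period * rng.uniform(0.95, 1.05)  # periodic traffic with jitter
            frames.append((t, can_id, 0))
    # Injection attack: extra frames on one ID at five times its normal rate.
    t = 0.0
    for _ in range(n_frames):
        t += periods[inject_id] / 5
        frames.append((t, inject_id, 1))
    frames.sort()
    return frames


def fit(frames):
    """Learn the mean inter-arrival time per CAN ID from benign frames only."""
    last, total, count = {}, defaultdict(float), defaultdict(int)
    for t, can_id, is_attack in frames:
        if is_attack:
            continue
        if can_id in last:
            total[can_id] += t - last[can_id]
            count[can_id] += 1
        last[can_id] = t
    return {cid: total[cid] / count[cid] for cid in count}


def predict(model, frames, ratio=0.5):
    """Flag frames with an unknown ID or an unusually short gap on a known ID."""
    last, preds = {}, []
    for t, can_id, _ in frames:
        if can_id not in model:
            preds.append(1)
        elif can_id in last and t - last[can_id] < ratio * model[can_id]:
            preds.append(1)
        else:
            preds.append(0)
        last[can_id] = t
    return preds


def f1_score(labels, preds):
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def cross_dataset_f1(datasets):
    """Return matrix[train][test] = F1 of the model fitted on `train`, scored on `test`."""
    models = {name: fit(frames) for name, frames in datasets.items()}
    return {
        train: {
            test: f1_score([f[2] for f in frames], predict(models[train], frames))
            for test, frames in datasets.items()
        }
        for train in datasets
    }


# Two hypothetical vehicles with different ID sets and message periods (seconds).
datasets = {
    "A": make_dataset({0x100: 0.01, 0x200: 0.02}, inject_id=0x100, seed=1),
    "B": make_dataset({0x100: 0.05, 0x300: 0.10}, inject_id=0x300, seed=2),
}
matrix = cross_dataset_f1(datasets)
for train, row in matrix.items():
    print(train, {test: round(score, 3) for test, score in row.items()})
```

In this toy setting the diagonal entries (same dataset for fitting and scoring) are higher than the off-diagonal ones, because the learned message periods and known IDs of one trace do not transfer to the other. This is the kind of dataset dependence that in-dataset results alone would hide.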