MARGIN: Margin-Aware Regularized Geometry for Imbalanced Vulnerability Detection

2026-05-11 • Software Engineering

Software Engineering, Cryptography and Security, Machine Learning

AI summary

The authors address two problems in software vulnerability detection: some vulnerability types are rare (frequency imbalance) and some are harder to detect (difficulty imbalance). They interpret both as distortions in how embeddings are arranged on a hypersphere. To fix this, they propose MARGIN, a method that adapts the margins between class embeddings so that different vulnerability types are better separated. Their experiments show that MARGIN improves classification and detection performance and produces more structured embeddings than existing baselines.

software vulnerability detection, deep learning, embedding geometry, hyperspherical representation, adaptive margin metric learning, von Mises-Fisher distribution, Voronoi cells, geometric regularization, classification, imbalanced datasets

Authors

Yuteng Zhang, Huifang Ma, Jiahui Wei, Qingqing Li, Yafei Yang

Abstract

Software vulnerability detection is critical for ensuring software security and reliability. Despite recent advances in deep learning, real-world vulnerability datasets suffer from two severe challenges: frequency imbalance and difficulty imbalance. We reinterpret these challenges from an embedding geometry perspective, observing that such imbalances induce geometric distortions in hyperspherical representation space. To address this issue, we propose MARGIN, a metric-based framework that learns discriminative vulnerability representations through adaptive margin metric learning and hyperspherical prototype modeling. MARGIN dynamically adjusts geometric regularization according to the distribution structure estimated by the von Mises-Fisher concentration, aligning the probability mass of embedding distributions with their corresponding Voronoi cells, thereby reducing geometric distortion and yielding more stable decision boundaries. Extensive experiments on public vulnerability datasets show that MARGIN consistently outperforms strong baselines, achieving notable improvements in classification and detection, especially on challenging, imbalanced datasets. Further analysis demonstrates that MARGIN produces more structured embedding geometries, improving robustness, interpretability, and generalization.
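The two geometric quantities the abstract relies on, the von Mises-Fisher concentration of a class and the share of its embeddings that fall inside its own Voronoi cell, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`estimate_vmf`, `voronoi_assign`, `cell_mass`, `sample_cluster`), the synthetic 3-dimensional clusters, and the use of class mean directions as prototypes are all assumptions made for illustration. The concentration is estimated with a widely used closed-form approximation, kappa ≈ R(d − R²) / (1 − R²), where R is the length of the mean embedding and d is the dimension.

```python
import math
import random


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def estimate_vmf(embeddings):
    """Return (mean direction, kappa) for a list of unit vectors.

    kappa uses the closed-form approximation R(d - R^2) / (1 - R^2),
    where R is the length of the average embedding.
    """
    n, d = len(embeddings), len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / n for i in range(d)]
    r_bar = math.sqrt(sum(x * x for x in mean))
    if r_bar == 0.0:
        raise ValueError("mean direction is undefined for a zero mean vector")
    mu = [x / r_bar for x in mean]
    denom = 1.0 - r_bar ** 2
    kappa = math.inf if denom <= 1e-12 else r_bar * (d - r_bar ** 2) / denom
    return mu, kappa


def voronoi_assign(x, prototypes):
    """Index of the prototype with the highest cosine similarity to x.

    On the unit sphere this is the spherical Voronoi cell that contains x.
    """
    sims = [sum(a * b for a, b in zip(x, p)) for p in prototypes]
    return max(range(len(prototypes)), key=lambda k: sims[k])


def cell_mass(embeddings, own_index, prototypes):
    """Fraction of a class's embeddings that land in its own Voronoi cell."""
    hits = sum(voronoi_assign(e, prototypes) == own_index for e in embeddings)
    return hits / len(embeddings)


def sample_cluster(center, noise, n, rng):
    """Hypothetical data generator: Gaussian noise around a center, renormalized."""
    return [normalize([c + rng.gauss(0.0, noise) for c in center]) for _ in range(n)]


rng = random.Random(0)
tight = sample_cluster([1.0, 0.0, 0.0], 0.05, 200, rng)    # compact class
diffuse = sample_cluster([0.0, 1.0, 0.0], 0.60, 200, rng)  # scattered class

mu_tight, kappa_tight = estimate_vmf(tight)
mu_diffuse, kappa_diffuse = estimate_vmf(diffuse)
prototypes = [mu_tight, mu_diffuse]  # assumption: prototypes are class mean directions

print("kappa (tight, diffuse):", kappa_tight, kappa_diffuse)
print("own-cell mass (tight, diffuse):",
      cell_mass(tight, 0, prototypes), cell_mass(diffuse, 1, prototypes))
```

In this toy setting the scattered class has a much lower estimated concentration, and some of its embeddings can spill into the neighboring cell. A concentration-dependent regularizer of the kind the abstract describes would presumably treat such a class differently from a compact one, but the exact rule MARGIN uses is not stated here.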
