CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis

2026-04-23 • Cryptography and Security

Cryptography and Security, Software Engineering

AI summary

The authors created a dataset called CrossCommitVuln-Bench that shows 15 real Python security problems (vulnerabilities) introduced across several code changes (commits). Each single change looked safe on its own, so common security tools often missed the big problem until all changes were combined. Their tests showed that analyzing code one commit at a time detected only 13% of these issues, and even looking at the full code together caught just 27%. They provide this dataset and tools openly to help improve methods for finding security issues that develop gradually over multiple code changes.

vulnerability, commit, static analysis, SAST, Python, CVEs, security tools, codebase, Semgrep, Bandit

Authors

Arunabh Majumdar

Abstract

We present CrossCommitVuln-Bench, a curated benchmark of 15 real-world Python vulnerabilities (CVEs) in which the exploitable condition was introduced across multiple commits, each individually benign to per-commit static analysis but collectively critical. We manually annotate each CVE with its contributing commit chain and a structured rationale for why each commit evades per-commit analysis, and we provide baseline evaluations using Semgrep and Bandit in both per-commit and cumulative scanning modes. Our central finding: the per-commit detection rate (CCDR) is 13% across all 15 vulnerabilities, so 87% of chains are invisible to per-commit SAST. Critically, both per-commit detections are qualitatively poor: one occurs on commits framed as security fixes (where developers suppress the alert), and the other detects only the minor hardcoded-key component while completely missing the primary vulnerability (200+ unprotected API endpoints). Even in cumulative mode (full codebase present), the detection rate is only 27%, confirming that snapshot-based SAST tools often miss vulnerabilities whose introduction spans multiple commits. The dataset, annotation schema, evaluation scripts, and reproducible baselines are released under open-source licenses to support research on cross-commit vulnerability detection.
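The reported percentages can be checked with a few lines of arithmetic. The sketch below is not the authors' evaluation script. The function name `detection_rate` and the placeholder flag lists are hypothetical. The count of 2 per-commit detections follows from the abstract's phrase "both per-commit detections"; the count of 4 cumulative detections is an inference, being the only whole number out of 15 that rounds to 27%.

```python
def detection_rate(detected_flags):
    """Fraction of vulnerability chains flagged at least once by a scanner."""
    flags = list(detected_flags)
    if not flags:
        raise ValueError("need at least one chain")
    # True counts as 1 and False as 0, so sum() gives the number of detections.
    return sum(flags) / len(flags)


TOTAL_CHAINS = 15  # benchmark size stated in the abstract

# Placeholder outcomes: only the counts matter here, not which CVE was detected.
# 2 per-commit detections (stated); 4 cumulative detections (inferred from 27%).
per_commit = [True] * 2 + [False] * (TOTAL_CHAINS - 2)
cumulative = [True] * 4 + [False] * (TOTAL_CHAINS - 4)

per_commit_rate = detection_rate(per_commit)
cumulative_rate = detection_rate(cumulative)

print(f"per-commit: {per_commit_rate:.0%}, cumulative: {cumulative_rate:.0%}")
```

Under these assumptions the rates round to 13% and 27%, matching the figures in the abstract; the complement of the per-commit rate gives the 87% of chains described as invisible to per-commit SAST.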
