ATTAIN: Automated Exploit Failure Analysis through Trace-Driven Diff Analysis

2026-06-08 • Software Engineering


AI summary

The authors present ATTAIN, a method for determining which library versions are affected by a vulnerability. It runs an exploit against different versions, detects differences in how they behave, and then uses an LLM-guided tool loop to examine the code changes related to the vulnerability. Compared with earlier approaches that rely only on commits or only on exploits, ATTAIN is more accurate and keeps token usage low. It was evaluated on 224 CVEs and 25,943 library versions, matching or surpassing existing methods even when exploits fail or commit messages are unclear.

exploit, library version, vulnerability, commit-based analysis, SZZ algorithm, trace-driven analysis, diff exploration, large language model (LLM), CVE, F1-score

Authors

Xinwei Mao, Zirui Chen, Xing Hu, Xin Xia

Abstract

Exploits are widely used to check whether a library vulnerability is present in different versions and to mark affected version ranges. Exploit-based checks sometimes fail because exploits stop running on many versions after API or environment changes. Commit-based methods, such as SZZ-style analysis, sometimes miss the correct vulnerability-introducing commits and propagate incorrect labels along long version chains. These problems leave many affected versions unlabeled or mislabeled, and they make manual exploit failure analysis too expensive to be practical at scale. We present ATTAIN, a trace-driven diff analysis framework that assesses vulnerability presence across evolving library versions using three modules: trace construction, diff exploration, and affected-version judgment. The trace construction module executes an exploit across historical library versions and compares their behaviors to capture cross-version execution divergences. Using these divergences, the diff exploration module guides an LLM through a finite-state tool loop that autonomously searches over version changes and collects vulnerability-relevant diff hunks. The affected-version judgment module reasons over the collected evidence to determine whether the vulnerability exists in each version and outputs the affected version range. We evaluate ATTAIN on a dataset of 224 CVEs and 25,943 library versions across 128 libraries. ATTAIN achieves an F1-score of 93.24%, outperforming the commit-based methods V-SZZ and LLM4SZZ by 116.28% and 33.30%, respectively. ATTAIN uses short tool-guided prompts and a fixed number of iterations, which keeps token usage low. It matches or surpasses existing methods on frequent CWE types, including cases where exploit runs fail for non-vulnerability reasons or commit messages do not clearly delimit affected versions.
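
As a rough illustration of the cross-version comparison that the trace construction module is described as performing, the following Python sketch locates the first point at which each version's exploit trace departs from a reference version's trace. This is not ATTAIN's implementation: the trace representation (a list of function-name strings), the helper names `first_divergence` and `divergences_by_version`, and the version labels are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def first_divergence(reference: List[str], candidate: List[str]) -> Optional[int]:
    """Return the index of the first differing trace entry, or None if identical.

    Hypothetical helper: a trace is assumed to be an ordered list of
    function names recorded while the exploit runs.
    """
    for index, (ref_call, cand_call) in enumerate(zip(reference, candidate)):
        if ref_call != cand_call:
            return index
    # One trace is a strict prefix of the other: they diverge where the
    # shorter one ends.
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))
    return None


def divergences_by_version(
    traces: Dict[str, List[str]], reference_version: str
) -> Dict[str, Optional[int]]:
    """Compare every version's trace against the reference version's trace."""
    reference = traces[reference_version]
    return {
        version: first_divergence(reference, trace)
        for version, trace in traces.items()
        if version != reference_version
    }
```

In a setup like this, a version whose result is `None` behaved identically to the reference under the exploit, while an integer index marks where execution diverged and could serve as a starting point for inspecting the diff between the two versions.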
