Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning
2026-05-05 • Software Engineering
AI summary
The authors present a method that uses reinforcement learning (RL) to help Rust static analysis tools distinguish real memory safety issues from false alarms. Their system learns from features of Rust's intermediate code and uses fuzz testing to validate suspicious warnings, reducing misclassifications in the process. This approach improves accuracy and surfaces more true bugs than previous methods, especially when combined with dynamic testing. Overall, the authors show that combining learning and testing can make Rust memory safety tools more reliable and useful for developers.
Static Analysis · Rust Programming Language · Memory Safety · Reinforcement Learning · Mid-level Intermediate Representation (MIR) · False Positives · Fuzz Testing · cargo-fuzz · Bug Detection · Hybrid Static-Dynamic Analysis
Authors
P Akilesh, Leuson Da Silva, Foutse Khomh, Sridhar Chimalakonda
Abstract
Static analysis tools are essential for ensuring memory safety in Rust programs, particularly as Rust gains adoption in safety-critical domains. However, existing tools such as Rudra and MirChecker suffer from high false positive rates, which diminish developer trust, increase manual review effort, and may obscure genuine vulnerabilities. This paper presents a novel reinforcement learning (RL)-based approach for automatically classifying and suppressing spurious warnings in static memory safety analysis for Rust. To achieve this, we design an RL agent that learns a warning suppression policy by extracting contextual features from Rust's Mid-level Intermediate Representation (MIR) and optimizing its decisions through interaction with static analysis outputs. To improve decision quality, we integrate dynamic validation via cargo-fuzz as an auxiliary feedback mechanism, allowing the agent to selectively validate suspicious warnings through targeted fuzz testing. Our evaluation shows that the proposed approach significantly outperforms state-of-the-art LLM-based baselines, achieving 65.2% accuracy and an F1 score of 0.659, an improvement of 17.1% over the best LLM baseline. With a recall of 74.6%, our method successfully identifies nearly three-quarters of true bugs while substantially reducing false positives, improving precision from 25.6% in raw Rudra output to 59.0%. Incorporating dynamic fuzzing further boosts performance, yielding additional improvements of 10.7 percentage points in accuracy and 8.6 percentage points in F1 score over the RL-only variant. Overall, our work demonstrates that combining reinforcement learning with hybrid static-dynamic analysis can substantially reduce false positives and improve the practical usability of memory safety verification tools for Rust.
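The core idea in the abstract can be illustrated with a toy sketch: an RL agent that learns whether to keep or suppress a static-analysis warning, using a fuzzing verdict as its reward signal. Everything below is illustrative and not the paper's actual implementation: the feature names (`unsafe_blocks`, `raw_derefs`, `reachable`), the tabular one-step update, and the `fuzz_oracle` stand-in for selective cargo-fuzz validation are all assumptions made for the sketch.

```python
import random

# Hypothetical MIR-derived features for one warning (names illustrative,
# not the paper's actual feature set): count of unsafe blocks, count of
# raw-pointer dereferences, and whether the flagged code is reachable.
def featurize(warning):
    return (warning["unsafe_blocks"], warning["raw_derefs"], warning["reachable"])

ACTIONS = ("keep", "suppress")  # keep = report as true bug, suppress = false positive

class WarningTriageAgent:
    """Tabular, epsilon-greedy one-step learner over discretized warning
    features -- a toy stand-in for the paper's RL suppression policy."""
    def __init__(self, epsilon=0.1, alpha=0.5):
        self.q = {}            # (state, action) -> estimated value
        self.epsilon = epsilon # exploration rate
        self.alpha = alpha     # learning rate

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward):
        key = (state, action)
        old = self.q.get(key, 0.0)
        # One-step (bandit-style) update toward the observed reward.
        self.q[key] = old + self.alpha * (reward - old)

def fuzz_oracle(warning):
    """Stand-in for dynamic validation via targeted fuzzing: returns True
    when fuzzing would reproduce a crash (warning is a true positive)."""
    return warning["is_true_bug"]

def train(agent, warnings, episodes=200):
    """Reward +1 when the agent's action agrees with the fuzzing verdict,
    -1 otherwise."""
    for _ in range(episodes):
        for w in warnings:
            s = featurize(w)
            a = agent.act(s)
            truth = "keep" if fuzz_oracle(w) else "suppress"
            agent.update(s, a, 1.0 if a == truth else -1.0)
```

In this simplification the fuzzing verdict is queried on every warning; the paper instead uses fuzzing selectively, as auxiliary feedback, since dynamic validation is far more expensive than a policy lookup.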