Neuro-Symbolic Software Verification: Hyper-charging Local Language Models with Symbolic Reasoning at Scale

2026-06-15 • Software Engineering

Software Engineering

AI summaryⓘ

The authors address the challenge of loop invariant synthesis, which is important for software verification but difficult to solve. They created VerIbmc, a tool that combines symbolic methods with open, locally-run language models to avoid relying on costly or private cloud services. Their approach uses feedback from a verifier to improve solutions iteratively, trying two prompting styles called Chain-of-Thought and Tree-of-Thought. Testing on many problems showed their best setup solved 86% of them and performed well compared to cloud-based tools, all while running on a single local machine without needing proprietary APIs. This shows it's possible to do effective and private invariant synthesis with open models and symbolic techniques.

loop invariant synthesisformal software verificationneuro-symbolic methodslarge language modelsopen-weight modelssymbolic invariant generationESBMC verifierChain-of-Thought promptingTree-of-Thought promptingprivacy-preserving computation

Authors

Muhammad A. A. Pirzada, Julian Parsert, Weiqi Wang, Konstantin Korovin, Lucas C. Cordeiro

Abstract

Loop invariant synthesis remains a central and pivotal bottleneck in formal software verification. Recent LLM-based Neuro-Symbolic tools have achieved impressive solve rates. However, these tools rely on proprietary, often expensive cloud APIs, which constitute a hurdle for privacy-sensitive industrial deployments where the source code cannot leave the organisation or where cost is a factor. We present VerIbmc, a neuro-symbolic pipeline that pairs symbolic invariant generation with locally deployable open-weight language models with the ESBMC verification tool. Our pipeline combines a deterministic symbolic invariant synthesis phase with an iterative LLM refinement loop driven by structured verifier feedback. In addition, we provide two types of pipelines that differ in their prompting strategy: Chain-of-Thought vs. Tree-of-Thought. We conduct an extensive experimental evaluation with five open-weight models (ranging from 7B to 120B parameters) across five benchmark families comprising of 520 problems (499 after excluding 21 with unavoidable overflow). Overall, the best single configuration (GPT-OSS-120B) solves 431 of 499 problems (86.4%). Additionally, on the four benchmark suites shared with the strongest cloud-API tools, VerIbmc is competitive running only on a single local machine. The evaluation shows symbolic invariant synthesis solves 75 problems without any LLM call and yields up to +35 additional problems for the weakest model. Importantly, all inference runs entirely on a single local machine using open-weight models -- no cloud API or proprietary model is required. Overall, we demonstrate that a neuro-symbolic approach based on LLMs can be used effectively for invariant synthesis in a privacy-preserving and energy-efficient manner, without having to resort to expensive proprietary frontier models locked behind APIs.

View PDFOpen arXiv