Extending Confidence-Based Text2Cypher with Grammar and Schema Aware Filtering

2026-05-11 | Computation and Language

AI summary

The authors studied how to make large language models generate more reliable Cypher database queries by adding rule-based checks, applied after a query is generated, that validate its grammar and its consistency with the database schema. They found that grammar checks improve the syntactic validity of the queries, and that schema checks further improve execution quality by ensuring the queries match the database structure. However, stricter checks also leave more questions without any usable query. Their work shows that simple rule checks can improve reliability when turning natural language questions into Cypher queries, and it helps explain how grammar rules and schema rules affect results differently.

Large Language Models, Text2Cypher, Natural Language Query, Grammar Validation, Schema Constraints, Database Schema, Syntax Validity, Query Execution, Inference Filtering, Confidence Scoring
Authors
Makbule Gulcin Ozsoy
Abstract
Large language models (LLMs) allow users to query databases using natural language by translating questions into executable queries. Despite strong progress on tasks such as Text2SQL, Text2SPARQL, and Text2Cypher, most existing methods focus on better prompting, fine-tuning, or iterative refinement and often do not explicitly enforce structural constraints such as syntactic validity and schema consistency. This can reduce reliability, since a generated query must satisfy both the syntax rules of the query language and the constraints of the database schema to be executable. In this work, we study how structured constraints can be used in test-time inference for Text2Cypher, focusing on post-generation validation to improve query correctness. We extend a confidence-based inference framework with a sequential filtering process that combines confidence scoring, grammar validation, and schema constraints before final aggregation. This lets us analyze how different constraint types affect generated queries. Our experiments with two instruction-tuned models show that grammar-based filtering improves syntactic validity. Schema-aware filtering further improves execution quality by enforcing consistency with the database structure. However, stronger filtering also increases the number of empty predictions and reduces execution coverage. Overall, we show that adding simple structural checks at test time improves the reliability of Text2Cypher generation, and we provide a clearer view of how syntax and schema constraints contribute differently.
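
To make the sequential filtering idea concrete, here is a minimal Python sketch of a confidence, grammar, and schema filter followed by aggregation. It is an illustration under stated assumptions, not the author's implementation: the `Candidate` type, the `min_confidence` threshold, the regex-based grammar and schema checks, and the confidence-weighted vote used for aggregation are all hypothetical stand-ins. A real system would more plausibly use a full Cypher parser and the actual database schema.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    query: str
    confidence: float  # hypothetical score, e.g. a mean token probability


def passes_grammar(query: str) -> bool:
    """Toy stand-in for a Cypher parser: balanced brackets plus MATCH ... RETURN."""
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in query:
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    if stack:
        return False
    return re.search(r"\bMATCH\b.*\bRETURN\b", query, re.IGNORECASE | re.DOTALL) is not None


# Assumed patterns: capture the first label in (var:Label) and the type in [var:TYPE].
LABEL_RE = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
REL_RE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def passes_schema(query: str, labels: set, rel_types: set) -> bool:
    """Reject queries that mention labels or relationship types absent from the schema."""
    return set(LABEL_RE.findall(query)) <= labels and set(REL_RE.findall(query)) <= rel_types


def select_query(candidates, labels, rel_types, min_confidence=0.5) -> str:
    """Apply the three filters in sequence, then aggregate the survivors."""
    kept = [c for c in candidates if c.confidence >= min_confidence]
    kept = [c for c in kept if passes_grammar(c.query)]
    kept = [c for c in kept if passes_schema(c.query, labels, rel_types)]
    if not kept:
        return ""  # empty prediction: every candidate was filtered out
    votes = Counter()
    for c in kept:
        votes[c.query] += c.confidence  # assumed aggregation: confidence-weighted vote
    return votes.most_common(1)[0][0]


LABELS = {"Person", "Movie"}
REL_TYPES = {"ACTED_IN"}
candidates = [
    Candidate("MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN m.title", 0.90),
    Candidate("MATCH (p:Person)-[:ACTED_IN]->(m:Film) RETURN m.title", 0.95),  # unknown label
    Candidate("MATCH (p:Person-[:ACTED_IN]->(m:Movie) RETURN m.title", 0.80),  # unbalanced
    Candidate("MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN m.title", 0.30),  # low confidence
]
best = select_query(candidates, LABELS, REL_TYPES)
print(best)
```

In this toy example only the first candidate survives all three stages. Calling `select_query(candidates, set(), set())` with an empty schema rejects every candidate and returns an empty string, which mirrors the trade-off noted in the abstract: stricter filtering yields more empty predictions.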