EntSQL: A Benchmark for Grounding Text-to-SQL in Long-Context Enterprise Knowledge

2026-06-02 • Computation and Language

AI summary

The authors created a new benchmark called EntSQL to measure how well language models can turn natural language questions into SQL queries when the answer depends on long, private business documents. It targets real company settings where internal metrics, reporting conventions, and organizational rules determine the correct query. EntSQL contains 1,066 aligned Chinese-English examples across five business domains, and most of them require knowledge beyond the question and schema as well as complex SQL structures. Even the best system evaluated reaches only 15.9% on English inputs when the long-form documents are provided, which shows how hard the problem is.

Text-to-SQL, SQL generation, enterprise knowledge, benchmark, schema generalization, long-context grounding, semantic examples, business domains, natural language processing, large language models

Authors

Chengxi Liao, Tao Xu, Zulong Chen, Chuanfei Xu, Yiyan Wang, Xinyun Wang, Yanlong Zhang, Xiaojun Chen, Zhibo Yang, Zeyi Wen

Abstract

Text-to-SQL enables natural language access to databases, and recent LLMs have substantially advanced its capabilities. Existing benchmarks such as Spider, BIRD, and Spider 2.0 evaluate schema generalization, large-scale databases, and realistic workflows, but largely overlook enterprise scenarios where SQL generation depends on private business knowledge, such as internal metrics, reporting conventions, and organizational rules. We introduce EntSQL, an enterprise-oriented Text-to-SQL benchmark for evaluating long-context grounding over proprietary business documents. EntSQL contains 1,066 aligned Chinese-English semantic examples across five business domains, with most examples requiring domain knowledge beyond the question and schema and involving complex SQL structures. On English inputs, the best evaluated system reaches only 15.9% when long-form documents are provided, highlighting the difficulty of grounding SQL generation in enterprise knowledge.
