Bespoke-Card: Why Tune When You Can Generate? Synthesizing Workload-Specific Cardinality Estimators

2026-06-08 · Databases

AI summary

The authors present Bespoke-Card, a system that generates custom cardinality estimators: code that predicts how many rows a query or subplan will return. Instead of relying on generic statistics, it synthesizes estimators tailored to a known schema and workload, using agents that design and implement them and a validator that scores them against true cardinalities. Injected into the PostgreSQL optimizer, these estimates cut total runtime on the Join Order Benchmark (JOB) by 33% and reduce median q-error over all JOB subplans by 41%. Synthesizing a strong estimator takes under one hour and costs less than $10, which the authors position as a new direction alongside classical generic estimators and learned estimators.

cardinality estimation, query optimizer, PostgreSQL, workload-specific models, q-error, regression analysis, query planning, database performance, learned estimators, subplan
Authors
Johannes Wehrstein, Anton Winter, Timo Eckmann, Carsten Binnig
Abstract
Cardinality estimators are built to support arbitrary schemas and workloads, forcing them to rely on generic statistics even when the schema and workload are known in advance, which leaves optimizers prone to large errors and poor plans. We present Bespoke-Card, an agent-driven system that synthesizes workload-specific cardinality estimators as executable code: a planning agent designs the estimator strategies, a coding agent implements them, and a validator scores the estimates against true cardinalities and PostgreSQL estimates, forming a robust and deterministic harness. Going beyond naive prompting, Bespoke-Card uses structured q-error feedback, regression analysis, concrete outlier subplans, a curriculum isolating join-only, filter-only, and full-subplan errors, and archival selection of the best implementation. Injecting its estimates into the optimizer cuts total PostgreSQL runtime on JOB by 33% and reduces median q-error over all JOB subplans by 41%, while synthesizing a strong estimator takes under one hour and costs less than $10. Bespoke-Card opens a new avenue for cardinality estimation alongside classical generic estimators and learned estimator architectures.
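
To make the q-error metric and the idea of archival selection concrete, here is a minimal Python sketch. It uses the standard definition of q-error, max(estimate/truth, truth/estimate), and picks the archived candidate with the lowest median q-error. The function names, the clamping of cardinalities to at least one row, the selection criterion, and all numbers are illustrative assumptions; the abstract does not specify how Bespoke-Card's validator or archive is implemented.

```python
from statistics import median


def q_error(estimate: float, truth: float) -> float:
    """Factor by which an estimate is off; 1.0 means a perfect estimate."""
    # Assumption: clamp both values to at least one row so that
    # empty results do not cause a division by zero.
    est = max(float(estimate), 1.0)
    tru = max(float(truth), 1.0)
    return max(est / tru, tru / est)


def median_q_error(estimates, truths) -> float:
    """Median q-error over a set of subplans."""
    return median(q_error(e, t) for e, t in zip(estimates, truths))


def select_best(archive: dict, truths) -> str:
    """Name of the archived estimator with the lowest median q-error.

    Hypothetical selection rule, not necessarily the one used in the paper.
    """
    return min(archive, key=lambda name: median_q_error(archive[name], truths))


# Made-up true cardinalities and estimates for four subplans.
truths = [100, 2_000, 50, 10_000]
archive = {
    "baseline": [10, 20_000, 50, 1_000],
    "candidate_v1": [80, 2_500, 100, 9_000],
    "candidate_v2": [100, 1_000, 50, 20_000],
}

best = select_best(archive, truths)
print(best, median_q_error(archive[best], truths))
```

Because q-error is symmetric, a tenfold underestimate and a tenfold overestimate are penalized equally, which is why it is a common choice for comparing estimators whose errors span several orders of magnitude.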