A Set-Theoretic Approach to Detecting Logic Bugs in DBMS Inner Join Optimizations

2026-06-22Databases

DatabasesSoftware Engineering
AI summary

The authors focus on the part of a database that figures out the best way to answer questions, especially when combining data from multiple tables. Because this process is complicated, mistakes that cause wrong answers (logical bugs) can happen and are hard to find. They created a tool called JoinEquiv that changes queries in smart ways based on math rules to check if the database gives the same correct answers. With this method, they found many new bugs in popular databases, showing that their approach helps make databases more reliable.

query optimizerjoin optimizationINNER JOINlogical bugsmetamorphic testingset theoryquery transformationDBMSnatural jointesting oracle
Authors
Ce Lyu, Changzheng Wei, Yanhao Wang, Jie Liang, Li Lin, Hanghang Wu, Minghao Zhao, Ying Yan, Aoying Zho
Abstract
The query optimizer is a fundamental component of database management systems that determines the most efficient execution strategy for a given query by evaluating alternative query plans. Among its tasks, join optimization plays a central role, as the order of joins in multi-table queries can significantly affect execution performance. However, due to the inherent complexity of join optimization, logical bugs are inevitable and often difficult to detect. While existing fuzzing tools have shown notable success in uncovering crash- and performance-related errors, effectively identifying logical bugs -- cases in which the system produces incorrect query results -- remains largely unresolved. In this paper, we propose a metamorphic testing approach to detect DBMS bugs related to INNER JOIN optimization through the lens of set theory. For each testing case, equivalent queries are generated based on a basic set operation -- intersection -- and three semantics-preserving transformation rules, i.e., symmetric join transformation, asymmetric difference transformation, and symmetric difference transformation, are introduced. These rules rewrite a simple NATURAL/INNER JOIN query into a more complex, yet semantically equivalent, form. We implement this design in JoinEquiv, which serves as a testing oracle to systematically uncover logical inconsistencies in DBMS query processing by comparing the results of original and transformed queries. Using JoinEquiv, we uncovered 29 previously unknown issues in mainstream DBMSs (MySQL, TiDB, DuckDB, and Percona), and 27 of them were officially confirmed. JoinEquiv reveals deep logical flaws in DBMS optimizers and executors, underscoring its value in enhancing DBMS robustness.