Names Are All You Need: Effective and Safe Regression Test Selection for Python

2026-05-25 • Software Engineering


AI summary

The authors present NameRTS, a new way to speed up testing in Python by only running tests affected by recent code changes. Because Python is dynamically typed, it is hard to track which tests to run using traditional methods, so the authors use a graph that links code elements with their names to decide which tests might be impacted. NameRTS improves testing efficiency by skipping many unnecessary tests while mostly avoiding missed tests, outperforming a previous method called BabelRTS. They also created a new dataset to fairly evaluate their approach. Overall, their method helps make Python testing faster and more reliable.

Regression Test Selection, Dynamic Typing, Call Graph, Dependency Analysis, Python, Eager Importing, Reachability, Test Coverage, Software Testing, Pruning Strategies

Authors

You Wang, Michael Pradel, Zhongxin Liu

Abstract

Regression test selection (RTS) reduces the cost of regression testing by executing only those tests affected by a code change. Although RTS has been studied extensively for statically typed languages, achieving effective and safe RTS in Python remains challenging. Python's dynamic typing makes precise call-graph construction difficult, which can cause call-graph-based RTS to miss affected tests. Python's eager importing mechanism, in contrast, renders file-level dependency analysis overly conservative. This paper presents NameRTS, the first Python RTS approach based on fine-grained dependency analysis. NameRTS models a Python program as a bipartite graph of code element nodes and name nodes, with edges capturing definitions and references. RTS is formulated as a reachability problem on this graph: a test is selected if any modified code element is reachable from the names used in that test. This design avoids call-graph construction and permits a conservative analysis that favors safety. To control the dependency cascades introduced by coarse name matching, NameRTS applies two pruning strategies that leverage prior test executions and context information to refine name matching. To evaluate NameRTS, we construct the first Python RTS dataset with a ground truth indicating which test files are affected by each commit. We compare NameRTS with the best-performing baseline, BabelRTS, an RTS technique based on coarse file-level dependencies. On this benchmark, NameRTS skips 69.90% of test files on average, outperforming BabelRTS by 146.5%. It also reduces end-to-end testing time by 45.59%, a 107.7% improvement over BabelRTS. In terms of safety, NameRTS selects all affected tests for 99.6% of commits, with only rare misses in exceptional cases. In contrast, BabelRTS is safe for only 76.6% of commits. These results demonstrate the effectiveness of NameRTS, paving the way for more efficient regression testing in Python.
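The reachability formulation in the abstract can be illustrated with a minimal sketch. This is not the NameRTS implementation: the class `NameGraph`, the function `select_tests`, and all file and element names below are hypothetical, names are matched by plain string equality, and the two pruning strategies are omitted. It only shows how selection over a bipartite graph of names and code elements might work under those assumptions.

```python
from collections import defaultdict, deque


class NameGraph:
    """Hypothetical bipartite graph linking name nodes and code element nodes."""

    def __init__(self):
        # name -> code elements that define something under that name
        self.defines = defaultdict(set)
        # code element -> names referenced inside that element
        self.references = defaultdict(set)

    def add_definition(self, element, name):
        self.defines[name].add(element)

    def add_reference(self, element, name):
        self.references[element].add(name)

    def reachable_elements(self, start_names):
        """Breadth-first search: name -> defining elements -> referenced names."""
        seen_names = set(start_names)
        seen_elements = set()
        queue = deque(seen_names)
        while queue:
            name = queue.popleft()
            for element in self.defines.get(name, ()):
                if element in seen_elements:
                    continue
                seen_elements.add(element)
                for ref in self.references.get(element, ()):
                    if ref not in seen_names:
                        seen_names.add(ref)
                        queue.append(ref)
        return seen_elements


def select_tests(graph, test_names, modified):
    """Select each test from whose names some modified element is reachable."""
    modified = set(modified)
    return sorted(
        test
        for test, names in test_names.items()
        if graph.reachable_elements(names) & modified
    )


# Toy program (all identifiers are made up for illustration).
graph = NameGraph()
graph.add_definition("pkg/a.py::Parser.parse", "parse")
graph.add_definition("pkg/b.py::Lexer.parse", "parse")  # same name, other class
graph.add_definition("pkg/c.py::tokenize", "tokenize")
graph.add_definition("pkg/d.py::render", "render")
graph.add_reference("pkg/a.py::Parser.parse", "tokenize")

# Names used by each test file.
test_names = {
    "test_parser.py": {"parse"},
    "test_render.py": {"render"},
}

print(select_tests(graph, test_names, {"pkg/c.py::tokenize"}))
print(select_tests(graph, test_names, {"pkg/b.py::Lexer.parse"}))
```

In this toy setup, a change to `Lexer.parse` also selects `test_parser.py`, even if that test only exercises `Parser.parse`, because both elements share the name `parse`. This is the kind of cascade from coarse name matching that the abstract says the pruning strategies are meant to control.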
