RecourseBench: A Modular Framework for Reproducible Algorithmic Recourse Evaluation
2026-06-15 • Artificial Intelligence
Artificial Intelligence, Machine Learning
AI summary
The authors developed RecourseBench, a tool for fairly comparing algorithmic recourse methods, which suggest how a person can overturn an unfavorable decision made by a model. The system splits the evaluation pipeline into separate parts (data, preprocessing, models, recourse methods, and evaluation), making it easier to add or swap components. The authors also created automated tests that check whether each method reproduces its originally reported results, which improves trust in comparisons. Additionally, RecourseBench offers an interactive web interface for exploring and comparing methods across datasets and models.
algorithmic recourse, counterfactual explanations, benchmark, reproducibility, model evaluation, automated testing, data preprocessing, machine learning models, interactive interface, dynamic registry
Authors
Zahra Khotanlou, Hashir Ahmed, Chenghao Tan, Ahmed Abdelaal, Amir-Hossein Karimi
Abstract
Algorithmic recourse methods provide counterfactual explanations that inform individuals of the actions required to overturn an unfavorable model decision. Despite rapid methodological progress, principled comparison remains elusive: existing frameworks are often difficult to extend, lack interoperability, and do not systematically verify that integrated methods faithfully reproduce their originally reported results. We introduce RecourseBench, a unified evaluation framework built around three commitments: modularity, reproducibility, and interactivity. The framework decomposes the evaluation pipeline into five fully decoupled layers (Data, Preprocessing, Model, Recourse Method, and Evaluation), governed by abstract interfaces and a dynamic registry. To close the reproducibility gap in prior benchmarks, we introduce a four-tier classification system in which every integrated method is validated by an automated test suite against its originally reported results. We further provide an interactive web interface for flexible, configuration-driven comparison across methods, datasets, and model architectures. The framework currently integrates 28 state-of-the-art recourse methods and, to our knowledge, is the first recourse benchmark to explicitly enforce method-level reproducibility through automated, quantitative testing.
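The combination of abstract interfaces, a dynamic registry, and automated checks against reported results can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not RecourseBench's actual API: the names `RecourseMethod`, `register_method`, `build_method`, `ToyLineSearch`, and `matches_reported`, the 5% relative tolerance, and the metric names are all hypothetical. The sketch covers only the Recourse Method layer and the pass/fail comparison that such a reproducibility test might perform. It does not attempt to model the four tiers, which the abstract does not define.

```python
import math
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type

# Hypothetical registry mapping a config-level name to a method class.
_METHOD_REGISTRY: Dict[str, Type["RecourseMethod"]] = {}


def register_method(name: str) -> Callable[[Type["RecourseMethod"]], Type["RecourseMethod"]]:
    """Class decorator that adds a RecourseMethod subclass to the registry."""
    def decorator(cls: Type["RecourseMethod"]) -> Type["RecourseMethod"]:
        if name in _METHOD_REGISTRY:
            raise ValueError(f"method {name!r} is already registered")
        _METHOD_REGISTRY[name] = cls
        return cls
    return decorator


class RecourseMethod(ABC):
    """Hypothetical abstract interface that every recourse method implements."""

    def __init__(self, model: Callable[[List[float]], int]):
        # The model returns 1 for a favorable decision and 0 otherwise.
        self.model = model

    @abstractmethod
    def get_counterfactual(self, x: List[float]) -> List[float]:
        """Return a modified input that the model classifies favorably."""


@register_method("toy_line_search")
class ToyLineSearch(RecourseMethod):
    """Toy method: raise the first feature in fixed steps until the decision flips."""

    def get_counterfactual(self, x: List[float], step: float = 0.5,
                           max_steps: int = 100) -> List[float]:
        cf = list(x)  # copy so the original input is left unchanged
        for _ in range(max_steps):
            if self.model(cf) == 1:
                return cf
            cf[0] += step
        raise RuntimeError("no counterfactual found within max_steps")


def build_method(name: str, model: Callable[[List[float]], int]) -> RecourseMethod:
    """Instantiate a registered method by name, as a config-driven pipeline might."""
    if name not in _METHOD_REGISTRY:
        raise KeyError(f"unknown method {name!r}")
    return _METHOD_REGISTRY[name](model)


def matches_reported(reproduced: Dict[str, float], reported: Dict[str, float],
                     rel_tol: float = 0.05) -> bool:
    """True if every reported metric is reproduced within a relative tolerance."""
    return all(
        key in reproduced and math.isclose(reproduced[key], value, rel_tol=rel_tol)
        for key, value in reported.items()
    )


if __name__ == "__main__":
    toy_model = lambda x: int(x[0] + x[1] > 2.0)  # favorable if the features sum above 2
    method = build_method("toy_line_search", toy_model)
    print(method.get_counterfactual([0.0, 1.0]))  # [1.5, 1.0]
    print(matches_reported({"validity": 0.97}, {"validity": 0.98}))  # True
```

Because methods are looked up by name, adding a new method only requires a new decorated subclass. The pipeline code that calls `build_method` and the evaluation code that calls `matches_reported` stay unchanged, which is the kind of decoupling the layered design aims for.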