Do Metrics for Counterfactual Explanations Align with User Perception?

2026-03-16

Artificial Intelligence, Human-Computer Interaction
AI summary

The authors studied whether the common ways of measuring the quality of counterfactual explanations in AI actually match what people consider a good explanation. They asked participants to rate explanations along several quality dimensions and compared these ratings to standard algorithmic metrics. They found that the usual metrics often align poorly with human judgments and that combining many metrics helps little. This suggests current measures may miss important parts of what makes an explanation useful to people. The authors argue that evaluations of AI explanations should put more weight on human perspectives.
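For readers unfamiliar with the "standard algorithmic metrics" mentioned above: typical examples in the counterfactual-explanation literature are validity, proximity, sparsity, and plausibility. The paper's exact definitions are not reproduced here; the sketch below shows common textbook formulations, and all function names and formulas are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (not the authors' code) of four algorithmic metrics
# commonly used to score a counterfactual x_cf against the original
# instance x_orig; the names and formulations here are illustrative.
import numpy as np

def validity(predict_fn, x_cf, target_class):
    """1.0 if the counterfactual actually flips the model's prediction."""
    return float(predict_fn(x_cf.reshape(1, -1))[0] == target_class)

def proximity(x_orig, x_cf):
    """L1 distance: how far the counterfactual moved from the original."""
    return float(np.abs(x_orig - x_cf).sum())

def sparsity(x_orig, x_cf, eps=1e-8):
    """Number of features the counterfactual changed."""
    return int((np.abs(x_orig - x_cf) > eps).sum())

def plausibility(x_cf, train_data, k=5):
    """Mean distance to the k nearest training points; smaller values
    suggest the counterfactual lies close to the data manifold."""
    dists = np.sort(np.linalg.norm(train_data - x_cf, axis=1))
    return float(dists[:k].mean())
```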

counterfactual explanations, explainable AI, human judgment, algorithmic metrics, evaluation metrics, trustworthy AI, explanation quality, human-centered evaluation, empirical study
Authors
Felix Liedeker, Basil Ell, Philipp Cimiano, Christoph Düsing
Abstract
Explainability is widely regarded as essential for trustworthy artificial intelligence systems. However, the metrics commonly used to evaluate counterfactual explanations are algorithmic evaluation metrics that are rarely validated against human judgments of explanation quality. This raises the question of whether such metrics meaningfully reflect user perceptions. We address this question through an empirical study that directly compares algorithmic evaluation metrics with human judgments across three datasets. Participants rated counterfactual explanations along multiple dimensions of perceived quality, which we relate to a comprehensive set of standard counterfactual metrics. We analyze both individual relationships and the extent to which combinations of metrics can predict human assessments. Our results show that correlations between algorithmic metrics and human ratings are generally weak and strongly dataset-dependent. Moreover, increasing the number of metrics used in predictive models does not lead to reliable improvements, indicating structural limitations in how current metrics capture criteria relevant for humans. Overall, our findings suggest that widely used counterfactual evaluation metrics fail to reflect key aspects of explanation quality as perceived by users, underscoring the need for more human-centered approaches to evaluating explainable artificial intelligence.
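To make the abstract's analysis concrete, here is a hedged sketch of its two steps: per-metric rank correlations against human ratings, and predictive models over growing metric subsets. The column names, the randomly generated stand-in data, and the specific choices (Spearman's rho, linear regression with cross-validated R^2) are assumptions for illustration, not the authors' published pipeline.

```python
# Hypothetical sketch of the two analyses the abstract describes:
# (1) individual metric-to-rating correlations, (2) whether adding
# metrics to a predictive model reliably improves fit.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# One row per explanation: metric scores plus a mean human rating.
# Random placeholder data; in the study these come from real
# counterfactuals and participant ratings on three datasets.
metrics = ["validity", "proximity", "sparsity", "plausibility"]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 5)), columns=metrics + ["human_rating"])

# Step 1: individual relationships via Spearman's rank correlation.
for m in metrics:
    rho, p = spearmanr(df[m], df["human_rating"])
    print(f"{m:>12}: rho={rho:+.2f} (p={p:.3f})")

# Step 2: does predictive power grow with the number of metrics used?
for k in range(1, len(metrics) + 1):
    X, y = df[metrics[:k]], df["human_rating"]
    r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()
    print(f"first {k} metric(s): cross-validated R^2 = {r2:.2f}")
```

Under this framing, the paper's finding corresponds to the correlations in step 1 being weak and dataset-dependent, and the R^2 in step 2 not improving reliably as k grows.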