Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

2026-05-01

Machine Learning, Artificial Intelligence, Multiagent Systems
AI summary

The authors introduce a new way to ensure fairness in a complex learning problem where decisions involve groups of options, but feedback is limited to the overall result rather than individual contributions. They adapt the Shapley value from game theory into a version called the K-Shapley value, which measures contributions within groups of limited size. They then design an algorithm, K-SVFair-FBF, that learns these values and balances fairness against learning efficiency despite noisy feedback. Their method shows promising results in experiments on federated learning and social influence maximization.

Keywords
multi-armed bandits, full-bandit feedback, Shapley value, K-Shapley value, fairness, regret bound, Monte Carlo approximation, budgeted combinatorial bandits, federated learning, social influence maximization
Authors
Shradha Sharma, Swapnil Dhamal, Shweta Jain
Abstract
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, full-bandit feedback does not reveal the contributions of individual arms, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the $K$-Shapley value, which captures the marginal contribution of an agent restricted to coalitions of size at most $K$. We show that the $K$-Shapley value is the unique solution concept satisfying the symmetry, linearity, null player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates the $K$-Shapley value when the valuation function is unknown. Unlike standard bandit algorithms for full-bandit feedback, K-SVFair-FBF not only learns the valuation function under full-bandit feedback but also mitigates the noise arising from Monte Carlo approximation. Theoretically, we prove that K-SVFair-FBF achieves an $O(T^{3/4})$ bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and outperforms existing baselines.
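For intuition, one plausible formalization of the $K$-Shapley value consistent with the abstract's description is a stratified average of marginal contributions over coalitions of size at most $K-1$; the paper's exact definition and normalization (chosen so that its efficiency axiom holds) may differ, and the symbols $\phi_i^K$, $N$, $n$ below are our notation, not necessarily the paper's:

$$
\phi_i^{K}(v) \;=\; \frac{1}{K}\sum_{k=0}^{K-1}\binom{n-1}{k}^{-1}\sum_{\substack{S \subseteq N \setminus \{i\} \\ |S| = k}}\bigl[v(S \cup \{i\}) - v(S)\bigr],
$$

where $N$ is the set of $n$ arms and $v$ is the valuation function. A minimal Monte Carlo sketch of the corresponding estimator follows, assuming access to a black-box valuation `v`; in the actual bandit setting `v` is unknown and only observed as noisy full-bandit feedback, which is exactly the extra noise K-SVFair-FBF must control. All names and defaults here are illustrative, not the paper's implementation:

```python
import random

def mc_k_shapley(v, arms, i, K, num_samples=1000, rng=random):
    """Monte Carlo estimate of arm i's K-Shapley value (stratified form above).

    v            -- black-box valuation on frozensets of arms; hypothetical
                    stand-in for noisy full-bandit feedback on a super-arm.
    arms         -- iterable of all arms.
    i            -- the arm whose contribution is being estimated.
    K            -- budget: coalitions containing i have size at most K.
    num_samples  -- Monte Carlo sample count (estimation noise vs. cost).
    """
    others = [a for a in arms if a != i]
    total = 0.0
    for _ in range(num_samples):
        k = rng.randrange(K)                  # coalition size in {0, ..., K-1}
        S = frozenset(rng.sample(others, k))  # uniform subset of that size
        total += v(S | {i}) - v(S)            # marginal contribution of arm i
    return total / num_samples


# Toy check: in an additive game each arm's K-Shapley value is its own weight.
weights = {0: 1.0, 1: 2.0, 2: 0.5, 3: 3.0}
v = lambda S: sum(weights[a] for a in S)
print(mc_k_shapley(v, list(weights), i=1, K=3, num_samples=5000))  # ~2.0
```

Sampling the coalition size uniformly and then a subset uniformly within that size matches the stratified form above; the sample count trades Monte Carlo noise against evaluation cost, the same trade-off the algorithm's regret analysis must account for.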