Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

2026-05-01

Machine Learning, Artificial Intelligence, Multiagent Systems
AI summary

The authors introduce a new way to ensure fairness in a complex learning problem where decisions involve groups of options, but feedback is limited to the overall result rather than individual contributions. They adapt the Shapley value from game theory into a version called the K-Shapley value, which measures contributions within groups of limited size. They then design an algorithm, K-SVFair-FBF, that learns these values and balances fairness against learning efficiency despite noisy feedback. Their method shows promising results in experiments on federated learning and social influence maximization.

Keywords
multi-armed bandits, full-bandit feedback, Shapley value, K-Shapley value, fairness, regret bound, Monte Carlo approximation, budgeted combinatorial bandits, federated learning, social influence maximization
Authors
Shradha Sharma, Swapnil Dhamal, Shweta Jain
Abstract
We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, full-bandit feedback does not reveal the contributions of individual arms, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the $K$-Shapley value, which captures the marginal contribution of an agent restricted to coalitions of size at most $K$. We show that the $K$-Shapley value is the unique solution concept satisfying the symmetry, linearity, null player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates the $K$-Shapley value when the valuation function is unknown. Unlike standard bandit algorithms for full-bandit feedback, K-SVFair-FBF not only learns the valuation function under full-bandit feedback but also mitigates the noise arising from Monte Carlo approximation. Theoretically, we prove that K-SVFair-FBF achieves an $O(T^{3/4})$ bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and outperforms existing baselines.
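For intuition, one plausible formalization of the $K$-Shapley value consistent with the abstract's description is a stratified average of marginal contributions over coalitions of size at most $K-1$; the paper's exact definition and normalization (chosen so that its efficiency axiom holds) may differ, and the symbols $\phi_i^K$, $N$, $n$ below are our notation, not necessarily the paper's:

$$
\phi_i^{K}(v) \;=\; \frac{1}{K}\sum_{k=0}^{K-1}\binom{n-1}{k}^{-1}\sum_{\substack{S \subseteq N \setminus \{i\} \\ |S| = k}}\bigl[v(S \cup \{i\}) - v(S)\bigr],
$$

where $N$ is the set of $n$ arms and $v$ is the valuation function. A minimal Monte Carlo sketch of the corresponding estimator follows, assuming access to a black-box valuation `v`; in the actual bandit setting `v` is unknown and only observed as noisy full-bandit feedback, which is exactly the extra noise K-SVFair-FBF must control. All names and defaults here are illustrative, not the paper's implementation:

```python
import random

def mc_k_shapley(v, arms, i, K, num_samples=1000, rng=random):
    """Monte Carlo estimate of arm i's K-Shapley value (stratified form above).

    v            -- black-box valuation on frozensets of arms; hypothetical
                    stand-in for noisy full-bandit feedback on a super-arm.
    arms         -- iterable of all arms.
    i            -- the arm whose contribution is being estimated.
    K            -- budget: coalitions containing i have size at most K.
    num_samples  -- Monte Carlo sample count (estimation noise vs. cost).
    """
    others = [a for a in arms if a != i]
    total = 0.0
    for _ in range(num_samples):
        k = rng.randrange(K)                  # coalition size in {0, ..., K-1}
        S = frozenset(rng.sample(others, k))  # uniform subset of that size
        total += v(S | {i}) - v(S)            # marginal contribution of arm i
    return total / num_samples


# Toy check: in an additive game each arm's K-Shapley value is its own weight.
weights = {0: 1.0, 1: 2.0, 2: 0.5, 3: 3.0}
v = lambda S: sum(weights[a] for a in S)
print(mc_k_shapley(v, list(weights), i=1, K=3, num_samples=5000))  # ~2.0
```

Sampling the coalition size uniformly and then a subset uniformly within that size matches the stratified form above; the sample count trades Monte Carlo noise against evaluation cost, the same trade-off the algorithm's regret analysis must account for.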