FairBED: A Bayesian Experimental Design Approach to Gathering Fairer Data

2026-06-22

Machine Learning
AI summary

The authors point out that machine learning models can be unfair because the data used to train them may already be biased. Rather than only correcting models after the fact, they propose changing how data is collected so that it is fairer from the start. Their method, FairBED, selects data that is informative about the prediction target while revealing as little as possible about sensitive attributes. Models trained on data gathered this way achieve better fairness-accuracy trade-offs than models trained on randomly acquired data or on data chosen by conventional Bayesian experimental design.

machine learning fairness, data acquisition, sensitive attributes, Bayesian experimental design, information gain, demographic parity, fair datasets, fairness-accuracy trade-off
Authors
Marcel Hedman, Emily Alger, Brieuc Lehmann, Chris Holmes, Tom Rainforth
Abstract
Frameworks for ensuring fairness in machine learning typically focus on learning fair models from existing data. But this endeavor is often undermined by biases already present in that data. We therefore look to modify the data acquisition process itself to help gather fairer data that is inherently more suitable for training fair predictors. To this end, we introduce FairBED, which provides novel formulations for quantifying the fairness of datasets themselves based on the idea that fair datasets should be uninformative about sensitive attributes. We then use this to construct practical fairness-aware Bayesian experimental design (BED) objectives that maximize expected information gain about the target quantity of interest while minimizing expected information gain about sensitive attributes. We further derive a theoretical link between FairBED and demographic parity, and show empirically that models trained on data gathered using FairBED provide improved fairness-accuracy trade-offs compared to randomly acquired data and conventional BED.
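To make the trade-off described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the exact form of the FairBED objectives, so the sketch assumes a simple weighted difference: expected information gain about the target minus `lam` times expected information gain about the sensitive attribute. The function names, the weight `lam`, and the three toy designs are all hypothetical. With discrete variables, each expected information gain reduces to a mutual information that can be computed exactly from a joint probability table.

```python
import numpy as np


def mutual_information(joint):
    """Mutual information I(A; B) in nats from a 2-D joint table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)  # marginal p(a), shape (A, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal p(b), shape (1, B)
    mask = joint > 0                       # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))


def fairness_aware_score(joint_tsy, lam=1.0):
    """Assumed objective: EIG about the target minus lam * EIG about s.

    joint_tsy[t, s, y] = p(target=t, sensitive=s, outcome=y | design).
    Returns (score, eig_target, eig_sensitive).
    """
    joint_tsy = np.asarray(joint_tsy, dtype=float)
    eig_target = mutual_information(joint_tsy.sum(axis=1))     # p(t, y)
    eig_sensitive = mutual_information(joint_tsy.sum(axis=0))  # p(s, y)
    return eig_target - lam * eig_sensitive, eig_target, eig_sensitive


def make_joint(entries):
    """Build a 2x2x2 joint table from {(t, s, y): probability}."""
    joint = np.zeros((2, 2, 2))
    for (t, s, y), p in entries.items():
        joint[t, s, y] = p
    return joint


# Three hypothetical candidate designs over binary target, attribute, outcome.
designs = {
    # Outcome reveals the target; target and attribute are independent.
    "a": make_joint({(t, s, t): 0.25 for t in (0, 1) for s in (0, 1)}),
    # Outcome reveals only the sensitive attribute.
    "b": make_joint({(t, s, s): 0.25 for t in (0, 1) for s in (0, 1)}),
    # Target and attribute coincide, so the outcome reveals both.
    "c": make_joint({(0, 0, 0): 0.5, (1, 1, 1): 0.5}),
}

scores = {name: fairness_aware_score(j, lam=1.0)[0] for name, j in designs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy setting, conventional BED (`lam=0`) cannot distinguish designs "a" and "c", since both are equally informative about the target. The penalised score prefers "a" because its outcome carries no information about the sensitive attribute.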