Tailoring Strictly Proper Scoring Rules for Downstream Tasks: An Application to Causal Inference

2026-06-02 • Machine Learning


AI summary

The authors note that probabilistic models are usually trained with a general-purpose objective (log-loss) that is not tailored to the task their predictions are later used for, such as estimating cause-and-effect relationships. They focus on Inverse Probability Weighting, a method that weights each individual by the inverse of its estimated probability of receiving the treatment it actually received (the propensity score). When these probabilities are very close to 0 or 1, small errors in them can cause large errors in the final estimate. To address this, they derive a new training objective whose penalties match the kind of errors that matter for this task. In their experiments, the method performs better than standard approaches at estimating the average effect of a treatment.
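To make "very close to 0 or 1" concrete, here is a small worked example of our own (not taken from the paper). A treated individual with propensity score $e$ receives weight $w(e) = 1/e$, so the same absolute error in $e$ changes the weight very differently depending on where $e$ lies:

```latex
\begin{aligned}
w(e) &= \frac{1}{e}, \qquad \frac{\mathrm{d}w}{\mathrm{d}e} = -\frac{1}{e^{2}} \\[4pt]
e = 0.50,\ \hat{e} = 0.49:&\quad w(\hat{e}) = \tfrac{1}{0.49} \approx 2.04 \ \text{vs.}\ w(e) = 2 \quad (\approx 2\%\ \text{error}) \\
e = 0.02,\ \hat{e} = 0.01:&\quad w(\hat{e}) = \tfrac{1}{0.01} = 100 \ \text{vs.}\ w(e) = 50 \quad (100\%\ \text{error})
\end{aligned}
```

Both cases have the same absolute propensity error of $0.01$, yet near the boundary the weight doubles.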

Probabilistic models, Log-loss, Inverse Probability Weighting (IPW), Propensity score, Bias and variance, Strictly proper scoring rules, Average Treatment Effect (ATE), Causal inference, Neural networks, Gradient boosting

Authors

Roman Plaud, Alexandre Perez-Lebel, Antoine Saillenfest, Thomas Bonald, Marine Le Morvan, Gaël Varoquaux, Matthieu Labeau

Abstract

Probabilistic models are typically trained using task-agnostic objectives such as log-loss, which can lead to significant errors in downstream estimation. This disconnect is especially critical in Inverse Probability Weighting (IPW) for causal inference, where errors in propensity scores close to $0$ or $1$ often lead to high bias and variance. We propose a principled framework for deriving task-specific strictly proper scoring rules by matching the local curvature of the downstream error metric. We apply this framework to Average Treatment Effect (ATE) estimation, deriving a closed-form loss and its corresponding canonical probability mapping that can be readily integrated with any model, such as a neural network or a gradient boosting algorithm. Extensive evaluations on causal inference benchmarks demonstrate that our tailored objective consistently outperforms standard likelihood-based and covariate-balancing approaches.
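The sketch below, assuming NumPy, illustrates two ingredients the abstract refers to: the standard (Horvitz-Thompson) IPW estimator of the ATE, and a comparison of the local curvature of log-loss with that of a downstream error. It does not reproduce the paper's tailored loss or canonical mapping. As a stand-in downstream metric we assume the squared error of the treated-arm weight $1/e$; the function names (`ipw_ate`, `log_loss_curvature`, `weight_error_curvature`) and the synthetic data are hypothetical.

```python
import numpy as np


def ipw_ate(y, t, e):
    """Standard (Horvitz-Thompson) IPW estimate of the ATE.

    y: outcomes, t: binary treatment indicators (0/1), e: propensity scores P(T=1 | X).
    """
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    return float(np.mean(t * y / e - (1 - t) * y / (1 - e)))


def log_loss_curvature(p):
    """Second derivative in q of the expected log-loss -[p log q + (1-p) log(1-q)], at q = p.

    Equals 1 / (p (1 - p)).
    """
    return 1.0 / (p * (1.0 - p))


def weight_error_curvature(p):
    """Second derivative in q of (1/q - 1/p)**2, at q = p. Equals 2 / p**4.

    Hypothetical stand-in for a downstream IPW error; not the paper's metric.
    """
    return 2.0 / p**4


# Synthetic data: one confounder x drives both treatment and outcome.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-x))      # true propensity P(T=1 | x)
t = rng.binomial(1, e_true)
y = 2.0 * t + x + rng.normal(size=n)    # constant treatment effect of 2, so the true ATE is 2

print(f"IPW ATE with true propensities: {ipw_ate(y, t, e_true):.2f}")

# Compare how strongly each criterion penalizes a small propensity error around p.
for p in (0.5, 0.1, 0.02):
    ratio = weight_error_curvature(p) / log_loss_curvature(p)
    print(f"p={p:<5} log-loss curvature={log_loss_curvature(p):9.1f} "
          f"weight-error curvature={weight_error_curvature(p):12.1f} ratio={ratio:10.1f}")
```

With the true propensities the IPW estimate should land close to 2. The printed ratio equals $2(1-p)/p^3$, so it grows sharply as $p$ approaches 0: near the boundary, log-loss penalizes a propensity error far less than the assumed downstream error does. This is the kind of mismatch that matching local curvature is meant to remove.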
