Extended Wasserstein-GAN Approach to Causal Distribution Learning: Density-Free Estimation and Minimax Optimality
2026-05-11 • Machine Learning
AI summary
The authors address the problem of estimating not just average treatment effects but the full outcome distribution after an intervention, which carries detailed information such as quantiles, tail risks, and uncertainty. They point out two weaknesses of existing GAN-based methods for this task. First, their training objectives do not match the statistical risk of the causal quantity being estimated. Second, they rely on unstable density-based steps such as density ratio estimation. To address this, they propose GANICE, a method that explicitly targets the conditional interventional distribution for each treatment–covariate state, minimizes the averaged Wasserstein risk of its estimate, and is proven to be minimax optimal. Their experiments show that GANICE consistently outperforms previous approaches.
distributional causal inference, generative adversarial networks (GANs), counterfactual estimation, Wasserstein distance, density ratio estimation, minimax optimality, Besov spaces, treatment effects, interventional distribution
Authors
Shu Tamano, Masaaki Imaizumi
Abstract
Distributional causal inference requires estimating not only average treatment effects but also interventional outcome distributions, including quantiles, tail risks, and policy-dependent uncertainty. Generative adversarial network (GAN)-based counterfactual methods are flexible tools for this task, but they have several limitations. First, the objectives of some of these techniques do not coincide with the statistical risk of the identifiable causal target. As a result, they offer limited theoretical guarantees about which counterfactual distributions can be estimated and whether the estimates are optimal. Second, they tend to rely on unstable density-based procedures such as density ratio estimation. In this paper, we propose GANICE (GAN for Interventional Conditional Estimation), which (i) specifies the conditional interventional distribution for each treatment–covariate state as the causal estimation target; (ii) estimates this conditional distribution by minimizing its averaged Wasserstein risk; and (iii) is provably minimax optimal. GANICE achieves these properties by introducing an extended Wasserstein distance, incorporating a cellwise critic into its dual formulation, and proving optimality with Besov space theory. Our experiments demonstrate that GANICE consistently outperforms existing methods.
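To make the "averaged Wasserstein risk" idea more concrete, here is a minimal sketch built on assumptions that are not taken from the paper. Outcomes are assumed to be scalar. Each treatment–covariate state is assumed to be a discrete cell. The risk is assumed to be the cell-frequency-weighted average of the 1-Wasserstein distance (W1) between a model's samples and reference samples within each cell. The sketch uses the one-dimensional identity W1(P, Q) = ∫ |F_P(t) − F_Q(t)| dt, where F_P and F_Q are CDFs. The function names `w1_1d` and `averaged_wasserstein_risk` and the cell keys are hypothetical. This is not the GANICE estimator, and the paper's extended Wasserstein distance may be defined differently.

```python
def w1_1d(a, b):
    """Empirical 1-Wasserstein distance between two 1-D samples.

    Computes the integral of |F_a(t) - F_b(t)| dt, where F_a and F_b are the
    empirical CDFs. Both are step functions that only jump at sample points,
    so the integral is a finite sum over the merged, sorted sample values.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    grid = sorted(set(xs) | set(ys))
    i = j = 0
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Advance pointers so that i and j count sample points <= left.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        # Both empirical CDFs are constant on [left, right).
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


def averaged_wasserstein_risk(cells):
    """Cell-frequency-weighted average of per-cell W1 distances.

    `cells` maps a hypothetical (treatment, covariate_stratum) key to a pair
    (model_samples, reference_samples). Each cell is weighted by its number
    of reference samples, used here as a stand-in for the cell's probability.
    """
    n_total = sum(len(ref) for _, ref in cells.values())
    return sum(
        len(ref) / n_total * w1_1d(model, ref)
        for model, ref in cells.values()
    )


if __name__ == "__main__":
    cells = {
        (0, "low"): ([0.0, 2.0], [1.0, 3.0]),  # per-cell W1 = 1.0, weight 2/4
        (1, "low"): ([5.0, 5.0], [5.0, 5.0]),  # per-cell W1 = 0.0, weight 2/4
    }
    print(averaged_wasserstein_risk(cells))  # 0.5
```

One reason a "cellwise critic" is a natural fit for this kind of target comes from Kantorovich–Rubinstein duality. Each per-cell W1 equals a supremum of E_P[f] − E_Q[f] over 1-Lipschitz critics f. Because the cells do not interact, a weighted sum of per-cell distances is therefore a single supremum over critics f(y, cell) that are 1-Lipschitz in y separately within each cell. This is offered as a plausible reading of the abstract under the assumptions above, not as the paper's construction.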