Highly Data Parallelizable Estimation of the Sliced-Wasserstein Distance Using Cumulative Distribution Functions

2026-06-29

Machine Learning
AI summary

The authors propose new estimators of the Sliced Wasserstein distance, a metric that compares probability distributions efficiently through their one-dimensional projections. Standard estimators sort the projected samples and need access to the full dataset; the proposed estimators instead use cumulative distribution functions (CDFs) of the projected data, which avoids sorting and parallelizes well over large datasets. The approach is especially suited to distributions whose CDFs are tractable, such as mixtures of Gaussians, and to federated learning, where data stays local and only CDFs of the projected data are shared.

Sliced Wasserstein distance, Wasserstein distance, one-dimensional optimal transport, cumulative distribution function, quantile function, Monte Carlo estimation, mixtures of Gaussians, federated learning, parallel computation, data projections
Authors
Christophe Vauthier, Quentin Mérigot, Anna Korba
Abstract
The Sliced Wasserstein (SW) distance has emerged as a computationally attractive alternative to the Wasserstein distance by leveraging one-dimensional optimal transport along random projections. Standard estimators of the SW distance rely on Monte Carlo averages of one-dimensional Wasserstein distances computed via quantile functions, which require sorting projected samples and access to full datasets. In this work, we introduce a new class of estimators for the Sliced Wasserstein distance based on cumulative distribution functions (CDFs) of projected measures, which avoid sorting and scale via massive dataset parallelism. This class includes several estimators, some of which are indexed by hyperparameters controlling their variance or smoothness. We show that they are especially well suited to scenarios in which CDFs are more tractable than quantile functions, such as mixtures of Gaussians, and that they are naturally compatible with federated learning, since CDFs of projected data can be computed and aggregated locally without requiring the exchange of raw samples.
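
To make the CDF-based idea concrete, here is a minimal sketch. It is not the authors' estimators, whose exact form is not given in this listing. It relies on the classical one-dimensional identity W_1(μ, ν) = ∫ |F_μ(t) − F_ν(t)| dt, so it covers only the order-1 sliced distance. The function names (`random_direction`, `cdf_counts`, `pooled_cdf`, `sliced_w1`), the fixed grid bounds `lo` and `hi`, and the Riemann-sum quadrature are all assumptions made for illustration. Each data shard only reports, for every grid point, how many of its projected samples fall below it; no sorting is performed and no raw samples leave the shard.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def cdf_counts(shard, theta, grid):
    """Local summary of one shard: number of projected samples <= t, for each grid point t.

    Only comparisons and counts are used, so the work is independent across samples.
    """
    projected = [sum(x * t for x, t in zip(point, theta)) for point in shard]
    return [sum(1 for x in projected if x <= t) for t in grid]


def pooled_cdf(shards, theta, grid):
    """Aggregate per-shard counts into the empirical CDF of the pooled data."""
    counts = [cdf_counts(shard, theta, grid) for shard in shards]
    total = sum(len(shard) for shard in shards)
    return [sum(column) / total for column in zip(*counts)]


def sliced_w1(shards_a, shards_b, lo=-10.0, hi=10.0, n_proj=50, n_grid=200, seed=0):
    """Monte Carlo estimate of the sliced W_1 distance from gridded CDFs.

    Assumption: the projected data lies (mostly) inside [lo, hi]; mass outside
    the grid is ignored by the quadrature.
    """
    rng = random.Random(seed)
    dim = len(shards_a[0][0])
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    total = 0.0
    for _ in range(n_proj):
        theta = random_direction(dim, rng)
        cdf_a = pooled_cdf(shards_a, theta, grid)
        cdf_b = pooled_cdf(shards_b, theta, grid)
        # Riemann sum for the integral of |F_a(t) - F_b(t)| over the grid.
        total += step * sum(abs(fa - fb) for fa, fb in zip(cdf_a, cdf_b))
    return total / n_proj
```

Because the per-shard counts simply add up, calling `sliced_w1` on a dataset split into several shards gives the same value as calling it on the pooled dataset, which is the property that makes the CDF route a natural fit for parallel and federated settings. For a mixture of Gaussians, the gridded empirical CDF could in principle be replaced by the closed-form CDF of the projected mixture, which is a weighted sum of one-dimensional normal CDFs.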