AI summaryⓘ
The authors prove that their nonparametric Thompson Sampling algorithm, called ρ-NPTS_SG, performs as well as theoretically possible (optimal regret) for a wide range of risk measures like CVaR and Sharpe ratio, even when rewards are not modeled parametrically. Their method works under weaker conditions than previous approaches, requiring only continuity of the risk measure instead of stronger smoothness or dominance assumptions. They handle both bounded and sub-Gaussian reward distributions by developing new technical tools that manage complexity in the algorithm’s probability updates. This makes their result the first to give instance-optimal guarantees for complicated risk functionals without restrictive assumptions.
Thompson SamplingNonparametric BanditsRisk-Averse BanditsRegret AnalysisCVaRSharpe RatioSub-Gaussian TailsDiscrete Posterior ProjectionDirichlet DistributionInstance-Optimality
Abstract
We prove that $ρ\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any continuous risk functional $ρ$ (CVaR, mean-variance, Sharpe ratio, distortion risk measures, and more) on the class of distributions with bounded density and sub-Gaussian tails, including Gaussian arms. Both this result and its bounded-support counterpart require only continuity of $ρ$: strictly weaker than the dominance condition of prior parametric Thompson Sampling results, and strictly weaker than the Lipschitz condition of UCB-type algorithms, yielding the first instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio without parametric reward assumptions. The bounded-support case is developed first as a stepping stone sharing the same proof structure. The key technical contributions are a discretisation lemma (bounded support) and a truncated discretisation lemma (sub-Gaussian tails), each projecting the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property, holding all polynomial prefactors at fixed degree independent of sample size and breaking the super-exponential barrier that blocked prior proofs.