Multi-Armed Sequential Hypothesis Testing by Betting

2026-03-18Machine Learning

Machine Learning
AI summary

The authors study how to test multiple options one by one when only one option might really show something interesting. They want to quickly and reliably find if any option is different from the 'no effect' case, even if several are different. They create methods that perform almost as well as if you already knew the most promising option in advance. To do this, they develop new algorithms and mathematical tools to carefully balance exploring options and exploiting the best-looking one. Their work builds on old ideas about betting and confidence but extends them to more complex scenarios.

sequential testingcomposite hypothesismultiple armse-processeslog-optimalityexpected rejection timeupper confidence boundKelly criterionconcentration inequalitiesoptimal wealth growth
Authors
Ricardo J. Sandoval, Ian Waudby-Smith, Michael I. Jordan
Abstract
We consider a variant of sequential testing by betting where, at each time step, the statistician is presented with multiple data sources (arms) and obtains data by choosing one of the arms. We consider the composite global null hypothesis $\mathscr{P}$ that all arms are null in a certain sense (e.g. all dosages of a treatment are ineffective) and we are interested in rejecting $\mathscr{P}$ in favor of a composite alternative $\mathscr{Q}$ where at least one arm is non-null (e.g. there exists an effective treatment dosage). We posit an optimality desideratum that we describe informally as follows: even if several arms are non-null, we seek $e$-processes and sequential tests whose performance are as strong as the ones that have oracle knowledge about which arm generates the most evidence against $\mathscr{P}$. Formally, we generalize notions of log-optimality and expected rejection time optimality to more than one arm, obtaining matching lower and upper bounds for both. A key technical device in this optimality analysis is a modified upper-confidence-bound-like algorithm for unobservable but sufficiently "estimable" rewards. In the design of this algorithm, we derive nonasymptotic concentration inequalities for optimal wealth growth rates in the sense of Kelly [1956]. These may be of independent interest.