Analysis of Search Heuristics in the Multi-Armed Bandit Setting

2026-04-09 • Neural and Evolutionary Computing

Neural and Evolutionary Computing

AI summaryⓘ

The authors study how different methods decide between several options when you can only compare two at a time, focusing on a setup called Dueling Bandits. They look at evolutionary algorithms and find that one common type (the (1+1) EA) struggles to consistently pick the best option, called the Condorcet winner, especially when differences are small. Another method, an Estimation of Distribution Algorithm (EDA), does much better at favoring the best option. They also show that making multiple comparisons (duels) can help the (1+1) EA improve its chances of picking the best option.

Multi-Armed BanditDueling BanditsCondorcet winnerEvolutionary algorithm(1+1) EAEstimation of Distribution Algorithm (EDA)Max-Min Ant SystemExploration-exploitation tradeoffStationary distributionStochastic comparison

Authors

Jasmin Brandt, Barbara Hammer, Timo Kötzing, Jurek Sander

Abstract

We consider the classic Multi-Armed Bandit setting to understand the exploration/exploitation tradeoffs made by different search heuristics. Since many search heuristics work by comparing different options (in evolutionary algorithms called "individuals"; in the Bandit literature called "arms"), we work with the "Dueling Bandits" setting. In each iteration, a comparison between different arms can be made; in the binary stochastic setting, each arm has a fixed winning probability against any other arm. A Condorcet winner is any arm that beats every other arm with a probability strictly higher than $1/2$. We show that evolutionary algorithms are rather bad at identifying the Condorcet winner: Even if the Condorcet winner beats every other arm with a probability $1-p$, the (1+1) EA, in its stationary distribution, chooses the Condorcet winner only with constant probability if $p=Ω(1/n)$. By contrast, we show that a simple EDA (based on the Max-Min Ant System with iteration-best update) will choose the Condorcet winner in its maintained distribution with probability $1-Θ(p)$. As a remedy for the (1+1) EA, we show how repeated duels can significantly boost the probability of the Condorcet winner in the stationary distribution.

View PDFOpen arXiv