Diversity is the Strength of the AI Crowd

2026-06-29 • Artificial Intelligence


AI summary

The authors looked at how to better predict future events using several AI models together, instead of just one. They found that simply picking the most accurate models isn't best, because many top models tend to make similar mistakes. Instead, combining different models that make different kinds of errors leads to better overall predictions. Their work suggests that diversity among AI forecasters is important for improving accuracy.

AI forecasting, large language models, ensembling, model diversity, Metaculus AI Benchmark, binary prediction, correlation, superforecaster-level accuracy

Authors

Matthew Aitchison, Scott Jeen, Toby Shevlane, Ben Day

Abstract

Top AI forecasting systems are approaching superforecaster-level accuracy on future world events, but still rely primarily on off-the-shelf LLMs combined with forecasting-specific context gathering and scaffolding. We study how to improve this recipe through ensembling: given a fixed number of samples, which off-the-shelf model forecasts should be combined to maximize accuracy? On binary questions from the Metaculus AI Benchmark, we find that individual accuracy is not enough: many frontier LLMs make highly correlated predictions, limiting the value of additional forecasts from the same or similar models. Instead, the strongest ensembles combine accurate but diverse forecasters, with models such as Grok 4 contributing disproportionately because their predictions are less correlated with other frontier LLMs. These results suggest that the strength of the AI crowd comes not from sampling more forecasts indiscriminately, but from combining forecasts across models with complementary errors, motivating forecasting systems that explicitly optimize for both model quality and diversity.
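A standard way to see why correlation caps the benefit of more forecasts: if n forecasters have errors with equal variance σ² and pairwise correlation ρ, the variance of their averaged error is σ²(ρ + (1 − ρ)/n), which levels off at ρσ² instead of shrinking to zero as n grows. The short Python sketch below illustrates the selection question posed in the abstract on made-up data. It is not the authors' method. It assumes accuracy is measured by the Brier score of the averaged probability, and the helpers (`brier`, `mean_forecast`, `error_correlation`, `top_k_by_accuracy`, `greedy_ensemble`) and the forecasters "A", "B", "C" are hypothetical names introduced for illustration. The greedy step is plain forward selection, swapped in as a simple stand-in for whatever selection procedure the paper uses.

```python
# Hypothetical sketch, not the paper's method: pick a fixed-size ensemble of
# binary forecasters, scored by the Brier score of their averaged probability.
# All data below is invented for illustration.
import math
from typing import Dict, List, Sequence


def brier(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def mean_forecast(members: Sequence[Sequence[float]]) -> List[float]:
    """Average several forecasters' probabilities, question by question."""
    return [sum(col) / len(col) for col in zip(*members)]


def error_correlation(a: Sequence[float], b: Sequence[float], outcomes: Sequence[int]) -> float:
    """Pearson correlation between two forecasters' signed errors (p - outcome)."""
    ea = [p - o for p, o in zip(a, outcomes)]
    eb = [p - o for p, o in zip(b, outcomes)]
    ma, mb = sum(ea) / len(ea), sum(eb) / len(eb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ea, eb))
    va = sum((x - ma) ** 2 for x in ea)
    vb = sum((y - mb) ** 2 for y in eb)
    return cov / math.sqrt(va * vb)


def top_k_by_accuracy(forecasts: Dict[str, List[float]], outcomes: Sequence[int], k: int) -> List[str]:
    """Baseline: keep the k individually most accurate forecasters."""
    return sorted(forecasts, key=lambda name: brier(forecasts[name], outcomes))[:k]


def greedy_ensemble(forecasts: Dict[str, List[float]], outcomes: Sequence[int], k: int) -> List[str]:
    """Forward selection with replacement: each step adds whichever forecaster
    (repeats allowed) gives the lowest Brier score for the averaged ensemble."""
    chosen: List[str] = []
    for _ in range(k):
        def score(name: str) -> float:
            members = [forecasts[n] for n in chosen + [name]]
            return brier(mean_forecast(members), outcomes)
        chosen.append(min(forecasts, key=score))
    return chosen


# Toy data: A and B are accurate but make the same mistake on the second
# question; C is individually weaker but errs on different questions.
outcomes = [1, 1, 0, 0]
forecasts = {
    "A": [0.9, 0.4, 0.1, 0.1],
    "B": [0.8, 0.4, 0.2, 0.1],
    "C": [0.6, 0.9, 0.4, 0.4],
}

if __name__ == "__main__":
    for name, probs in forecasts.items():
        print(name, round(brier(probs, outcomes), 4))
    print("corr(A,B) =", round(error_correlation(forecasts["A"], forecasts["B"], outcomes), 3))
    print("corr(A,C) =", round(error_correlation(forecasts["A"], forecasts["C"], outcomes), 3))
    for pick in (top_k_by_accuracy(forecasts, outcomes, 2), greedy_ensemble(forecasts, outcomes, 2)):
        print(pick, round(brier(mean_forecast([forecasts[n] for n in pick]), outcomes), 4))
```

On this toy data, the two individually best forecasters, A and B, have strongly correlated errors and average to a Brier score of about 0.104. Pairing A with the weaker but less correlated C gives about 0.078, and greedy selection finds that pair. Allowing repeats loosely mirrors a fixed sample budget; with fixed forecasts a repeat only reweights a model, whereas resampling a real LLM would yield new, though likely correlated, forecasts.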
