Minimax Limits of k-Fold Cross-Validation via Majority

2026-05-25Machine Learning

Machine Learning
AI summary

The authors examine how the accuracy of k-fold cross-validation depends on the number of folds, k, focusing on a simple binary classification method called the majority algorithm. They find that even this straightforward method shows complex behaviors that past theories didn't fully capture. Their analysis proves that as k grows with the sample size, the error in the cross-validation risk estimate can’t shrink faster than roughly the square root of k divided by the sample size. This means that cross-validation has inherent limits when reusing data, and these findings challenge some previous theoretical results. The authors suggest their approach provides a clearer benchmark for understanding cross-validation accuracy.

k-fold cross-validationmean-squared errorbinary classificationmajority algorithmempirical risk minimizationrisk estimationminimax theoryerror boundsdata reusesample size
Authors
Ido Nachum, Rüdiger Urbanke, Thomas Weinberger
Abstract
We study the mean-squared error of $k$-fold cross-validation as a risk estimator, with particular emphasis on how its accuracy depends on the number of folds $k$. Despite the widespread use of cross-validation, principled guidance for choosing $k$ is largely absent, mainly due to the complex dependence between fold-wise error estimates. To obtain sharp and interpretable results, we focus on the majority algorithm in binary classification, a minimal yet nontrivial empirical risk minimization procedure. We provide a fine-grained analysis of its cross-validation behavior, showing that even this simple algorithm exhibits subtle and delicate phenomena for which existing theory provides loose and even vacuous bounds. Leveraging this analysis, we introduce a minimax framework for cross-validation risk estimation and prove that no empirical risk minimization algorithm can achieve an $O(1/n)$ minimax mean-squared error when the number of folds grows with the number of samples $n$; instead, a lower bound of order $Ω(\sqrt{k}/n)$ is unavoidable. Our results reveal fundamental limitations of cross-validation as a data-reuse strategy, clarify gaps and inaccuracies in prior theoretical work, and position the majority algorithm as a natural benchmark that any tight analysis of cross-validation should be able to explain.