BROS: Bias-Corrected Randomized Subspaces for Memory-Efficient Single-Loop Bilevel Optimization

2026-05-11 · Machine Learning

AI summary

The authors present BROS, a new method to solve a tricky type of optimization problem used in deep learning tasks like tuning hyperparameters and cleaning data. Unlike existing methods that either use too much memory or don't guarantee good results, BROS works efficiently by focusing updates on random parts of the problem and using a special correction to keep estimates accurate. They prove that BROS is as fast as the best exact methods in theory and show through experiments that it uses much less memory without losing performance. This makes it easier to handle large neural networks in practice.

Keywords
stochastic bilevel optimization, hyperparameter learning, memory efficiency, single-loop methods, Rademacher bi-probe correction, Hessian estimator, sample complexity, data reweighting, representation learning, Vision Transformer (ViT)
Authors
Hengrui Zhang, Boao Kong, Engao Zhang, Kun Yuan
Abstract
Stochastic bilevel optimization (SBO) has become a standard framework for hyperparameter learning, data reweighting, representation learning, and data-mixture optimization in deep learning. Existing exact single-loop SBO methods and memory-efficient surrogate SBO methods either create severe memory pressure for large lower-level neural networks or lack competitive convergence guarantees under standard assumptions. In this paper, we propose BROS, a memory-efficient single-loop SBO method with the same convergence rate order as exact single-loop SBO methods. BROS performs lower-level and auxiliary updates in randomized subspaces with a Rademacher bi-probe correction that recovers an unbiased Hessian-action estimator. We prove that BROS preserves the $\mathcal O(\varepsilon^{-2})$ sample complexity of MA-SOBA for finding an $\varepsilon$-stationary point under only standard assumptions. Experiments on hyper-data cleaning, data-mixture learning, hyper-representation learning, and ViT sample reweighting show that BROS reduces peak memory by up to 44.9% while closely matching full-space baseline performance.
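
To give a concrete sense of the core ingredient named in the abstract, the sketch below shows how Rademacher probes can yield an unbiased estimate of a Hessian-vector product without ever materializing the Hessian. This is only an illustration of the general probe-based principle, not the paper's BROS update; the toy loss, the helper names (hvp, rademacher_hvp_estimate), and the probe count are assumptions made for the example.

```python
import torch

def loss(x):
    """Toy smooth objective standing in for a lower-level training loss (assumed)."""
    d = x.numel()
    A = torch.diag(torch.linspace(1.0, 2.0, d))
    return 0.5 * x @ (A @ x) + torch.cos(x).sum()

def hvp(x, v):
    """Exact Hessian-vector product H v via double backprop; H is never stored."""
    x = x.detach().requires_grad_(True)
    g = torch.autograd.grad(loss(x), x, create_graph=True)[0]
    return torch.autograd.grad(g @ v, x)[0]

def rademacher_hvp_estimate(x, v, num_probes=1):
    """Unbiased Hessian-action estimate from Rademacher probes z with E[z z^T] = I:
       E_z[(H z)(z^T v)] = H E[z z^T] v = H v, so each probe needs one HVP along z."""
    d = x.numel()
    est = torch.zeros(d)
    for _ in range(num_probes):
        z = torch.randint(0, 2, (d,)).float() * 2 - 1  # i.i.d. entries in {-1, +1}
        est += hvp(x, z) * (z @ v)
    return est / num_probes

torch.manual_seed(0)
d = 8
x, v = torch.randn(d), torch.randn(d)
print("exact  Hv:", hvp(x, v))
print("approx Hv:", rademacher_hvp_estimate(x, v, num_probes=20000))
```

Averaging over probes recovers the exact product in expectation; BROS additionally restricts the lower-level and auxiliary updates to randomized subspaces and applies its bi-probe correction to keep the resulting Hessian-action estimator unbiased, which is where the reported memory savings come from.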