On efficient robust regression with subquadratic samples

2026-05-18Data Structures and Algorithms

Data Structures and Algorithms
AI summary

AI summary is being generated…

Authors
Deeksha Adil, Jarosław Błasiok, Hongjie Chen, Deepak Narayanan Sridharan
Abstract
We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $κ$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/ε^4)$ samples, where $d$ is the dimension and $ε$ is the corruption rate, and achieves prediction error $O(\sqrt{εκ})$ under the condition $εκ\lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{εκ})$ when $εκ\lesssim 1$ require queries that take $Ω(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $εκ\lesssim 1$, efficient algorithms may require $\tildeΩ\left(\min\{dε^{2}κ^{2},\ ε^{2}d^{2}\}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.