Efficiently Learning Drifting Halfspaces with Massart Noise
2026-06-09 • Machine Learning
Machine Learning
AI summaryⓘ
The authors examine how to learn a changing rule (called a drifting concept) when some labels are noisy but with limited errors, known as Massart noise. They focus on a type of simple decision boundary called margin-separable linear classifiers and provide an efficient algorithm that works well even as the rule changes over time. They also show that their method improves on earlier results in an easier (realizable) case. Additionally, the authors demonstrate that their algorithm's performance is close to the best possible due to an inherent tradeoff between how well one can learn and computational limits. This means no simple method can achieve much better error rates under similar conditions.
Massart noisedrifting conceptonline learningmargin-separable linear classifiershalfspacesmarginrandom classification noiseinformation-computation tradeoffrealizable settinglearning with noise
Authors
Mingchen Ma, Guyang Cao, Jelena Diakonikolas, Ilias Diakonikolas
Abstract
We study the problem of learning a drifting concept in the presence of Massart noise. In this framework, an online learner has access to a history of independent samples whose labels are noisy versions of a target concept that may change from round to round. The goal is to output, in each round, a hypothesis with small prediction error. We study the complexity of this learning problem for the fundamental class of margin-separable linear classifiers (halfspaces). On the positive side, we give a computationally efficient learner achieving error $η+ \tilde O(Δ^{1/3}/γ)$, where $η$ upper bounds the Massart noise rate, $Δ$ is the drift rate, and $γ$ is the margin. Interestingly, in the realizable setting, an adaptation of our techniques yields an efficient learner with an improved error rate over prior work. On the lower-bound side, we provide formal evidence of an information-computation tradeoff, strongly suggesting that our algorithm's performance is essentially optimal. Specifically, while the information-theoretically optimal error scales with $Δ^{1/2}$, we prove that $Δ^{1/3}$-scaling is unavoidable for low-degree polynomial tests, even in the special case of random classification noise.