Provable Data Scaling Law for Meta Learning via Complexity Minimization

2026-06-01 · Machine Learning

AI summary

The authors study why pre-training lets machine learning models learn new tasks from fewer examples. They propose a framework called complexity minimization, which learns a shared representation by measuring how complex a downstream model must be to fit each source task and then minimizing that complexity for the hardest task. Their analysis shows that, under this approach, more pre-training data provably leads to better performance on new tasks with only a few examples. They also show empirically that adding complexity regularization to existing meta-learning methods makes models more sample-efficient.
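One way to make "how complex a downstream model must be" precise is sketched below. It is an illustrative formalization, not the paper's exact definitions. All symbols are assumptions introduced here: a representation class \Phi, nested downstream model classes \mathcal{H}_c indexed by a complexity level c, an empirical risk \widehat{L}_t for each of T source domains, and a fit tolerance \varepsilon. The per-domain complexity is the smallest level that fits domain t. Complexity minimization then picks the representation whose worst domain needs the least complexity:

```latex
% Illustrative notation only; the paper may define complexity differently.
C_t(\phi) = \inf\bigl\{\, c \ge 0 : \exists\, h \in \mathcal{H}_c \ \text{with}\ \widehat{L}_t(h \circ \phi) \le \varepsilon \,\bigr\},
\qquad
\widehat{\phi} \in \operatorname*{arg\,min}_{\phi \in \Phi}\ \max_{1 \le t \le T} C_t(\phi).
```

The outer maximum makes the criterion worst-case across source domains, as described in the abstract. The hard tolerance \varepsilon is a simplification, and a soft penalty on complexity would be an equally plausible reading.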

pre-training, meta-learning, representation learning, sample complexity, few-shot adaptation, complexity regularization, downstream task, regression, model complexity
Authors
Kazuto Fukuchi, Ryuichiro Hataya, Kota Matsui
Abstract
Pre-training has become a fundamental paradigm in modern machine learning, and one of its key empirical benefits is that downstream sample complexity decreases as the scale of pre-training data increases. However, existing theoretical frameworks for pre-training do not fully explain this phenomenon. In this paper, we introduce complexity minimization, a novel meta-representation learning framework designed to enable theoretical analysis of this scaling behavior. It learns representations by evaluating, for each source domain, the downstream model complexity best suited to that domain, and then minimizing the worst case of this complexity across source domains. Our end-to-end theoretical analysis, spanning pre-training through downstream regression, shows that this framework provably captures the scaling behavior; in particular, the error rate of few-shot adaptation improves as the amount of meta-training data grows. Empirically, we demonstrate that incorporating complexity regularization into existing meta-learning methods consistently improves downstream sample efficiency.
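The following Python sketch illustrates the worst-case selection rule on synthetic linear-regression domains. It is not the authors' implementation. Several choices are hypothetical and made only for illustration:

- the complexity proxy, defined as the number of leading representation features a least-squares head needs to reach a mean squared error of at most `eps`;
- the helper names `domain_complexity`, `worst_case_complexity`, and `select_representation`;
- the data-generating setup.

```python
import numpy as np


def fit_error(X, y, F):
    """Mean squared residual of the least-squares fit of y on features X @ F."""
    Z = X @ F
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(np.mean((y - Z @ w) ** 2))


def domain_complexity(X, y, basis, eps):
    """Hypothetical complexity proxy (not the paper's definition): the smallest c
    such that a linear head on the first c columns of `basis` reaches MSE <= eps.
    Returns np.inf if even the full representation cannot reach eps."""
    for c in range(basis.shape[1] + 1):
        if c == 0:
            err = float(np.mean(y ** 2))  # error of the zero model
        else:
            err = fit_error(X, y, basis[:, :c])
        if err <= eps:
            return c
    return np.inf


def worst_case_complexity(domains, basis, eps):
    """Worst case of the proxy over all source domains."""
    return max(domain_complexity(X, y, basis, eps) for X, y in domains)


def select_representation(domains, candidates, eps):
    """Pick the candidate basis whose worst-case complexity is smallest."""
    scores = {name: worst_case_complexity(domains, B, eps) for name, B in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical synthetic setup: all source domains share one k-dimensional subspace.
rng = np.random.default_rng(0)
d, k, n, T, noise = 20, 3, 50, 5, 0.1
shared, _ = np.linalg.qr(rng.standard_normal((d, k)))
domains = []
for _ in range(T):
    X = rng.standard_normal((n, d))
    y = X @ shared @ rng.standard_normal(k) + noise * rng.standard_normal(n)
    domains.append((X, y))

# Candidate "aligned": orthonormal basis whose first k columns span the shared subspace
# (QR preserves the span of the leading columns). Candidate "standard": identity basis.
aligned, _ = np.linalg.qr(np.hstack([shared, rng.standard_normal((d, d - k))]))
candidates = {"aligned": aligned, "standard": np.eye(d)}

best, scores = select_representation(domains, candidates, eps=5 * noise ** 2)
print(best, scores)
```

Every synthetic domain is generated from the same three-dimensional subspace. The aligned basis should therefore need at most three features in its worst domain, while the standard basis typically needs many more, so the worst-case score favors the aligned representation. The hard threshold `eps` is a simplification. A soft complexity penalty added to a meta-learning loss would be closer in spirit to the complexity regularization mentioned in the abstract.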