Sublinearly Structured Deep Neural Networks Achieve Feature Learning Consistency for Compositional Functions

2026-06-22 | Machine Learning

AI summary

The authors study, from a statistical perspective, why deep neural networks (DNNs) work well in practice. They show that sublinearly structured DNNs, whose input/output dimensions and number of hidden neurons grow sublinearly with the sample size, learn features consistently and predict well, even when the networks have more parameters than training examples. According to their structural audit, widely used CNNs for image classification fall into this class on their benchmark datasets. They also prove that these networks can approximate hierarchically compositional functions, a structure that images share. Together, these results help explain the success of many popular deep learning models on large image datasets.

deep neural networks, feature learning, statistical consistency, over-parameterization, convolutional neural networks, hierarchical compositional functions, universal approximation, image classification, sublinear growth, machine learning theory
Authors
Sehwan Kim, Yan Sun, Faming Liang
Abstract
Over the past decade, deep neural networks (DNNs) have achieved remarkable success on complex machine-learning tasks, yet the theoretical foundations of their performance remain incomplete. From a statistical viewpoint, a natural question is: can DNNs attain feature-learning and prediction consistency comparable to that of classical models? While a full characterization is open, we provide positive results for a broad subclass. We establish feature-learning consistency guarantees for sublinearly structured DNNs (architectures whose input/output dimensions and number of hidden neurons grow sublinearly with the sample size) when learning hierarchically compositional target functions. Importantly, this consistency holds even in the conventional "over-parameterized" regime, where the total number of parameters exceeds the number of training samples. Empirically, sublinearly structured DNNs match or surpass wide DNNs in prediction. A structural audit further indicates that widely used convolutional neural networks (CNNs), including AlexNet, VGGNet, ResNet, and GoogLeNet, are sublinearly structured on their image classification benchmarks. We further prove that sublinearly structured DNNs achieve universal approximation for hierarchically compositional functions in the large-sample limit. Moreover, images exhibit an inherent hierarchical, compositional structure. Taken together, these results explain, through a statistical lens, why many large-scale deep learning models succeed after adequate training on massive image datasets.
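
The distinction between sublinear structure and over-parameterization can be illustrated with a small counting exercise. The sketch below is not the authors' audit procedure. It assumes a hypothetical fully connected network whose input dimension and hidden widths scale as n^a with a < 1, with the exponents, depth, and output dimension chosen purely for illustration. It shows that the number of hidden neurons can stay well below the sample size n while the number of parameters exceeds it.

```python
import math


def layer_widths(n, input_exp=0.5, hidden_exp=0.6, depth=3, out_dim=10):
    """Hypothetical layer sizes in which every dimension grows like n**a, a < 1.

    The exponents, depth, and output size are illustrative assumptions,
    not values taken from the paper.
    """
    d_in = math.ceil(n ** input_exp)
    hidden = math.ceil(n ** hidden_exp)
    return [d_in] + [hidden] * depth + [out_dim]


def param_count(widths):
    """Number of weights plus biases in a fully connected network."""
    return sum(a * b + b for a, b in zip(widths, widths[1:]))


for n in (1_000, 10_000, 100_000):
    widths = layer_widths(n)
    hidden_neurons = sum(widths[1:-1])   # neurons in hidden layers only
    params = param_count(widths)
    print(
        f"n={n:>7}  hidden neurons/n={hidden_neurons / n:.4f}  "
        f"params/n={params / n:.2f}"
    )
```

Under these assumed exponents, the ratio of hidden neurons to n shrinks as n grows, whereas the ratio of parameters to n grows, because the weight matrices between hidden layers scale like n^1.2. This is the regime the abstract describes: sublinear in structure, yet over-parameterized in the conventional sense.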