Non-asymptotic estimates of the minimal risk in statistical learning

2026-06-22 • Machine Learning

AI summary

The authors study the Empirical Risk Principle (ERP) in statistical learning and prove concentration inequalities that bound the minimal risk from below and from above in terms of the minimal empirical risk, with non-asymptotic high confidence. They relax the usual boundedness assumption on the risk function to a Gaussian or exponential integrability condition. The confidence of the lower bound does not depend on the number of training parameters or on the dimension of the input vectors, which makes it possible to detect a deficient learning machine efficiently. The confidence of the upper bound is high when the sample size is much greater than the box dimension of the parameter set, measured in an Orlicz metric associated with the risk functions. The work builds on Talagrand's concentration inequalities and transport-entropy inequalities.

Empirical Risk Principle, Concentration Inequalities, Minimal Risk, Gaussian Integrability, Exponential Integrability, Talagrand's Inequality, Transport-Entropy Inequalities, Orlicz Metric, Statistical Learning, Empirical Processes

Authors

Liming Wu, Sen Yang

Abstract

In this paper we prove some concentration inequalities for two types of error probabilities in the Empirical Risk Principle (ERP) in statistical learning, which provide a lower bound and an upper bound for the minimal risk (in terms of the minimal empirical risk) with non-asymptotic high confidence. The usual boundedness condition of the empirical risk function is relaxed to the Gaussian or exponential integrability condition. The confidence of the lower bound of the minimal risk is shown to be independent of the number of training parameters and the dimension of the input vectors, allowing one to detect the deficiency of a learning machine efficiently; and the confidence of the upper bound of the minimal risk is proved to be high provided that the sample size $n$ is much greater than the box dimension of the parameter set $Θ$ in the Orlicz metric $d_{ψ_1}$ associated with the risk functions. Our work is based on Talagrand's concentration inequalities (the sharp versions by Bousquet and Klein-Rio), transport-entropy inequalities and the recent progress in the theory of empirical processes and statistical learning.
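
One elementary way to see why the lower-bound confidence can be free of the dimension of the parameter set: for any fixed $θ^*$, the minimal empirical risk is at most the empirical risk at $θ^*$, so the event "minimal empirical risk exceeds the minimal risk by $t$" is contained in a deviation event for a single average. The sketch below illustrates only this inclusion by Monte Carlo and is not the paper's argument. Everything in it is an assumption made for illustration: the squared loss, the standard Gaussian data (which make the loss unbounded but exponentially integrable), the finite grid standing in for $Θ$, and the names `empirical_risk` and `simulate`.

```python
import random


def empirical_risk(theta, sample):
    """Average squared loss (z - theta)^2 over the sample."""
    return sum((z - theta) ** 2 for z in sample) / len(sample)


def simulate(n=100, trials=200, t=0.3, seed=0):
    """Estimate two exceedance frequencies for a toy model (illustrative only).

    Data: Z ~ N(0, 1). Risk: R(theta) = E[(Z - theta)^2] = 1 + theta^2,
    so the minimal risk is 1, attained at theta* = 0.
    """
    rng = random.Random(seed)
    grid = [k / 10 for k in range(-20, 21)]  # hypothetical parameter set, contains 0.0
    theta_star = 0.0
    minimal_risk = 1.0
    exceed_min = 0    # minimal empirical risk > minimal risk + t
    exceed_fixed = 0  # empirical risk at theta* > minimal risk + t
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        risk_at_star = empirical_risk(theta_star, sample)
        min_emp_risk = min(empirical_risk(th, sample) for th in grid)
        # The minimum over the grid can never exceed the value at theta*.
        if min_emp_risk > minimal_risk + t:
            exceed_min += 1
        if risk_at_star > minimal_risk + t:
            exceed_fixed += 1
    return exceed_min / trials, exceed_fixed / trials


if __name__ == "__main__":
    p_min, p_fixed = simulate()
    print("freq(min empirical risk > min risk + t):", p_min)
    print("freq(empirical risk at theta* > min risk + t):", p_fixed)
```

The first frequency is never larger than the second, whatever grid is used, because the comparison involves only one fixed parameter. The opposite direction (the minimal empirical risk falling well below the minimal risk) requires control that is uniform over the parameter set, which is where a complexity measure such as the box dimension enters.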
