Solve for the Hyperparameter, Skip the Search: Kolmogorov-Optimal Scaling Laws for Spline Regression

2026-06-22 • Machine Learning

Machine Learning, Artificial Intelligence

AI summary

The authors present a way to tune the resolution hyperparameter of spline regression without a grid search. Instead of fitting and scoring many settings, they solve for the best model complexity in closed form, using known results from approximation theory and leave-one-out error estimation. Their method, called KORE, needs only about a dozen model fits and matches the accuracy of traditional search methods while fitting roughly 8x fewer models. They show it works well on both simulated and real-world data, especially when the target depends on low-order interactions among inputs.

Hyperparameter tuning, Spline regression, Approximation theory, Kolmogorov n-width, Leave-one-out error, PRESS identity, ANOVA decomposition, Cross-validation, Model complexity, Regularization

Authors

Yong Yi Bay, Kathleen A. Yearick

Abstract

Hyperparameter tuning almost always means search: fit the model at every value on a grid, score each by cross-validation, and keep the winner. For spline regression that search is unnecessary. The optimal resolution can be solved for in closed form, to the accuracy an exhaustive search reaches, at a fraction of the compute. Three ingredients make this possible: classical approximation theory pins the squared bias to a known power of the resolution G, exactly the Kolmogorov n-width of the smoothness class; the basis dimension is an explicit polynomial in G; and leave-one-out error follows from a single fit via the PRESS identity. Balancing the two known curves gives the minimizer analytically. We extend this calculus to many coordinates by replacing ambient input dimension with interaction order, the number of active low-order components in an ANOVA decomposition, yielding a scaling law in which the optimal resolution and error are power functions of the effective density (sample size per active component), with input dimension absent from the exponent. The law becomes an algorithm. KORE (Kolmogorov-optimal Order-aware Resolution Estimation) fits two pilot resolutions, solves a leverage-calibrated 2x2 system for the bias and noise scales, and evaluates the closed-form plug-in resolution with a tiny leave-one-out certificate: about a dozen fits instead of a full grid sweep, with a consistency guarantee as the sample grows. Across additive and sparse pairwise targets up to 80 input dimensions, KORE matches exhaustive 3-fold cross-validation and the full classical ladder (GCV, Mallows' Cp, AIC, BIC) while fitting roughly 8x fewer models; on 36 real tabular datasets it ranks first among 21 methods in accuracy per unit of compute, ahead of tuned boosters and kernel machines. When complexity lives in low interaction order, solving for the resolution beats searching for it.
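The PRESS identity mentioned in the abstract can be illustrated with a short sketch. For a linear smoother with hat matrix H, the leave-one-out residual at point i equals the ordinary residual divided by 1 - h_ii, so one fit yields the full leave-one-out error. The code below is not the authors' KORE implementation: the piecewise-linear basis, the small fixed ridge term, and the function names `hat_basis`, `press_loo`, and `brute_loo` are assumptions chosen for illustration.

```python
import numpy as np

def hat_basis(x, G):
    """Piecewise-linear (degree-1 B-spline) basis on [0, 1] with G intervals.

    Illustrative stand-in for a spline basis at resolution G (an assumption,
    not the basis used in the paper).
    """
    knots = np.linspace(0.0, 1.0, G + 1)
    h = 1.0 / G
    # Column j is the hat function centred at knots[j]; result has shape (n, G + 1).
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)

def press_loo(B, y, ridge=1e-8):
    """Mean squared leave-one-out error from a single fit via the PRESS identity."""
    A = B.T @ B + ridge * np.eye(B.shape[1])
    coef = np.linalg.solve(A, B.T @ y)
    resid = y - B @ coef
    # Leverages h_ii = diag(B A^{-1} B^T), computed without forming the full hat matrix.
    lev = np.einsum("ij,ij->i", B, np.linalg.solve(A, B.T).T)
    return np.mean((resid / (1.0 - lev)) ** 2)

def brute_loo(B, y, ridge=1e-8):
    """Reference value: refit n times, leaving out one observation each time."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        A = B[keep].T @ B[keep] + ridge * np.eye(B.shape[1])
        coef = np.linalg.solve(A, B[keep].T @ y[keep])
        errs[i] = y[i] - B[i] @ coef
    return np.mean(errs ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=200)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
    B = hat_basis(x, G=8)
    # The two numbers should agree up to floating-point error.
    print(press_loo(B, y), brute_loo(B, y))
```

With the ridge term held fixed, the shortcut is exact rather than approximate, which is why a single fit per resolution is enough to score it by leave-one-out error.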
