Don't Get Your Kroneckers in a Twist: Gaussian Processes on High-Dimensional Incomplete Grids

2026-05-08
Machine Learning
AI summary

The authors present CUTS-GPR, a method that makes Gaussian process regression (GPR) dramatically faster on very large, high-dimensional data sets while remaining numerically exact. The key ingredient is a fast kernel matrix-vector product whose cost scales near-linearly (or even linearly) with the number of data points and as a low-order polynomial with the number of dimensions. This is achieved by combining an additive kernel with an incomplete data grid and exploiting the resulting structure of the kernel matrix. Benchmarks show that the matrix-vector product handles billions of data points and thousands of dimensions, and that full GPR with hyperparameter optimization completes in hours for hundreds of thousands of points in 24 dimensions, enabling tasks such as modeling potential energy surfaces in chemistry.

Keywords
Gaussian process regression, kernel matrix, matrix-vector product, additive kernel, high-dimensional data, hyperparameter optimization, potential energy surfaces, computational chemistry, scalability, incomplete grid
Authors
Mads Greisen Højlund, August Smart Lykke-Møller, Henry Moss, Ove Christiansen
Abstract
We introduce CUTS-GPR, a new method for performing numerically exact Gaussian process regression (GPR) in high-dimensional settings. The key component of CUTS-GPR is an extremely fast kernel matrix-vector product, which exhibits near-linear or even linear scaling with the amount of training data, $N$, and low-order polynomial scaling with dimensionality, $D$. This is obtained by combining an additive kernel with an incomplete grid and exploiting the resulting structure of the kernel matrix. We demonstrate the scalability of the matrix-vector product by running benchmarks with billions of data points and thousands of dimensions. Full GPR calculations, including hyperparameter optimization, are completed in a matter of hours for $N = 447\,265$ and $D = 24$. We demonstrate that CUTS-GPR enables Bayesian modeling of high-dimensional potential energy surfaces, a longstanding challenge in computational chemistry.
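To make the additive-kernel idea from the abstract concrete, the sketch below shows (in plain NumPy, not the authors' fast algorithm) what an additive kernel is: the kernel is a sum of one-dimensional kernels over coordinates, so the full kernel matrix is a sum of $D$ matrices that each depend on a single coordinate. The naive matrix-vector product here costs $O(N^2 D)$; CUTS-GPR's contribution is a structured product over an incomplete grid that reduces the $N$-scaling to near-linear. All function names and the choice of a 1-D RBF base kernel are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch of an additive kernel (not the CUTS-GPR algorithm).
# Additive kernel: k(x, x') = sum_d k_d(x_d, x'_d), so K = sum_d K_d,
# where each K_d depends only on coordinate d. This per-coordinate
# structure is what grid-based methods exploit for fast products.
import numpy as np

def rbf_1d(a, b, lengthscale=1.0):
    """One-dimensional RBF kernel matrix between coordinate vectors a, b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def additive_kernel_matvec(X, v, lengthscale=1.0):
    """Naive O(N^2 D) reference for K @ v under the additive kernel.

    Accumulates one coordinate at a time, making the sum-over-dimensions
    structure of the kernel matrix explicit.
    """
    N, D = X.shape
    out = np.zeros(N)
    for d in range(D):
        K_d = rbf_1d(X[:, d], X[:, d], lengthscale)
        out += K_d @ v
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))   # N = 200 points, D = 5 dimensions
v = rng.standard_normal(200)
result = additive_kernel_matvec(X, v)
print(result.shape)  # (200,)
```

Because each term in the sum touches only one coordinate, placing the data on a (possibly incomplete) grid lets the per-dimension products be reorganized instead of materializing any $N \times N$ matrix, which is the structural fact the abstract's scaling claims rest on.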