Nearly-Optimal Algorithm for Adversarial Kernelized Bandits
2026-05-11 • Machine Learning
AI summary
The authors study a sequential decision-making problem in which the rewards are chosen adversarially at each round but are guaranteed to come from a known space of functions defined by a kernel (an RKHS). They show that the exponential-weight algorithm performs well in this setting: its regret grows sublinearly in the number of rounds, scaled by a kernel-dependent complexity measure called the maximum information gain. For common kernels such as the squared exponential and Matérn kernels, they prove matching lower bounds, so the algorithm is nearly the best possible. The authors also develop a computationally cheaper variant of the algorithm using the Nyström approximation that retains nearly the same regret guarantees.
Kernelized bandits, Gaussian process bandits, Reproducing kernel Hilbert space (RKHS), Adversarial environment, Exponential-weight algorithm, Regret bounds, Information gain, Squared exponential kernel, Matérn kernel, Nyström approximation
Authors
Shogo Iwazaki
Abstract
This paper studies kernelized bandits (also known as Gaussian process bandits) in an adversarial environment, where the reward functions in a known reproducing kernel Hilbert space (RKHS) may be adversarially chosen at each round. We show that the exponential-weight algorithm achieves $\tilde{O}(\sqrt{T \gamma_T})$ adversarial regret, where $T$ and $\gamma_T$ denote the total number of rounds and the maximum information gain, respectively. For squared exponential (SE) and $\nu$-Matérn kernels, we also show algorithm-independent lower bounds that guarantee the optimality of our algorithm up to polylogarithmic factors. Furthermore, we present a computationally efficient variant of our algorithm based on the Nyström approximation that maintains nearly optimal regret guarantees.
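To make the two main ingredients concrete, the following is a minimal EXP3-style exponential-weights loop over a discretized arm set with bandit (single-arm) feedback. This is only an illustrative sketch: the domain grid, the placeholder adversary, and the learning rate are assumptions introduced here, not the paper's construction, which runs exponential weights over the RKHS function class itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: discretize the domain [0, 1] into a finite arm set.
arms = np.linspace(0.0, 1.0, 50)
K, T = len(arms), 2000
eta = np.sqrt(np.log(K) / (K * T))     # standard EXP3 learning rate

def adversarial_reward(t, x):
    """Placeholder adversary: a smooth bump whose peak drifts over time."""
    center = 0.3 + 0.4 * np.sin(0.01 * t)
    return float(np.exp(-(x - center) ** 2 / 0.02))

log_w = np.zeros(K)                    # log-weights for numerical stability
total = 0.0
for t in range(T):
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    i = rng.choice(K, p=p)             # sample an arm from the exponential weights
    r = adversarial_reward(t, arms[i]) # bandit feedback: only the pulled arm
    total += r
    r_hat = np.zeros(K)
    r_hat[i] = r / p[i]                # importance-weighted, unbiased reward estimate
    log_w += eta * r_hat               # multiplicative-weights update

print(f"average reward over {T} rounds: {total / T:.3f}")
```

The Nyström approximation mentioned in the abstract replaces the full kernel matrix with a low-rank surrogate built from a small set of landmark points, which is what makes the efficient variant cheaper. A generic sketch, assuming an SE kernel on 1-D inputs and uniformly spaced landmarks (both assumptions for illustration):

```python
import numpy as np

def se_kernel(X, Y, ell=0.2):
    """Squared exponential kernel on 1-D inputs."""
    return np.exp(-(X[:, None] - Y[None, :]) ** 2 / (2.0 * ell ** 2))

X = np.linspace(0.0, 1.0, 500)   # full input grid
m = 30                           # number of landmark points (an assumption)
Z = np.linspace(0.0, 1.0, m)     # landmarks; uniform spacing for simplicity

K_nm = se_kernel(X, Z)           # (n, m) cross-kernel
K_mm = se_kernel(Z, Z)           # (m, m) landmark kernel
# Nystrom approximation: K ~= K_nm K_mm^{-1} K_nm^T
K_hat = K_nm @ np.linalg.solve(K_mm + 1e-10 * np.eye(m), K_nm.T)

K_full = se_kernel(X, X)
err = np.linalg.norm(K_full - K_hat) / np.linalg.norm(K_full)
print(f"relative Frobenius error with {m} landmarks: {err:.2e}")
```

With smooth kernels like the SE kernel, a few dozen landmarks typically suffice for high accuracy, which is why the low-rank variant can preserve near-optimal regret while cutting the per-round cost.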