Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent

2026-05-25 • Machine Learning

Machine Learning

AI summaryⓘ

The authors study a type of problem where a system's behavior changes over time and the goal is to make good predictions based on features, using generalized linear models. Current methods for tracking these changes need to keep revisiting old data, which makes them slower and use more memory over time. They propose a new algorithm called DOMD-GLB that updates parameters efficiently without needing to look back, keeping computational cost constant per time step. Their method also comes with mathematical guarantees on how well it performs in different types of changing environments. This is the first approach in this area to achieve such efficiency with solid performance bounds.

generalized linear banditsnonstationary environmentsonline mirror descentdynamic regretmaximum likelihood estimationdiscountingpath lengthpiecewise-stationarycurvature parametercomputation complexity

Authors

Joongkyu Lee, Min-hwan Oh

Abstract

We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including linear, Bernoulli, and binomial rewards. Existing approaches are predominantly based on maximum-likelihood estimation (MLE), using sliding-window, restart, or discounting mechanisms to handle nonstationarity. Although these methods achieve statistically efficient regret guarantees, they generally require revisiting past observations at every round, which leads to computation and memory costs that grow with time; moreover, several of them rely on a non-convex projection step. In this paper, we propose DOMD-GLB, a new algorithm for nonstationary GLBs that utilizes discounted online mirror descent (DOMD) for parameter estimation, thereby incurring only $O(1)$ computation and memory costs per round. We prove dynamic regret bounds of order $\tilde{O} \big(c_μ^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}\big)$ in drifting environments and $\tilde{O}\big(c_μ^{-1/3} d^{2/3} Γ_T^{1/3} T^{2/3}\big) $in piecewise-stationary environments, where $d$ denotes the feature dimension, $T$ the time horizon, $P_T$ the path length, $Γ_T$ the number of change points, and $c_μ$ a curvature parameter associated with the link function, while substantially improving computational efficiency over prior work. To the best of our knowledge, this is the first algorithm for nonstationary GLBs with per-round computation and memory costs independent of time.

View PDFOpen arXiv