A Context Augmented Multi-Play Multi-Armed Bandit Algorithm for Fast Channel Allocation in Opportunistic Spectrum Access

2026-05-25 • Machine Learning

AI summary

The authors study how to better pick communication channels when their quality is affected by noise that changes over time. They model channel noise as a disturbance in the reward obtained from choosing a channel, and use information about the channel's state to predict this noise. To do this, they create two methods: one assumes a simple linear connection and the other uses a neural network for more complex patterns. In numerical experiments, their approaches achieve lower regret than existing methods and pick sub-optimal channels in a more reasonable way.

multi-armed bandit, restless bandit, contextual bandit, multi-play bandit, opportunistic spectrum access, channel noise, reward perturbation, upper confidence bound, linear model, neural network

Authors

Ruiyu Li, Guangxia Li, Xiao Lu, Jichao Liu, Yan Jin

Abstract

We study the restless contextual multi-play multi-armed bandit (MP-MAB) problem for channel allocation in the opportunistic spectrum access (OSA) scenario. Most existing MP-MAB methods are impractical for real-world OSA systems because they assume many ideal conditions, incur a heavy computational cost, and, most importantly, ignore the impact of channel noise, which is directly related to the quality of service. In this study, we capture this impact by modeling channel noise as a perturbation of the arm's reward function in MP-MAB. As there is an implicit correlation between channel state information and channel noise, we take the former as a context for MP-MAB to represent the perturbation caused by the latter. We investigate two types of correlation between the context and the perturbation, linear and nonlinear, and derive an index policy for each. These policies learn the correlations through a linear model and a neural network, respectively, and use the estimated noise value to adjust the upper confidence bound. Numerical experiments demonstrate that the proposed policies achieve lower regret and select sub-optimal arms in a more reasonable way.
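The idea of adjusting an upper confidence bound with an estimated noise value can be illustrated with a minimal sketch. This is not the authors' policy: the UCB1-style bonus, the subtraction of the predicted noise from the index, the ridge-regression fit, and every name below (`fit_noise_model`, `select_channels`, `theta`, `lam`) are assumptions made for illustration only, and the restless dynamics and the neural-network variant are not modeled.

```python
import numpy as np

def fit_noise_model(contexts, noise, lam=1.0):
    """Ridge-regression estimate of a hypothetical linear map from context to noise."""
    d = contexts.shape[1]
    gram = contexts.T @ contexts + lam * np.eye(d)
    return np.linalg.solve(gram, contexts.T @ noise)

def select_channels(means, counts, t, contexts, theta, k):
    """Return the k channels with the largest noise-adjusted UCB index (assumed form)."""
    # UCB1-style exploration bonus; channels never played get an infinite bonus.
    bonus = np.where(
        counts > 0,
        np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1)),
        np.inf,
    )
    # Predicted perturbation for each channel from its current context.
    noise_hat = contexts @ theta
    # Assumption: predicted noise lowers the index of a channel.
    index = means + bonus - noise_hat
    # argsort is ascending, so take the last k entries and reverse for best-first order.
    return np.argsort(index)[-k:][::-1]
```

In a multi-play setting the learner would call `select_channels` once per round with `k` equal to the number of channels it may access simultaneously, then refit `theta` as new context and noise observations arrive.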
