Randomized Least Squares Value Iteration itself is Joint Differentially Private
2026-06-01 • Machine Learning
AI summary
The authors study how to protect users' private data when reinforcement learning (RL) is used in sensitive areas such as healthcare. They focus on Randomized Least Squares Value Iteration (RLSVI), an algorithm that explores by injecting random noise. They show that the noise used for exploration also protects privacy, which yields a formal guarantee called joint differential privacy. The result is an explicit formula for the privacy level of RLSVI in terms of the numbers of states and actions, the episode length, and the number of episodes.
Reinforcement Learning, Randomized Exploration, Randomized Least Squares Value Iteration (RLSVI), Joint Differential Privacy, Tabular Markov Decision Process (MDP), Episodic Setting, Privacy Mechanisms, Noise Injection, Privacy Analysis, Differential Privacy
Authors
Haiyang Lu, Pratik Gajane, Shaojie Bai, Mohammad Sadegh Talebi
Abstract
As reinforcement learning (RL) is increasingly applied to sensitive domains, such as health care and recommendation systems, privacy-preserving techniques have become essential to protect users' sensitive information. We investigate privacy-preserving RL in the episodic setting, focusing on algorithms based on randomized exploration, such as Randomized Least Squares Value Iteration (RLSVI). The overall goal is to study how randomized exploration interacts with the injected noise required by privacy mechanisms. In this work, we present a new privacy analysis that characterizes how the noise that RLSVI injects for exploration simultaneously provides privacy protection. Specifically, we prove that RLSVI, as is, is $(\varepsilon(\delta),\delta)$-joint differentially private in tabular MDPs with $\varepsilon(\delta) = \frac{2AK}{H^2\log(2HSA)} + 2\sqrt{\frac{2AK\log(1/\delta)}{H^2\log(2HSA)}}$, where $S$ and $A$ are the numbers of states and actions respectively, $H$ is the length of an episode, and $K$ is the number of episodes.
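To get a feel for the bound, the expression for $\varepsilon(\delta)$ can be evaluated numerically. The sketch below is only an illustration of the formula as stated in the abstract, not code from the paper: the function name `rlsvi_epsilon` and the sample values are hypothetical, and it assumes that $\log$ denotes the natural logarithm, which the abstract does not specify.

```python
import math


def rlsvi_epsilon(S: int, A: int, H: int, K: int, delta: float) -> float:
    """Evaluate the epsilon(delta) expression quoted in the abstract.

    Assumption: log is the natural logarithm (the abstract does not say).
    S, A: numbers of states and actions; H: episode length;
    K: number of episodes; delta: failure probability in (0, 1).
    """
    if min(S, A, H, K) < 1:
        raise ValueError("S, A, H and K must be positive integers")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")

    # Shared ratio 2AK / (H^2 log(2HSA)) that appears in both terms.
    ratio = 2.0 * A * K / (H ** 2 * math.log(2 * H * S * A))

    # First term is the ratio itself; second is 2 * sqrt(ratio * log(1/delta)).
    return ratio + 2.0 * math.sqrt(ratio * math.log(1.0 / delta))


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    print(rlsvi_epsilon(S=10, A=5, H=20, K=1000, delta=1e-5))
```

With the other quantities held fixed, the expression grows with the number of episodes $K$ and shrinks as the episode length $H$ increases, so evaluating it for a planned $K$ gives a quick check on the privacy level the bound would imply.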