Learning in Low-Dimensional Subspaces: Orthogonal Bottlenecks for Reinforcement Learning
2026-05-25 • Machine Learning
Machine Learning, Artificial Intelligence
AI summary
The authors propose a simple method that constrains a reinforcement learning agent's encoder features to a low-dimensional subspace using a fixed orthonormal projection, without changing the learning algorithm or adding auxiliary objectives or pretraining. Under a linear realizability assumption, they prove that the bottleneck preserves expressivity as long as its dimension exceeds the intrinsic rank of the optimal value function in feature space. Their experiments show that baseline performance is matched or improved once the bottleneck dimension exceeds a small task-dependent threshold, and that the minimal sufficient dimension depends far more on environment complexity than on encoder width. They also find that orthogonal bottlenecks stabilize feature norms and are associated with higher effective rank, which supports the view that useful RL representations lie on low-dimensional manifolds.
deep reinforcement learning, neural representations, orthonormal projection, low-dimensional manifold, value function, encoder features, linear realizability, representation geometry, feature norm stabilization, effective rank
Authors
Aleksandar Todorov, Matthia Sabatelli
Abstract
Deep reinforcement learning (RL) agents commonly rely on high-dimensional neural representations, despite growing evidence that task-relevant value and policy structure may be intrinsically low-dimensional. In this work, we present a simple yet effective representation-level prior that inserts a fixed orthonormal projection to constrain encoder features to a low-dimensional subspace, requiring no auxiliary objectives, pretraining, or changes to the underlying RL algorithm. Under a linear realizability assumption, we prove that when the bottleneck dimension exceeds the intrinsic rank of the optimal value function in feature space, the bottleneck preserves expressivity and leaves the induced gradient dynamics unchanged up to an equivalent low-dimensional parameterization. Empirically, we find that across both single and multi-task benchmarks, baseline performance is either matched or improved once the bottleneck dimension exceeds a small task-dependent threshold; in many cases, value representations can be compressed to extremely low dimensions without loss, and the minimal sufficient dimension depends far more on environment complexity than encoder width. In addition, we analyze representation geometry and find that orthogonal bottlenecks stabilize feature norms and are associated with higher effective rank. Together, these results support a representation-space interpretation of the manifold hypothesis in reinforcement learning and position orthogonal bottlenecks as a lightweight, architecture-agnostic mechanism for shaping RL representations.
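The threshold behavior described in the abstract can be illustrated with a toy linear example. The sketch below is not the authors' implementation or proof setup: the function names (`orthonormal_projection`, `bottleneck`, `relative_fit_residual`), the QR-based construction of the projection, and the synthetic data are all assumptions made for illustration. It builds features that live in a subspace of a chosen intrinsic rank, passes them through a fixed projection with orthonormal columns, and measures how well a linear value head can still fit values that are linear in the latent factors.

```python
import numpy as np


def orthonormal_projection(feature_dim, bottleneck_dim, seed=0):
    """Return a fixed (feature_dim x bottleneck_dim) matrix with orthonormal columns.

    Assumption: the projection is drawn once via QR of a Gaussian matrix and
    never trained. The paper may construct its projection differently.
    """
    if bottleneck_dim > feature_dim:
        raise ValueError("bottleneck_dim must not exceed feature_dim")
    rng = np.random.default_rng(seed)
    gaussian = rng.standard_normal((feature_dim, bottleneck_dim))
    q, _ = np.linalg.qr(gaussian)  # reduced QR: columns of q are orthonormal
    return q


def bottleneck(features, projection):
    """Map encoder features (batch x feature_dim) to bottleneck coordinates."""
    return features @ projection


def relative_fit_residual(intrinsic_rank, bottleneck_dim,
                          feature_dim=64, n_states=200, seed=0):
    """Fit a linear value head on bottlenecked features; return relative error.

    Synthetic setup (hypothetical): features are a linear mix of
    `intrinsic_rank` latent factors, and values are linear in those factors.
    """
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal((n_states, intrinsic_rank))
    mixing = rng.standard_normal((intrinsic_rank, feature_dim))
    features = latent @ mixing                       # rank <= intrinsic_rank
    values = latent @ rng.standard_normal(intrinsic_rank)

    projection = orthonormal_projection(feature_dim, bottleneck_dim, seed=seed + 1)
    z = bottleneck(features, projection)

    # Least-squares value head on the compressed features.
    head, *_ = np.linalg.lstsq(z, values, rcond=None)
    return float(np.linalg.norm(z @ head - values) / np.linalg.norm(values))


if __name__ == "__main__":
    for k in (2, 4, 8):
        print(f"bottleneck_dim={k}: relative residual = "
              f"{relative_fit_residual(intrinsic_rank=4, bottleneck_dim=k):.2e}")
```

In this toy setting the fit is exact up to floating-point error once the bottleneck dimension reaches the intrinsic rank, and it degrades below that. The example uses a frozen random encoder, whereas in the paper the encoder is trained through the fixed projection, so this only conveys the linear-algebra intuition behind the result.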