Latent Representation Alignment for Offline Goal-Conditioned Reinforcement Learning

2026-05-25
Machine Learning

AI summary

The authors study how to learn goal-reaching policies from pre-collected data, especially in tasks that take many steps. They identify a major problem: the learned value estimates generalize incorrectly. To address this, they propose Latent-Aligned Value Learning (LAVL), which combines value generalization based on latent representations with hierarchical planning. In experiments on OGBench, LAVL achieves the highest performance on 20 of 22 datasets and is particularly strong on long-horizon tasks and on datasets that require stitching together parts of different trajectories.

Offline reinforcement learning, Goal-conditioned reinforcement learning, Value function, Inductive bias, Latent representation, Hierarchical planning, Long-horizon tasks, Trajectory stitching, OGBench
Authors
Hyungkyu Kang, Byeongchan Kim, Min-hwan Oh
Abstract
Offline goal-conditioned reinforcement learning (GCRL) provides a practical framework for obtaining goal-reaching policies from fixed datasets. However, learning a reliable goal-conditioned value function in long-horizon tasks remains challenging. In this paper, we identify erroneous generalization in goal-conditioned value functions as a fundamental bottleneck, and demonstrate that appropriate inductive bias in the value function is crucial for addressing the bottleneck. Building on these findings, we propose Latent-Aligned Value Learning (LAVL), an offline GCRL algorithm that integrates latent-representation-based value generalization with hierarchical planning in a unified framework. Extensive experiments on OGBench demonstrate that LAVL consistently outperforms existing offline GCRL methods, achieving the highest performance on 20 out of 22 datasets. Notably, LAVL exhibits strong performance in long-horizon tasks and trajectory stitching datasets, where prior methods suffer significant performance degradation. Our code is available at https://github.com/oh-lab/LAVL.git.
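For readers new to the term, the following minimal sketch illustrates what a goal-conditioned value function V(s, g) is, using a toy example. It is not LAVL and is not taken from the paper: the 1-D chain environment, the reward of -1 per step, the discount factor, and the function name `goal_conditioned_values` are all assumptions made purely for illustration.

```python
def goal_conditioned_values(n_states, gamma=0.9):
    """Tabular V(s, g) on a 1-D chain, computed by value iteration.

    Hypothetical toy setup (not from the paper): states 0..n_states-1,
    actions move one step left or right (clipped at the ends), the
    reward is -1 per step until the goal is reached, and V(g, g) = 0.
    """
    # V[s][g] is the value of state s when the goal is g.
    V = [[0.0] * n_states for _ in range(n_states)]
    # On this chain, n_states sweeps are enough to reach the fixed point.
    for _ in range(n_states):
        for g in range(n_states):
            for s in range(n_states):
                if s == g:
                    continue  # the goal is absorbing with value 0
                neighbors = [max(s - 1, 0), min(s + 1, n_states - 1)]
                V[s][g] = max(-1.0 + gamma * V[nxt][g] for nxt in neighbors)
    return V


V = goal_conditioned_values(6)
# Values for goal g = 3, indexed by start state s.
print([round(V[s][3], 3) for s in range(6)])
```

In this toy setting, V(s, g) equals -(1 - gamma^d) / (1 - gamma), where d is the number of steps between s and g, so the value is lower for goals that are farther away. Offline GCRL methods must approximate such a function with a neural network from a fixed dataset instead of enumerating states.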