Adaptive Action Chunking via Multi-Chunk Q Value Estimation
2026-05-11 • Machine Learning
Machine Learning · Artificial Intelligence
AI summary
The authors address a limitation in reinforcement learning: the number of actions a policy predicts at once is usually fixed in advance, which can limit performance. They introduce Adaptive Action CHunking (ACH), a method that lets the agent vary how many actions it plans ahead based on the current situation. Using a Transformer model, their approach estimates the value of every candidate chunk size in a single forward pass and selects the best one dynamically. Tests on 34 tasks show that this flexible method learns more efficiently and generalizes better than methods with fixed action chunks.
reinforcement learning · imitation learning · action chunking · Transformer architecture · value function estimation · offline-to-online learning · dynamic action planning · bootstrapping errors · policy · generalization
Authors
Yongjae Shin, Jongseong Chae, Seongmin Kim, Jongeui Park, Youngchul Sung
Abstract
Action chunking emerged as a pivotal technique in imitation learning, enabling policies to predict cohesive action sequences rather than single actions. Recently, this approach has expanded to reinforcement learning (RL), enhancing behavioral consistency and reducing bootstrapping errors in value function estimation. However, existing methods rely on a fixed chunk length, creating a performance bottleneck as the optimal length varies across states and tasks. In this paper, we propose Adaptive Action CHunking (ACH), a novel offline-to-online RL algorithm that dynamically modulates chunk length during both training and inference. To find the optimal chunk length for a dynamically varying current state, we simultaneously estimate action-values for all candidate chunk lengths in a single forward pass, using a Transformer-based architecture. Our mechanism allows the agent to select the most effective chunk length adaptively based on the current state. Evaluated on 34 challenging tasks, ACH consistently outperforms fixed-length baselines, demonstrating superior generalization and learning efficiency in complex environments.
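The core mechanism described in the abstract — estimating action-values for all candidate chunk lengths in one forward pass, then selecting the length with the highest value — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the candidate lengths, state dimension, and the linear "head" standing in for the Transformer are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical set of candidate chunk lengths and state size (not from the paper).
CANDIDATE_LENGTHS = [1, 2, 4, 8]
STATE_DIM = 6

# A random linear map stands in for the Transformer-based value network:
# one output per candidate chunk length, computed in a single pass.
W = rng.normal(size=(len(CANDIDATE_LENGTHS), STATE_DIM))
b = rng.normal(size=len(CANDIDATE_LENGTHS))


def multi_chunk_q(state: np.ndarray) -> np.ndarray:
    """Estimate Q(s, h) for every candidate chunk length h at once."""
    return W @ state + b


def select_chunk_length(state: np.ndarray) -> int:
    """Adaptively pick the chunk length with the highest estimated value."""
    q_values = multi_chunk_q(state)
    return CANDIDATE_LENGTHS[int(np.argmax(q_values))]


state = rng.normal(size=STATE_DIM)
chunk_length = select_chunk_length(state)
```

Because all candidate values come from one forward pass, the selection adds no extra network evaluations; the agent would then predict and execute a chunk of `chunk_length` actions before re-evaluating.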