When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

2026-05-11 | Artificial Intelligence

AI summary

The authors study how to decide not only which actions a robot or program should take, but also how many steps it should commit to before observing again. They call this number the 'commitment depth' and show that adapting it to the current state works better than fixing it in advance. On Sliding Puzzle and Sokoban, their adaptive policy reaches up to 12.5 percentage points higher solve rate than fixed-depth baselines while using about 25% fewer primitive actions per episode. They also give a theoretical analysis showing that, under their surrogate model, state-conditioned commitment depth strictly beats any single fixed depth whenever the best depth differs from state to state.

commitment depth, long-horizon reasoning, replanning, model-native policies, vision-language models, Sliding Puzzle, Sokoban, execution error, adaptive policies, Pareto dominance
Authors
Chen Li, Zhantao Yang, Fangyi Chen, Han Zhang, Anudeepsekhar Bolimera, Marios Savvides
Abstract
Long-horizon reasoning requires deciding not only what actions to take, but how deeply to commit before the next observation. We formalize this as commitment depth: the number of primitive actions executed open-loop between replans. Commitment depth induces a trade-off between replanning cost and compounding execution error, yet most existing long-horizon systems fix it as a hand-designed scalar. In this work, we instead treat commitment depth as a learnable, state-conditioned variable of the policy itself. We instantiate this within a model-native vision-language policy that jointly predicts both what to execute and for how long. Across Sliding Puzzle and Sokoban, the resulting adaptive policy Pareto-dominates every non-degenerate fixed-depth baseline, achieving up to 12.5 percentage points higher solve rate while using approximately 25% fewer primitive actions per episode. Despite using a 7B backbone, our method outperforms GPT-5.5 and Claude Sonnet on both tasks, while every tested open-weight vision-language model achieves 0% zero-shot success. We further present a theoretical analysis showing that, under the standard commitment-depth surrogate, state-conditioned commitment strictly dominates any fixed depth whenever the locally optimal depth varies across states.
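
To make the notion of commitment depth concrete, the following is a minimal Python sketch of a replanning loop in which the policy returns both a plan and the number of actions to execute open-loop before the next observation. It is not the authors' implementation: the 1-D toy task (an integer position with the goal at 0), the helper names plan_toward_zero, fixed_depth_policy, adaptive_policy, and run_episode, and the depth rule in adaptive_policy are all assumptions made for illustration. The toy dynamics are deterministic, so the sketch shows only the interface and the wasted-action side of the trade-off, not compounding execution error.

```python
from typing import Callable, List, Tuple

State = int    # toy state: integer position on a line, goal at 0 (assumption)
Action = int   # toy action: -1, 0 (no-op), or +1 (assumption)

# A policy returns a plan and a commitment depth: how many of the
# planned actions to execute open-loop before observing again.
Policy = Callable[[State], Tuple[List[Action], int]]


def plan_toward_zero(state: State, length: int) -> List[Action]:
    """Open-loop plan: simulate ahead, step toward 0, no-op once there."""
    plan, pos = [], state
    for _ in range(length):
        action = (pos < 0) - (pos > 0)  # +1 left of goal, -1 right, 0 at goal
        plan.append(action)
        pos += action
    return plan


def fixed_depth_policy(depth: int) -> Policy:
    """Hand-designed scalar: always commit to the same number of actions."""
    def policy(state: State) -> Tuple[List[Action], int]:
        return plan_toward_zero(state, depth), depth
    return policy


def adaptive_policy(max_depth: int) -> Policy:
    """State-conditioned depth (hypothetical rule): commit only as far as
    the remaining distance, capped at max_depth."""
    def policy(state: State) -> Tuple[List[Action], int]:
        depth = max(1, min(abs(state), max_depth))
        return plan_toward_zero(state, depth), depth
    return policy


def run_episode(policy: Policy, start: State,
                max_replans: int = 50) -> Tuple[State, int, int]:
    """Return (final state, number of replans, primitive actions executed)."""
    state, replans, executed = start, 0, 0
    while state != 0 and replans < max_replans:
        plan, depth = policy(state)       # observe the state and replan
        replans += 1
        for action in plan[:depth]:       # execute open-loop, no observation
            state += action
            executed += 1
    return state, replans, executed


print(run_episode(fixed_depth_policy(3), start=7))  # (0, 3, 9)
print(run_episode(adaptive_policy(4), start=7))     # (0, 2, 7)
```

In this toy run the fixed-depth policy spends two no-op actions after reaching the goal mid-plan, while the state-conditioned policy stops committing once the goal is within reach. The numbers describe only this illustrative setup and say nothing about the paper's reported results.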