HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation

2026-04-09

Artificial Intelligence
AI summary

The authors created HiRO-Nav, a navigation agent that decides at each step whether to think carefully or act quickly, based on how uncertain it is about its next move. They found that most actions don't need deep reasoning, but a few important ones do, especially those that lead the agent into new scenes or toward key objects. By training the agent to reserve extra thinking for these uncertain actions, they made it both more efficient and better at completing tasks. Their experiments showed that this approach balances task success against computing cost better than always thinking or never thinking.
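The idea of deciding when to think from the agent's uncertainty can be illustrated with a minimal Python sketch. It assumes the agent exposes a probability distribution over a small discrete action set, and it triggers reasoning when the Shannon entropy of that distribution exceeds a fixed threshold. The function names `action_entropy` and `should_think`, the four-action example, and the threshold value 0.5 are all hypothetical. The abstract does not say exactly how HiRO-Nav computes entropy from its model outputs or how it sets the threshold.

```python
import math
from typing import Sequence


def action_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    h = 0.0
    for p in probs:
        q = p / total  # normalize defensively in case the inputs are unnormalized
        if q > 0:      # 0 * log(0) is taken as 0 by convention
            h -= q * math.log(q)
    return h


def should_think(probs: Sequence[float], threshold: float) -> bool:
    """Hypothetical gate: emit a reasoning trace only when the policy is uncertain."""
    return action_entropy(probs) > threshold


if __name__ == "__main__":
    confident = [0.97, 0.01, 0.01, 0.01]  # hypothetical 4-action policy, nearly certain
    uncertain = [0.25, 0.25, 0.25, 0.25]  # maximally uncertain over 4 actions
    for name, dist in [("confident", confident), ("uncertain", uncertain)]:
        print(name, round(action_entropy(dist), 3), should_think(dist, threshold=0.5))
```

With the confident distribution the entropy is about 0.17 nats, so the agent would act directly. The uniform distribution has entropy ln 4 ≈ 1.39 nats and would trigger reasoning. A fixed threshold is the simplest possible gate. A learned or adaptive rule is equally plausible for the actual method.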

embodied navigation, large reasoning models, action entropy, reinforcement learning, sequential decision-making, ObjectNav benchmark, hybrid reasoning, token efficiency
Authors
He Zhao, Yijun Yang, Zichuan Lin, Deheng Ye, Chunyan Miao
Abstract
Embodied navigation agents built upon large reasoning models (LRMs) can handle complex, multimodal environmental input and perform grounded reasoning at each step to improve sequential decision-making in long-horizon tasks. However, a critical question remains: how can the reasoning capabilities of LRMs be harnessed intelligently and efficiently for long-horizon navigation tasks? In simple scenes, agents are expected to act reflexively, whereas in complex ones they should reason deliberately before acting. To achieve this, we introduce the Hybrid ReasOning Navigation (HiRO-Nav) agent, the first agent capable of adaptively deciding at every step whether to think, based on its own action entropy. Specifically, by examining how the agent's action entropy evolves over navigation trajectories, we observe that only a small fraction of actions exhibit high entropy, and that these actions often steer the agent toward novel scenes or critical objects. Furthermore, studying the relationship between action entropy and task completion (i.e., the Q-value) reveals that improving high-entropy actions contributes more to task success. Hence, we propose a tailored training pipeline comprising hybrid supervised fine-tuning as a cold start, followed by online reinforcement learning with the proposed hybrid reasoning strategy, which explicitly activates reasoning only for high-entropy actions. This significantly reduces computational overhead while improving decision quality. Extensive experiments on the CHORES-𝕊 ObjectNav benchmark show that HiRO-Nav achieves a better trade-off between success rate and token efficiency than both dense-thinking and no-thinking baselines.
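The abstract observes that only a small fraction of steps in a trajectory have high action entropy. That observation suggests a simple way to budget reasoning and count its token cost. The Python sketch below is a hypothetical illustration, not the paper's training pipeline. It ranks the per-step entropies of one trajectory, marks the top `top_fraction` of steps for explicit reasoning, and compares generated-token totals under assumed per-step costs. The names `plan_hybrid_reasoning`, `token_cost`, and `top_fraction`, and every number in the example, are assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HybridPlan:
    threshold: float        # entropy cutoff derived from the trajectory itself
    think_steps: List[int]  # indices of steps that would emit a reasoning trace
    think_fraction: float   # share of steps that reason


def plan_hybrid_reasoning(entropies: Sequence[float], top_fraction: float) -> HybridPlan:
    """Mark the highest-entropy steps of a trajectory for explicit reasoning.

    top_fraction is a hypothetical knob (e.g. 0.2 = reason on roughly the top 20%
    of steps). Ties at the cutoff are all included, so slightly more steps may
    be selected than top_fraction alone implies.
    """
    if not entropies:
        raise ValueError("empty trajectory")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    k = max(1, round(top_fraction * len(entropies)))  # number of steps to reason on
    threshold = sorted(entropies, reverse=True)[k - 1]
    think_steps = [i for i, h in enumerate(entropies) if h >= threshold]
    return HybridPlan(threshold, think_steps, len(think_steps) / len(entropies))


def token_cost(num_steps: int, num_think: int, think_tokens: int, act_tokens: int) -> int:
    """Total generated tokens when only num_think steps add a reasoning trace."""
    return num_steps * act_tokens + num_think * think_tokens


if __name__ == "__main__":
    # Hypothetical per-step entropies for a 10-step trajectory.
    ent = [0.1, 0.05, 1.2, 0.2, 0.1, 0.9, 0.15, 0.1, 0.08, 0.12]
    plan = plan_hybrid_reasoning(ent, top_fraction=0.2)
    print(plan)
    n = len(ent)
    print("dense:", token_cost(n, n, think_tokens=300, act_tokens=10))
    print("hybrid:", token_cost(n, len(plan.think_steps), think_tokens=300, act_tokens=10))
    print("none:", token_cost(n, 0, think_tokens=300, act_tokens=10))
```

In this made-up trajectory, steps 2 and 5 are marked for reasoning. Assuming 300 tokens per reasoning trace and 10 tokens per action, hybrid reasoning generates 700 tokens, compared with 3,100 for dense thinking and 100 for no thinking. These figures illustrate token accounting only. They say nothing about success rates, which is the other side of the trade-off the paper's experiments measure.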