Tmax: A simple recipe for terminal agents

2026-06-22

Computation and Language
AI summary

The authors present Tmax, a simple reinforcement learning (RL) recipe for training language model agents that complete tasks in a terminal. To address the shortage of training data and the difficulty of existing benchmarks, they generate a large, diverse dataset of terminal environments using a taxonomy that combines difficulty control, user personas, and verifier diversification. Trained on this data, their 9-billion-parameter model reaches 27% on Terminal-Bench 2.0, outperforming much larger models from prior work. They release their data, models, and code so that others can build on the work.

language models, reinforcement learning (RL), terminal agents, Terminal-Bench 2.0, dataset generation, supervised fine-tuning (SFT), model parameters, taxonomy, open-source, training recipes
Authors
Hamish Ivison, Junjie Oscar Yin, Rulin Shao, Teng Xiao, Nathan Lambert, Hannaneh Hajishirzi
Abstract
Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little academic work has examined RL-based training of these models, likely due to difficult benchmarks, a lack of data, and a lack of simple baseline recipes. We present Tmax, the strongest open RL recipe for terminal agents to date, bringing open data recipes closer to the frontier. While simple, our recipe achieves 27% on Terminal-Bench 2.0 with only 9B parameters, outperforming much larger models from prior work. Concretely, we generate data using a novel taxonomy, combining difficulty control, personas, and verifier diversification, which allows us to cheaply generate large amounts of terminal environments for RL and SFT training. We open-source our terminal dataset, which is over 2.5x larger than previously released terminal-agent datasets. We then train open-weight models using RL with our data, using a simple, outcome-only recipe. We release our data, models, and code as a strong baseline for future open academic work on terminal agents at https://github.com/hamishivi/tmax.
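
As a rough illustration of the two ideas named in the abstract, the sketch below samples task specifications from a cross product of difficulty, persona, and verifier type, and scores a rollout with an outcome-only reward. It is not the authors' implementation: the axis values, the `TaskSpec` class, the `sample_specs` and `outcome_reward` functions, and the choice of a binary reward are all assumptions made for illustration, since the abstract names the axes but does not specify their contents.

```python
import itertools
import random
from dataclasses import dataclass

# Hypothetical axis values: the abstract names these three axes
# but does not list what each one contains.
DIFFICULTIES = ["easy", "medium", "hard"]
PERSONAS = ["sysadmin", "data scientist", "student"]
VERIFIERS = ["exit_code", "file_contents", "stdout_match"]


@dataclass(frozen=True)
class TaskSpec:
    """One point in the (assumed) taxonomy used to seed an environment."""
    difficulty: str
    persona: str
    verifier: str


def sample_specs(n: int, seed: int = 0) -> list[TaskSpec]:
    """Sample n distinct specs from the cross product of the three axes.

    n must not exceed the number of combinations (27 with the lists above).
    """
    combos = list(itertools.product(DIFFICULTIES, PERSONAS, VERIFIERS))
    rng = random.Random(seed)
    return [TaskSpec(*combo) for combo in rng.sample(combos, n)]


def outcome_reward(verifier_passed: bool) -> float:
    """Outcome-only reward (assumed binary): depends only on the final check,
    with no credit for intermediate steps."""
    return 1.0 if verifier_passed else 0.0


if __name__ == "__main__":
    for spec in sample_specs(3):
        print(spec)
    print(outcome_reward(True), outcome_reward(False))
```

In this sketch each sampled `TaskSpec` would be handed to a generator that produces a concrete terminal environment and its verifier. Sampling without replacement keeps the specs distinct, which is one simple way to encourage diversity across the three axes.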