TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning

2026-05-11 · Artificial Intelligence

AI summary

The authors introduce TimeClaw, a framework that improves time series analysis by learning from multiple exploratory attempts at a task rather than only solving the current instance. It explores alternative candidate solutions, compares them using quantitative metrics, distills the better-performing procedures into reusable experience, and reinjects that experience at inference time while keeping the base model frozen. In an evaluation on 17 finance and weather tasks, TimeClaw delivered consistent gains over the baselines. The results suggest that, for such systems, how exploratory experience is compared, distilled, and reused matters alongside the model's execution-time capability.

time series analysis, exploratory execution, hierarchical distillation, time series forecasting, tool-use procedures, metric-supervised learning, foundation models, model freezing, MTBench, reinjection
Authors
Hangchen Liu, Dongyuan Li, Renhe Jiang, Jiewen Deng, Weiwei Ye, Yoshihide Sekimoto
Abstract
Time series analysis underpins forecasting, monitoring, and decision making in domains such as finance and weather, where solving a task often requires both numerical accuracy and contextual reasoning. Recent progress has moved from specialized neural predictors to approaches built on LLMs and foundation models that can reason over time series inputs and use external tools. However, most such systems remain execution-centric: they focus on solving the current instance but learn little from exploratory execution. This is especially limiting in verifiable numeric settings, where multiple candidate executions and tool-use procedures may all be task-valid yet differ sharply in quantitative quality, and where early success can trigger tool-prior collapse that suppresses further exploration. To address this limitation, we present TimeClaw, an exploratory execution learning framework that turns exploratory execution into reusable hierarchical distilled experience through a four-stage loop: Explore, Compare, Distill, and Reinject. TimeClaw combines metric-supervised exploratory execution learning, task-aware tool dropout, and hierarchical distilled experience for inference-time reinjection, while keeping the base model frozen and avoiding online test-time adaptation. In an MTBench-aligned evaluation with 17 prediction and reasoning tasks spanning finance and weather, TimeClaw delivers consistent gains over the baselines. These results suggest that, for scientific systems, the bottleneck is not only execution-time capability, but how exploratory experience is compared, distilled, and reused.
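
The abstract names the four stages but does not say how they are implemented. The toy Python sketch below is one way the Explore, Compare, Distill, and Reinject loop could fit together; it is an illustration under stated assumptions, not the authors' method. Everything in it is hypothetical: the three forecasting "tools", the `Execution` record, a single `drop_prob` standing in for task-aware tool dropout, absolute error as the supervising metric, and a flat per-task win count standing in for hierarchical distilled experience. No model is involved, so the frozen base model appears only in that nothing is trained.

```python
import random
from dataclasses import dataclass


# Hypothetical toy tools: each maps a history to a one-step-ahead forecast.
def last_value(history):
    return history[-1]


def mean_value(history):
    return sum(history) / len(history)


def linear_trend(history):
    return history[-1] + (history[-1] - history[-2])


TOOLS = {
    "last_value": last_value,
    "mean_value": mean_value,
    "linear_trend": linear_trend,
}


@dataclass
class Execution:
    tool: str
    prediction: float
    error: float  # absolute error against the known target (assumed metric)


def explore(history, target, tools, drop_prob, rng):
    """Explore: run candidate tools, randomly dropping some of them
    so that an early winner cannot crowd out the alternatives."""
    kept = [name for name in tools if rng.random() >= drop_prob]
    if not kept:  # always keep at least one candidate
        kept = [rng.choice(sorted(tools))]
    runs = []
    for name in kept:
        pred = tools[name](history)
        runs.append(Execution(name, pred, abs(pred - target)))
    return runs


def compare(runs):
    """Compare: rank task-valid executions by measured error, best first."""
    return sorted(runs, key=lambda r: r.error)


def distill(experience, task_type, ranked):
    """Distill: record which tool won for this task type."""
    wins = experience.setdefault(task_type, {})
    best = ranked[0].tool
    wins[best] = wins.get(best, 0) + 1


def reinject(experience, task_type, default="last_value"):
    """Reinject: at inference, reuse the tool with the most recorded wins."""
    wins = experience.get(task_type)
    if not wins:
        return default
    return max(wins, key=wins.get)


# Usage: gather experience on a linear series, then reuse it for a forecast.
rng = random.Random(0)
experience = {}
series = [float(t) for t in range(1, 21)]
for end in range(5, 20):
    history, target = series[:end], series[end]
    ranked = compare(explore(history, target, TOOLS, drop_prob=0.3, rng=rng))
    distill(experience, "trend_forecast", ranked)

chosen = reinject(experience, "trend_forecast")
forecast = TOOLS[chosen](series)
```

In this sketch the experience store is a plain dictionary consulted at inference time, which mirrors the abstract's point that knowledge is reused without updating weights or adapting online at test time. A hierarchical store, as the abstract describes, would presumably keep experience at more than one level of abstraction rather than a single win count.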