TSQAgent: Rating Time Series Data Quality via Dedicated Agentic Reasoning
2026-06-02 • Artificial Intelligence
AI summary
The authors studied how well large language models (LLMs) can judge the quality of time series data, a difficult task because quality spans many dimensions. They found that current models struggle both to identify the relevant quality dimensions and to make evidence-grounded comparisons. To address this, they created TSQAgent, a framework with three collaborating roles: a Perceiver that selects the key quality dimensions, an Inspector that analyzes them quantitatively, and an Adjudicator that aggregates the results into a final judgment. Their experiments show that TSQAgent improves LLMs' quality understanding and supports better quality-aware data selection for downstream tasks.
time series data, data quality assessment, large language models, quality dimensions, quantitative analysis, benchmark, agentic reasoning, data selection, downstream performance, TSQAgent
Authors
Shunyu Wu, Dan Li, Haozheng Ye, Weibin Feng, Jian Lou, Bo Zhang, Wenjie Feng, Chenjuan Guo, See-Kiong Ng
Abstract
Assessing the quality of time series (TS) data is fundamental yet inherently challenging due to the multifaceted nature of quality dimensions. Recently, large language models (LLMs) have emerged as a promising paradigm for TS quality assessment via pairwise comparison and per-dimension evaluation. However, existing approaches rely on manually predefined quality dimensions and purely text-based reasoning, leaving it unknown whether LLMs can identify truly relevant quality dimensions or perform grounded and quantitative quality comparisons. To investigate this, we construct TSQBench, a dedicated benchmark for evaluating LLMs on two progressive capabilities: (i) understanding and identifying relevant quality dimensions, and (ii) performing quality comparison under specific dimensions. Our analysis reveals that current LLMs consistently struggle with both dimension identification and evidence-grounded quality comparison. To address these limitations, we propose TSQAgent, a novel agentic reasoning framework for TS quality rating consisting of three collaborative roles: Perceiver for focused dimension selection, Inspector for dimension-wise quantitative analysis, and Adjudicator that aggregates and refines the final judgment. In particular, we introduce an agentic reasoning strategy that instills the ability to identify and prioritize the most relevant quality dimensions, and further propose an agent workflow equipped with external analytical tools to enable precise quantitative comparisons over selected dimensions. Experiments on both the proposed benchmark and eleven real-world datasets demonstrate that our framework not only substantially improves LLMs' capabilities in quality understanding and quantitative comparison but also effectively translates these improvements into better quality-aware data selection, leading to enhanced downstream performance and data efficiency.
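The three-role flow described in the abstract (Perceiver, then Inspector, then Adjudicator) can be illustrated with a minimal sketch for one pairwise comparison. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two quality dimensions (`missingness`, `noise`), the metric functions standing in for the external analytical tools, the rule-based `perceiver` that replaces the LLM-driven dimension selection, and the majority vote used by `adjudicator`.

```python
import math
from statistics import pstdev

# Hypothetical tool functions (lower score = better quality). The abstract does
# not name its dimensions or tools; these two are illustrative assumptions.
def missing_rate(ts):
    """Fraction of NaN entries in the series."""
    return sum(math.isnan(x) for x in ts) / len(ts)

def roughness(ts):
    """Std of first differences over observed values, a crude noise proxy."""
    obs = [x for x in ts if not math.isnan(x)]
    diffs = [b - a for a, b in zip(obs, obs[1:])]
    return pstdev(diffs) if len(diffs) > 1 else 0.0

TOOLS = {"missingness": missing_rate, "noise": roughness}

def perceiver(ts_a, ts_b, tol=1e-9):
    """Rule-based stand-in for the LLM Perceiver: keep only the dimensions
    on which the two series actually differ."""
    return [d for d, f in TOOLS.items() if abs(f(ts_a) - f(ts_b)) > tol]

def inspector(ts_a, ts_b, dims):
    """Run the tool for each selected dimension and record both scores."""
    return {d: (TOOLS[d](ts_a), TOOLS[d](ts_b)) for d in dims}

def adjudicator(evidence):
    """Aggregate by majority vote over dimensions (an assumed rule);
    returns 'A', 'B', or 'tie'."""
    votes = sum((a < b) - (a > b) for a, b in evidence.values())
    return "A" if votes > 0 else "B" if votes < 0 else "tie"

def rate_pair(ts_a, ts_b):
    """Perceiver -> Inspector -> Adjudicator for one pair of series."""
    dims = perceiver(ts_a, ts_b)
    evidence = inspector(ts_a, ts_b, dims)
    return adjudicator(evidence), evidence

clean = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noisy = [0.0, 3.0, float("nan"), 1.0, 6.0, 2.0]
verdict, evidence = rate_pair(clean, noisy)
print(verdict, evidence)
```

In this toy setup the verdict favors the series with fewer gaps and smoother increments. The point of the sketch is only the separation of concerns: dimension selection, tool-backed measurement, and aggregation are independent steps, so any one of them could be swapped for an LLM call without touching the others.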