Distribution-Aware Diffusion-LLM for Robust Ultra-Long-Term Time Series Forecasting

2026-06-22, Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors look at how large language models (LLMs) can be used to predict future time series data, like weather or traffic patterns. They note that while LLMs are good at understanding long sequences of text, they have trouble with different kinds of data and estimating uncertainty. To fix this, the authors combine LLMs with a special type of model called a conditional diffusion model, which helps better predict future values and align different data types. Testing their method on six forecasting tasks, they find it works better than previous approaches, especially for very long-term and few-example predictions.

Time series forecasting, Large language models (LLMs), Conditional diffusion model, Multimodal data, Probabilistic modeling, Semantic alignment, Long-term forecasting, Few-shot learning, Latent space, Distribution-aware regularization
Authors
Falguni Ghosh, Vahid Hashemi, Bernhard Kainz
Abstract
Time series forecasting is a fundamental machine learning task. Recent work has explored Large Language Models (LLMs) for this purpose due to their strong generalization, pattern recognition, and zero-shot or few-shot capabilities. Despite their suitability for long-context learning, LLMs face challenges in multimodal settings: they lack calibrated probabilistic modeling for non-text data and struggle to align heterogeneous representations. To address these issues, we propose a new framework, Diffusion-LLM, that integrates a conditional diffusion model into an LLM-based forecasting pipeline. This joint design enables learning the conditional distribution of future data while improving semantic alignment in a shared latent space. We evaluate Diffusion-LLM on six long-term forecasting benchmarks, including ETT, Weather, and ECL. Our method consistently outperforms existing LLM-based baselines, achieving notable gains in ultra-long-term and few-shot forecasting and demonstrating the value of distribution-aware regularization for enhancing robustness and generalization in time series LLMs.
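The abstract does not state the training objective, so the following is only an illustrative sketch of what "distribution-aware regularization" could look like, not the authors' method. It assumes a standard DDPM-style forward noising process with a linear beta schedule and a noise-prediction loss, added to an ordinary forecasting loss. The function names, the schedule constants, and the weight `lam` are all hypothetical, and the conditional denoiser (which in the paper's setting would presumably be conditioned on LLM latents) is left out and represented only by its output `eps_pred`.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_future(y0, alpha_bar_t, rng):
    """Forward process: y_t = sqrt(alpha_bar_t) * y0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in y0]
    y_t = [
        math.sqrt(alpha_bar_t) * y + math.sqrt(1.0 - alpha_bar_t) * e
        for y, e in zip(y0, eps)
    ]
    return y_t, eps


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(forecast, target, eps_pred, eps_true, lam=0.1):
    """Forecasting MSE plus a weighted noise-prediction term (hypothetical weighting)."""
    return mse(forecast, target) + lam * mse(eps_pred, eps_true)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars(num_steps=100)
    future = [0.5, 0.7, 0.6, 0.9]            # toy future window
    t = rng.randrange(len(alpha_bars))        # random diffusion step
    noisy_future, eps_true = noise_future(future, alpha_bars[t], rng)
    eps_pred = [0.0] * len(future)            # placeholder for a conditional denoiser
    forecast = [0.5, 0.6, 0.6, 0.8]           # placeholder for the LLM forecast head
    print(joint_loss(forecast, future, eps_pred, eps_true))
```

In this sketch the denoising term acts as a regularizer on whatever representation conditions the denoiser; how Diffusion-LLM actually couples the two components and aligns them in a shared latent space is described in the paper itself rather than in the abstract.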