Yield Curve Forecasting using Machine Learning and Econometrics: A Comparative Analysis
2026-05-11 • Artificial Intelligence
AI summary
The authors studied different methods to predict U.S. Treasury yield curves, an important financial measure, using 47 years of daily data. They compared traditional time-series models like ARIMA, classical machine learning, and newer deep learning methods such as RNNs and transformers. They found that ARIMA and naive econometric models gave better forecasts overall than the machine learning and deep learning models, except in one time block. Among the machine learning methods, TimeGPT, LGBM, and RNNs performed best. The study also examined whether stationary or nonstationary data are more appropriate as input to deep learning models.
U.S. Treasury yield curve, Time-series forecasting, ARIMA, Machine learning, Deep learning, Recurrent Neural Networks (RNNs), Transformers, Stationarity, Econometrics, LightGBM (LGBM)
Authors
Aman Singh, Tokunbo Ogunfunmi, Sanjiv Das
Abstract
While machine learning has revolutionized many fields such as natural language processing (NLP) and computer vision, its impact on time-series forecasting is still widely disputed, especially in the finance domain. This paper compares forecasting performance on U.S. Treasury yield curve data across econometrics/time-series analysis, classical machine learning, and deep learning methods, using daily data over 47 years. The Treasury yield curve is important because it is widely used by every participant in the bond markets, which are larger than equity markets. We examine a variety of methods that have not been tested on yield curve forecasting, especially deep learning algorithms. The algorithms include the Autoregressive Integrated Moving Average (ARIMA) model and its extensions, naive benchmarks, ensemble methods, Recurrent Neural Networks (RNNs), and multiple transformers built for forecasting. ARIMA and naive econometric models outperform other models overall, except in one time block. Of the machine learning methods, TimeGPT, LGBM and RNNs perform the best. Furthermore, the paper explores whether stationary or nonstationary data are more appropriate as input to deep learning models.
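To make the naive benchmark and the stationary-versus-nonstationary question concrete, here is a minimal sketch. It is not the authors' code or data. It assumes that the naive benchmark is a one-step random-walk forecast and that the stationary input is the first difference of the yield level; the abstract does not confirm either choice. The series is synthetic, and all function names are hypothetical.

```python
import numpy as np


def naive_forecast(levels):
    """One-step random-walk forecast: the next yield equals the current one.

    Returns predictions aligned with levels[1:].
    """
    levels = np.asarray(levels, dtype=float)
    return levels[:-1]


def to_differences(levels):
    """First differences, a common way to turn yield levels into a stationary series."""
    return np.diff(np.asarray(levels, dtype=float))


def from_differences(first_level, diffs):
    """Invert to_differences: rebuild levels from the starting level and the changes."""
    return first_level + np.concatenate(([0.0], np.cumsum(diffs)))


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


# Synthetic random-walk "yield" series (an assumption, NOT Treasury data).
rng = np.random.default_rng(0)
yields = 4.0 + np.cumsum(rng.normal(0.0, 0.05, size=500))

# Naive forecast evaluated on levels (nonstationary representation).
naive_rmse = rmse(yields[1:], naive_forecast(yields))

# The same forecast expressed on differences: predict a change of zero.
diffs = to_differences(yields)
zero_change_rmse = rmse(diffs, np.zeros_like(diffs))

# Forecasts made on differences can be mapped back to levels.
rebuilt = from_differences(yields[0], diffs)
```

In this sketch the naive forecast on levels and the zero-change forecast on differences yield the same error, so a model trained on differenced data must beat a prediction of zero to improve on the naive benchmark.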