Time series Foundation Models based on Physics-Informed Synthetic Histories for Cold-Start Photovoltaic Forecasting

2026-06-05 · Machine Learning

AI summary

The authors address the problem of forecasting solar power output for newly commissioned photovoltaic (PV) plants before any real production data is available. They generate synthetic production histories from plant metadata and weather data, which time-series foundation models then use as context for forecasting. Testing on 440 PV sites across four datasets, the authors find that models that take weather inputs into account outperform classical baselines by a factor of roughly 1.7 to 2. Different models excel under different feedback scenarios: TabPFN-TS is most accurate when real production data is fed back, while Chronos-2 is most robust when the model's own forecasts are fed back. Accuracy depends little on how the synthetic data was generated, as long as it provides a plausible temporal context.
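The summary does not say how the synthetic histories are built, so the following is only a minimal sketch of one way to turn plant metadata and daily weather into a synthetic production history, assuming a simplified PVWatts-style daily energy model. The `PlantMetadata` class, the `synthetic_daily_history` function, the default performance ratio and temperature coefficient, and the fixed 20 °C cell-temperature offset are all hypothetical illustrations, not the authors' generator.

```python
from dataclasses import dataclass
from typing import List, Sequence

G_STC_KW_M2 = 1.0  # irradiance at standard test conditions, kW/m^2


@dataclass
class PlantMetadata:
    """Hypothetical plant descriptor; field names and defaults are illustrative."""
    capacity_kwp: float                 # nameplate DC capacity in kWp
    performance_ratio: float = 0.8      # lumped system losses (assumed value)
    temp_coeff_per_c: float = -0.004    # power temperature coefficient, 1/degC (assumed)


def synthetic_daily_history(
    plant: PlantMetadata,
    daily_irradiation_kwh_m2: Sequence[float],
    daily_mean_temp_c: Sequence[float],
) -> List[float]:
    """Return a synthetic daily energy series in kWh, one value per day of covariates.

    Simplified model (an assumption, not the paper's method):
        E = P_kWp * (H / G_stc) * PR * (1 + gamma * (T_cell - 25))
    where H is plane-of-array irradiation in kWh/m^2/day.
    """
    history: List[float] = []
    for h, t_air in zip(daily_irradiation_kwh_m2, daily_mean_temp_c):
        # Crude heuristic: cell runs a fixed 20 degC above air temperature.
        t_cell = t_air + 20.0
        derate = 1.0 + plant.temp_coeff_per_c * (t_cell - 25.0)
        energy = plant.capacity_kwp * (h / G_STC_KW_M2) * plant.performance_ratio * derate
        # Clamp: the linear derate could go negative at unrealistic temperatures.
        history.append(max(energy, 0.0))
    return history
```

For a 10 kWp plant, two days with 5.0 and 2.0 kWh/m² of irradiation at 20 °C and 5 °C yield about 37.6 and 16.0 kWh. A series like this would play the role of the context that a foundation model conditions on at inference time; a real generator would likely use hourly data and a calibrated temperature model.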

Photovoltaic (PV), Time-series forecasting, Cold-start problem, Synthetic data, Meteorological covariates, Foundation models, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Inference-time conditioning
Authors
Lorenzo Longarini, Alessandro Rongoni, Simone Silenzi, Emanuele Frontoni, Riccardo Rosati
Abstract
At commissioning time, photovoltaic (PV) operators must forecast production before target-site observations are available, limiting the direct use of standard supervised forecasters. This cold-start setting is addressed with a zero-shot pipeline that generates a synthetic production history from plant metadata and meteorological covariates, enabling time-series foundation models (TSFMs) to forecast through inference-time conditioning. Five TSFMs are benchmarked against classical baselines under three strategies: strict Cold-Start Baseline, Real Feedback, and Self-Forecast Feedback. The evaluation spans $440$ PV sites across four datasets and diverse climate regimes. Covariate-aware foundation models outperform baselines by approximately $1.7$–$2\times$: TabPFN-TS achieves the lowest error under Real Feedback (MAE $0.514$, RMSE $0.721$ $\mathrm{kWh\,kWp^{-1}\,d^{-1}}$), while Chronos-2 is most robust under Self-Forecast Feedback. Performance is largely insensitive to the synthetic-history source, indicating that accuracy is driven more by the availability of plausible temporal context than by the specific generator.
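The abstract does not define the three evaluation strategies in detail. One plausible reading, assumed here, is that they differ only in what is appended to the model's context after each one-day-ahead forecast: nothing for the Cold-Start Baseline, the observed production for Real Feedback, and the model's own prediction for Self-Forecast Feedback. The sketch below illustrates that reading with a hypothetical stand-in forecaster (`mean_of_last_week`) in place of a real TSFM, and reports MAE and RMSE normalized by plant capacity so that daily errors are expressed in kWh kWp⁻¹ d⁻¹, the unit used in the abstract. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A forecaster maps the current context (daily kWh values) to a one-day-ahead forecast.
Forecaster = Callable[[Sequence[float]], float]

STRATEGIES = ("cold_start_baseline", "real_feedback", "self_forecast_feedback")


def mean_of_last_week(context: Sequence[float]) -> float:
    """Hypothetical stand-in forecaster (not a TSFM): mean of the last 7 values.

    Assumes a non-empty context, i.e. a non-empty synthetic history.
    """
    window = context[-7:]
    return sum(window) / len(window)


def rolling_forecast(
    synthetic_history: Sequence[float],
    observed: Sequence[float],
    strategy: str,
    forecaster: Forecaster = mean_of_last_week,
) -> List[float]:
    """One-step-ahead forecasts over `observed` under an assumed feedback strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    context = list(synthetic_history)  # start from the synthetic history only
    predictions: List[float] = []
    for actual in observed:
        pred = forecaster(context)
        predictions.append(pred)
        if strategy == "real_feedback":
            context.append(actual)  # reveal the true production after forecasting
        elif strategy == "self_forecast_feedback":
            context.append(pred)  # feed the model's own forecast back as context
        # cold_start_baseline: the context stays the synthetic history
    return predictions


def mae_rmse_per_kwp(
    predictions: Sequence[float], observed: Sequence[float], capacity_kwp: float
) -> Tuple[float, float]:
    """MAE and RMSE of daily energy errors, normalized to kWh per kWp per day."""
    errors = [(p - o) / capacity_kwp for p, o in zip(predictions, observed)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

With a flat synthetic history of 4.0 kWh/day and two observed days of 6.0 kWh, Real Feedback raises the second forecast to about 4.29, while the other two strategies stay at 4.0. Because the stand-in ignores weather covariates, it cannot reproduce the covariate-aware gains reported above; it only shows how context handling differs between strategies.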