Supercharging Bayesian Inference with Reliable AI-Informed Priors
2026-05-11 • Machine Learning
AI summary
The authors present a method to improve how AI predictions are used as prior knowledge when there is little data available for statistical analysis. Their approach adjusts the AI-generated data before using it to form priors, which helps reduce errors carried over from the AI model. This adjustment leads to more accurate and trustworthy statistical inference. They test their method on a skin disease classification task and find it improves prediction results.
prior distribution, statistical inference, Dirichlet process, synthetic data, Gaussian asymptotics, bias correction, posterior distribution, credible intervals, AI-informed priors, predictive modeling
Authors
Jongwoo Choi, Sean O'Hagan
Abstract
Modern predictive systems encode beliefs that can act as useful prior information for statistical inference in data-limited settings. Using them for prior construction introduces a tradeoff: an informative prior built from a predictive model can sharpen inference from limited data, but also risks propagating error from the model into the posterior. We propose a framework for AI-informed prior elicitation that mitigates this tension by rectifying the AI-induced law that generates synthetic data before using it to inform a prior. The rectified law can be embedded into synthetic data-driven prior elicitation techniques, including as a base measure in a Dirichlet process (DP) prior on the data-generating process. We refer to the resulting prior and corresponding posterior as the rectified AI prior and rectified AI posterior. We establish Gaussian asymptotics for the rectified AI posterior under non-vanishing prior strength and derive a first-order expression for its centering bias. Our rectified AI priors substantially reduce bias compared to standard approaches, improve the coverage of credible intervals, and make AI-powered prior information more reliable. We additionally apply the rectified AI prior to a real skin disease classification task and show that it can meaningfully boost predictive performance.
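To make the role of the DP base measure concrete, here is a minimal sketch of posterior sampling for a mean under a DP prior whose base measure is the empirical law of a set of synthetic draws. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how the rectification is carried out, so the sketch takes the synthetic draws as given (they could come from a raw or a rectified AI law). The function name `dp_posterior_mean_draws`, the concentration `alpha`, and the simulated data are all hypothetical. The sketch relies only on the standard conjugacy fact that a DP with a finitely supported base measure assigns Dirichlet-distributed weights to its atoms.

```python
import numpy as np


def dp_posterior_mean_draws(real, synthetic, alpha, n_draws=2000, seed=0):
    """Posterior draws of the mean under a DP(alpha, G0) prior.

    G0 is assumed to be the empirical law of `synthetic` (hypothetical setup).
    With a finitely supported G0, the posterior puts Dirichlet weights on the
    atoms: 1 per real observation and alpha / m per synthetic point.
    """
    rng = np.random.default_rng(seed)
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    n, m = real.size, synthetic.size

    atoms = np.concatenate([real, synthetic])
    concentration = np.concatenate([np.ones(n), np.full(m, alpha / m)])

    # Each row is one random probability vector over the atoms.
    weights = rng.dirichlet(concentration, size=n_draws)
    return weights @ atoms


# Simulated data for illustration only; these numbers are not from the paper.
demo_rng = np.random.default_rng(1)
real = demo_rng.normal(loc=0.0, scale=1.0, size=20)        # scarce real data
synthetic = demo_rng.normal(loc=0.5, scale=1.0, size=50)   # AI-generated draws
alpha = 5.0                                                # prior strength

draws = dp_posterior_mean_draws(real, synthetic, alpha)
lower, upper = np.quantile(draws, [0.025, 0.975])          # 95% credible interval
```

In this sketch the posterior mean is pulled toward the synthetic data in proportion to `alpha / (alpha + n)`, which is why error in the AI-induced law carries over into the posterior when `alpha` does not vanish, and why correcting that law before using it as a base measure matters.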