The Value of Mechanistic Priors in Sequential Decision Making

2026-05-11

Machine Learning
AI summary

The authors study hybrid mechanistic models that combine physical knowledge with learned corrections to improve decision-making with less data. They introduce a way to measure how useful such physical knowledge is by comparing the model’s suggested policy to the true best policy. They show that using these mechanistic priors can reduce the amount of data needed in the long run and provide a method to evaluate their effectiveness in practice. The authors also highlight risks when the prior knowledge is confidently wrong, especially early on, and test their findings on dosing simulations for a cancer drug. Lastly, they compare these models to large language model (LLM) priors, finding that LLMs may lack reliable mechanistic information, suggesting physical priors are safer for critical decisions.

Hybrid mechanistic models, Bayesian regret, Mutual information, Policy recommendation, Sample complexity, Burn-in regime, Pharmacokinetics, 5-fluorouracil (5-FU), Large language models (LLMs), Sequential decision-making
Authors
Itai Shufaro, Gal Benor, Shie Mannor
Abstract
Hybrid mechanistic models (physical priors with learned residuals) promise to reduce the data required for good decisions, but no computable criterion exists to test this promise. We characterize the value of mechanistic priors in sequential decision-making in both the asymptotic and burn-in regimes. To formalize this, we introduce the mechanistic information of a model -- the mutual information between the model's recommended policy $\hat{\pi}$ and the true optimal policy $\pi^*$ -- quantified via an occupancy-weighted bias $B_\mu$. In the asymptotic regime (large $N$), matched bounds reveal that Bayesian regret scales with the residual entropy $H_{\mathrm{mech}}$, delivering a theoretical sample-complexity reduction of $H(\mu)/H_{\mathrm{mech}}$ over an uninformed baseline. Furthermore, we provide a model certificate for determining empirical sample efficiency. Complementarily, in the clinically relevant burn-in regime (small $N$), we establish a lower bound on the penalty incurred by confidently wrong priors. We demonstrate both the asymptotic and burn-in bounds on 5-fluorouracil (5-FU) dosing simulations motivated by published FOLFOX pharmacokinetic data, where a hybrid prior yields large sample-efficiency gains in the burn-in regime. Finally, we contrast these grounded models with LLM priors, demonstrating that LLMs can suffer severe losses in mechanistic information, thereby motivating the exclusive use of physically grounded priors in safety-critical applications.
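To make the central quantity concrete, the sketch below estimates the mutual information $I(\hat{\pi}; \pi^*)$ from paired samples of recommended and optimal policies across sampled environments, then derives a residual entropy and the implied sample-complexity ratio. This is an illustrative plug-in estimator over discrete policy labels, not the paper's occupancy-weighted construction; the variable names (`pi_hat`, `pi_star`, `H_mech`) and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of the empirical distribution over labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) = H(X) + H(Y) - H(X, Y) from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Toy example: across sampled environments, the prior's recommended policy
# (pi_hat) usually, but not always, matches the true optimal policy (pi_star).
pi_star = [0, 0, 1, 1, 2, 2, 0, 1]
pi_hat  = [0, 0, 1, 1, 2, 0, 0, 1]   # one mismatch

I = mutual_information(pi_hat, pi_star)   # "mechanistic information" of the prior
H_mech = entropy(pi_star) - I             # residual entropy left after the prior
ratio = entropy(pi_star) / H_mech         # implied sample-complexity reduction
```

A perfectly informative prior drives `H_mech` to zero (unbounded predicted gain), while an independent prior gives `I = 0` and a ratio of one, matching the uninformed baseline.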