Conformal Policy Control

2026-03-02

Artificial Intelligence, Machine Learning
AI summary

The authors address the problem of safely trying new actions in high-stakes settings, where a mistake can cause harm and force the agent to be taken offline permanently. They propose using a safe, known policy as a guide that limits how far a new, untested policy may deviate, with a technique called conformal calibration controlling the risk. The method requires neither a correctly specified model class nor hyperparameter tuning, and it handles complex safety constraints with confidence guarantees that hold even from limited data. Experiments show it can safely improve performance from the first moment of deployment across a variety of tasks.

safe exploration, policy, conformal calibration, risk tolerance, high-stakes environments, conservative optimization, finite-sample guarantees, behavior regulation, model calibration, constraint functions
Authors
Drew Prinster, Clara Fannjiang, Ji Won Park, Kyunghyun Cho, Anqi Liu, Suchi Saria, Samuel Stanton
Abstract
An agent must try new behaviors to explore and improve. In high-stakes environments, an agent that violates safety constraints may cause harm and must be taken offline, curtailing any future interaction. Imitating old behavior is safe, but excessive conservatism discourages exploration. How much behavior change is too much? We show how to use any safe reference policy as a probabilistic regulator for any optimized but untested policy. Conformal calibration on data from the safe policy determines how aggressively the new policy can act, while provably enforcing the user's declared risk tolerance. Unlike conservative optimization methods, we do not assume the user has identified the correct model class nor tuned any hyperparameters. Unlike previous conformal methods, our theory provides finite-sample guarantees even for non-monotonic bounded constraint functions. Our experiments on applications ranging from natural language question answering to biomolecular engineering show that safe exploration is not only possible from the first moment of deployment, but can also improve performance.
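To make the idea of calibrating on safe-policy data concrete, here is a minimal sketch of standard split-conformal calibration applied as an action gate. All names, the scalar "constraint score," and the gating rule are illustrative assumptions, not the authors' exact procedure: scores observed under the safe reference policy calibrate a threshold at the user's declared risk tolerance, and a proposed action from the new policy is permitted only if its estimated score stays within that calibrated region.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: under exchangeability, a fresh score
    falls at or below this threshold with probability >= 1 - alpha."""
    n = len(cal_scores)
    # Finite-sample-adjusted quantile level, clipped to 1.0 for small n.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

rng = np.random.default_rng(0)
# Hypothetical constraint scores observed under the safe reference policy
# (higher = closer to violating the safety constraint).
cal_scores = rng.uniform(0.0, 1.0, size=200)
alpha = 0.1  # user's declared risk tolerance

tau = conformal_threshold(cal_scores, alpha)

# Gate a proposed action from the new, untested policy.
proposed_score = 0.5  # hypothetical estimated constraint score
if proposed_score <= tau:
    print("action permitted")
else:
    print("defer to safe reference policy")
```

The finite-sample adjustment `(n + 1) * (1 - alpha) / n` is what makes the coverage guarantee hold exactly at any calibration-set size rather than only asymptotically; the paper's contribution extends such guarantees to non-monotonic bounded constraint functions, which this simple scalar sketch does not capture.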