Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents

2026-04-27 · Artificial Intelligence

AI summary

The authors explain that AI systems can become unsafe even while fully authorized, because their behavior can drift over time or shift under adversarial pressure, with no code change at all. They propose a principle that permits an action only when the system's capacity to handle it exceeds an estimated bound on unobserved risk. Their framework, grounded in Aubin's viability theory, requires monitoring behavior, anticipating problems, and restricting risky actions to prevent failures. They implement the idea in a system called RiskGate, which combines statistical tests with a fail-secure pipeline and a kill switch, and computes a scalar Viability Index that predicts how soon the system may leave its safe operating region. The paper covers the theoretical setup and the implementation; experimental evaluation is planned as follow-up work.
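As a rough illustration of the gating rule the summary describes, here is a minimal Python sketch using the risk-bound decomposition from the abstract. The function names and the default margin are invented for illustration; this is not the paper's code.

```python
# Hypothetical sketch of the viability gate; names and the margin value
# are illustrative assumptions, not RiskGate's actual API.

def estimated_risk_bound(u: float, sb: float, rg: float) -> float:
    """Estimated bound on unobserved risk, B_hat(x) = U(x) + SB(x) + RG(x)."""
    return u + sb + rg

def allow_action(capacity: float, u: float, sb: float, rg: float,
                 margin: float = 0.1) -> bool:
    """Permit an action only when capacity S(x) clears B_hat(x) by a safety margin."""
    return capacity > estimated_risk_bound(u, sb, rg) + margin
```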

Autonomous AI agents · Aubin's viability theory · Risk estimation · Behavioral drift · KL divergence · Statistical hypothesis testing · Monotonic restriction · Safety margin · Predictive governance · Closed-loop control
Authors
German Marin, Jatin Chaudhary
Abstract
Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the \textbf{Informational Viability Principle}: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) + RG(x)$ and allowing an action only when the agent's capacity $S(x)$ exceeds $\hat{B}(x)$ by a safety margin. The \textbf{Agent Viability Framework}, grounded in Aubin's viability theory, establishes three properties -- monitoring (P1), anticipation (P2), and monotonic restriction (P3) -- as individually necessary and collectively sufficient to cover documented failure modes. \textbf{RiskGate} instantiates the framework with dedicated statistical estimators (KL divergence, segment-vs-rest $z$-tests, sequential pattern matching), a fail-secure monotonic pipeline, and a closed-loop Autopilot formalised as an instance of Aubin's regulation map with kill-switch-as-last-resort; a scalar Viability Index $VI(t) \in [-1,+1]$ with first-order $t^*$ prediction transforms governance from reactive to predictive. Contributions are the theoretical framework, the reference implementation, and analytical coverage against published agent-failure taxonomies; quantitative empirical evaluation is scoped as follow-up work.
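To make two of the abstract's moving parts concrete, here is a minimal sketch, under stated assumptions, of a KL-divergence drift monitor (P1) and a Viability Index with first-order $t^*$ prediction (P2). The tanh squashing, the zero-crossing criterion, and all names below are illustrative choices, not taken from RiskGate.

```python
import math

# Sketch only: the tanh mapping, the zero-crossing criterion, and every
# name here are assumptions for illustration, not RiskGate's actual API.

def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """D_KL(p || q) over a discrete action distribution; large values
    signal behavioral drift away from the reference profile q (P1)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def viability_index(capacity: float, risk_bound: float, scale: float = 1.0) -> float:
    """Squash the margin S(x) - B_hat(x) into (-1, +1); positive means viable."""
    return math.tanh((capacity - risk_bound) / scale)

def predict_t_star(vi_now: float, vi_prev: float, dt: float) -> float | None:
    """First-order extrapolation of the time t* until VI crosses zero (P2).
    Returns None when VI is flat or improving, i.e. no predicted crossing."""
    slope = (vi_now - vi_prev) / dt
    if slope >= 0:
        return None
    return -vi_now / slope
```

A governor built on these pieces would restrict the action set whenever the predicted $t^*$ falls below its own reaction time, which is the predictive rather than reactive stance the abstract describes.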