AITH: A Post-Quantum Continuous Delegation Protocol for Human-AI Trust Establishment
2026-04-09 • Cryptography and Security
Cryptography and Security, Artificial Intelligence
AI summary
The authors present AITH, a protocol designed to help humans establish trust in AI agents that act on their behalf. Unlike older frameworks, AITH continuously checks AI actions against set boundaries and can revoke an agent's delegation within one second. It replaces per-operation signing with fast boundary checks that enforce hard constraints, rate limits, and escalation triggers. The authors validated AITH through five rounds of adversarial auditing that resolved 12 vulnerabilities, and a simulation of 100,000 operations showed 79.5% proceeding without human approval.
AI trust, cryptographic protocols, continuous delegation, machine-verified security, adversarial auditing, post-quantum cryptography, ML-DSA-87, Tamarin Prover, Dolev-Yao model, audit logging
Authors
Zhaoliang Chen
Abstract
The rapid deployment of AI agents acting autonomously on behalf of human principals has outpaced the development of cryptographic protocols for establishing, bounding, and revoking human-AI trust relationships. Existing frameworks (TLS, OAuth 2.0, Macaroons) assume deterministic software and cannot address probabilistic AI agents operating continuously within variable trust boundaries. We present AITH (AI Trust Handshake), a post-quantum continuous delegation protocol. AITH introduces: (1) a Continuous Delegation Certificate signed once with ML-DSA-87 (FIPS 204, NIST Level 5), replacing per-operation signing with sub-microsecond boundary checks at 4.7M ops/sec; (2) a six-check Boundary Engine enforcing hard constraints, rate limits, and escalation triggers with zero cryptographic overhead on the critical path; (3) a push-based Revocation Protocol propagating invalidation within one second. A three-tier SHA-256 Responsibility Chain provides tamper-evident audit logging. All five security theorems are machine-verified via Tamarin Prover under the Dolev-Yao model. We validate AITH through five rounds of multi-model adversarial auditing, resolving 12 vulnerabilities across four severity layers. Simulation of 100,000 operations shows 79.5% autonomous execution, 6.1% human escalation, and 14.4% blocked.
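The tamper-evident logging idea behind the Responsibility Chain can be illustrated with a minimal single-tier SHA-256 hash chain. This is a sketch under stated assumptions, not the AITH implementation: the abstract does not specify the three tiers or the record format, so the names GENESIS, entry_hash, append_entry, and verify_chain, and the JSON payload layout, are all hypothetical.

```python
import hashlib
import json

# Assumption: the first entry links to a fixed all-zero "previous hash".
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's digest together with a canonical payload encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a record whose hash commits to the payload and to the prior record."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify_chain(log: list) -> bool:
    """Recompute every link; an edited, reordered, or removed entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each record's hash covers the previous record's hash, changing any logged decision after the fact makes verify_chain return False from that point onward. Truncation from the end is not detected by this sketch unless the latest hash is also stored or anchored elsewhere.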