Referential Security as a New Paradigm for AI Evaluations

2026-05-25

Cryptography and Security, Artificial Intelligence
AI summary

The authors explain that current security checks for AI systems often rely on names or labels that don’t change, but the actual AI behind them can be updated without notice. This creates problems because safety tests may not apply to the version of the AI people use later. They suggest a new idea called referential security, which focuses on clearly identifying exactly which AI system a safety claim is about. This makes it easier to repeat tests, track changes over time, and compare different systems reliably. By doing this, safety evaluations become more trustworthy and useful even as AI systems evolve.

referential security, model identity, security evaluation, AI systems, reproducibility, longitudinal audit, cross-provider equivalence, dynamic systems, safety claims, artifacts
Authors
Dan Ristea, Vasilios Mavroudis
Abstract
Security evaluations inherently depend on stable identifiers. Any finding, audit, or regulatory decision must remain attached to the specific artifact it pertains to. Continuously updated artificial intelligence systems violate this core assumption, with public model designations remaining static while underlying weights, prompts, retrieval mechanisms, misuse classifiers, inference settings, and serving infrastructures undergo unannounced modifications. Consequently, current evaluations frequently apply to superficial labels rather than identifiable and distinct systems. To resolve this, we propose referential security as a new paradigm for AI evaluation. The fundamental security question extends beyond whether a model is safe to whether subsequent parties can conclusively determine which system a specific safety claim addressed. This approach reframes model identity as an empirically verifiable property and separates referential stability from the substantive security claims it conditions. This framework brings tractability to three critical workflows that current practices handle poorly. Specifically, it enables reproducible evaluation, longitudinal audit validity, and cross-provider equivalence. By grounding these evaluations in verifiable artifacts, our approach ensures that safety audits and regulatory findings maintain their empirical utility across the operational lifecycle of dynamic systems.
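As a rough illustration of what attaching a claim to an identifiable system could look like, the sketch below hashes a manifest of the components the abstract lists (weights, prompts, retrieval, misuse classifiers, inference settings, serving infrastructure) and binds a safety claim to that digest. The abstract does not specify a mechanism, so everything here is an assumption made for illustration: the `SystemManifest` and `SafetyClaim` types, their field names, the example values, and the choice of a SHA-256 digest over canonical JSON. It is not the authors' method, and a provider-reported manifest is a weaker notion of identity than the empirically verifiable one the abstract calls for.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class SystemManifest:
    """Hypothetical description of one deployed system; all fields are assumptions."""
    weights_digest: str
    system_prompt: str
    retrieval_config: str
    misuse_classifier: str
    inference_settings: str
    serving_stack: str


def fingerprint(manifest: SystemManifest) -> str:
    """Return a SHA-256 hex digest of the manifest in canonical JSON form."""
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(asdict(manifest), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SafetyClaim:
    """Hypothetical record tying a finding to the fingerprint it was made against."""
    statement: str
    system_fingerprint: str


def claim_applies_to(claim: SafetyClaim, manifest: SystemManifest) -> bool:
    """True only if the claim was made about exactly this manifest."""
    return claim.system_fingerprint == fingerprint(manifest)


# Illustrative values only; none of these come from the paper.
audited = SystemManifest(
    weights_digest="sha256:placeholder-weights",
    system_prompt="You are a helpful assistant.",
    retrieval_config="index-v1",
    misuse_classifier="classifier-v3",
    inference_settings="temperature=0.7",
    serving_stack="stack-a",
)
claim = SafetyClaim("passed red-team suite", fingerprint(audited))

# Same public label, but the prompt changed without announcement.
updated = replace(audited, system_prompt="You are a concise assistant.")

print(claim_applies_to(claim, audited))
print(claim_applies_to(claim, updated))
```

Under these assumptions, the claim matches the audited manifest and stops matching once any listed component changes, even though the public model name would stay the same. That is the separation the abstract draws between referential stability and the substantive claim it conditions.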