GIF: Locally Sound Geometric Information Flow Control for LLMs

2026-06-22 • Artificial Intelligence

AI summary

The authors developed a method called Geometric Information Flow (GIF) to track how information from input tokens influences the outputs of large language models. Unlike attention-based or correlational heuristics, GIF uses the model's Jacobian and local output geometry to upper-bound the information that flows from a span of the input to the output, and it remains computable on large models through automatic differentiation and low-rank approximation. They give a machine-checked Lean 4 proof that GIF upper-bounds the true information flow under local regularity assumptions, and they tested it on prompt-injection and privacy-leakage benchmarks, where it reached near-perfect recall. Flows detected with small surrogate models also transfer to much larger models, which suggests GIF can be used even without gradient access to the target model.

Large Language Models, Information Flow Control, Jacobian, Shannon Mutual Information, Prompt Injection, Privacy Leakage, Automatic Differentiation, Local Geometry, Declassifier, Lean 4 Proof

Authors

Adam Storek, Nikolaus Holzer, Zhuo Zhang, Suman Jana

Abstract

Large language models increasingly mediate interactions between sensitive data, untrusted inputs, and privileged actions in agentic systems, creating security and privacy risks. These range from prompt injections that manipulate downstream tool use to leakage of confidential information through model outputs. Recent Information Flow Control (IFC)-based defenses show promise but lack a principled semantic foundation for reasoning about information flow through the model itself. Since any input token may influence any output token in an autoregressive LLM, existing approaches suffer from severe taint explosion. We present Geometric Information Flow (GIF), a semantic framework for tracking information flow from input tokens to outputs. GIF uses the LLM Jacobian and local output geometry to upper-bound the Shannon mutual information between perturbed input spans and model outputs, yielding a scalable measure computable on large models via automatic differentiation and low-rank approximation. Unlike attention-based or correlational attribution heuristics, GIF satisfies local geometric soundness, and we provide a fully mechanized Lean 4 proof that it upper-bounds the true information flow induced by a given prompt under local regularity assumptions. We evaluate GIF on integrity and confidentiality tasks across multiple prompt-injection and privacy-leakage benchmarks. GIF achieves near-perfect recall even without a downstream declassifier, outperforming attention-based baselines. Combined with lightweight LLM-based declassifiers, it matches or exceeds the F1 of direct LLM-as-judge baselines such as GPT-5.5 xhigh reasoning while using up to 81x lower token cost. GIF flows detected with small surrogate models transfer to larger state-of-the-art models and other model families, even when the surrogate is up to 200x smaller, suggesting black-box deployment without gradient access.
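
The abstract does not spell out GIF's formula, so the following is only an illustrative sketch of the general idea of a Jacobian-based information bound, not the authors' implementation. It assumes a linearized model with isotropic Gaussian perturbations on an input span and isotropic Gaussian output noise, for which the mutual information has the standard closed form 0.5 * log det(I + (sigma_in^2 / sigma_out^2) J J^T). The names `toy_model`, `jacobian_columns`, `gaussian_flow_nats`, and the noise scales are hypothetical, and a finite-difference Jacobian on a tiny softmax map stands in for automatic differentiation on a real LLM.

```python
import math

def toy_model(x):
    """Hypothetical stand-in for an LLM: 4 input features -> 3 output probabilities."""
    weights = [[1.0, -0.5, 0.0, 0.0],
               [0.2, 0.8, 0.0, 0.0],
               [-0.3, 0.1, 0.0, 0.05]]
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def jacobian_columns(f, x, span, eps=1e-5):
    """Central finite-difference Jacobian of f, restricted to the input indices in span."""
    cols = []
    for i in span:
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        cols.append([(a - b) / (2 * eps) for a, b in zip(f_plus, f_minus)])
    return cols  # one column (length = output dim) per span index

def gaussian_flow_nats(cols, sigma_in=1.0, sigma_out=0.1):
    """Linear-Gaussian information (nats): 0.5 * log det(I + snr * J^T J).

    Uses det(I_m + c J J^T) = det(I_k + c J^T J) so the matrix is only k x k,
    where k is the span length. sigma_in and sigma_out are assumed noise scales.
    """
    k = len(cols)
    snr = (sigma_in / sigma_out) ** 2
    a = [[snr * sum(p * q for p, q in zip(cols[i], cols[j])) + (1.0 if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    # Cholesky factorization; a is symmetric positive definite (identity plus PSD).
    chol = [[0.0] * k for _ in range(k)]
    logdet = 0.0
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][t] * chol[j][t] for t in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                logdet += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return 0.5 * logdet

x0 = [0.5, -1.0, 2.0, 0.3]
for span in ([0, 1], [2], [3]):
    score = gaussian_flow_nats(jacobian_columns(toy_model, x0, span))
    print(span, score)
```

In this toy setup, a span whose Jacobian columns are zero scores exactly zero, while spans that move the output distribution score higher; a span-level score of this kind is what would let a defense taint only the influential input spans rather than every token.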
