Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents
2026-05-28 • Artificial Intelligence
Artificial Intelligence • Computation and Language
AI summary
The authors study systems where multiple language models (LLMs) work together, each seeing only part of a problem. They show that even if each model's probability estimates are individually consistent, putting them together can break basic probability rules. They create a way to measure this inconsistency (called the compositional residual) and find methods to detect and fix it during use. The authors also test strategies to reduce these issues on the LLM side but find they don’t reliably help.
Large Language Models (LLMs) • Probabilistic coherence • Compositional residual • Probability axioms • L2 distance • Joint coherent polytope • Rayleigh quotient • Boyle-Dykstra projection • Anytime-valid e-process • Sequential coherence monitoring
Authors
Anany Kotawala
Abstract
Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is locally coherent. We formalise this locally coherent, globally incoherent failure via the compositional residual eps*, the L2 distance from the composed quote to the joint coherent polytope, computable at runtime from system output and the declared cross-component coupling constraints. A product-structure dichotomy characterises when local coherence suffices, and a Rayleigh-quotient prediction matches the observed residual within 7% on three of four relation classes. A hierarchical Boyle-Dykstra projection repairs the composition deterministically; an anytime-valid e-process gives sequential coherence monitoring. Across 1,876 ensemble cliques on a four-LLM mid-tier panel (frontier-panel rerun in Section 5.5), eps* > 0 on 33-94% of cliques, translating to +0.115 nats per bet of regret on 1,770 resolved bets under the proportional allocation rule (the gain collapses to +0.006 under bettors that themselves coherentise). Three intuitive LLM-side mitigations (retrieval, partition-aware prompting, aggregator-LLM) each fail or regress.
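To make the residual concrete, here is a minimal sketch on a toy problem. It is not the paper's hierarchical procedure, which the abstract does not specify. It runs plain Dykstra alternating projection onto a deliberately simple stand-in for the coherent polytope: the unit box intersected with one assumed coupling constraint, P(A) + P(B) <= 1 for mutually exclusive events A and B. The function names (`project_box`, `project_halfspace`, `dykstra`) and the example quotes are hypothetical and chosen only for illustration.

```python
import math

def project_box(x):
    """Clip each coordinate to [0, 1], so every quote is a valid probability."""
    return [min(1.0, max(0.0, v)) for v in x]

def project_halfspace(x, a, b):
    """Euclidean projection of x onto the half-space {z : a.z <= b}."""
    slack = sum(ai * xi for ai, xi in zip(a, x)) - b
    if slack <= 0:
        return list(x)  # already satisfies the constraint
    norm_sq = sum(ai * ai for ai in a)
    return [xi - slack * ai / norm_sq for ai, xi in zip(a, x)]

def dykstra(q, projections, n_iter=200):
    """Dykstra's algorithm: approximate the L2 projection of q onto the
    intersection of the convex sets whose projectors are given."""
    x = list(q)
    # One correction vector per set, as Dykstra's scheme requires.
    corrections = [[0.0] * len(q) for _ in projections]
    for _ in range(n_iter):
        for k, proj in enumerate(projections):
            y = [xi + ci for xi, ci in zip(x, corrections[k])]
            x = proj(y)
            corrections[k] = [yi - xi for yi, xi in zip(y, x)]
    return x

# Assumed toy setup: component 1 quotes P(A) = 0.7, component 2 quotes P(B) = 0.6.
# Each quote is locally coherent, but A and B are declared mutually exclusive.
quote = [0.7, 0.6]
exclusive = lambda x: project_halfspace(x, [1.0, 1.0], 1.0)

repaired = dykstra(quote, [project_box, exclusive])
residual = math.dist(quote, repaired)  # toy analogue of eps*
print(repaired, residual)
```

In this toy case the composed quote sums to 1.3, so the projection moves each coordinate down by about 0.15 and the residual is about 0.212. A quote that already satisfies both sets is returned unchanged, with residual zero.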