POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

2026-06-01 | Artificial Intelligence

AI summary

The authors study how large language models (LLMs) working together as a multi-agent system can reason impressively well but sometimes fail in ways that are hard to detect, which is a serious obstacle when safety is at stake. They created POIROT, a method that has the agents within the system check each other's work instead of relying on a single outside expert. Their tests show that POIROT outperforms single-LLM evaluators, with larger gains as problems become more complex and as the number of agents and the number of fault types grow. This suggests that multi-agent systems can audit themselves for safety without needing separate, external oversight. They also release POIROT as an open-source library, together with BLAME, a benchmark for attributing faults in safety-critical multi-agent systems.

Large Language Models, Multi-Agent Systems, Emergent Failures, Hallucinations, Epistemic Diversity, Fault Attribution, Safety-Critical Systems, Evaluation Protocol, Open-Source Benchmark
Authors
Iñaki Dellibarda Varela, R. Sendra-Arranz, Pablo Romero-Sorozabal, J. M. Valverde-García, Annemarie F. Laudanski, Álvaro Gutiérrez, Eduardo Rocon, Manuel Cebrian
Abstract
Orchestrating Large Language Models into Multi-Agent Systems (LLM-MAS) has unlocked remarkable reasoning capabilities, yet emergent failures and hallucinations that resist characterisation block their deployment in safety-critical domains, a gap made legally untenable by emerging AI regulation. Existing evaluation paradigms share a common flaw: centralised judgment creates single points of failure and demands domain-specific expertise. Here we present POIROT, a protocol that repurposes a system's own agents as its diagnostic layer, leveraging the epistemic diversity already present in the architecture. Across the evaluated settings, POIROT outperforms single-LLM evaluator baselines, with gains that scale with problem complexity (odds ratio OR = 1.60, $p = 0.008$), agent count, and fault dimensionality, and that persist under compound fault conditions. These results demonstrate that safety oversight need not be externalised: the agents executing a role carry sufficient collective intelligence to audit it. We release POIROT as an open-source library alongside BLAME, a benchmark for fault attribution in safety-critical multi-agent systems.
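The abstract describes POIROT only at a high level. To illustrate the core idea in rough form, the Python sketch below lets the system's own agents attribute a fault instead of relying on one external judge. It polls each agent for a suspect and aggregates the answers by plurality vote. Everything here is an illustrative assumption and not POIROT's published protocol or library API: the `Interrogator` interface, the plurality rule, the tie handling, and the keyword-matching `make_checker` stand-in for an LLM call are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical interface: an interrogator reads the run transcript and names the
# agent it believes caused the fault, or returns None if it finds no fault.
# In a real LLM-MAS each interrogator would wrap a prompt to that agent's model.
Interrogator = Callable[[List[str]], Optional[str]]


@dataclass
class Attribution:
    suspect: Optional[str]                      # None when no votes were cast or on a tie
    votes: Dict[str, int] = field(default_factory=dict)
    agreement: float = 0.0                      # share of cast votes won by the suspect


def attribute_fault(agents: Dict[str, Interrogator], transcript: List[str]) -> Attribution:
    """Poll every agent for a suspect and aggregate the answers by plurality vote."""
    ballots: Counter = Counter()
    for ask in agents.values():
        suspect = ask(transcript)
        if suspect is not None:                 # abstentions are not counted
            ballots[suspect] += 1
    if not ballots:
        return Attribution(suspect=None)
    top = max(ballots.values())
    leaders = [name for name, n in ballots.items() if n == top]
    if len(leaders) > 1:                        # a tie means no consensus on one culprit
        return Attribution(suspect=None, votes=dict(ballots))
    total = sum(ballots.values())
    return Attribution(suspect=leaders[0], votes=dict(ballots), agreement=top / total)


def make_checker(signature: str) -> Interrogator:
    """Toy interrogator: blames the author of the first message containing `signature`."""
    def ask(transcript: List[str]) -> Optional[str]:
        for line in transcript:
            author, _, text = line.partition(": ")
            if signature in text:
                return author
        return None
    return ask


if __name__ == "__main__":
    transcript = [
        "planner: split the task into three steps",
        "coder: total = 2 + 2 = 5",
        "reviewer: looks fine to me",
    ]
    # Each agent applies its own hypothetical heuristic, standing in for epistemic diversity.
    agents = {
        "planner": make_checker("= 5"),
        "coder": make_checker("= 5"),
        "reviewer": make_checker("fine"),
    }
    peer = attribute_fault(agents, transcript)
    single = attribute_fault({"judge": make_checker("fine")}, transcript)
    print(peer.suspect, peer.votes, round(peer.agreement, 2))
    print(single.suspect)
```

Running the script prints `coder {'coder': 2, 'reviewer': 1} 0.67` for the pooled vote and `reviewer` for the lone judge. The toy heuristics are contrived so that the two disagree. The example shows only the aggregation mechanics and says nothing about how POIROT actually performs.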