Linguistic Firewall: Geometry as Defense in Multi-Agent Systems Routing

2026-06-29 • Artificial Intelligence

Artificial Intelligence • Multiagent Systems

AI summary

The authors study how teams of specialized AI agents collaborate and how tasks are assigned to the right agent. They point out that current routing methods rely on agents' self-reported abilities or fixed profiles, which can be faked or misleading and therefore create security risks. To address this, they introduce ANTAP, a system that tests agents directly to measure what they can actually do before assigning tasks. Rather than reading text descriptions, ANTAP routes tasks with a non-textual algebraic projection in a shared semantic space, which makes it much harder for malicious agents to manipulate agent selection. In their experiments, ANTAP resists attacks on agent selection far better than a description-based router and substantially better than an embedding-based one.

Large Language Models • Multi-Agent Systems • Task Routing • Security Vulnerabilities • Agent Competence Evaluation • Behavioral Testing • Embedding Attacks • Description Manipulation • Semantic Space • Algebraic Projection

Authors

Dvir Alsheich, Adar Peleg, Ben Hagag, Rom Himelstein, Amit Levi, Avi Mendelson

Abstract

The rapid integration of Large Language Models (LLMs) has driven the evolution of Multi-Agent Systems (MAS), in which specialized agents collaborate to execute complex workflows. Effective orchestration in these environments requires robust routing mechanisms that allocate each task to the most suitable agent. However, existing routers fundamentally rely on unverified proxies, ranging from textual self-descriptions to static surrogate representations, to gauge an agent's competence. This reliance on non-empirical data creates a critical gap between an agent's projected profile and its actual operational capabilities, introducing severe security vulnerabilities: malicious agents can easily misrepresent their proficiencies or harbor covert backdoors that evade both standard external analysis and static representation-learning techniques. In this work, we introduce ANTAP (Automatic Non-Textual Agent Picker), an evaluation-driven routing architecture that discards indirect proxies in favor of active capability testing. By dynamically querying agents to establish their true competencies empirically, ANTAP distills performance into fixed behavioral operators within a shared semantic space. At inference time, routing is performed via a purely non-textual algebraic projection, establishing a "linguistic firewall" that renders metadata-based attacks inexpressible. In our experiments, ANTAP achieves a near-zero attack success rate (ASR) against description-based injection attacks, compared with 67.3% or higher for the description-based router baseline. Against adaptive embedding attacks, ANTAP achieves a substantially lower ASR than the embedding-based baseline (a 20% reduction), while remaining resilient to description manipulation by design.
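The abstract describes ANTAP only at a high level. To make the general idea concrete (probe each agent, reduce the measured results to a fixed vector, then route by projection), here is a minimal Python sketch built on our own assumptions. It is not the authors' implementation. The 2-D toy embedding space, the names `ProbeResult`, `distill_operator`, `route`, and `attack_success_rate`, and the choice to represent each behavioral operator as a single score-weighted mean vector are all hypothetical simplifications. ASR is computed here as the fraction of routing decisions that select the attacker's agent, which is also an assumption.

```python
# Hypothetical sketch of evaluation-driven, non-textual routing.
# Not the ANTAP implementation; every name and the toy space are illustrative.
import math
from dataclasses import dataclass

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class ProbeResult:
    probe_embedding: Vector  # embedding of a test task in the shared space (hypothetical)
    score: float             # measured success of the agent on that task, in [0, 1]


def distill_operator(results: list[ProbeResult]) -> Vector:
    """Reduce measured probe outcomes to one fixed competence vector.

    Simplifying assumption: the operator is the score-weighted mean of the
    unit-normalized probe embeddings, i.e. a rank-1 linear functional.
    """
    dim = len(results[0].probe_embedding)
    acc = [0.0] * dim
    for r in results:
        e = normalize(r.probe_embedding)
        for i in range(dim):
            acc[i] += r.score * e[i]
    return [x / len(results) for x in acc]


def route(query_embedding: Vector, operators: dict[str, Vector]) -> str:
    """Choose the agent whose operator gives the largest projection of the query.

    Only vectors are accepted: agent names are opaque keys and no
    self-description text is an input, so it cannot influence the choice.
    """
    q = normalize(query_embedding)
    return max(operators, key=lambda name: dot(operators[name], q))


def attack_success_rate(choices: list[str], attacker: str) -> float:
    """Fraction of routing decisions that selected the attacker's agent."""
    return sum(c == attacker for c in choices) / len(choices)


# Toy 2-D space (hypothetical): axis 0 ~ "math tasks", axis 1 ~ "coding tasks".
MATH, CODE = [1.0, 0.0], [0.0, 1.0]
probe_log = {
    "math_agent": [ProbeResult(MATH, 0.9), ProbeResult(CODE, 0.1)],
    "code_agent": [ProbeResult(MATH, 0.2), ProbeResult(CODE, 0.95)],
    # Whatever this agent's self-description claims, it fails the probes.
    "malicious_agent": [ProbeResult(MATH, 0.1), ProbeResult(CODE, 0.2)],
}
operators = {name: distill_operator(log) for name, log in probe_log.items()}

queries = [MATH, CODE, [0.8, 0.6]]
choices = [route(q, operators) for q in queries]
print(choices)
print(attack_success_rate(choices, "malicious_agent"))
```

The script prints `['math_agent', 'code_agent', 'math_agent']` followed by an ASR of `0.0`. Because `route` accepts only vectors, a malicious agent's self-description has no parameter through which it could reach the decision, so its selection depends solely on its measured probe scores. A real system would replace the toy vectors with a learned embedding model and many probes per agent.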
