COHORT: Collaborative Orchestration for Hardening via Offensive Replay on Emulated Topologies

2026-06-29 · Networking and Internet Architecture

Networking and Internet Architecture, Artificial Intelligence, Cryptography and Security, Multiagent Systems
AI summary

The authors created COHORT, a system that uses multiple AI agents to automatically derive mitigations that stop an observed cyberattack in a network without disrupting normal use. It tests each mitigation by replaying the original attack on an emulated copy of the network and checking whether the attack still succeeds and whether legitimate LAN or internet connectivity breaks. Unlike prior work, which relies on reward signals or expert judgment as proxies, COHORT runs real vendor firmware in the emulator and compares each replay against the unmitigated baseline. Across three network topologies and four attack scenarios, 46.7% of the generated mitigations both disrupted the attack and preserved connectivity, 4.4 times the rate of a single-agent baseline using the same model and tool access.

network security, mitigation, multi-agent system, LLM workflow, GNS3 emulator, offensive replay, connectivity testing, cyberattack, firewall, ransomware
Authors
Chen Frydman, Aviram Zilberman, Rubin Krief, Abed Showgan, Andres Murillo, Sekiya Motoyoshi, Asaf Shabtai, Yuval Elovici, Rami Puzis
Abstract
Mitigating an observed adversary in an enterprise network typically takes weeks of expert work: an analyst derives a mitigation tailored to that adversary, validates it without breaking production, and verifies it disrupts the specific attack. The procedure relies on expert judgment and cannot safely be exercised against the production network. COHORT is the first end-to-end framework to automate this procedure for deployable mitigations. A role-decomposed multi-agent LLM workflow proposes candidates, implements them as real device commands, and refines them through a critique loop, all on a high-fidelity GNS3 emulator running real vendor firmware (firewall, switch, router). Each candidate is evaluated by offensive replay: re-executing the original adversary on the mitigated network for a paired comparison against the unmitigated baseline, rather than the reward-signal or expert-judgment proxies used in prior simulation, hybrid, and configuration-generation work. Two further checks complement replay: a connectivity-regression check (LAN ping and internet HTTP probe) rejects mitigations that disrupt legitimate LAN or internet connectivity, and a cumulative evaluation stacks approved mitigations onto a persistent state to surface compound effects. Across three topologies and four attack scenarios (ransomware, lateral movement, DNS exfiltration, data theft), 46.7% of generated mitigations both disrupt the attack and preserve connectivity under replay, 4.4 times the rate of a single-agent baseline using the same model and tool access. A demo video walking through the framework is available with our released artifacts.
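The acceptance criterion described in the abstract (offensive replay paired with a connectivity-regression check, followed by cumulative stacking) can be illustrated with a minimal sketch. This is not COHORT's implementation: `ReplayResult`, `is_effective`, and `CumulativeState` are hypothetical names introduced here, and the sketch assumes that each replay has already been run on the emulator and reduced to three booleans.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ReplayResult:
    """Outcome of one replay run (hypothetical record, not COHORT's schema)."""
    attack_succeeded: bool   # did the replayed adversary reach its goal?
    lan_ping_ok: bool        # result of the LAN ping probe
    internet_http_ok: bool   # result of the internet HTTP probe


def is_effective(baseline: ReplayResult, mitigated: ReplayResult) -> bool:
    """Paired comparison against the unmitigated baseline.

    A candidate counts as effective only if the attack works on the
    baseline, fails on the mitigated network, and both connectivity
    probes still pass on the mitigated network.
    """
    attack_disrupted = baseline.attack_succeeded and not mitigated.attack_succeeded
    connectivity_preserved = mitigated.lan_ping_ok and mitigated.internet_http_ok
    return attack_disrupted and connectivity_preserved


@dataclass
class CumulativeState:
    """Persistent list of approved mitigations (assumed representation)."""
    approved: List[str] = field(default_factory=list)

    def consider(self, name: str, baseline: ReplayResult,
                 mitigated: ReplayResult) -> bool:
        # Assumption: `mitigated` was measured with every previously
        # approved mitigation still applied, plus the new candidate.
        if is_effective(baseline, mitigated):
            self.approved.append(name)
            return True
        return False


baseline = ReplayResult(attack_succeeded=True, lan_ping_ok=True, internet_http_ok=True)
state = CumulativeState()
# Candidate that stops the attack and keeps both probes passing: approved.
state.consider("block-smb", baseline, ReplayResult(False, True, True))
# Candidate that stops the attack but breaks connectivity: rejected.
state.consider("deny-all", baseline, ReplayResult(False, False, False))
```

The mitigation names `block-smb` and `deny-all` are made up for the example. The second candidate shows why the connectivity check matters: a rule that blocks all traffic trivially disrupts the attack but would be rejected for breaking legitimate use.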