StructBreak: Structural Cognitive Overload-Induced Safety Failures in MLLMs

2026-05-25
Artificial Intelligence
AI summary

The authors studied a problem they call Structural Cognitive Overload (SCO), where multimodal large language models (MLLMs) fail to keep their reasoning consistent and safe. They created a tool named StructBreak to measure this problem and found a new type of attack that can trick these models into generating harmful content without needing to look inside the models. Testing on six popular models showed that SCO often causes unsafe outputs, revealing weaknesses in current safety methods. They also analyzed how SCO works inside the models and showed that existing protections are not enough for complex multimodal tasks.

Multimodal Large Language Models (MLLMs), Structural Cognitive Overload (SCO), StructBreak, structural consistency, black-box attack, toxic generation, safety alignment, attention dynamics, latent space topology, model alignment
Authors
Yang Luo, Xinran Liu, Tiantian Ji, Zhiyi Yin, Lingyun Peng, Shuyu Li
Abstract
Multimodal Large Language Models (MLLMs) excel at structural reasoning yet suffer from a sharp logical brittleness in structural consistency. We term this phenomenon Structural Cognitive Overload (SCO), a byproduct of the contention between deep reasoning and safety alignment. Prior work has predominantly targeted typographic and pixel-level perturbations, leaving SCO largely unexplored. To address this gap, we propose StructBreak, an automated end-to-end framework designed to quantify SCO. Using StructBreak, we uncover a novel higher-order cognitive overload attack paradigm; notably, this attack operates under a practical black-box setting, requiring no internal model access. We then use the framework to establish a comprehensive benchmark spanning ten diverse threat scenarios. Empirical evaluations on six leading MLLMs reveal that SCO readily triggers toxic generation, yielding a 92% average attack success rate (ASR), with up to 97% on Gemini 2.5. To elucidate the mechanism of SCO, we further conduct model-level interpretations spanning attention dynamics, latent space topology, and geometric analysis. Our findings reveal that StructBreak acts as a novel structural channel to circumvent safety filters. Furthermore, the limited efficacy of inherent safety mechanisms underscores that current alignment paradigms are insufficient for the era of complex multimodal reasoning.
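
The abstract reports an average ASR across six models without stating how the figure is aggregated. As a rough illustration only, the sketch below assumes ASR is the fraction of attack attempts judged successful and that the average is an unweighted mean over models; the paper's actual judging and aggregation procedure may differ. The function names and the toy data are hypothetical and are not results from the paper.

```python
from statistics import mean


def attack_success_rate(judgements):
    """Fraction of attempts judged successful (True = attack succeeded).

    Assumption: each attempt has already been labeled by some judge.
    """
    if not judgements:
        raise ValueError("need at least one attempt")
    return sum(judgements) / len(judgements)


def average_asr(per_model):
    """Unweighted mean of per-model ASR values (an assumed aggregation)."""
    return mean(attack_success_rate(j) for j in per_model.values())


# Hypothetical toy labels for two placeholder models, NOT data from the paper.
toy = {
    "model_a": [True, True, False, True],
    "model_b": [True, False, False, True],
}

for name, labels in toy.items():
    print(f"{name}: {attack_success_rate(labels):.2%}")
print(f"average: {average_asr(toy):.2%}")
```

An unweighted mean treats every model equally regardless of how many prompts each one was tested on; pooling all attempts into a single ratio would weight models by attempt count instead.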