How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation
2026-06-15 • Computation and Language
Computation and Language, Cryptography and Security, Computers and Society, Information Retrieval
AI summary
The authors built SearchGEO, a framework for testing whether large language models (LLMs) used as web search agents can be tricked by manipulated web pages into endorsing false or harmful claims. It combines several types of attacks with metrics that measure when a model wrongly endorses bad information. Tests on 13 LLMs showed that vulnerability varies widely: some models are highly vulnerable while others resist attacks better, and the same deployment setup can raise or lower vulnerability depending on the model. The results suggest that resistance to misleading web content should be a core part of safety evaluation for LLMs used in search.
Large Language Models (LLMs), Web Search Agents, Endorsement Corruption, Adversarial Attacks, SearchGEO, Attack Success Rate, Web Evidence Manipulation, Backend Safety Evaluation, Recommendation Reliability, Agent-Skill Probe
Authors
Yimeng Chen, Zhe Ren, Firas Laakom, Yu Li, Dandan Guo, Jürgen Schmidhuber
Abstract
Large language model (LLM)-based search agents synthesize open-web content into actionable recommendations on behalf of users, creating a risk that attacker-published pages are transformed into endorsed claims. We introduce SearchGEO, a controlled evaluation framework for measuring endorsement corruption in LLM-based web-search agents, combining a web-evidence manipulation pipeline, a five-mode attack taxonomy, and multiple output-level metrics. We evaluate 13 LLM backends on 308 cases each. Results show that vulnerability patterns vary across backends: overall attack success rate (ASR) ranges from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, the strongest attack mode differs by model family, and the same deployment scaffold can amplify or reduce ASR on different backends. An auxiliary agent-skill probe, where endorsement becomes an install command, exposes a sharp split among otherwise robust backends: Claude over-rejects while GPT over-trusts. These findings argue for treating recommendation reliability under adversarial search content as a first-class dimension of backend safety evaluation.
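As an illustration of how an attack success rate of this kind might be tallied, the following is a minimal sketch. It assumes that ASR is the fraction of evaluated cases in which the agent's output endorses the attacker's claim; the abstract does not give the exact definition or aggregation. The function names, the mode labels, and the toy records are hypothetical and are not taken from SearchGEO.

```python
from collections import defaultdict


def attack_success_rate(flags):
    """Return the fraction of cases flagged as successful attacks.

    Assumption: a case counts as a success when the agent endorsed the
    attacker's claim. The paper's actual metric may differ.
    """
    if not flags:
        raise ValueError("no cases to score")
    return sum(1 for endorsed in flags if endorsed) / len(flags)


def asr_by_mode(records):
    """Group (attack_mode, endorsed) pairs and compute ASR per attack mode."""
    grouped = defaultdict(list)
    for mode, endorsed in records:
        grouped[mode].append(endorsed)
    return {mode: attack_success_rate(flags) for mode, flags in grouped.items()}


# Hypothetical toy records for a single backend (not real results).
records = [
    ("mode_a", True),
    ("mode_a", False),
    ("mode_b", False),
    ("mode_b", False),
]

overall = attack_success_rate([endorsed for _, endorsed in records])
per_mode = asr_by_mode(records)
```

Reporting the per-mode breakdown alongside the overall figure is what would let a reader see which attack mode is strongest for a given backend, which the abstract notes differs by model family.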