Hallucinations in Organization-backed AI advisors: Evidence about Skepticism, Verification, and Reliance in Goal-Directed Use

2026-06-22
Human-Computer Interaction

Human-Computer Interaction, Computers and Society
AI summary

The authors reviewed studies of how people use AI systems that sometimes give plausible but wrong answers, known as hallucinations. They examined whether users doubt the AI's information, check it, succeed in verifying it, and then decide whether to rely on it. Most studies measured whether people relied on the AI but did not clearly separate whether they were skeptical or actually checked the facts. The interventions that are easiest for organizations to deploy, such as warnings about possible errors, showed the weakest and most mixed effects. The authors suggest that future research measure these steps separately to understand how people deal with possibly wrong AI answers.

Generative AI, Hallucination, User skepticism, Verification, Reliance, AI advisories, Information accuracy, Empirical evidence, Organizational AI use, Content generation
Authors
Simon J. Blanchard, Aaron M. Garvey, Laura O'Laughlin
Abstract
Generative AI systems are increasingly used by organizations to deliver information to consumers, patients, students, employees, and citizens. These systems can hallucinate, producing plausible but inaccurate responses. A central question for AI-advised decisions is therefore not only whether users rely on inaccurate information, but whether they recognize that a response may require verification. To answer this question, we review emerging empirical evidence relevant to hallucination detection in goal-directed interactions, with a focus on organization-backed AI advisors. We distinguish four constructs that existing studies often conflate: whether users are skeptical of the information presented, whether they check it, whether checking succeeds, and whether the result of user verification affects reliance on the information. Across studies examining product search, medical decision-making, content generation, and chatbot-assisted tasks, several patterns emerge. Nearly all studies measure reliance, while variables such as user skepticism and verification of the information are more often targeted by an intervention than measured directly. The cues used to prompt scrutiny of the AI response are predominantly related to the AI output, such as source citations, and the most deployable of these AI output interventions for organizations (general and specific warnings about the risk of hallucinations) show the weakest and most mixed effects in the studies reviewed. Although the existing literature posits that users may be more likely to scrutinize responses related to particular areas of content, no studies varied the content category, leaving this question open for further research. In future research, measuring skepticism and verification separately from reliance may clarify what current evidence shows, what it only implies, and which questions require further exploration.