When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

2026-05-01 · Cryptography and Security

Subjects: Cryptography and Security; Artificial Intelligence; Computation and Language
AI summary

The authors assessed the security of a medical chatbot that uses AI to provide health information. They used an AI tool to generate vulnerability hypotheses and then manually verified the findings with browser inspection tools. They discovered that sensitive system details and users' private conversations were exposed and accessible without login, contradicting the chatbot's stated privacy promises. The authors conclude that simple tools can uncover serious security problems in such chatbots and recommend thorough independent review before these AI health tools are deployed publicly.

Keywords: retrieval-augmented generation (RAG), medical chatbot, security assessment, privacy, large language model (LLM), API, browser developer tools, data exposure, independent review, health information technology
Authors
Alfredo Madrid-García, Miguel Rujas
Abstract
Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls.

Objective: To report an anonymized, non-destructive security assessment of a publicly accessible patient-facing medical RAG chatbot and to identify governance lessons for the safe deployment of generative AI in health.

Methods: We used a two-stage strategy. First, Claude Opus 4.6 supported exploratory prompt-based testing and structured vulnerability hypotheses. Second, candidate findings were manually verified using Chrome Developer Tools, inspecting browser-visible network traffic, payloads, API schemas, configuration objects, and stored interaction data.

Results: The LLM-assisted phase identified a critical vulnerability: sensitive system and RAG configuration was exposed through client-server communication rather than being confined to the server. Manual verification confirmed that ordinary browser inspection allowed collection of the system prompt, model and embedding configuration, retrieval parameters, backend endpoints, the API schema, document and chunk metadata, knowledge-base content, and the 1,000 most recent patient-chatbot conversations. The deployment also contradicted its own privacy assurances: full conversation records, including health-related queries, were retrievable without authentication.

Conclusions: Serious privacy and security failures in patient-facing RAG chatbots can be identified with standard browser tools, without specialist skills or authentication; independent review should be a prerequisite for deployment. Commercial LLMs accelerated this assessment, including under a false developer persona; the assistance available to auditors is equally available to adversaries.
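The class of exposure described in the Results, backend configuration and stored conversations leaking into client-visible responses, can be checked mechanically once a payload has been captured in the browser's network panel. The sketch below is a minimal, hypothetical illustration: the payload, key names, and function are invented for demonstration and do not come from the assessed system; no real endpoint is queried.

```python
# Hypothetical illustration: scan a captured JSON response for key names
# that suggest server-side configuration or stored user data leaking to
# the client. All names and data below are invented for demonstration.

SENSITIVE_KEYS = {"system_prompt", "model", "embedding_model",
                  "retrieval_top_k", "api_base", "conversations"}

def find_leaks(payload, path=""):
    """Recursively collect JSON paths whose key names match a
    watchlist of likely server-side or private fields."""
    leaks = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}" if path else key
            if key in SENSITIVE_KEYS:
                leaks.append(here)
            leaks.extend(find_leaks(value, here))
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            leaks.extend(find_leaks(item, f"{path}[{i}]"))
    return leaks

# Invented example resembling the category of exposure reported
response = {
    "config": {"model": "example-llm", "retrieval_top_k": 4,
               "system_prompt": "You are a medical assistant..."},
    "data": {"conversations": [{"id": 1, "text": "..."}]},
}
print(find_leaks(response))
# → ['config.model', 'config.retrieval_top_k',
#    'config.system_prompt', 'data.conversations']
```

A real audit would capture payloads via the DevTools Network panel (or a HAR export) rather than a hand-written dictionary; the point is that no specialist tooling is needed to detect this failure mode.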