Better with Experience: Self-Evolving LLM Agents for Evidence-Grounded Health Community Notes

2026-06-01

Computation and Language; Social and Information Networks
AI summary

The authors created EvoNote, a system that generates fast, evidence-grounded corrections for health misinformation on social media by learning from earlier correction episodes. Unlike existing methods, which start fresh with every new post, EvoNote keeps a memory of past experience and uses it to improve its notes over time. On a benchmark of user-flagged health posts, and under a human-validated judge, its notes were preferred over the corresponding human-written notes in 89.6% of cases, while the median time to produce a candidate correction fell from over 13 hours to under 2 minutes. The authors attribute these gains to stronger evidence use and reusable correction strategies, and suggest that self-evolving note generation is a promising approach to governing health misinformation.

Large Language Model, Community Notes, Health misinformation, Credit assignment, Evidence grounding, Multimodal benchmark, Correction strategies, Crowd-sourced labels, Human evaluation, Self-evolving systems
Authors
Zihang Fu, Fanxiao Li, Jianyang Gu, Haonan Wang, Preslav Nakov, Bryan Hooi, Min-Yen Kan, Jiaying Wu
Abstract
Large Language Model (LLM)-augmented Community Notes offer a scalable path for timely, evidence-grounded correction of health misinformation on social platforms. However, they still reset at every post, leaving useful correction experience from prior cases unused. We introduce EvoNote, an agentic framework that enables health Community Notes generation to self-evolve through an evolving experience memory of prior misinformation correction episodes. Its core is fine-grained credit assignment: EvoNote grounds trajectory-level feedback in health-specific note qualities and distills it into action-level memory for claim analysis, evidence acquisition, and note writing. We evaluate EvoNote on MM-HealthCN, a 1.2K-instance multimodal benchmark of user-flagged health posts with human-written Community Notes and crowd-derived helpfulness labels. Under a human-validated hierarchical utility judge, EvoNote-generated notes are preferred over corresponding human-written notes in 89.6% of cases; on a separate set of Needs More Ratings posts without a crowd helpfulness verdict, EvoNote produces helpful notes for 82.0% of cases. It also reduces the median time needed to produce a candidate correction from over 13 hours in the human-note pipeline to under 2 minutes. Analyses link these gains to stronger evidence use and reusable correction strategies, positioning self-evolving note generation as a promising paradigm for health misinformation governance.
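The credit-assignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the quality dimensions (`claim_coverage`, `evidence_support`, `clarity`), the `QUALITY_TO_STAGE` mapping, the `ExperienceMemory` class, and the 0.5 threshold are all hypothetical names and values chosen for illustration. Only the three stages (claim analysis, evidence acquisition, note writing) come from the abstract. The sketch shows how feedback on a whole trajectory might be routed into per-stage memories.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# The three action stages named in the abstract.
STAGES = ("claim_analysis", "evidence_acquisition", "note_writing")

# Hypothetical mapping from a note-quality dimension to the stage
# assumed responsible for it (an illustrative assumption only).
QUALITY_TO_STAGE = {
    "claim_coverage": "claim_analysis",
    "evidence_support": "evidence_acquisition",
    "clarity": "note_writing",
}


@dataclass
class ExperienceMemory:
    """Toy per-stage memory of lessons distilled from past episodes."""

    lessons: dict = field(default_factory=lambda: defaultdict(list))

    def assign_credit(self, feedback, threshold=0.5):
        """Route each trajectory-level quality score to one stage's memory.

        Scores at or above the (assumed) threshold are stored as
        behaviours to keep; lower scores are stored as things to fix.
        """
        for quality, score in feedback.items():
            stage = QUALITY_TO_STAGE[quality]
            kind = "keep" if score >= threshold else "fix"
            self.lessons[stage].append((kind, quality, score))

    def recall(self, stage):
        """Return the lessons stored for one stage, oldest first."""
        return list(self.lessons[stage])


# One finished episode with made-up feedback scores.
memory = ExperienceMemory()
memory.assign_credit(
    {"claim_coverage": 0.9, "evidence_support": 0.3, "clarity": 0.7}
)
```

In this toy setup, a later episode would call `memory.recall("evidence_acquisition")` before searching for evidence, so a weakness flagged in an earlier note informs the next attempt at that specific step rather than the pipeline as a whole.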