Position: Correct Answer, Wrong Mechanism -- When AI Scientists Defend General Claims Their Own Data Contradicts

2026-06-22 • Machine Learning

AI summary

The authors argue that checking whether AI scientists get the right answer is not enough. In their experiments, AI agents sometimes reach right-looking results through incorrect reasoning that breaks when conditions change, a failure they call Correct Answer, Wrong Mechanism (CAWM). Their study of coding agents working in a Geant4 physics simulation finds that these agents are reliable tools but unreliable scientific co-authors for open-ended claims, because they do not reliably verify the mechanism behind their own results. The authors propose two lightweight checks, a regime-shift check and a recomputation against the known observable, that together flag every CAWM case in the study.

AI scientist systems • mechanism fidelity • epistemic honesty • Correct Answer Wrong Mechanism • Geant4 simulation • particle identification • scientific co-authors • outcome evaluation • regime-shift test • open-ended claim-making

Authors

Steven Young Eulig

Abstract

AI scientist systems are described as tools, co-authors, or founders, but we evaluate them as if only the final answer matters. This position paper argues that outcome-only evaluation is insufficient, and that task outcome, mechanism fidelity, and epistemic honesty must be measured separately. Our evidence comes from 28 episodes of a coding agent attempting to rediscover a known particle identification observable in a Geant4 simulation, including an 8-episode probe across two additional frontier models. In 4/20 primary-model and 3/8 cross-model episodes, agents reach right-looking results through incorrect reasoning that breaks when conditions change, which we call Correct Answer, Wrong Mechanism (CAWM). Honesty and mechanism fidelity dissociate within a single agent trajectory. When given a partially misleading prior, all five agents reject the false component on evidence, yet one defends its chosen observable with physics inconsistent with its own data. In the simulation-based discovery setting studied here, coding agents prove reliable tools but unreliable scientific co-authors for open-ended claim-making, where co-author trust requires mechanism-fidelity verification they do not reliably self-apply. The failure is detectable, and we propose a lightweight test. A one-step regime-shift check needs only the agent's claim and flags the over-generalized cases. A companion recomputation flags the remaining cases when the correct observable is known. Together, these checks flag every CAWM case in this study.
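
The abstract does not spell out how the regime-shift check is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: the agent's claim is reduced to a single observable, the check re-evaluates that observable on events from a regime the agent did not tune on, and the claim is flagged when separation holds in the reference regime but collapses after the shift. All names here (`auc`, `regime_shift_check`, the `raw` and `scale` fields, the 0.8 threshold) and the toy numbers are hypothetical illustrations, not the paper's code or data.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, float]                  # hypothetical event record
Sample = Tuple[List[Event], List[Event]]  # (signal events, background events)


def auc(observable: Callable[[Event], float],
        signal: List[Event], background: List[Event]) -> float:
    """Chance that a random signal event scores above a random background event."""
    s = [observable(e) for e in signal]
    b = [observable(e) for e in background]
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in s for y in b)
    return wins / (len(s) * len(b))


def regime_shift_check(observable: Callable[[Event], float],
                       reference: Sample, shifted: Sample,
                       min_auc: float = 0.8) -> dict:
    """Flag a claim that separates classes in the reference regime only."""
    ref_auc = auc(observable, *reference)
    shifted_auc = auc(observable, *shifted)
    return {
        "reference_auc": ref_auc,
        "shifted_auc": shifted_auc,
        # Assumed rule: the general claim fails if separation is lost after the shift.
        "flagged": ref_auc >= min_auc and shifted_auc < min_auc,
    }


def make(raws: List[float], scale: float) -> List[Event]:
    """Build toy events; `scale` stands in for a regime-dependent factor."""
    return [{"raw": r, "scale": scale} for r in raws]


# Toy data (made up): in the reference regime both observables separate the classes.
reference = (make([2.0, 2.2, 2.4], 1.0), make([1.0, 1.1, 1.2], 1.0))
# In the shifted regime the raw values overlap; only the normalized form separates.
shifted = (make([1.0, 1.1, 1.2], 0.5), make([1.0, 1.1, 1.2], 1.0))

claimed = lambda e: e["raw"]                  # over-generalized observable
normalized = lambda e: e["raw"] / e["scale"]  # observable with the right mechanism

claimed_report = regime_shift_check(claimed, reference, shifted)
normalized_report = regime_shift_check(normalized, reference, shifted)
```

In this toy setup both observables look equally good on the reference sample, which is the situation an outcome-only evaluation cannot distinguish; only the shifted sample separates the over-generalized claim from the one that tracks the underlying mechanism.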
