Causal Discovery in the Era of Agents

2026-06-22, Artificial Intelligence

Artificial Intelligence, Machine Learning, Software Engineering
AI summary

The authors discuss how recent methods use large language models (LLMs) to help find cause-and-effect relationships but warn that these models might introduce errors from text patterns rather than actual data. They suggest that AI agents should support the process by explaining data and methods rather than making direct causal conclusions. To follow this idea, the authors created causal-learn+, a tool that helps people analyze data and interpret results while keeping conclusions based on solid data and formal methods. They show how this system works using personality data, keeping AI assistance reliable and transparent.

causal discovery, large language models, causal inference, graph structures, formal algorithms, data analysis, expert knowledge, causal-learn, Big Five personality
Authors
Yujia Zheng, Vishal Verma, Mantej Gill, Haoyue Dai, Peter Spirtes, Kun Zhang
Abstract
Recent attempts to combine large language models (LLMs) with causal discovery ask models to infer pairwise directions, propose graph structures, or inject language-model outputs as priors and constraints. These approaches promise faster analysis, but they also obscure whether a causal claim is supported by data and assumptions or by textual associations, prompt artifacts and hallucinated mechanisms. We argue for a different role for agents in causal discovery. Agents should inspect data, retrieve context, explain method assumptions and clarify graph outputs, but they should not supply edges, orientations, priors, constraints or causal conclusions. We propose the principle that agents assist the workflow, while causal claims remain grounded in data, explicit assumptions, formal algorithms, diagnostics and user or domain-expert decisions. We instantiate this principle in causal-learn+, an online platform that coordinates data analysis, preprocessing, method recommendation, expert-knowledge incorporation, formal discovery and interpretation around the algorithmic ecosystem of causal-learn. A case study on Big Five personality data illustrates an agent-assisted causal discovery pipeline that does not turn language-model unreliability into causal evidence. The platform is available at causallearn.com.
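
The role separation described in the abstract can be illustrated with a minimal sketch. This is not the causal-learn+ implementation: the `CausalGraph` class, the `ALLOWED_SOURCES` labels, and the variable names are hypothetical and are introduced here only to show one way a pipeline could accept explanations from an agent while refusing edges that do not come from a formal algorithm or a human expert.

```python
from dataclasses import dataclass, field

# Hypothetical provenance labels (an assumption, not taken from causal-learn+).
ALLOWED_SOURCES = {"algorithm", "expert"}


@dataclass(frozen=True)
class Edge:
    cause: str
    effect: str
    source: str  # who asserted the edge


@dataclass
class CausalGraph:
    edges: list = field(default_factory=list)
    notes: list = field(default_factory=list)

    def add_edge(self, cause, effect, source):
        # Only a formal algorithm or a human expert may supply an edge.
        if source not in ALLOWED_SOURCES:
            raise PermissionError(f"{source!r} may not supply causal edges")
        self.edges.append(Edge(cause, effect, source))

    def add_note(self, text, source):
        # Explanations are accepted from any source, including an agent.
        self.notes.append((source, text))


graph = CausalGraph()
graph.add_edge("X", "Y", source="algorithm")
graph.add_note("The chosen method assumes no hidden confounders.", source="agent")

try:
    graph.add_edge("Y", "Z", source="agent")
    agent_edge_accepted = True
except PermissionError:
    agent_edge_accepted = False
```

In this sketch the agent can annotate the graph but cannot change its structure, so every edge carries a provenance that traces back to an algorithm run or an expert decision.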