Non-parametric recovery of causal diffusion mechanisms from steady-state observations

2026-06-29

Machine Learning
AI summary

The authors study multivariate systems that evolve over time and whose components influence one another causally. They show that the rules driving these changes (the drift of the underlying diffusion process) can be recovered from single snapshots, such as gene expression measurements taken once per cell. Their approach assumes that the system has reached equilibrium when it is measured and that its cause-effect structure is known and acyclic. They derive an estimator that does not assume a specific functional form for these rules and prove that it is consistent. They also test the method in simulations and propose a cross-validation scheme for choosing its hyperparameters.

sparse multivariate systems, continuous-time stochastic process, causal mechanism, diffusion process, equilibrium distribution, drift function, non-parametric estimation, kernel estimator, cross-sectional data, causal graph
Authors
Richard Schwank, Mathias Drton
Abstract
We consider sparse multivariate stochastic systems that evolve in continuous time according to a causal mechanism and present methodology to recover the system's time-infinitesimal transition mechanism from mere cross-sectional data. This observational paradigm is motivated by applications such as gene expression analysis, where destructive experimental techniques may only allow recording data once over a cell's lifetime. More precisely, we assume the system follows a time-homogeneous diffusion process that has reached an equilibrium distribution at observation time. Further, we assume the causal mechanism is fully described by the diffusion drift, is acyclic, and its causal structure graph is known. In this setting, we prove that the full causal mechanism, i.e., the drift function, can be non-parametrically identified under a weak non-explosion criterion. We derive a non-parametric kernel estimator for this challenging inverse problem and prove its consistency. Moreover, we propose a cross-validation scheme for hyperparameter tuning, illustrate the behavior of our estimator in simulations, and discuss connections with irreversible generative diffusion models and low-frequency sampled data.
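
The observational setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator. It only generates the kind of data the paper takes as input: many independent copies of a time-homogeneous diffusion are run until they are close to equilibrium, and each copy is observed exactly once. The two-variable linear drift for the graph X1 -> X2, the unit diffusion coefficient, the Euler-Maruyama discretization, and the names `drift` and `snapshot` are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def drift(x):
    """Hypothetical acyclic drift for the causal graph X1 -> X2.

    X1 relaxes towards zero on its own; X2 is pulled towards X1.
    This specific form is an illustrative assumption, not the paper's model.
    """
    x1, x2 = x[:, 0], x[:, 1]
    return np.stack([-x1, x1 - x2], axis=1)


def snapshot(n_cells=20000, t_end=10.0, dt=0.01, seed=0):
    """Simulate n_cells independent diffusions and return one snapshot each.

    Uses the Euler-Maruyama scheme with an assumed unit diffusion
    coefficient. Only the final state of each path is returned, which
    mimics cross-sectional data (one measurement per cell).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((n_cells, 2))
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + drift(x) * dt + np.sqrt(dt) * noise
    return x


# One row per simulated cell, one column per variable; no time information.
sample = snapshot()
cov = np.cov(sample, rowvar=False)
print(sample.shape)
print(cov)
```

For this particular linear example, the equilibrium covariance of the continuous-time process solves a Lyapunov equation and has variances 0.5 and 0.75 with covariance 0.25, so the empirical covariance of the snapshot should be close to these values. The recovery problem studied in the paper goes in the opposite direction: given only such a snapshot and the graph X1 -> X2, estimate the drift.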