Graphical conditional generative modeling for digital twin modeling

2026-06-15

Computational Engineering, Finance, and Science; Machine Learning
AI summary

The authors tackle the problem of overly complex digital twin models by building simpler stochastic surrogate models that stay accurate while using only the important variables. Their method identifies these key variables by checking which inputs change the full distribution of the target quantity (for example its spread, tails, or number of modes), not just its average. They combine conditional generative models, which learn full conditional distributions, with Gaussian-process-based tools that identify and remove unimportant inputs. The result is interpretable, manageable models that perform comparably to models using every variable, across settings such as control systems and economic data.
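The difference between "changes the average" and "changes the full distribution" can be seen in a toy example. The sketch below illustrates that point only and is not the authors' method. The target's spread depends on an input `u` while its mean does not, and a second input `w` has no effect at all. After splitting the samples at each input's median, the gap between the group means is near zero for both inputs, while a two-sample Kolmogorov-Smirnov statistic, which compares whole empirical distributions, separates `u` from `w`. The data, the median split, and the helper names `ks_statistic` and `split_by_median` are assumptions made for this illustration.

```python
import random
import statistics


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # Step both empirical CDFs past every sample equal to v (handles ties).
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def split_by_median(x, y):
    """Split target samples y into two groups by whether x is below its median."""
    m = statistics.median(x)
    lo = [yi for xi, yi in zip(x, y) if xi < m]
    hi = [yi for xi, yi in zip(x, y) if xi >= m]
    return lo, hi


# Hypothetical toy data: u changes only the spread of y, w has no effect on y.
random.seed(1)
n = 4000
u = [random.random() for _ in range(n)]
w = [random.random() for _ in range(n)]
y = [random.gauss(0.0, 2.0 if ui >= 0.5 else 0.5) for ui in u]

for name, x in (("u", u), ("w", w)):
    lo, hi = split_by_median(x, y)
    mean_gap = abs(statistics.fmean(hi) - statistics.fmean(lo))
    print(name, "mean gap:", round(mean_gap, 3), "KS:", round(ks_statistic(lo, hi), 3))
```

A screen based only on conditional means would treat `u` as uninformative here, just like `w`; a distribution-level comparison such as the KS statistic used in this sketch detects it.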

digital twin, stochastic surrogate model, conditional generative modeling, Gaussian process, kernel mode decomposition, Markov decision process, variable selection, model complexity, control systems, reinforcement learning
Authors
Zongren Zou, Théo Bourdais, Ricardo Baptista, Houman Owhadi
Abstract
Digital twin modeling, including control and data assimilation under model uncertainty, often faces an open-ended fidelity problem: adding variables, data streams, and time scales can indefinitely increase model complexity, ultimately producing systems that are difficult to maintain, validate, interpret, and use for stress or safety testing. As an alternative, one can seek parsimonious stochastic surrogate models built only on the variables needed to describe the relevant quantities of interest. We introduce a framework for discovering such variables from observational data by identifying which candidate inputs influence the full conditional law of a target quantity, rather than only its conditional mean. This distinction is essential in stochastic, coarse-grained, or partially observed systems, where dependencies may appear through changes in variability, tail behavior, multimodality, or uncertainty rather than through deterministic functional relationships. The framework couples conditional generative modeling, which learns the conditional distribution of the target given candidate inputs, with Gaussian-process-based analysis of variance (through kernel mode decomposition), which enables iterative pruning of non-influential inputs and interpretable structure discovery. In control settings, the resulting surrogate can be interpreted as a learned Markov decision process: the method identifies not only a transition model, but also the state, action, and memory variables needed to make the learned dynamics effectively Markovian. Across examples involving stochastic dynamical systems, missing variables, PDE control, reinforcement learning, and economic data, the discovered structures yield interpretable stochastic surrogates whose downstream performance is comparable to models trained on the full variable set.
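The pruning idea can be sketched with an additive Gaussian-process kernel that has one squared-exponential mode per candidate input, which is the general idea behind GP-based analysis of variance. The abstract does not spell out the kernel mode decomposition, so the code below is a simplified stand-in rather than the authors' algorithm. It fits the GP posterior mean, scores each input by the energy share of its mode, E_i = α^T K_i α with α = (Σ_j K_j + σ² I)^{-1} y, and repeatedly removes the weakest input while its share falls below a threshold. Because it regresses y directly, it screens only the conditional mean; the framework described above instead couples this kind of analysis with a conditional generative model so that dependence through the full conditional law is captured. The lengthscale, the noise variance σ², the threshold, the toy data, and all function names are illustrative assumptions.

```python
import math
import random


def rbf_gram(xs, lengthscale=1.0):
    """Squared-exponential Gram matrix for a single scalar input (one kernel mode)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2)) for b in xs] for a in xs]


def cholesky(a):
    """Return lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def mode_energy_ratios(columns, y, noise_var=1e-2):
    """Fraction of the fitted signal carried by each input's kernel mode.

    K = sum_i K_i + noise_var * I and alpha = K^{-1} y; mode i contributes
    f_i = K_i alpha to the posterior mean, with energy E_i = alpha^T K_i alpha.
    """
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]  # center y so no mode has to absorb a constant offset
    grams = [rbf_gram(col) for col in columns]
    K = [[sum(g[r][c] for g in grams) + (noise_var if r == c else 0.0)
          for c in range(n)] for r in range(n)]
    alpha = cho_solve(cholesky(K), yc)
    energies = [sum(alpha[r] * sum(g[r][c] * alpha[c] for c in range(n)) for r in range(n))
                for g in grams]
    total = sum(energies)
    return [e / total for e in energies]


def prune_inputs(columns, y, threshold, noise_var=1e-2):
    """Drop the weakest input, one at a time, while its energy share is below threshold."""
    active = list(range(len(columns)))
    while len(active) > 1:
        ratios = mode_energy_ratios([columns[i] for i in active], y, noise_var)
        weakest = min(range(len(active)), key=lambda k: ratios[k])
        if ratios[weakest] >= threshold:
            break
        del active[weakest]
    return active


# Hypothetical toy data: y depends on inputs 0 and 1; input 2 is irrelevant.
random.seed(0)
n = 80
X = [[random.uniform(-2.0, 2.0) for _ in range(n)] for _ in range(3)]
y = [math.sin(X[0][t]) + math.sin(X[1][t]) + random.gauss(0.0, 0.01) for t in range(n)]

print([round(r, 3) for r in mode_energy_ratios(X, y)])
print(prune_inputs(X, y, threshold=0.3))
```

The hand-written Cholesky solve only keeps the sketch dependency-free; a linear-algebra library would normally replace it. The threshold of 0.3 suits this three-input toy and is not a recommended setting.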