MA-SBI: Misspecification-Aware Simulation-Based Inference via Side-Channel Guidance

2026-06-15
Artificial Intelligence

AI summary

The authors address simulator misspecification: simulations that do not perfectly match real-world data, which makes it hard to infer hidden parameters. They introduce MA-SBI, a method that uses side information such as text labels to shift the observations before they are passed to a pre-trained posterior estimator, without needing ground-truth parameters or retraining. Their main theorem bounds the achievable bias reduction by the mutual information between the side information and the misspecification. On benchmarks, MA-SBI matches the oracle posterior, and a stochastic variant improves posterior-predictive log-likelihood on real COVID and OxCGRT epidemiological data. The method also leaves the posterior unchanged when the simulator is already well specified.

Simulation-based inference, Simulator misspecification, Optimal transport, Posterior correction, Amortized posterior, Mutual information, Donsker-Varadhan, Epidemiological modeling, Robust inference, Calibration-free methods
Authors
Arunkumar V, Manoranjan Gandhudi, Gangadharan G. R., Arun Prakash, S. Senthilkumar
Abstract
Simulation-based inference (SBI) of latent parameters is often hindered by simulator misspecification, the mismatch between simulated and real-world observations caused by inherent modeling simplifications. RoPE, the recent state-of-the-art for robust SBI, addresses this through optimal transport between learned representations of real and simulated observations, but requires ground-truth parameter calibration pairs that are typically unavailable in the very settings where SBI is needed. What practitioners do have is unstructured side-information such as regime labels, instruction text, and policy bulletins. We propose Misspecification-Aware Simulation-Based Inference (MA-SBI), a calibration-free framework that turns this side-channel into a posterior correction. A learned corrector maps side-channel text to an observation-space shift applied before any pre-trained amortized posterior, requiring no retraining and no parameter ground-truth. Our main theorem bounds achievable bias reduction by the mutual information between misspecification and side-channel, with a non-vacuous constant that extends to all sub-Gaussian noise via Donsker-Varadhan. On hide-the-calibration benchmarks, MA-SBI with text alone matches the oracle posterior across 10 seeds and two backbones (TOST equivalence), while RoPE given more data does not. The two approaches are complementary: where misspecification is structural and recoverable from parameter pairs, RoPE dominates, as the theory predicts. A stochastic variant improves posterior-predictive log-likelihood on real COVID and OxCGRT epidemiological data, and correctly leaves the posterior unchanged on a well-specified cognitive-science corpus.
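
To make the inference-time pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional toy Gaussian model as a stand-in for the pre-trained amortized posterior, a hand-written bag-of-words corrector with made-up vocabulary and weights (`VOCAB`, `WEIGHTS`), and the convention that the predicted shift is subtracted from the observation. All function names are hypothetical, and how the corrector is actually trained is not covered here.

```python
import numpy as np

# Hypothetical stand-in for a frozen, pre-trained amortized posterior q(theta | x).
# Toy model (an assumption): theta ~ N(0, 1), x | theta ~ N(theta, SIGMA^2),
# for which the exact posterior is Gaussian and available in closed form.
SIGMA = 0.5


def amortized_posterior(x):
    """Return (mean, std) of the toy Gaussian posterior over theta given x."""
    precision = 1.0 + 1.0 / SIGMA**2
    mean = (x / SIGMA**2) / precision
    return mean, precision ** -0.5


# Hypothetical corrector: bag-of-words features times a weight vector.
# The vocabulary and weights are invented for illustration; a real corrector
# would be learned rather than hard-coded.
VOCAB = ["lockdown", "holiday", "normal"]
WEIGHTS = np.array([1.5, -0.5, 0.0])


def corrector(text):
    """Map side-channel text to a scalar observation-space shift."""
    tokens = text.lower().split()
    features = np.array([float(word in tokens) for word in VOCAB])
    return float(features @ WEIGHTS)


def corrected_posterior(x_obs, text):
    """Shift the observation, then query the frozen posterior unchanged."""
    return amortized_posterior(x_obs - corrector(text))


x_obs = 2.0
naive_mean, naive_std = amortized_posterior(x_obs)
fixed_mean, fixed_std = corrected_posterior(x_obs, "Lockdown announced today")
same_mean, same_std = corrected_posterior(x_obs, "normal operations")

print(f"uncorrected posterior mean: {naive_mean:.2f}")
print(f"corrected posterior mean:   {fixed_mean:.2f}")
print(f"well-specified regime mean: {same_mean:.2f}")
```

In this toy setup the posterior network is never modified: only its input changes. Text that carries no misspecification signal maps to a zero shift, so the posterior is left as it was, which mirrors the behavior the abstract reports for the well-specified case.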