Scalable Counterfactual Risk Estimation for Rare Events in Longitudinal Data
2026-06-01 • Machine Learning
AI summary
The authors address the problem of estimating how treatments that change over time affect survival outcomes in large observational datasets where the outcome is rare. They note that g-formula-based methods such as the ICE estimator are principled but computationally expensive, especially when bootstrap-based variance estimation is required, and that outcome rarity causes class imbalance that makes logistic regression unstable. To address this, they propose a subsampling and reweighting strategy that reduces the computational burden and improves stability while preserving consistency. They evaluate the approach in simulations and on a large EHR cohort study of social and behavioral determinants of health and suicide risk.
causal inference, time-varying treatments, survival analysis, iterative conditional expectation (ICE) estimator, longitudinal data, rare outcomes, class imbalance, logistic regression, subsampling, reweighting
Authors
Xiaohui Yin, Avijit Mitra, Ying Zhou, Kun Chen, Hong Yu
Abstract
Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. While g-formula-based methods such as the iterative conditional expectation (ICE) estimator provide a principled framework for longitudinal causal inference, they become computationally expensive, especially when bootstrap-based variance estimation is required. In addition, outcome rarity at each time point induces severe class imbalance, leading to instability and convergence issues in logistic regression and related models. To address these challenges, we propose a principled subsampling and reweighting strategy for longitudinal survival data that can be applied to a range of existing causal effect estimators in this setting, including the ICE estimator. The proposed method substantially reduces the computational burden while preserving consistency and improving estimation stability in rare-outcome settings. We evaluate the method through simulations and validate it using a large-scale electronic health record (EHR) cohort study on social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.
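The abstract does not spell out the subsampling and reweighting procedure, so the following is only a generic illustration of the underlying idea, not the authors' method. It assumes a single time point, keeps every event, keeps each non-event with a fixed probability, and weights each retained record by the inverse of its inclusion probability. All names (`subsample_with_weights`, `weighted_event_rate`, `keep_prob`) and the simulated data are hypothetical.

```python
import random


def subsample_with_weights(outcomes, keep_prob, rng):
    """Keep every event (y == 1) and each non-event with probability keep_prob.

    Returns a list of (index, weight) pairs, where the weight is the inverse
    of the record's inclusion probability.
    """
    sample = []
    for i, y in enumerate(outcomes):
        if y == 1:
            sample.append((i, 1.0))               # events are always retained
        elif rng.random() < keep_prob:
            sample.append((i, 1.0 / keep_prob))   # non-events are up-weighted
    return sample


def weighted_event_rate(outcomes, sample):
    """Weighted event rate: weighted event count over total weight."""
    total_weight = sum(w for _, w in sample)
    event_weight = sum(w for i, w in sample if outcomes[i] == 1)
    return event_weight / total_weight


# Simulated rare-outcome data (assumed values, for illustration only).
rng = random.Random(0)
n = 200_000
true_rate = 0.002
outcomes = [1 if rng.random() < true_rate else 0 for _ in range(n)]

sample = subsample_with_weights(outcomes, keep_prob=0.05, rng=rng)

full_rate = sum(outcomes) / n                                    # full data
naive_rate = sum(outcomes[i] for i, _ in sample) / len(sample)   # ignores weights
weighted_rate = weighted_event_rate(outcomes, sample)            # uses weights

print(f"records kept: {len(sample)} of {n}")
print(f"full-data rate:     {full_rate:.5f}")
print(f"naive subsample:    {naive_rate:.5f}")
print(f"weighted subsample: {weighted_rate:.5f}")
```

In this toy setting the unweighted subsample overstates the event rate because non-events were thinned, while the weighted estimate stays close to the full-data rate using a small fraction of the records. The same weights could in principle be passed to a weighted logistic regression at each time point; how the paper handles the longitudinal structure and variance estimation is not stated in the abstract.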