Backward Coherence and Hidden-State Stability in Recurrent Neural Networks: A Quasi-Reverse-Martingale Theory

2026-06-08
Machine Learning

AI summary

The authors study how the hidden states of recurrent neural networks (RNNs) can be made more stable and interpretable using backward coherence, a measure of how well a hidden state can be reconstructed from the next one. They show that, under certain assumptions, the hidden states form a quasi-reverse-martingale, which guarantees that they converge and behave predictably over time. Regularising for backward coherence reduces instability and lets the RNN reach stable representations sooner, as supported by simulations and by real-data tests in healthcare, forecasting, and human activity recognition. They also connect the approach to variational inference and provide theoretical guarantees under specific conditions.

Recurrent Neural Networks, Hidden State, Backward Coherence, Martingale, Variational Inference, Concept Drift, Time-uniform Confidence Sequences, Echo-state Property, Kullback-Leibler Divergence, Change-point Detection
Authors
Yuan-chin Ivan Chang
Abstract
Recurrent neural networks maintain a hidden state $h_t$, but its probabilistic meaning is often unclear. We study hidden-state stability through \emph{backward coherence}: the extent to which $h_t$ can be reconstructed from $h_{t+1}$ by a learned backward projector $g_\phi$. Under contraction and summable backward drift, the hidden-state sequence forms a quasi-reverse-martingale. This yields almost-sure convergence, rates under mixing, an interpretable limiting representation, finite pathwise stopping times, and a theoretical framework for time-uniform confidence sequences. Simulations support the theory. Backward-coherence regularisation reduces the empirical quasi-martingale total $\hat Q$ by $43$--$58\%$, reaches stability $28$--$44\%$ earlier than an unregularised RNN, and gives tracking-error recovery consistent with geometric bounds. Additional tests confirm echo-state forgetting rates bounded by $\rho$ and verify the increment-sum tube $R_t$ with $100\%$ simultaneous coverage, although $R_t$ is conservative; in practice, the defect-tail proxy $\hat Q_t$ is the more useful monitor. The backward-coherence loss is also equivalent to minimising a Kullback--Leibler divergence in a Gaussian backward model, linking the method to variational inference. Extensions cover $\phi$-mixing inputs, change-point tracking, and finite-sample concentration. Three real-data studies further validate the approach. On PhysioNet 2012 ICU data, the Reverse Martingale RNN (RMRNN) matches the RNN in mortality-prediction AUC while reaching stable representations 13 hours earlier. On FRED-MD, it reduces one-month-ahead forecast error by a factor of about four under concept drift. On UCI Human Activity Recognition, it maintains lower post-transition tracking error with geometric decay. The guarantees apply under the stated assumptions; universality is not claimed.
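
To make the idea of backward coherence concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a plain tanh RNN, a linear backward projector $g(v) = Av + c$ fitted by least squares, and a summed-defect quantity used only as an illustrative stand-in for $\hat Q$; the abstract does not give the exact definitions, and all function names here are hypothetical. The sketch also checks a standard fact behind the variational link: under a Gaussian backward model with fixed variance $\sigma^2$, the negative log-likelihood equals the squared reconstruction error divided by $2\sigma^2$, plus a constant.

```python
import numpy as np

def run_rnn(x, W, U):
    """Hidden states of a plain tanh RNN: h[t+1] = tanh(W h[t] + U x[t])."""
    h = np.zeros((x.shape[0] + 1, W.shape[0]))
    for t in range(x.shape[0]):
        h[t + 1] = np.tanh(W @ h[t] + U @ x[t])
    return h

def backward_defects(h, A, c):
    """Per-step defect ||h[t] - g(h[t+1])|| for a linear projector g(v) = A v + c."""
    recon = h[1:] @ A.T + c
    return np.linalg.norm(h[:-1] - recon, axis=1)

def coherence_loss(h, A, c):
    """Mean squared backward reconstruction error (assumed form of the loss)."""
    return float(np.mean(backward_defects(h, A, c) ** 2))

def gaussian_backward_nll(h, A, c, sigma):
    """Average negative log-likelihood of h[t] under N(g(h[t+1]), sigma^2 I)."""
    d = h.shape[1]
    sq = backward_defects(h, A, c) ** 2
    const = 0.5 * d * np.log(2 * np.pi * sigma**2)
    return float(np.mean(sq / (2 * sigma**2) + const))

rng = np.random.default_rng(0)
d, m, T = 4, 3, 200
W = rng.normal(size=(d, d))
W *= 0.8 / np.linalg.norm(W, 2)  # spectral norm 0.8 < 1; tanh is 1-Lipschitz, so the update contracts in h
U = rng.normal(size=(d, m))
x = rng.normal(size=(T, m))
h = run_rnn(x, W, U)

# Fit the linear backward projector by least squares (an assumption; the paper learns g_phi).
X = np.hstack([h[1:], np.ones((T, 1))])
coef, *_ = np.linalg.lstsq(X, h[:-1], rcond=None)
A, c = coef[:-1].T, coef[-1]

defects = backward_defects(h, A, c)
q_hat_proxy = float(defects.sum())  # illustrative stand-in for the empirical total, not the paper's definition
print("coherence loss:", coherence_loss(h, A, c))
print("summed defects:", q_hat_proxy)
```

In a training setting, a term such as `coherence_loss` would presumably be added to the task loss with a weight chosen by the user; that weighting is likewise an assumption here.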