Exploding and vanishing gradients in deep neural networks: the effect of residual connections

2026-06-15 • Machine Learning


AI summary

The author studies why gradients in deep neural networks can become too large (exploding) or too small (vanishing), which makes training difficult. The analysis uses multiplicative ergodic theory, which describes the long-run growth of products of random matrices. The paper explains how adding residual connections (shortcut links that skip layers) affects this problem by examining the Lyapunov spectrum, the set of exponential growth rates of such products. A characterization of Lyapunov exponents due to Furstenberg and Kifer is used to make the explanation precise.
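To make "gradients getting too big or too small" concrete, the standard setup in which Lyapunov exponents enter is sketched below. The notation (the layer maps φ_k and residual branches f_k, the Jacobians J_k, the loss L, the top exponent λ_1) is chosen for illustration and is not taken from the paper.

```latex
\begin{aligned}
x_k &= \phi_k(x_{k-1}), \qquad J_k = D\phi_k(x_{k-1}), \qquad k = 1,\dots,n,\\
\nabla_{x_0}\mathcal{L} &= J_1^{\top} J_2^{\top} \cdots J_n^{\top}\,\nabla_{x_n}\mathcal{L}
  = \left(J_n \cdots J_1\right)^{\top}\nabla_{x_n}\mathcal{L},\\
\lambda_1 &= \lim_{n\to\infty} \frac{1}{n}\log\left\lVert J_n J_{n-1}\cdots J_1\right\rVert,\\
\text{residual layer: } x_k &= x_{k-1} + f_k(x_{k-1})
  \;\Longrightarrow\; J_k = I + Df_k(x_{k-1}).
\end{aligned}
```

When the J_k are random (for instance i.i.d. with finite E log⁺‖J_k‖), the Furstenberg-Kesten theorem guarantees that this limit exists almost surely, and the multiplicative ergodic (Oseledets) theorem gives the full spectrum λ_1 ≥ λ_2 ≥ … . Roughly, λ_1 > 0 means gradient norms grow like e^{nλ_1} (exploding) and λ_1 < 0 means they shrink (vanishing). A residual connection enters only through the identity term I, which changes the matrices whose product determines the spectrum.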

exploding gradients, vanishing gradients, deep neural networks, multiplicative ergodic theory, residual connections, Lyapunov exponents, Lyapunov spectrum, Furstenberg-Kifer theory

Authors

Vivek S Borkar

Abstract

The well-known phenomenon of exploding and vanishing gradients in deep neural networks is analyzed using multiplicative ergodic theory. The effect of adding a residual connection is explained in this context. Specifically, a characterization of Liapunov exponents due to Furstenberg and Kifer is exploited in order to make a precise statement about the Liapunov spectrum and the effect of residual connections on it.
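The following is a small numerical illustration, not the paper's method. It estimates the leading Lyapunov exponents of a product of random layer Jacobians with the standard QR (re-orthonormalization) method and compares a plain linearized layer J = W with a residual one J = I + W. The Gaussian weight model, the scale sigma, and the names lyapunov_spectrum, plain_jacobian and residual_jacobian are assumptions made for this sketch; nonlinearities are ignored.

```python
import numpy as np


def lyapunov_spectrum(sample_jacobian, dim, n_layers=2000, k=None, seed=0):
    """Estimate the top-k Lyapunov exponents of the product J_n ... J_1
    with the QR method: push an orthonormal frame through each Jacobian,
    re-orthonormalize, and average the log stretching factors.

    sample_jacobian(rng) must return one random dim x dim layer Jacobian.
    """
    rng = np.random.default_rng(seed)
    k = dim if k is None else k
    q = np.eye(dim)[:, :k]  # orthonormal frame of k tangent directions
    log_growth = np.zeros(k)
    for _ in range(n_layers):
        # Reduced QR: q is dim x k, r is k x k upper triangular.
        q, r = np.linalg.qr(sample_jacobian(rng) @ q)
        # log|r_ii| is the log growth of the i-th orthogonalized direction.
        log_growth += np.log(np.abs(np.diag(r)))
    return np.sort(log_growth / n_layers)[::-1]


def plain_jacobian(sigma, dim):
    # Hypothetical linearized plain layer: J = W, W_ij ~ N(0, sigma^2 / dim).
    return lambda rng: rng.normal(0.0, sigma / np.sqrt(dim), size=(dim, dim))


def residual_jacobian(sigma, dim):
    # Hypothetical linearized residual layer: J = I + W (identity skip connection).
    return lambda rng: np.eye(dim) + rng.normal(
        0.0, sigma / np.sqrt(dim), size=(dim, dim)
    )


if __name__ == "__main__":
    dim = 32
    for sigma in (0.5, 1.0, 2.0):
        plain = lyapunov_spectrum(plain_jacobian(sigma, dim), dim, k=4)
        resid = lyapunov_spectrum(residual_jacobian(sigma, dim), dim, k=4)
        print(f"sigma={sigma}: plain top-4 {np.round(plain, 3)}, "
              f"residual top-4 {np.round(resid, 3)}")
```

In this toy model the plain estimates sit close to log σ (Gaussian random-matrix theory gives log σ minus a correction of order 1/dim), so σ < 1 makes gradients vanish and σ > 1 makes them explode. The residual variant keeps the top exponent positive for all three σ values. This is a property of the illustrative model only, not a statement of the paper's result.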
