Exploding and vanishing gradients in deep neural networks: the effect of residual connections

2026-06-15 · Machine Learning

AI summary

The author studies why deep neural networks suffer from exploding or vanishing gradients, where gradients become too large or too small during backpropagation and training becomes difficult. The analysis uses multiplicative ergodic theory, which describes the long-run exponential growth rates of products of matrices, such as the layer Jacobians multiplied together in backpropagation. The paper explains how residual connections (shortcut links that add a layer's input to its output) mitigate the problem by examining the Lyapunov spectrum, the set of these growth rates (the Lyapunov exponents). A characterization of Lyapunov exponents due to Furstenberg and Kifer is used to make the statement precise.
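For reference, the standard setup behind this kind of analysis can be written as below. The notation ($F_k$ for layer $k$, $J_k$ for its Jacobian, $f_k$ for the residual branch) is ours and not necessarily the paper's. Backpropagation multiplies transposed Jacobians, the top Lyapunov exponent $\lambda_1$ is the asymptotic growth rate of that product (when the limit exists, e.g. for i.i.d. Jacobians with $\mathbb{E}\log^+\|J_1\| < \infty$), and gradient norms behave roughly like $e^{n\lambda_1}$: exploding if $\lambda_1 > 0$, vanishing if $\lambda_1 < 0$. A residual block replaces $J_k$ by the identity plus a perturbation.

```latex
\begin{aligned}
x_k &= F_k(x_{k-1}), \qquad J_k = DF_k(x_{k-1}), \qquad k = 1,\dots,n,\\
\nabla_{x_0} L &= J_1^{\top} J_2^{\top} \cdots J_n^{\top}\, \nabla_{x_n} L,\\
\lambda_1 &= \lim_{n\to\infty} \frac{1}{n} \log \bigl\| J_n J_{n-1} \cdots J_1 \bigr\|,\\
\text{residual block: } F_k(x) &= x + f_k(x) \;\Longrightarrow\; J_k = I + Df_k(x_{k-1}).
\end{aligned}
```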

exploding gradients, vanishing gradients, deep neural networks, multiplicative ergodic theory, residual connections, Lyapunov exponents, Lyapunov spectrum, Furstenberg-Kifer theory
Authors
Vivek S Borkar
Abstract
The well-known phenomenon of exploding and vanishing gradients in deep neural networks is analyzed using multiplicative ergodic theory. The effect of adding a residual connection is explained in this context. Specifically, a characterization of Liapunov exponents due to Furstenberg and Kifer is exploited in order to make a precise statement about the Liapunov spectrum and the effect of residual connections on it.
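A minimal numerical sketch of the intuition, assuming a toy model in which each plain layer's Jacobian is an independent scaled Gaussian matrix and each residual layer's Jacobian is the identity plus a small Gaussian perturbation. This is not the paper's model or proof technique: it estimates the Lyapunov spectrum with the standard QR re-orthonormalization method, and the names `lyapunov_spectrum`, `plain_jacobian`, `residual_jacobian`, `scale`, and `alpha` are hypothetical.

```python
import numpy as np


def lyapunov_spectrum(sample_jacobian, dim, n_steps, rng):
    """Estimate the Lyapunov exponents of the product J_n ... J_1.

    Standard QR re-orthonormalization method: log|diag(R)| at each step
    records how much each orthonormal direction is stretched.
    """
    q = np.eye(dim)
    log_stretch = np.zeros(dim)
    for _ in range(n_steps):
        j = sample_jacobian(rng)          # one layer Jacobian J_k
        q, r = np.linalg.qr(j @ q)        # re-orthonormalize to avoid overflow/underflow
        log_stretch += np.log(np.abs(np.diag(r)))
    # Time-averaged growth rates, sorted from largest to smallest
    return np.sort(log_stretch / n_steps)[::-1]


# Hypothetical toy plain layer: J_k = scale * W_k, W_k with N(0, 1/dim) entries
def plain_jacobian(scale, dim):
    def sample(rng):
        return scale * rng.standard_normal((dim, dim)) / np.sqrt(dim)
    return sample


# Hypothetical toy residual layer: J_k = I + alpha * W_k
def residual_jacobian(alpha, dim):
    def sample(rng):
        return np.eye(dim) + alpha * rng.standard_normal((dim, dim)) / np.sqrt(dim)
    return sample


rng = np.random.default_rng(0)
dim, n_steps = 8, 2000
for name, sampler in [
    ("plain, scale=0.5", plain_jacobian(0.5, dim)),
    ("plain, scale=2.0", plain_jacobian(2.0, dim)),
    ("residual, alpha=0.1", residual_jacobian(0.1, dim)),
]:
    print(name, np.round(lyapunov_spectrum(sampler, dim, n_steps, rng), 3))
```

Multiplying every Jacobian by `scale` shifts every exponent by `log(scale)`, so in this toy model the plain spectrum is wide and moves with the scale (all exponents negative at `scale=0.5`, top exponent positive at `scale=2.0`), while the residual Jacobians stay near the identity and their exponents cluster near zero for small `alpha`. This only illustrates the general intuition; the paper's precise statement rests on the Furstenberg-Kifer characterization.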