Error Highways: Scaling Predictive Coding to Very Deep Networks

2026-06-22 • Machine Learning

Machine Learning • Neural and Evolutionary Computing

AI summary

The authors address a problem in predictive coding networks (PCNs), a brain-inspired way to train neural networks, where learning signals weaken as they move through many layers. They introduce a new method called highway error propagation (HEP) that directly connects deeper layers to the output error, preventing signal decay. This approach helps train very deep networks more effectively while keeping the learning process local and biologically plausible. Their tests on MNIST and Fashion-MNIST datasets show that HEP can successfully train networks up to 128 layers deep with good accuracy.

Predictive coding networks • Back-propagation • Local learning • Free energy function • Feedback matrices • Highway error propagation • Multilayer perceptrons • MNIST • Fashion-MNIST • Neural network depth

Authors

Amirhossein Mohammadi, Alexander G. Ororbia

Abstract

Predictive coding networks (PCNs) offer a biologically plausible, local-learning alternative to back-propagation of errors (backprop). Nevertheless, they have remained largely confined to shallow architectures and to evaluation on simple machine intelligence benchmarks. A central obstacle to scaling PCNs is that the learning signal decays rapidly as it propagates away from the clamped boundaries, leaving interior layers effectively unchanged. To directly counter this problem, we propose highway error propagation (HEP), a scheme that augments the free energy function underlying predictive coding (PC) by altering its neural structure with feedback matrices $V_{L\to i}$ that couple selected hidden states directly to the clamped output error. Since this coupling is linear in the hidden state, the highway pathway delivers a correction at every inference step whose magnitude is independent of depth, in contrast to vanilla PC, where the output error reaching the $i$-th hidden layer is attenuated exponentially in depth. This bypasses the Jacobian chain while preserving the local PC synaptic update rule. On MNIST and Fashion-MNIST, we show that HEP effectively trains MLPs of up to 128 layers with accuracy that is robust with respect to depth.
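The contrast the abstract draws between a chained Jacobian pathway and a direct highway pathway can be illustrated with a toy calculation. The sketch below is not the authors' implementation and does not run PC inference. It assumes hypothetical 2-dimensional linear layers whose Jacobians are rotations scaled by a gain of 0.8, and hypothetical fixed rotation matrices standing in for the feedback matrices $V_{L\to i}$. All function names and parameter values are illustrative choices, picked so that the norms can be checked by hand.

```python
import math
import random


def rotation(theta):
    """2x2 rotation matrix: an orthogonal map, so it preserves vector norms."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def scale(matrix, factor):
    return [[factor * x for x in row] for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def chain_vs_highway(depth=32, gain=0.8, seed=0):
    """Compare error norms at each hidden layer for the two pathways.

    Assumption: every layer Jacobian is gain * rotation, so each backward
    hop shrinks the error norm by exactly `gain`.
    """
    rng = random.Random(seed)

    def random_angle():
        return rng.uniform(0.0, 2.0 * math.pi)

    # Hypothetical layer Jacobians (contractive when gain < 1).
    jacobians = [scale(rotation(random_angle()), gain) for _ in range(depth)]
    # Hypothetical fixed feedback matrices, one per hidden layer.
    highways = [rotation(random_angle()) for _ in range(depth)]
    output_error = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]

    chain_norms, highway_norms = [], []
    signal = output_error
    for hops in range(1, depth + 1):
        layer = depth - hops  # layer reached after `hops` backward steps
        # Chained pathway: one more transposed Jacobian per hop.
        signal = matvec(transpose(jacobians[layer]), signal)
        chain_norms.append(norm(signal))
        # Highway pathway: a single matrix applied to the output error.
        highway_norms.append(norm(matvec(highways[layer], output_error)))
    return norm(output_error), chain_norms, highway_norms
```

Under these assumptions, the chained signal after $k$ hops has norm $0.8^k$ times the output error norm, which is roughly $8 \times 10^{-4}$ of the original at $k = 32$, while the highway signal keeps the full output error norm at every layer. Real networks have non-orthogonal, nonlinear Jacobians, so the decay rate in practice will differ from this idealized case.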
