Informational Frustration in Neural Manifolds: Shannon Bottlenecks and the Limits of Learnability

2026-06-29 • Machine Learning

Machine Learning, Artificial Intelligence, Computational Geometry

AI summary

The authors explore why very large neural networks often learn well even though traditional theories predict they shouldn't. They propose a new theory that connects ideas from information theory, topology, and physics, suggesting that learning depends on a balance between the complexity of the data, the complexity of the decision boundary, and the structure of the network's weights. When this balance is broken, the network gets stuck memorizing instead of generalizing, a state they call Informational Frustration. They also explain the sudden improvement in learning called "grokking" as the network escaping this stuck state, and introduce a new optimization method to help manage this process.

overparameterization, generalization, Shannon entropy, topological entropy, von Neumann entropy, phase transition, grokking, statistical mechanics, information theory, optimization algorithms

Authors

Srinivasa Rao P., Vangmayi P Reddy

Abstract

Why overparameterised deep networks generalise so well remains one of the most stubborn open questions in machine learning theory. Classical frameworks such as VC dimension and Rademacher complexity predict catastrophic overfitting in modern models, leaving a wide gap between theory and practice. In this paper, we bridge this divide with a unified framework that links information theory, topology, and statistical mechanics to map the hard limits of deep learning. Central to our approach is the Entropic Learnability Horizon (ELH), a fundamental law stating that a network can truly learn a target function only if the Shannon entropy of the data manifold outpaces the topological entropy of the function's decision boundary, balanced by the von Neumann entropy of the network's weight space. We establish the Shannon-Topological Bottleneck Theorem, proving that when a target boundary's geometric complexity exceeds this informational horizon, the system undergoes a sudden entropic phase transition into a state of Informational Frustration: a glassy, rigid memorization phase in which generalization becomes thermodynamically impossible. Through this lens, we show that the enigmatic phenomenon of "grokking" is an Entropic Release, in which the weights abruptly reorganise to unlock the bottleneck. Finally, we translate this theory into practice with Entropic Gradient Descent (EGD), an optimization algorithm that dynamically manages weight entropy to keep learning on track. Ultimately, this work repositions entropy not merely as a tool for tracking uncertainty but as the fundamental physical currency that dictates whether a machine can learn.
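The abstract says that EGD manages the von Neumann entropy of the weight space but does not say how that quantity is computed. Below is a minimal NumPy sketch of one common construction. It treats the normalised Gram matrix rho = W W^T / tr(W W^T) as a density matrix and returns -sum(lambda log lambda) over its eigenvalues. This definition, and the function name `von_neumann_entropy`, are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def von_neumann_entropy(W: np.ndarray, eps: float = 1e-12) -> float:
    """Spectral (von Neumann) entropy of a weight matrix, in nats.

    ASSUMPTION: the weights' "density matrix" is taken to be
    rho = W W^T / tr(W W^T). This is one common construction and is
    not necessarily the definition used in the paper.
    """
    gram = W @ W.T                      # symmetric positive semidefinite
    total = np.trace(gram)
    if total <= 0:
        raise ValueError("W must be a non-zero matrix")
    rho = gram / total                  # unit-trace density matrix
    eigvals = np.linalg.eigvalsh(rho)   # real eigenvalues, ascending order
    # Drop round-off negatives and zeros, using the convention 0 log 0 = 0.
    eigvals = eigvals[eigvals > eps]
    return float(-np.sum(eigvals * np.log(eigvals)))


# Usage: an isotropic matrix spreads spectral mass evenly (maximum entropy),
# while a rank-one matrix concentrates it in a single direction (zero entropy).
print(von_neumann_entropy(np.eye(4)))                              # log(4), about 1.386
print(von_neumann_entropy(np.outer([1.0, 2.0, 3.0], [1.0, 1.0])))  # about 0.0
```

Under this construction the entropy of an m-by-n weight matrix is bounded above by log(rank W). Any link between low values and the "glassy" memorization phase described above is a hypothesis for illustration, not a result reported by the authors.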
