Hyperparameter Transfer for Dense Associative Memories
2026-05-11 • Machine Learning
AI summary
The authors study Dense Associative Memory (DenseAM), a type of AI model that differs from typical neural networks in that it shares weights across layers and uses sharply peaked activation functions. They point out that existing methods for transferring hyperparameters (settings that control learning) do not work well for DenseAMs. Their work develops new methods for transferring these hyperparameters from small DenseAM models to large ones, and they show that the theoretical predictions match experiments closely.
Tags: Dense Associative Memory, hyperparameters, hyperparameter transfer, neural networks, activation functions, weight sharing, energy landscape, model scaling
Authors
Roi Holtzman, Dmitry Krotov, Boris Hanin
Abstract
Dense Associative Memory (DenseAM) is a promising family of AI architectures represented by a neural network that performs temporal dynamics on an energy landscape. While hyperparameter transfer methods are well studied for feed-forward networks, they have not been developed for settings in which weights are shared across layers and within each layer, as is common in DenseAMs. Additionally, DenseAMs use rapidly peaking activation functions that are rarely found in feed-forward architectures. The confluence of these aspects makes DenseAM a challenging setting for existing hyperparameter transfer methods. Our work initiates the development of hyperparameter transfer methods for this class of models. We derive explicit prescriptions for how hyperparameters tuned on small models can be transferred to models trained at scale, and we demonstrate excellent agreement between these theoretical findings and empirical results.
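What a DenseAM computes can be made concrete with a small sketch. The snippet below implements one well-known formulation, the continuous log-sum-exp energy and its softmax fixed-point update (as in modern Hopfield networks, Ramsauer et al. 2020); the paper's exact architecture, and the inverse temperature `beta` used here, are illustrative assumptions, not the authors' setup.

```python
import numpy as np

def denseam_energy(x, memories, beta):
    """Log-sum-exp DenseAM energy: E(x) = -(1/beta) * logsumexp(beta * Xi @ x) + 0.5 ||x||^2.

    memories: (num_patterns, dim) array of stored patterns Xi.
    The exponential inside the logsumexp is the 'rapidly peaking' activation:
    as beta grows, the landscape develops one sharp basin per stored pattern.
    """
    scores = beta * memories @ x
    lse = scores.max() + np.log(np.exp(scores - scores.max()).sum())
    return -lse / beta + 0.5 * x @ x

def denseam_update(x, memories, beta):
    """One retrieval step: x <- Xi^T softmax(beta * Xi @ x).

    This fixed-point iteration descends the energy above. The same weights
    (the patterns Xi) are reused at every step, which is the weight sharing
    across 'layers' mentioned in the abstract.
    """
    scores = beta * memories @ x
    scores -= scores.max()                  # shift for numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()
    return memories.T @ attn

rng = np.random.default_rng(0)
memories = rng.standard_normal((16, 64))             # 16 stored patterns in R^64
query = memories[3] + 0.5 * rng.standard_normal(64)  # noisy probe of pattern 3

x = query
for _ in range(5):
    x = denseam_update(x, memories, beta=4.0)

print("recovered pattern 3:", np.allclose(x, memories[3], atol=1e-2))
```

With a large `beta` the softmax is nearly one-hot, so a single step snaps the state onto the closest stored pattern; this sharp, winner-take-all behavior is exactly what makes naive hyperparameter scaling rules from feed-forward networks break down.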
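For context on what "transferring hyperparameters to models trained at scale" means, here is a hypothetical illustration in the spirit of the muP/muTransfer rules for feed-forward networks (Yang et al.); it is not the paper's prescription for DenseAMs, which the authors derive specifically for the shared-weight, peaked-activation setting.

```python
def transfer_lr(lr_base: float, width_base: int, width_target: int) -> float:
    """Scale a hidden-layer Adam learning rate as 1/width, the standard
    muP rule for feed-forward networks (illustrative only)."""
    return lr_base * width_base / width_target

lr_small = 3e-3                            # tuned cheaply on a small proxy model
print(transfer_lr(lr_small, 128, 4096))    # ~9.4e-5 for the wide model
```

The point of such rules is that a hyperparameter sweep done once on a small, cheap model remains near-optimal at large width; the paper's contribution is deriving the analogous prescriptions for DenseAMs.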