Length Generalization with Log-Depth Recurrent Units

2026-05-25 • Machine Learning

AI summary

The authors address a known problem: neural networks struggle with inputs longer than those seen during training. They introduce MLP-LDRU, a Log-Depth Recurrent Unit that approximates recurrence through parallel reduction, with the aim of generalizing better to longer sequences. On 21 regular-language tasks, it reaches 100% out-of-distribution accuracy on 18 and at least 99.9% on the remaining 3 when the maximum training length is increased, outperforming comparable recurrent and attention-based models. It also performs competitively on ListOps and NLP classification benchmarks.

length generalization, recurrent neural networks, transformers, regular languages, MLP-LDRU, parallel reduction, associativity, ListOps, NLP classification, out-of-distribution accuracy

Authors

Charles Pert, Dalal Alrajeh, Alessandra Russo

Abstract

Length generalization remains a persistent challenge for neural networks: recurrent models tend to suffer from positional biases, while transformers are constrained by fixed computational depth. Regular languages provide a frequently used testbed for evaluating length generalization, as label prediction can be checked for any sequence length. We propose MLP-LDRU, a type of Log-Depth Recurrent Unit, which captures a class of associativity-biased operators designed to approximate recurrence through parallel reduction. We evaluate MLP-LDRU on 21 regular-language tasks, consisting of standard benchmarks and new prefix languages, where it achieves 100% out-of-distribution accuracy on 18 tasks and at least 99.9% on the remaining 3 when the maximum training length is increased, outperforming comparable recurrent and attention-based models. We further evaluate MLP-LDRU beyond regular languages on ListOps and NLP classification benchmarks, where it performs competitively.
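To illustrate the general idea of replacing a left-to-right recurrence with a log-depth parallel reduction, the sketch below recognizes a simple regular language (binary strings with an even number of 1s) by composing per-symbol transition functions pairwise in a balanced tree. This is not the MLP-LDRU model: the operator here is hand-written and exactly associative, whereas the abstract describes learned, associativity-biased operators. The names `TRANSITIONS`, `compose`, `tree_reduce`, `accepts_parallel`, and `accepts_sequential` are hypothetical and chosen only for this example.

```python
# Assumed toy DFA (not from the paper): states 0 = even (accepting), 1 = odd.
# Each symbol maps to a transition function stored as a tuple t,
# where t[s] is the state reached from state s.
TRANSITIONS = {"0": (0, 1), "1": (1, 0)}
IDENTITY = (0, 1)


def compose(f, g):
    """Return the function 'apply f, then g'. Function composition is associative."""
    return tuple(g[s] for s in f)


def tree_reduce(items):
    """Combine adjacent pairs level by level; return (result, number of levels)."""
    if not items:
        return IDENTITY, 0
    level = list(items)
    depth = 0
    while len(level) > 1:
        nxt = [compose(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # carry the unpaired last element, preserving order
        level = nxt
        depth += 1
    return level[0], depth


def accepts_parallel(s):
    """Accept iff the composed transition maps the start state 0 back to state 0."""
    f, depth = tree_reduce([TRANSITIONS[c] for c in s])
    return f[0] == 0, depth


def accepts_sequential(s):
    """Reference left-to-right recurrence, with depth equal to len(s)."""
    state = 0
    for c in s:
        state = TRANSITIONS[c][state]
    return state == 0
```

Because `compose` is associative, the tree reduction agrees with the sequential recurrence for every input length, while the number of combination levels grows only logarithmically with the sequence length. The labels can also be checked exactly at any length, which is the property that makes regular languages a convenient testbed.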
