Training-Free Looped Transformers

2026-05-22 • Machine Learning

AI summary

The authors propose improving transformer models by reusing a middle block of layers multiple times during inference, with no additional training. Rather than modifying or retraining the model, they wrap a frozen block and apply it repeatedly in a controlled manner inspired by numerical methods for solving differential equations. They report gains across multiple model families and tasks, such as question answering. The method treats repeated processing as a series of smaller, damped updates rather than as blind repetition of the same full step.

Keywords: transformers, inference, pretrained models, looped transformers, fine-tuning, ODE approximation, forward Euler method, model performance, Mixture of Experts (MoE), question answering

Authors

Lizhang Chen, Jonathan Li, Chen Liang, Ni Lao, Qiang Liu

Abstract

We introduce training-free looped transformers, in which a lightweight inference-time wrapper loops a contiguous mid-stack block of layers of a frozen checkpoint without additional fine-tuning, continued training, or architectural changes. Unlike prior looped transformer methods that train with the looped structure end-to-end, we retrofit recurrence onto pretrained models at test time. We show that naive block reapplication usually degrades performance, highlighting the importance of the loop application strategy. Motivated by viewing a pre-norm transformer block as a forward Euler step on an ODE, we instead treat looping as a refinement of the same approximation, replacing one large update with smaller damped sub-steps. Across seven dense, sparse MoE, and MLA+MoE model families, our method improves Qwen3-4B-Instruct by +2.64 pp on MMLU-Pro, Qwen3-30B-A3B-Instruct by +1.14 pp on CommonsenseQA, and Moonlight-16B-A3B-Instruct by +1.20 pp on OpenBookQA.
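
The contrast between naive reapplication and damped sub-steps can be illustrated with a toy sketch. This is not the authors' implementation: the scalar residual branch `residual_branch`, the decay rate `a = 0.5`, and the uniform `1/k` step size are all assumptions chosen for illustration, and the abstract does not specify the actual damping schedule. The sketch treats a residual update `x + f(x)` as one forward Euler step of `dx/dt = f(x)` over unit time, then compares looping the full step `k` times against taking `k` sub-steps of size `1/k`.

```python
import math

A = 0.5  # hypothetical decay rate for the toy residual branch


def residual_branch(x):
    """Toy stand-in for the residual branch f(x) of a block: f(x) = -A * x."""
    return [-A * xi for xi in x]


def block(x, step=1.0):
    """One residual update x + step * f(x); step=1.0 mimics the ordinary block."""
    fx = residual_branch(x)
    return [xi + step * fi for xi, fi in zip(x, fx)]


def naive_loop(x, k):
    """Reapply the full-size update k times (integrates k time units, not 1)."""
    for _ in range(k):
        x = block(x, step=1.0)
    return x


def damped_loop(x, k):
    """Replace one large update with k sub-steps of size 1/k (assumed schedule)."""
    for _ in range(k):
        x = block(x, step=1.0 / k)
    return x


def max_error(y, reference):
    """Largest absolute componentwise deviation from the reference."""
    return max(abs(a - b) for a, b in zip(y, reference))


x0 = [1.0, -2.0]
# Exact solution of dx/dt = -A * x at t = 1, the target all variants approximate.
exact = [xi * math.exp(-A) for xi in x0]

single = block(x0)            # the unmodified block: one Euler step
naive = naive_loop(x0, 4)     # naive block reapplication
damped = damped_loop(x0, 4)   # finer approximation of the same unit-time update

print("single:", max_error(single, exact))
print("naive: ", max_error(naive, exact))
print("damped:", max_error(damped, exact))
```

In this toy setting, naive looping overshoots because it integrates four time units instead of one, whereas the damped loop stays on the same unit-time target and approximates it more closely than the single step. Whether the same ordering holds for a real transformer block is an empirical question that the paper, not this sketch, addresses.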
