Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models

2026-04-09 | Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors propose a system in which several different frozen large language models act as nodes in a feedforward graph, passing information through a shared latent space. They train small linear projection layers that let these models communicate, improving accuracy on three question-answering benchmarks (ARC-Challenge, OpenBookQA, and MMLU) over any single constituent model. The approach adds only 17.6M trainable parameters against roughly 12B frozen ones. They also show that gradients can flow through the frozen models, enabling end-to-end training, and that the output node learns to route selectively across the second-layer models without explicit supervision.

large language models, latent space, frozen models, linear projection, backpropagation, cross-attention, multi-model integration, ARC-Challenge, OpenBookQA, MMLU
Authors
Marcus Armstrong, Navid Ayoobi, Arjun Mukherjee
Abstract
We present a feedforward graph architecture in which heterogeneous frozen large language models serve as computational nodes, communicating through a shared continuous latent space via learned linear projections. Building on recent work demonstrating geometric compatibility between independently trained LLM latent spaces \cite{armstrong2026thinking}, we extend this finding from static two-model steering to end-to-end trainable multi-node graphs, where projection matrices are optimized jointly via backpropagation through residual stream injection hooks. Three small frozen models (Llama-3.2-1B, Qwen2.5-1.5B, Gemma-2-2B) encode the input into a shared latent space whose aggregate signal is injected into two larger frozen models (Phi-3-mini, Mistral-7B), whose representations feed a lightweight cross-attention output node. With only 17.6M trainable parameters against approximately 12B frozen, the architecture achieves 87.3% on ARC-Challenge, 82.8% on OpenBookQA, and 67.2% on MMLU, outperforming the best single constituent model by 11.4, 6.2, and 1.2 percentage points respectively, and outperforming parameter-matched learned classifiers on frozen single models by 9.1, 5.2, and 6.7 points. Gradient flow through multiple frozen model boundaries is empirically verified to be tractable, and the output node develops selective routing behavior across layer-2 nodes without explicit supervision.
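The topology described in the abstract can be illustrated with a toy sketch in pure Python. This is not the authors' implementation: the frozen "models" are stand-in fixed tanh maps rather than LLMs, the hidden sizes are arbitrary, mean aggregation and additive injection are assumptions, the output node is reduced to a single learned query attending over the two layer-2 representations, and a central finite difference is swapped in for backpropagation. All names (`frozen_l1`, `params`, `forward`, and so on) are hypothetical. The point is only to show that a trainable projection upstream of a frozen node still receives a nonzero training signal.

```python
import math
import random

rng = random.Random(0)

def rand_matrix(rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def add(a, b):
    return [u + v for u, v in zip(a, b)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

D_IN, SHARED = 4, 3
L1_DIMS = [5, 6, 7]   # toy hidden sizes for the three small frozen nodes (assumed)
L2_DIMS = [8, 9]      # toy hidden sizes for the two larger frozen nodes (assumed)

# Frozen weights: created once and never updated.
frozen_l1 = [rand_matrix(d, D_IN) for d in L1_DIMS]
frozen_l2_embed = [rand_matrix(d, D_IN) for d in L2_DIMS]
frozen_l2_body = [rand_matrix(d, d) for d in L2_DIMS]

# Trainable parameters: linear projections plus the output node's query.
params = {
    "enc": [rand_matrix(SHARED, d) for d in L1_DIMS],  # layer-1 hidden -> shared space
    "inj": [rand_matrix(d, SHARED) for d in L2_DIMS],  # shared space -> layer-2 residual
    "out": [rand_matrix(SHARED, d) for d in L2_DIMS],  # layer-2 hidden -> shared space
    "query": [rng.uniform(-0.5, 0.5) for _ in range(SHARED)],
}

def forward(x, p):
    # Layer 1: each frozen node encodes the input; a projection maps it to the shared space.
    encoded = [matvec(P, [math.tanh(v) for v in matvec(W, x)])
               for W, P in zip(frozen_l1, p["enc"])]
    z = [sum(vals) / len(encoded) for vals in zip(*encoded)]  # aggregate by mean (assumed)
    # Layer 2: add the projected signal to the node's own state, then run the frozen body.
    keys = []
    for E, B, Q, R in zip(frozen_l2_embed, frozen_l2_body, p["inj"], p["out"]):
        residual = add(matvec(E, x), matvec(Q, z))
        hidden = [math.tanh(v) for v in matvec(B, residual)]
        keys.append(matvec(R, hidden))
    # Output node: a learned query attends over the two layer-2 representations.
    weights = softmax([sum(q * k for q, k in zip(p["query"], key)) for key in keys])
    pooled = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(SHARED)]
    return pooled, weights

def loss(x, target, p):
    pooled, _ = forward(x, p)
    return sum((a - b) ** 2 for a, b in zip(pooled, target))

def grad_wrt_first_encoder_entry(x, target, p, eps=1e-5):
    # Central finite difference on one layer-1 projection weight, restored afterwards.
    original = p["enc"][0][0][0]
    p["enc"][0][0][0] = original + eps
    up = loss(x, target, p)
    p["enc"][0][0][0] = original - eps
    down = loss(x, target, p)
    p["enc"][0][0][0] = original
    return (up - down) / (2 * eps)

x = [0.3, -0.2, 0.5, 0.1]
target = [1.0, 0.0, -1.0]
pooled, routing = forward(x, params)
grad = grad_wrt_first_encoder_entry(x, target, params)
```

Here `routing` holds the output node's attention weights over the two layer-2 nodes, and `grad` estimates how the loss responds to a layer-1 projection weight whose effect must pass through both frozen layer-2 bodies. In a real system the same signal would come from automatic differentiation with the model weights excluded from the optimizer.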