PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
2026-04-07 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary
The authors introduce the Polynomial Mixer (PoM), a token-mixing method that is simpler and faster than the self-attention used in transformers. PoM uses a learned polynomial function to combine all tokens into a compact summary, from which each token retrieves the context it needs. They prove that PoM satisfies the contextual mapping property, so transformers that use it remain universal sequence-to-sequence approximators, as they are with standard attention. Across five domains, including text and image generation, PoM matches the performance of attention-based models while using far less compute on long sequences.
transformers, self-attention, token mixing, polynomial function, sequence-to-sequence, computational complexity, text generation, image generation, contextual mapping
Authors
David Picard, Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Davide Allegro, Tom Ravaud, Yohann Perron, Corentin Sautier, Zeynep Sonat Baltaci, Fei Meng, Syrine Kalleli, Marta López-Rauhut, Thibaut Loiseau, Ségolène Albouy, Raphael Baena, Elliot Vincent, Loic Landrieu
Abstract
This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a compact representation through a learned polynomial function, from which each token retrieves contextual information. We prove that PoM satisfies the contextual mapping property, ensuring that transformers equipped with PoM remain universal sequence-to-sequence approximators. We replace standard self-attention with PoM across five diverse domains: text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation. PoM matches the performance of attention-based models while drastically reducing computational cost when working with long sequences. The code is available at https://github.com/davidpicard/pom.
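To make the idea of aggregating tokens into a compact representation and letting each token retrieve from it more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `polynomial_mixer`, the weight names `w_h`, `w_s`, `w_o`, the element-wise powers, the mean aggregation, and the sigmoid gate are all hypothetical choices made for this example. See the linked repository for the actual method.

```python
import numpy as np

def polynomial_mixer(x, w_h, w_s, w_o, degree=2):
    """Illustrative linear-time token mixer (hypothetical, not the paper's code).

    x:   (n, d) input tokens
    w_h: (d, m) projection applied before the polynomial features
    w_s: (d, m * degree) projection producing each token's retrieval gate
    w_o: (m * degree, d) output projection
    """
    h = x @ w_h                                        # (n, m)
    # Assumed polynomial features: element-wise powers 1..degree, concatenated.
    feats = np.concatenate([h ** p for p in range(1, degree + 1)], axis=-1)
    # Aggregate over tokens into one state whose size does not depend on n.
    state = feats.mean(axis=0)                         # (m * degree,)
    # Each token gates the shared state with its own sigmoid selector.
    gate = 1.0 / (1.0 + np.exp(-(x @ w_s)))            # (n, m * degree)
    return (gate * state) @ w_o                        # (n, d)

rng = np.random.default_rng(0)
n, d, m, degree = 8, 4, 3, 2
x = rng.normal(size=(n, d))
w_h = rng.normal(size=(d, m))
w_s = rng.normal(size=(d, m * degree))
w_o = rng.normal(size=(m * degree, d))
y = polynomial_mixer(x, w_h, w_s, w_o, degree)
```

In this sketch the shared state is computed once and reused by every token, so the cost grows linearly with the number of tokens `n`, in contrast to the pairwise token comparisons of self-attention.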