Integrated electro-optic attention nonlinearities for transformers

2026-04-10

Machine Learning
AI summary

The authors explore using special optical devices called thin-film lithium niobate modulators to perform the nonlinear calculations needed in Transformer models more quickly. These devices serve as analog versions of Softmax and Sigmoid functions, which are key steps in processing information. Even when using low-precision inputs, their approach keeps the models nearly as accurate as usual. They also analyze how noise affects the system at very high speeds. Their work shows that combining optical hardware with digital processors could make Transformers faster and more energy-efficient.

Transformers, Softmax function, Thin-film lithium niobate, Mach-Zehnder modulator, Analog computation, Vision Transformers, Large Language Models, Quantization, Inference latency, Electro-optic devices
Authors
Luis Mickeler, Kai Lion, Alfonso Nardi, Jost Kellner, Pierre Didier, Bhavin J. Shastri, Niao He, Rachel Grange
Abstract
Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision. At the core of these models lies the attention mechanism, which requires a nonlinear, non-negative mapping using the Softmax function. However, although Softmax operations account for less than 1% of the total operation count, they can disproportionately bottleneck overall inference latency. Here, we use thin-film lithium niobate (TFLN) Mach-Zehnder modulators (MZMs) as analog nonlinear computational elements to drastically reduce the latency of nonlinear computations. We implement electro-optic alternatives to digital Softmax and Sigmoid, and evaluate their performance in Vision Transformers and Large Language Models. Our system maintains highly competitive accuracy, even under aggressive 4-bit input-output quantization of the analog units. We further characterize system noise at encoding speeds up to 10 GBaud and assess model robustness under various noise conditions. Our findings suggest that TFLN modulators can serve as nonlinear function units within hybrid co-packaged hardware, enabling high-speed and energy-efficient nonlinear computation.
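A quadrature-biased Mach-Zehnder modulator has a sinusoidal intensity transfer, which over one monotonic half-period is S-shaped and bounded in [0, 1], much like a Sigmoid. The sketch below illustrates this idea numerically, including the aggressive 4-bit input-output quantization mentioned in the abstract. All specifics (the Vπ normalization, the logit-to-voltage mapping, and the uniform quantizer) are illustrative assumptions for this sketch, not the authors' device calibration or quantization scheme:

```python
import numpy as np

def mzm_transfer(v, v_pi=1.0):
    # Quadrature-biased Mach-Zehnder intensity response (illustrative):
    # T(V) = 0.5 * (1 + sin(pi * V / V_pi)), monotone on [-V_pi/2, V_pi/2]
    return 0.5 * (1.0 + np.sin(np.pi * v / v_pi))

def quantize(x, bits=4, lo=-1.0, hi=1.0):
    # Uniform mid-rise quantizer emulating low-precision DAC/ADC interfaces
    levels = 2 ** bits - 1
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

# Emulated analog Sigmoid path: 4-bit DAC -> MZM -> 4-bit ADC
x = np.linspace(-4.0, 4.0, 9)                       # pre-activation values
v = quantize(np.clip(x / 4.0, -0.5, 0.5),           # hypothetical logit-to-drive-voltage map
             bits=4, lo=-0.5, hi=0.5)
y = quantize(mzm_transfer(v), bits=4, lo=0.0, hi=1.0)
```

The key property the paper relies on is that the optical response itself provides the nonlinearity, so the digital side only needs the coarse encode/decode steps modeled here by `quantize`.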