CHIMERA: A Flexible and Scalable 3.1 TOPS/W AI-MCU with Transformer Accelerator and 563 Gb/s Shared-L2 Memory Subsystem with QoS Guarantees

2026-06-01 | Hardware Architecture

AI summary

The authors present Chimera, a low-power microcontroller designed to run transformer-based AI models, such as those used in language and vision tasks, in real time on small devices. It tightly couples a transformer accelerator with nine general-purpose RISC-V (RV32IMA) cores in a single compute cluster. A new shared L2 memory subsystem lets multiple clusters share data at 563 Gb/s aggregate bandwidth and enforces quality-of-service guarantees, cutting latency for latency-critical traffic by up to 16x. Chimera reaches 3.1 TOPS/W and 281 GOPS/mm², which is 1.37x higher energy efficiency and up to 100x higher area efficiency than state-of-the-art SoCs. Against standalone accelerators, it offers comparable energy efficiency and up to 1.8x higher area efficiency.

Microcontroller Unit (MCU), Transformer models, Real-time inference, Low-power edge computing, 22 nm FDX technology, RV32IMA cores, Memory hierarchy, L2 memory subsystem, Quality of Service (QoS), Energy efficiency
Authors
Lorenzo Leone, Philip Wiese, Gamze İslamoğlu, Michael Rogenmoser, Davide Rossi, Francesco Conti, Luca Benini
Abstract
We present Chimera, a flexible and scalable Microcontroller Unit (MCU) designed to accelerate real-time inference of rapidly evolving transformer-based models at the ultra-low-power edge (hundreds of mW). The chip, implemented in 22 nm FDX technology, integrates a transformer accelerator tightly coupled within a compute cluster featuring nine general-purpose RV32IMA cores. Scalability extends to the memory hierarchy through a novel L2 memory island subsystem, which enables data sharing across multiple clusters while delivering 563 Gb/s aggregate bandwidth. The L2 subsystem enforces quality-of-service guarantees for latency-critical traffic, achieving up to 16x latency reduction. Chimera achieves peak energy and area efficiencies of 3.1 TOPS/W and 281 GOPS/mm², demonstrating 1.37x higher energy efficiency and up to 100x higher area efficiency compared to state-of-the-art (SoA) SoCs. Compared to SoA standalone accelerators, Chimera achieves comparable energy efficiency and up to 1.8x higher area efficiency.
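
To put the headline figures in more familiar units, the short Python sketch below converts them using plain arithmetic. It is an illustration, not part of the paper. The function names are hypothetical, "Gb/s" is assumed to mean gigabits per second, and the implied baseline assumes the 1.37x ratio is taken against the 3.1 TOPS/W peak figure, which the abstract does not state explicitly.

```python
# Back-of-the-envelope conversions for the figures quoted in the abstract.
# All helper names are hypothetical; none come from the paper.

def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


def energy_per_op_pj(tops_per_watt: float) -> float:
    """Energy per operation in picojoules.

    1 TOPS/W equals 1e12 operations per joule, so one operation costs
    1e-12 J = 1 pJ; the energy per operation is the reciprocal of TOPS/W.
    """
    return 1.0 / tops_per_watt


def implied_baseline(value: float, ratio: float) -> float:
    """Baseline figure implied by an 'N times higher' claim (assumed same metric)."""
    return value / ratio


l2_bandwidth = gbps_to_gigabytes_per_s(563.0)  # aggregate L2 bandwidth in GB/s
peak_energy = energy_per_op_pj(3.1)            # pJ per operation at peak efficiency
soc_baseline = implied_baseline(3.1, 1.37)     # assumed SoA SoC efficiency in TOPS/W

print(f"L2 aggregate bandwidth: {l2_bandwidth:.3f} GB/s")
print(f"Energy per operation at peak: {peak_energy:.3f} pJ")
print(f"Implied SoA SoC baseline: {soc_baseline:.2f} TOPS/W")
```

Under these assumptions, the 563 Gb/s bandwidth corresponds to roughly 70 GB/s, and the peak efficiency corresponds to about a third of a picojoule per operation.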