Embedded Arena: Iterative Optimization via Hardware Feedback

2026-06-15 • Hardware Architecture

Hardware Architecture • Artificial Intelligence

AI summary

The authors explore using a large language model (LLM) agent to automatically optimize AI models for tiny devices like wildlife trackers and wearable sensors, which have strict limits on memory, power, and temperature. Their method is a 'hardware-in-the-loop' system in which the agent compiles, flashes, and measures the model and firmware on real devices, then uses that feedback to improve both iteratively. Without hardware feedback, frontier models such as Claude Opus 4.7 and Gemini 3.1 Pro failed to deploy at all (0% deployment success); with it, the agent reached its first successful deployment within three iterations and could surpass human expert results within seven. The approach compressed vision models by 250x with less than 3.3% accuracy loss and audio models by 400x with less than 6% Feature Error Rate (FER) loss, enabling battery-free, solar-powered operation on a commercial MCU. They demonstrated it in two real-world systems: an elk-detection camera trap (96.7% accuracy) and a phonetic-transcription wearable (8.44% FER) for child development research.

Embedded AI • Microcontrollers (MCUs) • Hardware-in-the-loop • Model compression • Energy efficiency • Large language models (LLMs) • Edge computing • Firmware optimization • Vision models • Feature Error Rate

Authors

Zhihan Zhang, Alexander Le Metzger, Jiuyang Lyu, Chun-Cheng Chang, Jiayi Shao, Yujia Liu, Emmanuel Azuh Mensah, Edward Wang, Kurtis Heimerl, Gregory D. Abowd, Shwetak Patel, Natasha Jaques, Vikram Iyer

Abstract

Embedded devices from wildlife monitoring stations to clinical wearables require local AI inference due to latency, communication, or privacy constraints. Optimizing models for heterogeneous microcontrollers (MCUs) requires simultaneously satisfying hard physical constraints on memory, power, and temperature while preserving accuracy, a multidimensional optimization that is today performed manually by experts. We ask whether an LLM agent can autonomously navigate this complex, multi-turn pipeline guided by real hardware feedback, and introduce a hardware-in-the-loop agent arena in which the agent iteratively refines both model and firmware -- compiling, flashing, and measuring on real hardware -- to enable closed-loop optimization. Frontier models, including Claude Opus 4.7 and Gemini 3.1 Pro, fail entirely without hardware feedback (0% deployment success), whereas our hardware-in-the-loop formulation achieves the first successful deployment within three iterations and can surpass human expert results within seven. This agentic co-optimization achieves 250x compression for vision models with <3.3% accuracy loss and 400x for audio with <6% Feature Error Rate loss, enabling battery-free operation on a commercial MCU via solar harvesting. We demonstrate practical impact in two real-world systems: an elk-detection camera trap (96.7% accuracy) and a phonetic-transcription wearable (8.44% FER) for child development research.
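The closed loop described in the abstract (propose a candidate, measure it on hardware, check the hard constraints, refine) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the `Candidate` fields, the limit values, `simulated_board` (a toy stand-in for compiling, flashing, and measuring on a real MCU), and `rule_based_proposer` (a toy stand-in for the LLM agent). All numbers are invented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    width: float  # hypothetical channel-width multiplier
    bits: int     # hypothetical weight precision in bits


@dataclass(frozen=True)
class Measurement:
    memory_kb: float
    power_mw: float
    temp_c: float
    accuracy: float


@dataclass(frozen=True)
class Limits:
    memory_kb: float
    power_mw: float
    temp_c: float


def violations(m: Measurement, lim: Limits) -> List[str]:
    """Return the names of the hard constraints the measurement breaks."""
    broken = []
    if m.memory_kb > lim.memory_kb:
        broken.append("memory")
    if m.power_mw > lim.power_mw:
        broken.append("power")
    if m.temp_c > lim.temp_c:
        broken.append("temperature")
    return broken


def simulated_board(c: Candidate) -> Measurement:
    """Toy stand-in for compile + flash + measure; formulas are invented."""
    memory_kb = 4096.0 * c.width * c.width * c.bits / 32
    power_mw = 5.0 + 200.0 * c.width
    temp_c = 25.0 + 0.1 * power_mw
    accuracy = 0.99 - 0.05 * (1.0 - c.width) - 0.002 * (32 - c.bits)
    return Measurement(memory_kb, power_mw, temp_c, accuracy)


def rule_based_proposer(c: Candidate, broken: List[str]) -> Candidate:
    """Toy stand-in for the agent: react to the reported violations."""
    if "memory" in broken and c.bits > 8:
        return Candidate(c.width, c.bits // 2)  # lower precision first
    return Candidate(c.width / 2, c.bits)       # otherwise shrink the model


def optimize(
    start: Candidate,
    lim: Limits,
    measure: Callable[[Candidate], Measurement],
    propose: Callable[[Candidate, List[str]], Candidate],
    max_iters: int,
) -> Tuple[Optional[Candidate], List[Tuple[Candidate, Measurement]]]:
    """Iterate until a candidate meets every hard limit or the budget ends."""
    history = []
    cand = start
    for _ in range(max_iters):
        m = measure(cand)                # feedback from the (simulated) device
        history.append((cand, m))
        broken = violations(m, lim)
        if not broken:
            return cand, history         # deployable: all constraints satisfied
        cand = propose(cand, broken)     # refine using the measured violations
    return None, history                 # no deployable candidate within budget


if __name__ == "__main__":
    best, history = optimize(
        Candidate(width=1.0, bits=32),
        Limits(memory_kb=256.0, power_mw=60.0, temp_c=40.0),
        simulated_board,
        rule_based_proposer,
        max_iters=10,
    )
    for cand, m in history:
        print(cand, violations(m, Limits(256.0, 60.0, 40.0)))
    print("deployable:", best)
```

The point of the sketch is the control flow: the proposer only ever sees constraint violations that were actually measured, so a candidate that overflows memory or draws too much power is corrected on the next iteration instead of being accepted on the basis of an estimate. In a real setup, `measure` would wrap the toolchain and instrumentation for the target board, and `propose` would be a call to the agent.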
