Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?
2026-03-26 • Artificial Intelligence
Artificial Intelligence · Hardware Architecture · Machine Learning
AI summary
The authors study how far general-purpose coding agents, without any hardware-specific training, can optimize hardware designs produced from high-level code. Their two-stage system first decomposes a design into smaller sub-kernels and optimizes each one independently, then launches multiple expert agents that search for better overall designs by recombining and refining those optimized parts. Across a range of benchmarks, the system delivers large speedups, in some cases more than 20 times over the baseline, and rediscovers well-known hardware optimization techniques on its own. The study shows that scaling up the number of agents is a practical way to improve automated hardware design optimization.
hardware design · high-level synthesis (HLS) · optimization agents · Integer Linear Programming (ILP) · pragma directives · loop fusion · memory restructuring · code transformations · agent scaling · benchmarking
Authors
Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava
Abstract
We present an empirical study of how far general-purpose coding agents -- without hardware-specific training -- can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a two-stage pipeline that constructs and coordinates multiple autonomous optimization agents. In Stage~1, the pipeline decomposes a design into sub-kernels, independently optimizes each using pragma and code-level transformations, and formulates an Integer Linear Program (ILP) to assemble globally promising configurations under an area constraint. In Stage~2, it launches $N$ expert agents over the top ILP solutions, each exploring cross-function optimizations such as pragma recombination, loop fusion, and memory restructuring that are not captured by sub-kernel decomposition. We evaluate the approach on 12 kernels from HLS-Eval and Rodinia-HLS using Claude Code (Opus~4.5/4.6) with AMD Vitis HLS. Scaling from 1 to 10 agents yields a mean $8.27\times$ speedup over baseline, with larger gains on harder benchmarks: streamcluster exceeds $20\times$ and kmeans reaches approximately $10\times$. Across benchmarks, agents consistently rediscover known hardware optimization patterns without domain-specific training, and the best designs often do not originate from top-ranked ILP candidates, indicating that global optimization exposes improvements missed by sub-kernel search. These results establish agent scaling as a practical and effective axis for HLS optimization.
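The Stage-1 assembly step described in the abstract can be viewed as a small combinatorial selection problem: pick one optimized configuration per sub-kernel so that total latency is minimized while total area stays under a budget. The sketch below illustrates that formulation with an exhaustive search over hypothetical sub-kernel names, latencies, and areas (none taken from the paper); the paper itself solves the real instance as an Integer Linear Program, which scales to much larger design spaces.

```python
from itertools import product

# Illustrative sub-kernel configurations as (latency_cycles, area_units).
# All names and numbers are hypothetical placeholders.
configs = {
    "sub_kernel_a": [(1000, 200), (400, 650), (250, 1200)],
    "sub_kernel_b": [(800, 150), (300, 900)],
}
AREA_BUDGET = 1500

def assemble(configs, budget):
    """Choose one configuration per sub-kernel, minimizing summed
    latency subject to the total-area constraint. Exhaustive here
    for clarity; an ILP solver handles realistic instance sizes."""
    names = list(configs)
    best = None
    for choice in product(*(configs[n] for n in names)):
        latency = sum(c[0] for c in choice)
        area = sum(c[1] for c in choice)
        if area <= budget and (best is None or latency < best[0]):
            best = (latency, area, dict(zip(names, choice)))
    return best

latency, area, chosen = assemble(configs, AREA_BUDGET)
print(latency, area, chosen)
```

Note that the cheapest-latency combination overall may violate the area budget, which is exactly why a joint (global) formulation is needed rather than picking each sub-kernel's fastest variant independently.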