Sustaining Exascale Performance: Lessons from HPL and HPL-MxP on Aurora

2026-04-10 • Distributed, Parallel, and Cluster Computing


AI summary

The authors describe their experience running the HPL and HPL-MxP benchmarks on Aurora, an exascale supercomputer built from Intel CPUs and GPUs. These benchmarks time the solution of very large dense linear systems. Across three campaigns, the authors raised double-precision (FP64) HPL performance from about 0.6 to just over 1 exaflop per second. With HPL-MxP, which mixes lower and higher precisions and uses Intel AMX hardware acceleration, they reached more than 11 exaflops per second. They explain the system design choices that helped sustain these speeds, such as locality-aware resource mapping, overlapping CPU and GPU work, and a resilience strategy adopted after synchronization stalls appeared at scale. While some details are specific to Aurora, their insights may help other very large heterogeneous CPU-GPU systems.
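The headline numbers can be checked with a few lines of arithmetic. This sketch uses only the FP64 HPL and HPL-MxP figures given in the abstract. The variable names and the "start"/"end" labels are our own, and the per-node value is an average of the sustained benchmark rate, not a peak figure.

```python
# Reported FP64 HPL figures from the abstract: (label, rate in EF/s, node count).
fp64_runs = [("start", 0.585, 5_439), ("end", 1.01, 9_234)]
mxp_rate_ef = 11.64  # reported HPL-MxP rate, EF/s

EXA, TERA = 1e18, 1e12

# Average sustained FP64 throughput per node, in TF/s (not peak).
per_node_tf = {
    label: rate_ef * EXA / nodes / TERA for label, rate_ef, nodes in fp64_runs
}

(_, rate0, nodes0), (_, rate1, nodes1) = fp64_runs
node_ratio = nodes1 / nodes0   # growth in node count
rate_ratio = rate1 / rate0     # growth in FP64 HPL rate
speedup = mxp_rate_ef / rate1  # HPL-MxP rate over the final FP64 HPL rate

for label, tf in per_node_tf.items():
    print(f"{label}: {tf:.1f} TF/s per node")
print(f"nodes x{node_ratio:.2f}, FP64 rate x{rate_ratio:.2f}, MxP/FP64 x{speedup:.1f}")
```

The script prints about 107.6 and 109.4 TF/s per node. It also reports node-count and rate growth of about 1.70x and 1.73x, and an MxP/FP64 ratio of 11.5, which matches the stated speedup. Per-node FP64 throughput therefore held roughly steady, rising slightly, while the node count grew by about 70%.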

exascale computing, HPL, mixed-precision arithmetic, Intel GPUs, resource mapping, CPU-GPU pipelining, Slingshot-11 interconnect, fault tolerance, AMX acceleration, heterogeneous systems

Authors

Kazushige Goto, Huda Ibeid, Kalyan Kumaran, Servesh Muralidharan, Anthony-Trung Nguyen, Aditya Nishtala

Abstract

Sustaining exascale performance in production requires engineering choices and operational practices that emerge only under real deployment constraints and that demand coordination across system layers. This paper reports experience from three successive campaigns running HPL and HPL-MxP on Aurora, an Intel-based exascale system. Aurora features the first large-scale deployment of Intel discrete GPUs, CPU-attached network interfaces, and the largest production Slingshot-11 interconnect. In FP64 HPL, Aurora progressed from 0.585 EF/s on 5,439 nodes to 1.01 EF/s on 9,234 nodes. HPL-MxP reached 11.64 EF/s, an 11.5x speedup over FP64 enabled by mixed-precision arithmetic and Intel AMX acceleration. We identify the system-level choices that sustained these results at production scale and classify them by role. They include deterministic locality-aware resource mapping, explicit CPU-GPU pipelining, mixed-precision orchestration, and a hybrid P2P/collective resilience strategy introduced after synchronization stalls appeared at scale. While some observations are Aurora-specific, the broader lessons are likely to apply to tightly coupled heterogeneous systems at extreme scale.
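The mixed-precision speedup rests on a general idea: do the O(n^3) LU factorization in a cheap low precision, then recover FP64 accuracy with O(n^2) refinement steps. The NumPy sketch below illustrates that idea and is not Aurora's implementation. It makes several assumptions of its own:

- float32 stands in for the lower precisions, such as BF16 on AMX, used on real hardware.
- It uses classic iterative refinement rather than the GMRES-based refinement the benchmark's reference code uses, as we understand it.
- The test matrix is made strictly diagonally dominant, so LU without pivoting is stable.
- The function names `lu_nopivot`, `lu_solve` and `mixed_precision_solve` are hypothetical.

```python
import numpy as np

def lu_nopivot(a):
    """LU without pivoting; returns unit-lower L and U packed in one array."""
    lu = a.copy()
    n = lu.shape[0]
    for k in range(n - 1):
        lu[k + 1:, k] /= lu[k, k]  # multipliers (column k of L)
        lu[k + 1:, k + 1:] -= np.outer(lu[k + 1:, k], lu[k, k + 1:])  # trailing update
    return lu

def lu_solve(lu, b):
    """Forward then backward substitution, in the precision of the factors."""
    n = lu.shape[0]
    x = b.astype(lu.dtype)  # demote the right-hand side to the factor precision
    for i in range(n):  # L y = b (unit diagonal)
        x[i] -= lu[i, :i] @ x[:i]
    for i in range(n - 1, -1, -1):  # U x = y
        x[i] = (x[i] - lu[i, i + 1:] @ x[i + 1:]) / lu[i, i]
    return x

def mixed_precision_solve(a, b, tol=1e-12, max_iter=10):
    """Factor in float32, refine the solution with FP64 residuals."""
    lu = lu_nopivot(a.astype(np.float32))  # the O(n^3) work, in low precision
    x = lu_solve(lu, b).astype(np.float64)
    history = []  # normwise backward error before each correction
    a_norm, b_norm = np.linalg.norm(a, np.inf), np.linalg.norm(b, np.inf)
    for _ in range(max_iter):
        r = b - a @ x  # residual computed in FP64
        err = np.linalg.norm(r, np.inf) / (a_norm * np.linalg.norm(x, np.inf) + b_norm)
        history.append(err)
        if err <= tol:
            break
        x += lu_solve(lu, r).astype(np.float64)  # O(n^2) correction reusing the factors
    return x, history

rng = np.random.default_rng(0)
n = 200
a = rng.uniform(-0.5, 0.5, (n, n))
a += np.diag(np.abs(a).sum(axis=1) + 1.0)  # strictly diagonally dominant
b = rng.uniform(-0.5, 0.5, n)
x, history = mixed_precision_solve(a, b)
print([f"{e:.1e}" for e in history])
```

The first entry of `history` reflects float32-level accuracy. After a few cheap corrections, the backward error drops to FP64 level, and the factorization is never redone in FP64.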
