Enhancing Energy Efficiency in Scientific Workflows through CFD based PIVAEs

2026-05-22 • Distributed, Parallel, and Cluster Computing

AI summary

The authors created a new AI-based system to help computers in high-performance computing (HPC) use less energy without slowing down too much. They combined physics simulations of cooling and heat with a special AI model to better understand and predict how different tasks use resources. By grouping similar tasks and trying different ways to schedule them, they found a balance where cutting CPU speed a bit saves significant energy with only a small delay. Their work shows how mixing physics knowledge with AI can improve scheduling in big computing systems to be more energy-efficient.

High Performance Computing (HPC), Energy Consumption, Scheduling Strategies, Computational Fluid Dynamics (CFD), Physics-Informed Variational Autoencoder (PIVAE), Workload Categorization, Locality Aware Scheduling, Speculative Aware Scheduling, Resource Utilization, Turnaround Time

Authors

Ali Zahir, Ashiq Anjum, Mark Wilkinson, Jeyan Thiyagalingam

Abstract

The growing complexity and scale of scientific workflows in high performance computing (HPC) environments make it difficult to manage energy consumption without compromising computational performance. Traditional scheduling strategies often fail to account for the interplay between thermal dynamics, workload diversity, and system scalability, leading to inefficient and unsustainable energy usage. This paper introduces a novel, scalable, AI-assisted scheduling framework that optimizes energy consumption in HPC environments while preserving performance. Central to our approach is the integration of Computational Fluid Dynamics (CFD) with a Physics-Informed Variational Autoencoder (PIVAE), which generates physically realistic synthetic workload data and thereby bridges the gap between thermodynamic behavior and scheduler decision-making in complex, multi-scale HPC environments. We categorize workflows by their resource utilization profiles and evaluate multiple scheduling strategies, including Locality Aware Scheduling and Speculative Aware Scheduling. The workflows, ranging from event reconstruction to anomaly detection, represent diverse computational intensities. Our results show that modest reductions in CPU performance (e.g., to 15%) can yield substantial energy savings (up to 10%) with only minor turnaround time increases (approximately 5-6%), identifying an operational sweet spot. This work demonstrates how physics-informed generative modeling can enable adaptive, sustainable, and data-efficient scheduling for next-generation HPC infrastructures.
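
The reported trade-off can be read through a simple back-of-the-envelope relation. The sketch below is not the authors' model. It assumes only that energy equals mean power multiplied by turnaround time, and it takes the upper-bound energy saving (10%) together with the quoted turnaround increases (5% and 6%) to estimate the implied change in mean power draw. The function name `implied_power_ratio` is hypothetical.

```python
def implied_power_ratio(energy_saving: float, turnaround_increase: float) -> float:
    """Ratio of new to baseline mean power.

    Assumption (not from the paper): energy = mean power * turnaround time,
    so power_ratio = energy_ratio / time_ratio.
    """
    energy_ratio = 1.0 - energy_saving        # e.g. 0.90 for a 10% saving
    time_ratio = 1.0 + turnaround_increase    # e.g. 1.05 for a 5% slowdown
    return energy_ratio / time_ratio


# Figures quoted in the abstract: up to 10% energy saving, 5-6% longer turnaround.
for increase in (0.05, 0.06):
    ratio = implied_power_ratio(0.10, increase)
    print(f"turnaround +{increase:.0%}: mean power at {ratio:.1%} of baseline")
```

Under this assumption, a 10% energy saving with a 5-6% longer turnaround corresponds to mean power of roughly 85% of the baseline. Because the abstract states the saving as an upper bound, this is an illustrative limit rather than a reported result.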
