Prediction of Runtime Parameters of Parallel Chemistry Applications via Active and Generative Learning

2026-06-15

Machine Learning
AI summary

The authors created two machine learning approaches to predict how long complex chemistry simulations will take when run in parallel on many computers. The approaches combine active learning and generative learning with a type of model called gradient boosted regression trees. When tested on a chemistry calculation called Coupled-Cluster with Singles and Doubles, the models reached a mean absolute percentage error (MAPE) as low as 0.023 and a coefficient of determination as high as 99.9%. With active learning, models trained on only 20-25% of the original dataset still reached a MAPE of about 0.2. This makes runtime prediction more practical when little training data is available.

Machine Learning, Active Learning, Generative Learning, Gradient Boosted Regression Trees, Parallel Computing, Coupled-Cluster, Runtime Prediction, Regression, Error Metrics, High Performance Computing
Authors
Tanzila Tabassum, Omer Subasi, Ajay Panyala, Epiya Ebiapia, Gerald Baumgartner, Erdal Mutlu, P Sadayappan, Karol Kowalski
Abstract
In this work, we develop two main machine-learning-based approaches to predict the runtime parameters of highly scalable parallel chemistry computations. These approaches employ active and generative learning together with gradient boosted regression tree models, which were selected empirically from a rich suite of machine learning models. When evaluated on Coupled-Cluster with Singles and Doubles (CCSD) computations, our models achieve a mean absolute percentage error (MAPE) as low as 0.023 and a coefficient of determination as high as 99.9%. Furthermore, when combined with active learning to mitigate the lack of large amounts of training data, our models achieve a MAPE of about 0.2 with 20-25% of the original dataset.
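For readers unfamiliar with the two reported metrics, the following minimal sketch shows how MAPE and the coefficient of determination are conventionally computed. It is not the authors' evaluation code. It assumes the MAPE values are reported as fractions (so 0.023 would correspond to 2.3%), which the abstract does not state explicitly, and the runtimes below are made-up illustrative numbers, not results from the paper.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (0.05 means 5%)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the measured value, so
    # measured values must be non-zero (true for wall-clock runtimes).
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted runtimes in seconds (illustrative only).
measured = [100.0, 200.0, 400.0, 800.0]
predicted = [102.0, 196.0, 408.0, 784.0]

print(f"MAPE = {mape(measured, predicted):.3f}")
print(f"R^2  = {r2_score(measured, predicted):.4f}")
```

In this toy data every prediction is off by exactly 2% of the measured value, so the MAPE is 0.02 regardless of how long each run takes. This scale independence is one reason a relative metric suits runtimes that span several orders of magnitude.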