Implementing True MPI Sessions and Evaluating MPI Initialization Scalability

2026-05-05 | Distributed, Parallel, and Cluster Computing

AI summary

The authors explain that MPI Sessions are a new way to organize processes in parallel computing, designed to overcome limits of the old MPI_COMM_WORLD method. Existing systems like MPICH struggled to fully adopt Sessions because they were deeply built around MPI_COMM_WORLD. The authors describe how they restructured MPICH to support true MPI Sessions, which better separate the system from MPI_COMM_WORLD. Their work shows that this new design improves scalability, especially for very large computing systems.

MPI, MPI_COMM_WORLD, MPI Sessions, MPICH, parallel computing, communicators, scalability, exascale systems, process sets, hierarchical design
Authors
Hui Zhou, Kenneth Raffenetti, Yanfei Guo, Michael Wilkins, Rajeev Thakur
Abstract
Sessions is one of the major features introduced in the MPI-4 standard. It offers an alternative to the traditional world communicator model by allowing applications to construct communicators from process sets, thereby eliminating the dependency on MPI_COMM_WORLD. The Sessions model was proposed as a more scalable solution for exascale systems, where MPI_COMM_WORLD was viewed as a potential scalability bottleneck. However, supporting Sessions is a significant challenge for established codebases like MPICH because the world model is deeply integrated into traditional MPI implementations. Although MPICH added support for the MPI-4 standard upon its release, it still relied internally on a global world communicator. This approach enabled applications written using the Sessions model to function, but it did not fulfill the full design intent of Sessions, which was to decouple MPI from MPI_COMM_WORLD. We describe the MPICH effort to support true MPI Sessions, including a major internal refactoring. We describe the architectural changes required to support true Sessions and evaluate the scalability of the resulting implementation. Our results demonstrate that true Sessions can offer significant scalability benefits by adopting explicit hierarchical designs.