AI summary
The authors present FBench, a benchmarking tool for studying how different input/output (I/O) configurations affect the performance of large-scale computing applications without rerunning the full application each time. FBench uses grammars derived from recorded I/O traces either to generate simplified benchmark configurations or to replay I/O patterns directly, and it supports the POSIX and MPI-IO interfaces. The evaluation shows that FBench accurately reproduces real I/O behavior and exposes performance differences caused by file system choices and optimization settings. For example, for FLASH Sedov on Lustre, collective I/O yielded up to 30x lower write bandwidth than independent I/O, and switching to a burst buffer file system raised non-collective write bandwidth by about 1.5x without additional tuning. FBench also shortens what-if analyses of new I/O configurations; in the LAMMPS evaluation, simple tuning enabled improvements of up to 8x.
I/O performance, High-Performance Computing (HPC), FBench, Context-Free Grammars (CFGs), POSIX, MPI-IO, Lustre file system, Collective I/O, Burst buffer, Benchmarking
Authors
Zhaobin Zhu, Chen Wang, Kathryn Mohror, Sarah Neuwirth
Abstract
The I/O performance of large-scale HPC applications depends on a complex interplay of access patterns, middleware optimizations, and file system configurations. To systematically explore these effects without repeatedly rerunning full applications, we introduce FBench, a flexible and code-transparent benchmarking tool for what-if analysis and I/O performance exploration. FBench leverages context-free grammars (CFGs) derived from Recorder traces to either generate simplified global configuration files for benchmark execution or replay I/O patterns on-the-fly without additional preprocessing. It supports both POSIX and MPI-IO interfaces and allows users to inject optimization hints via JSON configuration files, enabling rapid experimentation with I/O settings without code changes. Our evaluation shows that FBench accurately reproduces I/O behavior for both synthetic and real workloads, capturing access patterns and performance trends across diverse optimizations and file system settings. For IOR and HACC-IO, FBench closely matches scaling behavior and sensitivity to Lustre striping parameters. For FLASH Sedov, it reveals that collective I/O on Lustre can yield up to 30x lower write bandwidth than independent I/O, largely independent of striping, and that switching to a burst buffer file system increases non-collective write bandwidth by about 1.5x without additional tuning. The evaluation with LAMMPS shows that FBench can significantly reduce the time required for what-if analyses and, with simple tuning, enable improvements of up to 8x.
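To make the idea of injecting optimization hints through a JSON file concrete, the following is a minimal sketch of how such a file could be parsed and expanded into a small what-if sweep. The JSON schema (the `interface` and `hints` keys) and the helper names `load_hints` and `what_if_variants` are assumptions made for illustration and are not FBench's documented format; the hint names `romio_cb_write`, `striping_factor`, and `striping_unit` are standard MPI-IO/ROMIO hints.

```python
import json

# Hypothetical hint file: this schema is an assumption for illustration,
# not FBench's documented format. The hint names themselves are standard
# MPI-IO/ROMIO hints.
CONFIG_TEXT = """
{
  "interface": "MPI-IO",
  "hints": {
    "romio_cb_write": "disable",
    "striping_factor": 8,
    "striping_unit": 1048576
  }
}
"""

ALLOWED_INTERFACES = {"POSIX", "MPI-IO"}


def load_hints(text):
    """Parse the JSON text and return (interface, hints) with string values."""
    config = json.loads(text)
    interface = config.get("interface", "POSIX")
    if interface not in ALLOWED_INTERFACES:
        raise ValueError(f"unsupported interface: {interface}")
    # MPI info objects store keys and values as strings, so normalize here.
    hints = {str(k): str(v) for k, v in config.get("hints", {}).items()}
    return interface, hints


def what_if_variants(hints, key, values):
    """Yield one hint dictionary per candidate value of a single hint."""
    for value in values:
        variant = dict(hints)  # copy so the baseline stays unchanged
        variant[key] = str(value)
        yield variant


interface, hints = load_hints(CONFIG_TEXT)
variants = list(what_if_variants(hints, "striping_factor", [1, 4, 16]))
for variant in variants:
    print(interface, variant)
```

In an MPI program, each variant could then be copied into an info object (for example with mpi4py's `MPI.Info.Create()` and `Info.Set(key, value)`) and passed when opening the file, so that only the configuration changes between runs and the application code does not.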