Leveraging Machine Learning Hardware in Time Integration Algorithms

Cody Balos | 23-FS-013

Project Overview

LLNL invests significantly in supercomputers and software to run simulations that crosscut our mission areas, so it is critical that the machines and codes use all available hardware capabilities efficiently and effectively to achieve a strong return on investment. The rise of machine learning (ML) poses a challenge to traditional simulations in this regard, as it is pushing industry to produce specialized hardware units for ML. For example, the NVIDIA graphics processing units (GPUs) that power Sierra at LLNL include Tensor Cores geared towards ML operations, and the Cerebras CS-2 "wafer-scale" processor (a single chip), recently attached to Lassen, is similarly targeted at ML workloads and can achieve nearly 1 petaflop of performance. In this work, we examined the feasibility of leveraging ML hardware in time evolution (integration) algorithms for ordinary differential equations, which are critical to many simulations relevant to stockpile stewardship, nuclear deterrence, and climate and energy resilience.

To mitigate the challenges associated with programming for various cutting-edge hardware targets, we took a predominantly analytic approach to understand how time integration methods can best utilize the unique memory architectures, low/mixed precision, and specialized parallelism associated with ML hardware, and to determine the most promising algorithms. For stiff problems where building an effective preconditioner is challenging, we developed new approaches that allow exponential time integration methods to use mixed precision while still reaching the target accuracy, improving performance by up to ~2x. We found that ML surrogate methods, while the most natural way to target ML hardware like the Cerebras, raise several complex issues, such as how to robustly compute derivatives without automatic differentiation support and how to organize communication between the high performance computing (HPC) cluster nodes and the ML hardware efficiently. We found spectral deferred correction (SDC) and waveform relaxation methods to be less promising than the approaches above.

Our results lead us to the conclusion that some time integration algorithms could leverage ML hardware to accelerate simulations. However, they also demonstrate that using ML hardware effectively depends strongly on other factors, such as the choice of spatial discretization when the ordinary differential equations (ODEs) arise from partial differential equations, as well as problem characteristics like stiffness, size, and the amount of coupling between variables in the ODE system. Overall, our results provide a foundation for future development of advanced time integration methods that can use ML hardware.
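To give a flavor of how mixed precision can enter an exponential integrator, the sketch below applies an exponential Euler step, y_{n+1} = y_n + h phi1(hJ) f(y_n), to a small linear system, evaluating the phi-function in single precision while keeping the state update in double precision. This is a minimal illustration under our own assumptions (the specific method, the dense phi1 evaluation via `scipy.linalg.expm`, and all function names), not the algorithms or code from the paper cited below.

```python
import numpy as np
from scipy.linalg import expm

def phi1(M):
    """phi_1(M) = M^{-1} (e^M - I), computed via the augmented-matrix
    identity expm([[M, I], [0, 0]]) = [[e^M, phi_1(M)], [0, I]]."""
    n = M.shape[0]
    aug = np.zeros((2 * n, 2 * n), dtype=M.dtype)
    aug[:n, :n] = M
    aug[:n, n:] = np.eye(n, dtype=M.dtype)
    return expm(aug)[:n, n:]

def exponential_euler_mixed(f, J, y0, t_end, n_steps):
    """Exponential Euler: y_{n+1} = y_n + h * phi1(h J(y_n)) f(y_n).
    The expensive matrix-function evaluation is done in float32 (the
    'low precision' part); the state and right-hand side stay float64."""
    y = np.asarray(y0, dtype=np.float64).copy()
    h = t_end / n_steps
    for _ in range(n_steps):
        # Low-precision evaluation of the phi-function ...
        P = phi1((h * J(y)).astype(np.float32)).astype(np.float64)
        # ... combined with a double-precision state update.
        y = y + h * (P @ f(y))
    return y
```

For a linear ODE y' = Ay, exponential Euler reproduces the exact solution in exact arithmetic, so any error in this sketch comes entirely from the single-precision phi-function evaluation; that makes it a convenient way to observe the precision trade-off in isolation.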

Mission Impact

Our research has produced new approaches to using mixed precision in exponential time integration methods that could be tried immediately in Lawrence Livermore National Laboratory codes relevant to the DOE/NNSA stockpile stewardship, nuclear deterrence, and climate and energy resilience missions. It has also laid a foundation for future research and development of advanced time evolution methods and can serve as a data point for decision makers in the hardware procurement process.

Publications, Presentations, and Patents

Balos, Cody J., Steven Roberts, and David J. Gardner. "Leveraging Mixed Precision in Exponential Time Integration Methods." IEEE High Performance Extreme Computing Conference (HPEC). In press, IEEE, 2023. Outstanding Paper Award.

Balos, Cody J., Steven Roberts, and David J. Gardner. "Leveraging Mixed Precision in Exponential Time Integration Methods." Paper presented at the IEEE High Performance Extreme Computing Conference (HPEC), virtual, September 2023. Outstanding Paper Award.