Scientific Machine Learning at Extreme Scale: Optimal Control for Deep Learning in High-Performance Computing

Stefanie Guenther | 21-ERD-051

Project Overview

Scientific machine learning aims to leverage the tremendous success of data-driven machine-learning models to enhance physics-based simulations. However, training the underlying neural networks remains challenging: it often requires domain experts to tune the model parameters, and it is computationally expensive in terms of both memory requirements and training time. This project developed advanced neural-network architectures that leverage many-core high-performance computing systems to improve and accelerate current machine-learning models. The scheme is based on a continuous learning approach that models neural networks as ordinary differential equations whose dynamics are learned during training. Instead of equipping each layer of a network with its own set of trainable parameters, we developed a spline-based network architecture (SpliNet) that represents the trainable parameters with a set of spline basis functions, reducing the number of parameters independently of the network discretization. The resulting training is more robust with respect to hyperparameters such as the network architecture, the parameters of the training algorithm, and the network initialization. It is integrated into a layer-parallel training scheme that breaks the serial network propagation and enables model-distributed learning on high-performance computing platforms.
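The core idea can be illustrated with a minimal sketch: a residual network is viewed as a forward-Euler discretization of an ordinary differential equation, and the time-dependent weights are expanded in a small set of spline basis functions, so the number of trainable coefficients stays fixed even when the number of layers (time steps) changes. The code below is an illustrative assumption-laden sketch, not the project's actual implementation; the function names, the piecewise-linear ("hat") basis, and the tanh dynamics are all choices made here for brevity.

```python
import numpy as np

def hat_basis(t, knots):
    """Evaluate piecewise-linear ("hat") spline basis functions at time t
    on a uniform knot grid. (Illustrative choice; SpliNet can use other bases.)"""
    h = knots[1] - knots[0]  # uniform knot spacing
    return np.maximum(0.0, 1.0 - np.abs(t - knots) / h)

def forward(x, coeffs, knots, n_layers, T=1.0):
    """Forward Euler through the neural ODE x'(t) = tanh(W(t) x(t)),
    where the control W(t) = sum_k coeffs[k] * basis_k(t)."""
    dt = T / n_layers
    for j in range(n_layers):
        b = hat_basis(j * dt, knots)
        W = np.tensordot(b, coeffs, axes=1)  # W(t): weighted sum of coefficient matrices
        x = x + dt * np.tanh(W @ x)          # residual (forward-Euler) layer
    return x

rng = np.random.default_rng(0)
d = 4                                        # state dimension
knots = np.linspace(0.0, 1.0, 5)             # 5 spline knots on [0, T]
coeffs = 0.1 * rng.standard_normal((len(knots), d, d))  # trainable coefficients

x0 = rng.standard_normal(d)
# The 5 coefficient matrices are shared across all layers, so refining the
# network discretization does not change the number of trainable parameters:
out_coarse = forward(x0, coeffs, knots, n_layers=8)
out_fine = forward(x0, coeffs, knots, n_layers=64)
```

Because each Euler step depends only on the spline coefficients and the step's time point, the layer loop can also be split across processors, which is the opening exploited by the layer-parallel training scheme described above.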

Mission Impact

The project combines machine learning, high-performance simulation, and data analysis to improve predictions for national security applications, and it is thereby well aligned with the Director's Initiative for Cognitive Simulation. Developing and deploying innovative, highly parallel deep-learning strategies within LLNL's facilities reinforces LLNL's mission and position in high-performance computing, and it lays the groundwork for LLNL to become a leading institution in foundational research on scientific machine learning.

Publications, Presentations, and Patents

Guenther, S., et al. 2021. "Spline Parameterization of Neural Network Controls for Deep Learning." SIAM Journal on Mathematics of Data Science 1030758. LLNL-JRNL-819654.

Guenther, S., et al. 2021. "Simultaneous Layer-Parallel Training for Deep Residual Networks." East Coast Optimization Meeting 2021, Center for Mathematics and Artificial Intelligence, George Mason University, Fairfax, Virginia (virtual). LLNL-PRES-820910.