Scalable Multilevel Training of Large Neural Networks
Colin Ponce | 19-ERD-019
Deep neural networks (DNNs) have become increasingly popular over the past decade, as their flexibility in learning complex patterns has enabled them to solve a wide array of challenging problems. A key challenge with DNNs is ensuring that they generalize, that is, that they can solve problems related to, but not identical to, those seen during training. Building an understanding of the problem at hand into the DNN and its training often yields a more effective network. However, while a great deal of work exists on encoding one's problem understanding into the design of the neural network itself, comparatively little effort has gone into developing methods that import that understanding into the training process.
Much scientific data is hierarchical in nature, exhibiting both global and local phenomena. In this project, we developed techniques to train neural networks in a multilevel manner, enabling learning to occur hierarchically. We draw intuition and techniques from the field of algebraic multigrid, traditionally used to solve linear and nonlinear systems of equations. Our techniques construct a sequence of similarly designed neural networks, each a smaller version of the last. Training proceeds by performing some training work on each network while passing information back and forth between them. This is effective because it forces the training to address the problem hierarchically and therefore to build hierarchical solutions. We find that, when applying multilevel training to scientific regression problems, we are often able to achieve superior worst-case accuracy while maintaining average-case accuracy equivalent to traditional training methods.
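The idea of training a hierarchy of smaller networks and passing information between levels can be illustrated with a minimal two-level sketch, shown here in plain NumPy rather than the project's PyTorch package. In this simplified "coarse first, then fine" cycle, a half-width network is trained first and then prolongated to the full-width network by duplicating each hidden neuron and halving its outgoing weight, so the lifted network computes exactly the same function before fine-level training continues. The pair-duplication prolongation and all function names here are our own illustration, not the released software's API or the project's full algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem: learn y = sin(3x).
X = rng.uniform(-1.0, 1.0, (256, 1))
Y = np.sin(3.0 * X)

def init(width):
    """Random one-hidden-layer network: x -> W2 @ tanh(W1 @ x)."""
    return [rng.normal(0.0, 0.5, (width, 1)), rng.normal(0.0, 0.5, (1, width))]

def loss(params, X, Y):
    W1, W2 = params
    pred = np.tanh(X @ W1.T) @ W2.T
    return float(np.mean((pred - Y) ** 2))

def train(params, X, Y, steps=300, lr=0.05):
    """Plain full-batch gradient descent on the mean-squared error."""
    W1, W2 = params
    for _ in range(steps):
        H = np.tanh(X @ W1.T)                 # hidden activations, (n, width)
        err = 2.0 * (H @ W2.T - Y) / len(X)   # d(loss)/d(prediction), (n, 1)
        gW2 = err.T @ H                       # gradient w.r.t. output weights
        gW1 = (err @ W2 * (1.0 - H ** 2)).T @ X  # backprop through tanh
        W1 -= lr * gW1
        W2 -= lr * gW2
    return [W1, W2]

def prolong(params):
    """Lift a coarse network to twice the width: duplicate each hidden
    neuron and halve its outgoing weight, preserving the network's output."""
    W1, W2 = params
    return [np.repeat(W1, 2, axis=0), np.repeat(W2, 2, axis=1) / 2.0]

# Two-level cycle: train coarse, prolongate, continue training fine.
coarse = train(init(4), X, Y)   # train the half-width network
l_coarse = loss(coarse, X, Y)
fine = prolong(coarse)          # lift to full width
l_lifted = loss(fine, X, Y)     # matches l_coarse by construction
fine = train(fine, X, Y)        # continue training at full width
l_fine = loss(fine, X, Y)
```

Because the prolongation preserves the network's function, the fine-level training picks up exactly where the coarse level left off; the full method additionally restricts information back down and cycles between levels.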
As a result of this work, we have produced and are currently open-sourcing a Python software package, built on top of the popular PyTorch deep learning framework, that implements our methods. This will enable others at Lawrence Livermore National Laboratory and in the broader scientific community to leverage our research to produce improved DNNs. Scientifically, this means that future researchers can use the same neural network designs as before, but train them with our methods to produce DNNs that generalize better and therefore more fully solve their scientific problems. We find this to be especially true for regression problems (e.g., simulator approximation), compared to classification problems.
Publications, Presentations, and Patents
Ponce, C. "Multilevel Methods for Neural Networks." Copper Mountain Conference, 3/25/2019.
Ponce, C. "Intralayer Multilevel Methods for Neural Networks." SIAM Conference on the Mathematics of Data Science, 6/16/2020.
Ponce, C. "Multilevel Regularization of Neural Networks for Regression." SIAM Conference on Computational Science and Engineering, 3/4/2021.
Li, R. "MTNN: Multilevel Training of Neural Networks." SIAM Conference on Applied Linear Algebra, 5/21/2021.
Li, R. "MTNN: Multilevel Training of Neural Networks." AMG Summit, 10/21/2021.
Liu, Y., Ponce, C., Brunton, S., Kutz, N. "Multiresolution Convolutional Autoencoders," arXiv, arXiv:2004.04946, 4/10/2020.