Hypothesis Testing via Artificial Intelligence: Generating Physically Interpretable Models of Scientific Data with Machine Learning

Brenden Petersen | 19-DR-003

Project Overview

Deep learning has demonstrated an exceptional ability to solve complex tasks (an engineering success); however, it has done so at the expense of the ability to generate new knowledge (a scientific failure). We propose an alternative framework, called Deep Symbolic Regression (DSR), in which artificial neural networks (NNs) rapidly generate hypotheses about physical relationships among inputs. This framework bypasses the need to interpret an NN altogether while still leveraging the representational power of deep learning. The resulting models are tractable mathematical expressions, which are readily human-interpretable and can provide insight into underlying physical phenomena. Further, we fold this methodology into the scientific process by allowing the scientist to directly integrate a priori knowledge and beliefs to accelerate learning. We demonstrate this methodology on symbolic regression (the problem of rediscovering the underlying expression that describes a dataset) and achieve state-of-the-art performance across a wide variety of symbolic regression problems. We also extend the DSR framework to the broader class of symbolic optimization problems, in which one seeks to optimize a sequence of symbols, or "tokens," under a black-box reward function. Other examples of symbolic optimization problems include neural architecture search and computational antibody design. Our generalized tool, Deep Symbolic Optimization (DSO), has been demonstrated on the task of learning symbolic control policies for reinforcement learning environments and has been adopted as an enabling capability for computational antibody design.
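To make the problem statement concrete, the following is a minimal sketch of symbolic regression posed as symbolic optimization: a candidate expression is a sequence of tokens in prefix notation, and a black-box reward scores how well it fits a dataset. The token library, the reward form 1 / (1 + RMSE), and all function names here are illustrative assumptions, not the DSO implementation. In particular, uniform random sampling stands in for the neural network that proposes token sequences in DSR.

```python
import math
import random

# Hypothetical token library (token -> arity); not the library used by DSO.
ARITY = {"add": 2, "mul": 2, "sin": 1, "x": 0, "1": 0}
FUNCS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b, "sin": math.sin}
TERMINALS = [token for token, arity in ARITY.items() if arity == 0]


def evaluate(tokens, x):
    """Evaluate a prefix-notation token sequence at a scalar input x."""
    def parse(pos):
        token = tokens[pos]
        pos += 1
        if token == "x":
            return x, pos
        if token == "1":
            return 1.0, pos
        args = []
        for _ in range(ARITY[token]):
            value, pos = parse(pos)
            args.append(value)
        return FUNCS[token](*args), pos

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("token sequence has trailing tokens")
    return value


def reward(tokens, xs, ys):
    """Black-box reward (assumed form): 1 / (1 + RMSE); a perfect fit scores 1.0."""
    errors = [evaluate(tokens, x) - y for x, y in zip(xs, ys)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return 1.0 / (1.0 + rmse)


def sample_tokens(rng, max_length=15):
    """Sample one complete prefix expression uniformly at random.

    This is a stand-in for the neural network sampler, which would instead
    emit each token from a learned distribution.
    """
    tokens, open_slots = [], 1
    while open_slots > 0:
        # Allow operators only while the expression can still be completed
        # with terminals inside the length budget.
        can_grow = len(tokens) + open_slots + 2 <= max_length
        pool = list(ARITY) if can_grow else TERMINALS
        token = rng.choice(pool)
        tokens.append(token)
        open_slots += ARITY[token] - 1
    return tokens


def random_search(xs, ys, n_samples=2000, seed=0):
    """Return the best-scoring sampled token sequence and its reward."""
    rng = random.Random(seed)
    best_tokens, best_reward = None, -1.0
    for _ in range(n_samples):
        tokens = sample_tokens(rng)
        score = reward(tokens, xs, ys)
        if score > best_reward:
            best_tokens, best_reward = tokens, score
    return best_tokens, best_reward


if __name__ == "__main__":
    xs = [i / 10 for i in range(-10, 11)]
    ys = [x + x * x for x in xs]  # hidden target expression: x + x*x
    print(random_search(xs, ys))
```

Because the search only ever sees the reward value, the same loop applies to any task where a token sequence can be scored, which is what allows the approach to extend beyond regression to problems such as control policies or antibody sequences.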

Mission Impact

Many mission-critical areas at LLNL have a symbolic optimization problem at their core. DSO, the technology proposed and developed in this LDRD project, may provide significant value to any of these areas. A prime example is the use of DSO in computational antibody design, a rapidly growing focus area supporting LLNL's biosecurity mission. Specifically, DSO is now being used in the Generative Unconstrained Intelligent Drug Engineering (GUIDE) program to search the space of mutant antibody sequences for improvements in critical quality attributes, such as binding affinity to a particular pathogen. The publication success of DSO has also led to a tangible increase in interest from external candidates who want to join LLNL, in part to work with the DSO team.

Publications, Presentations, and Patents

Kim, Joanne T., Sookyung Kim, and Brenden K. Petersen. "An Interactive Visualization Platform for Deep Symbolic Regression." In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, pp. 5261-5263. Jan 2021.

Kim, Joanne T., Mikel Landajuela, and Brenden K. Petersen. "Distilling Wikipedia Mathematical Knowledge into Neural Network Models." In ICLR 2021 Workshop on the Role of Mathematical Reasoning on General Artificial Intelligence. May 2021.

Landajuela, Mikel, Brenden K. Petersen, Soo K. Kim, Claudio P. Santiago, Ruben Glatt, T. Nathan Mundhenk, Jacob F. Pettit, and Daniel M. Faissol. "Improving Exploration in Policy Gradient Search: Application to Symbolic Optimization." In ICLR 2021 Workshop on the Role of Mathematical Reasoning on General Artificial Intelligence. Virtual. May 2021.

Petersen, Brenden K., Mikel Landajuela Larma, Terrell N. Mundhenk, Claudio Prata Santiago, Soo Kyung Kim, and Joanne Taery Kim. "Deep Symbolic Regression: Recovering Mathematical Expressions from Data via Risk-Seeking Policy Gradients." In International Conference on Learning Representations. May 2021.

Pettit, Jacob F., Brenden K. Petersen, Chase Cockrell, Dale B. Larie, Felipe Leno Silva, Gary An, and Daniel M. Faissol. "Learning Sparse Symbolic Policies for Sepsis Treatment." In ICML 2021 2nd Workshop on Interpretable ML in Healthcare. July 2021.

Petersen, Brenden K., Claudio Santiago, and Mikel Landajuela. "Incorporating Domain Knowledge into Neural-guided Search via in situ Priors and Constraints." In 8th ICML Workshop on Automated Machine Learning (AutoML). Virtual, July 2021.

Landajuela, Mikel, Brenden K. Petersen, Sookyung Kim, Claudio P. Santiago, Ruben Glatt, Nathan Mundhenk, Jacob F. Pettit, and Daniel Faissol. "Discovering Symbolic Policies with Deep Reinforcement Learning." In International Conference on Machine Learning, pp. 5979-5989. PMLR, Virtual. 2021.

Mundhenk, T. Nathan, Mikel Landajuela, Ruben Glatt, Claudio P. Santiago, Daniel M. Faissol, and Brenden K. Petersen. "Symbolic Regression via Neural-Guided Genetic Programming Population Seeding." In Advances in Neural Information Processing Systems 34. December 6-14, 2021.

Landajuela, Mikel, Chak Shing Lee, Jiachen Yang, Ruben Glatt, Claudio P. Santiago, T. Nathan Mundhenk, Ignacio Aravena, Garrett Mulcahy, and Brenden K. Petersen. "A Unified Framework for Deep Symbolic Regression." In Advances in Neural Information Processing Systems 35. 2022. Accepted.

First-place award in the SRBench Competition at the Genetic and Evolutionary Computation Conference (GECCO) 2022, Interpretable Symbolic Regression for Data Science, Real World Track.

Petersen, Brenden K. "Deep Symbolic Regression." LLNL Data Science Challenge Summer 2020, Virtual.

Petersen, Brenden K. "Deep Symbolic Regression." LLNL Data Science Challenge Summer 2021, Virtual.

Petersen, Brenden K. "Deep Symbolic Regression." LLNL Lab Research Tech Exchange 2021, Virtual.

Landajuela, Mikel. "Deep Symbolic Regression." CED Technical Forum, Virtual, March 18, 2021.

Petersen, Brenden K. "Deep Symbolic Optimization." LLNL Data Science Institute Seminar, Virtual, April 22, 2021.

Landajuela, Mikel. "Deep Symbolic Optimization: A Framework for Symbolic Optimization Using Deep Learning." Center for Advanced Signal and Image Science (CASIS) 25th Annual Workshop, Virtual, August 4, 2021.

Landajuela, Mikel. "Deep Symbolic Optimization." LLNL Machine Learning Reading Group, Virtual, September 28, 2021.

Petersen, Brenden K. "Unified Deep Symbolic Regression." SRBench GECCO 2022 Competition Seminar, Virtual, September 14, 2022.

Landajuela, Mikel. "Deep Symbolic Optimization." IEEE Computer Society, San Francisco/Oakland-East Bay Joint Chapter, Virtual, October 19, 2022.

Lawrence Livermore National Laboratory | 7000 East Avenue • Livermore, CA 94550 | LLNL-WEB-846698

Operated by Lawrence Livermore National Security, LLC, for the Department of Energy's National Nuclear Security Administration.