Automated Software Integration
Todd Gamblin | 21-SI-005
Project Overview
Increasingly diverse mission needs, the emergence of artificial intelligence (AI) and cloud computing, and increasing hardware diversity are driving high performance computing (HPC) software to be more complex. Modern codes are built from hundreds of small, complex components, and much of the software development process involves integrating these components rather than developing new ones from scratch. The goal of the Binary Understanding and Integration Logic for Dependencies (BUILD) project was to ease the task of software integration for developers across Lawrence Livermore National Laboratory's (LLNL) programs. The project focused on (1) modeling software compatibility, (2) modeling application binary interface (ABI) compatibility with binary analysis, (3) developing solver techniques to reason about compatibility, and (4) developing machine learning (ML) models to fill gaps in our understanding of software compatibility. The project developed several key technologies that help developers by accelerating development workflows, removing the need for rebuilds, and enabling faster, automatic, and less error-prone code sharing. These technologies are used in LLNL codes and will be ready for the new El Capitan exascale system in Livermore Computing. Many of them have been hardened and integrated with LLNL's Spack package manager and are already in use by production code teams. The results of BUILD have laid the groundwork for future advances in software integration: the ML and binary modification techniques developed in this project still need to be operationalized, but they have great potential to further speed up software integration.
Mission Impact
The capabilities developed under BUILD are aimed at enabling LLNL code teams to work effectively and to extract the most performance possible out of our next-generation graphics processing unit (GPU) machine, El Capitan. El Capitan will be used to maintain and enhance the safety, security, and effectiveness of the U.S. nuclear weapons stockpile. Generalized modeling techniques developed under this project enabled us to formalize the model for the AMD GPU libraries used on El Capitan, and they have been critical for code teams porting their software to the machine. The Shrinkwrap tool developed under this project has been integrated into LLNL's Spack package manager and enables code teams to assemble binaries that will run reliably and deterministically on El Capitan. The binary reuse and solver techniques described here have all been integrated into Spack and allow teams to easily reuse guaranteed-compatible binaries from build caches, file system caches, and existing Spack installations. This has accelerated developer workflows and avoided costly, unnecessary rebuilds on our machines. In addition to these benefits for simulation codes, we have worked with the Livermore Computing Tri-lab Computing Environment (TCE2) group, and the solver techniques developed on this project have shortened TCE2 build times by hours and in some cases days. The capabilities in Spack have encouraged vendors, such as AMD, to ensure that their software is continuously maintained in Spack for their customers. This benefits our code teams because the AMD packages they depend on stay well maintained without requiring effort from the teams themselves. Finally, our work on ABI compatibility and on modeling package compatibility will accelerate porting efforts to El Capitan and its successors. In all, the BUILD SI has made great strides toward effective, automatic software integration.
Publications, Presentations, and Patents
Todd Gamblin, "Beyond Version Solving: Implementing general package solvers with Answer Set Programming" (Presentation, PackagingCon, Virtual, November 9, 2021).
Harshitha Menon, Konstantinos Parasyris, Tom Scogland, and Todd Gamblin, "Searching for High-Fidelity Builds Using Active Learning" (Presentation, Mining Software Repositories Conference, MSR'22, Pittsburgh, PA, May 18-24, 2022). LLNL-CONF-831078.
Todd Gamblin, Massimiliano Culpo, Gregory Becker, and Sergei Shudler, "Using Answer Set Programming for HPC Dependency Solving" (Presentation, Supercomputing 2022, SC'22, Dallas, Texas, November 13-18, 2022). LLNL-CONF-839332.
Farid Zakaria, Thomas R. W. Scogland, Todd Gamblin, and Carlos Maltzahn, "Mapping Out the HPC Dependency Chaos" (Presentation, Supercomputing 2022, SC'22, Dallas, Texas, November 13-18, 2022). LLNL-CONF-840119.
Jeter, T.R., M. J. Bobbitt, and B. L. Rountree, "SpackNVD: A Vulnerability Audit Tool for Spack Packages." 2022 IEEE/ACM First International Workshop on Cyber Security in High Performance Computing (S-HPC), Dallas, TX, USA, 2022, pp. 9-17, doi: 10.1109/S-HPC56715.2022.00007.
Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell, Massimiliano Culpo, and Todd Gamblin, "Flexible and optimal dependency management via max-SMT" (Presentation, IEEE/ACM International Conference on Software Engineering, Melbourne, Australia, May 2023).
Todd Gamblin and Massimiliano Culpo, "Optimizing Dependency Solves in Spack" (Presentation, PackagingCon, Berlin, Germany, October 26, 2023).
Daniel Nichols, "Probabilistic Package Builds: Guiding Spack's Concretizer with Predicted Build Outcomes" (Presentation, PackagingCon, Berlin, Germany, October 26, 2023).
Harshitha Menon, "Learning to Predict and Improve Build Successes in Package Ecosystems" (Presentation, PackagingCon, Berlin, Germany, October 27, 2023).
Gregory Becker, "Explainability in Spack concretization" (Presentation, PackagingCon, Berlin, Germany, October 26, 2023).