Dependency Resolution for High-Performance Computing Software

George Todd Gamblin | 18-ERD-051

Overview

To take advantage of high-performance computing (HPC) machines, developers must leverage an increasing number of software libraries (or dependencies) on several different hardware platforms. Some of the largest HPC applications require hundreds of dependencies. Assembling these libraries correctly and with optimal performance requires careful selection of versions, configuration options, compilers, flags, and other parameters.

The Spack package manager (Spack 2018, Office of Advanced Scientific Computing Research 2016, Becker et al. 2016, Gamblin et al. 2015) was developed at Lawrence Livermore National Laboratory to address the configuration-selection problem. Spack is a multi-platform package manager that builds and installs multiple versions and configurations of software. It works on Linux, macOS, and many supercomputers. It uses a greedy algorithm (to find a localized optimum solution) with a few heuristics to select good configurations. However, the configuration-selection problem is NP-complete (i.e., a problem whose status is unknown), and new algorithms are needed to handle the more complex configurations needed for emerging supercomputers.

We investigated the use of Boolean satisfiability (SAT), constraint satisfaction programming (CSP), and answer-set programming (ASP) solvers in Spack, so that Spack users can more naturally express constraints and more efficiently find good package configurations; they enable Spack to model complex interactions between C++, OpenMP, and other language runtimes, and to better handle optimized binary packages. Together, these benefits will allow computational scientists to focus on their domain problems without fighting the complexity of the HPC environment.

Background and Research Objectives

Spack manages dependency relationships between software packages. It ensures that any libraries and programs that a package relies on are configured, built, and installed correctly before the package itself is installed. Packages can have different versions and configuration options, and different versions of one package may depend on specific versions or configurations of another. Finding a valid set of package versions in this model is an NP-complete problem (i.e., a problem whose status is unknown), but Spack currently uses a greedy algorithm to find a localized optimum solution and solve for configuration versions. Because of this, in some instances, Spack may decide that certain package configurations are impossible when a valid configuration does in fact exist.

This project proceeded in two directions: First, we created models for the concepts that Spack’s dependency solver did not currently handle. Second, we looked at these models and investigated how their concepts could be represented in SAT, CSP, and ASP solvers. These solvers use sophisticated algorithms with backtracking, clause-learning, and other techniques to accelerate the search through possible configurations. However, encoding constraint problems for these solvers can be its own challenge.

In current versions of Spack, compiler runtime libraries and compiler capabilities are not first-class entities in the dependency solver. Rather, users of Spack or packagers must know the compatibility of specific software packages with specific compilers. Runtime libraries are not modeled, and it is currently possible to create binaries with Spack that do not link in the proper language runtime for newer versions of C++. We created a model where compilers are build dependencies, thus allowing different compilers to be used with different packages in a single build.

In addition to compilers, we looked at constraints imposed on the dependency graph by cross-compilation. In particular, in a cross-compiled environment, a package that is both a build and a link dependency must be built twice: once for the build host and once for the run host. This is another case where we must augment Spack’s model to create new nodes in the dependency graph. We also looked at special cases such as build dependencies that generate code to be inserted in libraries intended for the run environment. When these packages generate code in an interpreted language, they may not impose binary constraints on the run environment, but they constrain the version of the interpreter to be compatible between build and run environments.

While it is easy to annotate packages with information about these types of constraints, it is not possible to represent these types of constraints in Spack’s current dependency solver. The current solver relies strictly on direct dependency relationships to make incremental decisions and limits the number of versions of a package in a single build to one. Moreover, compilers are modeled as attributes of package nodes rather than as build dependencies.

To solve these issues, we attempted to instantiate our additions to Spack’s dependency model in SAT, CSP, and ASP solvers. In our experiments, the ASP versions of Spack models were much easier to construct and much more expressive than SAT versions. We also investigated CSP solvers, but we found the predicate functions to be slow compared to the alternatives. In all cases, the ability to express constraints not only on immediate dependencies, but on the entire problem helped us handle some of Spack’s more complex use cases.

Impact on Mission

Our research supports the advancement of the science, technology, and engineering competencies that are the foundation of the NNSA mission. It also supports the Laboratory's core competencies in HPC, simulation, and data science.

Spack is important to several Lawrence Livermore National Laboratory programs. It is the software release management technology for the DOE Exascale Computing Project, and it is used at HPC centers worldwide. The Laboratory needs these additions to Spack to fully exploit the computers our applications run on. The broader HPC community needs a better Spack dependency model to improve developer productivity and to reduce the complexity of specifying builds for cross-compiled architectures. The techniques we investigated enable Spack to continue to sustain its already rapid growth, and this research enables rapid integration of HPC software across advanced architectures and multiple modern software stacks.

Conclusion

During this project, we developed new dependency models for Spack and explored how to map them to SAT, CSP, and ASP solvers. The new dependency models support compiler and cross-compilation paradigms needed for exascale HPC, and our prototype solvers have shown that ASP is likely the best paradigm for representing Spack’s dependency models. We are working toward a complete implementation of Spack’s solver in ASP, and we will be investigating how best to harden and integrate this solver into Spack itself.

References

Becker, G., et al. 2016. "Managing Combinatorial Software Installations with Spack." Second International Workshop on HPC User Support Tools (HUST16), Salt Lake City, UT, November 2016. LLNL-CONF-701389.

Gamblin, T. 2018. "Binary Packaging for HPC with Spack." Free and Open Source Software Developers’ European Meeting (FOSDEM18), Brussels, Belgium, February 2018. LLNL-PRES-745747.

——— . 2018. "How Compilers Affect Dependency Resolution in Spack." Free and Open Source Software Developers’ European Meeting (FOSDEM18), Brussels, Belgium, February 2018. LLNL-PRES-745770.

Gamblin, T., et al. 2015. "The Spack Package Manager: Bringing Order to HPC Software Chaos." Supercomputing 2015 (SC15), Austin, Texas, November 2015. LLNL-CONF-669890.

Office of Advanced Scientific Computing Research, 2016. "Packaging a Wallop: Lawrence Livermore National Laboratory’s Time-Saving HPC Tool Eases the Way for Next Era of Scientific Simulations." ASCR Discovery, August 2016.

Spack: GitHub (web-based code repository; accessed 13 December 2018), https://github.com/spack/spack .

Publications and Presentations

Gamblin, T. 2018. "Binary Packaging for HPC with Spack." Free and Open Source Software Developers’ European Meeting (FOSDEM18), Brussels, Belgium, February 2018. LLNL-PRES-745747.

——— . 2018. "How Compilers Affect Dependency Resolution in Spack." Free and Open Source Software Developers’ European Meeting (FOSDEM18), Brussels, Belgium, February 2018. LLNL-PRES-745770.