High-Performance Computing Force Multiplier: Leading the Way for Extreme-Scale Converged Computing

Daniel Milroy | 22-ERD-041

Executive Summary

The goal of our project is to ensure that future converged high-performance computing (HPC) and cloud workflows can use converged resources more effectively by reducing software complexity and increasing automation and performance. The innovative solutions developed during our project will support emerging, interdisciplinary, fundamental, mission-critical research that incorporates converged computing technologies.

Publications, Presentations, and Patents

Daniel Milroy and Claudia Misale, “KubeFlux: an HPC Scheduler Plugin for Kubernetes” (Presentation, KubeCon+CloudNativeCon Europe 2022, Valencia, Spain, May 18, 2022).

Daniel Milroy, “Cloud-Native HPC with Kubernetes and the Flux Framework” (Presentation, NSDF workshop at eScience’22, Salt Lake City, UT, October 11, 2022).

Daniel Milroy and Claudia Misale, “Fluence: Approaching a Converged Computing Environment” (Presentation, Batch+HPC Day as part of KubeCon+CloudNativeCon NA 2022, Detroit, MI, October 24, 2022).

Milroy, Daniel J., Claudia Misale, Giorgis Georgakoudis, Tonia Elengikal, Abhik Sarkar, Maurizio Drocco, Tapasya Patki, et al. “One Step Closer to Converged Computing: Achieving Scalability with Cloud-Native HPC.” 2022 IEEE/ACM 4th International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC), 2022. https://doi.org/10.1109/canopie-hpc56864.2022.00011.

Daniel Milroy, “Converged Computing: Bringing Together HPC and Cloud Communities” (Presentation, Session Leader, SC22 Birds-of-a-Feather Session, Dallas, TX, November 15, 2022).

Daniel Milroy, “The Flux Framework Next-Generation Hierarchical Resource Manager and Scheduler” (Presentation, Invited keynote at the EuroHPC Malleability Hackathon, Grenoble, France, January 24, 2023).

Michał Woźniak and Vanessa Sochat, “Enabling HPC and ML workloads with the latest Kubernetes Job features” (Presentation of “The Flux Operator” at KubeCon+CloudNativeCon Europe 2023, Amsterdam, Netherlands, April 21, 2023).

Daniel Milroy, “Enabling the Scientific Computing Continuum” (Presentation at the Computing External Review Committee, Livermore, CA, April 18, 2023).

Daniel Milroy, “Minimizing the difference between HPC and cloud: convergence of communities and technologies” (Presentation, Keynote talk of WOCC'23: The First International Workshop on Converged Computing of Cloud, HPC and Edge at ISC23, Hamburg, Germany, May 25, 2023).

Daniel Milroy, “Minimizing the difference between HPC and cloud: convergence of communities and technologies” (Presentation, Invited talk at the Laboratoire Jean Kuntzmann Seminar, Grenoble, France, June 1, 2023).

Daniel Milroy, “The Flux Framework: to El Capitan and Beyond” (Presentation, CEA-NNSA collaboration meeting, Asheville, NC, June 20, 2023).

Zeke Morton, “Flux Operator: Enabling HPC Workloads in Kubernetes” (Presentation, NLIT Summit, Milwaukee, WI, June 29, 2023).

Patki, Tapasya, Dong H. Ahn, Daniel J. Milroy, Jae-Seung Yeom, Jim Garlick, Mark Grondona, Stephen Herbein, and Thomas Scogland. “Fluxion: A Scalable Graph-Based Resource Model for HPC Scheduling Challenges.” 18th Workshop on Workflows in Support of Large-Scale Science (WORKS 2023), 2023 (in press).