Achieving Peak Performance of High-Performance Computing Applications by Optimizing Parallelism Compilation

Giorgis Georgakoudis | 21-ERD-018

Project Overview

This report summarizes the activities and accomplishments in improving the performance of parallel high performance computing (HPC) applications through parallelism-aware compiler optimization. The motivating problem is that parallel HPC applications achieve sub-optimal performance because compilers, such as Clang/LLVM and its vendor derivatives, are oblivious to parallel execution semantics; they therefore fail to apply existing standard optimizations to parallel code and lack the capability for parallelism-specific compiler optimizations. To address this problem, we pursued three integrated thrusts: (1) develop new parallelism-aware compiler analyses to expose the semantics of parallel execution to compilation; (2) research and develop new parallelism-aware compiler optimizations targeting parallel graphics processing unit (GPU) execution; and (3) investigate new compiler-assisted techniques for performance portability and autotuning.

We based our development on the production-level Clang/LLVM compiler and worked with its developer community to upstream our implementations to its open-source version, making them available to the HPC community. Research highlights include developing parallelism-aware compiler analyses and optimizations for portable OpenMP GPU offloading that close the performance gap with, and in some cases outperform, non-portable native GPU programming models; prototyping an automated compiler translation method for performance-portable execution of Compute Unified Device Architecture (CUDA) codes on Advanced Micro Devices (AMD) GPUs through OpenMP; and proposing extensions to OpenMP for runtime autotuning using machine learning. The project produced numerous publications and presentations in significant venues and more than 50 commits to the upstream LLVM repository implementing parallelism-aware compilation and optimization (available upstream since LLVM version 13).

Mission Impact

HPC systems are critical infrastructure running advanced simulations that safeguard the NNSA stockpile, address DOE's energy and environmental security missions, and enhance the understanding of national security challenges to design effective responses. Executing those applications at the highest possible performance is paramount for efficacy and competitiveness.

This project researched and developed fundamental computer science methods to improve the performance of HPC applications through parallelism-aware compiler optimization. Project developments are implemented and upstreamed in the production-level LLVM compiler to directly benefit DOE HPC mission applications. Further, automating application optimization through the compiler supports performance portability without incurring the maintenance and development costs otherwise required to optimize every application manually. The project disseminated research findings through numerous publications and presentations in top-tier venues to strengthen LLNL's leadership in this area of research. Over the course of the project, we formed and maintained strong collaborations with other national laboratories (Argonne National Laboratory, Brookhaven National Laboratory, Oak Ridge National Laboratory), industry (AMD, Intel), academia (Technical University Dortmund), and the LLVM development community. Those collaborations enriched the research environment and transferred substantial expertise to LLNL.

Publications, Presentations, and Patents

Doerfert, Johannes, Marc Jasper, Joseph Huber, Khaled Abdelaal, Giorgis Georgakoudis, Thomas Scogland, and Konstantinos Parasyris. "Breaking the Vendor Lock: Performance Portable Programming through OpenMP as Target Independent Runtime Layer." In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 494-504. PACT '22. Chicago, Illinois: Association for Computing Machinery, 2023. LLNL-CONF-834862, ISBN: 9781450398688.

Doerfert, Johannes, Atmn Patel, Joseph Huber, Shilei Tian, Jose M. Monsalve Diaz, Barbara Chapman, and Giorgis Georgakoudis. "Co-Designing an OpenMP GPU Runtime and Optimizations for Near-Zero Overhead Execution." In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 504-514. 2022. LLNL-CONF-826728, 10.1109/IPDPS53621.2022.00055.

Georgakoudis, Giorgis, Thomas R. W. Scogland, Chunhua Liao, and Bronis R. de Supinski. "Extending OpenMP to Support Automated Function Specialization Across Translation Units." In OpenMP in a Modern World: From Multi-device Support to Meta Programming, ed. by Michael Klemm, Bronis R. de Supinski, Jannis Klinkenberg, and Brandon Neth, 159-173. Cham: Springer International Publishing, 2022. LLNL-CONF-837685, ISBN: 978-3-031-15922-0.

Huber, Joseph, Melanie Cornelius, Giorgis Georgakoudis, Shilei Tian, Jose M. Monsalve Diaz, Kuter Dinel, Barbara Chapman, and Johannes Doerfert. "Efficient Execution of OpenMP on GPUs." In 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 41-52. 2022. LLNL-CONF-826728.

Liao, Chunhua, Anjia Wang, Giorgis Georgakoudis, Bronis R. de Supinski, Yonghong Yan, David Beckingsale, and Todd Gamblin. "Extending OpenMP for Machine Learning-Driven Adaptation." In Accelerator Programming Using Directives, ed. by Sridutt Bhalachandra, Christopher Daley, and Verónica Melesse Vergara, 49-69. Cham: Springer International Publishing, 2022. LLNL-CONF-826432, ISBN: 978-3-030-97759-7.

Parasyris, Konstantinos, Giorgis Georgakoudis, Johannes Doerfert, Ignacio Laguna, and Thomas R.W. Scogland. "Piper: Pipelining OpenMP Offloading Execution Through Compiler Optimization For Performance." In 2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), 100-110. 2022. LLNL-CONF-827970, 10.1109/P3HPC56579.2022.00015.

Huber, Joseph, Weile Wei, Giorgis Georgakoudis, Johannes Doerfert, and Oscar Hernandez. "A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation." In OpenMP: Enabling Massive Node-Level Parallelism, ed. by Simon McIntosh-Smith, Bronis R. de Supinski, and Jannis Klinkenberg, 142-155. Best Paper award. Cham: Springer International Publishing, 2021. LLNL-CONF-819815, ISBN: 978-3-030-85262-7.

Jayatilaka, Tarindu, Hideto Ueno, Giorgis Georgakoudis, EunJung Park, and Johannes Doerfert. "Towards Compile-Time-Reducing Compiler Optimization Selection via Machine Learning." In 50th International Conference on Parallel Processing Workshop. New York, NY, USA: Association for Computing Machinery, 2021. LLNL-CONF-823515, ISBN: 9781450384414.

Mattson, Timothy G., Todd A. Anderson, and Giorgis Georgakoudis. "PyOMP: Multithreaded Parallel Programming in Python." Computing in Science Engineering 23, no. 6 (2021): 77-80. LLNL-JRNL-829084.

Giorgis Georgakoudis, "Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems" (Presentation, SciDAC RAPIDS2 Platform Readiness Seminar, 2023). LLNL-PRES-845004.

Giorgis Georgakoudis, "Extending OpenMP to Support Automated Function Specialization Across Translation Units" (Presentation, SciDAC RAPIDS2 Platform Readiness Seminar, 2022). LLNL-PRES-840158.

Konstantinos Parasyris, "Piper: Pipelining OpenMP Offloading Execution through Compiler Optimization for Performance" (International Workshop on Performance, Portability and Productivity in HPC P3HPC, Dallas, TX, Nov 2022). LLNL-PRES-842174.

Thomas Scogland, "Extending OpenMP to Support Automated Function Specialization Across Translation Units" (18th International Workshop on OpenMP, Chattanooga, TN, Sept 2022). LLNL-PRES-840158.

Giorgis Georgakoudis, "Optimizing OpenMP GPU Execution in LLVM" (LLVM Developers Meeting, Virtual, Nov 2021).

Chunhua Liao, "Extending OpenMP for machine learning-driven adaptation" (8th International Workshop on Accelerator Programming Using Directives (WACCPD), Virtual, Nov 2021). LLNL-PRES-827741.

Giorgis Georgakoudis, "(OpenMP) Parallelism-Aware Optimizations" (LLVM Developers Meeting, Virtual, Oct 2020).