Detecting Data Races in High-Performance Computing

Markus Schordan | 17-ERD-023

Overview

High-performance computing applications are susceptible to data-race bugs. A data race occurs when two or more threads access the same shared memory location concurrently, at least one of the accesses is a write, and no synchronization orders the accesses; the program's result then depends on how the threads happen to interleave. Such races are a significant barrier to the integrity of parallel programs on shared-memory architectures. Many approaches to data-race detection are based on traditional static-analysis techniques that typically operate on a program's source code or its control-flow graph (a graph representation of the possible orders of events in a program's execution). Like classical data-flow analyses, such tools can produce false alarms, reporting spurious race conditions for error-free programs.
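
To make the failure mode concrete, the following C/OpenMP loop is a minimal illustration in the spirit of a DataRaceBench microbenchmark (it is not an actual kernel from the suite): multiple threads update the shared variable sum without any synchronization, so their read-modify-write sequences can interleave and updates can be lost.

    /* Illustrative sketch only: a loop with a data race on `sum`.
       Compile with an OpenMP-capable compiler, e.g., gcc -fopenmp race.c */
    #include <stdio.h>

    int main(void) {
        int a[1000];
        int sum = 0;

        for (int i = 0; i < 1000; i++)
            a[i] = i;

        #pragma omp parallel for        /* `sum` is shared across all threads */
        for (int i = 0; i < 1000; i++)
            sum += a[i];                /* unsynchronized read-modify-write: data race */

        printf("sum = %d (expected 499500)\n", sum);
        return 0;
    }

Whether a purely dynamic detector observes this race depends on the interleavings that happen to occur during the monitored runs, which is one reason static and verification-based approaches are attractive.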

We developed a software verification technique that requires only a program's source code as input. Implemented in our tool, DRACO, it produces one of three results for a given program: a data race was detected (and definitely exists), the program is guaranteed to be free of data races, or the tool could not determine either of the two cases. In this last case, the remaining verification tasks can be passed to another tool.

We also developed DataRaceBench, a benchmark suite of 126 microbenchmarks for evaluating data-race detection tools, and used it to systematically evaluate existing detection tools alongside DRACO. Our work applies to multithreaded programs using both central processing unit (CPU) and graphics processing unit (GPU) forms of execution. Beyond helping developers understand and debug a program, the methodology complements testing and reduces the number of tests that must be run for a given program.
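
For contrast with the racy loop shown earlier, a minimal race-free variant of the same computation (again an illustrative sketch, not a DataRaceBench kernel or DRACO output) uses an OpenMP reduction clause: each thread accumulates into a private copy of sum, and the copies are combined when the loop ends. This is the kind of program for which a sound verifier can return the "guaranteed data-race free" result.

    /* Illustrative sketch only: race-free counterpart of the earlier loop. */
    #include <stdio.h>

    int main(void) {
        int a[1000];
        int sum = 0;

        for (int i = 0; i < 1000; i++)
            a[i] = i;

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000; i++)
            sum += a[i];                /* each thread updates its own private copy */

        printf("sum = %d\n", sum);      /* always 499500 */
        return 0;
    }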

Impact on Mission

This research addresses NNSA goals of advancing the technological and engineering competencies that are at the foundation of NNSA missions and expands our ability to deal with broad national challenges. This work leveraged and advanced Lawrence Livermore National Laboratory's core competencies in high-performance computing, simulation, and data science by providing technology for analyzing parallel applications.

Publications, Presentations, Etc.

Jasper, M., et al. 2019. "RERS 2019: Combining Synthesis with Real-World Models." Proceedings: Tools and Algorithms for the Construction and Analysis of Systems - 25 Years of TACAS: TOOLympics, ETAPS 2019, Prague, Czech Republic, April 2019. LLNL-CONF-766478.

Liao, C., et al. 2017. "DataRaceBench: A Benchmark Suite for Systematic Evaluation of Data Race Detection Tools." International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, United States, November 2017. LLNL-CONF-728717.

–––. 2018. "A Semantics-Driven Approach to Improving DataRaceBench's OpenMP Standard Coverage." 14th International Workshop on OpenMP 2018, Barcelona, Spain, September 2018. LLNL-CONF-750770.

Lin, P.-H., et al. 2018. "Runtime and Memory Evaluation of Data Race Detection Tools." ISoLA 2018, Limassol, Cyprus, November 2018. LLNL-CONF-750746.

–––. 2019. "Exploring Regression of Data Race Detection Tools Using DataRaceBench." Correctness 2019: Third International Workshop on Software Correctness for HPC Applications, Denver, CO, November 2019. LLNL-CONF-787979.

Ye, F., et al. 2018. "Using Polyhedral Analysis to Verify OpenMP Applications are Data Race Free." Correctness 2018: Second International Workshop on Software Correctness for HPC Applications, Dallas, TX, November 2018. LLNL-CONF-756830.

DataRaceBench (open-source data race detection benchmark suite), https://github.com/LLNL/dataracebench. LLNL-CODE-732144.