Scalable Approximate Graph Clustering

Geoffrey Sanders | 20-FS-037

Project Overview

Spectral clustering is a key algorithm for the exploratory analysis of graph data, but it is known to experience difficulty with (1) massive scales; (2) realistic topologies; and (3) graphs that change over time. The project team developed and analyzed novel vertex embeddings, based on linear sketches rather than eigenvalues and eigenvectors, that address these concerns. The developed technology is encapsulated in the croquis C++ library, which efficiently computes sketches on streaming data in distributed memory using the team's You've Got Mail (YGM) communication library.

Mission Impact

The technology and software developed in this project will be applied to Lawrence Livermore National Laboratory's programmatic efforts. The croquis C++ library will form the basis of several planned analytics and tools to make high-performance computing resources accessible through jupyter notebooks at interactive speeds. The toolkit will also likely be used in subprojects of a workforce optimization project, specifically, to investigate the sketch technology for non-graph clustering applications (e.g., general dimensionality reduction and reducing the cost of large-scale autoencoders) and make use of the graph clustering directly.

Publications, Presentations, and Patents

Dunton, A., et al. 2020. "Scaling Graph Clustering with Sketches." 2020 Lawrence Livermore National Laboratory Summer SLAM, August 2020. LLNL-PRES-812992

Priest, B. W., et al. 2020. "Scaling Graph Clustering with Distributed Sketches." 2020 IEEE High Performance Extreme Computing Conference, HPEC (online), September 2020. LLNL-CONF-812693