Multilevel Markov chain Monte Carlo Sampling Methods for Discrete Graph Problems
Sarah Mackay | 23-FS-008
Project Overview
In the energy reliability space, there is a lack scalable algorithms to generate variations of large, complex network models that have particular characteristics. If such algorithms were available, researchers in cybersecurity and cyber-physical resilience would be poised to make provable claims about risk and vulnerability of large complex networks. To address this need, we designed algorithms that produce network variations according to a probability distribution that favors those with the desired characteristics. Our algorithm is an adaptation of recent work in multilevel Markov chain Monte Carlo (MCMC) methods for continuous, non-network structured problems. Our algorithm represents networks at multiple levels of resolution. We construct a hierarchy of networks of decreasing size, so that each one in the sequence is a coarse approximation of the last. We exploit the coarser graphs to learn approximate solutions, which we then refine to produce networks of the original resolution. Our theoretical results that prove that, given enough time, our new algorithm will produce networks from the correct probability distribution. Our preliminary empirical results show that our algorithm can be used to quickly generate desirable networks that are diverse from one another, compared to classical single level MCMC algorithms which are prone to "get stuck" and repeatedly generate very similar networks.
Mission Impact
We studied penalized graph labeling (PGL) methods, which have particular benefit to cybersecurity and cyber-physical resilience. Asset owners frequently deploy equipment from the same vendor across their system, leading to cybersecurity "monocultures" in which a single vulnerability can be repeatedly exploited to propagate malware throughout the system. PGL mitigates this by designing deployments that minimize same-vendor connections. PGL can also enable detection of near-bipartite subgraphs that indicate fraud or botnets. Additionally, within the Lawrence Livermore National Laboratory (LLNL) Cyber and Infrastructure Resilience program, it is common to simulate many variations of some network model. Effective methods for the full PGL problem class will provide the ability to generate variations in a rigorous manner, leading to provable probabilistic claims about the subsequent risk analysis. In particular, these techniques can be integrated into the existing LLNL software Squirrel, whose current method for generating network variations employs a greedy optimizer, which is subject to bias. Additional problems at LLNL can be viewed in the PGL framework, such as materials discovery and multiscale atomic physics.