Scalable Graph Motif Learning Capabilities for Operational Cyber Defense

Geoffrey Sanders | 19-FS-006

Overview

Securing enterprise-scale computer networks has become a fundamental national security task as institutions responsible for critical infrastructure (energy suppliers, financial institutions, and social media companies) depend heavily on their network services to meet operational goals. Detecting malicious and anomalous network activity is a challenging analysis goal. Approaches that employ hard signatures, i.e., rejecting previously seen bad files, are important but severely limited. Behavioral analytics are useful for providing weak indicators that characterize specific network activity, offering features that flag activity as "of interest" to analysts.

We performed a preliminary study of supervised and unsupervised learning techniques for automatically producing motifs of interest in massive labeled graph datasets, with a specific focus on graphs from cybersecurity applications. The project focused on higher-order topological graph analysis to aid in finding anomalous structure in large cybersecurity graphs that represent the activity of an enterprise network over a significant period of time (a day, week, or month). We developed novel graph learning, motif visualization, feature importance visualization, and modeling techniques to accomplish these tasks. Our efforts yielded technologies to aid in machine learning involving higher-order graph structure, which we then applied to cybersecurity graph datasets. While additional refinements are required to yield "push button" graph learning techniques for this particular type of data, significant preliminary steps were accomplished.
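To make the notion of a motif in a labeled graph concrete, the following is a minimal illustrative sketch and not the project's actual method. It counts one simple higher-order structure, the triangle, and groups the counts by the labels of the three vertices involved. The function name `count_labeled_triangles`, the host labels, and the toy edge list are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def count_labeled_triangles(edges, labels):
    """Count triangles, grouped by the sorted labels of their three vertices.

    edges  -- iterable of (u, v) pairs, treated as undirected
    labels -- dict mapping each vertex to a label
    Vertices are assumed to be mutually comparable (e.g. all ints).
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = Counter()
    for u in adj:
        # Only look at neighbors ordered after u, so each triangle
        # is visited exactly once (from its smallest vertex).
        higher = sorted(w for w in adj[u] if w > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                key = tuple(sorted((labels[u], labels[v], labels[w])))
                counts[key] += 1
    return counts


# Hypothetical toy data: five hosts with made-up roles.
labels = {
    0: "workstation",
    1: "workstation",
    2: "server",
    3: "server",
    4: "external",
}
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 3), (3, 4)]

motif_counts = count_labeled_triangles(edges, labels)
for motif, n in sorted(motif_counts.items()):
    print(motif, n)
```

Counts of this kind could serve as per-graph or per-vertex features for a downstream learner. This brute-force enumeration is only practical for small graphs; the massive datasets described above would need scalable, distributed pattern matching.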

Impact on Mission

Results of this work add to Lawrence Livermore National Laboratory's core capabilities in high-performance computing, simulation, and data science and support the Laboratory's mission in cybersecurity.

Publications, Presentations, Etc.

Lafond, T. and G. Sanders. 2019. "Representing the Evolution of Communities in Dynamic Networks." International Congress on Industrial and Applied Mathematics (ICIAM), Valencia, Spain, July 2019. LLNL-PRES-781381.

Sanders, G. and R. Pearce. 2019. "HPC Graph Pattern Matching." Chesapeake Large-Scale Analytics Conference (CLSAC), Annapolis, MD, October 2019. LLNL-PROP-745284.