Industrial Control Systems Network Mapping and Data Analytics for Cybersecurity
Brian Kelley | 19-ERD-021
Project Overview
The need for new methods to characterize modern Industrial Control System (ICS) communication networks and hosts arises from a confluence of technological evolution and nation-state adversary focus: as our most critical infrastructures transform into advanced cyber-physical systems, they, and thus our society at large, become more vulnerable to abuse. Knowing the composition of ICS networks and their behavior is the first step towards securing them. Common methods for characterizing network hosts are primarily signature- and rule-based; they depend on explicitly defined semantics of communication protocol data and, in some cases, on active interrogation of network hosts. The primary objective of this project was to investigate new Machine Learning (ML) methods for discovering and predicting ICS network and host properties using only passively collected communication network data. This work resulted in new feature representations that capture ICS device behavior and yield useful information about ICS hosts without actively interrogating them and without explicit semantic knowledge of the network communication protocols. One approach is based on IP2Vec, an unsupervised neural network method for learning vector representations of IP addresses. We augmented this method to include network traffic statistics and ICS protocol field values as features, which significantly improves the accuracy of ML classifiers on two learning tasks: distinguishing between ICS and non-ICS network flows, and identifying ICS device manufacturer and model number. Another approach is based on a deep learning sequence-to-sequence (Seq2Seq) model for time-series forecasting. This method accurately predicts ICS network dynamics, which addresses problems of ICS data sparsity, model construction for cyber-physical resilience simulations, and verification of ICS device classification methods.
The potential of these new methods is substantial: they implicitly adapt to new communication protocols, they do not disturb the state of a network, and they can be integrated into existing tools to automate parts of the ICS security assessment process.
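As a rough illustration of how passively collected flow records might be prepared for an IP2Vec-style embedding, the sketch below turns one flow into (input, context) token pairs of the kind a word2vec-like model is trained on. The first five pairs follow the general IP2Vec idea of pairing a source IP address with the other fields of its flow. The `Flow` record, its field names, the `bin_stat` helper, and the power-of-two bucketing of byte and packet counts are all hypothetical choices made for this example; they are not the project's actual feature set or augmentation scheme.

```python
from collections import namedtuple

# Hypothetical flow record; field names are illustrative only.
Flow = namedtuple("Flow", "src_ip dst_ip dst_port proto n_bytes n_packets")


def bin_stat(name, value):
    """Turn a non-negative count into a coarse token based on its bit length.

    Assumed bucketing: values with the same bit length share one token,
    e.g. 1024..2047 all map to '<name>:2^11'.
    """
    return f"{name}:2^{int(value).bit_length()}"


def training_pairs(flow):
    """Return (input_token, context_token) pairs for a single flow."""
    src = f"ip:{flow.src_ip}"
    dst = f"ip:{flow.dst_ip}"
    port = f"port:{flow.dst_port}"
    proto = f"proto:{flow.proto}"

    # IP2Vec-style pairs: the source IP predicts the other flow fields,
    # and the port and protocol each predict the destination IP.
    pairs = [(src, dst), (src, port), (src, proto), (port, dst), (proto, dst)]

    # Assumed augmentation: bucketed traffic statistics added as extra
    # context tokens for the source IP.
    pairs.append((src, bin_stat("bytes", flow.n_bytes)))
    pairs.append((src, bin_stat("pkts", flow.n_packets)))
    return pairs


# Example flow to TCP port 20000, the default DNP3 port.
example = Flow("10.0.0.5", "10.0.0.9", 20000, "tcp", 1500, 12)
example_pairs = training_pairs(example)
```

Pairs like these could then be fed to any skip-gram style trainer, so that hosts with similar communication behavior end up with nearby vectors. No protocol semantics are needed beyond the header fields and counters already visible to a passive sensor.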
Mission Impact
The continuous and proper functioning of our nation's critical infrastructure is key to our national security and society at large, and ICS will only continue to expand and evolve into the cyber domain. Our ML methods for learning the behavior of ICS networks and hosts represent a new Laboratory capability that contributes to our nation's ability to secure critical infrastructure from cyberattack by providing relevant ICS device characterization services for constructing asset lists, assessing the security posture of ICS devices, assisting network analysts, and enhancing cyber-physical resilience simulations. This project supports the high-performance computing, simulation, and data science core competency, and addresses the Laboratory's cybersecurity and cyber-physical resilience mission research challenge by focusing specifically on the areas of cyber-physical characterization and characterizing complex network system behavior. Execution of this project has enhanced the Laboratory's workforce through the hiring of new staff and the education of the workforce on ICS security. As an outcome of this project, new engagements with federal agencies have already begun, and collaborations leveraging our ICS device characterization technology are being developed.
Publications, Presentations, and Patents
Merl, D., et al. 2019. Towards Machine Learning Enabled Mapping Services for Industrial Control Systems. LLNL. LLNL-TR-795524.
Reed, E. D., et al. 2019. Device Identification in Industrial Control Systems. LLNL. LLNL-POST-785247.
Kelley, B. M., et al. 2020a. Machine Learning for Mapping Industrial Control Systems: New Discovery Methods for Infrastructure Cybersecurity. LLNL. LLNL-POST-806197.
Kelley, B. M., et al. 2020b. Industrial Control System Mapping and Data Analytics for Cybersecurity. LLNL. LLNL-PRES-817438.
Kelley, B. M., et al. 2021a. A Neural Network Embedding of Distributed Network Protocol version 3 (DNP3) Features for Classification of Industrial Control System (ICS) Devices. LLNL. LLNL-IL-13660.
Chakraborty, I., et al. 2021a. "Industrial Control System Device Classification using Network Traffic Features and Neural Network Embeddings." To appear in Array. LLNL-JRNL-819535.
Chakraborty, I., et al. 2021b. Device Classification for Industrial Control Systems using Predicted Traffic Features. LLNL. LLNL-JRNL-825779.
Kelley, B. M., et al. 2021b. Real-Time Classification of Industrial Control System Devices. LLNL. LLNL-TR-827318.
Chakraborty, I., et al. 2021c. Generative Model based Traffic Feature Prediction. LLNL. LLNL-IL-13697.