Machine Learning-driven Mass Spectrometric Methods for the Screening of Emergent Threats
Carolyn Fisher | 21-FS-038
Project Overview
The detection of novel threat agents presents several challenges. A principal one is developing methods that can screen generally for a rapidly growing number of threat chemicals whose exact structures are unknown. Machine learning (ML) tools can guide the development of analytical methods for agile, broad-spectrum detection of unbounded threat chemical families in complex mixtures. This project examined the feasibility of using ML tools to drive the creation of new data-driven spectrometric methods for binary classification of unknown chemicals. The specific classification question for this project was whether a chemical belongs to the "fentanyl" or the "non-fentanyl" group. The overarching goal was to develop an ML model that classifies mass spectral (MS) data as "fentanyl" or "non-fentanyl" with high accuracy (at least 90%). The resulting workflow includes automated data acquisition and processing, ML tools, and subsequent method development.
Our team, which included members of the Forensic Science Center and the Center for Applied Scientific Computing, collected MS data and used ML techniques to develop a binary classification model that distinguishes pure chemical standards of "fentanyls" (n = 250) from "non-fentanyl" chemicals (n = 440; pesticides served as the "non-fentanyl" class). MS data obtained with both nominal mass and high-resolution techniques yielded random forest (RF) models that classified fentanyl and non-fentanyl chemicals with 97+% accuracy under five-fold cross-validation. These RF models were further validated by classifying a group of five fentanyls and five non-fentanyls with 100% accuracy. We are currently preparing a manuscript to publish the RF models developed for the nominal mass and high-resolution MS data. This investment lays an initial foundation on which we envision building a complete end-to-end series of ML-based analytical tools to enhance forensic chemists' sample exploitation capabilities. The next step for this work would be to train on more "real world" forensic data (e.g., data acquired at only a single collision energy, samples with higher background signal, or samples with a lower concentration of the analyte of interest) to create an RF model that can then be applied to uncharacterized forensic data of interest. A DARPA program manager asked us to submit a white paper detailing our path forward for using ML in forensic chemistry applications, and we see this as a possible avenue for future funding.
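The evaluation described above, a random forest scored with five-fold cross-validation on two classes of spectra, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the spectra are randomly generated, the marker m/z values are arbitrary placeholders rather than real diagnostic ions, and the unit-mass binning, the 500-bin range, and the names bin_spectrum and synthetic_spectrum are hypothetical choices made for this example. Only the class sizes (250 and 440) come from the text. The classifier and cross-validation calls are from scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

N_BINS = 500  # assumption: unit-mass bins covering m/z 1..500


def bin_spectrum(peaks, n_bins=N_BINS):
    """Convert (m/z, intensity) peaks into a fixed-length vector scaled to the base peak."""
    vec = np.zeros(n_bins)
    for mz, intensity in peaks:
        idx = int(round(mz)) - 1  # nominal mass -> zero-based bin index
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    top = vec.max()
    return vec / top if top > 0 else vec


def synthetic_spectrum(rng, marker_mzs, n_noise=15):
    """Fabricated spectrum: a few class-specific marker peaks plus random noise peaks."""
    peaks = [(mz, rng.uniform(50, 100)) for mz in marker_mzs]
    peaks += [(rng.uniform(40, 500), rng.uniform(1, 30)) for _ in range(n_noise)]
    return peaks


rng = np.random.default_rng(0)
n_fentanyl, n_other = 250, 440  # class sizes quoted in the text

# Marker m/z values below are arbitrary placeholders, not real fragment ions.
X = np.array(
    [bin_spectrum(synthetic_spectrum(rng, (120, 250))) for _ in range(n_fentanyl)]
    + [bin_spectrum(synthetic_spectrum(rng, (90, 310))) for _ in range(n_other)]
)
y = np.array([1] * n_fentanyl + [0] * n_other)  # 1 = "fentanyl", 0 = "non-fentanyl"

model = RandomForestClassifier(n_estimators=100, random_state=0)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")

print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```

Stratified folds keep the roughly 36:64 class ratio in every split, which matters when the two classes are unequal in size. Accuracy on clean synthetic data like this says nothing about performance on real spectra; the sketch only shows the shape of the evaluation.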
Mission Impact
This project supports Lawrence Livermore National Laboratory's Mission Research Challenges in Forensic Science by improving and expediting scientists' ability to respond to chemical threat emergencies and to deter and defend against chemical threats in multiple domains. Fentanyls are among the highest-priority chemical threat categories in real-world sample analyses, and the ML methodologies developed in this project can be applied to the classification of unknown samples in forensic analyses. The workflow is a generalized, broadly applicable approach to screening for and identifying new classes of threat compounds, including emergent chemical warfare agents, novel biotoxins, explosives, and biologically derived material (e.g., peptides). Overall, this work supports the NNSA mission to develop science and technology tools and capabilities to meet future national security challenges.