Project Rosetta Stone

Nathan Smith | 23-FS-014

Project Overview

Identifying software vulnerabilities can be a laborious process that drains time and money. To reduce the time spent on manual static analysis, researchers often use a class of dynamic analysis called fuzz-testing, in which the application is run with varied inputs to expose unintended or vulnerable behavior in the code. While effective, fuzz-testing is limited by the quality of the initial input corpus the fuzzer uses and of the code harness that tells the fuzzer how to communicate with the application. These limitations become especially clear when targeting applications whose source code is not available, where the researcher can spend more time developing a better input corpus than finding vulnerabilities.

For this reason, we propose a new technique called Input Context Gathering (ICG). ICG combines taint, reaching-definition, concolic, and symbolic analysis, implemented with the angr software framework, to build a profile for each input type. These profiles help recover information about an input, such as how it is read in, what types it contains, how it is parsed, and where it is stored. Project Rosetta Stone (PRS) then produces a grammar from this information and provides an interface for generating an initial input corpus for fuzz-testing. To test the efficiency improvements, we constrained the targets to custom, pre-compiled Linux ELF-format executables that expect a simple structured input from a configuration file, standard input, a socket, or the command line. For each input type, there were at least three variations in how the input was parsed. Once an input is successfully parsed, the program deliberately crashes to signal that the fuzzer has found the path we wanted it to identify. Each program was then fuzz-tested with AFL++ twice, once with PRS and once without, and the run time until the first crash was compared.
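The last step of that pipeline, turning a recovered grammar into seed files for a fuzzer, can be illustrated with a minimal sketch. This is not the PRS implementation: the grammar format, the `key=value` example rules, and the function names `expand`, `generate_corpus`, and `write_corpus` are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from pathlib import Path

# Hypothetical grammar for a one-line "key=value" configuration input.
# The format (nonterminal -> list of alternatives) is an assumption for
# this sketch, not the grammar output that PRS actually produces.
GRAMMAR = {
    "<input>": [["<key>", "=", "<value>", "\n"]],
    "<key>": [["mode"], ["port"], ["name"]],
    "<value>": [["<digit>"], ["<digit>", "<value>"]],
    "<digit>": [[d] for d in "0123456789"],
}


def expand(symbol, grammar, rng, depth=0, max_depth=8):
    """Recursively expand a symbol into a concrete string."""
    if symbol not in grammar:
        return symbol  # terminal: emit as-is
    alternatives = grammar[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest alternative so that
        # recursive rules such as <value> are forced to terminate.
        alternatives = [min(alternatives, key=len)]
    choice = rng.choice(alternatives)
    return "".join(
        expand(part, grammar, rng, depth + 1, max_depth) for part in choice
    )


def generate_corpus(grammar, count, seed=0, start="<input>"):
    """Return up to `count` distinct inputs derived from the grammar."""
    rng = random.Random(seed)  # fixed seed keeps the corpus reproducible
    seeds = set()
    attempts = 0
    while len(seeds) < count and attempts < count * 100:
        seeds.add(expand(start, grammar, rng))
        attempts += 1
    return sorted(seeds)


def write_corpus(seeds, out_dir):
    """Write each input to its own file, one seed per file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, seed in enumerate(seeds):
        path = out / f"seed_{index:03d}"
        path.write_bytes(seed.encode("utf-8"))
        paths.append(path)
    return paths
```

A directory written this way could then be handed to AFL++ as its input directory through the `-i` option of `afl-fuzz`. Because every seed already satisfies the expected structure, the fuzzer starts from inputs that pass the parser instead of having to discover the format by mutation alone.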

PRS demonstrated the ability to recover the expected input structure for the binaries in our test corpus across all four input types: configuration file, standard input, socket input, and command-line arguments. Using this information, it substantially improved fuzzing performance on our test corpus: some tests reached the first crash in less than 10 seconds, whereas without PRS they would take from hours to a couple of days.

Mission Impact

Our work aligns with initiatives targeting climate and energy security and with the lab's mission in Integrated Deterrence and Technology Competition, contributing to improved cross-domain technology analysis. Project Rosetta Stone (PRS) supports these objectives by providing a methodological advance in identifying vulnerabilities in pre-compiled software and by shortening the time such identification requires. PRS is foundational for extending fuzz-testing efforts in Global Security's E-program, addressing both software and firmware. Moreover, ICG, the input recovery capability of PRS, is a valuable technique for malware analysts who need to rapidly triage threats from pre-compiled applications. Several projects within the software assurance space at Lawrence Livermore National Laboratory have expressed interest in using the work. While PRS does not have its own follow-on project, the research will be continued and integrated into software assurance efforts at the lab in the new fiscal year; these efforts are funded by the Department of Energy and the Department of Homeland Security.

Publications, Presentations, and Patents

Nathan K. Smith, "Contextual Analysis: Project Rosetta Stone" (Presentation, Cybersecurity Summer Institute, Livermore, CA, July 2023).