David Brumley’s research while affiliated with Carnegie Mellon University and other places


Publications (63)


Static Analysis
  • Chapter

January 2025

David Brumley

The Mayhem Cyber Reasoning System

March 2018 · 717 Reads · 34 Citations

IEEE Security and Privacy Magazine

Thanassis Avgerinos · David Brumley · John Davis · [...] · Ned Williamson

Mayhem is one of the first generation of autonomous computer security bots that finds and fixes vulnerabilities without any human intervention. Mayhem won the DARPA Cyber Grand Challenge (CGC) contest and $2,000,000 in August 2016 against six other finalists. The contest was the result of a two-year DARPA program, but the R&D necessary to compete stands on the shoulders of decades of basic academic and industry scientific research in program analysis, verification, and self-healing systems. The Mayhem system alone was developed over a decade of research in academia, which was spun out to a company called ForAllSecure. Mayhem is now being commercialized by ForAllSecure to autonomously check and protect the world’s software from exploitable bugs. In this article, we look back and give our story in creating Mayhem, and also look forward to a vision where autonomous security bots like Mayhem will radically improve the security of computer systems.




Scaling Up DPLL(T) String Solvers Using Context-Dependent Simplification

July 2017 · 35 Reads · 92 Citations

Lecture Notes in Computer Science

Efficient reasoning about strings is essential to a growing number of security and verification applications. We describe satisfiability checking techniques in an extended theory of strings that includes operators commonly occurring in these applications, such as contains, index_of and replace. We introduce a novel context-dependent simplification technique that improves the scalability of string solvers on challenging constraints coming from real-world problems. Our evaluation shows that an implementation of these techniques in the SMT solver cvc4 significantly outperforms state-of-the-art string solvers on benchmarks generated using PyEx, a symbolic execution engine for Python programs. Using a test suite sampled from four popular Python packages, we show that PyEx, when coupled with cvc4, uses only 41% of the runtime it needs when coupled with cvc4’s closest competitor, while achieving comparable program coverage.





Automatically deriving pointer reference expressions from binary code for memory dump analysis

August 2015 · 32 Reads · 10 Citations

Given a crash dump or a kernel memory snapshot, it is often desirable to be able to traverse its pointers to locate the root cause of the crash, or to check their integrity to detect control-flow hijacks. One key challenge is locating the pointers. While locating a pointer usually requires knowledge of the corresponding program's data structures, an important advance made by this work is a technique for extracting address-independent data reference expressions for pointers through dynamic binary analysis. This novel pointer reference expression encodes how a pointer is accessed through the combination of a base address (usually a global variable) with a certain offset and further pointer dereferences. We have applied our techniques to OS kernels, and our experimental results with a number of real-world kernel malware samples show that, given only a memory snapshot, we can correctly identify hijacked kernel function pointers by locating them with the extracted pointer reference expressions.


Figure 2: Submission Patterns Before and After a Flag Disclosure
Automatic Problem Generation for Capture-the-Flag Competitions
  • Conference Paper
  • Full-text available

August 2015 · 1,462 Reads · 35 Citations

Computer security games, especially capture-the-flag (CTF) competitions, are growing in popularity. A typical CTF contest presents users with a set of hacking challenges, where correct solutions reveal a text "flag" that can be submitted to a scoring server. In traditional CTF architectures, the problem and the flag are the same across the competition. In this paper we discuss automatic problem generation (APG), where a given challenge is not fixed, but rather can have many different automatically generated problem instances. APG offers players a unique competition experience and can facilitate deliberate practice where problems vary just enough to make sure a user can replicate the solution idea. APG also lets competition administrators detect when users submit a flag copied from another user to the scoring server. In 2014 we ran a large-scale CTF competition called PicoCTF, where we measured the prevalence of flag sharing. Our results indicate that about 0.8% of flags submitted to APG problems were copied, with 14% of teams submitting at least one shared flag. In 68% of flag sharing cases, teams went on to eventually solve the problem on their own.


Citations (59)


... However, without an appropriate mutation strategy or runtime feedback mechanism, the probability of triggering such hidden interfaces is extremely low. Taint analysis is another type of popular solution to finding bugs in IoT [8], [31], [19], [10], [9], [43], [17]. But it can neither find hidden interfaces, since we can hardly define the taint source or taint sink related to hidden interfaces. ...

Reference:

EAGLEYE: Exposing Hidden Web Interfaces in IoT Devices via Routing Analysis
Saluki: Finding Taint-style Vulnerabilities with Static Property Checking
  • Citing Conference Paper
  • January 2018

... It has been tested on practice machines or challenges such as those found in VulnHub, HackTheBox or TryHackMe. In its current iteration, version 0.8 released on May 12, 2023, installation includes setting cookies to simulate a browser session. Its output requires entering it into the terminal and so does the input that it takes from the result of the previously executed command. ...

The Mayhem Cyber Reasoning System
  • Citing Article
  • March 2018

IEEE Security and Privacy Magazine

... Equation (19) expresses the optimal probability p for the adversary A launching an attack by looking for steganography. The numerator, (B A leak − C A look ), represents the net benefit to A after accounting for the cost of looking for hidden information (C A look ). ...

How Shall We Play a Game?: A Game-theoretical Model for Cyber-warfare Games
  • Citing Conference Paper
  • August 2017

... We also consider the PyEx [37] benchmark, which we do not put into any of these groups, as it contains large formulae with complex predicates (substr, contains, etc.). We note that we omit the small Transducer+ [18] benchmark because it contains exclusively formulae with replace all. ...

Scaling Up DPLL(T) String Solvers Using Context-Dependent Simplification
  • Citing Conference Paper
  • July 2017

Lecture Notes in Computer Science

... Code injection was a mainstream method for exploiting memory-safety issues (e.g., buffer overflow) decades ago [70], [67], [16], [83], [57]. Attackers place malicious payloads, called shellcode, in controllable memory regions and corrupt control data (e.g., return addresses and function pointers) to divert the control flow and execute the shellcode. ...

Your Exploit is Mine: Automatic Shellcode Transplant for Remote Exploits
  • Citing Conference Paper
  • May 2017

... Strategy 1 can only solve the issue of html and js file emulation. For web interface files that need the support of certain modules, like asp and php, we decide to take advantage of tool Firmadyne proposed by Chen et al. [16]. Firmadyne provides a system-level emulation given a firmware. ...

Towards Automated Dynamic Analysis for Linux-based Embedded Firmware
  • Citing Conference Paper
  • January 2016

... However, it is time-consuming and expensive to learn hands-on cybersecurity through the simulation-based training environment built by a CTF system, e.g., Cyber Range. Moreover, the CTF-based method is mainly aimed at experienced rather than beginner players (Burket et al., 2015; Deljkic et al., 2019; Maki et al., 2020; Yamin & Katt, 2022). ...

Automatic Problem Generation for Capture-the-Flag Competitions

... Gu et al. (2006) introduced a concept called "vulnerability-specific predicates" to generalise attack signatures, potentially catching variants of known attacks. (Brumley et al., 2006) proposed automatic signature generation techniques to rapidly respond to new threats. ...

Towards automatic generation of vulnerability based signatures
  • Citing Article
  • January 2006

... Similarly, Matryoshka [39] homes in on conditional statements pertinent to the target branch, streamlining the analysis process. Another approach is offered by MergePoint [40], which alleviates performance burdens by alternating between dynamic and static symbolic execution strategies. Furthermore, Eclipser [41] selectively focuses on a limited set of comparison instructions to formulate approximate path constraints, contributing to a more efficient execution analysis. ...

Enhancing Symbolic Execution with Veritesting
  • Citing Article
  • May 2016

Communications of the ACM

... Investigators collect RAM dumps from the digital crime scene; they apply their technical skills to capture, analyze, and identify potential digital evidence, often called digital artifacts. Simply, a memory dump is a snapshot, which is technically a bit-by-bit copy of the complete or part of the RAM of the used machine [1]; for example, it can be a core dump for the whole RAM or just a core dump for a specific process that is running in the captured machine [2], [3]. This type of investigation is becoming more common as the number of cybercrimes increases. ...

Automatically deriving pointer reference expressions from binary code for memory dump analysis
  • Citing Conference Paper
  • August 2015