Conference Paper

LibAFL: A Framework to Build Modular and Reusable Fuzzers


... However, VRust suffers from several limitations: 1) it strictly requires source code to conduct analyses, 2) it suffers from a high number of false alarms, and 3) it does not provide an analyst with enough data to (re-)construct exploit transactions. In contrast, fuzzing is a technique that does not suffer from any of these limitations [3,15,16,46]. The fuzzing input given to the analysis target can also usually be crafted into exploit transactions. ...
... A popular approach to uncover bugs is coverage-guided fuzzing [15,16,53,61]. This technique mutates the inputs based on instruction-coverage data, or feedback information, collected during the target's execution, to uncover new paths in the application. ...
... In this section, we detail the implementation of FuzzDelSol. FuzzDelSol uses the state-of-the-art LibAFL [16] fuzzer. LibAFL's design allows FuzzDelSol to include its own feedback mechanism in the fuzzing mutation. ...
Preprint
Solana has quickly emerged as a popular platform for building decentralized applications (DApps), such as marketplaces for non-fungible tokens (NFTs). Key reasons for its success are Solana's low transaction fees and high performance, which are achieved in part through its stateless programming model. Although the literature features extensive tooling support for smart contract security, current solutions are largely tailored for the Ethereum Virtual Machine. Unfortunately, the very stateless nature of Solana's execution environment introduces novel attack patterns specific to Solana, requiring a rethinking of how vulnerability analysis methods are built. In this paper, we address this gap and propose FuzzDelSol, the first binary-only coverage-guided fuzzing architecture for Solana smart contracts. FuzzDelSol faithfully models runtime specifics such as smart contract interactions. Moreover, since source code is not available for the large majority of Solana contracts, FuzzDelSol operates on the contract's binary code. Hence, due to the lack of semantic information, we carefully extracted low-level program and state information to develop a diverse set of bug oracles covering all major bug classes in Solana. Our extensive evaluation on 6049 smart contracts shows that FuzzDelSol's bug oracles find bugs with high precision and recall. To the best of our knowledge, this is the largest evaluation of the security landscape on the Solana mainnet.
... If VulnLoc were written in C/C++, VulnLoc would be able to run faster. In the fuzzing field, some frameworks have already been proposed so that different algorithms can be implemented in a uniform way to solve such a problem [25], [26]. For example, LibAFL [25] is a framework for building fuzzers in a modular manner. ...
... In the fuzzing field, some frameworks have already been proposed so that different algorithms can be implemented in a uniform way to solve such a problem [25], [26]. For example, LibAFL [25] is a framework for building fuzzers in a modular manner. LibAFL can reduce the cost of combining multiple fuzzing algorithms into a single fuzzer and can fairly and objectively evaluate the algorithms within the common implementation. ...
Preprint
Fuzzing has contributed to automatically identifying bugs and vulnerabilities in the software testing field. Although it can efficiently generate crashing inputs, these inputs are usually analyzed manually. Several root cause analysis (RCA) techniques have been proposed to automatically analyze the root causes of crashes to mitigate this cost. However, outstanding challenges for realizing more elaborate RCA techniques remain unknown owing to the lack of extensive evaluation methods over existing techniques. With this problem in mind, we developed an end-to-end benchmarking platform, RCABench, that can evaluate RCA techniques for various targeted programs in a detailed and comprehensive manner. Our experiments with RCABench indicated that the evaluations in previous studies were not enough to fully support their claims. Moreover, this platform can be leveraged to evaluate emerging RCA techniques by comparing them with existing techniques.
... Since every fuzzing task is unique, fuzzers usually need to be adapted to the task given. For example, LibAFL offers good customization options and makes it easier to adapt a fuzzer to a specific task [18]. Fuzzing can also be applied directly on embedded devices using hardware-in-the-loop testing techniques [19,20]. ...
Preprint
Full-text available
Connected Medical Devices (CMDs) have a large impact on patients as they allow them to lead a more normal life. Any malfunction could not only remove the health benefits the CMDs provide but could also cause further harm to the patient. Due to this, there are many safety regulations which must be adhered to prior to a CMD entering the market. However, while many detailed safety regulations exist, there is a fundamental lack of cybersecurity frameworks applicable to CMDs. While there are recent regulations which aim to enforce cybersecurity practices, they are vague and do not contain the concrete steps necessary to implement cybersecurity. This paper aims to fill that gap by describing a framework, CyMed, to be used by vendors and end users, which contains concrete measures to improve the resilience of CMDs against cyberattacks. The CyMed framework is subsequently evaluated based on practical tests as well as expert interviews.
... AFL++ [12] combines other open source fuzzers into a new fuzzer, which contains a variety of novel improvements. LibAFL [13] deconstructs AFL into modules to integrate orthogonal techniques. ...
Preprint
Traditional coverage grey-box fuzzers perform a breadth-first search of the state space of the Program Under Test (PUT). This aimlessness wastes a lot of computing resources. Directed grey-box fuzzing focuses on targets within the PUT and has become one of the most popular topics in software testing. The early termination of unreachable test cases is a method to improve directed grey-box fuzzing. However, existing solutions have two problems: firstly, reachability analysis needs to introduce extra technologies (e.g., static analysis); secondly, the performance of reachability analysis and auxiliary technologies lacks versatility. We propose FGo, a probabilistic exponential cut-the-loss directed grey-box fuzzer. FGo terminates unreachable test cases early with exponentially increasing probability. Compared to other technologies, FGo makes full use of the unreachable information contained in the iCFG and doesn't generate any additional overhead caused by reachability analysis. Moreover, it is easy to generalize to all PUTs. This probability-based strategy is well adapted to the randomness of fuzzing. The experiment results show that FGo is 106% faster than AFLGo in reproducing crashes. We compare multiple parameters of the probabilistic exponential cut-the-loss algorithm and analyze them in detail. In addition, to enhance the interpretability of FGo, this paper discusses the difference between the theoretical performance and the practical performance of the probabilistic exponential cut-the-loss algorithm.
... To implement the snapshot-based fuzzing algorithm and both the dataflow and comparison waypoints, we build ItyFuzz from scratch. We use LibAFL [7] as a backbone and implement a separate state corpus to support snapshotting the states. We also incorporate the dataflow and comparison waypoints into ItyFuzz using customized feedback. ...
Preprint
Full-text available
Smart contracts are critical financial instruments, and their security is of utmost importance. However, smart contract programs are difficult to fuzz due to the persistent blockchain state behind all transactions. Mutating sequences of transactions is complex and often leads to a suboptimal exploration of both the input and program spaces. In this paper, we introduce a novel snapshot-based fuzzer, ItyFuzz, for testing smart contracts. In ItyFuzz, instead of storing sequences of transactions and mutating from them, we snapshot states and singleton transactions. To explore interesting states, ItyFuzz introduces a dataflow waypoint mechanism to identify states with more potential momentum. ItyFuzz also incorporates comparison waypoints to prune the space of states. By maintaining snapshots of the states, ItyFuzz can synthesize concrete exploits like reentrancy attacks quickly. Because ItyFuzz has second-level response time to test a smart contract, it can be used for on-chain testing, which has many benefits compared to local development testing. Finally, we evaluate ItyFuzz on real-world smart contracts and some hacked on-chain DeFi projects. ItyFuzz outperforms existing fuzzers in terms of instructional coverage and can find and generate realistic exploits for on-chain projects quickly.
... In recent years, grey-box or mutation-based fuzzing has shown its effectiveness in memory error bug detection [28][29][30][31][32][33][34][35][36][37]. AFL [6], which is an important milestone in the area of software security testing [38], has been applied to DBMS fuzzing. However, fuzzers like AFL perform poorly in generating structural inputs, such as SQL statements. ...
Article
Full-text available
Database Management Systems (DBMSs) are the core of management information systems. Thus, detecting security bugs or vulnerabilities of DBMSs is an essential task. In recent years, grey-box fuzzing has been adopted to detect DBMS bugs for its high effectiveness. However, the seed scheduling strategy of existing fuzzing techniques does not consider the seeds’ correctness, which is inefficient in finding vulnerabilities in DBMSs. Moreover, current tools cannot correctly generate SQL statements with nested structures, which limits their effectiveness. This paper proposes a fuzzing solution named Squill to address these challenges. First, we propose correctness-guided mutation to utilize the correctness of seeds as feedback to guide fuzzing. Second, Squill embeds semantics-aware instantiation to correctly fill semantics to SQL statements with nested structures by collecting the context information of AST nodes. We implemented Squill based on Squirrel and evaluated it on three popular DBMSs: MySQL, MariaDB, and OceanBase. In our experiment, Squill explored 29% more paths and found 3.4× more bugs than the existing tool. In total, Squill detected 30 bugs in MySQL, 27 in MariaDB, and 6 in OceanBase. Overall, 19 of the bugs are fixed with 9 CVEs assigned. The results show that Squill outperforms the previous fuzzer in terms of both code coverage and bug discovery.
... These mechanisms allow a fuzzer to keep only those testcases that result in new coverage, leading the fuzzing campaign to go deeper in the application code, thus increasing the chances of eventually reaching the location of a bug. This simple idea is at the base of many modern fuzz testing projects, such as AFL++ [18], LibAFL [19], and libfuzzer [27], even though, as we will describe in this paper, the actual implementation is project-specific and can have a relevant impact on the performance of the fuzzer. ...
Preprint
Full-text available
AFL is one of the most used and extended fuzzers, adopted by industry and academic researchers alike. While the community agrees on AFL's effectiveness at discovering new vulnerabilities and on its outstanding usability, many of its internal design choices remain untested to date. Security practitioners often clone the project "as-is" and use it as a starting point to develop new techniques, usually taking everything under the hood for granted. Instead, we believe that a careful analysis of the different parameters could help modern fuzzers to improve their performance and explain how each choice can affect the outcome of security testing, either negatively or positively. The goal of this paper is to provide a comprehensive understanding of the internal mechanisms of AFL by performing experiments and by comparing different metrics used to evaluate fuzzers. This can help to show the effectiveness of some techniques and to clarify which aspects are instead outdated. For our study, we carried out nine unique experiments on the popular FuzzBench platform. Each test focuses on a different aspect of AFL, ranging from its mutation approach to the feedback encoding scheme and its scheduling methodologies. Our findings show that each design choice affects different factors of AFL. While some of these are positively correlated with the number of detected bugs or the coverage of the target application, other features are instead related to usability and reliability. Most importantly, we believe that the outcome of our experiments indicates which parts of AFL we should preserve in the design of modern fuzzers.
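The coverage-feedback idea described in the snippet above (keep only those testcases that result in new coverage) can be illustrated with a minimal sketch. This is not code from AFL, AFL++, LibAFL, or libFuzzer; the toy `target`, the single-byte `mutate`, and the `fuzz` loop are hypothetical names chosen for illustration, and coverage is modelled as a plain set of branch IDs rather than a real edge bitmap.

```python
import random


def target(data: bytes) -> set:
    """Toy program under test: returns the IDs of the branches it executed."""
    covered = {0}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            covered.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add(3)
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, or append one byte if the input is empty."""
    buf = bytearray(data)
    if not buf:
        buf.append(rng.randrange(256))
    else:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seeds, iterations, seed=0):
    """Keep a mutated input only if it reaches a branch not seen before."""
    rng = random.Random(seed)
    corpus = list(seeds)
    global_coverage = set()
    for entry in corpus:
        global_coverage |= target(entry)
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = mutate(parent, rng)
        new_branches = target(child) - global_coverage
        if new_branches:  # the feedback decision: new coverage, so retain
            corpus.append(child)
            global_coverage |= new_branches
    return corpus, global_coverage


if __name__ == "__main__":
    corpus, coverage = fuzz([b"AAA"], iterations=20000)
    print(len(corpus), sorted(coverage))
```

Because retained inputs become parents for later mutations, the loop can reach the nested branches one byte at a time instead of having to guess all three bytes at once. How coverage is encoded and how parents are scheduled are the project-specific choices the snippet refers to.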
Article
Agile hardware development methodology has been widely adopted over the past decade. Despite the research progress, the industry still doubts its applicability, especially for the functional verification of complicated processor chips. Functional verification commonly employs a simulation-based method of co-simulating the design under test with a reference model and checking the consistency of their outcomes given the same input stimuli. We observe limited collaboration and information exchange through the design and verification processes, dramatically leading to inefficiencies when applying the conventional functional verification workflow to agile development. In this paper, we propose workflow integration with collaborative task delegation and dynamic information exchange as the design principles to effectively address the challenges on functional verification under the agile development model. Based on workflow integration, we enhance the functional verification workflows with a series of novel methodologies and toolchains. The diff-rule based agile verification methodology (DRAV) reduces the overhead of building reference models with runtime execution information from designs under test. We present the RISC-V implementation for DRAV, DiffTest, which adopts information probes to extract internal design behaviors for co-simulation and debugging. It further integrates two plugins, namely XFUZZ for effective test generation guided by design coverage metrics and LightSSS for efficient fault analysis triggered by co-simulation mismatches. We present the integrated workflows for agile hardware development and demonstrate their effectiveness in designing and verifying RISC-V processors with 33 functional bugs found in NutShell. We also illustrate the efficiency of the proposed toolchains with a case study on a functional bug in the L2 cache of XiangShan.
Article
Fuzzing has become an important method for finding vulnerabilities in software. For fuzzing programs expecting structural inputs, syntactic- and semantic-aware fuzzing approaches have been particularly proposed. However, they still cannot fuzz in-memory data stores sufficiently, since some code paths are only executed when the required data are available. In this article, we propose a data-aware fuzzing method, DAFuzz, which is designed by considering the data used during fuzzing. Specifically, to ensure different data-sensitive code paths are exercised, DAFuzz first loads different kinds of data into the stores before feeding fuzzing inputs. Then, when generating inputs, DAFuzz ensures the generated inputs are not only syntactically and semantically valid but also use the data correctly. We implement a prototype of DAFuzz based on Superion and use it to fuzz Redis and Memcached. Experiments show that DAFuzz covers 13% to 95% more edges than AFL, Superion, AFL++, and AFLNet, and discovers vulnerabilities over 2.7× faster. In total, we discovered four new vulnerabilities in Redis and Memcached. All the vulnerabilities were reported to developers and have been acknowledged and fixed.
Chapter
Fuzz testing, which repeatedly executes a given program with auto-generated random inputs, and records its dynamic control flow, aims to discover sources of unexpected program behavior impacting security, which can then be fixed more easily by directed developer effort. When targeting IoT devices, fuzzing faces the problem that the small IoT processors often lack the observability required for fuzzing, e.g., a high-performance trace port, while software-emulation on a faster host CPU is often slow, and compilation of the IoT application to a different ISA for faster native execution on the host introduces inaccuracies in the fuzzing process. To overcome all three of these drawbacks for RISC-V-based IoT processors, which are expected to dominate future IoT applications with their lack of ISA licensing costs, we modify an open-source RISC-V core for use in an FPGA-based hardware-accelerated fuzzing system. Our fuzzer has demonstrated up to four times the performance of the state-of-the-art QEMU-based fuzzing tool AFL++, even when running on very fast x86 host processors clocked at 4.95 GHz.
Keywords: Security, Fuzzing, LibAFL, TaPaSCo, RISC-V, Coverage
Conference Paper
Full-text available
Recent years have witnessed a wide array of results in software testing, exploring different approaches and methodologies ranging from fuzzers to symbolic engines, with a full spectrum of instances in between such as concolic execution and hybrid fuzzing. A key ingredient of many of these tools is Satisfiability Modulo Theories (SMT) solvers, which are used to reason over symbolic expressions collected during the analysis. In this paper, we investigate whether techniques borrowed from the fuzzing domain can be applied to check whether symbolic formulas are satisfiable in the context of concolic and hybrid fuzzing engines, providing a viable alternative to classic SMT solving techniques. We devise a new approximate solver, FUZZY-SAT, and show that it is both competitive with and complementary to state-of-the-art solvers such as Z3 with respect to handling queries generated by hybrid fuzzers.
Conference Paper
Full-text available
Fuzzing technologies have evolved at a fast pace in recent years, revealing bugs in programs with ever increasing depth and speed. Applications working with complex formats are however more difficult to take on, as inputs need to meet certain format-specific characteristics to get through the initial parsing stage and reach deeper behaviors of the program. Unlike prior proposals based on manually written format specifications, in this paper we present a technique to automatically generate and mutate inputs for unknown chunk-based binary formats. We propose a technique to identify dependencies between input bytes and comparison instructions, and later use them to assign tags that characterize the processing logic of the program. Tags become the building block for structure-aware mutations involving chunks and fields of the input. We show that our technique performs comparably to structure-aware fuzzing proposals that require human assistance. Our prototype implementation WEIZZ revealed 16 unknown bugs in widely used programs.
Conference Paper
Full-text available
Fuzz testing techniques are becoming pervasive for their ever-improving ability to generate crashing trial cases for programs. Memory safety violations, however, can lead to silent corruptions and errors, and a fuzzer may recognize them only in the presence of sanitization machinery. For closed-source software, combining sanitization with fuzzing incurs practical obstacles that we try to tackle with an architecture-independent proposal called QASan for detecting heap memory violations. In our tests, QASan is competitive with standalone sanitizers and adds a moderate 1.61x average slowdown to the AFL++ fuzzer while enabling it to reveal more heap-related bugs.
Conference Paper
Full-text available
In this paper, we present AFL++, a community-driven open-source tool that incorporates state-of-the-art fuzzing research, to make the research comparable, reproducible, combinable and, most importantly, usable. It offers a variety of novel features, for example its Custom Mutator API, able to extend the fuzzing process at many stages. With it, mutators for specific targets can also be written by experienced security testers. We hope for AFL++ to become a new baseline tool not only for current, but also for future research, as it allows researchers to test new techniques quickly, and to evaluate not only the effectiveness of a single technique versus the state-of-the-art, but also its effectiveness in combination with other techniques. The paper gives an evaluation of hand-picked fuzzing technologies, shining light on the fact that while each novel fuzzing method can increase performance on some targets, it decreases performance on other targets. This is an insight future fuzzing research should consider in its evaluations.
Article
Full-text available
Coverage-guided fuzz testing has gained prominence as a highly effective method of finding security vulnerabilities such as buffer overflows in programs that parse binary data. Recently, researchers have introduced various specializations to the coverage-guided fuzzing algorithm for different domain-specific testing goals, such as finding performance bottlenecks, generating valid inputs, handling magic-byte comparisons, etc. Each such solution can require non-trivial implementation effort and produces a distinct variant of a fuzzing tool. We observe that many of these domain-specific solutions follow a common solution pattern. In this paper, we present FuzzFactory, a framework for developing domain-specific fuzzing applications without requiring changes to mutation and search heuristics. FuzzFactory allows users to specify the collection of dynamic domain-specific feedback during test execution, as well as how such feedback should be aggregated. FuzzFactory uses this information to selectively save intermediate inputs, called waypoints, to augment coverage-guided fuzzing. Such waypoints always make progress towards domain-specific multi-dimensional objectives. We instantiate six domain-specific fuzzing applications using FuzzFactory: three re-implementations of prior work and three novel solutions, and evaluate their effectiveness on benchmarks from Google's fuzzer test suite. We also show how multiple domains can be composed to perform better than the sum of their parts. For example, we combine domain-specific feedback about strict equality comparisons and dynamic memory allocations, to enable the automatic generation of LZ4 bombs and PNG bombs.
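The waypoint mechanism in the abstract above (collect domain-specific feedback per execution, aggregate it, and save inputs that make progress) can be sketched as follows. This is an illustrative model only, not FuzzFactory's actual API; the class name `DomainFeedback`, the key/value feedback map, and the example keys are assumptions made for the sketch.

```python
from typing import Callable, Dict, Hashable


class DomainFeedback:
    """Aggregates per-key feedback with a user-supplied reducer (hypothetical)."""

    def __init__(self, reducer: Callable[[int, int], int], initial: int):
        self.reducer = reducer      # how a new observation is folded in
        self.initial = initial      # aggregate value for keys never seen
        self.aggregate: Dict[Hashable, int] = {}

    def is_waypoint(self, observed: Dict[Hashable, int]) -> bool:
        """Return True if this execution changed any aggregated value."""
        progressed = False
        for key, value in observed.items():
            old = self.aggregate.get(key, self.initial)
            new = self.reducer(old, value)
            if new != old:
                self.aggregate[key] = new
                progressed = True
        return progressed


if __name__ == "__main__":
    # Example domain: maximise bytes allocated at each allocation site.
    mem = DomainFeedback(reducer=max, initial=0)
    print(mem.is_waypoint({"site_a": 16}))               # True: first observation
    print(mem.is_waypoint({"site_a": 8}))                # False: no progress
    print(mem.is_waypoint({"site_a": 16, "site_b": 4}))  # True: new site
```

Under this model, a fuzzer would save an input when either ordinary coverage or any registered domain reports progress, which is one way to read the abstract's claim that multiple domains can be composed.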
Conference Paper
Full-text available
Many kinds of memory safety vulnerabilities have been endangering software systems for decades. Amongst other approaches, fuzzing is a promising technique to unveil various software faults. Recently, feedback-guided fuzzing demonstrated its power, producing a steady stream of security-critical software bugs. Most fuzzing efforts, especially feedback fuzzing, are limited to user space components of an operating system (OS), although bugs in kernel components are more severe, because they allow an attacker to gain access to a system with full privileges. Unfortunately, kernel components are difficult to fuzz as feedback mechanisms (i.e., guided code coverage) cannot be easily applied. Additionally, non-determinism due to interrupts, kernel threads, statefulness, and similar mechanisms poses problems. Furthermore, if a process fuzzes its own kernel, a kernel crash highly impacts the performance of the fuzzer as the OS needs to reboot. In this paper, we approach the problem of coverage-guided kernel fuzzing in an OS-independent and hardware-assisted way: We utilize a hypervisor and Intel's Processor Trace (PT) technology. This allows us to remain independent of the target OS as we just require a small user space component that interacts with the targeted OS. As a result, our approach introduces almost no performance overhead, even in cases where the OS crashes, and performs up to 17,000 executions per second on an off-the-shelf laptop. We developed a framework called kernel-AFL (kAFL) to assess the security of Linux, macOS, and Windows kernel components. Among many crashes, we uncovered several flaws in the ext4 driver for Linux, the HFS and APFS file system of macOS, and the NTFS driver of Windows.
Conference Paper
Full-text available
Grey-box fuzzing is a practically effective approach to test real-world programs. However, most existing grey-box fuzzers lack directedness, i.e. the capability of executing towards user-specified target sites in the program. To emphasize existing challenges in directed fuzzing, we propose Hawkeye to feature four desired properties of directed grey-box fuzzers. Owing to a novel static analysis on the program under test and the target sites, Hawkeye precisely collects information such as the call graph and the function-level and basic-block-level distances to the targets. During fuzzing, Hawkeye evaluates exercised seeds based on both static information and the execution traces to generate the dynamic metrics, which are then used for seed prioritization, power scheduling and adaptive mutating. These strategies help Hawkeye to achieve better directedness and gravitate towards the target sites. We implemented Hawkeye as a fuzzing framework and evaluated it on various real-world programs under different scenarios. The experimental results showed that Hawkeye can reach the target sites and reproduce the crashes much faster than state-of-the-art grey-box fuzzers such as AFL and AFLGo. In particular, Hawkeye can reduce the time to exposure for certain vulnerabilities from about 3.5 hours to 0.5 hours. By now, Hawkeye has detected more than 41 previously unknown crashes in projects such as Oniguruma and MJS, with the target sites provided by vulnerability prediction tools; all these crashes are confirmed and 15 of them have been assigned CVE IDs.
Conference Paper
Full-text available
Calling context trees are one of the most fundamental data structures for representing the interprocedural control flow of a program, providing valuable information for program understanding and optimization. Nodes of a calling context tree associate performance metrics to whole distinct paths in the call graph starting from the root function. However, no explicit information is provided for detecting short hot sequences of activations, which may be a better optimization target in large modular programs where groups of related functions are reused in many different parts of the code. Furthermore, calling context trees can grow prohibitively large in some scenarios. Another classical approach, called edge profiling, collects performance metrics for caller-callee pairs in the call graph, allowing it to detect hot paths of fixed length one. We study a generalization of edge and context-sensitive profiles by introducing a novel data structure called k-calling context forest (k-CCF). Nodes in a k-CCF associate performance metrics to paths of length at most k that lead to each distinct routine of the program, providing edge profiles for k=1, full context-sensitive profiles for k equal to infinity, as well as any other intermediate point in the spectrum. We study the properties of the k-CCF both theoretically and experimentally on a large suite of prominent Linux applications, showing how to construct it efficiently and discussing its relationships with the calling context tree. Our experiments show that the k-CCF can provide effective space-accuracy tradeoffs for interprocedural contextual profiling, yielding useful clues to the hot spots of a program that may be hidden in a calling context tree and using less space for small values of k, which appear to be the most interesting in practice.
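To make the spectrum between edge profiles and full calling contexts concrete, the sketch below attributes each sampled call stack to the path of at most k call edges that leads to the current routine. It is a simplified flat map keyed by those paths, not the k-CCF forest data structure from the paper; the function name `k_context_profile` and the sample stacks are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def k_context_profile(stacks: Iterable[Sequence[str]], k: int) -> Counter:
    """Count samples per calling path of at most k edges ending at the callee.

    Each stack is listed root first, current routine last. A path with k
    edges spans k + 1 routines, so the key is the last k + 1 frames.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    profile = Counter()
    for stack in stacks:
        if not stack:
            continue  # nothing to attribute for an empty sample
        profile[tuple(stack[-(k + 1):])] += 1
    return profile


if __name__ == "__main__":
    samples = [
        ["main", "x", "a", "c"],
        ["main", "y", "a", "c"],
        ["main", "x", "b"],
    ]
    print(k_context_profile(samples, 1))   # caller-callee pairs (edge profile)
    print(k_context_profile(samples, 10))  # k exceeds depth: full contexts
```

With k = 1 the two samples that reach `c` through `a` are merged into one hot pair even though they start from different callers, which is the kind of short hot sequence a full calling context tree keeps separate.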
Conference Paper
Full-text available
Fuzz testing is an automated technique providing random data as input to a software system in the hope of exposing a vulnerability. In order to be effective, the fuzzed input must be common enough to pass elementary consistency checks; a JavaScript interpreter, for instance, would only accept a semantically valid program. On the other hand, the fuzzed input must be uncommon enough to trigger exceptional behavior, such as a crash of the interpreter. The LangFuzz approach resolves this conflict by using a grammar to randomly generate valid programs; the code fragments, however, partially stem from programs known to have caused invalid behavior before. LangFuzz is an effective tool for security testing: Applied on the Mozilla JavaScript interpreter, it discovered a total of 105 new severe vulnerabilities within three months of operation (and thus became one of the top security bug bounty collectors within this period); applied on the PHP interpreter, it discovered 18 new defects causing crashes.
Conference Paper
Full-text available
Fuzz testing is an effective technique for finding security vulnerabilities in software. Traditionally, fuzz testing tools apply random mutations to well-formed inputs of a program and test the resulting values. We present an alternative whitebox fuzz testing approach inspired by recent advances in symbolic execution and dynamic test generation. Our approach records an actual run of the program under test on a well-formed input, symbolically evaluates the recorded trace, and gathers constraints on inputs capturing how the program uses these. The collected constraints are then negated one by one and solved with a constraint solver, producing new inputs that exercise different control paths in the program. This process is repeated with the help of a code-coverage maximizing heuristic designed to find defects as fast as possible. We have implemented this algorithm in SAGE (Scalable, Automated, Guided Execution), a new tool employing x86 instruction-level tracing and emulation for whitebox fuzzing of arbitrary file-reading Windows applications. We describe key optimizations needed to make dynamic test generation scale to large input files and long execution traces with hundreds of millions of instructions. We then present detailed experiments with several Windows applications. Notably, without any format-specific knowledge, SAGE detects the MS07-017 ANI vulnerability, which was missed by extensive blackbox fuzzing and static analysis tools. Furthermore, while still in an early stage of development, SAGE has already discovered 30+ new bugs in large shipped Windows applications including image processors, media players, and file decoders. Several of these bugs are potentially exploitable memory access violations.
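The step of negating collected constraints one by one can be illustrated with a toy model. This is a sketch under strong assumptions, not SAGE's implementation: the program under test is a hypothetical magic-string check, each path constraint is a single byte (in)equality, and `solve` is a hand-written stand-in for a real constraint solver. The coverage-maximizing heuristic from the abstract is replaced by a plain first-in, first-out worklist.

```python
MAGIC = b"bad!"  # hypothetical input that reaches the "defect"


def run(data: bytes):
    """Toy program: record one (index, expected_byte, outcome) per branch."""
    path = []
    for i, expected in enumerate(MAGIC):
        outcome = i < len(data) and data[i] == expected
        path.append((i, expected, outcome))
        if not outcome:
            break  # the program exits at the first failed comparison
    return path


def solve(constraints, template: bytes) -> bytes:
    """Stand-in solver: satisfies byte equalities and inequalities only."""
    out = bytearray(template)
    for index, expected, outcome in constraints:
        while len(out) <= index:
            out.append(0)
        if outcome:
            out[index] = expected
        elif out[index] == expected:
            out[index] = expected ^ 0xFF  # any different byte will do
    return bytes(out)


def expand(data: bytes):
    """Negate each recorded constraint in turn, keeping the prefix before it."""
    path = run(data)
    children = []
    for i, (index, expected, outcome) in enumerate(path):
        flipped = path[:i] + [(index, expected, not outcome)]
        children.append(solve(flipped, data))
    return children


def search(seed: bytes, max_runs: int = 100):
    """Explore new paths until every comparison in the toy program passes."""
    worklist, seen = [seed], {seed}
    for _ in range(max_runs):
        if not worklist:
            break
        data = worklist.pop(0)
        path = run(data)
        if len(path) == len(MAGIC) and path[-1][2]:
            return data
        for child in expand(data):
            if child not in seen:
                seen.add(child)
                worklist.append(child)
    return None


if __name__ == "__main__":
    print(search(b"good"))
```

Each negated constraint yields an input that follows the recorded path up to one branch and then diverges, so the search passes the four-byte check in a handful of runs, whereas random mutation would need to guess each byte.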
Article
This project started as a simple experiment to try to better understand an observed phenomenon, that of programs crashing when a noisy dial-up line is used. As a result of testing a comprehensive list of utility programs on several versions of Unix, it appears that this is not an isolated problem. Thus, this paper supplies a list of bug reports to fix the utilities that we were able to crash. This should also improve the quality and reliability of Unix utilities. This paper also supplies a simple, but effective test method (and tools).
Conference Paper
We describe LLVM (low level virtual machine), a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs, by providing high-level information to compiler transformations at compile-time, link-time, run-time, and in idle time between runs. LLVM defines a common, low-level code representation in static single assignment (SSA) form, with several novel features: a simple, language-independent type-system that exposes the primitives commonly used to implement high-level language features; an instruction for typed address arithmetic; and a simple mechanism that can be used to implement the exception handling features of high-level languages (and setjmp/longjmp in C) uniformly and efficiently. The LLVM compiler framework and code representation together provide a combination of key capabilities that are important for practical, lifelong analysis and transformation of programs. To our knowledge, no existing compilation approach provides all these capabilities. We describe the design of the LLVM representation and compiler framework, and evaluate the design in three ways: (a) the size and effectiveness of the representation, including the type information it provides; (b) compiler performance for several interprocedural problems; and (c) illustrative examples of the benefits LLVM provides for several challenging compiler problems.
Conference Paper
We present the first approach to automatic exploit generation for heap overflows in interpreters. It is also the first approach to exploit generation in any class of program that integrates a solution for automatic heap layout manipulation. At the core of the approach is a novel method for discovering exploit primitives---inputs to the target program that result in a sensitive operation, such as a function call or a memory write, utilizing attacker-injected data. To produce an exploit primitive from a heap overflow vulnerability, one has to discover a target data structure to corrupt, ensure an instance of that data structure is adjacent to the source of the overflow on the heap, and ensure that the post-overflow corrupted data is used in a manner desired by the attacker. Our system addresses all three tasks in an automatic, greybox, and modular manner. Our implementation is called GOLLUM, and we demonstrate its capabilities by producing exploits from 10 unique vulnerabilities in the PHP and Python interpreters, 5 of which do not have existing public exploits.
Article
Coverage-based greybox fuzzing (CGF) is one of the most successful approaches for automated vulnerability detection. Given a seed file (as a sequence of bits), a CGF randomly flips, deletes or copies some bits to generate new files. CGF iteratively constructs (and fuzzes) a seed corpus by retaining those generated files which enhance coverage. However, random bitflips are unlikely to produce valid files (or valid chunks in files), for applications processing complex file formats. In this work, we introduce smart greybox fuzzing (SGF) which leverages a high-level structural representation of the seed file to generate new files. We define innovative mutation operators that work on the virtual file structure rather than on the bit level which allows SGF to explore completely new input domains while maintaining file validity. We introduce a novel validity-based power schedule that enables SGF to spend more time generating files that are more likely to pass the parsing stage of the program, which can expose vulnerabilities much deeper in the processing logic. Our evaluation demonstrates the effectiveness of SGF. On several libraries that parse complex chunk-based files, our tool AFLSMART achieves substantially more branch coverage (up to 87% improvement), and exposes more vulnerabilities than baseline AFL. Our tool AFLSMART has discovered 42 zero-day vulnerabilities in widely-used, well-tested tools and libraries; so far 17 CVEs were assigned.
Conference Paper
In recent years, fuzz testing has proven itself to be one of the most effective techniques for finding correctness bugs and security vulnerabilities in practice. One particular fuzz testing tool, American Fuzzy Lop (AFL), has become popular thanks to its ease-of-use and bug-finding power. However, AFL remains limited in the bugs it can find since it simply does not cover large regions of code. If it does not cover parts of the code, it will not find bugs there. We propose a two-pronged approach to increase the coverage achieved by AFL. First, the approach automatically identifies branches exercised by few AFL-produced inputs (rare branches), which often guard code that is empirically hard to cover by naively mutating inputs. The second part of the approach is a novel mutation mask creation algorithm, which allows mutations to be biased towards producing inputs hitting a given rare branch. This mask is dynamically computed during fuzz testing and can be adapted to other testing targets. We implement this approach on top of AFL in a tool named FairFuzz. We conduct evaluation on real-world programs against state-of-the-art versions of AFL. We find that on these programs FairFuzz achieves high branch coverage at a faster rate than state-of-the-art versions of AFL. In addition, on programs with nested conditional structure, it achieves sustained increases in branch coverage after 24 hours (average 10.6% increase). In qualitative analysis, we find that FairFuzz has an increased capacity to automatically discover keywords.
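The idea of a mutation mask from the abstract above can be sketched as follows. This is a simplified illustration, not the FairFuzz algorithm: `hits_rare_branch` is a hypothetical stand-in for running the target and checking coverage, only byte overwrites are considered, and each position is probed with a single flipped value (an approximation).

```python
import random

def hits_rare_branch(data: bytes) -> bool:
    """Hypothetical coverage check: the rare branch needs the prefix '<!'."""
    return data[:2] == b"<!"

def compute_mask(seed: bytes) -> list:
    """mask[i] is True if overwriting byte i still reaches the rare branch."""
    mask = []
    for i in range(len(seed)):
        probe = bytearray(seed)
        probe[i] ^= 0xFF  # change every bit of byte i as a single probe
        mask.append(hits_rare_branch(bytes(probe)))
    return mask

def masked_mutate(seed: bytes, mask: list, rng: random.Random) -> bytes:
    """Overwrite one random byte, chosen only among positions the mask allows."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        return seed  # nothing can be mutated without losing the branch
    buf = bytearray(seed)
    buf[rng.choice(allowed)] = rng.randrange(256)
    return bytes(buf)
```

For the seed `b"<!abc"`, the mask protects the first two bytes, so every mutated child still reaches the rare branch while the remaining bytes are free to vary.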
Conference Paper
Existing Greybox Fuzzers (GF) cannot be effectively directed, for instance, towards problematic changes or patches, towards critical system calls or dangerous locations, or towards functions in the stack-trace of a reported vulnerability that we wish to reproduce. In this paper, we introduce Directed Greybox Fuzzing (DGF) which generates inputs with the objective of reaching a given set of target program locations efficiently. We develop and evaluate a simulated annealing-based power schedule that gradually assigns more energy to seeds that are closer to the target locations while reducing energy for seeds that are further away. Experiments with our implementation AFLGo demonstrate that DGF outperforms both directed symbolic-execution-based whitebox fuzzing and undirected greybox fuzzing. We show applications of DGF to patch testing and crash reproduction, and discuss the integration of AFLGo into Google's continuous fuzzing platform OSS-Fuzz. Due to its directedness, AFLGo could find 39 bugs in several well-fuzzed, security-critical projects like LibXML2. 17 CVEs were assigned.
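A power schedule of the kind described in the abstract above might look like the sketch below. The function name, the cooling base of 20, the exponent range, and the base energy are illustrative assumptions, not values taken from the abstract; `distance` is assumed to be a seed's distance to the targets, normalized to [0, 1].

```python
def annealing_energy(distance: float, elapsed: float, t_exploit: float,
                     base_energy: int = 100) -> int:
    """Energy for a seed under a simulated-annealing style schedule.

    distance:  normalized seed distance to the targets (0 = closest, 1 = farthest)
    elapsed:   time spent fuzzing so far
    t_exploit: time scale after which the schedule mostly exploits (assumed parameter)
    """
    if not 0.0 <= distance <= 1.0:
        raise ValueError("distance must be normalized to [0, 1]")
    # Temperature starts at 1 and cools toward 0 (cooling base is an assumption).
    temperature = 20.0 ** (-elapsed / t_exploit)
    # While hot, every seed gets weight 0.5; once cold, weight approaches 1 - distance.
    weight = (1.0 - distance) * (1.0 - temperature) + 0.5 * temperature
    # Map weight in [0, 1] to a multiplier in [1/32, 32] (illustrative range).
    return max(1, round(base_energy * 2.0 ** (10.0 * weight - 5.0)))
```

At the start all seeds receive the same energy regardless of distance (exploration); as time passes, seeds close to the targets receive far more energy than distant ones (exploitation).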
Conference Paper
Coverage-based Greybox Fuzzing (CGF) is a random testing approach that requires no program analysis. A new test is generated by slightly mutating a seed input. If the test exercises a new and interesting path, it is added to the set of seeds; otherwise, it is discarded. We observe that most tests exercise the same few "high-frequency" paths and develop strategies to explore significantly more paths with the same number of tests by gravitating towards low-frequency paths. We explain the challenges and opportunities of CGF using a Markov chain model which specifies the probability that fuzzing the seed that exercises path i generates an input that exercises path j. Each state (i.e., seed) has an energy that specifies the number of inputs to be generated from that seed. We show that CGF is considerably more efficient if energy is inversely proportional to the density of the stationary distribution and increases monotonically every time that seed is chosen. Energy is controlled with a power schedule. We implemented the exponential schedule by extending AFL. In 24 hours, AFLFAST exposes 3 previously unreported CVEs that are not exposed by AFL and exposes 6 previously unreported CVEs 7x faster than AFL. AFLFAST produces at least an order of magnitude more unique crashes than AFL.
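The loop described in the abstract above (mutate a seed, keep the result if it exercises a new path, assign energy through a power schedule) can be sketched on a toy target. Everything here is hypothetical: `toy_target`, `mutate`, and `fuzz` are illustrative names, and the energy formula is a simple stand-in that grows each time a seed is chosen and shrinks for frequently exercised paths; it is not AFLFAST's schedule.

```python
import random

def toy_target(data: bytes) -> frozenset:
    """Hypothetical program under test; returns the set of branch ids executed."""
    path = {0}
    if len(data) > 0 and data[0] == ord("F"):
        path.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            path.add(2)
    return frozenset(path)

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply one small random mutation: bit flip, byte overwrite, or byte insert."""
    buf = bytearray(seed)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    elif choice == 1 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    return bytes(buf)

def fuzz(initial: bytes, rounds: int, rng: random.Random):
    first_path = toy_target(initial)
    corpus = [initial]
    seen_paths = {first_path}
    path_of = {initial: first_path}  # path exercised by each seed
    times_chosen = {initial: 0}      # how often each seed was picked
    path_hits = {first_path: 1}      # how often each path was exercised
    for _ in range(rounds):
        seed = rng.choice(corpus)
        times_chosen[seed] += 1
        # Energy doubles per selection and is divided by the path's frequency.
        exponent = min(times_chosen[seed], 16)
        energy = max(1, min(64, 2 ** exponent // path_hits[path_of[seed]]))
        for _ in range(energy):
            child = mutate(seed, rng)
            path = toy_target(child)
            path_hits[path] = path_hits.get(path, 0) + 1
            if path not in seen_paths:  # new path: keep the input as a seed
                seen_paths.add(path)
                corpus.append(child)
                path_of[child] = path
                times_chosen[child] = 0
    return corpus, seen_paths
```

Calling `fuzz(b"AAAA", 50, random.Random(1))` returns the retained seeds and the set of distinct paths; by construction the corpus holds exactly one seed per discovered path.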
Conference Paper
The ability to introspect into the behavior of software at runtime is crucial for many security-related tasks, such as virtual machine-based intrusion detection and low-artifact malware analysis. Although some progress has been made in this task by automatically creating programs that can passively retrieve kernel-level information, two key challenges remain. First, it is currently difficult to extract useful information from user-level applications, such as web browsers. Second, discovering points within the OS and applications to hook for active monitoring is still an entirely manual process. In this paper we propose a set of techniques to mine the memory accesses made by an operating system and its applications to locate useful places to deploy active monitoring, which we call tap points. We demonstrate the efficacy of our techniques by finding tap points for useful introspection tasks such as finding SSL keys and monitoring web browser activity on five different operating systems (Windows 7, Linux, FreeBSD, Minix and Haiku) and two processor architectures (ARM and x86).
Article
Fuzz testing is an effective technique for finding security vulnerabilities in software. Fuzz testing is a form of blackbox random testing which randomly mutates well-formed inputs and tests the program on the resulting data. In some cases, grammars are used to randomly generate the well-formed inputs. This also allows the tester to encode application-specific knowledge (such as corner cases of particular interest) as part of the grammar, and to specify test heuristics by assigning probabilistic weights to production rules. Although fuzz testing can be remarkably effective, the limitations of blackbox random testing are well-known. For instance, the then branch of the conditional statement “if (x==10) then” has only one chance in 2^32 of being exercised if x is a randomly chosen 32-bit input value. This intuitively
Conference Paper
Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.
Article
The overheads in a parallel system that limit its scalability need to be identified and separated in order to enable parallel algorithm design and the development of parallel machines. Such overheads may be broadly classified into two components. The first one is intrinsic to the algorithm and arises due to factors such as the work-imbalance and the serial fraction. The second one is due to the interaction between the algorithm and the architecture and arises due to latency and contention in the network. A top-down approach to scalability study of shared memory parallel systems is proposed in this research. We define the notion of overhead functions associated with the different algorithmic and architectural characteristics to quantify the scalability of parallel systems; we develop a method for separating the algorithmic overhead into a serial component and a work-imbalance component; we also develop a method for isolating the overheads due to network latency and contention from the ...
Peach fuzzing platform
  • M Eddington
GRIMOIRE: Synthesizing Structure while Fuzzing
  • Tim Blazytko
  • Cornelius Aschermann
  • Moritz Schlögel
  • Ali Abbasi
  • Sergej Schumilo
  • Simon Wörner
  • Thorsten Holz
Boosting Fuzzer Efficiency: An Information Theoretic Perspective (2020)
  • Marcel Böhme
  • Valentin Manès
  • Sang Kil Cha
Seed Selection for Successful Fuzzing (2021)
  • Adrian Herrera
  • Hendra Gunadi
  • Shane Magrath
  • Michael Norrish
  • Mathias Payer
  • Antony L. Hosking
Symbolic-Model-Guided Fuzzing of Cryptographic Protocols. Master's thesis
  • Maximilian Ammann
AURORA: Statistical Crash Analysis for Automated Root Cause Explanation
  • Tim Blazytko
  • Moritz Schlögel
  • Cornelius Aschermann
  • Ali Abbasi
  • Joel Frank
  • Simon Wörner
  • Thorsten Holz
The Use of Likely Invariants as Feedback for Fuzzers (2021)
  • Andrea Fioraldi
  • Daniele Cono D'Elia
  • Davide Balzarotti
MOPT: Optimized Mutation Scheduling for Fuzzers
  • Chenyang Lyu
  • Shouling Ji
  • Chao Zhang
  • Yuwei Li
  • Wei-Han Lee
  • Yu Song
  • Raheem Beyah
Symbolic execution with SymCC: Don't interpret, compile!
  • Sebastian Poeplau
  • Aurélien Francillon
FuzzIL: Coverage Guided Fuzzing for JavaScript Engines
  • Samuel Groß
Unicorefuzz: On the Viability of Emulation for Kernelspace Fuzzing
  • Dominik Maier
  • Benedikt Radtke
  • Bastian Harren
Oleg Kiselyov, Ralf Laemmel, and Keean Schupke
  • Oleg Kiselyov
  • Ralf Laemmel
  • Keean Schupke
Fuzzing with data dependency information (2022)
  • Alessandro Mantovani
  • Andrea Fioraldi
  • Davide Balzarotti