... 1) Unstructured Fuzzing: Unstructured fuzzing methods analyse the instructions of a binary executable and try to produce an input that will crash the program. Unstructured fuzzers can be "dumb", i.e. completely random [16], but they usually employ genetic algorithms, mutating inputs with the aim of achieving high coverage of the executable [17], [18], [19], [20]. The instructions need not necessarily be machine code. ...
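To make the mutation loop the excerpt describes concrete, here is a minimal sketch of a coverage-guided loop in Rust. It is not any particular fuzzer's implementation; execute_with_coverage is a hypothetical stand-in for running an instrumented target.

```rust
use std::collections::HashSet;
use rand::Rng;

// Hypothetical stand-in: run the instrumented target on `input` and
// return the IDs of the coverage edges it exercised.
fn execute_with_coverage(input: &[u8]) -> Vec<usize> {
    unimplemented!("hook up the instrumented binary here")
}

/// Minimal coverage-guided loop: keep any mutant that reaches a new edge.
fn fuzz_loop(seed: Vec<u8>, iterations: usize) {
    let mut rng = rand::thread_rng();
    let mut corpus = vec![seed];
    let mut seen: HashSet<usize> = HashSet::new();

    for _ in 0..iterations {
        // Pick a corpus entry and mutate one random byte (a single "havoc" step).
        let mut input = corpus[rng.gen_range(0..corpus.len())].clone();
        if !input.is_empty() {
            let pos = rng.gen_range(0..input.len());
            input[pos] = rng.gen();
        }
        // Retain the mutant only if it covers a previously unseen edge.
        let new_edges = execute_with_coverage(&input)
            .into_iter()
            .filter(|e| seen.insert(*e)) // insert() is true only for new edges
            .count();
        if new_edges > 0 {
            corpus.push(input);
        }
    }
}
```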
... First, can we produce Asset Administration Shells by randomly fuzzing the binaries, without further input? We consider the well-established C++ fuzzers AFL [18], AFL++ [34] and TortoiseFuzzer [35], standard Go fuzzing [23], and the Python fuzzers Atheris [21] and Pythonfuzz [22]. For each class of the meta-model, we use templates and the formalized meta-model to generate a small program that expects JSON text as input corresponding to the given class. ...
... None of the unstructured fuzzers considered (AFL [18], AFL++ [34], TortoiseFuzzer [35], Atheris [21], Pythonfuzz [22], standard Go fuzzing [23]) could produce a result in the given time frame, except for the trivial example of an empty JSON object: {}. We trace that back to the difficulty of the problem. ...
... However, these frameworks are primarily designed to test programs that take files as inputs. The input representation used by a fuzzer can significantly impact its ability to generate effective inputs [1,19,33,45,46]. We observe this to be the case for firmware fuzzing. ...
... Streams can also be extended using a predefined set of interesting values or values collected into a stream-specific dictionary that is dynamically updated as part of the input-to-state stage (Section 3.2). For interesting values, we use a reduced set of values used by existing fuzzers [18,19,51]. We observed that values with a bit pattern of all ones or all zeros (i.e., the bytes 0xff and 0x00) are effective. ...
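A hedged sketch of such a stream mutation in Rust follows; the interesting-value set beyond 0x00/0xff and the 50/50 choice between dictionary and single-byte replacement are illustrative parameters, not values from the paper.

```rust
/// Interesting values in the spirit of existing fuzzers; the all-zeros and
/// all-ones bytes are the ones the excerpt singles out as effective.
const INTERESTING_BYTES: &[u8] = &[0x00, 0xff, 0x7f, 0x80, 0x01];

/// Overwrite part of one peripheral's stream with an interesting value or
/// a dictionary entry collected dynamically during fuzzing.
fn mutate_stream(stream: &mut Vec<u8>, dictionary: &[Vec<u8>], rng: &mut impl rand::Rng) {
    if stream.is_empty() {
        return;
    }
    let pos = rng.gen_range(0..stream.len());
    if !dictionary.is_empty() && rng.gen_bool(0.5) {
        // Splice a stream-specific dictionary entry at `pos`, clipped to fit.
        let entry = &dictionary[rng.gen_range(0..dictionary.len())];
        let end = (pos + entry.len()).min(stream.len());
        stream[pos..end].copy_from_slice(&entry[..end - pos]);
    } else {
        // Overwrite a single byte with an interesting value.
        stream[pos] = INTERESTING_BYTES[rng.gen_range(0..INTERESTING_BYTES.len())];
    }
}
```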
... This challenge has led to the development of various techniques targeted at addressing this issue [7,9,17,21,27]. Notably, the input-to-state (I2S) replacement technique introduced by Redqueen [2] has proven highly effective at directly solving many complex comparisons and has been incorporated into several fuzzing frameworks [8,18,19,31]. I2S replacement operates in two stages: first, the fuzzer adds instrumentation to record comparison operands. [Figure 6: With a file-based input representation, bytes read from the data register (DR) of the UART peripheral and stored into the buffer buf become scattered throughout the input.] ...
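The second stage of I2S replacement can be sketched as follows in Rust; this is a simplified illustration of the idea, not Redqueen's implementation, and it only handles equal-width operands.

```rust
/// A recorded comparison saw `observed` (derived from the input) compared
/// against `expected`. If `observed` occurs verbatim in the input,
/// substituting `expected` at that position is likely to flip the comparison.
fn i2s_replace(input: &[u8], observed: &[u8], expected: &[u8]) -> Option<Vec<u8>> {
    if observed.is_empty() || observed.len() != expected.len() {
        return None; // sketch handles only equal-width operands (e.g., integers)
    }
    // Locate the observed operand verbatim in the input (input-to-state
    // correspondence), then substitute the expected operand.
    let pos = input.windows(observed.len()).position(|w| w == observed)?;
    let mut out = input.to_vec();
    out[pos..pos + observed.len()].copy_from_slice(expected);
    Some(out)
}
```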
Rapid embedded device proliferation is creating new targets and opportunities for adversaries. However, the complex interactions between firmware and hardware pose challenges to applying automated testing, such as fuzzing. State-of-the-art methods re-host firmware in emulators and facilitate complex interactions with hardware by provisioning for inputs from a diversity of methods (such as interrupts) from a plethora of devices (such as modems). We recognize a significant disconnect between how a fuzzer generates inputs (as a monolithic file) and how the inputs are consumed during re-hosted execution (as a stream, in slices, per peripheral). We demonstrate the disconnect to significantly impact a fuzzer's effectiveness at discovering inputs that explore deeper code and bugs. We rethink the input generation process for fuzzing monolithic firmware and propose a new approach: multi-stream input generation and representation; inputs are now a collection of independent streams, one for each peripheral. We demonstrate the versatility and effectiveness of our approach by implementing: i) stream specific mutation strategies; ii) efficient methods for generating useful values for peripherals; iii) enhancing the use of information learned during fuzzing; and iv) improving a fuzzer's ability to handle roadblocks. We design and build a new fuzzer, MULTIFUZZ, for testing monolithic firmware and evaluate our approach on synthetic and real-world targets. MULTIFUZZ passes all 66 unit tests from a benchmark consisting of 46 synthetic binaries targeting a diverse set of microcontrollers. On an evaluation with 23 real-world firmware targets, MULTIFUZZ outperforms the state-of-the-art fuzzers Fuzzware and Ember-IO. MULTIFUZZ reaches significantly more code on 14 out of the 23 firmware targets and similar coverage on the remainder. Further, MULTIFUZZ discovered 18 new bugs on real-world targets, many thoroughly tested by previous fuzzers.
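A minimal sketch of such a multi-stream input representation, assuming per-peripheral byte streams keyed by an illustrative numeric peripheral ID (not MULTIFUZZ's actual data structures):

```rust
use std::collections::HashMap;

/// One independent byte stream per peripheral, instead of a monolithic file.
#[derive(Default, Clone)]
struct MultiStreamInput {
    streams: HashMap<u32, Vec<u8>>, // peripheral ID -> its private stream
    cursors: HashMap<u32, usize>,   // per-stream read position
}

impl MultiStreamInput {
    /// Serve the next byte to `peripheral`. Because each peripheral owns a
    /// stream, reads from different peripherals no longer interleave and
    /// scatter within one flat buffer.
    fn next_byte(&mut self, peripheral: u32) -> Option<u8> {
        let cursor = self.cursors.entry(peripheral).or_insert(0);
        let byte = self.streams.get(&peripheral)?.get(*cursor).copied()?;
        *cursor += 1;
        Some(byte)
    }
}
```

This directly addresses the disconnect named in the abstract: a mutation on one peripheral's stream cannot shift the bytes consumed by another peripheral.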
... However, VRust suffers from several limitations: 1) it strictly requires source code to conduct analyses, 2) it suffers from a high number of false alarms, and 3) it does not provide an analyst with enough data to (re-)construct exploit transactions. In contrast, fuzzing is a technique that does not suffer from any of these limitations [3,15,16,46]. The fuzzing input given to the analysis target can also usually be crafted into exploit transactions. ...
... A popular approach to uncover bugs is coverage-guided fuzzing [15,16,53,61]. This technique mutates the inputs based on instruction-coverage data, or feedback information, collected during the target's execution, to uncover new paths in the application. ...
... In this section, we detail the implementation of FuzzDelSol. FuzzDelSol uses the state-of-the-art LibAFL [16] fuzzer. LibAFL's design allows FuzzDelSol to include its own feedback mechanism in the fuzzing mutation. ...
Solana has quickly emerged as a popular platform for building decentralized applications (DApps), such as marketplaces for non-fungible tokens (NFTs). A key reason for its success is Solana's low transaction fees and high performance, which is achieved in part due to its stateless programming model. Although the literature features extensive tooling support for smart contract security, current solutions are largely tailored for the Ethereum Virtual Machine. Unfortunately, the very stateless nature of Solana's execution environment introduces novel attack patterns specific to Solana, requiring a rethinking of how vulnerability analysis methods are built. In this paper, we address this gap and propose FuzzDelSol, the first binary-only coverage-guided fuzzing architecture for Solana smart contracts. FuzzDelSol faithfully models runtime specifics such as smart contract interactions. Moreover, since source code is not available for the large majority of Solana contracts, FuzzDelSol operates on the contract's binary code. Hence, due to the lack of semantic information, we carefully extracted low-level program and state information to develop a diverse set of bug oracles covering all major bug classes in Solana. Our extensive evaluation on 6049 smart contracts shows that FuzzDelSol's bug oracles find bugs with high precision and recall. To the best of our knowledge, this is the largest evaluation of the security landscape on the Solana mainnet.
... More recently, fuzzing has been increasingly applied to space systems. Fuzzing is an automated software testing approach in which a system under test is repeatedly presented with automatically generated inputs [19]. Presenting a program that expects structured input with unexpected formats or random data aims to expose unhandled edge cases that result in bugs. ...
... We set up fuzzing for libCSP with LibAFL [19], a highly customizable fuzzing library implemented in Rust. To fuzz the CAN interface backend, we write a custom harness that allows us to put fuzzer-generated pseudo packets into the queue and test if certain parts of the library crash. ...
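As a rough illustration of such a harness (not the authors' code), the following Rust sketch wraps fuzzer-provided bytes into a pseudo packet and pushes it into a receive queue; enqueue_can_frame and process_rx_queue are hypothetical stand-ins for bindings into libCSP's CAN backend.

```rust
/// Harness entry point: the fuzzer hands us raw bytes, we interpret them
/// as (id, payload) of a pseudo CAN packet and drive the code under test.
fn harness(data: &[u8]) {
    if data.len() < 4 {
        return; // not enough bytes for a frame ID
    }
    let id = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
    let payload = &data[4..];
    enqueue_can_frame(id, payload); // hypothetical binding: put packet in the queue
    process_rx_queue();             // hypothetical binding: let the library consume it
}

fn enqueue_can_frame(_id: u32, _payload: &[u8]) { /* stub for this sketch */ }
fn process_rx_queue() { /* stub for this sketch */ }
```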
The development of safety-critical aerospace systems is traditionally dominated by the C language. Its language characteristics make it trivial to accidentally introduce memory safety issues resulting in undefined behavior or security vulnerabilities. The Rust language aims to drastically reduce the chance of introducing bugs and consequently produces overall more secure and safer code. However, due to its relatively short lifespan, industry adoption in safety-critical environments is still lacking. This work provides a set of recommendations for the development of safety-critical space systems in Rust. Our recommendations are based on insights from our multi-fold contributions towards safer and more secure aerospace systems: We provide a comprehensive overview of ongoing efforts to adapt Rust for safety-critical system programming, highlighting its potential to enhance system robustness. Next, we introduce a procedure for partially rewriting C-based systems in Rust, offering a pragmatic pathway to improving safety without necessitating a full system overhaul. During the execution of our rewriting case study, we identify and fix three previously undiscovered vulnerabilities in a popular open-source satellite communication protocol. Finally, we introduce a new Rust compiler target configuration for bare metal PowerPC. With this, we aim to broaden Rust's applicability in space-oriented projects, as the architecture is commonly encountered in the domain, e.g., in the James Webb Space Telescope.
... More recently, only Fuzzilli [65] (published 2023, open-sourced early 2019) was used by multiple works for their evaluation, even before the paper was published. This does not account for techniques replicated in AFL++ or LibAFL [57], which reimplement many of the successful techniques proposed in prior work [7], [15], [90], [110]. On average, a fuzzing paper evaluates against 3.2 other fuzzers. ...
Fuzzing has proven to be a highly effective approach to uncover software bugs over the past decade. After AFL popularized the groundbreaking concept of lightweight coverage feedback, the field of fuzzing has seen a vast amount of scientific work proposing new techniques, improving methodological aspects of existing strategies, or porting existing methods to new domains. All such work must demonstrate its merit by showing its applicability to a problem, measuring its performance, and often showing its superiority over existing works in a thorough, empirical evaluation. Yet, fuzzing is highly sensitive to its target, environment, and circumstances, e.g., randomness in the testing process. After all, relying on randomness is one of the core principles of fuzzing, governing many aspects of a fuzzer's behavior. Combined with the often highly difficult-to-control environment, the reproducibility of experiments is a crucial concern and requires a prudent evaluation setup. To address these threats to validity, several works, most notably Evaluating Fuzz Testing by Klees et al., have outlined how a carefully designed evaluation setup should be implemented, but it remains unknown to what extent their recommendations have been adopted in practice. In this work, we systematically analyze the evaluation of 150 fuzzing papers published at the top venues between 2018 and 2023. We study how existing guidelines are implemented and observe potential shortcomings and pitfalls. We find a surprising disregard of the existing guidelines regarding statistical tests and systematic errors in fuzzing evaluations. For example, when investigating reported bugs, ...
... Whenever the computer program wants to interact with the database, it uses library function calls. For our vulnerability detection tool, we apply the same approach as used by libAFL [10]: we hook the library functions and check the query parameter for unsanitized special elements. If we find any query with unsanitized elements, we raise an error. ...
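A hedged sketch of such an oracle check in Rust: before a hooked library call executes a query, flag parameters that still contain unsanitized special elements. The special-element list here is illustrative, not the tool's actual rule set.

```rust
/// Inspect one query parameter as it passes through a hooked library call.
/// Returns an error describing the first unsanitized special element found.
fn check_query_parameter(param: &str) -> Result<(), String> {
    // Illustrative SQL special elements; a real oracle would use the
    // grammar of the query language being protected.
    const SPECIAL: &[&str] = &["'", "\"", ";", "--", "/*"];
    for s in SPECIAL {
        if param.contains(s) {
            return Err(format!("unsanitized special element {:?} in {:?}", s, param));
        }
    }
    Ok(())
}
```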
In modern software development, vulnerability detection is crucial due to the inevitability of bugs and vulnerabilities in complex software systems. Effective detection and elimination of these vulnerabilities during the testing phase are essential. Current methods, such as fuzzing, are widely used for this purpose. While fuzzing is efficient in identifying a broad range of bugs and vulnerabilities by using random mutations or generations, it does not guarantee correctness or absence of vulnerabilities. Therefore, non-random methods are preferable for ensuring the safety and security of critical infrastructure and control systems. This paper presents a vulnerability detection approach based on symbolic execution and control flow graph analysis to identify various types of software weaknesses. Our approach employs a divide-and-conquer algorithm to eliminate irrelevant program information, thus accelerating the process and enabling the analysis of larger programs compared to traditional symbolic execution and model checking methods.
... These techniques aim to uncover different types of programming errors, suboptimal coding practices, deprecated functions, or potential memory safety issues. Generally speaking, these security evaluation methods fall into two main categories: static analysis [9,7,43] and dynamic analysis [60,25,55,26,72]. Static analysis involves reviewing the program's code without executing it and detecting issues like buffer overflows and improper API use by applying predefined rules and patterns [43]. ...
Large language models (LLMs) have shown great potential for automatic code generation and form the basis for various tools such as GitHub Copilot. However, recent studies highlight that much LLM-generated code contains serious security vulnerabilities. While previous work tries to address this by training models that generate secure code, these attempts remain constrained by limited access to training data and labor-intensive data preparation. In this paper, we introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code by automatically synthesizing secure code, which reduces the effort of finding suitable training data. HexaCoder comprises two key components: an oracle-guided data synthesis pipeline and a two-step process for secure code generation. The data synthesis pipeline generates pairs of vulnerable and fixed code for specific Common Weakness Enumeration (CWE) types by utilizing a state-of-the-art LLM for repairing vulnerable code. A security oracle identifies vulnerabilities, and a state-of-the-art LLM repairs them by extending and/or editing the code, creating data pairs for fine-tuning using the Low-Rank Adaptation (LoRA) method. Each example of our fine-tuning dataset includes the necessary security-related libraries and code that form the basis of our novel two-step generation approach. This allows the model to integrate security-relevant libraries before generating the main code, significantly reducing the amount of generated vulnerable code by up to 85% compared to the baseline methods. We perform extensive evaluations on three different benchmarks for four LLMs, demonstrating that HexaCoder not only improves the security of the generated code but also maintains a high level of functional correctness.
... We implement our proposed design in a tool called DarthShader, amounting to 10,000 lines of code. While not building on a domain-specific fuzzer, we do reuse components of libAFL [20] for general fuzzer housekeeping, tree-sitter [34] for parsing shaders, and naga [5] for its IR. Below, we highlight two of our building blocks. ...
A recent trend towards running more demanding web applications, such as video games or client-side LLMs, in the browser has led to the adoption of the WebGPU standard that provides a cross-platform API exposing the GPU to websites. This opens up a new attack surface: Untrusted web content is passed through to the GPU stack, which traditionally has been optimized for performance instead of security. Worsening the problem, most of WebGPU cannot be run in the tightly sandboxed process that manages other web content, which eases the attacker's path to compromising the client machine. Despite its importance, WebGPU shader processing has received surprisingly little attention from the automated testing community. Part of the reason is that shader translators expect highly structured and statically typed input, which renders typical fuzzing mutations ineffective. Complicating testing further, shader translation consists of a complex multi-step compilation pipeline, each stage presenting unique requirements and challenges. In this paper, we propose DarthShader, the first language fuzzer that combines mutators based on an intermediate representation with those using a more traditional abstract syntax tree. The key idea is that the individual stages of the shader compilation pipeline are susceptible to different classes of faults, requiring entirely different mutation strategies for thorough testing. By fuzzing the full pipeline, we ensure that we maintain a realistic attacker model. In an empirical evaluation, we show that our method outperforms the state-of-the-art fuzzers regarding code coverage. Furthermore, an extensive ablation study validates our key design. DarthShader found a total of 39 software faults in all modern browsers (Chrome, Firefox, and Safari) that prior work missed. For 15 of them, the Chrome team assigned a CVE, acknowledging the impact of our results.
... Interestingly, the field of fuzzing, otherwise renowned for effective bug finding [19], shows a noticeable gap in testing network-facing applications. Despite the advent of AFL [43] sparking a renaissance of fuzzing methods with hundreds of new fuzzers proposed [2,6,14,15,29,42,45], most of them are limited to a specific subset of software. Most benchmarks [20,21,30], academic papers in general [34], and public industry initiatives such as OSS-Fuzz [19] focus predominantly on comparatively simple C/C++ Linux programs that consume byte-oriented input. ...
Network-facing applications are commonly exposed to all kinds of attacks, especially when connected to the internet. As a result, web servers like Nginx or client applications such as curl make every effort to secure and harden their code to rule out memory safety violations. One would expect this to include regular fuzz testing, as fuzzing has proven to be one of the most successful approaches to uncovering bugs in software. Yet, surprisingly little research has focused on fuzzing network applications. When studying the underlying reasons, we find that the interactive nature of communication, its statefulness, and the protection of exchanged messages render typical fuzzers ineffective. Attempts to replay recorded messages or modify them on the fly only work for specific targets and often lead to early termination of communication. In this paper, we discuss these challenges in detail, highlighting how the focus of existing work on protocol state space promises little relief. We propose a fundamentally different approach that relies on fault injection rather than modifying messages. Effectively, we force one of the communication peers into a weird state where its output no longer matches the expectations of the target peer, potentially uncovering bugs. Importantly, this weird peer can still properly encrypt/sign the protocol message, overcoming a fundamental challenge of current fuzzers. In effect, we leave the communication system intact but introduce small corruptions. Since we can turn either the server or the client into the weird peer, our approach is the first that can effectively test client-side network applications. Evaluating 16 targets, we show that Fuzztruction-Net outperforms other fuzzers in terms of coverage and bugs found. Overall, Fuzztruction-Net uncovered 23 new bugs in well-tested software, such as the web servers Nginx and Apache HTTPd and the OpenSSH client.
... Step 4: Fuzzing: The final step is fuzzing the control logic using the proposed PLC runtime along with a fuzzer. To do that, we implemented a custom fuzzer in Rust based on the LibAFL [42] fuzzing library. In more detail, LibAFL is a popular library that provides a set of utilities for building custom fuzzers. ...
Rigorous testing methods are essential for ensuring the security and reliability of industrial controller software. Fuzzing, a technique that automatically discovers software bugs, has also proven effective in finding software vulnerabilities. Unsurprisingly, fuzzing has been applied to a wide range of platforms, including programmable logic controllers (PLCs). However, current approaches, such as coverage-guided evolutionary fuzzing implemented in the popular fuzzer American Fuzzy Lop Plus Plus (AFL++), are often inadequate for finding logical errors and bugs in PLC control logic applications. They primarily target generic programming languages like C/C++, Java, and Python, and do not consider the unique characteristics and behaviors of PLCs, which are often programmed using specialized programming languages like Structured Text (ST). Furthermore, these fuzzers are ill suited to deal with complex input structures encapsulated in ST, as they are not specifically designed to generate appropriate input sequences. This renders the application of traditional fuzzing techniques less efficient on these platforms. To address this issue, this paper presents a fuzzing framework designed explicitly for PLC software to discover logic bugs in applications written in ST specified by the IEC 61131-3 standard. The proposed framework incorporates a custom-tailored PLC runtime and a fuzzer designed for the purpose. We demonstrate its effectiveness by fuzzing a collection of ST programs that were crafted for evaluation purposes. We compare the performance against a popular fuzzer, namely, AFL++. The proposed fuzzing framework demonstrated its capabilities in our experiments, successfully detecting logic bugs in the tested PLC control logic applications written in ST. On average, it was at least 83 times faster than AFL++, and in certain cases, for example, it was more than 23,000 times faster.
... Notable fuzzers include AFL++ [11] and honggfuzz, with LibAFL providing customization for task-specific adaptations [12]. Fuzzing is also viable on embedded devices through hardware-in-the-loop testing [10,3], demonstrating its adaptability and effectiveness in various settings. ...
Connected Medical Devices (CMDs) significantly benefit patients but are also vulnerable to malfunctions that can cause harm. Despite strict safety regulations for market entry, there's a notable shortage of specific cybersecurity frameworks for CMDs. Existing regulations on cybersecurity practices are often broad and lack detailed implementation steps. This paper introduces the CyMed framework, designed for vendors and end-users, offering explicit strategies to enhance the cybersecurity of CMDs. The effectiveness of CyMed is assessed through practical testing and expert interviews.
... Despite the approach being straightforward, clever engineering stunts (like code instrumentation) made the technique very effective and scalable. The pioneer of the approach is AFL [13] (and its successors AFL++ and the more recent LibAFL [6]), which, since 2013, has found plenty of bugs daily. The problem is that the above-mentioned tools are devised to tackle stateless systems (as explained in Section 2), i.e. systems that do not have any notion of sessions and do not need to implement any state model. ...
Fuzzing has been proven extremely effective in finding vulnerabilities in software. When it comes to fuzzing stateless systems, analysts have no doubts about the choice to make. In fact, among the plethora of stateless fuzzers devised in the last 20 years, AFL (with its descendants AFL++ and LibAFL) stood out for its effectiveness, speed and ability to find bugs. On the other hand, when dealing with stateful systems, it is not clear which tool is the best to use. In fact, the research community struggles to devise (and benchmark) effective and generic stateful fuzzers. In this short paper, we discuss the reasons that make stateful fuzzers difficult to devise and benchmark.
... Note that the decomposition of a fuzzer into components presented here is designed for analyzing the various techniques used; it may seem inconvenient for the practical implementation of tools. The decomposition of a fuzzer into components in terms of an efficient and configurable implementation of such tools is considered in [26]. ...
The article presents a survey of software dynamic analysis methods. The main focus of the survey is on methods supported by tools, targeted at software security verification, and applicable to system software. The survey examines fuzzing and dynamic symbolic execution techniques in detail. Dynamic taint data analysis is excluded due to the difficulty of gathering technical details of its implementation. The review of fuzzing and dynamic symbolic execution focuses mostly on the techniques used in supporting tools, not on the tools themselves, because their number already exceeds 100. Techniques for counteracting fuzzing are also surveyed.
... Since every fuzzing task is unique, fuzzers usually need to be adapted to the task given. For example, LibAFL offers good customization options and makes it easier to adapt a fuzzer to a specific task [18]. Fuzzing can also be applied directly on embedded devices using hardware-in-the-loop testing techniques [19,20]. ...
Connected Medical Devices (CMDs) have a large impact on patients as they allow them to lead a more normal life. Any malfunction could not only remove the health benefits the CMDs provide, they could also cause further harm to the patient. Due to this, there are many safety regulations which must be adhered to prior to a CMD entering the market. However, while many detailed safety regulations exist, there is a fundamental lack of cybersecurity frameworks applicable to CMDs. While there are recent regulations which aim to enforce cybersecurity practices, they are vague and do not contain the concrete steps necessary to implement cybersecurity. This paper aims to fill that gap by describing a framework, CyMed, to be used by vendors and end-users, which contains concrete measures to improve the resilience of CMDs against cyber attacks. The CyMed framework is subsequently evaluated based on practical tests as well as expert interviews.
... AFL++ [12] combines other open source fuzzers into a new fuzzer, which contains a variety of novel improvements. LibAFL [13] deconstructs AFL into modules to integrate orthogonal techniques. ...
Traditional coverage grey-box fuzzers perform a breadth-first search of the state space of the Program Under Test (PUT). This aimlessness wastes a lot of computing resources. Directed grey-box fuzzing focuses on a target in the PUT and has become one of the most popular topics in software testing. Early termination of unreachable test cases is one way to improve directed grey-box fuzzing. However, existing solutions have two problems: first, reachability analysis needs to introduce extra technologies (e.g., static analysis); second, the performance of reachability analysis and the auxiliary technologies lacks versatility. We propose FGo, a probabilistic exponential cut-the-loss directed grey-box fuzzer. FGo terminates unreachable test cases early with exponentially increasing probability. Compared to other technologies, FGo makes full use of the unreachable information contained in the iCFG and does not incur any additional overhead from reachability analysis. Moreover, it is easy to generalize to all PUTs. This probability-based strategy is perfectly adapted to the randomness of fuzzing. The experimental results show that FGo is 106% faster than AFLGo in reproducing crashes. We compare multiple parameters of the probabilistic exponential cut-the-loss algorithm and analyze them in detail. In addition, to enhance the interpretability of FGo, this paper discusses the difference between the theoretical and practical performance of the probabilistic exponential cut-the-loss algorithm.
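A sketch of such a probabilistic exponential cut-the-loss rule in Rust, in the spirit of the abstract: every time execution enters a block from which the target is unreachable, abort with a probability that grows exponentially in the number of such events. The initial probability p0 and the base 2 are illustrative parameters, not values from the paper.

```rust
use rand::Rng;

/// Tracks how often the current execution has entered a block from which
/// the target is unreachable (per the interprocedural CFG).
struct CutTheLoss {
    unreachable_hits: u32,
    p0: f64, // initial termination probability, 0.0 < p0 <= 1.0
}

impl CutTheLoss {
    /// Called on each unreachable-block event; returns true if this test
    /// case should be terminated early.
    fn should_terminate(&mut self, rng: &mut impl Rng) -> bool {
        self.unreachable_hits += 1;
        // Termination probability doubles with each event, capped at 1.
        let p = (self.p0 * 2f64.powi(self.unreachable_hits as i32 - 1)).min(1.0);
        rng.gen_bool(p)
    }
}
```

The probabilistic form keeps a small chance of running an "unreachable" test to completion, which fits the abstract's point that the strategy is adapted to the randomness of fuzzing.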
... To implement the snapshot-based fuzzing algorithm and both the dataflow and comparison waypoints, we build ItyFuzz from scratch. We use LibAFL [7] as a backbone and implement a separate state corpus to support snapshotting the states. We also incorporate the dataflow and comparison waypoints into ItyFuzz using customized feedback. ...
Smart contracts are critical financial instruments, and their security is of utmost importance. However, smart contract programs are difficult to fuzz due to the persistent blockchain state behind all transactions. Mutating sequences of transactions is complex and often leads to suboptimal exploration of both the input and program spaces. In this paper, we introduce a novel snapshot-based fuzzer ItyFuzz for testing smart contracts. In ItyFuzz, instead of storing sequences of transactions and mutating from them, we snapshot states and singleton transactions. To explore interesting states, ItyFuzz introduces a dataflow waypoint mechanism to identify states with more potential momentum. ItyFuzz also incorporates comparison waypoints to prune the space of states. By maintaining snapshots of the states, ItyFuzz can synthesize concrete exploits like reentrancy attacks quickly. Because ItyFuzz has second-level response time to test a smart contract, it can be used for on-chain testing, which has many benefits compared to local development testing. Finally, we evaluate ItyFuzz on real-world smart contracts and some hacked on-chain DeFi projects. ItyFuzz outperforms existing fuzzers in terms of instruction coverage and can find and generate realistic exploits for on-chain projects quickly.
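The snapshot idea can be sketched as follows in Rust; the types and the notion of "interesting" are illustrative placeholders, not ItyFuzz's actual implementation.

```rust
/// Serialized blockchain state at some point during fuzzing (placeholder).
#[derive(Clone)]
struct Snapshot(Vec<u8>);

/// A single transaction's calldata (placeholder).
#[derive(Clone)]
struct Tx(Vec<u8>);

/// The corpus stores (state snapshot, singleton transaction) pairs rather
/// than whole transaction sequences.
struct SnapshotCorpus {
    entries: Vec<(Snapshot, Tx)>,
}

impl SnapshotCorpus {
    /// One fuzzing step: restore a saved state, mutate a single
    /// transaction, and execute it from that state -- no sequence replay.
    fn step(
        &mut self,
        pick: usize,
        mutate: impl Fn(&Tx) -> Tx,
        execute: impl Fn(&Snapshot, &Tx) -> (Snapshot, bool),
    ) {
        let (state, tx) = self.entries[pick].clone();
        let new_tx = mutate(&tx);
        let (new_state, interesting) = execute(&state, &new_tx);
        if interesting {
            // Waypoint-style feedback decides what counts as interesting
            // (e.g., dataflow or comparison progress in the paper's terms).
            self.entries.push((new_state, new_tx));
        }
    }
}
```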
The growing complexity and size of hardware designs necessitate novel, scalable approaches to verification, as latent bugs and security flaws have devastating impact. This is especially critical since bugs in hardware designs cannot be patched after manufacturing. Currently, dynamic verification is the predominant methodology for detecting hardware design flaws, where detection efficiency is primarily determined by the choice of (random) inputs to the design under test. More elaborate recent methods adapt principles from greybox software fuzzing to achieve high coverage in short time. However, these existing greybox methods rely on heavy instrumentation or software conversion, which requires access to the design source code. Fuzzing of blackbox hardware designs has only been possible with random, undirected input generation up until now, which requires a long time to cover the majority of possible hardware states. In this work, we propose FUZZ-E, a novel scalable method for coverage-guided hardware design fuzzing, where coverage is indirectly estimated through on-chip voltage measurements on FPGAs. The side-channel-based FUZZ-E approach enables testing blackbox hardware designs without requiring access to any internal signals. We provide an extensive analysis of the correlation between hardware design coverage and voltage fluctuations, and show how FUZZ-E significantly reduces the verification time required to achieve desirable design coverage.
Agile hardware development methodology has been widely adopted over the past decade. Despite the research progress, the industry still doubts its applicability, especially for the functional verification of complicated processor chips. Functional verification commonly employs a simulation-based method of co-simulating the design under test with a reference model and checking the consistency of their outcomes given the same input stimuli. We observe limited collaboration and information exchange through the design and verification processes, which leads to dramatic inefficiencies when applying the conventional functional verification workflow to agile development. In this paper, we propose workflow integration with collaborative task delegation and dynamic information exchange as the design principles to effectively address the challenges on functional verification under the agile development model. Based on workflow integration, we enhance the functional verification workflows with a series of novel methodologies and toolchains. The diff-rule based agile verification methodology (DRAV) reduces the overhead of building reference models with runtime execution information from designs under test. We present the RISC-V implementation for DRAV, DiffTest, which adopts information probes to extract internal design behaviors for co-simulation and debugging. It further integrates two plugins, namely XFUZZ for effective test generation guided by design coverage metrics and LightSSS for efficient fault analysis triggered by co-simulation mismatches. We present the integrated workflows for agile hardware development and demonstrate their effectiveness in designing and verifying RISC-V processors with 33 functional bugs found in NutShell. We also illustrate the efficiency of the proposed toolchains with a case study on a functional bug in the L2 cache of XiangShan.
Fuzzing has become an important method for finding vulnerabilities in software. For fuzzing programs expecting structural inputs, syntactic- and semantic-aware fuzzing approaches have been proposed. However, they still cannot fuzz in-memory data stores sufficiently, since some code paths are only executed when the required data are available. In this article, we propose a data-aware fuzzing method, DAFuzz, which is designed by considering the data used during fuzzing. Specifically, to ensure different data-sensitive code paths are exercised, DAFuzz first loads different kinds of data into the stores before feeding fuzzing inputs. Then, when generating inputs, DAFuzz ensures the generated inputs are not only syntactically and semantically valid but also use the data correctly. We implement a prototype of DAFuzz based on Superion and use it to fuzz Redis and Memcached. Experiments show that DAFuzz covers 13~95% more edges than AFL, Superion, AFL++, and AFLNet, and discovers vulnerabilities over 2.7× faster. In total, we discovered four new vulnerabilities in Redis and Memcached. All the vulnerabilities were reported to developers and have been acknowledged and fixed.
Fuzz testing, which repeatedly executes a given program with auto-generated random inputs and records its dynamic control flow, aims to discover sources of unexpected program behavior impacting security, which can then be fixed more easily by directed developer effort. When targeting IoT devices, fuzzing faces the problem that small IoT processors often lack the observability required for fuzzing, e.g., a high-performance trace port, while software emulation on a faster host CPU is often slow, and compilation of the IoT application to a different ISA for faster native execution on the host introduces inaccuracies in the fuzzing process. To overcome all three of these drawbacks for RISC-V-based IoT processors, which are expected to dominate future IoT applications with their lack of ISA licensing costs, we modify an open-source RISC-V core for use in an FPGA-based hardware-accelerated fuzzing system. Our fuzzer has demonstrated up to four times the performance of the state-of-the-art QEMU-based fuzzing tool AFL++, even when running on very fast x86 host processors clocked at 4.95 GHz. Keywords: Security, Fuzzing, LibAFL, TaPaSCo, RISC-V, Coverage
Recent years have witnessed a wide array of results in software testing, exploring different approaches and methodologies ranging from fuzzers to symbolic engines, with a full spectrum of instances in between such as concolic execution and hybrid fuzzing. A key ingredient of many of these tools is Satisfiability Modulo Theories (SMT) solvers, which are used to reason over symbolic expressions collected during the analysis. In this paper, we investigate whether techniques borrowed from the fuzzing domain can be applied to check whether symbolic formulas are satisfiable in the context of concolic and hybrid fuzzing engines, providing a viable alternative to classic SMT solving techniques. We devise a new approximate solver, FUZZY-SAT, and show that it is both competitive with and complementary to state-of-the-art solvers such as Z3 with respect to handling queries generated by hybrid fuzzers.
Fuzzing technologies have evolved at a fast pace in recent years, revealing bugs in programs with ever increasing depth and speed. Applications working with complex formats are however more difficult to take on, as inputs need to meet certain format-specific characteristics to get through the initial parsing stage and reach deeper behaviors of the program. Unlike prior proposals based on manually written format specifications, in this paper we present a technique to automatically generate and mutate inputs for unknown chunk-based binary formats. We propose a technique to identify dependencies between input bytes and comparison instructions, and later use them to assign tags that characterize the processing logic of the program. Tags become the building block for structure-aware mutations involving chunks and fields of the input. We show that our technique performs comparably to structure-aware fuzzing proposals that require human assistance. Our prototype implementation WEIZZ revealed 16 unknown bugs in widely used programs.
Fuzz testing techniques are becoming pervasive for their ever-improving ability to generate crashing trial cases for programs. Memory safety violations however can lead to silent corruptions and errors, and a fuzzer may recognize them only in the presence of sanitization machinery. For closed-source software, combining sanitization with fuzzing incurs practical obstacles that we try to tackle with an architecture-independent proposal called QASan for detecting heap memory violations. In our tests QASan is competitive with standalone sanitizers and adds a moderate 1.61x average slowdown to the AFL++ fuzzer while enabling it to reveal more heap-related bugs.
In this paper, we present AFL++, a community-driven open-source tool that incorporates state-of-the-art fuzzing research, to make the research comparable, reproducible, combinable and, most importantly, usable. It offers a variety of novel features, for example its Custom Mutator API, able to extend the fuzzing process at many stages. With it, mutators for specific targets can also be written by experienced security testers. We hope for AFL++ to become a new baseline tool not only for current, but also for future research, as it allows to test new techniques quickly, and evaluate not only the effectiveness of the single technique versus the state-of-the-art, but also in combination with other techniques. The paper gives an evaluation of hand-picked fuzzing technologies, shining light on the fact that while each novel fuzzing method can increase performance in some targets, it decreases performance for other targets. This is an insight future fuzzing research should consider in their evaluations.
Coverage-guided fuzz testing has gained prominence as a highly effective method of finding security vulnerabilities such as buffer overflows in programs that parse binary data. Recently, researchers have introduced various specializations to the coverage-guided fuzzing algorithm for different domain-specific testing goals, such as finding performance bottlenecks, generating valid inputs, handling magic-byte comparisons, etc. Each such solution can require non-trivial implementation effort and produces a distinct variant of a fuzzing tool. We observe that many of these domain-specific solutions follow a common solution pattern.
In this paper, we present FuzzFactory, a framework for developing domain-specific fuzzing applications without requiring changes to mutation and search heuristics. FuzzFactory allows users to specify the collection of dynamic domain-specific feedback during test execution, as well as how such feedback should be aggregated. FuzzFactory uses this information to selectively save intermediate inputs, called waypoints, to augment coverage-guided fuzzing. Such waypoints always make progress towards domain-specific multi-dimensional objectives. We instantiate six domain-specific fuzzing applications using FuzzFactory: three re-implementations of prior work and three novel solutions, and evaluate their effectiveness on benchmarks from Google's fuzzer test suite. We also show how multiple domains can be composed to perform better than the sum of their parts. For example, we combine domain-specific feedback about strict equality comparisons and dynamic memory allocations, to enable the automatic generation of LZ4 bombs and PNG bombs.
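A hedged sketch of the waypoint-saving predicate at the core of this idea, in Rust: per domain, keep a map of aggregated values and save an input whenever it improves any cell, even without new branch coverage. The `max` reducer shown is one of the aggregations the framework supports; the structure is illustrative, not FuzzFactory's code.

```rust
/// Domain-specific feedback map: each cell tracks the best value observed
/// so far for one key (e.g., bytes matched in a comparison, bytes allocated).
struct DomainFeedback {
    best: Vec<u64>,
}

impl DomainFeedback {
    /// Returns true (save the input as a waypoint) if this run's
    /// domain-specific map improves on any cell, using `max` as reducer.
    fn is_waypoint(&mut self, run_map: &[u64]) -> bool {
        let mut progress = false;
        for (best, &v) in self.best.iter_mut().zip(run_map) {
            if v > *best {
                *best = v;
                progress = true;
            }
        }
        progress
    }
}
```

Composing domains then amounts to saving an input if any domain's predicate fires, which matches the abstract's observation that combined domains can outperform the sum of their parts.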
Many kinds of memory safety vulnerabilities have been endangering software systems for decades. Amongst other approaches, fuzzing is a promising technique to unveil various software faults. Recently, feedback-guided fuzzing demonstrated its power, producing a steady stream of security-critical software bugs. Most fuzzing efforts, especially feedback fuzzing, are limited to user space components of an operating system (OS), although bugs in kernel components are more severe, because they allow an attacker to gain access to a system with full privileges. Unfortunately, kernel components are difficult to fuzz as feedback mechanisms (i.e., guided code coverage) cannot be easily applied. Additionally, non-determinism due to interrupts, kernel threads, statefulness, and similar mechanisms poses problems. Furthermore, if a process fuzzes its own kernel, a kernel crash highly impacts the performance of the fuzzer as the OS needs to reboot. In this paper, we approach the problem of coverage-guided kernel fuzzing in an OS-independent and hardware-assisted way: We utilize a hypervisor and Intel's Processor Trace (PT) technology. This allows us to remain independent of the target OS as we just require a small user space component that interacts with the targeted OS. As a result, our approach introduces almost no performance overhead, even in cases where the OS crashes, and performs up to 17,000 executions per second on an off-the-shelf laptop. We developed a framework called kernel-AFL (kAFL) to assess the security of Linux, macOS, and Windows kernel components. Among many crashes, we uncovered several flaws in the ext4 driver for Linux, the HFS and APFS file system of macOS, and the NTFS driver of Windows.
Grey-box fuzzing is a practically effective approach to test real-world programs. However, most existing grey-box fuzzers lack directedness, i.e. the capability of executing towards user-specified target sites in the program. To emphasize existing challenges in directed fuzzing, we propose Hawkeye to feature four desired properties of directed grey-box fuzzers. Owing to a novel static analysis on the program under test and the target sites, Hawkeye precisely collects the information such as the call graph, function and basic block level distances to the targets. During fuzzing, Hawkeye evaluates exercised seeds based on both static information and the execution traces to generate the dynamic metrics, which are then used for seed prioritization, power scheduling and adaptive mutation. These strategies help Hawkeye to achieve better directedness and gravitate towards the target sites. We implemented Hawkeye as a fuzzing framework and evaluated it on various real-world programs under different scenarios. The experimental results showed that Hawkeye can reach the target sites and reproduce the crashes much faster than state-of-the-art grey-box fuzzers such as AFL and AFLGo. Notably, Hawkeye can reduce the time to exposure for certain vulnerabilities from about 3.5 hours to 0.5 hour. By now, Hawkeye has detected more than 41 previously unknown crashes in projects such as Oniguruma, MJS with the target sites provided by vulnerability prediction tools; all these crashes are confirmed and 15 of them have been assigned CVE IDs.
Calling context trees are one of the most fundamental data structures for representing the interprocedural control flow of a program, providing valuable information for program understanding and optimization. Nodes of a calling context tree associate performance metrics to whole distinct paths in the call graph starting from the root function. However, no explicit information is provided for detecting short hot sequences of activations, which may be a better optimization target in large modular programs where groups of related functions are reused in many different parts of the code. Furthermore, calling context trees can grow prohibitively large in some scenarios. Another classical approach, called edge profiling, collects performance metrics for caller-callee pairs in the call graph, allowing it to detect hot paths of fixed length one. We study a generalization of edge and context-sensitive profiles by introducing a novel data structure called k-calling context forest (k-CCF). Nodes in a k-CCF associate performance metrics to paths of length at most k that lead to each distinct routine of the program, providing edge profiles for k=1, full context-sensitive profiles for k equal to infinity, as well as any other intermediate point in the spectrum. We study the properties of the k-CCF both theoretically and experimentally on a large suite of prominent Linux applications, showing how to construct it efficiently and discussing its relationships with the calling context tree. Our experiments show that the k-CCF can provide effective space-accuracy tradeoffs for interprocedural contextual profiling, yielding useful clues to the hot spots of a program that may be hidden in a calling context tree and using less space for small values of k, which appear to be the most interesting in practice.
Fuzz testing is an automated technique providing random data as input to a software system in the hope to expose a vulnerability. In order to be effective, the fuzzed input must be common enough to pass elementary consistency checks; a JavaScript interpreter, for instance, would only accept a semantically valid program. On the other hand, the fuzzed input must be uncommon enough to trigger exceptional behavior, such as a crash of the interpreter. The LangFuzz approach resolves this conflict by using a grammar to randomly generate valid programs; the code fragments, however, partially stem from programs known to have caused invalid behavior before. LangFuzz is an effective tool for security testing: Applied on the Mozilla JavaScript interpreter, it discovered a total of 105 new severe vulnerabilities within three months of operation (and thus became one of the top security bug bounty collectors within this period); applied on the PHP interpreter, it discovered 18 new defects causing crashes.
Fuzz testing is an effective technique for finding security vulnerabilities in software. Traditionally, fuzz testing tools apply random mutations to well-formed inputs of a program and test the resulting values. We present an alternative whitebox fuzz testing approach inspired by recent advances in symbolic execution and dynamic test generation. Our approach records an actual run of the program under test on a well-formed input, symbolically evaluates the recorded trace, and gathers constraints on inputs capturing how the program uses these. The collected constraints are then negated one by one and solved with a constraint solver, producing new inputs that exercise different control paths in the program. This process is repeated with the help of a code-coverage maximizing heuristic designed to find defects as fast as possible. We have implemented this algorithm in SAGE (Scalable, Automated, Guided Execution), a new tool employing x86 instruction-level tracing and emulation for whitebox fuzzing of arbitrary file-reading Windows applications. We describe key optimizations needed to make dynamic test generation scale to large input files and long execution traces with hundreds of millions of instructions. We then present detailed experiments with several Windows applications. Notably, without any format-specific knowledge, SAGE detects the MS07-017 ANI vulnerability, which was missed by extensive blackbox fuzzing and static analysis tools. Furthermore, while still in an early stage of development, SAGE has already discovered 30+ new bugs in large shipped Windows applications including image processors, media players, and file decoders. Several of these bugs are potentially exploitable memory access violations.
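The "negate one by one" step can be sketched as follows in Rust; `Constraint`, `negate`, and `solve` are placeholders for the symbolic machinery and an SMT solver, so this is an illustration of the search structure rather than SAGE's generational search in full.

```rust
/// Placeholder for a symbolic predicate over input bytes.
#[derive(Clone)]
enum Constraint {}

/// For each prefix of the path constraints from one run, flip the last
/// predicate and ask a solver for a satisfying input, yielding one new
/// test per flipped branch.
fn expand_execution(
    path: &[Constraint],
    negate: impl Fn(&Constraint) -> Constraint,
    solve: impl Fn(&[Constraint]) -> Option<Vec<u8>>,
) -> Vec<Vec<u8>> {
    let mut new_inputs = Vec::new();
    for i in 0..path.len() {
        // Keep constraints before position i, negate the i-th, drop the rest.
        let mut query: Vec<Constraint> = path[..i].to_vec();
        query.push(negate(&path[i]));
        if let Some(input) = solve(&query) {
            new_inputs.push(input);
        }
    }
    new_inputs
}
```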
This project started as a simple experiment to try to better understand an observed phenomenon, that of programs crashing when a noisy dial-up line is used. As a result of testing a comprehensive list of utility programs on several versions of Unix, it appears that this is not an isolated problem. Thus, this paper supplies a list of bug reports to fix the utilities that we were able to crash. This should also improve the quality and reliability of Unix utilities. This paper also supplies a simple, but effective test method (and tools).
We describe LLVM (low level virtual machine), a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs, by providing high-level information to compiler transformations at compile-time, link-time, run-time, and in idle time between runs. LLVM defines a common, low-level code representation in static single assignment (SSA) form, with several novel features: a simple, language-independent type-system that exposes the primitives commonly used to implement high-level language features; an instruction for typed address arithmetic; and a simple mechanism that can be used to implement the exception handling features of high-level languages (and setjmp/longjmp in C) uniformly and efficiently. The LLVM compiler framework and code representation together provide a combination of key capabilities that are important for practical, lifelong analysis and transformation of programs. To our knowledge, no existing compilation approach provides all these capabilities. We describe the design of the LLVM representation and compiler framework, and evaluate the design in three ways: (a) the size and effectiveness of the representation, including the type information it provides; (b) compiler performance for several interprocedural problems; and (c) illustrative examples of the benefits LLVM provides for several challenging compiler problems.
We present the first approach to automatic exploit generation for heap overflows in interpreters. It is also the first approach to exploit generation in any class of program that integrates a solution for automatic heap layout manipulation. At the core of the approach is a novel method for discovering exploit primitives---inputs to the target program that result in a sensitive operation, such as a function call or a memory write, utilizing attacker-injected data. To produce an exploit primitive from a heap overflow vulnerability, one has to discover a target data structure to corrupt, ensure an instance of that data structure is adjacent to the source of the overflow on the heap, and ensure that the post-overflow corrupted data is used in a manner desired by the attacker. Our system addresses all three tasks in an automatic, greybox, and modular manner. Our implementation is called GOLLUM, and we demonstrate its capabilities by producing exploits from 10 unique vulnerabilities in the PHP and Python interpreters, 5 of which do not have existing public exploits.
Coverage-based greybox fuzzing (CGF) is one of the most successful approaches for automated vulnerability detection. Given a seed file (as a sequence of bits), a CGF randomly flips, deletes or copies some bits to generate new files. CGF iteratively constructs (and fuzzes) a seed corpus by retaining those generated files which enhance coverage. However, random bitflips are unlikely to produce valid files (or valid chunks in files), for applications processing complex file formats. In this work, we introduce smart greybox fuzzing (SGF) which leverages a high-level structural representation of the seed file to generate new files. We define innovative mutation operators that work on the virtual file structure rather than on the bit level which allows SGF to explore completely new input domains while maintaining file validity. We introduce a novel validity-based power schedule that enables SGF to spend more time generating files that are more likely to pass the parsing stage of the program, which can expose vulnerabilities much deeper in the processing logic. Our evaluation demonstrates the effectiveness of SGF. On several libraries that parse complex chunk-based files, our tool AFLSMART achieves substantially more branch coverage (up to 87% improvement), and exposes more vulnerabilities than baseline AFL. Our tool AFLSMART has discovered 42 zero-day vulnerabilities in widely-used, well-tested tools and libraries; so far 17 CVEs were assigned.
In recent years, fuzz testing has proven itself to be one of the most effective techniques for finding correctness bugs and security vulnerabilities in practice. One particular fuzz testing tool, American Fuzzy Lop (AFL), has become popular thanks to its ease-of-use and bug-finding power. However, AFL remains limited in the bugs it can find since it simply does not cover large regions of code. If it does not cover parts of the code, it will not find bugs there. We propose a two-pronged approach to increase the coverage achieved by AFL. First, the approach automatically identifies branches exercised by few AFL-produced inputs (rare branches), which often guard code that is empirically hard to cover by naively mutating inputs. The second part of the approach is a novel mutation mask creation algorithm, which allows mutations to be biased towards producing inputs hitting a given rare branch. This mask is dynamically computed during fuzz testing and can be adapted to other testing targets. We implement this approach on top of AFL in a tool named FairFuzz. We conduct evaluation on real-world programs against state-of-the-art versions of AFL. We find that on these programs FairFuzz achieves high branch coverage at a faster rate than state-of-the-art versions of AFL. In addition, on programs with nested conditional structure, it achieves sustained increases in branch coverage after 24 hours (average 10.6% increase). In qualitative analysis, we find that FairFuzz has an increased capacity to automatically discover keywords.
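A hedged sketch of such a mutation-mask computation in Rust: probe each byte position of an input that hits the rare branch and mark it mutable only if changing it still hits the branch. `hits_rare_branch` stands in for an instrumented execution and the single-probe-per-byte policy is a simplification of the paper's algorithm.

```rust
/// Compute a per-byte mutation mask for an input known to hit a rare
/// branch: `true` means the byte can be mutated without (apparently)
/// losing the branch.
fn compute_mask(input: &[u8], hits_rare_branch: impl Fn(&[u8]) -> bool) -> Vec<bool> {
    (0..input.len())
        .map(|i| {
            let mut probe = input.to_vec();
            probe[i] ^= 0xff; // perturb the byte at position i
            hits_rare_branch(&probe) // mutable iff the rare branch survives
        })
        .collect()
}
```

Later mutations then only touch positions where the mask is `true`, biasing the search towards inputs that keep reaching the rare branch.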
Existing Greybox Fuzzers (GF) cannot be effectively directed, for instance, towards problematic changes or patches, towards critical system calls or dangerous locations, or towards functions in the stack-trace of a reported vulnerability that we wish to reproduce. In this paper, we introduce Directed Greybox Fuzzing (DGF) which generates inputs with the objective of reaching a given set of target program locations efficiently. We develop and evaluate a simulated annealing-based power schedule that gradually assigns more energy to seeds that are closer to the target locations while reducing energy for seeds that are further away. Experiments with our implementation AFLGo demonstrate that DGF outperforms both directed symbolic-execution-based whitebox fuzzing and undirected greybox fuzzing. We show applications of DGF to patch testing and crash reproduction, and discuss the integration of AFLGo into Google's continuous fuzzing platform OSS-Fuzz. Due to its directedness, AFLGo could find 39 bugs in several well-fuzzed, security-critical projects like LibXML2. 17 CVEs were assigned.
Coverage-based Greybox Fuzzing (CGF) is a random testing approach that requires no program analysis. A new test is generated by slightly mutating a seed input. If the test exercises a new and interesting path, it is added to the set of seeds; otherwise, it is discarded. We observe that most tests exercise the same few "high-frequency" paths and develop strategies to explore significantly more paths with the same number of tests by gravitating towards low-frequency paths. We explain the challenges and opportunities of CGF using a Markov chain model which specifies the probability that fuzzing the seed that exercises path i generates an input that exercises path j. Each state (i.e., seed) has an energy that specifies the number of inputs to be generated from that seed. We show that CGF is considerably more efficient if energy is inversely proportional to the density of the stationary distribution and increases monotonically every time that seed is chosen. Energy is controlled with a power schedule.
We implemented the exponential schedule by extending AFL. In 24 hours, AFLFAST exposes 3 previously unreported CVEs that are not exposed by AFL and exposes 6 previously unreported CVEs 7x faster than AFL. AFLFAST produces at least an order of magnitude more unique crashes than AFL.
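As we recall it from the AFLFAST paper (hedged; symbols follow the paper's presentation), the exponential ("fast") schedule assigns seed $i$ the energy

$$p(i) = \min\!\left(\frac{\alpha(i)}{\beta} \cdot \frac{2^{s(i)}}{f(i)},\; M\right),$$

where $\alpha(i)$ is the energy the baseline schedule would assign, $\beta > 1$ is a constant, $s(i)$ counts how often seed $i$ has been selected so far, $f(i)$ counts the generated inputs that exercise path $i$, and $M$ caps the energy. This realizes both properties from the abstract: energy is inversely proportional to path frequency (low-frequency paths get more fuzzing) and increases monotonically each time the seed is chosen.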
The ability to introspect into the behavior of software at runtime is crucial for many security-related tasks, such as virtual machine-based intrusion detection and low-artifact malware analysis. Although some progress has been made in this task by automatically creating programs that can passively retrieve kernel-level information, two key challenges remain. First, it is currently difficult to extract useful information from user-level applications, such as web browsers. Second, discovering points within the OS and applications to hook for active monitoring is still an entirely manual process. In this paper we propose a set of techniques to mine the memory accesses made by an operating system and its applications to locate useful places to deploy active monitoring, which we call tap points. We demonstrate the efficacy of our techniques by finding tap points for useful introspection tasks such as finding SSL keys and monitoring web browser activity on five different operating systems (Windows 7, Linux, FreeBSD, Minix and Haiku) and two processor architectures (ARM and x86).
Fuzz testing is an effective technique for finding security vulnerabilities in software. Fuzz testing is a form of blackbox random testing which randomly mutates well-formed inputs and tests the program on the resulting data. In some cases, grammars are used to randomly generate the well-formed inputs. This also allows the tester to encode application-specific knowledge (such as corner cases of particular interest) as part of the grammar, and to specify test heuristics by assigning probabilistic weights to production rules. Although fuzz testing can be remarkably effective, the limitations of blackbox random testing are well-known. For instance, the then branch of the conditional statement "if (x==10) then" has only a one in 2^32 chance of being exercised if x is a randomly chosen 32-bit input value. This intuitively
Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.
The overheads in a parallel system that limit its scalability need to be identified and separated in order to enable parallel algorithm design and the development of parallel machines. Such overheads may be broadly classified into two components. The first one is intrinsic to the algorithm and arises due to factors such as the work-imbalance and the serial fraction. The second one is due to the interaction between the algorithm and the architecture and arises due to latency and contention in the network. A top-down approach to scalability study of shared memory parallel systems is proposed in this research. We define the notion of overhead functions associated with the different algorithmic and architectural characteristics to quantify the scalability of parallel systems; we develop a method for separating the algorithmic overhead into a serial component and a work-imbalance component; we also develop a method for isolating the overheads due to network latency and contention from the ...
Peach fuzzing platform. M. Eddington. Jan 2022.
GRIMOIRE: Synthesizing Structure while Fuzzing. Tim Blazytko, Cornelius Aschermann, Moritz Schlögel, Ali Abbasi, Sergej Schumilo, Simon Wörner, Thorsten Holz. Jan 2019.
Boosting Fuzzer Efficiency: An Information Theoretic Perspective. Marcel Böhme, Valentin Manès, Sang Kil Cha. Jan 2020.
Seed Selection for Successful Fuzzing. Adrian Herrera, Hendra Gunadi, Shane Magrath, Michael Norrish, Mathias Payer, Antony L. Hosking. Jan 2021.
Symbolic-Model-Guided Fuzzing of Cryptographic Protocols. Master's thesis. Maximilian Ammann.
AURORA: Statistical Crash Analysis for Automated Root Cause Explanation. Tim Blazytko, Moritz Schlögel, Cornelius Aschermann, Ali Abbasi, Joel Frank, Simon Wörner, Thorsten Holz. Jan 2020.
The Use of Likely Invariants as Feedback for Fuzzers. Andrea Fioraldi, Daniele Cono D'Elia, Davide Balzarotti. Jan 2021.
MOPT: Optimized Mutation Scheduling for Fuzzers. Chenyang Lyu, Shouling Ji, Chao Zhang, Yuwei Li, Wei-Han Lee, Yu Song, Raheem Beyah. Jan 2019.
Symbolic execution with SymCC: Don't interpret, compile! Sebastian Poeplau, Aurélien Francillon. Jan 2020.
FuzzIL: Coverage Guided Fuzzing for JavaScript Engines. Samuel Groß.
Unicorefuzz: On the Viability of Emulation for Kernelspace Fuzzing. Dominik Maier, Benedikt Radtke, Bastian Harren. Jan 2019.
Oleg Kiselyov, Ralf Laemmel, Keean Schupke. Jan 2022.
Fuzzing with data dependency information. Alessandro Mantovani, Andrea Fioraldi, Davide Balzarotti. 2022.