Alessandro Mantovani’s research while affiliated with EURECOM and other places


Publications (11)


Dissecting American Fuzzy Lop – A FuzzBench Evaluation - RCR Report
  • Article

February 2023 · 14 Reads · 1 Citation

ACM Transactions on Software Engineering and Methodology

Alessandro Mantovani

This report describes the artifacts of the “Dissecting American Fuzzy Lop – A FuzzBench Evaluation” paper. The artifacts are available online at https://github.com/eurecom-s3/dissecting_afl and archived at https://doi.org/10.6084/m9.figshare.21401280. They consist of the produced code, the setup to run the experiments in FuzzBench, and the generated reports. We claim the Functional badge because the patches to AFL are easy to enable and the experiments are easy to run thanks to the FuzzBench service, but the evaluations are self-contained and the modifications to AFL are provided as-is. Reproducing the experiments requires no particular skills, as the process is straightforward and described in https://google.github.io/fuzzbench/getting-started/adding-a-new-fuzzer/#requesting-an-experiment.


Dissecting American Fuzzy Lop – A FuzzBench Evaluation

January 2023 · 34 Reads · 21 Citations

ACM Transactions on Software Engineering and Methodology

AFL is one of the most used and extended fuzzers, adopted by industry and academic researchers alike. While the community agrees on AFL’s effectiveness at discovering new vulnerabilities and on its outstanding usability, many of its internal design choices remain untested to date. Security practitioners often clone the project “as-is” and use it as a starting point to develop new techniques, usually taking everything under the hood for granted. Instead, we believe that a careful analysis of the different parameters could help modern fuzzers to improve their performance and explain how each choice can affect the outcome of security testing, either negatively or positively. The goal of this paper is to provide a comprehensive understanding of the internal mechanisms of AFL by performing experiments and by comparing different metrics used to evaluate fuzzers. This can help to show the effectiveness of some techniques and to clarify which aspects are instead outdated. For our study, we carried out nine unique experiments on the popular FuzzBench platform. Each test focuses on a different aspect of AFL, ranging from its mutation approach to the feedback encoding scheme and its scheduling methodologies. Our findings show that each design choice affects different factors of AFL. While some of these are positively correlated with the number of detected bugs or the coverage of the target application, other features are instead related to usability and reliability. Most importantly, we believe that the outcome of our experiments indicates which parts of AFL we should preserve in the design of modern fuzzers.
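As an illustration of what the "feedback encoding scheme" mentioned in the abstract refers to, the following is a minimal Python sketch of AFL-style edge coverage: each edge is hashed as the current block ID XORed with the previous block ID shifted right by one, and hit counts are collapsed into coarse buckets. It is a simplified model, not code from the paper or from AFL itself; the function names (`record_path`, `bucket`, `has_new_bits`) and the `seen` map (AFL tracks the inverse, a "virgin" map) are assumptions made for this example.

```python
MAP_SIZE = 1 << 16  # AFL's default coverage map has 64 Ki one-byte slots


def record_path(block_ids, map_size=MAP_SIZE):
    """Replay a sequence of basic-block IDs into an AFL-style edge map."""
    trace = bytearray(map_size)
    prev = 0
    for cur in block_ids:
        idx = (cur ^ prev) % map_size
        trace[idx] = (trace[idx] + 1) & 0xFF  # 8-bit counter that wraps around
        prev = cur >> 1  # the shift keeps A->B distinct from B->A
    return trace


def bucket(count):
    """Collapse a raw hit count into one of AFL's coarse buckets (one bit each)."""
    if count <= 2:
        return count  # 0, 1 and 2 map to themselves
    if count == 3:
        return 4
    if count <= 7:
        return 8
    if count <= 15:
        return 16
    if count <= 31:
        return 32
    if count <= 127:
        return 64
    return 128


def has_new_bits(trace, seen):
    """Return True if the trace hits a (slot, bucket) pair absent from `seen`.

    `seen` is updated in place, so repeating the same trace is not interesting.
    """
    new = False
    for i, count in enumerate(trace):
        b = bucket(count)
        if b & ~seen[i]:
            seen[i] |= b
            new = True
    return new
```

In this model, an input is kept for further mutation only when `has_new_bits` returns `True`, either because a new edge slot was touched or because a known edge moved into a different hit-count bucket.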


Dissecting American Fuzzy Lop -- A FuzzBench Evaluation
  • Preprint
  • File available

December 2022 · 636 Reads · 1 Citation

AFL is one of the most used and extended fuzzers, adopted by industry and academic researchers alike. While the community agrees on AFL's effectiveness at discovering new vulnerabilities and on its outstanding usability, many of its internal design choices remain untested to date. Security practitioners often clone the project "as-is" and use it as a starting point to develop new techniques, usually taking everything under the hood for granted. Instead, we believe that a careful analysis of the different parameters could help modern fuzzers to improve their performance and explain how each choice can affect the outcome of security testing, either negatively or positively. The goal of this paper is to provide a comprehensive understanding of the internal mechanisms of AFL by performing experiments and by comparing different metrics used to evaluate fuzzers. This can help to show the effectiveness of some techniques and to clarify which aspects are instead outdated. For our study, we carried out nine unique experiments on the popular FuzzBench platform. Each test focuses on a different aspect of AFL, ranging from its mutation approach to the feedback encoding scheme and its scheduling methodologies. Our findings show that each design choice affects different factors of AFL. While some of these are positively correlated with the number of detected bugs or the coverage of the target application, other features are instead related to usability and reliability. Most importantly, we believe that the outcome of our experiments indicates which parts of AFL we should preserve in the design of modern fuzzers.




Fuzzing with Data Dependency Information

March 2022 · 268 Reads · 6 Citations

Recent advances in fuzz testing have introduced several forms of feedback mechanisms, motivated by the fact that for a large range of programs and libraries, edge coverage alone is insufficient to reveal complicated bugs. Inspired by this line of research, we examined existing program representations looking for a match between expressiveness of the structure and adaptability to the context of fuzz testing. In particular, we believe that data dependency graphs (DDGs) represent a good candidate for this task, as the information embedded in this data structure is potentially useful to find vulnerable constructs by stressing combinations of def-use pairs that would be difficult for a traditional fuzzer to trigger. Since some portions of the dependency graph overlap with the control flow of the program, it is possible to reduce the additional instrumentation to cover only "interesting" data-flow dependencies, those that help the fuzzer to visit the code in a distinct way compared to standard methodologies. To test these observations, in this paper we propose DDFuzz, a new approach that rewards the fuzzer not only with code coverage information, but also when new edges in the data dependency graph are hit. Our results show that the adoption of data dependency instrumentation in coverage-guided fuzzing is a promising solution that can help to discover bugs that would otherwise remain unexplored by standard coverage approaches. This is demonstrated by the 72 different vulnerabilities that our data-dependency-driven approach can identify when executed on 38 target programs from three different datasets.
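The idea of rewarding both new code coverage and new data-dependency edges can be sketched in a few lines of Python. This is a toy model under stated assumptions, not DDFuzz's implementation: the `Feedback` class, its field names, and the representation of edges as tuples are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Feedback:
    """Toy feedback state tracking two kinds of edges seen so far."""

    seen_cf: set = field(default_factory=set)  # control-flow edges (src, dst)
    seen_dd: set = field(default_factory=set)  # data-dependency edges (def, use)

    def is_interesting(self, cf_edges, dd_edges):
        """Return True if the execution hit any new edge of either kind."""
        new_cf = set(cf_edges) - self.seen_cf
        new_dd = set(dd_edges) - self.seen_dd
        self.seen_cf |= new_cf
        self.seen_dd |= new_dd
        return bool(new_cf or new_dd)
```

With this model, an input that follows an already known control-flow path is still kept when it exercises a def-use pair that has not been seen before, which is the case plain edge coverage would discard.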


Registered Report: Dissecting American Fuzzy Lop - A FuzzBench Evaluation

March 2022 · 372 Reads

AFL is one of the most used and extended fuzzing projects, adopted by industry and academic researchers alike. While the community agrees on AFL's effectiveness at discovering new vulnerabilities and on its outstanding usability, many of its internal design choices remain untested to date. Security practitioners often clone the project "as-is" and use it as a starting point to develop new techniques, usually taking everything under the hood for granted. Instead, we believe that a careful analysis of the different parameters could help modern fuzzers to improve their performance and explain how each choice can affect the outcome of security testing, either negatively or positively. The goal of this paper is to provide a comprehensive understanding of the internal mechanisms of AFL by performing experiments and comparing different metrics used to evaluate fuzzers. This will prove the efficacy of some patterns and clarify which aspects are instead outdated. To achieve this, we set up nine unique experiments that we carried out on the popular FuzzBench platform. Each test focuses on a different aspect of AFL, ranging from its mutation approach to the feedback encoding scheme and the scheduling methodologies. Our preliminary findings show that each design choice affects different factors of AFL. While some of these are positively correlated with the number of detected bugs or the target coverage, other features are related to usability and reliability. Most importantly, the outcome of our experiments will indicate which parts of AFL we should preserve in modern fuzzers.




Fig. 3. Dataset composition (cardinality = 46,295)
Fig. 4. Byte Frequency Distribution w.r.t. Schemes
Fig. 6. Pattern stored inside the .rsrc section
Fig. 7. The string 0x0300ba99 in .rdata section
Prevalence and Impact of Low-Entropy Packing Schemes in the Malware Ecosystem

February 2020 · 622 Reads · 23 Citations

An open research problem in malware analysis is how to statically distinguish between packed and non-packed executables. This has an impact on antivirus software and malware analysis systems, which may need to apply different heuristics or to resort to more costly code emulation solutions to deal with the presence of potential packing routines. It can also affect the results of many research studies in which the authors adopt algorithms that are specifically designed for packed or non-packed binaries. Therefore, a wrong answer to the question "is this executable packed?" can make the difference between malware evasion and detection. It has long been known that packing and entropy are strongly correlated, often leading to the wrong assumption that a low entropy score implies that an executable is NOT packed. Exceptions to this rule exist, but they have always been considered one-off cases, with a negligible impact on any large-scale experiment. However, while such an assumption might have been acceptable in the past, our experiments show that this is no longer the case, as a growing and remarkable number of packed malware samples implement proper schemes to keep their entropy low. In this paper, we empirically investigate and measure this problem by analyzing a dataset of 50K low-entropy Windows malware samples. Our tests show that although all samples have a low entropy value, over 30% of them adopt some form of runtime packing. We then extended our analysis beyond pure entropy by considering all static features that have been proposed so far to identify packed code. Again, our tests show that even a state-of-the-art machine learning classifier is unable to conclude whether a low-entropy sample is packed or not by relying only on features extracted with static analysis.
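For readers unfamiliar with the entropy score discussed in the abstract, the following minimal Python sketch computes the Shannon entropy of a byte sequence in bits per byte (0 for constant data, 8 for uniformly distributed bytes). The function name `shannon_entropy` is chosen for this example, and the hex-encoding demonstration in the usage note is a generic illustration rather than one of the schemes measured in the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum -p * log2(p) over the relative frequency p of each byte value.
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
```

As a usage example, random bytes score close to 8 bits per byte, while the same bytes hex-encoded (for instance via `os.urandom(4096).hex().encode()`) use only 16 distinct symbols and therefore cannot exceed 4 bits per byte. This shows how a simple re-encoding can lower the entropy of a payload without changing its information content.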


Citations (7)


... By leveraging lightweight instrumentation to obtain runtime feedback for driving the exploration, greybox fuzzing strikes a balance between the efficiency of blackbox fuzzing [17] and the depth of whitebox fuzzing [20]. The arising of mature greybox fuzzers like AFL [13], AFL++ [12], Honggfuzz [23] is furthering the popularity. ...

Reference:

Harnessing Large Language Models for Seed Generation in Greybox Fuzzing
Dissecting American Fuzzy Lop – A FuzzBench Evaluation
  • Citing Article
  • January 2023

ACM Transactions on Software Engineering and Methodology

... Fuzzy testing is a vulnerability detection technique that has shown remarkable results in identifying software defects, particularly when automated [22,3,12,8]. This technique involves generating faulty input to feed the functionalities of a SUT and monitoring its behavior according to the actual requirements and performance of the system [28,23]. ...

Dissecting American Fuzzy Lop -- A FuzzBench Evaluation

... Consequently, the scheduler often degenerates into a round-robin approach. Attempting to use finer-grained data-flow-guided feedback metrics [31,41] to address this issue can easily lead to an explosion in the seed corpus of the fuzzing scheduler [53]. Secondly, the current mutators are agnostic of the target program branches. ...

Fuzzing with Data Dependency Information
  • Citing Conference Paper
  • June 2022

... For Fileless Malware detection, we targeted a family of stealthy malware samples that impersonate benign programs, evading conventional security solutions but are detectable by GNN-based provenance analysis. The malware samples were chosen in accordance with guidelines from the literature [79], [80] to minimize ex- Table 4 in the appendix. The Banking banking trojan [79] steals banking credentials from victim machines and spreads through spam and compromised download links. ...

Does Every Second Count? Time-based Evolution of Malware Behavior in Sandboxes
  • Citing Conference Paper
  • January 2021

... Examining the entropy of each section in the binary file is one method to identify such content. Entropy is a measure used to assess the degree of uncertainty in a set of numbers (or bytes), quantifying the difficulty of independently predicting each element in the series [43]. ...

Prevalence and Impact of Low-Entropy Packing Schemes in the Malware Ecosystem

... Many research papers [4,[14][15][16][17] and industrial tools [13] regard the high entropy of sections as a sign of packed programs. However, Mantovani et al. [18] find that more than 30% of their 50K Windows malware datasets are low-entropy packed samples. These packed samples adopt multiple data encoding tricks to evade entropy-based detection. ...

Prevalence and Impact of Low-Entropy Packing Schemes in the Malware Ecosystem