Article

Vulnerability Analysis for X86 Executables Using Genetic Algorithm and Fuzzing


Abstract

Fuzzing has been used successfully to discover security bugs in popular programs, even ones released without source code. It has become a major tool in security analysis, but the input space it must cover is large, making purely random exploration ineffective. This paper presents a new method for identifying vulnerabilities in executable programs, called GAFuzzing (Genetic Algorithm Fuzzing), which combines static and dynamic analysis to extend random fuzzing. First, it uses static analysis to obtain the structural behavior, interfaces, and regions of interest of the code, and then formally describes the test requirements. Second, it uses a genetic algorithm to intelligently direct test data generation toward the testing objective. Unlike many software testing tools, our implementation analyzes executables directly, without source code. Our evaluation shows that GAFuzzing is superior to random fuzzing for vulnerability analysis.
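
The abstract gives no implementation details, so the following is only a minimal sketch of the idea: a genetic algorithm evolves raw input bytes, and an input's fitness is the number of branch predicates on the statically chosen path to the suspected vulnerable region that its execution satisfies. The toy fitness() below (four magic-byte checks) stands in for running the instrumented binary; every name here is hypothetical illustration, not the authors' code.

```cpp
// Minimal sketch (not the authors' code): a GA evolves raw input bytes and
// fitness is the number of branch predicates satisfied on the path to a
// suspected vulnerable region. fitness() is a toy stand-in for running the
// instrumented binary: four magic-byte checks guard the "vulnerable" code.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

using Input = std::vector<uint8_t>;

int fitness(const Input& in) {
    const uint8_t magic[4] = {'F', 'U', 'Z', 'Z'};
    int covered = 0;
    for (size_t i = 0; i < 4u && i < in.size(); ++i)
        if (in[i] == magic[i]) ++covered;  // one predicate per check passed
    return covered;
}

int main() {
    std::mt19937 rng{42};
    std::uniform_int_distribution<int> byte(0, 255);
    std::uniform_real_distribution<double> coin(0.0, 1.0);

    std::vector<Input> pop(64, Input(4));
    for (auto& ind : pop)
        for (auto& b : ind) b = static_cast<uint8_t>(byte(rng));

    for (int gen = 0; gen < 2000; ++gen) {
        // Rank by predicate coverage: deeper along the path is fitter.
        std::sort(pop.begin(), pop.end(),
                  [](const Input& a, const Input& b) { return fitness(a) > fitness(b); });
        if (fitness(pop[0]) == 4) {
            std::cout << "reached suspected vulnerable region at generation "
                      << gen << "\n";
            return 0;
        }
        // Keep the fitter half, refill with mutated crossovers of survivors.
        std::uniform_int_distribution<size_t> pick(0, pop.size() / 2 - 1);
        for (size_t i = pop.size() / 2; i < pop.size(); ++i) {
            const Input& pa = pop[pick(rng)];
            const Input& pb = pop[pick(rng)];
            for (size_t j = 0; j < pop[i].size(); ++j) {
                pop[i][j] = (coin(rng) < 0.5) ? pa[j] : pb[j];   // uniform crossover
                if (coin(rng) < 0.05)
                    pop[i][j] = static_cast<uint8_t>(byte(rng)); // point mutation
            }
        }
    }
    std::cout << "search budget exhausted\n";
}
```

Random fuzzing would need on the order of 2^32 attempts to pass all four checks at once; the GA accumulates partially correct inputs instead, which is the advantage the abstract claims.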

... Table 7 summarizes all the metaheuristics and fitness functions related to code coverage testing using metaheuristics. The metaheuristics used are the genetic algorithm (GA) (Fraser and Arcuri 2011; Charmchi and Cami 2021; Michael et al. 2001; Bottaci 2002; Sparks et al. 2007; Liu et al. 2008; Cao et al. 2009a, b; Rauf et al. 2010; Andrews et al. 2014; Shuai et al. 2013, 2015a; Pałka et al. 2016; Paduraru et al. 2017; Arcuri 2017; Wei et al. 2018; Zhu et al. 2018; Wang et al. 2019b) and the evolutionary algorithm (EA) (Harman et al. 2002; Tlili et al. 2006; Baresel and Sthamer 2003; Afshan et al. 2013). [Fragment of Table 7: Harman et al. (2002), evolutionary algorithm, branch distance; Bottaci (2002), genetic algorithm, relational and logical predicates; Baresel and Sthamer (2003), evolutionary algorithm, node-node oriented fitness function; the fitness of a sequence is determined by the closest path to the test aim.] ...
... [Fragment of Table 7, continued: Liu et al. (2008), genetic algorithm, fitness based on how many predicates an input has in common with the CDPPath of the target point.] ...
... It has many predicates in common with a CDPPath of the point (Liu et al. 2008). ...
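
The snippets above name the two fitness styles that recur throughout this literature: predicate coverage along a CDPPath (as in the GA loop sketched earlier) and branch distance. The functions below sketch the textbook branch-distance construction (Bottaci 2002 and Harman et al. 2002 describe variants of it); they illustrate the standard formulation rather than reproduce any cited paper's code.

```cpp
// Textbook branch-distance fitness, as referenced in the snippets above.
// Distance is zero when the desired branch outcome already holds and grows
// with how far the operands are from flipping the predicate; K > 0 penalizes
// a still-false predicate. Illustrative only, not code from the cited papers.
#include <cstdlib>

// Distance for the predicate `a == b`.
double branch_distance_eq(long long a, long long b, double K = 1.0) {
    return (a == b) ? 0.0 : static_cast<double>(std::llabs(a - b)) + K;
}

// Distance for the predicate `a < b`.
double branch_distance_lt(long long a, long long b, double K = 1.0) {
    return (a < b) ? 0.0 : static_cast<double>(a - b) + K;
}

// Compound predicates: sum for conjunction, minimum for disjunction, so the
// search is pulled toward the easiest way to satisfy the condition.
double branch_distance_and(double d1, double d2) { return d1 + d2; }
double branch_distance_or(double d1, double d2)  { return d1 < d2 ? d1 : d2; }
```
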
Article
Full-text available
The security of an application is critical for its success, as breaches cause losses for organizations and individuals. Search-based software security testing (SBSST) is the field that utilizes metaheuristics to generate test cases for software testing against pre-specified security test adequacy criteria. This paper conducts a systematic literature review to compare metaheuristics and fitness functions used in software security testing, exploring their distinctive capabilities and impact on vulnerability detection and code coverage. The aim is to provide insights for fortifying software systems against emerging threats in the rapidly evolving technological landscape. This paper examines how search-based algorithms have been explored in the context of code coverage and software security testing. Moreover, the study highlights different metaheuristics and fitness functions for security testing and code coverage. This paper follows the standard guidelines from Kitchenham to conduct the SLR and obtained 122 primary studies related to SBSST after a multi-stage selection process. The papers came from different sources (journals, conference proceedings, workshops, summits, and researchers' webpages) published between 2001 and 2022. The outcomes demonstrate that the main vulnerabilities tackled using metaheuristics are XSS, SQLI, program crash, and XMLI. The findings suggest several areas for future research, including detecting server-side request forgery and security testing of third-party components. Moreover, new metaheuristics need to be explored to detect security vulnerabilities that remain unexplored or only lightly explored. Furthermore, metaheuristics can be combined with machine learning and reinforcement learning techniques for better results, and new metaheuristics can be designed around the complexity of security testing, exploiting more fitness functions for detecting different vulnerabilities.
... Fuzz testing is a common approach for finding vulnerabilities in software [4][5][6][7][8]. Many fuzzers exist, ranging from simple random input generators to highly sophisticated testing tools. ...
... 3. We propose a fitness function (objective function) based on analyzing and identifying different code parameters, which guides the fuzzer to generate inputs that can trigger uncommon behavior within interpreters. 4. We implement these techniques in a full-fledged, to-be open-sourced fuzzing tool called IFuzzer that can target any language interpreter with minimal configuration changes. 5. We show the effectiveness of IFuzzer empirically by finding new bugs in Mozilla's JavaScript engine SpiderMonkey, including several exploitable security vulnerabilities. ...
Conference Paper
We present an automated evolutionary fuzzing technique to find bugs in JavaScript interpreters. Fuzzing is an automated black box testing technique for finding security vulnerabilities in software by providing random data as input. In the case of an interpreter, however, fuzzing is challenging because the inputs are pieces of code that must be syntactically and semantically valid to pass the interpreter's elementary checks. On the other hand, the fuzzed input should also be uncommon enough to trigger exceptional behavior in the interpreter, such as crashes, memory leaks, and failing assertions. In our approach, we use evolutionary computing techniques, specifically genetic programming, to guide the fuzzer in generating uncommon input code fragments that may trigger exceptional behavior in the interpreter. We implement a prototype named IFuzzer to evaluate our technique on real-world examples. IFuzzer uses the language grammar to generate valid inputs. We applied IFuzzer first to an older version of Mozilla's JavaScript interpreter (to allow a fair comparison with existing work) and found 40 bugs, of which 12 were exploitable. On subsequently targeting the latest builds of the interpreter, IFuzzer found 17 bugs, of which four were security bugs.
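
As a rough illustration of the evolutionary mechanics the abstract describes, the sketch below applies crossover and corpus-based mutation to candidate interpreter inputs. IFuzzer actually operates on grammar-derived parse-tree fragments; this sketch flattens that to statement-level splicing over strings, so treat every detail here as a simplifying assumption rather than IFuzzer's design.

```cpp
// Simplified sketch of evolutionary input generation for an interpreter.
// IFuzzer recombines grammar-derived parse-tree fragments; this flattens
// that to statement-level splicing to stay short.
#include <random>
#include <string>
#include <vector>

using Fragment = std::string;            // one statement / code fragment
using Program  = std::vector<Fragment>;  // a candidate interpreter input

// Crossover: splice the head of one candidate onto the tail of another.
Program crossover(const Program& a, const Program& b, std::mt19937& rng) {
    std::uniform_int_distribution<size_t> cutA(0, a.size()), cutB(0, b.size());
    Program child(a.begin(), a.begin() + cutA(rng));
    child.insert(child.end(), b.begin() + cutB(rng), b.end());
    return child;
}

// Mutation: overwrite a random statement with a fragment harvested from a
// seed corpus, keeping inputs plausible enough to pass elementary checks.
void mutate(Program& p, const std::vector<Fragment>& corpusPool,
            std::mt19937& rng) {
    if (p.empty() || corpusPool.empty()) return;
    std::uniform_int_distribution<size_t> pos(0, p.size() - 1);
    std::uniform_int_distribution<size_t> frag(0, corpusPool.size() - 1);
    p[pos(rng)] = corpusPool[frag(rng)];
}
```
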
... Code coverage was chosen as a metric for Hermes because, although it is "well-known that random testing usually provides low code coverage, and performs poorly overall [in that respect]" [13], [14], code coverage has been extensively used as a metric to measure the performance of fuzz testing [32], [47], [16], [48], [5], [29], [46]; additionally, there is a general "lack of measurable parameters that describe fuzz test completeness" to draw from [29]. Specifically, we chose line coverage for the entire method where a target bug type was identified as the metric for our analysis. ...
... In the context of the Crawler4J crawler, the number of mutations equals the number of crawler requests to the server as each request included a single mutation of the protocol. Previous research has utilized a variety of performance metrics including number of fuzzed inputs, total errors found, errors found per hour, and number of distinct errors found per hour [15], [47], [16], [5]. The number of fuzzed inputs (mutations) was chosen because it is less subjective than a pure time comparison. ...
Conference Paper
Full-text available
Security assurance cases (security cases) are used to represent claims for evidence-based assurance of security properties in software. A security case uses evidence to argue that a particular claim is true, e.g., buffer overflows cannot happen. Evidence may be generated with a variety of methods. Random negative testing (fuzz testing) has become a popular method for creating evidence for the security of software. However, traditional fuzz testing is undirected and provides only weak evidence for specific assurance concerns, unless significant resources are allocated for extensive testing. This paper presents a method to apply fuzz testing in a targeted way to more economically support the creation of evidence for specific security assurance cases. Our experiments produced results with target code coverage comparable to an exhaustive fuzz test run while significantly reducing the test execution time when compared to exhaustive methods. These results provide specific evidence for security cases and provide improved assurance.
... The numbers of bugs found by the different tools are listed in Table IV, while the comparison of Peach 2.3 and DXFuzzing based on the sample file is shown in Figure 6. GAMutator found 3 additional bugs and generated 3600 test cases. The smart fuzzer from paper [3] and GAFuzzing from paper [6] do not perform well in this experiment. Whitebox fuzzing is complex and costly in time. ...
... DXFuzzing enriches current mutation methodology with a multi-dimension input-node mutation strategy that avoids combinatorial explosion, so DXFuzzing can find vulnerabilities that would never be found by one-dimension mutation fuzzing. Paper [6] also combined fuzzing with a genetic algorithm, but the fuzzing it used is a simple random testing technique, and it used the genetic algorithm to cover suspected vulnerable points; this is hard to apply to large practical applications because they contain many strong program checks. In contrast, DXFuzzing uses the genetic algorithm to find interesting combinations. ...
Article
Full-text available
Test case mutation and generation (m&g) based on data samples is an effective way to generate test cases for knowledge-based fuzzing, but present m&g techniques are only capable of one-dimensional m&g at a time, based on a single data sample, making it impossible to find a vulnerability that can only be detected by multi-dimensional m&g. This paper proposes a mathematical model, FTSG, that formally describes Fuzzing Test Suite Generation based on m&g and can process multi-dimension m&g of input elements through a Genetic Algorithm Mutation operator (GAMutator). By execution-oriented input-output (I/O) analysis, the influence relationships between input elements and insecure functions in the target application are collected. Based on these relationships, GAMutator can directly mutate the corresponding input elements to trigger the suspected vulnerability in a target insecure function, which could never be found by one-dimension m&g fuzzing. Importantly, GAMutator does not cause input combination explosion: the number of test cases it generates is linear in the number of insecure functions. Finally, an experiment on Libpng shows that FTSG effectively enriches the ability of knowledge-based fuzzing techniques to find vulnerabilities.
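
A hedged sketch of the GAMutator idea as stated in the abstract: for each insecure function, mutate together exactly the input elements that the I/O analysis linked to it, so the number of generated cases grows linearly with the number of insecure functions. The types and field names below are illustrative, not from the paper.

```cpp
// Sketch of multi-dimensional mutation guided by I/O analysis results.
// All related input elements mutate together in one test case, so the
// output size is linear in the number of insecure functions: no
// combination explosion.
#include <cstdint>
#include <random>
#include <string>
#include <vector>

using Sample = std::vector<uint8_t>;

struct Influence {                      // result of the I/O analysis
    std::string insecureFunc;           // e.g. a risky memcpy call site
    std::vector<size_t> inputOffsets;   // input bytes that flow into it
};

std::vector<Sample> gamutate(const Sample& seed,
                             const std::vector<Influence>& influences,
                             int casesPerFunc, std::mt19937& rng) {
    std::uniform_int_distribution<int> byte(0, 255);
    std::vector<Sample> out;
    for (const auto& inf : influences)       // one group per insecure function
        for (int k = 0; k < casesPerFunc; ++k) {
            Sample t = seed;
            for (size_t off : inf.inputOffsets)  // multi-dimensional: all
                if (off < t.size())              // related elements mutate
                    t[off] = static_cast<uint8_t>(byte(rng));
            out.push_back(std::move(t));
        }
    return out;  // |out| = |influences| * casesPerFunc
}
```
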
... In 2008, Hong Liu Guang et al. proposed an architecture for performing black box testing using genetic algorithms [3]. Guang's paper shows how computational intelligence techniques can help black box testing find vulnerabilities. ...
Conference Paper
Full-text available
Automated software testing has become a fundamental requirement for several software engineering methodologies. Software development companies very often outsource the testing of their products. In such cases, the hired companies sometimes have to test software without any access to the source code. This type of service is called black box testing, which consists of presenting some ad-hoc input to the software followed by an assessment of the outcome. Black box testing is commonly a sequential and slow-paced process; this ineffectiveness is due to the combinatorial explosion of software parameters and payloads. This work presents a neuro-fuzzy and multi-agent system architecture for improving black box testing tools for client-side vulnerability discovery, specifically memory corruption flaws. Experiments show the efficiency of the proposed hybrid intelligent approach over traditional black box testing techniques.
... Because many checks and verifications are mathematically designed, they are very hard to pass, and these techniques cannot pass them effectively. References [6] and [7] try to use genetic algorithms to improve code coverage and then start fuzzing the target software, but the genetic algorithm is used only as an advanced intelligent random search, so it is also hard to pass these strong checks and verifications; fuzzing with a genetic algorithm is therefore applicable to simple experimental programs but of little use on practical software. The fuzzing tool FileFuzz [12] improves the number of semi-valid test cases discovered [2] by mutating and generating test cases based on a correct data sample, but because it considers nothing about the types, semantic attributes, and constraints among input elements, the test space is large and the proportion of semi-valid test cases is still very low; many invalid test cases are thus generated, which hurts fuzzing effectiveness. ...
Article
Full-text available
Knowledge-based fuzzing technologies have been applied successfully to software vulnerability mining; however, current methods mainly focus on fuzzing target software using a single data sample with one- or multi-dimension input mutation [1], so the vulnerability mining results are not stable, false negatives are high, and the selection of the data sample depends on human analysis. To solve these problems, this paper proposes a model named Fuzzing Test Suite Generation using multi data sample combination (FTSGc), which can automatically select a combination of data samples from a large-scale data sample set to fuzz the target software and generate test cases that cover more of the code containing software vulnerabilities. To solve the Data Sample Coverage Problem (DSCP) in the proposed FTSGc, a method of covering the maximum number of nodes' semantic attributes with minimum running cost is put forward, and a theorem named the Maximum Coverage Theorem is given for selecting the data sample combination. We conclude that DSCP is in fact the Set Covering Problem (SCP). Practical experimental results show that the proposed fuzzing method works much better than other current fuzzing methods on the Ability of Vulnerability Mining (AVM).
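
The abstract identifies sample selection with the Set Covering Problem, so a standard greedy approximation is sketched below: repeatedly pick the sample that covers the most still-uncovered attributes per unit of running cost. The paper's own Maximum Coverage Theorem is not spelled out in the abstract; treat this concrete selection rule as an assumption.

```cpp
// Greedy weighted set cover over data samples: a classic approximation
// matching the abstract's "maximum coverage with minimum running cost"
// phrasing. Illustrative, not the paper's exact procedure.
#include <set>
#include <vector>

struct DataSample {
    std::set<int> attrs;  // semantic attributes / format features it covers
    double cost;          // running cost of fuzzing with this sample (> 0)
};

std::vector<size_t> greedy_cover(const std::vector<DataSample>& samples) {
    std::set<int> uncovered;
    for (const auto& s : samples)
        uncovered.insert(s.attrs.begin(), s.attrs.end());

    std::vector<size_t> chosen;
    while (!uncovered.empty()) {
        size_t best = samples.size();
        double bestRatio = 0.0;
        for (size_t i = 0; i < samples.size(); ++i) {
            int gain = 0;  // attributes this sample would newly cover
            for (int a : samples[i].attrs)
                gain += static_cast<int>(uncovered.count(a));
            double ratio = gain / samples[i].cost;
            if (gain > 0 && ratio > bestRatio) { bestRatio = ratio; best = i; }
        }
        if (best == samples.size()) break;  // nothing new can be covered
        for (int a : samples[best].attrs) uncovered.erase(a);
        chosen.push_back(best);
    }
    return chosen;  // indices of the selected sample combination
}
```

The greedy rule is the standard logarithmic-factor approximation for weighted set cover, which is the natural reading of selecting a sample combination once DSCP is recognized as SCP.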
... Song and his team have presented the BitBlaze method for the analysis of malicious software [39]. Liu and his colleagues have identified vulnerabilities in x86 programs using obfuscation methods and genetic algorithms [40]. Kroes and his team have provided automatic detection methods for memory management errors using Delta pointers [41]. ...
Article
Full-text available
Different abnormalities are commonly encountered in computer network systems. These abnormalities can lead to critical data losses or unauthorized access in the systems. Buffer overflow is a prominent anomaly among them, posing a serious threat to network security. The primary objective of this study is to identify the potential buffer overflow risks that can be caused by functions frequently used in the PHP programming language and to provide solutions that minimize these risks. Static code analyzers are used to detect security vulnerabilities; among them, SonarQube stands out with its extensive library, flexible customization options, and reliability in the industry. In this context, a customized rule set aimed at automatically detecting buffer overflows has been developed on the SonarQube platform. The memoization optimization technique used while creating the customized rule set enhances the speed and efficiency of the code analysis process: the analysis is not rerun for code snippets that have already been analyzed, significantly reducing processing time and resource utilization. In this study, the memoization-based rule set was used to detect critical security vulnerabilities that could lead to buffer overflow in source code written in PHP. In a case study conducted to assess the effectiveness of this method, a significant decrease in source code analysis time was observed.
... Genetic Algorithms. The most frequently used ML technique for input generation is the genetic algorithm (GA) [13,14,17,28,30]. GAs, a type of unsupervised ML inspired by biological evolution, provide the core algorithms in evolutionary fuzzers. ...
Preprint
Fuzzing has played an important role in improving software development and testing over the course of several decades. Recent research in fuzzing has focused on applications of machine learning (ML), offering useful tools to overcome challenges in the fuzzing process. This review surveys the current research in applying ML to fuzzing. Specifically, this review discusses successful applications of ML to fuzzing, briefly explores challenges encountered, and motivates future research to address fuzzing bottlenecks.
... The key problem of fuzzing technology is to generate highly semi-valid [2] test cases that can pass the checks and verifications in programs (such as fixed fields, checksums, length counting, number counting, encoding, decoding, hash computation, encryption, and decryption). To improve the semi-validity of test cases, current knowledge-based fuzzing technologies [1][3][4][5][6][7] mainly focus on fuzzing target software based on a single data sample with one- or multi-dimension input mutation; they consider only fuzzing based on a single data sample and do not address how to automatically select a combination of data samples from a large-scale data sample set. Normally, a single data sample covers only part of a file format or network protocol, so it is impossible for such approaches to cover the code that handles the rest of the format or protocol, and impossible to mine the vulnerabilities in those code segments. ...
Article
Full-text available
Current knowledge-based fuzzing technologies mainly focus on fuzzing target software based on a single data sample with one- or multi-dimension input mutation; as a result, the vulnerability mining results are not stable, false negatives are high, and the selection of the data sample depends on human analysis. To solve these problems, this paper proposes a model named Fuzzing Test Suite Generation using multiple data sample combination (FTSGc), which can automatically select a combination of data samples from a large-scale data sample set to fuzz the target software and generate test cases that cover more instances of software vulnerabilities. To solve FTSGc, a theorem named the Maximum Coverage Theorem is given for selecting the data sample combination. Practical experimental results show that the proposed fuzzing technology works much better than current fuzzing technologies on the Ability of Vulnerability Mining (AVM).
... Our approach is closest to this one, with the added feature of simplifying the weight calculation and the path to traverse to reach vulnerable statements. Along similar lines, Liu et al. construct the control dependence predicate path (CDPPath) from the binary of the application and apply a GA to construct inputs that reach vulnerable statements [17]. Their fitness function depends on the number of predicates in the CDPPath covered by the inputs. ...
Article
Full-text available
In this paper, we present a hybrid approach for buffer overflow detection in C code. The approach makes use of static and dynamic analysis of the application under investigation. The static part consists of calculating taint dependency sequences (TDS) between user-controlled inputs and vulnerable statements. This process is akin to computing a program slice of interest: it yields the tainted data- and control-flow paths that exhibit the dependence between tainted program inputs and vulnerable statements in the code. The dynamic part consists of executing the program along the TDSs to trigger the vulnerability by generating suitable inputs. We use a genetic algorithm to generate inputs, and we propose a fitness function that approximates the program behavior (control flow) based on the frequencies of the statements along the TDSs. This runtime aspect makes the approach faster and more accurate. We provide experimental results on the Verisec benchmark to validate our approach.
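
One plausible reading of the frequency-based fitness described above, sketched as code: an input scores higher the more often its execution hits statements on the taint dependency sequence. The abstract does not give the authors' exact formula, so this weighting is an assumption.

```cpp
// Frequency-weighted TDS fitness: reward executions that stay on the taint
// dependency sequence of interest. Illustrative weighting, not the authors'
// exact formula.
#include <map>
#include <set>

// hitCounts: statement id -> execution frequency for one run of the program.
// tds: ids of the statements on the taint dependency sequence of interest.
double tds_fitness(const std::map<int, int>& hitCounts,
                   const std::set<int>& tds) {
    double score = 0.0;
    for (int stmt : tds) {
        auto it = hitCounts.find(stmt);
        if (it != hitCounts.end())
            score += it->second;  // each hit pulls the search along the TDS
    }
    return score;
}
```
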
... Our approach is close to this one, with the added feature of simplifying the weight calculation and the path to traverse to reach vulnerable statements, even in the presence of simpler constraints. Along similar lines, Liu et al. construct the control dependence predicate path (CDPPath) from the binary of the application and apply a GA to construct inputs that reach vulnerable statements [31]. Their fitness function depends on the number of predicates in the CDPPath covered by the inputs. ...
Conference Paper
Full-text available
We propose an approach, in the form of a lightweight smart fuzzer, to generate string-based inputs that detect buffer overflow vulnerabilities in C code. The approach is based on an evolutionary algorithm that combines a genetic algorithm with evolution strategies. In this preliminary work we focus on the problem that constraints on string inputs must be satisfied in order to reach the vulnerable statement in the code, while we have little or no knowledge of those constraints. Unlike other similar approaches, our approach is able to generate such inputs without knowing these constraints explicitly: it learns them automatically while generating inputs dynamically by executing the vulnerable program. We provide some empirical results on a benchmark dataset, the Verisec suite of programs.
Book
Book containing the selected papers of the CRITIS 2018 conference in Kaunas, Lithuania.
Article
Full-text available
Information security is in constant evolution and dynamism. The application of Artificial Intelligence techniques has become an essential practice in the treatment and detection of the threats to which organizations are exposed. This article presents a literature review on the application of AI techniques in computer security, with emphasis on intrusion detection systems, detection of unwanted mail (spam), antivirus, and other applications where the use of Artificial Intelligence is considered important.
Article
Full-text available
Current advanced fuzzing techniques can implement vulnerability mining on only a single vulnerable statement at a time. This paper proposes a new multi-dimension fuzzing technique that uses a niche genetic algorithm to generate test cases and can concurrently approach two vulnerable targets, with minimum cost on the two vulnerable statements, each time. For that purpose, a corresponding mathematical model and a minimum cost theorem are presented. The results of the experiment show that the efficiency of the newly proposed fuzzing technique is much better than that of current advanced fuzzing techniques.
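
The abstract does not define its niche genetic algorithm, so the sketch below shows classic fitness sharing, the standard niching mechanism: an individual's fitness is divided by the crowding of its niche, which keeps two subpopulations alive, one per vulnerable statement, instead of letting the population collapse onto the easier target. The distance metric and raw fitness are illustrative stubs.

```cpp
// Classic fitness sharing as a stand-in for the paper's (unspecified)
// niche GA: crowded regions of the search space are penalized, preserving
// diversity across multiple targets.
#include <cmath>
#include <vector>

struct Individual {
    std::vector<double> genes;  // encoded test input
    double rawFitness;          // e.g. closeness to one vulnerable statement
};

// Genotype distance (assumes equal-length gene vectors).
double distance(const Individual& a, const Individual& b) {
    double d = 0.0;
    for (size_t i = 0; i < a.genes.size(); ++i)
        d += std::fabs(a.genes[i] - b.genes[i]);
    return d;
}

// Shared fitness with a triangular sharing kernel of radius sigma.
double shared_fitness(const Individual& me, const std::vector<Individual>& pop,
                      double sigma) {
    double nicheCount = 0.0;
    for (const auto& other : pop) {
        double d = distance(me, other);
        if (d < sigma) nicheCount += 1.0 - d / sigma;  // crowded => bigger count
    }
    return me.rawFitness / (nicheCount > 0.0 ? nicheCount : 1.0);
}
```
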
Article
In multi-dimensional fuzzing, a key problem is how to construct the influence relationships between input elements and vulnerable statements. This paper applies virtual machine based taint analysis to multi-dimensional fuzzing and gives a detailed design; the experimental results show that the method is feasible.
Article
Knowledge-based fuzzing technology has been applied successfully to software vulnerability mining; however, current fuzzing technology mainly focuses on fuzzing target software based on a single data sample, so the vulnerability mining results are not stable, false negatives are high, and the selection of the data sample depends on human analysis. To solve these problems, this paper proposes a model named Fuzzing Test Suite Generation based on data sample combination (FTSGc), which can automatically select a combination of data samples from a large-scale data sample set to fuzz the target software. To solve the Data Sample Combination Problem (DSCP), this paper proposes a method of covering all possible basic blocks in the Control Flow Graph (CFG) with minimum running cost, gives a theorem named Maximum Degree Coverage (MFD) for selecting the data sample combination, and concludes that DSCP is in fact the Set Covering Problem (SCP). Practical experimental results show that the proposed fuzzing technology, which automatically selects a data sample combination based on the CFG, works much better than current fuzzing technology on both the Ability of Vulnerability Mining (AVM) and the Efficiency of Vulnerability Mining (EVM).
Conference Paper
Full-text available
This paper presents a toolset for model checking x86 executables. The members of the toolset are CodeSurfer/x86, WPDS++, and the Path Inspector. CodeSurfer/x86 is used to extract a model from an executable in the form of a weighted pushdown system. WPDS++ is a library for answering generalized reachability queries on weighted pushdown systems. The Path Inspector is a software model checker built on top of CodeSurfer and WPDS++ that supports safety queries about the program's possible control configurations.
Conference Paper
Full-text available
Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin's rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the application's original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pin's versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium®, and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website.
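
Since the abstract uses basic-block counting as its running benchmark, here is a minimal Pintool in that style, closely following Pin's own example tools. It must be compiled against the Pin kit, which provides pin.H, and is shown only to make the instrumentation model concrete.

```cpp
// A minimal basic-block counting Pintool in the style of Pin's examples.
// Requires the Pin kit's headers and build system.
#include <iostream>
#include "pin.H"

static UINT64 bblCount = 0;

// Analysis routine: runs before every executed basic block.
VOID CountBbl() { bblCount++; }

// Instrumentation routine: Pin calls this once per trace; we insert a call
// to CountBbl() in front of every basic block in the trace.
VOID Trace(TRACE trace, VOID* v) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)CountBbl, IARG_END);
}

VOID Fini(INT32 code, VOID* v) {
    std::cerr << "basic blocks executed: " << bblCount << std::endl;
}

int main(int argc, char* argv[]) {
    if (PIN_Init(argc, argv)) return 1;   // parse Pin's command line
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();                   // never returns
    return 0;
}
```

Built with the kit's makefiles into a shared object, a tool like this is run as `pin -t <tool>.so -- <application>`, instrumenting the unmodified binary at run time.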
Conference Paper
Full-text available
The automatic identification of security-relevant flaws in binary executables is still a young but promising research area. In this paper, we describe a new approach for the identification of vulnerabilities in object code we called smart fuzzing. While conventional fuzzing uses random input to discover crash conditions, smart fuzzing restricts the input space by using a preliminary static analysis of the program, then refined by monitoring each execution. In other words, the search is driven by a mix of static and dynamic analysis in order to lead the execution path to selected corner cases that are the most likely to expose vulnerabilities, thus improving the effectiveness of fuzzing as a means for finding security breaches in black-box programs.
Article
ATOM (Analysis Tools with OM) is a single framework for building a wide range of customized program analysis tools. It provides the common infrastructure present in all code-instrumenting tools; this is the difficult and time-consuming part. The user simply defines the tool-specific details in instrumentation and analysis routines. Building a basic block counting tool like Pixie with ATOM requires only a page of code. ATOM, using OM link-time technology, organizes the final executable such that the application program and user's analysis routines run in the same address space. Information is directly passed from the application program to the analysis routines through simple procedure calls instead of inter-process communication or files on disk. ATOM takes care that analysis routines do not interfere with the program's execution, and precise information about the program is presented to the analysis routines at all times. ATOM uses no simulation or interpretation. ATOM has been implemented on the Alpha AXP under OSF/1. It is efficient and has been used to build a diverse set of tools for basic block counting, profiling, dynamic memory recording, instruction and data cache simulation, pipeline simulation, evaluating branch prediction, and instruction scheduling.
Conference Paper
Software vulnerabilities have had a devastating effect on the Internet. Worms such as CodeRed and Slammer can compromise hundreds of thousands of hosts within hours or even minutes, and cause millions of dollars of damage (25, 42). To successfully combat these fast automatic Internet attacks, we need fast automatic attack detection and filtering mechanisms. In this paper we propose dynamic taint analysis for automatic detection of overwrite attacks, which include most types of exploits. This approach does not need source code or special compilation for the monitored program, and hence works on commodity software. To demonstrate this idea, we have implemented TaintCheck, a mechanism that can perform dynamic taint analysis by performing binary rewriting at run time. We show that TaintCheck reliably detects most types of exploits. We found that TaintCheck produced no false positives for any of the many different programs that we tested. Further, we describe how TaintCheck could improve automatic signature generation in several ways.
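
A toy model of the dynamic taint analysis TaintCheck performs through binary rewriting: values derived from untrusted input are marked, the mark propagates through copies and arithmetic, and an alert fires when a tainted value is about to steer control flow. Real tools shadow memory and registers per byte at the instruction level; the variable-level map here is a deliberate simplification.

```cpp
// Toy taint propagation: shadow state marks values derived from untrusted
// input; using a tainted value as a jump target signals an overwrite attack.
#include <iostream>
#include <map>

std::map<int, bool> shadow;  // variable id -> tainted?

void taint_source(int dst)             { shadow[dst] = true; }          // e.g. recv()
void prop_copy(int dst, int src)       { shadow[dst] = shadow[src]; }   // mov
void prop_binop(int dst, int a, int b) { shadow[dst] = shadow[a] || shadow[b]; }

void check_jump_target(int v) {  // indirect call/jmp or stored return address
    if (shadow[v])
        std::cerr << "ALERT: tainted data used as control transfer target\n";
}

int main() {
    taint_source(0);      // v0 <- network input
    prop_copy(1, 0);      // v1 = v0
    prop_binop(2, 1, 3);  // v2 = v1 + v3 (v3 untainted)
    check_jump_target(2); // fires: a potential overwrite attack is detected
}
```
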
Conference Paper
Several approaches have been proposed to perform vulnerability analysis of applications written in high-level languages. However, little has been done to automatically identify security-relevant flaws in binary code. In this paper, we present a novel approach to the identification of vulnerabilities in x86 executables in ELF binary format. Our approach is based on static analysis and symbolic execution techniques. We implemented our approach in a proof-of-concept tool and used it to detect taint-style vulnerabilities in binary code. The results of our evaluation show that our approach is both practical and effective.
Conference Paper
Dynamic taint analysis is gaining momentum. Techniques based on dynamic tainting have been successfully used in the context of application security, and now their use is also being explored in different areas, such as program understanding, software testing, and debugging. Unfortunately, most existing approaches for dynamic tainting are defined in an ad-hoc manner, which makes it difficult to extend them, experiment with them, and adapt them to new contexts. Moreover, most existing approaches are focused on data-flow based tainting only and do not consider tainting due to control flow, which limits their applicability outside the security domain. To address these limitations and foster experimentation with dynamic tainting techniques, we defined and developed a general framework for dynamic tainting that (1) is highly flexible and customizable, (2) allows for performing both data-flow and control-flow based tainting conservatively, and (3) does not rely on any customized runtime system. We also present DYTAN, an implementation of our framework that works on x86 executables, and a set of preliminary studies that show how DYTAN can be used to implement different tainting-based approaches with limited effort. In the studies, we also show that DYTAN can be used on real software, by using FIREFOX as one of our subjects, and illustrate how the specific characteristics of the tainting approach used can affect efficiency and accuracy of the taint analysis, which further justifies the use of our framework to experiment with different variants of an approach.
Article
This book sets out to explain what genetic algorithms are and how they can be used to solve real-world problems. The first objective is tackled by the editor, Lawrence Davis. The remainder of the book is turned over to a series of short review articles by a collection of authors, each explaining how genetic algorithms have been applied to problems in their own specific area of interest. The first part of the book introduces the fundamental genetic algorithm (GA), explains how it has traditionally been designed and implemented and shows how the basic technique may be applied to a very simple numerical optimisation problem. The basic technique is then altered and refined in a number of ways, with the effects of each change being measured by comparison against the performance of the original. In this way, the reader is provided with an uncluttered introduction to the technique and learns to appreciate why certain variants of GA have become more popular than others in the scientific community. Davis stresses that the choice of a suitable representation for the problem in hand is a key step in applying the GA, as is the selection of suitable techniques for generating new solutions from old. He is refreshingly open in admitting that much of the business of adapting the GA to specific problems owes more to art than to science. It is nice to see the terminology associated with this subject explained, with the author stressing that much of the field is still an active area of research. Few assumptions are made about the reader's mathematical background. The second part of the book contains thirteen cameo descriptions of how genetic algorithmic techniques have been, or are being, applied to a diverse range of problems. Thus, one group of authors explains how the technique has been used for modelling arms races between neighbouring countries (a non-linear, dynamical system), while another group describes its use in deciding design trade-offs for military aircraft. My own favourite is a rather charming account of how the GA was applied to a series of scheduling problems. Having attempted something of this sort with Simulated Annealing, I found it refreshing to see the authors highlighting some of the problems that they had encountered, rather than sweeping them under the carpet as is so often done in the scientific literature. The editor points out that there are standard GA tools available for either play or serious development work. Two of these (GENESIS and OOGA) are described in a short, third part of the book. As is so often the case nowadays, it is possible to obtain a diskette containing both systems by sending your Visa card details (or $60) to an address in the USA.
Conference Paper
Test data generation is one of the hardest tasks in the software life-cycle. Many testing methods try to address it, all in a heuristic way. Symbolic execution is one such software testing method; it can be used either for program evaluation or to assist the automated test data generation process. A number of systems employing symbolic execution for test data generation have already been built. In this paper, a new symbolic execution system is presented which can be used regardless of the language in which the program under test is written. The system is called VOLCANO, and its scripts are written in SYMEXLAN (SYMbolic EXecution LANguage), a scripting language that can serve either as an intermediate representation for many other languages or as a symbolic execution language that facilitates the symbolic execution process.
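
To make the symbolic-execution idea concrete, the sketch below restricts symbolic values to the linear form a*x + b over a single input x, so collected path constraints can be solved directly; real engines hand richer path conditions to a constraint solver. This is an illustration of the general technique, not VOLCANO's design.

```cpp
// Minimal symbolic execution: run over symbolic values, collect the branch
// conditions on a path, and solve them for concrete test data. Symbolic
// values are limited to a*x + b so the "solver" stays tiny.
#include <iostream>
#include <vector>

struct Sym { long a, b; };  // represents the value a*x + b

Sym sym_add(Sym s, long k) { return {s.a, s.b + k}; }
Sym sym_mul(Sym s, long k) { return {s.a * k, s.b * k}; }

struct Constraint { Sym lhs; long rhs; };  // path condition: lhs == rhs

// Solve the conjunction of equality constraints for x, if consistent.
bool solve(const std::vector<Constraint>& path, long& x) {
    bool have = false;
    for (const auto& c : path) {
        if (c.lhs.a == 0) {                       // no x: must hold outright
            if (c.lhs.b != c.rhs) return false;
            continue;
        }
        if ((c.rhs - c.lhs.b) % c.lhs.a != 0) return false;
        long v = (c.rhs - c.lhs.b) / c.lhs.a;
        if (have && v != x) return false;         // contradicts earlier branch
        x = v;
        have = true;
    }
    return have;
}

int main() {
    // Symbolically execute: y = 3*x + 1; if (y == 13) target();
    Sym x{1, 0};
    Sym y = sym_add(sym_mul(x, 3), 1);
    std::vector<Constraint> path{{y, 13}};
    long input = 0;
    if (solve(path, input))
        std::cout << "test input reaching target: x = " << input << "\n"; // x = 4
}
```
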