Dawson R. Engler's research while affiliated with Stanford University and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
ResearchGate created this page automatically to provide a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (116)
Modern static bug finding tools are complex. They typically consist of hundreds of thousands of lines of code, and most of them are wedded to one language (or even one compiler). This complexity makes the systems hard to understand, hard to debug, and hard to retarget to new languages, thereby dramatically limiting their scope. This paper reduces c...
Symbolic execution calls for specialized address translation. Unlike a pointer on a traditional machine model, which corresponds to a single address, a symbolic pointer may represent multiple feasible addresses. A symbolic pointer dereference manipulates symbolic state, potentially submitting many theorem prover requests in the process. Hence, desi...
Symbolic binary execution is a dynamic analysis method which explores program paths to generate test cases for compiled code. Throughout execution, a program is evaluated with a bit-vector theorem prover and a runtime interpreter as a mix of symbolic expressions and concrete values. Left untended, these symbolic expressions grow to negatively impac...
Many recent tools use dynamic symbolic execution to perform tasks ranging from automatic test generation and security-flaw detection to equivalence verification and exploit generation. However, while symbolic execution is promising, it perennially struggles with the fact that the number of paths in a program increases roughly exponentially with both cod...
Programmers across a wide range of disciplines (e.g., bioinformatics, neuroscience, econometrics, finance, data mining, information retrieval, machine learning) write scripts to parse, transform, process, and extract insights from data. To speed up iteration times, they split their analyses into stages and write extra code to save the intermediate...
Verifying code equivalence is useful in many situations, such as checking: yesterday’s code against today’s, different implementations of the same (standardized) interface, or an optimized routine against a reference implementation. We present a tool designed to easily check the equivalence of two arbitrary C functions. The tool provides guarantees...
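To give a flavor of such a check, here is a minimal sketch using the z3 solver's Python bindings (an assumption for illustration; the tool above symbolically executes C code rather than hand-written solver models). Two small abs() implementations are modeled as 32-bit bit-vector expressions; they are equivalent if no input makes them disagree.

    from z3 import BitVec, If, Solver, unsat

    x = BitVec("x", 32)
    abs_ref = If(x < 0, -x, x)                 # straightforward reference version
    abs_opt = (x ^ (x >> 31)) - (x >> 31)      # branchless "optimized" version (>> is arithmetic)

    s = Solver()
    s.add(abs_ref != abs_opt)                  # search for any input where the two disagree
    print("equivalent" if s.check() == unsat else "counterexample: %s" % s.model())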
We present MINESTRONE, a novel architecture that integrates static analysis, dynamic confinement, and code diversification techniques to enable the identification, mitigation and containment of a large class of software vulnerabilities in third-party software. Our initial focus is on software written in C and C++; however, many of our techniques ar...
It can be painfully hard to take software that runs on one person's machine and get it to run on another machine. Online forums and mailing lists are filled with discussions of users' troubles with compiling, installing, and configuring software and their myriad of dependencies. To eliminate this dependency problem, we created a system called CDE t...
Computational scientists often prototype data analysis scripts using high-level languages like Python. To speed up execution times, they manually refactor their scripts into stages (separate functions) and write extra code to save intermediate results to disk in order to avoid recomputing them in subsequent runs. To eliminate this burden, we enhanc...
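The burden described above can be imitated by hand with a small disk-memoization decorator; the sketch below is illustrative only (the cache directory and hashing scheme are made up, and the system described above performs this automatically inside the interpreter rather than via a decorator).

    import functools, hashlib, os, pickle

    CACHE_DIR = ".analysis_cache"   # hypothetical location for persisted stage outputs

    def memoize_to_disk(fn):
        """Persist fn's result, keyed by its name and arguments, so later runs skip recomputation."""
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = hashlib.sha1(pickle.dumps((fn.__name__, args, sorted(kwargs.items())))).hexdigest()
            path = os.path.join(CACHE_DIR, key + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)       # reuse the saved intermediate result
            result = fn(*args, **kwargs)
            os.makedirs(CACHE_DIR, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper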
How Coverity built a bug-finding tool, and a business, around the unlimited supply of bugs in software systems.
This effort has developed and deployed a broad range of tools for finding serious errors in code. They are designed to find large numbers of errors in large source bases quickly, and with few false reports. We validated these tools by using them to find bugs in important open-source projects (e.g., Linux, BSD, and many other widely-used projects)....
We present a study of how Linux kernel developers respond to bug reports issued by a static analysis tool. We found that developers prefer to triage reports in younger, smaller, and more actively-maintained files (§2), first address easy-to-fix bugs and defer difficult (but possibly critical) bugs (§3), and triage bugs in batches rather than indivi...
This article presents EXE, an effective bug-finding tool that automatically generates inputs that crash real code. Instead of running code on manually or randomly constructed input, EXE runs it on symbolic input initially allowed to be anything. As checked code runs, EXE tracks the constraints on each symbolic (i.e., input-derived) memory location....
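The core loop can be pictured with a constraint solver: collect the branch conditions taken along a path, conjoin them with a safety violation, and ask for a concrete input. The sketch below uses the z3 Python bindings purely for illustration (EXE itself instruments C code and uses its own solver); the checked snippet and buffer size are invented.

    from z3 import BitVec, Solver, sat

    # Imagined checked code:  if (x > 100 && 3*x < 320) buf[x - 100] = 0;   /* buf is 4 bytes */
    x = BitVec("x", 32)                       # the symbolic input
    path = [x > 100, 3 * x < 320]             # constraints accumulated along one path
    oob  = (x - 100) >= 4                     # index escapes the 4-byte buffer

    s = Solver()
    s.add(*path, oob)
    if s.check() == sat:                      # solve for a concrete crashing input
        print("crashing input x =", s.model()[x])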
This talk will draw on our efforts in using static analysis, model checking, and symbolic execution to find bugs in real code, both in academic and commercial settings. The unifying religion driving all these efforts has been: results matter more than anything. That which works is good, that which does not is not. While this worldview is simple, re...
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (p. 99-106). by Dawson R. Engler. M.S.
Recent work has used variations of symbolic execution to automatically generate high-coverage test inputs (3, 4, 7, 8, 14). Such tools have demonstrated their ability to find very subtle errors. However, one challenge they all face is how to effectively handle the exponential number of paths in checked code. This paper presents a new technique for...
We present a new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs. We used KLEE to thoroughly check all 89 stand-alone programs in the GNU COREUTILS utility suite, which form the core user-level environment installed on millions o...
Statistical debugging uses dynamic instrumentation and machine learning to identify predicates on program state that are strongly predictive of program failure. Prior approaches have only considered simple, atomic predicates such as the directions of ...
Automatic tools for finding software errors require knowledge of the rules a program must obey, or "specifications," before they can identify bugs. We present a method that combines factor graphs and static program analysis to automatically infer specifications directly from programs. We illustrate the approach on inferring functions in C program...
This paper shows how to use model checking to find serious errors in file systems. Model checking is a formal verification technique tuned for finding corner-case errors by comprehensively exploring the state spaces defined by a system. File systems have two dynamics that make them attractive for such an approach. First, their errors are some o...
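At its core, explicit-state model checking is a graph search over system states. The following heavily simplified sketch shows the idea; the state representation, successor function, and bad-state predicate are placeholders to be supplied by the system being checked.

    from collections import deque

    def model_check(init, successors, is_bad):
        """Breadth-first exploration of reachable states; returns the shortest
        event trace leading to a state where is_bad() holds, or None.
        States must be hashable (e.g., tuples)."""
        seen = {init}
        frontier = deque([(init, ())])
        while frontier:
            state, trace = frontier.popleft()
            if is_bad(state):
                return list(trace)
            for event, nxt in successors(state):   # e.g., write, crash, recover for a file system
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, trace + (event,)))
        return None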
Many current systems allow data produced by potentially malicious sources to be mounted as a file system. File system code must check this data for dangerous values or invariant violations before using it. Because file system code typically runs inside the operating system kernel, even a single unchecked value can crash the machine or lead to an ex...
Automatic tools for finding software errors require a set of specifications before they can check code: if they do not know what to check, they cannot find bugs. This paper presents a novel framework based on factor graphs for automatically inferring specifications directly from programs. The key strength of the approach is that it can incorporat...
Systems code defines an error-prone execution state space built from deeply nested conditionals and function call chains, massive amounts of code, and enthusiastic use of casting and pointer operations. Such code is hard to test and difficult to inspect, yet a single error can crash a machine or form the basis of a security breach. This paper prese...
Storage systems such as file systems, databases, and RAID systems have a simple, basic contract: you give them data, they do not lose or corrupt it. Often they store the only copy, making its irrevocable loss almost arbitrarily bad. Unfortunately, their code is exceptionally hard to get right, since it must correctly recover from any crash at...
This talk tries to distill several years of experience using both model checking and static analysis to find errors in large software systems. We initially thought that the tradeoffs between the two were clear: static analysis was easy but would mainly find shallow bugs, while model checking would require more work but would be strictly better — it...
This paper presents a technique that uses code to automatically generate its own test cases at run time by using a combination of symbolic and concrete (i.e., regular) execution. The input values to a program (or software component) provide the standard interface of any testing framework with the program it is testing, and generating input values that...
File systems, RAID systems, and applications that require data consistency, among others, assure data integrity by carefully forcing valuable data to stable storage. Unfortunately, verifying that a system can recover from a crash to a valid state at any program counter is very difficult. Previous techniques for finding data integrity bugs have be...
Static program checking tools can find many serious bugs in software, but due to analysis limitations they also frequently emit false error reports. Such false positives can easily render the error checker useless by hiding real errors amidst the false. Effective error report ranking schemes mitigate the problem of false positives by suppressing the...
Network protocols must work. The effects of protocol specification or implementation errors range from reduced performance, to security breaches, to bringing down entire networks. However, network protocols are difficult to test due to the exponential size of the state space they define. Ideally, a protocol implementation must be validated against all...
This paper describes experiences with software model checking after several years of using static analysis to find errors. We initially thought that the trade-off between the two was clear: static analysis was easy but would mainly find shallow bugs, while model checking would require more work but would be strictly better: it would find more errors, the...
This paper describes RacerX, a static tool that uses flow-sensitive, interprocedural analysis to detect both race conditions and deadlocks. It is explicitly designed to find errors in large, complex multithreaded systems. It aggressively infers checking information such as which locks protect which operations, which code contexts are multithreaded,...
Source code analysis is an emerging technology in the software industry that allows critical source code defects to be detected before a program runs. Although the concept of detecting programming errors at compile time is not new, the technology to build effective tools that can process millions of lines of code and report substantive defects with only...
Programmers generally attempt to perform useful work. If they performed an action, it was because they believed it served some purpose. Redundant operations violate this belief. However, in the past, redundant operations have been typically regarded as minor cosmetic problems rather than serious errors. This paper demonstrates that, in fact, many r...
This paper describes a system and annotation language, MECA, for checking security rules. MECA is expressive and designed for checking real systems. It provides a variety of practical constructs to effectively annotate large bodies of code. For example, it allows programmers to write programmatic annotators that automatically annotate large bodies o...
Many software defects result from the violation of programming rules: rules that describe how to use a programming language and its libraries and rules that describe the dos and don'ts within a given application, library or system. MJ is a language and an engine that can succinctly express many of these rules for programs written in Java. MJ progra...
This paper describes RacerX, a static tool that uses flow-sensitive, interprocedural analysis to detect both race conditions and deadlocks. It is explicitly designed to find errors in large, complex multithreaded systems. It aggressively infers checking information such as which locks protect which operations, which code contexts are multithreaded, and which shared accesses are dangerous. It tracks a set of code fea...
A major obstacle to finding program errors in a real system is knowing what correctness rules the system must obey. These rules are often undocumented or specified in an ad hoc manner. This paper demonstrates techniques that automatically extract such checking information from the source code itself, rather than the programmer, thereby avoiding the ne...
Memory corruption errors lead to non-deterministic, elusive crashes. This paper describes ARCHER (ARray CHeckER), a static, effective memory access checker. ARCHER uses path-sensitive, interprocedural symbolic analysis to bound the values of both variables and memory sizes. It evaluates known values using a constraint solver at every array access, pointer derefe...
The tradeoffs between static analysis and model checking while applying model checking for bug finding to large software systems were discussed. A general, not-unreasonable belief is that bugs will follow a 90-10 distribution. Thus, out of 1000 errors, 100 will account for most of the pain and 900 will be a waste of resources to fix. Fixing these 900...
Memory corruption errors lead to non-deterministic, elusive crashes. This paper describes ARCHER (ARray CHeckER), a static, effective memory access checker. ARCHER uses path-sensitive, interprocedural symbolic analysis to bound the values of both variables and memory sizes. It evaluates known values using a constraint solver at every array access,...
Memory corruption errors lead to non-deterministic, elusive crashes. This paper describes ARCHER (ARray CHeckER), a static, effective memory access checker. ARCHER uses path-sensitive, interprocedural symbolic analysis to bound the values of both variables and memory sizes. It evaluates known values using a constraint solver at every array access, po...
This paper explores z-ranking, a technique to rank error reports emitted by static program checking analysis tools. Such tools often use approximate analysis schemes, leading to false error reports. These reports can easily render the error checker useless by hiding real errors amidst the false, and by potentially causing the tool to be discarded...
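The ranking statistic itself can be viewed as a standard one-proportion z-test over how often a checked property held versus failed. The sketch below is a rough rendering of that idea; the assumed baseline success probability p0 = 0.9 is a made-up placeholder, not the paper's calibration.

    import math

    def z_score(successes, failures, p0=0.9):
        """Surprise of the observed success rate under an assumed per-check
        success probability p0; higher scores suggest more credible reports."""
        n = successes + failures
        p_hat = successes / n
        return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

    # Reports backed by many successful checks and few failures rank first.
    reports = {"lock-release @ foo.c": (42, 1), "lock-release @ bar.c": (3, 5)}
    ranked = sorted(reports, key=lambda r: z_score(*reports[r]), reverse=True)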
Dynamic code generation allows programmers to use run-time information in order to achieve performance and expressiveness superior to those of static code. The 'C (Tick C) language is a superset of ANSI C that supports efficient and high-level use of dynamic code generation. 'C provides dynamic code generation at the level of C expressions and stat...
Many system errors do not emerge unless some intricate sequence of events occurs. In practice, this means that most systems have errors that only trigger after days or weeks of execution. Model checking [4] is an effective way to find such subtle errors. It takes a simplified description of the code and exhaustively tests it on all inputs, using tec...
Traditional operating systems limit the performance, flexibility, and functionality of applications by fixing the interface and implementation of operating system abstractions such as interprocess communication and virtual memory. The exokernel operating system architecture addresses this problem by providing application-level management of physica...
Many system errors do not emerge unless some intricate sequence of events occurs. In practice, this means that most systems have errors that only trigger after days or weeks of execution. Model checking [4] is an effective way to find such subtle errors. It takes a simplified description of the code and exhaustively tests it on all inputs, using te...
This paper gives an overview of the metal language, which we have designed to make it easy to construct system-specific, static analyses. We call these analyses extensions because they act as the input to a generic analysis engine that runs the static analysis over a given source base. We also interchangeably refer to them as checkers because they...
This paper presents a novel approach to bug-finding analysis and an implementation of that approach. Our goal is to find as many serious bugs as possible. To do so, we designed a flexible, easy-to-use extension language for specifying analyses and an efficient algorithm for executing these extensions. The language, metal, allows the users of our sys...
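A metal extension is essentially a small state machine driven by syntactic patterns in the code. The sketch below imitates that idea in plain Python over a made-up event stream of (function, operation, lock) tuples rather than metal's actual pattern syntax.

    LOCKED, UNLOCKED = "locked", "unlocked"

    def check_lock_discipline(events):
        """Flag double acquires and releases of unheld locks along a code path."""
        state, errors = {}, []
        for fn, op, lock in events:
            held = state.get(lock, UNLOCKED)
            if op == "lock":
                if held == LOCKED:
                    errors.append(f"{fn}: {lock} acquired twice")
                state[lock] = LOCKED
            elif op == "unlock":
                if held == UNLOCKED:
                    errors.append(f"{fn}: {lock} released but not held")
                state[lock] = UNLOCKED
        return errors

    print(check_lock_discipline([("f", "lock", "L"), ("f", "lock", "L"), ("f", "unlock", "L")]))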
Trent Jaeger will discuss ongoing work in the verification of authorization hook placement in Linux. The idea is that we can develop tools to check that all security-sensitive kernel operations can be mediated properly. Dawson Engler will discuss ongoing work in static checking for kernel and driver bugs, including security bugs, based on his meta-...
Application-level networking is a promising software organization for improving performance and functionality for important network services. The Xok/ExOS exokernel system includes application-level support for standard network services, while at the same time allowing application writers to specialize networking services. This paper describes how...
This paper shows how system-specific static analysis can find security errors that violate rules such as "integers from untrusted sources must be sanitized before use" and "do not dereference user-supplied pointers." In our approach, programmers write system-specific extensions that are linked into the compiler and check their code for errors. We d...
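A stripped-down rendering of one such rule ("values from untrusted sources must be sanitized before use") as a flow-sensitive pass over a toy event trace; the event format and front end here are hypothetical, and the real extensions run inside the compiler over actual program paths.

    def check_untrusted(events):
        """events: ("taint", var) | ("sanitize", var) | ("use", var, where)."""
        tainted, errors = set(), []
        for ev in events:
            if ev[0] == "taint":            # value arrived from user space or the network
                tainted.add(ev[1])
            elif ev[0] == "sanitize":       # bounds or validity check applied
                tainted.discard(ev[1])
            elif ev[0] == "use" and ev[1] in tainted:
                errors.append(f"unsanitized {ev[1]} used as {ev[2]}")
        return errors

    print(check_untrusted([("taint", "len"), ("use", "len", "copy length")]))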
Complex systems have errors that involve mishandled corner cases in intricate sequences of events. Conventional testing techniques usually miss these errors. In recent years, formal verification techniques such as [5] have gained popularity in checking a property in all possible behaviors of a system. However, such techniques involve...
A major obstacle to finding program errors in a real system is knowing what correctness rules the system must obey. These rules are often undocumented or specified in an ad hoc manner. This paper demonstrates techniques that automatically extract such checking information from the source code itself, rather than the programmer, thereby avoiding the...
We present a study of operating system errors found by automatic, static, compiler analysis applied to the Linux and OpenBSD kernels. Our approach differs from previous studies that consider errors found by manual inspection of logs, testing, and surveys because static analysis is applied uniformly to the entire kernel source, though our approach n...
The use of model checking for validation requires that models of the underlying system be created. Creating such models is both difficult and error prone and as a result, verification is rarely used despite its advantages. In this paper, we present a method for automatically extracting models from low level software implementations. Our method is b...
Systems software such as OS kernels, embedded systems, and libraries must obey many rules for both correctness and performance. Common examples include "accesses to variable A must be guarded by lock B," "system calls must check user pointers for validity before using them," and "message handlers should free their buffers as quickly as possible to...
Binary tools such as disassemblers, just-in-time compilers, and executable code rewriters need to have an explicit representation of how machine instructions are encoded. Unfortunately, writing encodings for an entire instruction set by hand is both tedious and error-prone. We describe DERIVE, a tool that extracts bit-level instruction encoding inf...
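The details DERIVE recovers automatically are the kind of bit-level packing one would otherwise specify by hand, as in this small MIPS example (the instruction choice is arbitrary; the field widths follow the standard I-type layout).

    def encode_addiu(rt, rs, imm16):
        """MIPS I-type word: opcode(6) | rs(5) | rt(5) | imm(16); addiu opcode is 0b001001."""
        assert 0 <= rs < 32 and 0 <= rt < 32
        return (0b001001 << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)

    # addiu $t0, $zero, 5   ($t0 is register 8)  ->  0x24080005
    assert encode_addiu(rt=8, rs=0, imm16=5) == 0x24080005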
Fast and flexible message demultiplexing are well-established goals in the networking community [1, 18, 22]. Currently, however, network architects have had to sacrifice one for the other. We present a new packet-filter system, DPF (Dynamic Packet Filters), that provides both the traditional flexibility of packet filters [18] and the speed of hand-...
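Conceptually, a packet filter is a conjunction of comparisons against fixed offsets in the message, and DPF's contribution is compiling each installed filter to native code at run time. The sketch below shows only the filter-as-data idea, with a Python closure standing in for the dynamically generated code; the offsets assume an untagged Ethernet frame carrying IPv4 with a 20-byte header.

    def compile_filter(atoms):
        """atoms: (offset, length, expected bytes). Returns a predicate over raw packets."""
        def match(pkt: bytes) -> bool:
            return all(pkt[off:off + n] == want for off, n, want in atoms)
        return match

    # Demultiplex UDP datagrams destined for port 53 (DNS).
    dns = compile_filter([
        (12, 2, b"\x08\x00"),   # ethertype: IPv4
        (23, 1, b"\x11"),       # IP protocol: UDP
        (36, 2, b"\x00\x35"),   # UDP destination port 53
    ])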
Dynamic code generation allows specialized code sequences to be crafted using runtime information. Since this information is by definition not available statically, the use of dynamic code generation can achieve performance inherently beyond that of static code generation. Previous attempts to support dynamic code generation have been low-level, ex...
The use of model checking for validation requires that models of the underlying system be created. Creating such models is both difficult and error prone and as a result, verification is rarely used despite its advantages. In this paper we present a method for automatically extracting models from low level software implementations. Our method is ba...
Building systems such as OS kernels and embedded software is difficult. An important source of this difficulty is the numerous rules they must obey: interrupts cannot be disabled for "too long," global variables must be protected by locks, user pointers passed to OS code must be checked for safety before use, etc. A single violation can crash the s...
It has long been thought that coarse-grain parallelism is much more efficient than fine-grain parallelism due to the overhead of process (thread) creation, context switching, and synchronization. On the other hand, there are several advantages to fine-grain parallelism: architecture independence, ease of programming, ease of use as a target for c...
Many binary tools, such as disassemblers, dynamic code generation systems, and executable code rewriters, need to understand how machine instructions are encoded. Unfortunately, specifying such encodings is tedious and error-prone. Users must typically specify thousands of details of instruction layout, such as opcode and field locations, values, le...
Interfaces--the collection of procedures and data structures that define a library, a subsystem, a module--are syntactically poor programming languages. They have state (defined both by the interface's data structures and internally), operations on this state (defined by the interface's procedures), and semantics associated with these operations. Giv...
Distributed systems must communicate. To communicate at all requires high-level protocols be built with manageable complexity. To communicate well requires protocols efficient both in design and implementation. The ASH system provides mechanisms to address both of these needs. To manage complexity, it provides a simple interface that allows protoco...
Dynamic code generation is the creation of executable code at runtime. Such "on-the-fly" code generation is a powerful technique, enabling applications to use runtime information to improve performance by up to an order of magnitude [4, 8, 20, 22, 23]. Unfortunately, previous general-purpose dynamic code generation systems have been either ineffici...
Dynamic code generation is an important technique for improving the performance of software by exploiting information known only at run time. `C (Tick C) is a superset of ANSI C that, unlike most prior systems, allows high-level, efficient, and machineindependent specification of dynamically generated code. `C provides facilities for dynamic code g...
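The payoff of dynamic code generation is specialization on values known only at run time. `C expresses this with language and compiler support; the Python sketch below only mimics the effect by generating and executing source text, so it illustrates the idea rather than `C's mechanism.

    def specialize_pow(n):
        """Build, at run time, a power function fully unrolled for exponent n."""
        body = "1" if n == 0 else " * ".join(["x"] * n)
        src = f"def pow_n(x):\n    return {body}\n"
        env = {}
        exec(src, env)            # run-time "code generation" in interpreter terms
        return env["pow_n"]

    pow5 = specialize_pow(5)
    assert pow5(2) == 32          # x*x*x*x*x, with no loop or exponent test at call time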
The defining tragedy of the operating systems community has been the definition of an operating system as software that both multiplexes and abstracts physical resources. The view that the OS should abstract the hardware is based on the assumption that it is possible both to define abstractions that are appropriate for all areas and to implement th...
This paper is a short tutorial on VCODE, a fast, portable dynamic code generation system. It should be read after [1]. The VCODE system is a set of C macros and support functions that allow programmers to portably and efficiently generate code at runtime. The VCODE interface is that of an idealized load-store RISC architecture. VCODE...
Programmers have traditionally been passive users of compilers, rather than active exploiters of their transformational abilities. This paper presents MAGIK, a system that allows programmers to easily and modularly incorporate application-specific extensions into the compilation process. The MAGIK system gives programmers two significant capabilitie...
tcc is a compiler that provides efficient and high-level access to dynamic code generation. It implements the `C ("Tick-C") programming language, an extension of ANSI C that supports dynamic code generation [15]. `C gives power and flexibility in specifying dynamically generated code: whereas most other systems use annotations to denote run-time in...
To provide modularity and performance, operating system kernels should have only minimal embedded functionality. Today's operating systems are large, inefficient and, most importantly, inflexible. In our view, most operating system performance and flexibility problems can be eliminated simply by pushing the operating system interface lower. Our goa...
It has long been thought that coarse-grain parallelism is much more efficient than fine-grain parallelism due to the overhead of process (thread) creation, context switching, and synchronization. On the other hand, there are several advantages to fine-grain parallelism: architecture independence, ease of programming, ease of use as a target for cod...
On traditional operating systems only trusted software such as privileged servers or the kernel can manage resources. This thesis proposes a new approach, the exokernel architecture, which makes resource management unprivileged but safe by separating management from protection: an exokernel protects resources, while untrusted application-level soft...
Application-specific safe message handlers (ASHs) are designed to provide applications with hardware-level network performance. ASHs are user-written code fragments that safely and efficiently execute in the kernel in response to message arrival. ASHs can direct message transfers (thereby eliminating copies) and send messages (thereby reducing send...
The exokernel operating system architecture safely gives untrusted software efficient control over hardware and software resources by separating management from protection. This paper describes an exokernel system that allows specialized applications to achieve high performance without sacrificing the performance of unmodified UNIX programs. It eva...
tcc is a compiler that provides efficient and high-level access to dynamic code generation. It implements the 'C ("Tick-C") programming language, an extension of ANSI C that supports dynamic code generation [15]. 'C gives power and flexibility in specifying dynamically generated code: whereas most other systems use annotations to denote run-time in...
The exokernel operating system architecture safely gives untrusted software efficient control over hardware and software resources by separating management from protection. This paper describes an exokernel system that allows specialized applications to achieve high performance without sacrificing the performance of unmodified UNIX programs. It e...
Citations
... This requires viewing packet header data at a lower abstraction layer than specific fields in a predefined header, typically as bits or bytes in a stream [54]. Protocol independent algorithms have historically been designed for serial execution on sequential processors such as CPUs, and rely heavily on decision trees and branching to eliminate redundant computation [23,54]. The flexibility they afford comes at the expense of speed however, and general algorithms can typically sustain a fraction of the throughput achievable by hardware accelerated 5-tuple algorithms [74]. ...
... Afterwards, support for just-in-time compilation in Google's V8 engine contributed to a dramatic performance improvement, thereby further paving the way for JavaScript. While JavaScript engines implement a protected environment, new vulnerabilities are constantly found in web browsers [3,11,14,20]. Advanced time-dependent attacks have also been ported to the web [8,17], as well as more traditional attacks such as formjackers ...
... For example, by annotating functions with information about the locks they acquire and release, it is possible to check that at every point of a function the number of locks held does not depend on the path taken to reach that point (which could be a sign that, along certain execution paths, not all acquired locks are released as they should be) [96]. MECA [107] and CQUAL [30] have been used to check the correct use of pointers coming from user space in the kernel [47,107]. Indeed, for security reasons, it is critical that the kernel be perfectly isolated from user processes. ...
... Anomaly Detection: An alternative approach frames errors in terms of anomalies. Anomaly analysis leverages the observation from conventional programming languages that anomalous code is often wrong [Chilimbi and Ganapathy 2006; Dimitrov and Zhou 2009; Engler et al. 2001; Hangal and Lam 2002; Raz et al. 2002; Xie and Engler 2002]. This lets an analysis circumvent the difficulty of obtaining program correctness rules. ...
... For instance, the entire checker only consists of a few hundred lines of code but is sufficient to check many complicated software systems like Linux, LLVM, OpenJDK, which are written in various different languages (e.g., C, C++, Java). There is very little relevant research work from this perspective, and we are mainly inspired by the concept of "micro-grammar" proposed by Brown et al. [3]. The key insight is to abstract away irrelevant programming details by abstracting away language details. ...
... Many techniques have been developed to test the correctness of software systems. For example, test input generation explores the ranges of certain variables such as array indices and finds the inputs that induce buffer overflow (Haller et al. 2013; Xie, Chou, and Engler 2003). Also, a common technique in software testing is to check for inconsistencies in parts of larger systems (Engler et al. 2001); researchers discovered that implicit invariants exist for certain functions or modules, and their violations often result in invalid system states (Ernst et al. 2007; Martin, Livshits, and Lam 2005). ...
... For highly complex projects with systemic problems, there is a clear need for modernization. Modernization opportunities can be discovered by bug checkers [BNE16], but in the end, modernization solutions are implemented through system transformations. As a result, developers often face the need to perform code transformations over large bodies of code [SAE+15, AL11]. ...
... Most tools can run automatically, and are easily adapted in modern development strategies as a result. Static analysis tools vary from robust and time-consuming analyses such as Fortify [156] (battlecard [13][14]) and Veracode (battlecard 16) to light real-time analyses [157]. Several resources exist to help compare different tools, such as the OWASP list of source code analysis tools 22 and list of vulnerability scanning tools 23, as well as Kompar 24 which allows easy comparison between static analysis tools. ...
... Then, the right hand side of the rule is synthesized by fixing the predicate and using Sketch [Solar-Lezama 2008]. Romano and Engler [2013] infer reduction rules to simplify expressions before passing them to a solver, thereby reducing the number of queries sent to, and subsequently time spent, in the solver. The rules are generated by symbolic program evaluation, and validated using a theorem prover. ...
... Extensive work has gone into data race detection in parallel programs using different techniques such as lockset-analysis based [20], [26]-[30], happens-before [31] relation based [2]-[4], [32], constraint-solver based [33], offset-span-label based [16], [34], polyhedral-model based [6]-[9], and MHP-information based [8], [17], [18]. A majority of these techniques are either POSIX thread [35] based, or are specific to race detection in explicit parallelism of various programming languages, such as Java, C#, X10, and Chapel. ...