Westley Weimer's research while affiliated with University of Michigan and other places

Publications (159)

Preprint
Full-text available
When vulnerabilities are discovered after software is deployed, source code is often unavailable, and binary patching may be required to mitigate the vulnerability. However, manually patching binaries is time-consuming, requires significant expertise, and does not scale to the rate at which new vulnerabilities are discovered. To address these probl...
Preprint
Full-text available
Cannabis is one of the most common mind-altering substances. It is used both medicinally and recreationally and is enmeshed in a complex and changing legal landscape. Anecdotal evidence suggests that some software developers may use cannabis to aid some programming tasks. At the same time, anti-drug policies and tests remain common in many software...
Article
Full-text available
During the past decade, virtualization-based (e.g., virtual machine introspection) and hardware-assisted approaches (e.g., x86 SMM and ARM TrustZone) have been used to defend against low-level malware such as rootkits. However, these approaches either require a large Trusted Computing Base (TCB) or they must share CPU time with the operating system...
Article
Full-text available
Understanding how developers carry out different computer science activities with objective measures can help to improve productivity and guide the use and development of supporting tools in software engineering. In this article, we present two controlled experiments involving 112 students to explore multiple computing activities (code comprehensio...
Preprint
Full-text available
Understanding how novices reason about coding at a neurological level has implications for training the next generation of software engineers. In recent years, medical imaging has been increasingly employed to investigate patterns of neural activity associated with coding activity. However, such studies have focused on advanced undergraduates and p...
Preprint
Full-text available
There is a growing body of malware samples that evade automated analysis and detection tools. Malware may measure fingerprints ("artifacts") of the underlying analysis tool or environment and change their behavior when artifacts are detected. While analysis tools can mitigate artifacts to reduce exposure, such concealment is expensive. However, not...
Article
What code navigation strategies do developers use and what mechanisms do they employ to find relevant information Do their strategies evolve over the course of longer tasks Answers to these questions can provide insight to educators and software tool designers to support a wide variety of programmers as they tackle increasingly-complex software sys...
Article
Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceed- ings) there was a wide ranging discussion at the eighth inter- national Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the International Conference on Software En- gineering on Friday 3rd July 2020). Topics included industry...
Conference Paper
Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included...
Preprint
Full-text available
Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included...
Article
Evolution provides a creative fount of complex and subtle adaptations that often surprise the scientists who discover them. However, the creativity of evolution is not limited to the natural world: Artificial organisms evolving in computational environments have also elicited surprise and wonder from the researchers studying them. The process of ev...
Article
Full-text available
We report the discussion session at the sixth international Genetic Improvement workshop, GI-2019 @ ICSE, which was held as part of the 41st ACM/IEEE International Confer- ence on Software Engineering on Tuesday 28th May 2019. Topics included GI representations, the maintainability of evolved code, automated software testing, future areas of GI res...
Chapter
During the past decade, virtualization-based (e.g., virtual machine introspection) and hardware-assisted approaches (e.g., x86 SMM and ARM TrustZone) have been used to defend against low-level malware such as rootkits. However, these approaches either require a large Trusted Computing Base (TCB) or they must share CPU time with the operating system...
Preprint
Full-text available
We report the discussion session at the sixth international Genetic Improvement workshop, GI-2019 @ ICSE, which was held as part of the 41st ACM/IEEE International Conference on Software Engineering on Tuesday 28th May 2019. Topics included GI representations, the maintainability of evolved code, automated software testing, future areas of GI resea...
Conference Paper
Programs written for hardware accelerators can often be difficult to debug. Without adequate tool support, program maintenance tasks such as fault localization and debugging can be particularly challenging. In this work, we focus on supporting hardware that is specialized for finite automata processing, a computational paradigm that has accelerated...
Preprint
We prove that certain formulations of program synthesis and reachability are equivalent. Specifically, our constructive proof shows the reductions between the template-based synthesis problem, which generates a program in a pre-specified form, and the reachability problem, which decides the reachability of a program location. This establishes a lin...
Article
As the hardware found within data centers becomes more heterogeneous, it is important to allow for efficient execution of algorithms across architectures. We present RAPID, a high-level programming language and combined imperative and declarative model for functionally- and performance-portable execution of sequential pattern-matching applications...
Conference Paper
Neutral networks in biology often contain diverse solutions with equal fitness, which can be useful when environments (requirements) change over time. In this paper, we present a method for studying neutral networks in software. In these networks, we find multiple solutions to held-out test cases (latent bugs), suggesting that neutral software netw...
Article
Static type errors are a common stumbling block for newcomers to typed functional languages. We present a dynamic approach to explaining type errors by generating counterexample witness inputs that illustrate how an ill-typed program goes wrong. First, given an ill-typed function, we symbolically execute the body to synthesize witness values that m...
Article
Full-text available
Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution's creativity is not limited to nature. Indeed, many researchers in the field of digital evolution have o...
Article
We present MNCaRT, a comprehensive software ecosystem for the study and use of automata processing across hardware platforms. Tool support includes manipulation of automata, execution of complex machines, high-speed processing of NFAs and DFAs, and compilation of regular expressions. We provide engines to execute automata on CPUs (with VASim and In...
Article
Data centers account for a significant fraction of global energy consumption and represent a growing business cost. Most current approaches to reducing energy use in data centers treat it as a hardware, compiler, or scheduling problem. This article focuses instead on the software level, showing how to reduce the energy used by programs when they ex...
Conference Paper
The growing reliance on cloud-based services has led to increased focus on cloud security. Cloud providers must deal with concerns from customers about the overall security of their cloud infrastructures. In particular, an increasing number of cloud attacks target resource allocation in cloud environments. For example, vulnerabilities in a hypervis...
Article
Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data -- pairs of ill-typed programs and their "fixed...
Article
Automated repair techniques produce variant php interpreters, which should naturally serve as the tested interpreters. However, the answer to the question of what should serve as the testing interpreter is less obvious. php's default test harness configuration uses the same version of the interpreter for both the tested and testing interpreter. How...
Article
Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data -- pairs of ill-typed programs and their "fixed...
Article
Genetic improvement uses computational search to improve existing software while retaining its partial functionality. Genetic improvement has previously been concerned with improving a system with respect to all possible usage scenarios. In this paper, we show how genetic improvement can also be used to achieve specialisation to a specific set of u...
Conference Paper
We prove that certain formulations of program synthesis and reachability are equivalent. Specifically, our constructive proof shows the reductions between the template-based synthesis problem, which generates a program in a pre-specified form, and the reachability problem, which decides the reachability of a program location. This establishes a lin...
Article
We prove that certain formulations of program synthesis and reachability are equivalent. Specifically, our constructive proof shows the reductions between the template-based synthesis problem, which generates a program in a pre-specified form, and the reachability problem, which decides the reachability of a program location. This establishes a lin...
Conference Paper
Static type errors are a common stumbling block for newcomers to typed functional languages. We present a dynamic approach to explaining type errors by generating counterexample witness inputs that illustrate how an ill-typed program goes wrong. First, given an ill-typed function, we symbolically execute the body to synthesize witness values that m...
Article
Static type errors are a common stumbling block for newcomers to typed functional languages. We present a dynamic approach to explaining type errors by generating counterexample witness inputs that illustrate how an ill-typed program goes wrong. First, given an ill-typed function, we symbolically execute the body to synthesize witness values that m...
Article
Static type errors are a common stumbling block for newcomers to typed functional languages. We present a dynamic approach to explaining type errors by generating counterexample witness inputs that illustrate how an ill-typed program goes wrong. First, given an ill-typed function, we symbolically execute the body to dynamically synthesize witness v...
Conference Paper
We present RAPID, a high-level programming language and combined imperative and declarative model for programming pattern-recognition processors, such as Micron's Automata Processor (AP). The AP is a novel, non-Von Neumann architecture for direct execution of non-deterministic finite automata (NFAs), and has been demonstrated to provide substantial...
Conference Paper
Cyber security research has produced numerous artificial diversity techniques such as address space layout randomization, heap randomization, instruction-set randomization, and instruction location randomization. To be most effective, these techniques must be high entropy and secure from information leakage which, in practice, is often difficult to...
Article
We present RAPID, a high-level programming language and combined imperative and declarative model for programming pattern-recognition processors, such as Micron's Automata Processor (AP). The AP is a novel, non-Von Neumann architecture for direct execution of non-deterministic finite automata (NFAs), and has been demonstrated to provide substantial...
Article
We present RAPID, a high-level programming language and combined imperative and declarative model for programming pattern-recognition processors, such as Micron's Automata Processor (AP). The AP is a novel, non-Von Neumann architecture for direct execution of non-deterministic finite automata (NFAs), and has been demonstrated to provide substantial...
Article
We present RAPID, a high-level programming language and combined imperative and declarative model for programming pattern-recognition processors, such as Micron's Automata Processor (AP). The AP is a novel, non-Von Neumann architecture for direct execution of non-deterministic finite automata (NFAs), and has been demonstrated to provide substantial...
Conference Paper
Full-text available
Normalized cross-correlation template matching is used as a detection method in many scientific domains. To be practical, template matching must scale to large datasets while handling ambiguity, uncertainty, and noisy data. We propose a novel approach based on Dempster-Shafer (DS) Theory and MapReduce parallelism. DS Theory addresses conflicts betw...
Article
The field of automated software repair lacks a set of common benchmark problems. Although benchmark sets are used widely throughout computer science, existing benchmarks are not easily adapted to the problem of automatic defect repair, which has several special requirements. Most important of these is the need for benchmark programs with reproducib...
Article
Procedural shaders are a vital part of modern rendering systems. Despite their prevalence, however, procedural shaders remain sensitive to aliasing any time they are sampled at a rate below the Nyquist limit. Antialiasing is typically achieved through numerical techniques like supersampling or precomputing integrals stored in mipmaps. This paper ex...
Conference Paper
Unit tests for object-oriented classes can be generated automatically using search-based testing techniques. As the search algorithms are typically guided by structural coverage criteria, the resulting unit tests are often long and confusing, with possible negative implications for developer adoption of such test generation tools, and the difficult...
Conference Paper
We introduce a mutation-based approach to automatically discover and expose `deep' (previously unavailable) parameters that affect a program's runtime costs. These discovered parameters, together with existing (`shallow') parameters, form a search space that we tune using search-based optimisation in a bi-objective formulation that optimises both t...
Conference Paper
The speed with which newly discovered software vulnerabilities are patched is a critical factor in mitigating the harm caused by subsequent exploits. Unfortunately, software vendors are often slow or unwilling to patch vulnerabilities, especially in embedded systems which frequently have no mechanism for updating factory-installed firmware. The sit...
Conference Paper
Writing good unit tests can be tedious and error prone, but even once they are written, the job is not done: Developers need to reason about unit tests throughout software development and evolution, in order to diagnose test failures, maintain the tests, and to understand code written by other developers. Unreadable tests are more difficult to main...
Article
This article describes and evaluates DIG, a dynamic invariant generator that infers invariants from observed program traces, focusing on numerical and array variables. For numerical invariants, DIG supports both nonlinear equalities and inequalities of arbitrary degree defined over numerical program variables. For array invariants, DIG generates ne...
Article
We present a framework for learning new analytic BRDF models through Genetic Programming that we call genBRDF. This approach to reflectance modeling can be seen as an extension of traditional methods that rely either on a phenomenological or empirical process. Our technique augments the human effort involved in deriving mathematical expressions tha...
Article
Program invariants are important for defect detection, program verification, and program repair. However, existing techniques have limited support for important classes of invariants such as disjunctions, which express the semantics of conditional statements. We propose a method for generating disjunctive invariants over numerical domains, which ar...
Conference Paper
Genetic Improvement (GI) is a form of Genetic Programming that improves an existing program. We use GI to evolve a faster version of a C++ program, a Boolean satisfiability (SAT) solver called MiniSAT, specialising it for a particular problem class, namely Combinatorial Interaction Testing (CIT), using automated code transplantation. Our GI-evolved...
Article
Modern compilers typically optimize for executable size and speed, rarely exploring non-functional properties such as power efficiency. These properties are often hardware-specific, time-intensive to optimize, and may not be amenable to standard dataflow optimizations. We present a general post-compilation approach called Genetic Optimization Algor...