Michael D. Ernst

University of Washington Seattle, Seattle, Washington, United States

Publications (169) · 18.17 Total Impact Points

  • ABSTRACT: It is a staple development practice to log system behavior. Numerous powerful model-inference algorithms have been proposed to aid developers in log analysis and system understanding. Unfortunately, existing algorithms are typically declared procedurally, making them difficult to understand, extend, and compare. This paper presents InvariMint, an approach to specify model-inference algorithms declaratively. We applied the InvariMint declarative approach to two model-inference algorithms. The evaluation results illustrate that InvariMint (1) leads to new fundamental insights and better understanding of existing algorithms, (2) simplifies creation of new algorithms, including hybrids that combine or extend existing algorithms, and (3) makes it easy to compare and contrast previously published algorithms. InvariMint’s declarative approach can outperform procedural implementations. For example, on a log of 50,000 events, InvariMint’s declarative implementation of the kTails algorithm completes in 12 seconds, while a procedural implementation completes in 18 minutes. We also found that InvariMint’s declarative version of the Synoptic algorithm can be over 170 times faster than the procedural implementation.
    IEEE Transactions on Software Engineering 04/2015; 41(4):408-428. DOI:10.1109/TSE.2014.2369047 · 2.29 Impact Factor
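    A minimal sketch of the declarative style (illustrative only; the event alphabet and properties below are hypothetical, not InvariMint's API): each mined temporal property is expressed as a finite automaton, and the inferred model is the intersection of the property automata. Using the dk.brics.automaton library:

      import dk.brics.automaton.Automaton;
      import dk.brics.automaton.RegExp;

      public class DeclarativeModelSketch {
          public static void main(String[] args) {
              // Hypothetical event alphabet: 'o' = open, 'w' = write, 'c' = close.
              // Property 1: every trace begins with an open.
              Automaton startsWithOpen = new RegExp("o(o|w|c)*").toAutomaton();
              // Property 2: no write occurs before the first open.
              Automaton openBeforeWrite = new RegExp("[^w]*|[^w]*o(o|w|c)*").toAutomaton();
              // The inferred model is simply the intersection of all property automata.
              Automaton model = startsWithOpen.intersection(openBeforeWrite);
              model.minimize();
              System.out.println(model.run("owc"));  // true: accepted by the model
              System.out.println(model.run("wc"));   // false: write before any open
          }
      }

    Because the model is an ordinary automaton, combining or comparing algorithms reduces to standard language operations (intersection, union, containment).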
  • IEEE Transactions on Software Engineering 01/2015; DOI:10.1109/TSE.2015.2417161 · 2.29 Impact Factor
  • ABSTRACT: A good test suite is one that detects real faults. Because the set of faults in a program is usually unknowable, this definition is not useful to practitioners who are creating test suites, nor to researchers who are creating and evaluating tools that generate test suites. In place of real faults, testing research often uses mutants, which are artificial faults — each one a simple syntactic variation — that are systematically seeded throughout the program under test. Mutation analysis is appealing because large numbers of mutants can be automatically generated and used to compensate for low quantities or the absence of known real faults. Unfortunately, there is little experimental evidence to support the use of mutants as a replacement for real faults. This paper investigates whether mutants are indeed a valid substitute for real faults, i.e., whether a test suite's ability to detect mutants is correlated with its ability to detect real faults that developers have fixed. Unlike prior studies, these investigations also explicitly consider the conflating effects of code coverage on the mutant detection rate. Our experiments used 357 real faults in 5 open-source applications that comprise a total of 321,000 lines of code. Furthermore, our experiments used both developer-written and automatically generated test suites. The results show a statistically significant correlation between mutant detection and real fault detection, independently of code coverage. The results also give concrete suggestions on how to improve mutation analysis and reveal some inherent limitations.
    Symposium on the Foundations of Software Engineering (FSE); 11/2014
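    To make the mutant concept concrete, here is a hypothetical example (not from the paper's subject programs) of a single-token mutant and a test that kills it:

      import static org.junit.Assert.assertEquals;
      import org.junit.Test;

      public class MutantSketch {
          // Original method under test.
          static int max(int a, int b) { return a > b ? a : b; }

          // One mutant: relational operator replacement (> becomes <).
          // Mutation tools seed thousands of such single-token changes automatically.
          static int maxMutant(int a, int b) { return a < b ? a : b; }

          @Test
          public void killsTheMutant() {
              // Passes on the original (max(3, 5) == 5) but would fail if the
              // mutant's body replaced the original's, so the mutant is "detected".
              assertEquals(5, max(3, 5));
          }
      }

    The paper's question is whether suites that are good at detecting such seeded changes are also good at detecting the 357 real, developer-fixed faults.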
  • ABSTRACT: Current app stores distribute some malware to unsuspecting users, even though the app approval process may be costly and time-consuming. High-integrity app stores must provide stronger guarantees that their apps are not malicious. We propose a verification model for use in such app stores to guarantee that the apps are free of malicious information flows. In our model, the software vendor and the app store auditor collaborate — each does tasks that are easy for her/him, reducing overall verification cost. The software vendor provides a behavioral specification of information flow (at a finer granularity than used by current app stores) and source code annotated with information-flow type qualifiers. A flow-sensitive, context-sensitive information-flow type system checks the information-flow type qualifiers in the source code and proves that only information flows in the specification can occur at run time. The app store auditor uses the vendor-provided source code to manually verify declassifications. We have implemented the information-flow type system for Android apps written in Java, and we evaluated both its effectiveness at detecting information-flow violations and its usability in practice. In an adversarial Red Team evaluation, we analyzed 72 apps (576,000 LOC) for malware. The 57 Trojans among these had been written specifically to defeat a malware analysis such as ours. Nonetheless, our information-flow type system was effective: it detected 96% of malware whose malicious behavior was related to information flow and 82% of all malware. In addition to the adversarial evaluation, we evaluated the practicality of using the collaborative model. The programmer annotation burden is low: 6 annotations per 100 LOC. Every sound analysis requires a human to review potential false alarms, and in our experiments, this took 30 minutes per 1,000 LOC for an auditor unfamiliar with the app.
    Conference on Computer and Communications Security (CCS); 11/2014
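    The sketch below gives a flavor of vendor-written information-flow qualifiers; the annotation names and flow labels here are hypothetical stand-ins, not the paper's exact API:

      import java.lang.annotation.ElementType;
      import java.lang.annotation.Target;

      @Target(ElementType.TYPE_USE) @interface Source { String value(); }
      @Target(ElementType.TYPE_USE) @interface Sink { String value(); }

      public class FlowSketch {
          // Vendor annotation: this value originates from the GPS sensor.
          static @Source("LOCATION") String getLocation() { return "47.6,-122.3"; }

          // Vendor annotation: the argument flows to the network.
          static void post(@Sink("INTERNET") String data) { /* send over HTTP */ }

          public static void main(String[] args) {
              @Source("LOCATION") String loc = getLocation();
              // A flow-sensitive checker rejects this call unless the flow
              // LOCATION -> INTERNET appears in the app's behavioral specification.
              post(loc);
          }
      }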
  • René Just, Michael D. Ernst, Gordon Fraser
    ABSTRACT: Mutation analysis evaluates a testing technique by measuring how well it detects seeded faults (mutants). Mutation analysis is hampered by inherent scalability problems — a test suite is executed for each of a large number of mutants. Despite numerous optimizations presented in the literature, this scalability issue remains, and this is one of the reasons why mutation analysis is hardly used in practice. Whereas most previous optimizations attempted to statically reduce the number of executions or their computational overhead, this paper exploits information available only at run time to further reduce the number of executions. First, state infection conditions can reveal — with a single test execution of the unmutated program — which mutants would lead to a different state, thus avoiding unnecessary test executions. Second, determining whether an infected execution state propagates can further reduce the number of executions. Mutants that are embedded in compound expressions may infect the state locally without affecting the outcome of the compound expression. Third, those mutants that do infect the state can be partitioned based on the resulting infected state — if two mutants lead to the same infected state, only one needs to be executed as the result of the other can be inferred. We have implemented these optimizations in the Major mutation framework and empirically evaluated them on 14 open source programs. The optimizations reduced the mutation analysis time by 40% on average.
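    A simplified sketch (hypothetical code, condensed from the paper's first optimization) of checking state infection during a single run of the unmutated program:

      import java.util.HashSet;
      import java.util.Set;

      public class InfectionSketch {
          // Ids of mutants whose mutated expression produced a different value.
          static final Set<Integer> infected = new HashSet<>();

          // Original expression: a + b. Each mutant is a one-operator variant,
          // evaluated side by side without changing the program's behavior.
          static int add(int a, int b) {
              int original = a + b;
              if (a - b != original) infected.add(1);  // mutant 1: + becomes -
              if (a * b != original) infected.add(2);  // mutant 2: + becomes *
              return original;
          }

          public static void main(String[] args) {
              add(2, 0);  // one test-driven execution of the unmutated program
              // Mutant 2 infects the state (2 * 0 != 2); mutant 1 does not
              // (2 - 0 == 2), so its test execution can be skipped entirely.
              System.out.println("Mutants still requiring execution: " + infected);
          }
      }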
  • ABSTRACT: Most programming languages support format strings, but their use is error-prone. Using the wrong format string syntax, or passing the wrong number or type of arguments, leads to unintelligible text output, program crashes, or security vulnerabilities. This paper presents a type system that guarantees that calls to format string APIs will never fail. In Java, this means that the API will not throw exceptions. In C, this means that the API will not return negative values, corrupt memory, etc. We instantiated this type system for Java’s Formatter API, and evaluated it on 6 large and well-maintained open-source projects. Format string bugs are common in practice (our type system found 104 bugs), and the annotation burden on the user of our type system is low (on average, for every bug found, only 1.0 annotations need to be written).
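    A small example in the style of the Checker Framework's Format String Checker (the Java instantiation of this type system; annotation placement is illustrative) showing the kind of bug it rules out:

      import org.checkerframework.checker.formatter.qual.ConversionCategory;
      import org.checkerframework.checker.formatter.qual.Format;

      public class FormatSketch {
          // The qualifier records that this format string consumes one integer.
          static final @Format({ConversionCategory.INT}) String FMT = "count = %d";

          public static void main(String[] args) {
              System.out.printf(FMT, 42);      // OK: %d is given an int
              // System.out.printf(FMT, "hi"); // compile-time error from the checker;
              // unchecked, this line throws IllegalFormatConversionException at run time.
          }
      }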
  • ABSTRACT: In a test suite, all the test cases should be independent: no test should affect any other test’s result, and running the tests in any order should produce the same test results. Techniques such as test prioritization generally assume that the tests in a suite are independent. Test dependence is a little-studied phenomenon. This paper presents five results related to test dependence. First, we characterize the test dependence that arises in practice. We studied 96 real-world dependent tests from 5 issue tracking systems. Our study shows that test dependence can be hard for programmers to identify. It also shows that test dependence can cause non-trivial consequences, such as masking program faults and leading to spurious bug reports. Second, we formally define test dependence in terms of test suites as ordered sequences of tests along with explicit environments in which these tests are executed. We formulate the problem of detecting dependent tests and prove that a useful special case is NP-complete. Third, guided by the study of real-world dependent tests, we propose and compare four algorithms to detect dependent tests in a test suite. Fourth, we applied our dependent test detection algorithms to 4 real-world programs and found dependent tests in each human-written and automatically-generated test suite. Fifth, we empirically assessed the impact of dependent tests on five test prioritization techniques. Dependent tests affect the output of all five techniques; that is, the reordered suite fails even though the original suite did not.
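    A minimal hypothetical example of the phenomenon: two tests coupled through shared static state, so their outcomes depend on execution order.

      import static org.junit.Assert.assertEquals;
      import static org.junit.Assert.assertTrue;

      import java.util.ArrayList;
      import java.util.List;

      import org.junit.Test;

      public class DependentTests {
          static List<String> cache = new ArrayList<>();  // shared mutable state

          @Test public void testA() {
              assertTrue(cache.isEmpty());  // passes only if testB has not yet run
          }

          @Test public void testB() {
              cache.add("warmed");          // mutates state that testA reads
              assertEquals(1, cache.size());
          }
      }

    Running testA before testB succeeds; a prioritization technique that reorders testB first makes testA fail, exactly the failure mode the paper's fifth result measures.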
  • ABSTRACT: Java supports format strings, but their use is error-prone: Java’s type system finds only the most trivial mistakes, Java’s format methods fail silently, and format methods are often executed infrequently. This paper presents the Format String Checker, which is based on the format string type system presented in [3]. The Format String Checker guarantees that calls to Java’s Formatter API will not throw exceptions. We evaluate the Format String Checker on 6 large and well-maintained open-source projects. Format string bugs are common in practice (we found 104 bugs), and the annotation burden on the user of our type system is low (on average, for every bug found, only 1.0 annotations need to be written).
  • René Just, Darioush Jalali, Michael D. Ernst
  • ABSTRACT: Compared to desktop and web applications, mobile applications have so far been developed in an extremely siloed environment. The apps running on our phones are developed by single entities, with operating system protections preventing sharing of data or code between programs. However, application extensibility is often desired. On the web, a secondary ecosystem flourishes around browser extensions, enabling users to customize the web as they wish. This paper presents BladeDroid, a system enabling user customization of mobile applications, using a novel combination of bytecode rewriting and dynamic class loading. We describe four extensions that we have built to evaluate BladeDroid's usability, robustness, and performance.
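    BladeDroid's own machinery is not reproduced here, but the dynamic-class-loading half can be sketched with standard Android APIs (paths, class names, and the hook mechanism below are hypothetical):

      import dalvik.system.DexClassLoader;

      public final class ExtensionLoader {
          /** Loads an extension class packaged in a separate APK or DEX file. */
          static Object loadExtension(android.content.Context ctx, String apkPath,
                                      String className) throws Exception {
              DexClassLoader loader = new DexClassLoader(
                      apkPath,                                  // extension code to load
                      ctx.getCodeCacheDir().getAbsolutePath(),  // cache for optimized dex
                      null,                                     // no native libraries
                      ctx.getClassLoader());                    // delegate to the host app
              // Call sites injected by bytecode rewriting would invoke the extension
              // through a shared interface on the returned instance.
              return loader.loadClass(className).getDeclaredConstructor().newInstance();
          }
      }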
  • ABSTRACT: In a distributed system, the hosts execute concurrently, generating asynchronous logs that are challenging to comprehend. We present two tools: ShiVector to transparently add vector timestamps to distributed system logs, and ShiViz to help developers understand distributed system logs by visualizing them as space-time diagrams. ShiVector is the first tool to offer automated vector timestamp instrumentation without modifying source code. The vector-timestamped logs capture partial ordering information, useful for analysis and comprehension. ShiViz space-time diagrams are simple to understand and interactive — the user can explore the log through the visualization to understand complex system behavior. We applied ShiVector and ShiViz to two systems and found that they aid developers in understanding and debugging.
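    The core of ShiVector's timestamps is the classic vector clock; a minimal illustrative implementation (not ShiVector's code):

      import java.util.HashMap;
      import java.util.Map;

      public class VectorClock {
          final Map<String, Integer> clock = new HashMap<>();
          final String host;

          VectorClock(String host) { this.host = host; }

          /** Before logging a local event: advance this host's own component. */
          void tick() { clock.merge(host, 1, Integer::sum); }

          /** On message receipt: take the component-wise maximum, then tick. */
          void onReceive(Map<String, Integer> sendersClock) {
              sendersClock.forEach((h, t) -> clock.merge(h, t, Integer::max));
              tick();
          }

          public static void main(String[] args) {
              VectorClock a = new VectorClock("host-a");
              a.tick();                 // local event at host-a: {host-a=1}
              VectorClock b = new VectorClock("host-b");
              b.onReceive(a.clock);     // receipt at host-b: {host-a=1, host-b=1}
              System.out.println(b.clock);
          }
      }

    Two logged events are causally ordered exactly when one timestamp is component-wise less than or equal to the other; incomparable timestamps are concurrent, which is the partial-ordering information ShiViz visualizes.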
  • ABSTRACT: Concurrent systems are notoriously difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process, and documentation is often incomplete and out of sync with the implementation. To provide developers with more insight into concurrent systems, we developed CSight. CSight mines logs of a system's executions to infer a concise and accurate model of that system's behavior, in the form of a communicating finite state machine (CFSM). Engineers can use the inferred CFSM model to understand complex behavior, detect anomalies, debug, and increase confidence in the correctness of their implementations. CSight's only requirement is that the logged events have vector timestamps. We provide a tool that automatically adds vector timestamps to system logs. Our tool prototypes are available at http://synoptic.googlecode.com/. This paper presents algorithms for inferring CFSM models from traces of concurrent systems, proves them correct, provides an implementation, and evaluates the implementation in two ways: by running it on logs from three different networked systems and via a user study that focused on bug finding. Our evaluation finds that CSight infers accurate models that can help developers find bugs.
  • ABSTRACT: Contracts are a popular tool for specifying the functional behavior of software. This paper characterizes the contracts that developers write, the contracts that developers could write, and how a developer reacts when shown the difference. This paper makes three research contributions based on an investigation of open-source projects' use of Code Contracts. First, we characterize Code Contract usage in practice. For example, approximately three-fourths of the Code Contracts are basic checks for the presence of data. We discuss similarities and differences in usage across the projects, and we identify annotation burden, tool support, and training as possible explanations based on developer interviews. Second, based on contracts automatically inferred for four of the projects, we find that developers underutilize contracts for expressing state updates, object state indicators, and conditional properties. Third, we performed user studies to learn how developers decide which contracts to enforce. The developers used contract suggestions to support their existing use cases with more expressive contracts. However, the suggestions did not lead them to experiment with other use cases for which contracts are better-suited. In support of the research contributions, the paper presents two engineering contributions: (1) Celeriac, a tool for generating traces of .NET programs compatible with the Daikon invariant detection tool, and (2) Contract Inserter, a Visual Studio add-in for discovering and inserting likely invariants as Code Contracts.
  • Sai Zhang, Michael D. Ernst
    ABSTRACT: Modern software often exposes configuration options that enable users to customize its behavior. During software evolution, developers may change how the configuration options behave. When upgrading to a new software version, users may need to re-configure the software by changing the values of certain configuration options. This paper addresses the following question during the evolution of a configurable software system: which configuration options should a user change to maintain the software's desired behavior? This paper presents a technique (and its tool implementation, called ConfSuggester) to troubleshoot configuration errors caused by software evolution. ConfSuggester uses dynamic profiling, execution trace comparison, and static analysis to link the undesired behavior to its root cause — a configuration option whose value can be changed to produce desired behavior from the new software version. We evaluated ConfSuggester on 8 configuration errors from 6 configurable software systems written in Java. For 6 errors, the root-cause configuration option was ConfSuggester's first suggestion. For 1 error, the root cause was ConfSuggester's third suggestion. The root cause of the remaining error was ConfSuggester's sixth suggestion. Overall, ConfSuggester produced significantly better results than two existing techniques. ConfSuggester runs in just a few minutes, making it an attractive alternative to manual debugging.
  • ABSTRACT: Students and faculty alike at all education levels are clearly spending much more of their time interacting with computing and communication tools than with each other. Is this good? Are all uses of computational technology in education helpful, and ...
    Proceedings of the 45th ACM technical symposium on Computer science education; 03/2014
  • Brian Burg, Richard Bailey, Andrew J. Ko, Michael D. Ernst
    ABSTRACT: During debugging, a developer must repeatedly and manually reproduce faulty behavior in order to inspect different facets of the program's execution. Existing tools for reproducing such behaviors prevent the use of debugging aids such as breakpoints and logging, and are not designed for interactive, random-access exploration of recorded behavior. This paper presents Timelapse, a tool for quickly recording, reproducing, and debugging interactive behaviors in web applications. Developers can use Timelapse to browse, visualize, and seek within recorded program executions while simultaneously using familiar debugging tools such as breakpoints and logging. Testers and end-users can use Timelapse to demonstrate failures in situ and share recorded behaviors with developers, improving bug report quality by obviating the need for detailed reproduction steps. Timelapse is built on Dolos, a novel record/replay infrastructure that ensures deterministic execution by capturing and reusing program inputs both from the user and from external sources such as the network. Dolos introduces negligible overhead and does not interfere with breakpoints and logging. In a small user evaluation, participants used Timelapse to accelerate existing reproduction activities, but were not significantly faster or more successful in completing the larger tasks at hand. Together, the Dolos infrastructure and Timelapse developer tool support systematic bug reporting and debugging practices.
    Proceedings of the 26th annual ACM symposium on User interface software and technology; 10/2013
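    The record/replay idea behind Dolos can be illustrated with a toy interposition point (a sketch of the general technique, not Dolos's implementation):

      import java.util.ArrayDeque;
      import java.util.Deque;

      public class RecordReplay {
          enum Mode { RECORD, REPLAY }
          static Mode mode = Mode.RECORD;
          static final Deque<Double> log = new ArrayDeque<>();

          // All nondeterminism is routed through interposition points like this one.
          static double random() {
              if (mode == Mode.REPLAY) return log.removeFirst(); // reuse recorded input
              double v = Math.random();
              log.addLast(v);                                    // capture for replay
              return v;
          }

          public static void main(String[] args) {
              double first = random();                // recorded run
              mode = Mode.REPLAY;
              System.out.println(random() == first);  // true: replay is deterministic
          }
      }

    Because replayed runs see identical inputs, breakpoints and logging observe the same execution every time, which is what lets a developer seek back and forth within a recording.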
  • ABSTRACT: Developers use analysis tools to help write, debug, and understand software systems under development. A developer's change to the system source code may affect analysis results. Typically, to learn those effects, the developer must explicitly initiate the analysis. This may interrupt the developer's workflow and/or delay the developer's discovery of the change's implications. The situation is even worse for impure analyses — ones that modify the code on which they run — because such analyses block the developer from working on the code. This paper presents Codebase Replication, a novel approach to easily convert an offline analysis — even an impure one — into a continuous analysis that informs the developer of the implications of recent changes as quickly as possible after the change is made. Codebase Replication copies the developer's codebase, incrementally keeps this copy in sync with the developer's codebase, makes the copy available for offline analyses to run without disturbing the developer and without the developer's changes disturbing the analyses, and makes analysis results available to be presented to the developer. We have implemented Codebase Replication in Solstice, an open-source, publicly-available Eclipse plug-in. We have used Solstice to convert three offline analyses — FindBugs, PMD, and unit testing — into continuous ones. Each conversion required on average 436 NCSL and took, on average, 18 hours. Solstice-based analyses experience no more than 2.5 milliseconds of runtime overhead per developer action.
    Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering; 08/2013
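    A stripped-down sketch of the copy-and-sync idea (not Solstice's Eclipse-based implementation; it watches a single hypothetical directory, where a full version would register subdirectories recursively):

      import java.io.IOException;
      import java.nio.file.*;

      public class ShadowSync {
          public static void main(String[] args) throws IOException, InterruptedException {
              Path src = Paths.get("workspace/project");
              Path copy = Paths.get("workspace/.shadow");
              WatchService ws = FileSystems.getDefault().newWatchService();
              src.register(ws, StandardWatchEventKinds.ENTRY_CREATE,
                               StandardWatchEventKinds.ENTRY_MODIFY,
                               StandardWatchEventKinds.ENTRY_DELETE);
              while (true) {
                  WatchKey key = ws.take();  // block until the developer's codebase changes
                  for (WatchEvent<?> ev : key.pollEvents()) {
                      Path rel = (Path) ev.context();
                      Path mirror = copy.resolve(rel);
                      if (ev.kind() == StandardWatchEventKinds.ENTRY_DELETE) {
                          Files.deleteIfExists(mirror);
                      } else {
                          Files.createDirectories(mirror.getParent());
                          Files.copy(src.resolve(rel), mirror,
                                     StandardCopyOption.REPLACE_EXISTING);
                      }
                  }
                  key.reset();
                  // An offline analysis (even an impure one) can now run on `copy`
                  // without disturbing the developer's working tree.
              }
          }
      }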
  • Sai Zhang, Hao Lü, Michael D. Ernst
    ABSTRACT: A workflow is a sequence of UI actions to complete a specific task. In the course of a GUI application's evolution, changes ranging from a simple GUI refactoring to a complete rearchitecture can break an end-user's well-established workflow. It can be challenging to find a replacement workflow. To address this problem, we present a technique (and its tool implementation, called FlowFixer) that repairs a broken workflow. FlowFixer uses dynamic profiling, static analysis, and random testing to suggest a replacement UI action that fixes a broken workflow. We evaluated FlowFixer on 16 broken workflows from 5 real-world GUI applications written in Java. In 13 workflows, the correct replacement action was FlowFixer's first suggestion. In 2 workflows, the correct replacement action was FlowFixer's second suggestion. The remaining workflow was unrepairable. Overall, FlowFixer produced significantly better results than two alternative approaches.
    Proceedings of the 2013 International Symposium on Software Testing and Analysis; 07/2013
  • ABSTRACT: Most graphical user interface (GUI) libraries forbid accessing UI elements from threads other than the UI event loop thread. Violating this requirement leads to a program crash or an inconsistent UI. Unfortunately, such errors are all too common in GUI programs. We present a polymorphic type and effect system that prevents non-UI threads from accessing UI objects or invoking UI-thread-only methods. The type system still permits non-UI threads to hold and pass references to UI objects. We implemented this type system for Java and annotated 8 Java programs (over 140KLOC) for the type system, including several of the most popular Eclipse plugins. We confirmed bugs found by unsound prior work, found an additional bug and code smells, and demonstrated that the annotation burden is low. We also describe code patterns our effect system handles less gracefully or not at all, which we believe offers lessons for those applying other effect systems to existing code.
    Proceedings of the 27th European Conference on Object-Oriented Programming (ECOOP'13); 07/2013
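    A small example in the style of the Checker Framework's GUI Effect Checker, which implements this kind of effect system (annotation placement is illustrative):

      import org.checkerframework.checker.guieffect.qual.SafeEffect;
      import org.checkerframework.checker.guieffect.qual.UIEffect;

      public class StatusView {
          @UIEffect   // may touch UI elements, so callable only on the UI thread
          void updateLabel(String text) { /* label.setText(text) */ }

          @SafeEffect // callable from any thread; must not touch the UI
          void backgroundWork() {
              // updateLabel("done");  // compile-time error: a @SafeEffect method
              //                       // may not invoke a @UIEffect method directly.
              // The fix is to marshal to the UI thread (e.g., SWT's Display.asyncExec),
              // while references to UI objects can still be held and passed freely.
          }
      }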
  • Colin S. Gordon, Michael D. Ernst, Dan Grossman
    ABSTRACT: Reasoning about side effects and aliasing is the heart of verifying imperative programs. Unrestricted side effects through one reference can invalidate assumptions about an alias. We present a new type system approach to reasoning about safe assumptions in the presence of aliasing and side effects, unifying ideas from reference immutability type systems and rely-guarantee program logics. Our approach, rely-guarantee references, treats multiple references to shared objects similarly to multiple threads in rely-guarantee program logics. We propose statically associating rely and guarantee conditions with individual references to shared objects. Multiple aliases to a given object may coexist only if the guarantee condition of each alias implies the rely condition for all other aliases. We demonstrate that existing reference immutability type systems are special cases of rely-guarantee references. In addition to allowing precise control over state modification, rely-guarantee references allow types to depend on mutable data while still permitting flexible aliasing. Dependent types whose denotation is stable over the actions of the rely and guarantee conditions for a reference and its data will not be invalidated by any action through any alias. We demonstrate this with refinement (subset) types that may depend on mutable data. As a special case, we derive the first reference immutability type system with dependent types over immutable data. We show soundness for our approach and describe experience using rely-guarantee references in a dependently-typed monadic DSL in Coq.
    Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'13); 06/2013
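    The abstract's coexistence requirement can be stated compactly; for aliases with rely conditions R_i and guarantee conditions G_i:

      \forall i \neq j.\ G_i \Rightarrow R_j

    That is, whatever one alias may do (its guarantee) must be an action every other alias has agreed to tolerate (its rely).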

Publication Stats

5k Citations
18.17 Total Impact Points

Institutions

  • 1997–2014
    • University of Washington Seattle
      • Department of Computer Science and Engineering
      Seattle, Washington, United States
  • 2003–2009
    • Massachusetts Institute of Technology
      • Computer Science and Artificial Intelligence Laboratory
      Cambridge, MA, United States
    • Distributed Artificial Intelligence Laboratory
      Berlin, Berlin, Germany