Michael D. Ernst

University of Washington Seattle, Seattle, Washington, United States

Publications (142), 6.09 Total impact

  • Proceedings of the 27th European Conference on Object-Oriented Programming (ECOOP'13); 07/2013
  • Colin S Gordon, Michael D Ernst, Dan Grossman
    Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'13); 06/2013
  • 04/2013;
  • Colin S Gordon, Michael D Ernst, Dan Grossman
    03/2013;
  • Eric Spishak, Werner Dietl, Michael D. Ernst
    ABSTRACT: Regular expressions are used to match and extract text. It is easy for developers to make syntactic mistakes when writing regular expressions, because regular expressions are often complex and different across programming languages. Such errors result in exceptions at run time, and there is currently no static support for preventing them. This paper describes practical experience designing and using a type system for regular expressions. This type system validates regular expression syntax and capturing group usage at compile time instead of at run time---ensuring the absence of PatternSyntaxExceptions from invalid syntax and IndexOutOfBoundsExceptions from accessing invalid capturing groups. Our implementation is publicly available and supports the full Java language. In an evaluation on five open-source Java applications (480kLOC), the type system was easy to use, required less than one annotation per two thousand lines, and found 56 previously-unknown bugs.
    01/2012;
  • ABSTRACT: Program verification is the only way to be certain that a given piece of software is free of (certain types of) errors --- errors that could otherwise disrupt operations in the field. To date, formal verification has been done by specially-trained engineers. Labor costs have heretofore made formal verification too costly to apply beyond small, critical software components. Our goal is to make verification more cost-effective by reducing the skill set required for program verification and increasing the pool of people capable of performing program verification. Our approach is to transform the verification task (a program and a goal property) into a visual puzzle task --- a game --- that gets solved by people. The solution of the puzzle is then translated back into a proof of correctness. The puzzle is engaging and intuitive enough that ordinary people can through game-play become experts. This paper presents a status report on the Verification Games project and our Pipe Jam prototype game.
    01/2012;
  • Jingyue Li, Michael D. Ernst
    ABSTRACT: Developers often copy, or clone, code in order to reuse or modify functionality. When they do so, they also clone any bugs in the original code. Or, different developers may independently make the same mistake. As one example of a bug, multiple products in a product line may use a component in a similar wrong way. This paper makes two contributions. First, it presents an empirical study of cloned buggy code. In a large industrial product line, about 4% of the bugs are duplicated across more than one product or file. In three open source projects (the Linux kernel, the Git version control system, and the PostgreSQL database) we found 282, 33, and 33 duplicated bugs, respectively. Second, this paper presents a tool, CBCD, that searches for code that is semantically identical to given buggy code. CBCD tests graph isomorphism over the Program Dependency Graph (PDG) representation and uses four optimizations. We evaluated CBCD by searching for known clones of buggy code segments in the three projects and compared the results with text-based, token-based, and AST-based code clone detectors, namely Simian, CCFinder, Deckard, and CloneDR. The evaluation shows that CBCD is fast when searching for possible clones of the buggy code in a large system, and it is more precise for this purpose than the other code clone detectors.
    Proceedings - International Conference on Software Engineering 01/2012;
  • Yuriy Brun, Reid Holmes, Michael D. Ernst, David Notkin
    ABSTRACT: The benefits of collaborative development are reduced by the cost of resolving conflicts. We posit that reducing the time between when developers introduce and learn about conflicts reduces this cost. We outline the state-of-the-practice of managing and resolving conflicts and describe how it can be improved by available state-of-the-art tools. Then, we describe our vision for future tools that can predict likely conflicts before they are even created, warning developers and allowing them to avoid potentially costly situations. Author Keywords: collaborative development; collaborative conflicts; conflict prediction; conflict detection
    01/2012;
  • ACM Transactions on Software Engineering and Methodology-TOSEM. 01/2012; 21(4):25-1.
  • ABSTRACT: Modern integrated development environments (IDEs) offer recommendations to aid development, such as auto-completions, refactorings, and fixes for compilation errors. Recommendations for each code location are typically computed independently of the other locations. We propose that an IDE should consider the whole codebase, not just the local context, before offering recommendations for a particular location. We demonstrate the potential benefits of our technique by presenting four concrete scenarios in which the Eclipse IDE fails to provide proper Quick Fixes at relevant locations, even though it offers those fixes at other locations. We describe a technique that can augment an existing IDE's recommendations to account for non-local information. For example, when some compilation errors depend on others, our technique helps the developer decide which errors to resolve first.
    Proceedings - International Conference on Software Engineering 01/2012;
  • ABSTRACT: Many automatic testing, analysis, and verification techniques for programs can effectively be reduced to a constraint-generation phase followed by a constraint-solving phase. This separation of concerns often leads to more effective and maintainable software reliability tools. The increasing efficiency of off-the-shelf constraint solvers makes this approach even more compelling. However, there are few effective and sufficiently expressive off-the-shelf solvers for string constraints generated by analysis of string-manipulating programs, and hence researchers end up implementing their own ad-hoc solvers. Thus, there is a clear need for an effective and expressive string-constraint solver that can be easily integrated into a variety of applications. To fulfill this need, we designed and implemented Hampi, an efficient and easy-to-use string solver. Users of the Hampi string solver specify constraints using membership predicates over regular expressions, context-free grammars, and equality/dis-equality between string terms. These terms are constructed out of string constants, bounded string variables, and typical string operations such as concatenation and substring extraction. Hampi takes such a constraint as input and decides whether it is satisfiable or not. If an input constraint is satisfiable, Hampi generates a satisfying assignment for the string variables that occur in it. We demonstrate Hampi’s expressiveness and efficiency by applying it to program analysis and automated testing: We used Hampi in static and dynamic analyses for finding SQL injection vulnerabilities in Web applications with hundreds of thousands of lines of code. We also used Hampi in the context of automated bug finding in C programs using dynamic systematic testing (also known as concolic testing). Hampi’s source code, documentation, and experimental data are available at http://people.csail.mit.edu/akiezun/hampi .
    08/2011: pages 1-19;
  • Fausto Spoto, Michael D. Ernst
    ABSTRACT: A raw object is partially initialized, with only some of its fields set to legal values. A raw object may violate its object invariants, such as that a given field is non-null. Programs often need to manipulate partially-initialized objects, but they must do so with care. Furthermore, analyses must be aware of rawness. For instance, software verification cannot depend on object invariants for raw objects. We present a static analysis that infers a safe over-approximation of the program variables, fields, or array elements that, at run-time, might hold non-fully initialized objects. Our formalization is flow-sensitive and considers the exception flow in the analyzed programs. We have proved the analysis to be sound. We have also implemented our analysis, in a tool called JULIA that computes both nullness and rawness information. We have evaluated JULIA on over 50K lines of code. We have compared its output to manually-written nullness and rawness information, and to an independently-written type-checking tool that checks nullness and rawness. JULIA's output is accurate and, we believe, useful both to programmers and to static analyses.
    Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu, HI, USA, May 21-28, 2011; 01/2011
  • Sai Zhang, David Saff, Yingyi Bu, Michael D. Ernst
    ABSTRACT: In an object-oriented program, a unit test often consists of a sequence of method calls that create and mutate objects, then use them as arguments to a method under test. It is challenging to automatically generate sequences that are legal and behaviorally-diverse, that is, reaching as many different program states as possible. This paper proposes a combined static and dynamic automated test generation approach to address these problems, for code without a formal specification. Our approach first uses dynamic analysis to infer a call sequence model from a sample execution, then uses static analysis to identify method dependence relations based on the fields they may read or write. Finally, both the dynamically-inferred model (which tends to be accurate but incomplete) and the statically-identified dependence information (which tends to be conservative) guide a random test generator to create legal and behaviorally-diverse tests. Our Palus tool implements this testing approach. We compared its effectiveness with a pure random approach, a dynamic-random approach (without a static phase), and a static-random approach (without a dynamic phase) on several popular open-source Java programs. Tests generated by Palus achieved higher structural coverage and found more bugs. Palus is also internally used in Google. It has found 22 previously-unknown bugs in four well-tested Google products.
    Proceedings of the 20th International Symposium on Software Testing and Analysis, ISSTA 2011, Toronto, ON, Canada, July 17-21, 2011; 01/2011
  • ABSTRACT: Logging is a powerful method for capturing program activity and state during an execution. However, log inspection remains a tedious activity, with developers often piecing together what went on from multiple log lines and across many files. This paper describes Synoptic, a tool that takes logs as input and outputs a finite state machine that models the process generating the logs. The paper overviews the model inference algorithms. Then, it describes the Synoptic tool, which is designed to support a rich log exploration workflow.
    SIGSOFT/FSE'11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC'11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011; 01/2011
  • Michael Bayne, Richard Cook, Michael D. Ernst
    ABSTRACT: Developers who write code in a statically typed language are denied the ability to obtain dynamic feedback by executing their code during periods when it fails the static type checker. They are further confined to the static typing discipline during times in the development process where it does not yield the highest productivity. If they opt instead to use a dynamic language, they forgo the many benefits of static typing, including machine-checked documentation, improved correctness and reliability, tool support (such as for refactoring), and better runtime performance. We present a novel approach to giving developers the benefits of both static and dynamic typing, throughout the development process, and without the burden of manually separating their program into statically- and dynamically-typed parts. Our approach, which is intended for temporary use during the development process, relaxes the static type system and provides a semantics for many type-incorrect programs. It defers type errors to run time, or suppresses them if they do not affect runtime semantics. We implemented our approach in a publicly available tool, DuctileJ, for the Java language. In case studies, DuctileJ conferred benefits both during prototyping and during the evolution of existing code.
    Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu, HI, USA, May 21-28, 2011; 01/2011
  • Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings; 01/2011
  • ABSTRACT: Computer systems are often difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process and documentation is often incomplete and out of sync with the implementation. This paper presents Synoptic, a tool that helps developers by inferring a concise and accurate system model. Unlike most related work, Synoptic does not require developer-written scenarios, specifications, negative execution examples, or other complex user input. Synoptic processes the logs most systems already produce and requires developers only to specify a set of regular expressions for parsing the logs. Synoptic has two unique features. First, the model it produces satisfies three kinds of temporal invariants mined from the logs, improving accuracy over related approaches. Second, Synoptic uses refinement and coarsening to explore the space of models. This improves model efficiency and precision, compared to using just one approach. In this paper, we formally prove that Synoptic always produces a model that satisfies exactly the temporal invariants mined from the log, and we argue that it does so efficiently. We empirically evaluate Synoptic through two user experience studies, one with a developer of a large, real-world system and another with 45 students in a distributed systems course. Developers used Synoptic-generated models to verify known bugs, diagnose new bugs, and increase their confidence in the correctness of their systems. None of the developers in our evaluation had a background in formal methods but were able to easily use Synoptic and detect implementation bugs in as little as a few minutes.
    SIGSOFT/FSE'11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC'11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011; 01/2011
  • Yuriy Brun, Reid Holmes, Michael D. Ernst, David Notkin
    ABSTRACT: During collaborative development, individual developers can create conflicts in their copies of the code. Such conflicting edits are frequent in practice, and resolving them can be costly. We present Crystal, a tool that proactively examines developers' code and precisely identifies and reports on textual, compilation, and behavioral conflicts. When conflicts are present, Crystal enables developers to resolve them more quickly, and therefore at a lesser cost. When conflicts are absent, Crystal increases the developers' confidence that it is safe to merge their code. Crystal uses an unobtrusive interface to deliver pertinent information about conflicts. It informs developers about actions that would address the conflicts and about people with whom they should communicate.
    SIGSOFT/FSE'11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC'11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011; 01/2011
  • Yuriy Brun, Reid Holmes, Michael D. Ernst, David Notkin
    ABSTRACT: Collaborative development can be hampered when conflicts arise because developers have inconsistent copies of a shared project. We present an approach to help developers identify and resolve conflicts early, before those conflicts become severe and before relevant changes fade away in the developers' memories. This paper presents three results. First, a study of open-source systems establishes that conflicts are frequent, persistent, and appear not only as overlapping textual edits but also as subsequent build and test failures. The study spans nine open-source systems totaling 3.4 million lines of code; our conflict data is derived from 550,000 development versions of the systems. Second, using previously-unexploited information, we precisely diagnose important classes of conflicts using the novel technique of speculative analysis over version control operations. Third, we describe the design of Crystal, a publicly-available tool that uses speculative analysis to make concrete advice unobtrusively available to developers, helping them identify, manage, and prevent conflicts.
    SIGSOFT/FSE'11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC'11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011; 01/2011
  • Werner Dietl, Michael D. Ernst, Peter Müller
    ABSTRACT: Object ownership is useful for many applications, including program verification, thread synchronization, and memory management. However, the annotation overhead of ownership type systems hampers their widespread application. This paper addresses this issue by presenting a tunable static type inference for Generic Universe Types. In contrast to classical type systems, ownership types have no single most general typing. Our inference chooses among the legal typings via heuristics. Our inference is tunable: users can indicate a preference for certain typings by adjusting the heuristics or by supplying partial annotations for the program. We present how the constraints of Generic Universe Types can be encoded as a boolean satisfiability (SAT) problem and how a weighted Max-SAT solver finds a correct Universe typing that optimizes the weights. We implemented the static inference tool, applied our inference tool to four real-world applications, and inferred interesting ownership structures.
    ECOOP 2011 - Object-Oriented Programming - 25th European Conference, Lancaster, UK, July 25-29, 2011 Proceedings; 01/2011

Publication Stats

3k Citations
6.09 Total Impact Points

Institutions

  • 1999–2011
    • University of Washington Seattle
      • Department of Computer Science and Engineering
      Seattle, Washington, United States
  • 2003–2009
    • Massachusetts Institute of Technology
      • Computer Science and Artificial Intelligence Laboratory
      Cambridge, MA, United States
    • Distributed Artificial Intelligence Laboratory
      Berlin, Germany