[Show abstract][Hide abstract] ABSTRACT: Implicit or indirect control flow is a transfer of control between procedures using some mechanism other than an explicit procedure call. Implicit control flow is a staple design pattern that adds flexibility to system design. However, it is challenging for a static analysis to compute or verify properties about a system that uses implicit control flow. This paper presents static analyses for two types of implicit control flow that frequently appear in Android apps: Java reflection and Android intents. Our analyses help to resolve where control flows and what data is passed. This information improves the precision of downstream analyses, which no longer need to make conservative assumptions about implicit control flow. We have implemented our techniques for Java. We enhanced an existing security analysis with a more precise treatment of reflection and intents. In a case study involving ten real-world Android apps that use both intents and reflection, the precision of the security analysis was increased on average by two orders of magnitude. The precision of two other downstream analyses was also improved.
International Conference on Automated Software Engineering (ASE); 11/2015
[Show abstract][Hide abstract] ABSTRACT: During software development, the sooner a developer learns how code changes affect program analysis results, the more helpful that analysis is. Manually invoking an analysis may interrupt the developer’s workflow or cause a delay before the developer learns the implications of the change. A better approach is continuous analysis tools that always provide up-to-date results. We present Codebase Replication, a technique that eases the implementation of continuous analysis tools by converting an existing offline analysis into an IDE-integrated, continuous tool with two desirable properties: isolation and currency. Codebase Replication creates and keeps in sync a copy of the developer’s codebase. The analysis runs on the copy codebase without disturbing the developer and without being disturbed by the developer’s changes. We developed Solstice, an open-source, publicly-available Eclipse plug-in that implements Codebase Replication. Solstice has less than 2.5 milliseconds overhead for most common developer actions. We used Solstice to implement four Eclipse-integrated continuous analysis tools based on the offline versions of FindBugs, PMD, data race detection, and unit testing. Each conversion required on average 710 LoC and 20 hours of implementation effort. Case studies indicate that Solstice-based continuous analysis tools are intuitive and easy-to-use.
[Show abstract][Hide abstract] ABSTRACT: It is a staple development practice to log system behavior. Numerous powerful model-inference algorithms have been proposed to aid developers in log analysis and system understanding. Unfortunately, existing algorithms are typically declared procedurally, making them difficult to understand, extend, and compare. This paper presents InvariMint, an approach to specify model-inference algorithms declaratively. We applied the InvariMint declarative approach to two model-inference algorithms. The evaluation results illustrate that InvariMint (1) leads to new fundamental insights and better understanding of existing algorithms, (2) simplifies creation of new algorithms, including hybrids that combine or extend existing algorithms, and (3) makes it easy to compare and contrast previously published algorithms. InvariMint’s declarative approach can outperform procedural implementations. For example, on a log of 50,000 events, InvariMint’s declarative implementation of the kTails algorithm completes in 12 seconds, while a procedural implementation completes in 18 minutes. We also found that InvariMint’s declarative version of the Synoptic algorithm can be over 170 times faster than the procedural implementation.
[Show abstract][Hide abstract] ABSTRACT: This paper reports on our experience teaching introductory programming by means of real-world data analysis. We have found that students can be motivated to learn programming and computer science concepts in order to analyze DNA, predict the outcome of elections, detect fraudulent data, suggest friends in a social network, determine the authorship of documents, and more. The approach is more than just a collection of "nifty assignments"; rather, it affects the choice of topics and pedagogy. This paper describes how our approach has been used at four diverse colleges and universities to teach CS majors and nonmajors alike. It outlines the types of assignments, which are based on problems from science, engineering, business, and the humanities. Finally, it offers advice for anyone trying to integrate the approach into their own institution.
[Show abstract][Hide abstract] ABSTRACT: A good test suite is one that detects real faults. Because the set of faults in a program is usually unknowable, this definition is not useful to practitioners who are creating test suites, nor to researchers who are creating and evaluating tools that generate test suites. In place of real faults, testing research often uses mutants, which are artificial faults — each one a simple syntactic variation — that are systematically seeded throughout the program under test. Mutation analysis is appealing because large numbers of mutants can be automatically-generated and used to compensate for low quantities or the absence of known real faults. Unfortunately, there is little experimental evidence to support the use of mutants as a replacement for real faults. This paper investigates whether mutants are indeed a valid substitute for real faults, i.e., whether a test suite's ability to detect mutants is correlated with its ability to detect real faults that developers have fixed. Unlike prior studies, these investigations also explicitly consider the conflating effects of code coverage on the mutant detection rate. Our experiments used 357 real faults in 5 open-source applications that comprise a total of 321,000 lines of code. Furthermore, our experiments used both developer-written and automatically-generated test suites. The results show a statistically significant correlation between mutant detection and real fault detection, independently of code coverage. The results also give concrete suggestions on how to improve mutation analysis and reveal some inherent limitations.
Symposium on the Foundations of Software Engineering (FSE); 11/2014
[Show abstract][Hide abstract] ABSTRACT: Current app stores distribute some malware to unsuspecting users, even though the app approval process may be costly and time-consuming. High-integrity app stores must provide stronger guar-antees that their apps are not malicious. We propose a verification model for use in such app stores to guarantee that the apps are free of malicious information flows. In our model, the software vendor and the app store auditor collaborate — each does tasks that are easy for her/him, reducing overall verification cost. The software vendor provides a behavioral specification of information flow (at a finer granularity than used by current app stores) and source code annotated with information-flow type qualifiers. A flow-sensitive, context-sensitive information-flow type system checks the informa-tion flow type qualifiers in the source code and proves that only information flows in the specification can occur at run time. The app store auditor uses the vendor-provided source code to manually verify declassifications. We have implemented the information-flow type system for An-droid apps written in Java, and we evaluated both its effectiveness at detecting information-flow violations and its usability in practice. In an adversarial Red Team evaluation, we analyzed 72 apps (576,000 LOC) for malware. The 57 Trojans among these had been written specifically to defeat a malware analysis such as ours. Nonetheless, our information-flow type system was effective: it detected 96% of malware whose malicious behavior was related to information flow and 82% of all malware. In addition to the adversarial evaluation, we evaluated the practicality of using the collaborative model. The programmer annotation burden is low: 6 annotations per 100 LOC. Every sound analysis requires a human to review potential false alarms, and in our experiments, this took 30 minutes per 1,000 LOC for an auditor unfamiliar with the app.
Conference on Computer and Communications Security (CCS); 11/2014
[Show abstract][Hide abstract] ABSTRACT: Mutation analysis evaluates a testing technique by measur- ing how well it detects seeded faults (mutants). Mutation analysis is hampered by inherent scalability problems — a test suite is executed for each of a large number of mutants. Despite numerous optimizations presented in the literature, this scalability issue remains, and this is one of the reasons why mutation analysis is hardly used in practice. Whereas most previous optimizations attempted to stati- cally reduce the number of executions or their computational overhead, this paper exploits information available only at run time to further reduce the number of executions. First, state infection conditions can reveal — with a single test execution of the unmutated program — which mutants would lead to a different state, thus avoiding unnecessary test executions. Second, determining whether an infected execution state propagates can further reduce the number of executions. Mutants that are embedded in compound expressions may infect the state locally without affecting the outcome of the compound expression. Third, those mutants that do infect the state can be partitioned based on the resulting infected state — if two mutants lead to the same infected state, only one needs to be executed as the result of the other can be inferred. We have implemented these optimizations in the Major mu- tation framework and empirically evaluated them on 14 open source programs. The optimizations reduced the mutation analysis time by 40% on average.
International Symposium on Software Testing and Analysis (ISSTA); 07/2014
[Show abstract][Hide abstract] ABSTRACT: Most programming languages support format strings, but their use is error-prone. Using the wrong format string syntax, or passing the wrong number or type of arguments, leads to unintelligible text output, program crashes, or security vulnerabilities. This paper presents a type system that guarantees that calls to format string APIs will never fail. In Java, this means that the API will not throw exceptions. In C, this means that the API will not return negative values, corrupt memory, etc. We instantiated this type system for Java’s Formatter API, and evaluated it on 6 large and well-maintained open-source projects. Format string bugs are common in practice (our type system found 104 bugs), and the annotation burden on the user of our type system is low (on average, for every bug found, only 1.0 annotations need to be written).
[Show abstract][Hide abstract] ABSTRACT: In a test suite, all the test cases should be independent: no test should affect any other test’s result, and running the tests in any order should produce the same test results. Techniques such as test prioritization generally assume that the tests in a suite are independent. Test dependence is a little-studied phenomenon. This paper presents five results related to test dependence. First, we characterize the test dependence that arises in practice. We studied 96 real-world dependent tests from 5 issue tracking systems. Our study shows that test dependence can be hard for programmers to identify. It also shows that test dependence can cause non-trivial consequences, such as masking program faults and leading to spurious bug reports. Second, we formally define test dependence in terms of test suites as ordered sequences of tests along with explicit environments in which these tests are executed. We formulate the problem of detecting dependent tests and prove that a useful special case is NP-complete. Third, guided by the study of real-world dependent tests, we propose and compare four algorithms to detect dependent tests in a test suite. Fourth, we applied our dependent test detection algorithms to 4 real-world programs and found dependent tests in each human-written and automatically-generated test suite. Fifth, we empirically assessed the impact of dependent tests on five test prioritization techniques. Dependent tests affect the output of all five techniques; that is, the reordered suite fails even though the original suite did not.
[Show abstract][Hide abstract] ABSTRACT: Java supports format strings, but their use is error prone because: Java’s type system does not find any but the most trivial mistakes, Java’s format methods fail silently, and for- mat methods are often executed infrequently. This paper presents the Format String Checker that is based on the format string type system presented in . The Format String Checker guarantees that calls to Java’s Formatter API will not throw exceptions. We evaluate the Format String Checker on 6 large and well-maintained open-source projects. Format string bugs are common in practice (we found 104 bugs), and the an- notation burden on the user of our type system is low (on average, for every bug found, only 1.0 annotations need to be written).
[Show abstract][Hide abstract] ABSTRACT: Compared to desktop and web applications, mobile applications have so far been developed in an extremely siloed environment. The apps running on our phone are developed by a single entity with operating system protections between sharing of data or code between programs. However, application extensibility is often desired. In the web, a secondary ecosystem flourishes around browser extensions, enabling users to customize the web as they wish. This paper presents BladeDroid, a system enabling user customization of mobile applications, using a novel combination of bytecode rewriting and dynamic class loading. We describe four extensions that we have built to evaluate BladeDroid's usability, robustness, and performance.
[Show abstract][Hide abstract] ABSTRACT: Concurrent systems are notoriously difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process, and documentation is often incomplete and out of sync with the implementation. To provide developers with more insight into concurrent systems, we developed CSight. CSight mines logs of a system's executions to infer a concise and accurate model of that system's behavior, in the form of a communicating finite state machine (CFSM). Engineers can use the inferred CFSM model to understand complex behavior, detect anomalies, debug, and increase confidence in the correctness of their implementations. CSight's only requirement is that the logged events have vector timestamps. We provide a tool that automatically adds vector timestamps to system logs. Our tool prototypes are available at http://synoptic.googlecode.com/. This paper presents algorithms for inferring CFSM models from traces of concurrent systems, proves them correct, provides an implementation, and evaluates the implementation in two ways: by running it on logs from three different networked systems and via a user study that focused on bug finding. Our evaluation finds that CSight infers accurate models that can help developers find bugs.
[Show abstract][Hide abstract] ABSTRACT: Contracts are a popular tool for specifying the functional behavior of software. This paper characterizes the contracts that developers write, the contracts that developers could write, and how a developer reacts when shown the difference. This paper makes three research contributions based on an investigation of open-source projects' use of Code Contracts. First, we characterize Code Contract usage in practice. For example, approximately three-fourths of the Code Contracts are basic checks for the presence of data. We discuss similarities and differences in usage across the projects, and we identify annotation burden, tool support, and training as possible explanations based on developer interviews. Second, based on contracts automatically inferred for four of the projects, we find that developers underutilize contracts for expressing state updates, object state indicators, and conditional properties. Third, we performed user studies to learn how developers decide which contracts to enforce. The developers used contract suggestions to support their existing use cases with more expressive contracts. However, the suggestions did not lead them to experiment with other use cases for which contracts are better-suited. In support of the research contributions, the paper presents two engineering contributions: (1) Celeriac, a tool for generating traces of .NET programs compatible with the Daikon invariant detection tool, and (2) Contract Inserter, a Visual Studio add-in for discovering and inserting likely invariants as Code Contracts.
[Show abstract][Hide abstract] ABSTRACT: In a distributed system, the hosts execute concurrently, generating asynchronous logs that are challenging to comprehend. We present two tools: ShiVector to transparently add vector timestamps to distributed system logs, and ShiViz to help developers understand distributed system logs by visualizing them as space-time diagrams. ShiVector is the first tool to offer automated vector timestamp instrumentation without modifying source code. The vector-timestamped logs capture partial ordering information, useful for analysis and comprehension. ShiViz space-time diagrams are simple to understand and interactive — the user can explore the log through the visualization to understand complex system behavior. We applied ShiVector and ShiViz to two systems and found that they aid developers in understanding and debugging.
[Show abstract][Hide abstract] ABSTRACT: Modern software often exposes configuration options that enable users to customize its behavior. During software evolution, developers may change how the configuration options behave. When upgrading to a new software version, users may need to re-configure the software by changing the values of certain configuration options. This paper addresses the following question during the evolution of a configurable software system: which configuration options should a user change to maintain the software's desired behavior? This paper presents a technique (and its tool implementation, called ConfSuggester) to troubleshoot configuration errors caused by software evolution. ConfSuggester uses dynamic profiling, execution trace comparison, and static analysis to link the undesired behavior to its root cause - a configuration option whose value can be changed to produce desired behavior from the new software version. We evaluated ConfSuggester on 8 configuration errors from 6 configurable software systems written in Java. For 6 errors, the rootcause configuration option was ConfSuggester's first suggestion. For 1 error, the root cause was ConfSuggester's third suggestion. The root cause of the remaining error was ConfSuggester's sixth suggestion. Overall, ConfSuggester produced significantly better results than two existing techniques. ConfSuggester runs in just a few minutes, making it an attractive alternative to manual debugging.
[Show abstract][Hide abstract] ABSTRACT: Too many students in introductory programming classes fail to understand the significance and utility of the concepts being taught. Their low motivation impacts their learning. One contributing factor is pedagogy that emphasizes computing for its own sake and assignments that are abstract, such as computing the factorial function. Many educators have improved on such traditional approaches by teaching concepts in contexts that students find more relevant, such as games, robots, and media. Now, it is time to take the next step. In this special session, participants will develop and discuss ways to teach introductory programming by means of real-world data analysis problems from science, engineering, business, and the humanities. Students can be motivated to learn programming in order to analyze DNA, predict the outcome of elections, detect fraudulent data, suggest friends in a social network, determine the authorship of texts, and more (see Section 3.4 for more examples). The approach is more than just a collection of "nifty assignments": rather, it affects the choice of topics and pedagogy, all of which together lead to greater student satisfaction. The approach has been successfully used at 4 colleges and universities. The classes were effective for both CS and non-CS majors. Neither the computing material nor the problems need to be "dumbed down". At the end of the term students were amazed and delighted at the real data analysis that they could perform. They were excited about applying computation in their work and about learning more. The special session contains a mix of activities, including comparative analysis of introductory classes; group discussion of curriculum design; a mini-panel discussing how the approach has worked in practice; and brainstorming about example assignments and curriculum revision.
Proceedings of the 45th ACM technical symposium on Computer science education; 03/2014
[Show abstract][Hide abstract] ABSTRACT: During debugging, a developer must repeatedly and manually reproduce faulty behavior in order to inspect different facets of the program's execution. Existing tools for reproducing such behaviors prevent the use of debugging aids such as breakpoints and logging, and are not designed for interactive, random-access exploration of recorded behavior. This paper presents Timelapse, a tool for quickly recording, reproducing, and debugging interactive behaviors in web applications. Developers can use Timelapse to browse, visualize, and seek within recorded program executions while simultaneously using familiar debugging tools such as breakpoints and logging. Testers and end-users can use Timelapse to demonstrate failures in situ and share recorded behaviors with developers, improving bug report quality by obviating the need for detailed reproduction steps. Timelapse is built on Dolos, a novel record/replay infrastructure that ensures deterministic execution by capturing and reusing program inputs both from the user and from external sources such as the network. Dolos introduces negligible overhead and does not interfere with breakpoints and logging. In a small user evaluation, participants used Timelapse to accelerate existing reproduction activities, but were not significantly faster or more successful in completing the larger tasks at hand. Together, the Dolos infrastructure and Timelapse developer tool support systematic bug reporting and debugging practices.
Proceedings of the 26th annual ACM symposium on User interface software and technology; 10/2013
[Show abstract][Hide abstract] ABSTRACT: Developers use analysis tools to help write, debug, and understand software systems under development. A developer's change to the system source code may affect analysis results. Typically, to learn those effects, the developer must explicitly initiate the analysis. This may interrupt the developer's workflow and/or the delay until the developer learns the implications of the change. The situation is even worse for impure analyses — ones that modify the code on which it runs — because such analyses block the developer from working on the code. This paper presents Codebase Replication, a novel approach to easily convert an offline analysis — even an impure one — into a continuous analysis that informs the developer of the implications of recent changes as quickly as possible after the change is made. Codebase Replication copies the developer's codebase, incrementally keeps this copy codebase in sync with the developer's codebase, makes that copy codebase available for offline analyses to run without disturbing the developer and without the developer's changes disturbing the analyses, and makes analysis results available to be presented to the developer. We have implemented Codebase Replication in Solstice, an open-source, publicly-available Eclipse plug-in. We have used Solstice to convert three offline analyses — FindBugs, PMD, and unit testing — into continuous ones. Each conversion required on average 436 NCSL and took, on average, 18 hours. Solstice-based analyses experience no more than 2.5 milliseconds of runtime overhead per developer action.
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering; 08/2013
[Show abstract][Hide abstract] ABSTRACT: A workflow is a sequence of UI actions to complete a specific task. In the course of a GUI application's evolution, changes ranging from a simple GUI refactoring to a complete rearchitecture can break an end-user's well-established workflow. It can be challenging to find a replacement workflow. To address this problem, we present a technique (and its tool implementation, called FlowFixer) that repairs a broken workflow. FlowFixer uses dynamic profiling, static analysis, and random testing to suggest a replacement UI action that fixes a broken workflow. We evaluated FlowFixer on 16 broken workflows from 5 realworld GUI applications written in Java. In 13 workflows, the correct replacement action was FlowFixer's first suggestion. In 2 workflows, the correct replacement action was FlowFixer's second suggestion. The remaining workflow was un-repairable. Overall, FlowFixer produced significantly better results than two alternative approaches.
Proceedings of the 2013 International Symposium on Software Testing and Analysis; 07/2013