William Pugh

University of Maryland, College Park · Department of Computer Science

Ph.D., Cornell University

About

92 Publications
18,068 Reads
6,029 Citations

Publications (92)
Conference Paper
In May 2009, Google conducted a company-wide FindBugs "fixit". Hundreds of engineers reviewed thousands of FindBugs warnings, and fixed or filed reports against many of them. In this paper, we discuss the lessons learned from this exercise, and analyze the resulting dataset, which contains data about how warnings in each bug pattern were classified...
Conference Paper
Many analysis techniques have been proposed to determine when a potentially null value may be dereferenced. But we have observed in practice that not every potential null dereference is a "bug" that developers want to fix. In this paper we discuss some of the challenges of using a null dereference analysis in practice, and reasons why developers ma...
Conference Paper
Recent research has tried to identify changes in source code repositories that fix bugs by linking these changes to reports in issue tracking systems. These changes have been traced back to the point in time when they were previously modified as a way of identifying bug introducing changes. But we observe that not all changes linked to bug tracking...
Article
Static analysis tools find silly mistakes, confusing code, bad practices and property violations. But software developers and organizations may or may not care about all these warnings, depending on how they impact code behavior and other factors. In the past, we have tried to identify important warnings by asking users to rate them as severe, low...
Article
Static analysis examines code in the absence of input data and without running the code. It can detect potential security violations (SQL injection), runtime errors (dereferencing a null pointer) and logical inconsistencies (a conditional test that can't possibly be true). Although a rich body of literature exists on algorithms and analytical frame...
Conference Paper
C++ object layout schemes rely on (sometimes numerous) compiler-generated fields. We describe a new language-independent object layout scheme, which is space optimal, i.e., objects are contiguous and contain no compiler-generated fields other than a single type identifier. As in C++ and other multiple inheritance languages such as C...
Conference Paper
As static analysis tools mature and attract more users, vendors and researchers have an increased interest in understanding how users interact with them, and how they impact the software development process. The FindBugs project has conducted a number of studies including online surveys, interviews and a preliminary controlled user study to better...
Conference Paper
There are many difficulties associated with developing correct multithreaded software, and many of the activities that are simple for single-threaded software are exceptionally hard for multithreaded software. One such example is constructing unit tests involving multiple threads. Given, for example, a blocking queue implementation, writing a test...
Conference Paper
In the summer of 2006, the FindBugs project was challenged to improve the null pointer analysis in FindBugs so that we could find more null pointer bugs. In particular, we were challenged to try to do as well as a publicly available analysis by Reasoning, Inc. on version 4.1.24 of Apache Tomcat. Reasoning's report is a result of running their own st...
Conference Paper
Static analysis tools for software defect detection are becoming widely used in practice. However, there is little public information regarding the experimental evaluation of the accuracy and value of the warnings these tools report. In this paper, we discuss the warnings found by FindBugs, a static analysis tool that finds defects in Java progra...
Conference Paper
At the University of Maryland, we have been working to improve the reliability and security of software by developing new, effective static analysis tools. These tools scan software for bug patterns or show that the software is free from a particular class of defects. There are two themes common to our different projects: 1. Our ultimate focus is o...
Conference Paper
Java Specification Request 305 defines a set of annotations that can be understood by multiple static analysis tools. Rather than push the bleeding edge of static analysis, this JSR represents an attempt to satisfy different static analysis tool vendors and address the engineering issues required to make these annotations widely useful.
Conference Paper
This poster will present our experiences using FindBugs in production software development environments, including both open source efforts and Google's internal code base. We summarize the defects found, describe the issue of real but trivial defects, and discuss the integration of FindBugs into Google's Mondrian code review system.
Conference Paper
Many people have proposed adding transactions, or atomic blocks, to type-safe high-level programming languages. However, researchers have not considered the semantics of transactions with respect to a memory model weaker than sequential consistency. The details of such semantics are more subtle than many people realize, and the interaction betwee...
Conference Paper
Two important questions regarding automated submission and testing systems are: What kind of feedback should we give students as they work on their programming assignments, and how can we study in more detail the programming assignment development process of novices? To address the issue of feedback, Marmoset provides students with limited ac...
Conference Paper
FindBugs looks for bugs in Java programs. It is based on the concept of bug patterns. A bug pattern is a code idiom that is often an error. Bug patterns arise for a variety of reasons, such as difficult language features, misunderstood API semantics, misunderstood invariants when code is modified during maintenance, garden-variety mistakes: typos,...
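The following sketch makes the notion of a bug pattern concrete. It shows one classic idiom that is often an error: comparing strings with ==, which tests whether two references point to the same object rather than whether the strings have the same contents. The class and method names are hypothetical. This illustrates the idea and is not FindBugs code or one of its detectors.

```java
import java.util.Objects;

class StringCompareIdiom {
    // Bug pattern: == compares object references, so two equal strings
    // held in distinct objects compare as different.
    static boolean sameNameBuggy(String a, String b) {
        return a == b;
    }

    // Intended behavior: compare contents; Objects.equals also handles nulls.
    static boolean sameName(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String x = "pugh";
        String y = new String("pugh"); // distinct object with the same contents
        System.out.println(sameNameBuggy(x, y)); // false
        System.out.println(sameName(x, y));      // true
    }
}
```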
Conference Paper
Testing is an important part of the software development cycle that should be covered throughout the computer science curriculum. However, for students to truly learn the value of testing, they need to benefit from writing test cases for their own software. We report on our initial experiences teaching students to write test cases and evaluating stu...
Conference Paper
Marmoset is a framework for storing and testing student submissions to programming assignments. It gives students a limited ability to release test their code using an instructor's private suite of test cases. This encourages them to start early and implement their own test cases. It also provides facilities for instructors to manage the grading pr...
Conference Paper
Various static analysis tools will analyze a software artifact in order to identify potential defects, such as misused APIs, race conditions and deadlocks, and security vulnerabilities. For a number of reasons, it is important to be able to track the occurrence of each potential defect over multiple versions of a software artifact under study: in o...
Conference Paper
This paper describes the new Java memory model, which has been revised as part of Java 5.0. The model specifies the legal behaviors for a multithreaded program; it defines the semantics of multithreaded Java programs and partially determines legal implementations of Java virtual machines and compilers. The new Java model provides a simple interface...
Article
Most computer science educators hold strong opinions about the "right" approach to teaching introductory level programming. Unfortunately, we have comparatively little hard evidence about the effectiveness of these various approaches because we generally lack the infrastructure to obtain sufficiently detailed data about novices' programming habits....
Conference Paper
We all understand that software defects are a problem, and we want to make it easy to build more reliable software. But finding bugs is embarrassingly easy, as is devising a new program analysis technique that is able to find some bugs. Actually improving software quality through bug detection or bug localization tools is hard, and requires careful...
Conference Paper
Using static analysis to detect memory access errors, such as null pointer dereferences, is not a new problem. However, much of the previous work has used rather sophisticated analysis techniques in order to detect such errors. In this paper we show that simple analysis techniques can be used to identify many such software defects, both in productio...
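Here is a hypothetical illustration of a defect that a simple analysis within a single method can flag. The method below dereferences a value on exactly the branch where it has just been tested to be null. No sophisticated reasoning is needed: the branch condition alone establishes that s is null at the dereference. The names are illustrative only.

```java
class NullCheckThenDeref {
    // Defect: inside the if-branch, s is known to be null,
    // so s.length() always throws NullPointerException there.
    static int lengthBuggy(String s) {
        if (s == null) {
            return s.length();
        }
        return s.length();
    }

    // Fixed version: handle the null case explicitly.
    static int length(String s) {
        if (s == null) {
            return 0;
        }
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(length(null)); // 0
        try {
            lengthBuggy(null);
        } catch (NullPointerException e) {
            System.out.println("lengthBuggy(null) threw NullPointerException");
        }
    }
}
```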
Article
Many techniques have been developed over the years to automatically find bugs in software. Often, these techniques rely on formal methods and sophisticated program analysis. While these techniques are valuable, they can be difficult to apply, and they aren't always effective in finding real bugs. Bug patterns are code idioms that are often errors....
Article
Because threads are a core feature of the Java language, the widespread adoption of Java has exposed a much wider audience to concurrency than previous languages have. Concurrent programs are notoriously difficult to write correctly, and many subtle bugs can result from incorrect use of threads and synchronization. Therefore, finding techniques to fi...
Conference Paper
Much research has been done on techniques to teach students how to program. However, it is usually difficult to quantify exactly how students work. Instructors typically only see students' work when they submit their projects or come to office hours. Another common problem in introductory programming courses is that student code is only subjected t...
Conference Paper
We have replicated the experiments of Cecchet et al. detailed in "Performance and Scalability of EJB Applications" at OOPSLA '02. We report on our experiences configuring, deploying and tuning Enterprise software, and provide evidence that many of the conclusions of the original work are misleading or cannot be generalized.
Article
One of the goals of the designers of the Java programming language was that multithreaded programs written in Java would have consistent and well-defined behavior. This would allow Java programmers to understand how their programs might behave; it would also allow Java platform architects to develop their platforms in a flexible and efficient way, whi...
Article
We explore advances in Java Virtual Machine (JVM) technology along with new high performance I/O libraries in Java 1.4, and find that Java is increasingly an attractive platform for scientific cluster-based message passing codes.
Conference Paper
We explore advances in Java Virtual Machine (JVM) technology along with new high performance I/O libraries in Java 1.4, and find that Java is increasingly an attractive platform for scientific cluster-based message passing codes. We report that these new technologies allow a pure Java implementation of a cluster communication library that performs...
Article
In this paper, we describe bundling, a technique for the transfer of files over a network. The goal of bundling is to group together files that tend to be needed in the same program execution and that are loaded close together. We describe an algorithm for dividing a collection of files into bundles based on profiles of file-loading behavior.
Article
Java has integrated multithreading to a far greater extent than most programming languages. It is also one of the only languages that specifies and requires safety guarantees for improperly synchronized programs. It turns out that understanding these issues is far more subtle and difficult than was previously thought. The existing specification mak...
Article
The class of parallel finite automata (PFA) is described that naturally expresses the interleaving parallelism inherent in Petri net notation without admitting the possibility of an infinite state space.
Conference Paper
Atomic instructions atomically access and update one or more memory locations. Because they do not incur the overhead of lock acquisition or suspend the executing thread during contention, they may allow higher levels of concurrency on multiprocessors than lock-based synchronization. Wait-free data structures are an important application of atomic...
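Java's AtomicInteger.compareAndSet shows the core idea of an atomic instruction: it replaces a value atomically, but only if the value still equals an expected one. The retry loop below is a standard lock-free counter. It is lock-free rather than wait-free, because an individual thread can in principle retry indefinitely under contention. This is a generic illustration with hypothetical names, not a data structure from the paper.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Lock-free increment: read the current value, compute the new one, and
    // install it with compare-and-set; if another thread changed the value in
    // between, compareAndSet fails and the loop retries.
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    int get() {
        return value.get();
    }

    // Runs `threads` threads that each increment one shared counter `perThread` times.
    static int countWith(int threads, int perThread) throws InterruptedException {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWith(4, 10_000)); // 40000
    }
}
```

No increments are lost despite the absence of locks: a thread whose compareAndSet fails simply rereads and tries again.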
Article
Standard array data dependence techniques can only reason about linear constraints. There has also been work on analyzing some dependences involving polynomial constraints. Analyzing array data dependences in real-world programs requires handling many “unanalyzable” terms: subscript arrays, run-time tests, function calls. The standard approach to a...
Article
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. The framework...
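The classical special case of two-deep loop nests and unimodular matrices shows what it buys to represent a transformation as a mapping. Such a mapping is legal if every dependence distance vector stays lexicographically positive after the mapping is applied. The Java sketch below checks that condition for loop interchange and skewing. It is a textbook simplification, not the framework from the paper, which also covers transformations such as distribution and statement reordering that a single matrix cannot express. The names AffineSchedule, INTERCHANGE, SKEW, and compose are hypothetical.

```java
class AffineSchedule {
    // [i, j] -> [j, i]
    static final int[][] INTERCHANGE = {{0, 1}, {1, 0}};
    // [i, j] -> [i, i + j]
    static final int[][] SKEW = {{1, 0}, {1, 1}};

    // Apply a 2x2 integer matrix t to a distance vector d.
    static int[] apply(int[][] t, int[] d) {
        return new int[] {
            t[0][0] * d[0] + t[0][1] * d[1],
            t[1][0] * d[0] + t[1][1] * d[1]
        };
    }

    // Matrix product a * b, i.e. the mapping "first b, then a".
    static int[][] compose(int[][] a, int[][] b) {
        int[][] c = new int[2][2];
        for (int r = 0; r < 2; r++) {
            for (int col = 0; col < 2; col++) {
                for (int k = 0; k < 2; k++) {
                    c[r][col] += a[r][k] * b[k][col];
                }
            }
        }
        return c;
    }

    // True if the first nonzero entry is positive.
    static boolean lexPositive(int[] d) {
        for (int x : d) {
            if (x > 0) return true;
            if (x < 0) return false;
        }
        return false;
    }

    // Legal if every loop-carried (hence nonzero) distance vector stays lexicographically positive.
    static boolean legal(int[][] t, int[][] distances) {
        for (int[] d : distances) {
            if (!lexPositive(apply(t, d))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // a[i][j] = a[i-1][j+1] has distance vector (1, -1).
        int[][] deps = {{1, -1}};
        System.out.println(legal(INTERCHANGE, deps));                // false
        System.out.println(legal(SKEW, deps));                       // true
        System.out.println(legal(compose(INTERCHANGE, SKEW), deps)); // true
    }
}
```

Interchange alone turns (1, −1) into (−1, 1) and is rejected. Skewing first maps it to (1, 0), after which interchange yields (0, 1), which is legal. Treating transformations as composable mappings makes this kind of reasoning uniform.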
Article
To compile programs for message passing architectures and to obtain good performance on NUMA architectures it is necessary to control how computations and data are mapped to processors. Languages such as High-Performance Fortran use data distributions supplied by the programmer and the owner computes rule to specify this. However, the best data and...
Conference Paper
Standard array data dependence testing algorithms give information about the aliasing of array references. If statement 1 writes a[5], and statement 2 later reads a[5], standard techniques describe this as a flow dependence, even if there was an intervening write. We call a dependence between two references to the same memory location a memory-ba...
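The distinction can be illustrated on a single execution trace. The toy Java sketch below uses hypothetical names. It reports a memory-based flow dependence from every earlier write of a location to a later read of it. It reports a value-based (dataflow) dependence only from the most recent such write, the one whose value the read actually sees. The paper's setting is static analysis of array subscripts and loop bounds; this trace-based version only shows how the two notions differ in what they report.

```java
import java.util.ArrayList;
import java.util.List;

class DependenceKinds {
    // One memory access in an execution trace.
    static final class Access {
        final String stmt;
        final boolean isWrite;
        final int loc;

        Access(String stmt, boolean isWrite, int loc) {
            this.stmt = stmt;
            this.isWrite = isWrite;
            this.loc = loc;
        }
    }

    // Memory-based flow dependences: every earlier write to the location a read touches.
    static List<String> memoryBased(List<Access> trace) {
        List<String> deps = new ArrayList<>();
        for (int r = 0; r < trace.size(); r++) {
            Access read = trace.get(r);
            if (read.isWrite) continue;
            for (int w = 0; w < r; w++) {
                Access write = trace.get(w);
                if (write.isWrite && write.loc == read.loc) {
                    deps.add(write.stmt + "->" + read.stmt);
                }
            }
        }
        return deps;
    }

    // Value-based flow dependences: only the last write before the read,
    // since an intervening write kills the values of earlier ones.
    static List<String> valueBased(List<Access> trace) {
        List<String> deps = new ArrayList<>();
        for (int r = 0; r < trace.size(); r++) {
            Access read = trace.get(r);
            if (read.isWrite) continue;
            for (int w = r - 1; w >= 0; w--) {
                Access write = trace.get(w);
                if (write.isWrite && write.loc == read.loc) {
                    deps.add(write.stmt + "->" + read.stmt);
                    break;
                }
            }
        }
        return deps;
    }

    public static void main(String[] args) {
        // S1 writes a[5], S2 overwrites a[5], S3 reads a[5].
        List<Access> trace = List.of(
            new Access("S1", true, 5),
            new Access("S2", true, 5),
            new Access("S3", false, 5));
        System.out.println(memoryBased(trace)); // [S1->S3, S2->S3]
        System.out.println(valueBased(trace));  // [S2->S3]
    }
}
```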
Article
Program slicing is an analysis that answers questions such as "Which statements might affect the computation of variable v at statement s?" or "Which statements depend on the value of v computed in statement s?". The answers computed by program slicing are generally a set of statements. We introduce the idea of iteration space slicing: we refine...
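For orientation, here is ordinary statement-level backward slicing, the baseline the abstract starts from, on straight-line code with no branches or loops. The sketch walks backward from statement s. A statement joins the slice if it defines a variable that is still needed, and the variables it uses then become needed. The Stmt representation is a hypothetical simplification. Iteration space slicing refines the answer to individual loop iterations, which this sketch does not attempt.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class BackwardSlice {
    // A straight-line assignment: def = f(uses...).
    static final class Stmt {
        final String def;
        final Set<String> uses;

        Stmt(String def, String... uses) {
            this.def = def;
            this.uses = new HashSet<>(Arrays.asList(uses));
        }
    }

    // Indices of statements that might affect the value of v just after statement s.
    static SortedSet<Integer> slice(List<Stmt> prog, int s, String v) {
        Set<String> needed = new HashSet<>();
        needed.add(v);
        SortedSet<Integer> result = new TreeSet<>();
        for (int i = s; i >= 0; i--) {
            Stmt st = prog.get(i);
            if (needed.contains(st.def)) {
                result.add(i);
                needed.remove(st.def);  // earlier definitions of this variable are overwritten here
                needed.addAll(st.uses); // but the values this statement reads now matter
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Stmt> prog = List.of(
            new Stmt("a"),            // 0: a = input()
            new Stmt("b"),            // 1: b = input()
            new Stmt("c", "a"),       // 2: c = a + 1
            new Stmt("d", "b"),       // 3: d = b * 2
            new Stmt("e", "c", "a")); // 4: e = c + a
        System.out.println(slice(prog, 4, "e")); // [0, 2, 4]
    }
}
```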
Article
Integer tuple relations can concisely summarize many types of information gathered from analysis of scientific codes. For example, they can be used to precisely describe which iterations of a statement are data dependent on which other iterations. It is generally not possible to represent these tuple relations by enumerating the related pairs of tup...
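Here is a small worked example with a hypothetical loop: for (i = 1; i < n; i++) a[i] = a[i-1] + 1;. Iteration i writes a[i] and iteration i' reads a[i'-1], so the two touch the same element exactly when i' = i + 1. With both iterations inside the loop bounds, the flow dependence is the integer tuple relation below. Because n is symbolic, the related pairs cannot be enumerated, but the relation describes them exactly.

```latex
\{\, [i] \to [i'] \;:\; 1 \le i \le n-1 \,\wedge\, 1 \le i' \le n-1 \,\wedge\, i' = i + 1 \,\}
  \;=\; \{\, [i] \to [i+1] \;:\; 1 \le i \le n-2 \,\}
```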
Article
In previous work, we presented a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, loop skewing and statement reordering.
Article
Existing compilers often fail to parallelize sequential code, even when a program can be manually transformed into parallel form by a sequence of well-understood transformations (as is the case for many of the Perfect Club Benchmark programs). These failures can occur for several reasons: the code transformations implemented in the compiler may not...
Article
Array data dependence analysis methods currently in use generate false dependences that can prevent useful program transformations. These false dependences arise because the questions asked are conservative approximations to the questions we really should be asking. Unfortunately, the questions we really should be asking go beyond integer programmi...
Article
Array data dependence analysis provides important information for optimization of scientific programs. Array dependence testing can be viewed as constraint analysis, although traditionally general-purpose constraint manipulation algorithms have been thought to be too slow for dependence analysis. We have explored the use of exact constraint analysi...
Article
this paper are important. A great deal of research has been done on deriving good layouts of data and/or computation [5, 1, 4]. In other systems such as Fortran-D and HPF [7, 6], the data decompositions are specified by the user and computation is decomposed according to an owner computes rule. In this paper, we assume that we have been given a dat...
Article
Scientific codes which use iterative methods are often difficult to parallelize well. Such codes usually contain while loops which iterate until they converge upon the solution. Problems arise since the number of iterations cannot be determined at compile time, and tests for termination usually require a global reduction and an associated barrier....
Article
Model checking is a powerful technique for analyzing large, finite-state systems. In an infinite transition system, however, many basic properties are undecidable. In this paper we present a new symbolic model checker which conservatively evaluates safety and liveness properties on infinite-state programs. We use Presburger formulas to symbolically e...
Conference Paper
Java has integrated multithreading to a far greater extent than most programming languages. It is also one of the only languages that specifies and requires safety guarantees for improperly synchronized programs. It turns out that understanding these issues is far more subtle and difficult than was previously thought. The existing specification mak...
Book
This volume contains the papers presented at the 13th International Workshop on Languages and Compilers for Parallel Computing. It also contains extended abstracts of submissions that were accepted as posters. The workshop was held at the IBM T. J. Watson Research Center in Yorktown Heights, New York. As in previous years, the workshop focused on i...
Article
The class of parallel finite automata (PFA) is described that naturally expresses the interleaving parallelism inherent in Petri net notation without admitting the possibility of an infinite state space. The equivalence of this class to deterministic finite automata (DFA) is demonstrated using an algorithm for generating an equivalent nondeterminis...
Conference Paper
Improving data locality in programs which manipulate arrays has been the subject of a great deal of research. Much of the work on improving data locality examines individual loop nests; other work includes transformations such as loop fusion, which combines loops so that multiple loop nests can be transformed as a single loop nest. We propose a dat...
Article
Array data-dependence analysis is an important part of any optimizing compiler for scientific programs. The Omega test is an exact test for integer solutions to affine constraints and can be used for array data dependence. There are other tests that are less exact but are intended to be faster. Many of these less exact tests are rather complicated...
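For contrast with an exact test, here is a sketch of the classic GCD test, one of the simpler and less exact tests. For references x[a1*i + c1] and x[a2*j + c2], an integer solution to a1*i − a2*j = c2 − c1 exists only if gcd(a1, a2) divides c2 − c1. The test ignores loop bounds, so it can prove independence but can otherwise only report that a dependence is possible. This is not the Omega test, and the names are hypothetical.

```java
class GcdTest {
    static int gcd(int a, int b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Can x[a1*i + c1] and x[a2*j + c2] refer to the same element for some integers i, j?
    // Loop bounds are ignored: false proves independence, true only means "maybe".
    static boolean mayDepend(int a1, int c1, int a2, int c2) {
        int g = gcd(a1, a2);
        if (g == 0) {
            return c1 == c2; // both subscripts are constants
        }
        return (c2 - c1) % g == 0;
    }

    public static void main(String[] args) {
        System.out.println(mayDepend(2, 0, 2, 1)); // false: x[2i] is even, x[2j+1] is odd
        System.out.println(mayDepend(2, 0, 4, 2)); // true:  x[2i] vs x[4j+2]
    }
}
```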
Conference Paper
We present a new symbolic model checker which conservatively evaluates safety and liveness properties on infinite-state programs. We use Presburger formulas to symbolically encode a program's transition system, as well as its model-checking computations. All fixpoint calculations are executed symbolically, and their convergence is guaranteed by u...
Article
Standard array data dependence techniques can only reason about linear constraints. There has also been work on analyzing some dependences involving polynomial constraints. Analyzing array data dependences in real-world programs requires handling many "unanalyzable" terms: subscript arrays, run-time tests, function calls. The standard approach to a...
Article
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. The framework...
Article
Existing compilers often fail to parallelize sequential code, even when a program can be manually transformed into parallel form by a sequence of well-understood transformations (as is the case for many of the Perfect Club Benchmark programs). These failures can occur for several reasons: the code transformations implemented in the compiler may not...
Article
Why do existing parallelizing compilers and environments fail to parallelize many realistic FORTRAN programs? One of the reasons is that these programs contain a number of linearized array references, such as A(M*N*i+N*j+k) or A(i*(i+1)/2+j). Performing exact dependence analysis for these references requires testing polynomial constraints for integ...
Article
Model checking is a powerful technique for analyzing large, finite-state systems. In an infinite-state system, however, many basic properties are undecidable. In this paper, we present a new symbolic model checker which conservatively evaluates safety and liveness properties on infinite-state programs. We use Presburger formulas to symbolically enc...
Article
Traditionally, optimizing compilers attempt to improve the performance of programs by applying source-to-source transformations, such as loop interchange, loop skewing and loop distribution. Each of these transformations has its own special legality checks and transformation rules which make it hard to analyze or predict the effects of compositions...
Conference Paper
Developing computational codes that compute with sparse matrices is a difficult and error-prone process. Automatic generation of sparse code from the corresponding dense version would simplify the programmer’s task, provided that the compiler-generated code is fast enough to be used instead of hand-written code. We propose a new Sparse Intermediat...
Article
Traditional array dependence analysis, which detects potential memory aliasing of array references, is a key analysis technique for automatic parallelization. Recent studies of benchmark codes indicate that limitations of analysis cause many compilers to overlook large amounts of potential parallelism, and that exploiting this parallelism requires...
Article
Alignment and distribution of array data should be managed by optimizing compilers for parallel computers, but current approaches to the distribution problem formulate it as an NP-complete graph optimization problem. The graphs arising in applications are large and difficult to optimize. In this paper, we improve some earlier results on methods tha...
Article
Integer tuple relations can concisely summarize many types of information gathered from analysis of scientific codes. For example, they can be used to precisely describe which iterations of a statement are data dependent on which other iterations. It is generally not possible to represent these tuple relations by enumerating the related pairs of tu...
Article
Alignment and distribution of data by an optimizing compiler is a dream of both manufacturers and users of parallel computers. The distribution problem has been formulated as an NP-complete graph optimization problem. The graphs arising in applications are large, and the optimization problem does not lend itself to traditional heuristic optimizatio...
Article
A classic problem in computer science is selection: given a list of n numbers, find the ith largest, using the fewest comparisons. We are interested in the exact number of comparisons required for specific small values of i and n. We have written a program that can be used either to find the exact number of comparisons for very low values...
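Consider the smallest nontrivial case beyond the maximum, i = 2. The classical knockout-tournament method finds the second largest of n elements with at most n + ⌈log2 n⌉ − 2 comparisons. It spends n − 1 comparisons to find the winner, then at most ⌈log2 n⌉ − 1 more among the elements that lost directly to the winner. The Java sketch below counts comparisons for that method. It is a textbook upper-bound construction, not the search program described in the abstract, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SecondLargest {
    private int comparisons = 0;

    private boolean greater(int a, int b) {
        comparisons++;
        return a > b;
    }

    int comparisons() {
        return comparisons;
    }

    // Second-largest value of values (length >= 2) via a knockout tournament.
    int find(int[] values) {
        List<Integer> round = new ArrayList<>();
        List<List<Integer>> beaten = new ArrayList<>(); // beaten.get(x): indices that x defeated
        for (int i = 0; i < values.length; i++) {
            round.add(i);
            beaten.add(new ArrayList<>());
        }
        while (round.size() > 1) {
            List<Integer> next = new ArrayList<>();
            for (int p = 0; p + 1 < round.size(); p += 2) {
                int a = round.get(p), b = round.get(p + 1);
                int winner = greater(values[a], values[b]) ? a : b;
                beaten.get(winner).add(winner == a ? b : a);
                next.add(winner);
            }
            if (round.size() % 2 == 1) {
                next.add(round.get(round.size() - 1)); // odd one out gets a bye
            }
            round = next;
        }
        // The second largest can only have lost to the overall winner.
        List<Integer> candidates = beaten.get(round.get(0));
        int best = candidates.get(0);
        for (int i = 1; i < candidates.size(); i++) {
            if (greater(values[candidates.get(i)], values[best])) {
                best = candidates.get(i);
            }
        }
        return values[best];
    }

    public static void main(String[] args) {
        SecondLargest s = new SecondLargest();
        System.out.println(s.find(new int[] {3, 8, 1, 9, 4, 7, 2, 6})); // 8
        System.out.println(s.comparisons()); // 9, i.e. 8 + log2(8) - 2
    }
}
```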
Article
We present a unified framework for applying iteration reordering transformations. This framework is able to represent traditional transformations such as loop interchange, loop skewing and loop distribution as well as compositions of these transformations. Using a unified framework rather than a sequence of ad-hoc transformations makes it easier to...
Conference Paper
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a mapping from the original iteration space to a new iteration space. The framework is des...
Conference Paper
There has been a great amount of recent work toward unifying iteration reordering transformations. Many of these approaches represent transformations as affine mappings from the original iteration space to a new iteration space. These approaches show a great deal of promise, but they all rely on the ability to generate code that iterates over the p...
Article
In previous work, we presented a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, loop skewing and statement reordering. The framework provides a uniform way to represent and reason about transformations. However, it does not provide a way to decide which transformation(s) should be applied to...
Conference Paper
Array data dependence analysis provides important information for optimization of scientific programs. Array dependence testing can be viewed as constraint analysis, although traditionally general-purpose constraint manipulation algorithms have been thought to be too slow for dependence analysis. We have explored the use of exact constraint analysi...
Article
The use of high-level programming constructs (such as recursion, loops, and dynamic data structures) makes it difficult to estimate at compile-time the execution time and resource requirements of a program. We contend that partial evaluation provides a solution to this problem. Given a real-time program employing high level language constructs, part...
Conference Paper
Array data dependence analysis methods currently in use generate false dependences that can prevent useful program transformations. These false dependences arise because the questions asked are conservative approximations to the questions we really should be asking. Unfortunately, the questions we really should be asking go beyond integer programmi...
Conference Paper
This paper explores the use of monads to structure functional programs. No prior knowledge of monads or category theory is required. Monads increase the ease with which programs may be modified. They can mimic the effect of impure features such as ...
Conference Paper
Much recent work in polymorphic programming languages allows subtyping and multiple inheritance for records. In such systems, we would like to extract a field from a record with the same efficiency as if we were not making use of subtyping and multiple inheritance. Methods currently used make field extraction 3-5 times slower, which can produce a s...
Conference Paper
We analyze the implementation of set operations using binary tries. Our techniques are substantially simpler than previous techniques used for this problem, and allow us to analyze not only the expected performance but also the probability distribution of the performance. We show that by making use of constant-time equality tests, we can achieve b...
Article
Augmenting a uniformly balanced data structure (such as a balanced tree) with a cache may be a more effective way to deal with nonuniform query distribution than using an optimally balanced data structure (such as self-adjusting trees). We show how to calculate the cache size required to make the cached uniformly balanced data structure faster for...
Conference Paper
Skip lists are data structures that use probabilistic balancing rather than strictly enforced balancing. The structure of a skip list is determined only by the number of elements in the skip list and the results of consulting the random number generator. Skip lists can be used to perform the same kinds of operations that a balanced tree can perform...
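Here is a compact Java sketch of the idea. It assumes integer keys and the common choice of promoting each node to the next level with probability 1/2; the constants MAX_LEVEL and P and all names are illustrative. Search descends from the highest level, moving right while the next key is smaller. Insertion splices the new node into every level up to its randomly chosen height. There is no rebalancing step: a node's height is fixed by the random number generator when the node is inserted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SkipList {
    private static final int MAX_LEVEL = 16; // illustrative cap on node height
    private static final double P = 0.5;     // illustrative promotion probability

    private static final class Node {
        final int key;
        final Node[] forward; // forward[i] is the next node at level i

        Node(int key, int height) {
            this.key = key;
            this.forward = new Node[height];
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL); // header key is never compared
    private final Random random;
    private int level = 1; // number of levels currently in use

    SkipList(long seed) {
        this.random = new Random(seed);
    }

    // Flip coins: each extra level is added with probability P.
    private int randomLevel() {
        int height = 1;
        while (height < MAX_LEVEL && random.nextDouble() < P) {
            height++;
        }
        return height;
    }

    boolean contains(int key) {
        Node x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.forward[i] != null && x.forward[i].key < key) {
                x = x.forward[i];
            }
        }
        x = x.forward[0];
        return x != null && x.key == key;
    }

    void insert(int key) {
        Node[] update = new Node[MAX_LEVEL]; // last node before key at each level
        Node x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.forward[i] != null && x.forward[i].key < key) {
                x = x.forward[i];
            }
            update[i] = x;
        }
        if (x.forward[0] != null && x.forward[0].key == key) {
            return; // already present
        }
        int height = randomLevel();
        for (int i = level; i < height; i++) {
            update[i] = head; // newly used levels start at the header
        }
        level = Math.max(level, height);
        Node node = new Node(key, height);
        for (int i = 0; i < height; i++) {
            node.forward[i] = update[i].forward[i];
            update[i].forward[i] = node;
        }
    }

    // Keys in ascending order, read off the bottom level.
    List<Integer> keys() {
        List<Integer> out = new ArrayList<>();
        for (Node x = head.forward[0]; x != null; x = x.forward[0]) {
            out.add(x.key);
        }
        return out;
    }
}
```

For example, inserting 30, 10, 50, 20, 40 in any order leaves keys() as [10, 20, 30, 40, 50]. The expected number of nodes visited per search is logarithmic in the number of elements, with no deterministic balancing work.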
Article
The authors present a new paradigm for incremental evaluation based on function caching that works well in some situations for which incremental attribute grammar evaluation and incremental dependency graph evaluation techniques are unusable. The paradigm also works reasonably well in many of those situations best suited for incremental attribute g...
Article
Function caching is the technique of remembering previous function calls and avoiding the cost of recomputing them. Function caching provides a simple way of implementing dynamic programming algorithms and can provide a facility for incremental computation. Previous discussions of function caching have generally relied on the user to purge items fr...
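Here is a minimal Java sketch of function caching. It uses a HashMap as the cache and binomial coefficients as the example function; both choices are illustrative assumptions, not taken from the paper. The recursion follows Pascal's rule directly, and the cache turns it into the usual dynamic-programming algorithm without restructuring the code. Entries are never purged here, which leaves open the cache-management question the abstract raises.

```java
import java.util.HashMap;
import java.util.Map;

class FunctionCache {
    private final Map<Long, Long> cache = new HashMap<>();
    private int computed = 0; // number of (n, k) pairs actually computed

    // Binomial coefficient by Pascal's rule, for 0 <= k <= n. Without the
    // cache, the same (n, k) subproblems are recomputed exponentially often;
    // with it, each distinct pair is computed once.
    long choose(int n, int k) {
        if (k == 0 || k == n) {
            return 1;
        }
        long key = ((long) n << 32) | k; // pack (n, k) into one map key
        Long cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        computed++;
        long result = choose(n - 1, k - 1) + choose(n - 1, k);
        cache.put(key, result); // never evicted in this sketch
        return result;
    }

    int computedCount() {
        return computed;
    }

    public static void main(String[] args) {
        FunctionCache f = new FunctionCache();
        System.out.println(f.choose(30, 15)); // 155117520
    }
}
```

The sketch uses explicit get/put rather than HashMap.computeIfAbsent because the cached function calls itself recursively. Modifying the map from inside computeIfAbsent is not allowed.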
Article
Java's original memory model, which describes how threads interact through memory, was fundamentally flawed. Some language features, like the volatile field modifier, were under-specified: their treatment was so weak as to render them useless. Other features, including reads and writes of ordinary object fields, were unintentionally over-specified:...
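The standard publication idiom shows the practical effect of the revised volatile semantics; the sketch below uses hypothetical names. Under the Java 5.0 memory model, a write to a volatile field happens-before every subsequent read of that field. A reader that observes ready == true is therefore guaranteed to see data == 42. As the abstract notes, the original model specified volatile too weakly to support this kind of reasoning.

```java
class VolatilePublish {
    private int data = 0;                   // ordinary field
    private volatile boolean ready = false; // volatile flag

    void writer() {
        data = 42;    // (1) ordinary write
        ready = true; // (2) volatile write; (1) happens-before (2) by program order
    }

    int reader() {
        while (!ready) {         // (3) volatile read; once it sees (2), (2) happens-before (3)
            Thread.onSpinWait();
        }
        return data;             // so (1) happens-before this read, which returns 42
    }

    static int runOnce() throws InterruptedException {
        VolatilePublish p = new VolatilePublish();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = p.reader());
        Thread writer = new Thread(p::writer);
        reader.start();
        writer.start();
        reader.join(); // join makes seen[0] visible to this thread
        writer.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce()); // 42
    }
}
```

If ready were not volatile, the reader could legally spin forever or observe ready as true while still seeing the stale value 0 for data.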
Article
An action a is described by a tuple ⟨t, k, v, u⟩ comprising: t, the thread performing the action; k, the kind of action: volatile read, volatile write, (normal or non-volatile) read, (normal or non-volatile) write, lock or unlock. Volatile reads, volatile writes, locks and unlocks are synchronization actions. A: a set of actions; →po: program order, whic...
