ABSTRACT: Over the last five years, graphics cards have become a tempting target for scientific computing, thanks to unrivaled peak performance, often producing a runtime speed-up of x10 to x25 over comparable CPU solutions. However, this increase can be difficult to achieve, and doing so often requires a fundamental rethink. This is especially problematic in scientific computing, where experts do not want to learn yet another architecture. In this paper we develop a method for automatically parallelising recursive functions of the sort found in scientific papers. Using a static analysis of the function dependencies we identify sets - partitions - of independent elements, which we use to synthesise an efficient GPU implementation using polyhedral code generation techniques. We then augment our language with DSL extensions to support a wider variety of applications, and demonstrate the effectiveness of this with three case studies, showing significant performance improvement over equivalent CPU methods, and similar efficiency to hand-tuned GPU implementations.
Full-text · Article · Jun 2012 · ACM SIGPLAN Notices
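To make the idea of partitions of independent elements concrete, here is a minimal sketch on a textbook recurrence, edit distance. It is not the paper's analysis or code generator; the function name and the choice of recurrence are illustrative assumptions. Each cell (i, j) depends only on cells with a smaller i + j, so all cells on one anti-diagonal are mutually independent and could be evaluated in parallel.

```python
def edit_distance_by_partitions(a: str, b: str) -> int:
    """Edit distance, evaluated one anti-diagonal (partition) at a time."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    # Partition k holds every cell (i, j) with i + j == k.
    for k in range(n + m + 1):
        # Cells inside one partition read only partitions k-1 and k-2,
        # so this inner loop is the part a GPU could run in parallel.
        for i in range(max(0, k - m), min(n, k) + 1):
            j = k - i
            if i == 0:
                d[i][j] = j
            elif j == 0:
                d[i][j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + cost)  # substitution
    return d[n][m]
```

The inner loop runs sequentially here; the point is only that its iterations never read each other's results.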
ABSTRACT: Dynamic programming is usually regarded as a design technique, where each application is designed as an individual program. This contrasts with other techniques such as linear programming, where there exists a single generic program that solves all instances. From a software engineering perspective, the lack of a generic solution to dynamic programming is somewhat unsatisfactory. It would be much preferable if dynamic programming could be understood as a software component, where the ideas common to all its applications are explicit in shared code. In this paper, we argue that such a component does indeed exist, at least for a large class of applications in which the decision process is a sequential scan of the input sequence. We also assess the suitability of C++ for expressing this type of generic program, and argue that the simplicity offered by lazy functional programming is preferable. In particular, functional programs can be manipulated as algebraic expressions. The paper does not present any novel results: it is an introduction to recent work on the formalisation of algorithmic paradigms in software engineering.
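As a rough illustration of dynamic programming as a reusable component for sequential scans, the sketch below factors the shared loop into one function and leaves the problem-specific parts as parameters. The names `sequential_dp`, `extend`, `key` and `better` are hypothetical, and the paper itself discusses C++ and lazy functional languages rather than Python.

```python
def sequential_dp(items, initial, extend, key, better):
    """Scan `items` left to right, keeping one best partial solution per key."""
    best = {key(initial): initial}
    for x in items:
        nxt = {}
        for s in best.values():
            for t in extend(s, x):      # every decision available at item x
                k = key(t)
                if k not in nxt or better(t, nxt[k]):
                    nxt[k] = t          # discard dominated partial solutions
        best = nxt
    return list(best.values())


def knapsack(items, capacity):
    """0/1 knapsack as one instance; states are (weight, value) pairs."""
    def extend(state, item):
        weight, value = state
        w, v = item
        yield state                     # decision: skip the item
        if weight + w <= capacity:
            yield (weight + w, value + v)   # decision: take the item

    finals = sequential_dp(items, (0, 0), extend,
                           key=lambda s: s[0],
                           better=lambda s, t: s[1] > t[1])
    return max(value for _, value in finals)
```

Only `extend`, `key` and `better` change from one problem to the next; the scan itself is shared.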
ABSTRACT: When answering many reachability queries on a large graph, the principal challenge is to represent the transitive closure of the graph compactly, while still allowing fast membership tests on that transitive closure. Recent attempts to address this problem are complex data structures and algorithms such as Path-Tree and 3-HOP. We propose a simple alternative based on a novel form of bit-vector compression. Our starting point is the observation that when computing the transitive closure, reachable vertices tend to cluster together. We adapt the well-known scheme of word-aligned hybrid compression (WAH) to work more efficiently by introducing word partitions. We prove that the resulting scheme leads to a more compact data structure than its closest competitor, namely interval lists. In extensive and detailed experiments, this is confirmed in practice. We also demonstrate that the new technique can handle much larger graphs than alternative algorithms.
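For readers unfamiliar with word-aligned hybrid compression, the following is a minimal sketch of plain WAH with 32-bit words, assuming a bit vector that marks which vertices are reachable from one source. It does not implement the word partitions the paper introduces, and the function names are illustrative. A literal word carries 31 raw bits; a fill word has its top bit set, then the fill value, then a 30-bit count of identical 31-bit groups.

```python
GROUP = 31               # payload bits carried by each 32-bit word
FILL_FLAG = 1 << 31      # top bit set marks a fill word
MAX_RUN = (1 << 30) - 1  # largest run length a fill word can hold


def wah_encode(bits):
    """Compress a list of 0/1 values into a list of 32-bit words."""
    padded = bits + [0] * (-len(bits) % GROUP)  # pad the tail with zeros
    words = []
    for start in range(0, len(padded), GROUP):
        value = 0
        for bit in padded[start:start + GROUP]:
            value = (value << 1) | bit
        if value in (0, (1 << GROUP) - 1):      # all zeros or all ones
            header = FILL_FLAG | ((1 if value else 0) << 30)
            prev = words[-1] if words else 0
            if (prev & ~MAX_RUN) == header and (prev & MAX_RUN) < MAX_RUN:
                words[-1] += 1                   # extend the current run
            else:
                words.append(header | 1)         # start a new run
        else:
            words.append(value)                  # literal word
    return words


def wah_contains(words, pos):
    """Membership test: is bit `pos` set in the compressed vector?"""
    for w in words:
        if w & FILL_FLAG:
            span = (w & MAX_RUN) * GROUP
            if pos < span:
                return bool((w >> 30) & 1)
        else:
            span = GROUP
            if pos < span:
                return bool((w >> (GROUP - 1 - pos)) & 1)
        pos -= span
    return False
```

Clustered reachable vertices produce long runs of identical groups, which is where the fill words save space.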
ABSTRACT: Modern IDEs for object-oriented languages like Java provide support for a basic set of simple automated refactorings whose behaviour is easy to describe intuitively. It is, however, surprisingly difficult to specify their behaviour in detail. In particular, the popular precondition-based approach tends to produce somewhat unwieldy descriptions if advanced features of the object language are taken into account. This has resulted in refactoring implementations that are complex, hard to understand, and even harder to maintain, yet these implementations themselves are the only precise "specification" of many refactorings. We have in past work advocated a different approach based on several complementary notions of dependencies that guide the implementation, and on the concept of microrefactorings that structure it. We show in this work that these concepts are powerful enough to provide high-level specifications of many of the refactorings implemented in Eclipse. These specifications are precise enough to serve as the basis of a clean-room reimplementation of these refactorings that is very compact, yet matches Eclipse's for features and outperforms it in terms of correctness.
ABSTRACT: Type inference for Datalog can be understood as the problem of mapping programs to a sublanguage for which containment is decidable. To wit, given a program in Datalog, a schema describing the types of extensional relations, and a user-supplied set of facts about the basic types (stating conditions such as disjointness, implication or equivalence), we aim to infer an over-approximation of the semantics of the program, which should be expressible in a suitable sublanguage of Datalog. We argue that Datalog with monadic extensionals is an appropriate choice for that sublanguage of types, and we present an inference algorithm. The inference algorithm is proved sound, and we also show that it infers the tightest possible over-approximation for a large class of Datalog programs. Furthermore, we present a practical containment check for a large subset of our type language. The crux of that containment check is a novel generalisation of Quine's procedure for computing prime implicants. The type system has been implemented in a state-of-the-art industrial database system, and we report on experiments with this implementation.
ABSTRACT: Refactoring tools allow the programmer to pretend they are working with a richer language where the behaviour of a program is automatically preserved during restructuring. In this paper we show that this metaphor of an extended language yields a very general and useful implementation technique for refactorings: a refactoring is implemented by embedding the source program into an extended language on which the refactoring operations are easier to perform, and then translating the refactored program back into the original language. Using the well-known Extract Method refactoring as an example, we show that this approach allows a very fine-grained decomposition of the overall refactoring into a series of micro-refactorings that can be understood, implemented, and tested independently. We thus can easily write implementations of complex refactorings that rival and even outperform industrial-strength refactoring tools in terms of correctness, but are much shorter and easier to understand.
ABSTRACT: Automated refactoring tools are an essential part of a software developer's toolbox. They are most useful for gradually improving large existing code bases and it is essential that they work reliably, since even a simple refactoring may affect many different parts of a program, and the programmer should not have to inspect every individual change to ensure that the transformation went as expected. Even extensively tested industrial-strength refactoring engines, however, are fraught with many bugs that lead to incorrect, non-behaviour preserving transformations. We argue that software refactoring tools are a prime candidate for mechanical verification, offering significant challenges but also the prospect of tangible benefits for real-world software development.
ABSTRACT: Reference attribute grammars are a powerful formalism for concisely specifying and implementing static analyses. While they have proven their merit in practical applications, no attempt has so far been made to rigorously verify correctness properties of the resulting systems. We present a general method for formalising reference attribute grammars in the theorem prover Coq. The formalisation is supported by tools for generating standard definitions from an abstract description and custom proof tactics to help automate verification. As a small but typical application, we show how closure analysis for the untyped lambda calculus can easily be implemented and proved correct with respect to an operational semantics. To evaluate the feasibility of our approach on larger systems, we implement name lookup for a naming core calculus of Java and give a formal correctness proof of the centrepiece of a rename refactoring for this language.
ABSTRACT: Inference of static types for local variables in Java bytecode is the first step of any serious tool that manipulates bytecode, be it for decompilation, transformation or analysis. It is important, therefore, to perform that step as accurately and efficiently as possible. Previous work has sought to give solutions with good worst-case complexity. We present a novel algorithm, which is optimised for the common case rather than worst-case performance. It works by first finding a set of minimal typings that are valid for all assignments, and then checking whether these minimal typings satisfy all uses. Unlike previous algorithms, it does not explicitly build a data structure of type constraints, and it is easy to implement efficiently. We prove that the algorithm produces a typing that is both sound (obeying the rules of the language) and as tight as possible. We then go on to present extensive experiments, comparing the results of the new algorithm against the previously best known method. The experiments include bytecode that is generated in other ways than compilation of Java source. The new algorithm is always faster, typically by a factor of 6, but on some real benchmarks the gain is as high as a factor of 92. Furthermore, whereas that previous method is sometimes suboptimal, our algorithm always returns a tightest possible type. We also discuss in detail how we handle primitive types, which is a difficult issue due to the discrepancy in their treatment between Java bytecode and Java source. For the application to decompilation, however, it is very important to handle this correctly.
ABSTRACT: Descriptive names are crucial to understand code. However, good names are notoriously hard to choose and manually changing a globally visible name can be a maintenance nightmare. Hence, tool support for automated renaming is an essential aid for developers and widely supported by popular development environments. This work improves on two limitations in current refactoring tools: too weak preconditions that lead to unsoundness where names do not bind to the correct declarations after renaming, and too strong preconditions that prevent renaming of certain programs. We identify two main reasons for unsoundness: complex name lookup rules make it hard to define sufficient preconditions, and new language features require additional preconditions. We alleviate both problems by presenting a novel extensible technique for creating symbolic names that are guaranteed to bind to a desired entity in a particular context by inverting lookup functions. The inverted lookup functions can then be tailored to create qualified names where otherwise a conflict would occur, allowing the refactoring to proceed and improve on the problem with too strong preconditions. We have implemented renaming for Java as an extension to the JastAdd Extensible Java Compiler and integrated it in Eclipse. We show examples for which other refactoring engines have too weak preconditions, as well as examples where our approach succeeds in renaming entities by inserting qualifications. To validate the extensibility of the approach we have implemented renaming support for Java 5 and AspectJ-like inter-type declarations as modular extensions to the initial Java 1.4 refactoring engine. The renaming engine is only a few thousand lines of code including extensions and performance is on par with industrial-strength refactoring tools.
ABSTRACT: The magic-sets transformation is a useful technique for dramatically improving the performance of complex queries, but it has been observed that this transformation can also drastically reduce the performance of some queries. Successful implementations of magic in previous work require integration with the database optimiser to make appropriate decisions to guide the transformation (the sideways information passing strategy, or SIPS). This paper reports on the addition of the magic-sets transformation to a fully automatic optimising compiler from Datalog to SQL with no support from the database optimiser. We present an algorithm for making a good choice of SIPS using heuristics based on the sizes of relations. To achieve this, we define an abstract interpretation of Datalog programs to estimate the sizes of relations in the program. The effectiveness of our technique is evaluated over a substantial set of over a hundred queries, and in the context of the other optimisations performed by our compiler. It is shown that using the SIPS chosen by our algorithm, query performance is often significantly improved, as expected, but more importantly performance is never significantly degraded on queries that cannot benefit from magic.
ABSTRACT: A trace monitor observes an execution trace at runtime; when it recognises a specified sequence of events, the monitor runs extra code. In the aspect-oriented programming community, the idea originated as a generalisation of the advice-trigger mechanism: instead of matching on single events (joinpoints), one matches on a sequence of events. The runtime verification community has been investigating similar mechanisms for a number of years, specifying the event patterns in terms of temporal logic, and applying the monitors to hardware and software. In recent years trace monitors have been adapted for use with mainstream object-oriented languages. In this setting, a crucial feature is to allow the programmer to quantify over groups of related objects when expressing the sequence of events to match. While many language proposals exist for allowing such features, until now no implementation had scalable performance: execution on all but very simple examples was infeasible. This paper rectifies that situation, by identifying two optimisations for generating feasible trace monitors from declarative specifications of the relevant event pattern. We restrict ourselves to optimisations that do not have a significant impact on compile-time: they only analyse the event pattern, and not the monitored code itself. The first optimisation is an important improvement over an earlier proposal in (2) to avoid space leaks. The second optimisation is a form of indexing for partial matches. Such indexing needs to be very carefully designed to avoid introducing new space leaks, and the resulting data structure is highly non-trivial.
ABSTRACT: Program slicing is a program-reduction technique for extracting statements that may influence other statements. While there exist efficient algorithms to slice sequential programs precisely, there are only two algorithms for precise slicing of concurrent ...
ABSTRACT: Many tasks in source code analysis can be viewed as evaluating queries over a relational representation of the code. Here we present an object-oriented query language, named .QL, and demonstrate its use for general navigation, bug finding and enforcing coding conventions. We then focus on the particular problem of specifying metrics as queries.
ABSTRACT: WRT'07 was the first instance of the Workshop on Refactoring Tools. It was held in Berlin, Germany, on July 31st, in conjunction with ECOOP'07. The workshop brought together over 50 participants from both academia and industry. Participants included the lead developers of two widely used refactoring engines (Eclipse and NetBeans), researchers who work on refactoring tools and techniques, and others generally interested in refactoring. WRT'07 accepted 32 submissions; however, it was impossible to present all these submissions in one single day. Instead, in the morning session we started with a few technical presentations, followed by large group discussions around noon, a poster session and small group discussions in the afternoon. WRT'07 ended with a retrospective session and unanimous consensus to organize another session in the future.
ABSTRACT: Trace monitor specifications consist of a pattern that is matched against the trace of events of a subject system. We investigate the design choices in defining the semantics of matching patterns against traces.
Some systems use an exact-match semantics (where every relevant event must be matched by the pattern), while others employ a skipping semantics (which allows any event to be skipped during matching). The semantics of exact-match is well established; here we give a semantics to skipping by providing a translation to exact-match. It turns out the translation is not surjective: a pattern language with skipping semantics is strictly less expressive than one with exact-match semantics. That proof suggests the addition of a novel operator to a skipping language that makes it equivalent to exact-match.
Another design decision concerns the atoms in patterns: are these unique runtime events, or can multiple atoms match the same runtime event? Many implementations have chosen predicates for atoms, and then overlap is natural. There are some exceptions, however, and we examine the consequences of that design choice in some depth.
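The difference between the two semantics can be sketched for the simplest case, a pattern that is a plain sequence of atoms. The code below is an illustrative assumption, not the paper's translation: it models each event as one character, uses Python regular expressions as the exact-match language, and lets skipping happen only between atoms.

```python
import re


def exact_match(atoms, trace):
    """Exact-match: every event in the trace must be matched by the pattern."""
    pattern = "".join(re.escape(a) for a in atoms)
    return re.fullmatch(pattern, trace) is not None


def skipping_as_exact(atoms):
    """Translate a skipping sequence pattern into an exact-match pattern by
    allowing any run of events between consecutive atoms."""
    return ".*".join(re.escape(a) for a in atoms)


def skipping_match(atoms, trace):
    """Skipping: events between the atoms may be ignored."""
    return re.fullmatch(skipping_as_exact(atoms), trace) is not None
```

For the atoms `a`, `b` and the trace `acb`, `exact_match` rejects the trace because `c` is unmatched, while `skipping_match` accepts it.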
ABSTRACT: In aspect-oriented programming, one can intercept events by writing patterns called pointcuts. The pointcut language of the most popular aspect-oriented programming language, AspectJ, allows the expression of highly complex properties of the static program structure. We present the first rigorous semantics of the AspectJ pointcut language, by translating static patterns into safe (i.e. range-restricted and stratified) Datalog queries. Safe Datalog is a logic language like Prolog, but it does not have data structures; consequently it has a straightforward least fixpoint semantics and all queries terminate. The translation from pointcuts to safe Datalog consists of a set of simple conditional rewrite rules, implemented using the Stratego system. The resulting queries are themselves executable with the CodeQuest system. We present experiments indicating that direct execution of our semantics is not prohibitively expensive.
ABSTRACT: Code queries are useful for enforcing coding conventions, navigating a large code base, and for identifying locations to refactor. The program understanding community has long advocated the use of a relational database to facilitate such code queries [3, 9]. While the idea has found some uptake in industry [2, 11], relational queries over code have not yet found widespread use.
ABSTRACT: Navigate code, find bugs, compute metrics, check style rules, and enforce coding conventions in Eclipse with SemmleCode. SemmleCode is a new free Eclipse plugin that allows you to phrase these tasks as queries over the codebase; it thus takes the search facilities in Eclipse to a whole new level. A large library of queries for common operations is provided, including metrics and Java EE style rules. Query results can be displayed as a tree view, a table view, in the problem view, or as charts or graphs, all with links to the source code.
ABSTRACT: These notes are an introduction to .QL, an object-oriented query language for any type of structured data. We illustrate the use of .QL in assessing software quality, namely to find bugs, to compute metrics and to enforce coding conventions. The class mechanism of .QL is discussed in depth, and we demonstrate how it can be used to build libraries of reusable queries.