Rastislav Bodik

Rastislav Bodik
University of Washington Seattle | UW · Department of Computer Science and Engineering

About

142
Publications
8,994
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
8,347
Citations
Citations since 2016
49 Research Items
4038 Citations
20162017201820192020202120220200400600
20162017201820192020202120220200400600
20162017201820192020202120220200400600
20162017201820192020202120220200400600

Publications

Publications (142)
Preprint
Full-text available
Analytical SQL is widely used in modern database applications and data analysis. However, its partitioning and grouping operators are challenging for novice users. Unfortunately, programming by example, shown effective on standard SQL, are less attractive because examples for analytical queries are more laborious to solve by hand. To make demonstra...
Preprint
Full-text available
Modern visualization tools aim to allow data analysts to easily create exploratory visualizations. When the input data layout conforms to the visualization design, users can easily specify visualizations by mapping data columns to visual channels of the design. However, when there is a mismatch between data layout and the design, users need to spen...
Article
Full-text available
Automated verification for program safety is reduced to the discovery safe inductive invariants, i.e., formulas that over-approximate the sets of reachable program states, but precise enough to prove unreachability of the error state. We present a framework, called FreqHorn, that follows the Syntax-Guided Synthesis paradigm to iteratively sample ca...
Article
Halide is a domain-specific language for high-performance image processing and tensor computations, widely adopted in industry. Internally, the Halide compiler relies on a term rewriting system to prove properties of code required for efficient and correct compilation. This rewrite system is a collection of handwritten transformation rules that inc...
Preprint
The shortage of people trained in STEM fields is becoming acute, and universities and colleges are straining to satisfy this demand. In the case of computer science, for instance, the number of US students taking introductory courses has grown three-fold in the past decade. Recently, massive open online courses (MOOCs) have been promoted as a way t...
Preprint
Achieving high-performance GPU kernels requires optimizing algorithm implementations to the targeted GPU architecture. It is of utmost importance to fully use the compute and memory hierarchy, as well as available specialised hardware. Currently, vendor libraries like cuBLAS and cuDNN provide the best performing implementations of GPU algorithms. H...
Article
Full-text available
While visualizations play a crucial role in gaining insights from data, generating useful visualizations from a complex dataset is far from an easy task. In particular, besides understanding the functionality provided by existing visualization libraries, generating the desired visualization also requires reshaping and aggregating the underlying dat...
Preprint
Full-text available
While visualizations play a crucial role in gaining insights from data, generating useful visualizations from a complex dataset is far from an easy task. Besides understanding the functionality provided by existing visualization libraries, generating the desired visualization also requires reshaping and aggregating the underlying data as well as co...
Preprint
Programs that respond to asynchronous events are challenging to write; they are difficult to reason about and tricky to test and debug. Because these programs can have a huge space of possible input timings and interleaving, the programmer may easily miss corner cases. We propose applying synthesis to aid programmers in creating programs more easil...
Conference Paper
Full-text available
Utilizing memory and register bandwidth in modern architectures may require swizzles --- non-trivial mappings of data and computations onto hardware resources --- such as shuffles. We develop Swizzle Inventor to help programmers implement swizzle programs, by writing program sketches that omit swizzles and delegating their creation to an automatic...
Preprint
Smart contracts are programs running on top of blockchain platforms. They interact with each other through well-defined interfaces to perform financial transactions in a distributed system with no trusted third parties. But these interfaces also provide a favorable setting for attackers, who can exploit security vulnerabilities in smart contracts t...
Article
Domain-specific reconfigurable accelerators (DSRAs) achieve high performance and energy efficiency by using specialized processing elements (PEs) instead of general-purpose alternatives. However, the process of designing, selecting, and refining the reconfigurable PEs that compose the accelerator fabric has remained a manual and difficult task. Thi...
Article
Full-text available
The ability to reason about relational queries plays an important role across many types of database applications, such as test data generation, query equivalence checking, and computer-assisted query authoring. Unfortunately, symbolic reasoning about relational queries can be challenging because relational tables are multisets (bags) of tuples, an...
Conference Paper
Programming by Demonstration (PBD) promises to enable data scientists to collect web data. However, in formative interviews with social scientists, we learned that current PBD tools are insufficient for many real-world web scraping tasks. The missing piece is the capability to collect hierarchically-structured data from across many different webpag...
Article
Full-text available
We present a method for automatically discovering signaling pathways from time-resolved phosphoproteomic data. The Temporal Pathway Synthesizer (TPS) algorithm uses constraint-solving techniques first developed in the context of formal verification to explore paths in an interaction network. It systematically eliminates all candidate structures for...
Chapter
Full-text available
We present a fast algorithm for syntax-guided synthesis of inductive invariants which combines enumerative learning with inductive-subset extraction, leverages counterexamples-to-induction and interpolation-based bounded proofs. It is a variant of a recently proposed probabilistic method, called FreqHorn, which is however less dependent on heuristi...
Preprint
Full-text available
Advances in proteomics reveal that pathway databases fail to capture the majority of cellular signaling activity. Our mass spectrometry study of the dynamic epidermal growth factor (EGF) response demonstrates that over 89% of significantly (de)phosphorylated proteins are excluded from individual EGF signaling maps, and 63% are absent from all annot...
Article
With more and more web scripting languages on offer, programmers have access to increasing language support for web scraping tasks. However, in our experiences collaborating with data scientists, we learned that two issues still plague long-running scraping scripts: i) When a network or website goes down mid-scrape, recovery sometimes requires rest...
Conference Paper
Full-text available
We present a new SMT-based, probabilistic, syntax-guided method to discover numerical inductive invariants. The core idea is to initialize frequency distributions from the program's source code, then repeatedly sample lemmas from those distributions, and terminate when the conjunction of learned lemmas becomes a safe invariant. The sampling process...
Article
When designing a type system, we may want to mechanically check the design to guide its further development. We describe algorithms that perform symbolic reasoning about executable models of type systems. The algorithms support three queries. First, they check type soundness and synthesize a counterexample program if such a soundness bug is found....
Conference Paper
Parallelizing of software improves its effectiveness and productivity. To guarantee correctness, the parallel and serial versions of the same code must be formally verified to be equivalent. We present a novel approach, called GRASSP, that automatically synthesizes parallel single-pass array-processing programs by treating the given serial versions...
Conference Paper
SQL is the de facto language for manipulating relational data. Though powerful, many users find it difficult to write SQL queries due to highly expressive constructs. While using the programming-by-example paradigm to help users write SQL queries is an attractive proposition, as evidenced by online help forums such as Stack Overflow, developing tec...
Article
Parallelizing of software improves its effectiveness and productivity. To guarantee correctness, the parallel and serial versions of the same code must be formally verified to be equivalent. We present a novel approach, called GRASSP, that automatically synthesizes parallel single-pass array-processing programs by treating the given serial versions...
Article
SQL is the de facto language for manipulating relational data. Though powerful, many users find it difficult to write SQL queries due to highly expressive constructs. While using the programming-by-example paradigm to help users write SQL queries is an attractive proposition, as evidenced by online help forums such as Stack Overflow, developing tec...
Conference Paper
This demo showcases Scythe, a novel query-by-example system that can synthesize expressive SQL queries from input-output examples. Scythe is designed to help end-users program SQL and explore data simply using input-output examples. From a web-browser, users can obtain SQL queries with Scythe in an automated, interactive fashion: from a provided ex...
Article
Full-text available
We present GraSSP, a novel approach to perform automated parallelization relying on recent advances in formal verification and synthesis. GraSSP augments an existing sequential program with an additional functionality to decompose data dependencies in loop iterations, to compute partial results, and to compose them together. We show that for some c...
Conference Paper
With increasing amounts of data available on the web and a diverse range of users interested in programmatically accessing that data, web automation must become easier. Automation helps users complete many tedious interactions, such as scraping data, completing forms, or transferring data between websites. However, writing web automation scripts ty...
Article
With increasing amounts of data available on the web and a diverse range of users interested in programmatically accessing that data, web automation must become easier. Automation helps users complete many tedious interactions, such as scraping data, completing forms, or transferring data between websites. However, writing web automation scripts ty...
Conference Paper
Developing a code optimizer is challenging, especially for new, idiosyncratic ISAs. Superoptimization can, in principle, discover machine-specific optimizations automatically by searching the space of all instruction sequences. If we can increase the size of code fragments a superoptimizer can optimize, we will be able to discover more optimization...
Article
A probabilistic program defines a probability measure over its semantic structures. One common goal of probabilistic programming languages (PPLs) is to compute posterior probabilities for arbitrary models and queries, given observed evidence, using a generic inference engine. Most PPL inference engines---even the compiled ones---incur significant r...
Conference Paper
Energy efficiency is one of the main performance goals when designing processors for embedded systems. Typically, the simpler the processor, the less energy it consumes. Thus, an ultra-low power multicore processor will, likely have very small distributed memory with a simple interconnect. To compile for such an architecture, a partitioning strateg...
Article
Energy efficiency is one of the main performance goals when designing processors for embedded systems. Typically, the simpler the processor, the less energy it consumes. Thus, an ultra-low power multicore processor will, likely have very small distributed memory with a simple interconnect. To compile for such an architecture, a partitioning strateg...
Article
Inference algorithms in probabilistic programming languages (PPLs) can be thought of as interpreters, since an inference algorithm traverses a model given evidence to answer a query. As with interpreters, we can improve the efficiency of inference algorithms by compiling them once the model, evidence and query are known. We present SIMPL, a domain...
Article
Developing a code optimizer is challenging, especially for new, idiosyncratic ISAs. Superoptimization can, in principle, discover machine-specific optimizations automatically by searching the space of all instruction sequences. If we can increase the size of code fragments a superoptimizer can optimize, we will be able to discover more optimization...
Article
Developing a code optimizer is challenging, especially for new, idiosyncratic ISAs. Superoptimization can, in principle, discover machine-specific optimizations automatically by searching the space of all instruction sequences. If we can increase the size of code fragments a superoptimizer can optimize, we will be able to discover more optimization...
Article
Developing a code optimizer is challenging, especially for new, idiosyncratic ISAs. Superoptimization can, in principle, discover machine-specific optimizations automatically by searching the space of all instruction sequences. If we can increase the size of code fragments a superoptimizer can optimize, we will be able to discover more optimization...
Conference Paper
Developing an optimizing compiler backend remains a laborious process, especially for nontraditional ISAs that have been appearing recently. Superoptimization sidesteps the need for many code transformations by searching for the most optimal instruction sequence semantically equivalent to the original code fragment. Even though superoptimization di...
Article
We present an algorithm for synthesizing efficient document layout engines from compact relational specifications. These specifications are compact in that a single specification can produce multiple engines, each for a distinct layout situation, i.e., a different combination of known vs. unknown attributes. Technically, our specifications are rela...
Conference Paper
Syntax-guided synthesis searches for an implementation of a given specification by exploring large spaces of candidate programs. Sketches reduce these search spaces, making synthesis more tractable, by predefining the structure of the desired implementation. Typically, this structure is obtained through human insight---this paper introduces a metho...
Article
Program synthesis is the contemporary answer to automatic programming. It innovates in two ways: First, it replaces batch automation with interactivity, assisting the programmer in refining the understanding of the programming problem. Second, it produces programs using search in a candidate space rather than by derivation from a specification. Sea...
Article
We describe NIMBLE, a system for programming statistical algorithms within R for general model structures. NIMBLE is designed to meet three challenges: flexible model specification, a language for programming algorithms that can use different models, and a balance between high-level programmability and execution efficiency. For model specification,...
Conference Paper
To build a programming by demonstration (PBD) web scraping tool for end users, one needs two central components: a list finder, and a record and replay tool. A list finder extracts logical tables from a webpage. A record and replay (R+R) system records a user's interactions with a webpage, and replays them programmatically. The research community h...
Article
The classical formulation of the program-synthesis problem is to find a program that meets a correctness specification given as a logical formula. Recent work on program synthesis and program optimization illustrates many potential benefits of allowing the user to supplement the logical specification with a syntactic template that constrains the sp...
Article
Full-text available
Over the last decade, executable models of biological behaviors have repeatedly provided new scientific discoveries, uncovered novel insights, and directed new experimental avenues. These models are computer programs whose execution mechanistically simulates aspects of the cell's behaviors. If the observed behavior of the program agrees with the ob...
Conference Paper
Background/Question/Methods Bayesian analysis in ecology has exploded in popularity, largely facilitated by the availability of the BUGS language for declaring models. WinBUGS, OpenBUGS and JAGS each implement BUGS and provide automated MCMC algorithms that can lead to long run times for complicated models. However, MCMC research has led to many i...
Conference Paper
Full-text available
We developed Chlorophyll, a synthesis-aided programming model and compiler for the GreenArrays GA144, an extremely minimalist low-power spatial architecture that requires partitioning the program into fragments of no more than 256 instructions and 64 words of data. This processor is 100-times more energy efficient than its competitors, but currentl...
Article
Solver-aided domain-specific languages (SDSLs) are an emerging class of computer-aided programming systems. They ease the construction of programs by using satisfiability solvers to automate tasks such as verification, debugging, synthesis, and non-deterministic execution. But reducing programming tasks to satisfiability problems involves translati...
Article
We developed Chlorophyll, a synthesis-aided programming model and compiler for the GreenArrays GA144, an extremely minimalist low-power spatial architecture that requires partitioning the program into fragments of no more than 256 instructions and 64 words of data. This processor is 100-times more energy efficient than its competitors, but currentl...
Article
Full-text available
There are many tools that help programmers find code fragments, but most are inexpressive and rely on static information. We present a new technique for synthesizing code that is dynamic (giving accurate results and allowing programmers to reason about concrete executions), easy-to-use (supporting a wide range of correctness specifications), and in...
Conference Paper
SAT and SMT solvers have automated a spectrum of programming tasks, including program synthesis, code checking, bug localization, program repair, and programming with oracles. In principle, we obtain all these benefits by translating the program (once) to a constraint system understood by the solver. In practice, however, compiling a language to lo...
Conference Paper
A good model of a biological cell exposes secrets of the cell's signaling mechanisms, explaining diseases and facilitating drug discovery. Modeling cells is fundamentally a programming problem - it's programming because the model is a concurrent program that simulates the cell, and it's a problem because it is hard to write a program that reproduce...
Conference Paper
The classical formulation of the program-synthesis problem is to find a program that meets a correctness specification given as a logical formula. Recent work on program synthesis and program optimization illustrates many potential benefits of allowing the user to supplement the logical specification with a syntactic template that constrains the sp...
Article
Program synthesis is a process of producing an executable program from a specification. Algorithmic synthesis produces the program automatically, without an intervention from an expert. While classical compilation falls under the definition of algorithmic program synthesis, with the source program being the specification, the synthesis literature i...
Conference Paper
We examine how to synthesize a parallel schedule of structured traversals over trees. In our system, programs are declaratively specified as attribute grammars. Our synthesizer automatically, correctly, and quickly schedules the attribute grammar as a composition of parallel tree traversals. Our downstream compiler optimizes for GPUs and multicore...
Conference Paper
Executable biology presents new challenges to formal methods. This paper addresses two problems that cell biologists face when developing formally analyzable models. First, we show how to automatically synthesize a concurrent in-silico model for cell development given in-vivo experiments of how particular mutations influence the experiment outcome....
Article
Executable biology presents new challenges to formal methods. This paper addresses two problems that cell biologists face when developing formally analyzable models. First, we show how to automatically synthesize a concurrent in-silico model for cell development given in-vivo experiments of how particular mutations influence the experiment outcome....
Conference Paper
Classical synthesis derives programs from a specification. We show an alternative approach where programs are obtained through search in a space of candidate programs. Searching for a program that meets a specification frees us from having to develop a sufficiently complete set of derivation rules, a task that is more challenging than merely descri...
Conference Paper
To solve a problem with a dynamic programming algorithm, one must reformulate the problem such that its solution can be formed from solutions to overlapping subproblems. Because overlapping subproblems may not be apparent in the specification, it is desirable to obtain the algorithm directly from the specification. We describe a semi-automatic synt...
Conference Paper
We show that program synthesis can generate GPU algorithms as well as their optimized implementations. Using the scan kernel as a case study, we describe our evolving synthesis techniques. Relying on our synthesizer, we can parallelize a serial problem by transforming it into a scan operation, synthesize a SIMD scan algorithm, and optimize it to re...
Conference Paper
Software ships with known bugs because it is expensive to pinpoint and fix the bug exposed by a failing test. To reduce the cost of bug identification, we locate expressions that are likely causes of bugs and thus candidates for repair. Our symbolic method approximates an ideal approach to fixing bugs mechanically, which is to search the space of a...
Article
Full-text available
Sparse matrix formats are typically implemented with low-level imperative programs. The optimized nature of these implementations hides the structural organization of the sparse format and complicates its verification. We define a variable-free functional language (LL) in which even advanced formats can be expressed naturally, as a pipeline-style c...
Conference Paper
Angelic nondeterminism can play an important role in program development. It simplifies specifications, for example in deriving programs with a refinement calculus; it is the formal basis of regular expressions; and Floyd relied on it to concisely express backtracking algorithms such as N-queens. We show that angelic nondeterminism is also useful d...
Article
Angelic nondeterminism can play an important role in program development. It simplifies specifications, for example in deriving programs with a refinement calculus; it is the formal basis of regular expressions; and Floyd relied on it to concisely express backtracking algorithms such as N-queens. We show that angelic nondeterminism is also useful d...
Conference Paper
The web browser is a CPU-intensive program. Especially on mobile devices, webpages load too slowly, expending sig- nicant time in processing a document's appearance. Due to power constraints, most hardware-driven speedups will come in the form of parallel architectures. This is also true of mobile devices such as phones and e-books. In this pa- per...
Conference Paper
Full-text available
Statement st transitively depends on statement stseed if the execution of stseed may affect the execution of st. Computing transitive program dependences is a fundamental operation in many automatic software analysis tools. Existing tools find it challenging to compute transitive dependences for programs manipulating large aggregate structure varia...