
Torsten Grust · University of Tübingen | EKU Tübingen · Department of Computer Science
Professor
125
Publications
13,650
Reads
2,726
Citations
Additional affiliations
September 2008 - present
July 2005 - August 2008
April 2004 - June 2005
Publications (125)
SQL database systems support user-defined functions (UDFs), but they hardly encourage programming with these functions. Quite the contrary: the systems’ focus on plan-based query evaluation penalizes every function call at runtime, rendering programming with UDFs—especially if these are recursive—largely impractical. We propose to take UDFs for wha...
We report on the conversion of two advanced database courses from their classical in-lecture-hall setup into an all-digital remote format that was delivered via YouTube. While the course contents were not turned on their heads, throughout the semester we adopted a video style that has been popularized by the live coding community. This new focus o...
"PL/SQL functions are slow," is common developer wisdom that derives from the tension between set-oriented SQL evaluation and statement-by-statement PL/SQL interpretation. We pursue the radical approach of compiling PL/SQL away, turning interpreted functions into regular subqueries that can then be efficiently evaluated together with their embracin...
We demonstrate how to use PostgreSQL's planner hook to open a side entrance through which we can pass plan trees for immediate execution. Since this reaches deep into PostgreSQL, we implement plan detail inference and decoration to ensure that externally crafted trees perfectly mimic regular plans. Plan trees may then (1) be generated by external c...
SQL declaratively specifies what the desired output of a query is. This work shows that a non-standard interpretation of the SQL semantics can, instead, disclose where a piece of the output originated in the input and why that piece found its way into the result. We derive such data provenance for very rich SQL dialects---including recursion, windo...
SQL declaratively specifies what (not how) the desired output of a query is. This work shows that a non-standard interpretation of the SQL semantics can, instead, disclose where a piece of the output originated in the input and why that piece found its way into the result. We derive such data provenance for very rich SQL dialects---includi...
Ideas from programming languages play an important role in a range of advanced applications of databases, in database system implementation, distributed programming (MapReduce), streaming computation, and high-performance (GPU/multicore) computation. This creative research area is broadening into a subfield of data-centric computation. Although the...
Recent years have seen a considerable increase in informal educational environments complementing formal educational settings such as schools. In this chapter, we will report results on the efficacy of a web-platform for game-based learning of orthography and numeracy. Besides the behavioral assessment of the platform, we focused specifically on ne...
XQuery has an order-sensitive semantics in the sense that it requires nodes to be sorted in document order without duplicates (or in Distinct Document Order, DDO for short). This paper shows that for a given XQuery expression and a nested-relational DTD, the input expression can be transformed into an expression that can be evaluated without---pote...
For more than twenty years now, I have been roaming the frontier between the database query and programming language fields. During all that time, one trusty companion has never let me down: the comprehension. Its elegant, concise syntactic form and flexible semantics render it one of the most versatile tools in query and collection processing, analysis, t...
We demonstrate how the compilation of SQL expressions into machine code leads to significant query runtime improvements in PostgreSQL 9. Our primary goal is to connect recent research in query code generation with one of the most widely deployed database engines. The approach calls on LLVM to translate arithmetic and filter expressions into native...
Abstract. Digital media have not only conquered children's bedrooms but have also become an integral part of modern schooling. The use of online learning environments and games, inside and outside of educational contexts, makes it possible to convey even traditional learning content playfully and independently of place and time....
We demonstrate the derivation of fine-grained where- and why-provenance for a rich dialect of SQL that includes recursion, (correlated) subqueries, windows, grouping/aggregation, and the RDBMS's library of built-in functions. The approach relies on ideas that originate in the programming language community---program slicing and abstract interpretat...
We demonstrate a new incarnation of Habitat, an observational debugger for SQL. In observational debugging, users highlight parts of a presumably faulty query to observe the evaluation of SQL subexpressions and learn about the query's actual runtime behavior. The present version of Habitat has been redesigned from scratch and employs a query instru...
We demonstrate the insides and outs of a query compiler based on the flattening transformation, a translation technique designed by the programming language community to derive efficient data-parallel implementations from iterative programs. Flattening admits the straightforward formulation of intricate query logic including deeply nested loops ove...
Numeracy and literacy are key competencies in modern knowledge societies, with insufficient skills resulting in severe disadvantages on the individual and the societal level. Therefore, we are in urgent need of effective prevention and remediation programs corroborating numeracy and literacy. In the present article we describe the development and f...
We demonstrate the derivation of fine-grained where- and why-provenance for a rich dialect of SQL that includes recursion, (correlated) subqueries, windows, grouping/aggregation, and the RDBMS's library of built-in functions. The approach relies on ideas that originate in the programming language community---program slicing and abstract interpretation...
We demonstrate a full-fledged implementation of first-class functions for the widely used PL/SQL database programming language. Functions are treated as regular data items that may be (1) constructed at query runtime, (2) stored in and retrieved from tables, (3) assigned to variables, and (4) passed to and from other (higher-order) functions. The r...
In this technical report we propose algorithms for implementing the axes for element nodes in XPath given a DOM-like representation of the document. First, we construct algorithms for evaluating simple step expressions, without any (positional) predicates. The time complexity of these algorithms is at most O(l + m) where l is the size of...
We describe Query Defunctionalization which enables off-the-shelf first-order database engines to process queries over first-class functions. Support for first-class functions is characterized by the ability to treat functions like regular data items that can be constructed at query runtime, passed to or returned from other (higher-order) functions...
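The general idea of defunctionalization can be illustrated with a minimal Python sketch. This is not the translation described in the paper; it only shows the classic technique of replacing function values by tagged records plus a first-order dispatcher. All names (`make_add`, `make_scale`, `make_compose`, `apply_fn`) are hypothetical.

```python
# Minimal sketch of defunctionalization (illustrative assumption, not the paper's method).
# Each "function value" is a plain tuple: a tag plus the values it closes over.
# Such tuples are ordinary data, so they could be stored in a table row.

def make_add(n):
    """Represents the function x -> x + n as data."""
    return ("add", n)

def make_scale(k):
    """Represents the function x -> x * k as data."""
    return ("scale", k)

def make_compose(f, g):
    """Represents the function x -> f(g(x)) as data; f and g are themselves records."""
    return ("compose", f, g)

def apply_fn(fn, x):
    """First-order dispatcher: interprets a function record on argument x."""
    tag = fn[0]
    if tag == "add":
        return x + fn[1]
    if tag == "scale":
        return x * fn[1]
    if tag == "compose":
        return apply_fn(fn[1], apply_fn(fn[2], x))
    raise ValueError(f"unknown function tag: {tag!r}")

# A "table" whose rows carry function values as regular data items.
pricing_rules = [
    ("flat_fee", make_add(5)),
    ("double_then_fee", make_compose(make_add(5), make_scale(2))),
]

results = {name: apply_fn(fn, 10) for name, fn in pricing_rules}
```

Because every function is now a record and every call goes through `apply_fn`, no higher-order feature is needed in the evaluator itself.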
We describe Habitat, a declarative observational debugger for SQL. Habitat facilitates true language-level (not plan-level) debugging of presumably flawed SQL queries that yield unexpected results. Users mark SQL subexpressions of arbitrary size and then observe whether these evaluate as expected. Habitat understands query nesting and free row var...
The seamless integration of relational databases and programming languages remains a major challenge. Mapping rich data types featured in general-purpose programming languages to the relational data model is one aspect of this challenge. We present a novel technique for mapping arbitrary (nonrecursive) algebraic data types to a relational data mode...
In this paper we report on our experience of using Database Supported Haskell (DSH) for analysing the entire Wikipedia history. DSH is a novel high-level database query facility allowing for the formulation and efficient execution of queries on nested and ordered collections of data. DSH grew out of a research project on the integration of database...
We demonstrate SWITCH, a deep embedding of relational queries into Ruby and Ruby on Rails. With SWITCH, there is no syntactic or stylistic difference between Ruby programs that operate over in-memory array objects or database-resident tables, even if these programs rely on array order or nesting. SWITCH's built-in compiler and SQL code generator gu...
This paper is about a Glasgow Haskell Compiler (GHC) extension that generalises Haskell's list comprehension notation to monads. The monad comprehension notation implemented by the extension supports generator and filter clauses, as was the case in the Haskell 1.4 standard. In addition, the extension generalises the recently proposed parallel and S...
Relational database management systems can be used as a coprocessor for general-purpose programming languages, especially for those program fragments that carry out data-intensive and data-parallel computations. In this paper we present a Haskell library for database-supported program execution. Data-intensive and data-parallel computations are exp...
We demonstrate Habitat, a declarative observational debugger for SQL. Habitat facilitates true language-level (not plan-level) debugging of presumably flawed SQL queries that yield unexpected results. Users may mark arbitrary SQL subexpressions---ranging from literals, over fragments of predicates, to entire subquery blocks---to observe whether th...
We demonstrate an efficient LINQ to SQL provider and its significant impact on the runtime performance of LINQ programs that process large data volumes. This alternative provider is based on Ferry, compilation technology that lets relational database systems participate in the evaluation of first-order functional programs over nested, ordered data s...
We report on a query compilation technique that enables the construction of alternative efficient query providers for Microsoft's Language Integrated Query (LINQ) framework. LINQ programs are mapped into an intermediate algebraic form, suitable for execution on any SQL:1999-capable relational database system. This compilation technique leads to que...
We demonstrate an efficient LINQ to SQL provider and its significant impact on the runtime performance of LINQ programs that process large data volumes. This alternative provider is based on Ferry, compilation technology that lets relational database systems participate in the evaluation of first-order functional programs over nested, ordered data...
A purely relational account of the true XQuery semantics can turn any relational database system into an XQuery processor. Compiling nested expressions of the fully compositional XQuery language, however, yields odd algebraic plan shapes featuring scattered distributions of join operators that currently overwhelm commercial SQL query optimize...
Relational database management systems can be used as a coprocessor for general-purpose programming languages, especially for those program fragments that carry out data-intensive and data-parallel computations. In this paper we present a Haskell library for database-supported program execution. Data-intensive and data-parallel computations are expr...
Relational database management systems (RDBMSs) provide the best understood and most carefully engineered query processing infrastructure available today. However, RDBMSs are often operated as plain stores that do little more than reproduce stored data items for further processing outside the database host, in the general-purpose programming...
We demonstrate the language Ferry and its editing, compilation, and execution environment FerryDeck. Ferry's type system and operations match those of scripting or programming languages; its compiler has been designed to emit (bundles of) compliant and efficient SQL:1999 statements. Ferry acts as glue that permits a programming style in which deve...
A purely relational account of the true XQuery semantics can turn any relational database system into an XQuery processor. Compiling nested expressions of the fully compositional XQuery language, however, yields odd algebraic plan shapes featuring scattered distributions of join operators that currently overwhelm commercial SQL query optimizers. Th...
We introduce a controlled form of recursion in XQuery, an inflationary fixed point operator, familiar from the context of relational databases. This operator imposes restrictions on the expressible types of recursion, but it is sufficiently versatile to capture a wide range of interesting use cases, including Regular XPath and its core transitive closure...
A purely relational account of the true XQuery semantics can turn any relational database system into an XQuery processor. Compiling nested expressions of the fully compositional XQuery language, however, yields odd algebraic plan shapes featuring scattered distributions of join operators that currently overwhelm commercial SQL query opti...
Though inevitable for effective cost-based query rewriting, the derivation of meaningful cardinality estimates has remained a notoriously hard problem in the context of XQuery. By basing the estimation on a relational representation of the XQuery syntax, we show how existing cardinality estimation techniques for XPath and proven relational estima...
We are taking the next big step towards the goal of a purely relational XQuery implementation. The Pathfinder XQuery compiler has been enhanced by a code generator that emits SQL. This code generator targets off-the-shelf relational database systems (e.g., DB2®) and turns them into efficient and scalable XQuery processors. Our approach neither depe...
The Pathfinder project makes inventive use of relational database technology (originally developed to process data of strictly tabular shape) to construct efficient database-supported XML and XQuery processors. Pathfinder targets database engines that implement a set-oriented mode of query execution: many off-the-shelf traditional database systems...
In the implementation of hosted business services, multiple tenants are often consolidated into the same database to reduce total cost of ownership. Common practice is to map multiple single-tenant logical schemas in the application to one multi-tenant physical schema in the database. Such mappings are challenging to create because enterprise a...
We introduce a controlled form of recursion in XQuery, inflationary fixed points, familiar in the context of relational databases. This imposes restrictions on the expressible types of recursion, but we show that inflationary fixed points nevertheless are sufficiently versatile to capture a wide range of interesting use cases, including the semanti...
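As a rough illustration of what an inflationary fixed point computes, the following Python sketch repeatedly applies a step function and only ever adds to the accumulated result, stopping once nothing new appears. It operates on plain Python sets rather than XQuery node sequences; the function name `inflationary_fixed_point` and the sample graph are assumptions made for this example.

```python
# Minimal sketch of an inflationary fixed point over sets (illustrative assumption:
# plain Python sets stand in for XQuery node sequences).

def inflationary_fixed_point(seed, step):
    """Iterate result := result ∪ step(result) until no new items are added."""
    result = set(seed)
    while True:
        new_items = step(result) - result  # keep only what has not been seen yet
        if not new_items:
            return result
        result |= new_items

# Hypothetical sample data: a small directed graph as an adjacency mapping.
edges = {"a": {"b"}, "b": {"c"}, "c": {"d"}, "x": {"y"}}

def successors(nodes):
    """One step along the edges from every node in the given set."""
    return {target for node in nodes for target in edges.get(node, ())}

# Transitive closure of "a": start from its direct successors and iterate.
reachable_from_a = inflationary_fixed_point(successors({"a"}), successors)
```

Because the result can only grow and the graph is finite, the loop is guaranteed to terminate, which is the restriction that makes this form of recursion "controlled".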
On June 30, 2006, XIME-P 2006, the International Workshop on XQuery Implementation, Experience and Perspectives was held. This workshop marks the third event in a workshop series whose primary aim is to shed light on XQuery systems, specification aspects, foundations of the language, and the many perceivable shapes it may take on in the future. Lik...
There are more spots than immediately obvious in XQuery expressions where order is immaterial for evaluation - this affects most notably, but not exclusively, expressions in the scope of unordered {} and the argument of fn:unordered(). Clearly, performance gains are lurking behind such expression contexts but the prevalent impact of order on the XQ...
We explore the design and implementation of Rover, a post-mortem debugger for XQuery. Rather than being based on the traditional breakpoint model, Rover acknowledges XQuery's nature as a functional language: the debugger follows a declarative debugging paradigm in which a user is enabled to observe the values of selected XQuery subexpressions....
To compensate for the inherent impedance mismatch between the relational data model (tables of tuples) and XML (ordered, unranked trees), tree join algorithms have become the prevalent means to process XML data in relational databases, most notably the TwigStack (6), structural join (1), and staircase join (13) algorithms. However, the addition...
The Pathfinder XQuery compiler has been enhanced by a new code generator that can target any SQL:1999-compliant relational database system (RDBMS). This code generator marks an important next step towards truly relational XQuery processing, a branch of database technology that aims to turn RDBMSs into highly efficient XML and XQuery processors wi...
Only a couple of weeks after the participants of seminar No. 06472 met in Dagstuhl, the W3C published the Final Recommendation documents that fix the XQuery 1.0 syntax, data model, formal semantics, built-in function library and the interaction with the XML Schema Recommendations (see W3C's XQuery web site at http://www.w3.org/XML/Query/). With the...
From 19.11.2006 to 22.11.2006, the Dagstuhl Seminar 06472 "XQuery Implementation Paradigms" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during...
Relational XQuery processors aim at leveraging mature relational DBMS query processing technology to provide scalability and efficiency. To achieve this goal, various storage schemes have been proposed to encode the tree structure of XML documents in flat relational tables. Basically, two classes can be identified: (1) encodings using fixed-length...
We consider a collaboration of peers autonomously crawling the Web. A pivotal issue when designing a peer-to-peer (P2P) Web search engine in this environment is query routing: selecting a small subset of (a potentially very large number of relevant) peers to contact to satisfy a keyword query. Existing approaches for query routing work wel...
Relevance Feedback is an important way to enhance retrieval quality by integrating relevance information provided by a user. In XML retrieval, feedback engines usually generate an expanded query from the content of elements marked as relevant or nonrelevant. This approach, which is inspired by text-based IR, completely ignores the semistructured natur...
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational table...
This work tries to employ the monoid comprehension calculus - which has proven to be an adequate framework to capture the semantics of modern object query languages featuring a family of collection types like sets, bags, and lists - in a twofold manner: First, serving as a target language for the translation of ODMG OQL queries. We review work done...
The series of International Conferences on Extending Database Technology (EDBT) is an established and prestigious forum for the exchange of the latest research results in data management. It provides unique opportunities for database researchers, practitioners, developers, and users to explore new ideas, techniques, and tools, and to exchange exper...
lowed: based on the extensible relational database kernel MonetDB [2], Pathfinder provides highly efficient and scalable XQuery technology that scales beyond 10 GB XML input instances on commodity hardware. Pathfinder requires only local extensions to the underlying DBMS's kernel, such as the staircase join operator [7, 9]. A join recognition logic...
Various techniques have been proposed for efficient evaluation of XPath expressions, where the XPath location steps are rooted in a single sequence of context nodes. Among these techniques, the staircase join allows the evaluation of XPath location steps along arbitrary axes in at most one scan over the XML document, exploiting the XPath accelerator enco...
Pathfinder/MonetDB is a collaborative effort of the University of Konstanz, the University of Twente, and the Centrum voor Wiskunde en Informatica (CWI) in Amsterdam to develop an XQuery compiler that targets an RDBMS back-end. The author of this abstract is a student at the University of Konstanz and spent six months as an intern at the CWI, designi...
We report on a compilation procedure that derives relational algebra plans from arbitrarily nested XQuery FLWOR blocks. While recent research was able to develop relational encodings of trees which may turn RDBMSs into highly efficient XPath and XML Schema processors, here we describe relational encodings of nested iteration, variables, and the...
The XPath accelerator encodes the tree structure of an XML document using unique pairs of integer values, the nodes' preorder and postorder traversal ranks. If these ranks are used to place the document nodes in the two-dimensional pre/post plane, it becomes apparent that the encoding preserves an important property. Any context node v divides the...
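The pre/post encoding described above can be sketched in a few lines of Python. The sketch assigns each node its preorder and postorder rank and then answers the four major XPath axes by comparing ranks, relying on the standard property that a descendant has a larger preorder rank and a smaller postorder rank than its ancestor. The `Node` class, the helper names `encode` and `axis`, the sample tree, and the assumption of unique tags are all made up for this illustration; this is not the index structure itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    tag: str  # assumed unique per node, so it can serve as a key
    children: List["Node"] = field(default_factory=list)

def encode(root: Node) -> Dict[str, Tuple[int, int]]:
    """Map every tag to its (preorder rank, postorder rank), both starting at 0."""
    ranks: Dict[str, Tuple[int, int]] = {}
    counters = {"pre": 0, "post": 0}

    def visit(node: Node) -> None:
        pre = counters["pre"]           # rank assigned on entering the node
        counters["pre"] += 1
        for child in node.children:
            visit(child)
        ranks[node.tag] = (pre, counters["post"])  # post rank assigned on leaving
        counters["post"] += 1

    visit(root)
    return ranks

def axis(ranks: Dict[str, Tuple[int, int]], context: str, name: str) -> List[str]:
    """Return the tags on the given axis of the context node, in document order."""
    pre_v, post_v = ranks[context]
    region = {
        "descendant": lambda pre, post: pre > pre_v and post < post_v,
        "ancestor":   lambda pre, post: pre < pre_v and post > post_v,
        "following":  lambda pre, post: pre > pre_v and post > post_v,
        "preceding":  lambda pre, post: pre < pre_v and post < post_v,
    }[name]
    hits = [tag for tag, (pre, post) in ranks.items() if region(pre, post)]
    return sorted(hits, key=lambda tag: ranks[tag][0])

# Hypothetical sample document: <a><b><c/><d/></b><e><f/></e></a>
tree = Node("a", [Node("b", [Node("c"), Node("d")]), Node("e", [Node("f")])])
ranks = encode(tree)
```

For example, `axis(ranks, "b", "descendant")` selects the nodes in the lower-right region of the pre/post plane relative to `b`, and every other node falls into exactly one of the remaining three regions.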
Relational database systems may be turned into efficient XML and XPath processors if the system is provided with a suitable relational tree encoding. This paper extends this relational XML processing stack and shows that an RDBMS can also serve as a highly efficient XQuery runtime environment. Our approach is purely relational: XQuery expressions are c...
The syntactic wellformedness constraints of XML (opening and closing tags nest properly) imply that XML processors face the challenge of efficiently handling data that takes the shape of ordered, unranked trees. Although RDBMSs have originally been designed to manage table-shaped data, we propose their use as XML and XPath processors. In our setup, th...
This work may be seen as a further proof of the versatility of the relational database model. Here, we add XQuery to the catalog of languages which RDBMSs are able to "speak" fluently. Given suitable relational encodings of sequences and ordered, unranked trees
We argue that efficient support for schema validation and type annotation in XQuery processors deserves as much attention as efficient evaluation techniques for XPath queries have received in the past. To this end, we describe a validation procedure that operates on an encoding of trees that has already been successfully used for XPath location step eva...
This article is a proposal for a database index structure, the XPath accelerator, that has been specifically designed to support the evaluation of XPath path expressions. As such, the index is capable of supporting all XPath axes (including ancestor, following, preceding-sibling, descendant-or-self, etc.). This feature lets the index stand out among r...