
Jens Teubner
PhD
TU Dortmund University | TUD · Faculty of Computer Science
About
109 Publications · 7,273 Reads
3,444 Citations
Additional affiliations
April 2013 - present
May 2001 - May 2005
August 2008 - March 2013
Publications (109)
Large-scale data processing forms the core of modern online services, such as social media and e-commerce, calling for ever-increasing performance with predictable service quality. Even though emerging hardware platforms can deliver the required performance, actually harnessing it and guaranteeing a certain service quality is still a challenge f...
Feed-Forward Networks (FFNs), or multilayer perceptrons, are fundamental network structures for deep learning. Although feed-forward networks are structurally uncomplicated, their training procedure is computationally expensive. It is challenging to design customized hardware for training due to the diversity of operations in forward- and backward-pr...
The increasing demand for time-predictable machine learning applications, e.g., object detection in autonomous driving systems, poses several new challenges for resource synchronization in real-time systems, especially when hardware accelerators like Graphics Processing Units (GPUs) are considered as shared resources. When the sha...
Query compilation is a processing technique that achieves very high processing speeds but has the disadvantage of introducing additional compilation latencies. These latencies cause an overhead that is relatively high for short-running and high-complexity queries. In this work, we present Flounder IR and ReSQL, our new approach to query compilation...
Query compilation has proven to be one of the most efficient query processing techniques. Despite its fast processing speed, the additional compilation times of the technique limit its applicability. This is because the approach is most beneficial only when the improvements in processing time clearly exceed the additional compilation time. Recently...
Emerging hardware platforms are characterized by large degrees of parallelism, complex memory hierarchies, and increasing hardware heterogeneity. Their theoretical peak data processing performance can only be unleashed if the different pieces of systems software collaborate much more closely and if their traditional dependencies and interfaces are...
In response to physical limitations, hardware has changed significantly during the past two decades. As the database community, we have no choice but to adapt to these changes in order to benefit from them and from further hardware advances.
Due to the growing demand for processing power and energy efficiency by today’s data-intensive applications, developers have to deal with heterogeneous hardware platforms composed of specialized computing resources. These are highly efficient for certain workloads but difficult to handle from the software engineering perspective. Even state-of-the-ar...
Graphics processing units (GPUs) promise spectacular performance advantages when used as database coprocessors. Their massive compute capacity, however, is often hampered by control flow divergence caused by non-uniform data distributions. When data-parallel work items demand different amounts or types of processing, instructions execute with l...
The research group “Datenbanken und Informationssysteme” (Databases and Information Systems) represents this field in research and teaching at TU Dortmund. This article gives an overview of the chair’s activities in both areas.
Query processing on GPU-style coprocessors is severely limited by the movement of data. With teraflops of compute throughput in one device, even high-bandwidth memory cannot provision enough data for a reasonable utilization. Query compilation is a proven technique to improve memory efficiency. However, its inherent tuple-at-a-time processing style...
As the operating costs of today’s data centres continue to increase and processor manufacturers are forced to meet thermal design power constraints when designing new hardware, the energy efficiency of a main-memory database management system becomes more and more important. Moreover, many database workloads are more memory-intensive than compute-in...
Genome analysis enables researchers to detect mutations within genomes and deduce their consequences. Researchers need reliable analysis platforms to ensure reproducible and comprehensive analysis results. Database systems provide vital support to implement the required sustainable procedures. Nevertheless, they are not used throughout the complete...
To escape a number of physical limitations (e.g., bandwidth and thermal issues), hardware technology is strongly trending toward heterogeneous system designs, where a large share of the application work can be off-loaded to accelerators, such as graphics or network processors. In the database domain, field-programmable gate arrays (FPGAs) were re...
Technology limitations are making the use of heterogeneous computing devices much more than an academic curiosity. In fact, the use of such devices is widely acknowledged to be the only promising way to achieve application-speedups that users urgently need and expect. However, building a robust and efficient query engine for heterogeneous co-proces...
Parallelism is currently seen as a mechanism to minimize the impact of the power and heat dissipation problems encountered in modern hardware. Data parallelism—based on partitioning the data—and pipeline parallelism—based on partitioning the computation—are the two main approaches to leverage parallelism on a wide range of hardware platforms. Unfor...
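To make the distinction concrete, here is a minimal Python sketch contrasting the two styles on a made-up workload (the workload and all identifiers are illustrative and not taken from the publication): data parallelism partitions the input and runs the same operator on every partition, while pipeline parallelism splits the computation into stages connected by queues.

# Illustrative sketch: data vs. pipeline parallelism (threads show the structure only;
# CPython's GIL limits true CPU parallelism for this toy workload).
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread

data = list(range(1_000))

# Data parallelism: partition the data, run the same operator on each partition.
def square_all(chunk):
    return [x * x for x in chunk]

chunks = [data[i::4] for i in range(4)]              # 4 partitions
with ThreadPoolExecutor(max_workers=4) as pool:
    data_parallel_result = [y for part in pool.map(square_all, chunks) for y in part]

# Pipeline parallelism: partition the computation into stages linked by queues.
q1, q2 = Queue(), Queue()

def stage_square():                                   # stage 1: square each value
    for x in iter(q1.get, None):
        q2.put(x * x)
    q2.put(None)

def stage_sum(out):                                   # stage 2: accumulate the sum
    total = 0
    for y in iter(q2.get, None):
        total += y
    out.append(total)

out = []
workers = [Thread(target=stage_square), Thread(target=stage_sum, args=(out,))]
for w in workers: w.start()
for x in data: q1.put(x)
q1.put(None)
for w in workers: w.join()
pipeline_result = out[0]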
For several decades, the roles in developing IT systems remained clearly separated. It was the responsibility of the hardware community to embrace and leverage the latest technology trends. The resulting hardware would become faster—but the interfaces it exposes to software remained basically unchanged for many years. The role of the software commu...
This work revisits the processing of stream joins on modern hardware architectures. Our work is based on the recently proposed handshake join algorithm, which is a mechanism to parallelize the processing of stream joins in a NUMA-aware and hardware-friendly manner. Handshake join achieves high throughput and scalability, but it suffers from a high...
Existing main-memory hash join algorithms for multi-core can be classified into two camps. Hardware-oblivious hash join variants do not depend on hardware-specific parameters. Rather, they consider qualitative characteristics of modern hardware and are expected to achieve good performance on any technologically similar platform. The assumption behi...
While offering unique performance and energy-saving advantages, the use of Field-Programmable Gate Arrays (FPGAs) for database acceleration has demanded major concessions from system designers. Either the programmable chips have been used for very basic application tasks (such as implementing a rigid class of selection predicates) or their circuit...
In this paper we experimentally study the performance of main-memory, parallel, multi-core join algorithms, focusing on sort-merge and (radix-)hash join. The relative performance of these two join approaches has been a topic of discussion for a long time. With the advent of modern multi-core architectures, it has been argued that sort-merge join i...
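For context, a radix-partitioned hash join has the following basic structure; this is a simplified, single-threaded Python sketch written for illustration only (the paper studies heavily tuned, parallel multi-core implementations, which this does not reflect).

# Simplified, single-threaded sketch of a radix-partitioned hash join.
# Real implementations partition in parallel and size partitions to fit caches/TLB.

def radix_hash_join(R, S, key=lambda t: t[0], bits=4):
    """Join relations R and S (lists of tuples) on `key`; returns matching pairs."""
    fanout = 1 << bits
    part = lambda t: hash(key(t)) & (fanout - 1)

    # 1. Partition both inputs by the low `bits` of the key's hash.
    R_parts = [[] for _ in range(fanout)]
    S_parts = [[] for _ in range(fanout)]
    for t in R:
        R_parts[part(t)].append(t)
    for t in S:
        S_parts[part(t)].append(t)

    # 2. Join co-partitions: build a hash table on the R partition, probe with S.
    out = []
    for rp, sp in zip(R_parts, S_parts):
        table = {}
        for r in rp:
            table.setdefault(key(r), []).append(r)
        for s in sp:
            for r in table.get(key(s), []):
                out.append((r, s))
    return out

# Tiny usage example:
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z")]
print(radix_hash_join(R, S))   # matches on keys 2 and 3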
In this demonstration, we present Ibex, a novel storage engine featuring hybrid, FPGA-accelerated query processing. In Ibex, an FPGA is inserted along the path between the storage devices and the database engine. The FPGA acts as an intelligent storage engine supporting query off-loading from the query engine. Apart from significant performance imp...
Roughly a decade ago, power consumption and heat dissipation concerns forced the semiconductor industry to radically change its course, shifting from sequential to parallel computing. Unfortunately, improving performance of applications has now become much more difficult than in the good old days of frequency scaling. This is a...
Due to stagnant clock speeds and high power consumption of commodity microprocessors, database vendors have started to explore massively parallel co-processors such as FPGAs to further increase performance. A typical approach is to push simple but compute-intensive operations (e.g., pre-filtering, (de)compression) to FPGAs for acceleration. In this...
In this paper we experimentally study the performance of main-memory, parallel, multi-core join algorithms, focusing on sort-merge and (radix-)hash join. The relative performance of these two join approaches has been a topic of discussion for a long time. With the advent of modern multicore architectures, it has been argued that sort-merge join is...
The architectural changes introduced with multicore CPUs have triggered a redesign of main-memory join algorithms. In the last few years, two diverging views have appeared. One approach advocates careful tailoring of the algorithm to the architectural parameters (cache sizes, TLB, and memory bandwidth). The other approach argues that modern hardwar...
So far, we have mainly looked at FPGAs from the hardware technology side. Clearly, the use and “programming” of FPGAs is considerably different to the programming models that software developers are used to. In this chapter, we will show how entire system designs can be derived from a given application context, and we will show how those designs can be m...
So far, we have mainly highlighted advantages of FPGAs with respect to performance, e.g., for low-latency and/or high-volume stream processing. An entirely different area where FPGAs provide a number of benefits is secure data processing. Especially in the cloud computing era, guaranteeing data confidentiality, privacy, etc., is increasingly impo...
In this chapter, we give an overview of the technology behind field-programmable gate arrays (FPGAs). We begin with a brief history of FPGAs before we explain the key concepts that make (re)programmable hardware possible. We do so in a bottom-up approach, that is, we first discuss the very basic building blocks of FPGAs, and then gradually zoom o...
FPGA technology can be leveraged in modern computing systems by using them as a co-processor (or “accelerator”) in a heterogeneous computing architecture, where CPUs, FPGAs, and possibly further hardware components are used jointly to solve application problems.
In the previous chapter, we illustrated various ways of applying FPGAs to stream processing applications. In this chapter, we illustrate that FPGAs also have the potential to accelerate more classical data processing tasks by exploiting various forms of parallelism inherent to FPGAs. In particular, we will discuss FPGA-acceleration for two differen...
Before we delve into core FPGA technology in Chapter 3, we need to familiarize ourselves with a few basic concepts of hardware design. As we will see, the process of designing a hard-wired circuit—a so-called application-specific integrated circuit (ASIC)—is not that different from implementing the same circuit on an FPGA. In this chapter, we will...
The increasing number of cores and the rich instruction sets of modern hardware are opening up new opportunities for optimizing many traditional data mining tasks. In this paper we demonstrate how to speed up the performance of the computation of frequent items by almost one order of magnitude over the best published results by matching the algorit...
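The abstract does not name the exact algorithm; a common counter-based baseline for frequent items is Space-Saving, sketched below in plain, unoptimized Python (the paper's contribution is precisely the hardware-conscious tuning that this sketch omits).

# Space-Saving summary for frequent items (simplified baseline, not the tuned version).

def space_saving(stream, k):
    """Track at most k counters; items whose true frequency exceeds n/k are guaranteed
    to be among the monitored items (reported counts may overestimate)."""
    counters = {}                          # item -> estimated count
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # Replace the item with the minimum count and inherit its count + 1.
            victim = min(counters, key=counters.get)
            counters[x] = counters.pop(victim) + 1
    return counters

stream = ["a", "b", "a", "c", "a", "b", "d", "a"]
print(space_saving(stream, k=2))           # 'a' is reported; counts are upper bounds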
While the performance opportunities of field-programmable gate arrays (FPGAs) for high-volume query processing are well-known, system makers still have to compromise between desired query expressiveness and high compilation effort. The cost of the latter is the primary limitation in building efficient FPGA/CPU hybrids. In this work we re...
We demonstrate MXQuery/H, a modified version of MXQuery that uses hardware acceleration to speed up XML processing. The main goal of this demonstration is to give an interactive example of hardware/software co-design and show how system performance and energy efficiency can be improved by off-loading tasks to FPGA hardware. To this end, we equipped...
Computer architectures are quickly changing toward heterogeneous many-core systems. Such a trend opens up interesting opportunities but also raises immense challenges since the efficient use of heterogeneous many-core systems is not a trivial problem. Software-configurable microprocessors and FPGAs add further diversity but also increase complexity...
The architectural changes introduced with multi-core CPUs have triggered a redesign of main-memory join algorithms. In the last few years, two diverging views have appeared. One approach advocates careful tailoring of the algorithm to the architectural parameters (cache sizes, TLB, and memory bandwidth). The other approach argues that modern hardwa...
Computing frequent items is an important problem by itself and as a subroutine in several data mining algorithms. In this paper, we explore how to accelerate the computation of frequent items using field-programmable gate arrays (FPGAs) with a threefold goal: increase performance over existing solutions, reduce energy consumption over CPU-based sys...
In spite of the omnipresence of parallel (multi-core) systems, the predominant strategy to evaluate window-based stream joins is still strictly sequential, mostly just straightforward along the definition of the operation semantics. In this work we present handshake join, a way of describing and executing window-based stream joins that is highly a...
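For reference, the strictly sequential baseline that handshake join parallelizes can be sketched as a count-based sliding-window join; the Python below is a generic illustration of that baseline (not the handshake join algorithm itself, and the pairwise arrival order is a simplifying assumption).

# Sequential baseline for a count-based sliding-window stream join:
# every arriving tuple is compared against the current window of the other stream.
from collections import deque

def window_join(stream_r, stream_s, window, match):
    """Interleave two streams; join each new tuple with the other stream's window."""
    win_r, win_s = deque(maxlen=window), deque(maxlen=window)
    results = []
    for r, s in zip(stream_r, stream_s):      # simplified: one tuple per stream per step
        for s_old in win_s:                   # probe S's window with the new r
            if match(r, s_old):
                results.append((r, s_old))
        win_r.append(r)
        for r_old in win_r:                   # probe R's window with the new s
            if match(s, r_old):
                results.append((r_old, s))
        win_s.append(s)
    return results

R = [(i, i % 3) for i in range(6)]
S = [(i, i % 3) for i in range(6)]
print(window_join(R, S, window=4, match=lambda a, b: a[1] == b[1]))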
We demonstrate a hardware implementation of a complex event processor, built on top of field-programmable gate arrays (FPGAs). Compared to CPU-based commodity systems, our solution shows distinctive advantages for stream monitoring tasks, e.g., wire-speed processing and predictable performance. The demonstration is based on a query-to-hardware comp...
Field-programmable gate arrays (FPGAs) are chip devices that can be runtime-reconfigured to realize arbitrary processing tasks directly in hardware. Industrial products [Net, Xtr] as well as research prototypes [MTA09, MVB+09, SLS+10, TMA11] demonstrated how this capability can be exploited to build highly efficient processors for data warehous...
Complex event detection is an advanced form of data stream processing where the stream(s) are scrutinized to identify given event patterns. The challenge for many complex event processing (CEP) systems is to be able to evaluate event patterns on high-volume data streams while adhering to real-time constraints. To solve this problem, in this paper w...
Field-programmable gate arrays (FPGAs) are a promising technology that can be used in database systems. In this demonstration we show Glacier, a library and a compiler that can be employed to implement streaming queries as hardware circuits on FPGAs. Glacier consists of a library of compositional hardware modules that represent stream processing op...
Field-programmable gate arrays (FPGAs) can provide performance advantages with a lower resource consumption (e.g., energy) than conventional CPUs. In this paper, we show how to employ FPGAs to provide an efficient and high-performance solution for the frequent item problem. We discuss three design alternatives, each one of them exploiting different...
In line with the insight that "one size" of databases will not fit all application needs [19], the database community is currently exploring various alternatives to commodity, CPU-based system designs. One particular candidate in this trend is field-programmable gate arrays (FPGAs), programmable chips that allow tailor-made hardware designs optimiz...
As network infrastructures with 10 Gb/s bandwidth and beyond have become pervasive and as cost advantages of large commodity-machine clusters continue to increase, research and industry strive to exploit the available processing performance for large-scale database processing tasks. In this work we look at the use of high-speed networks for distrib...
Given the tremendous versatility of relational database implementations toward a wide range of database problems, it seems only natural to consider them as back-ends for XML data processing. Yet, the assumptions behind the language XQuery are considerably different to those in traditional RDBMSs. The underlying data model is a tree, data and result...
Computer architectures are quickly changing toward heterogeneous many-core systems. Such a trend opens up interesting opportunities but also raises immense challenges since the efficient use of heterogeneous many-core systems is not a trivial problem. In this paper, we explore how to program data processing operators on top of field-programmable ga...
Taking advantage of many-core, heterogeneous hardware for data processing tasks is a difficult problem. In this paper, we consider the use of FPGAs for data stream processing as coprocessors in many-core architectures. We present Glacier, a component library and compositional compiler that transforms continuous queries into logic circuits by compos...
While there seems to be a general agreement that next years' systems will include many processing cores, it is often overlooked that these systems will also include an increasing number of different cores (we already see dedicated units for graphics or network processing). Orchestrating the diversity of processing functionality is going to be a maj...
By leveraging modern networking hardware (RDMA-enabled network cards), we can shift priorities in distributed database processing significantly. Complex and sophisticated mechanisms to avoid network traffic can be replaced by a scheme that takes advantage of the bandwidth and low latency offered by such interconnects. We illustrate this phenomenon wit...
We introduce a controlled form of recursion in XQuery, an inflationary fixed point operator, familiar from the context of relational databases. This operator imposes restrictions on the expressible types of recursion, but it is sufficiently versatile to capture a wide range of interesting use cases, including Regular XPath and its core transitive closure...
Taking advantage of many-core, heterogeneous hardware for data processing tasks is a difficult problem. In this paper, we consider the use of FPGAs for data stream processing as co-processors in many-core architectures. We present Glacier, a component library and compositional compiler that transforms continuous queries into logic circuits by compo...
By leveraging modern networking hardware (RDMA-enabled network cards), we can shift priorities in distributed database processing significantly. Complex and sophisticated mechanisms to avoid network traffic can be replaced by a scheme that takes advantage of the bandwidth and low latency offered by such interconnects. We illustrate this phenomenon wit...
The Systems Group, together with the Enterprise Computing Center (ECC), has taken initiatives at the ETH Zurich Department of Computer Science to support both academic and industrial research. The goal of the Systems Group is to redefine, restructure, and reorganize systems research to avoid the trap of tackling complex problems from a single, isolated perspectiv...
Though inevitable for effective cost-based query rewriting, the derivation of meaningful cardinality estimates has remained a notoriously hard problem in the context of XQuery. By basing the estimation on a relational representation of the XQuery syntax, we show how existing cardinality estimation techniques for XPath and proven relational estima...
XML Schema awareness has been an integral part of the XQuery language since its early design stages. Matching XML data against XML types is the main operation that backs up XQuery type expressions, such as typeswitch, instance of, or certain XPath operators. This interaction is particularly vital in data-centric XQuery applications, where data co...
The Pathfinder project makes inventive use of relational database technology—originally developed to process data of strictly tabular shape—to construct efficient database-supported XML and XQuery processors. Pathfinder targets database engines that implement a set-oriented mode of query execution: many off-the-shelf traditional database systems...
We introduce a controlled form of recursion in XQuery, inflationary fixed points, familiar in the context of relational databases. This imposes restrictions on the expressible types of recursion, but we show that inflationary fixed points nevertheless are sufficiently versatile to capture a wide range of interesting use cases, including the semanti...
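The inflationary fixed-point semantics can be stated compactly: starting from a seed set, a step function is applied repeatedly and its results are only ever added, never retracted, until the set stops growing. The Python sketch below illustrates this with a transitive-closure example (an illustrative relational encoding, not the XQuery surface syntax of the paper).

# Inflationary fixed point: repeatedly add the step's results until the set stops growing.

def inflationary_fixpoint(seed, step):
    acc = set(seed)
    while True:
        new = step(acc) - acc            # results are only ever added, never removed
        if not new:
            return acc
        acc |= new

# Example: transitive closure of an edge relation.
edges = {(1, 2), (2, 3), (3, 4)}

def extend(paths):
    return {(a, d) for (a, b) in paths for (c, d) in edges if b == c}

print(sorted(inflationary_fixpoint(edges, extend)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]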
We explore the design and implementation of Rover, a post-mortem debugger for XQuery. Rather than being based on the traditional breakpoint model, Rover acknowledges XQuery's nature as a functional language: the debugger follows a declarative debugging paradigm in which a user is enabled to observe the values of selected XQuery subexpressions....
To compensate for the inherent impedance mismatch between the relational data model (tables of tuples) and XML (ordered, unranked trees), tree join algorithms have become the prevalent means to process XML data in relational databases, most notably the TwigStack (6), structural join (1), and staircase join (13) algorithms. However, the addition...
The Pathfinder XQuery compiler has been enhanced by a new code generator that can target any SQL:1999-compliant relational database system (RDBMS). This code generator marks an important next step towards truly relational XQuery processing, a branch of database technology that aims to turn RDBMSs into highly efficient XML and XQuery processors wi...
There are more spots than immediately obvious in XQuery expressions where order is immaterial for evaluation - this affects most notably, but not exclusively, expressions in the scope of unordered {} and the argument of fn:unordered(). Clearly, performance gains are lurking behind such expression contexts but the prevalent impact of order on the XQ...
Relational database systems are highly efficient hosts to table-shaped data. It is all the more interesting to see how a careful inspection of both the XML tree structure and the W3C XQuery language definition can turn relational databases into fast and scalable XML processors. This work shows how the deliberate choice of a relational tree...
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational table...
Relational XQuery processors aim at leveraging mature relational DBMS query processing technology to provide scalability and efficiency. To achieve this goal, various storage schemes have been proposed to encode the tree structure of XML documents in flat relational tables. Basically, two classes can be identified: (1) encodings using fixed-length...
lowed: based on the extensible relational database kernel MonetDB [2], Pathfinder provides highly efficient and scalable XQuery technology that scales beyond 10 GB XML input instances on commodity hardware. Pathfinder requires only local extensions to the underlying DBMS's kernel, such as the staircase join operator [7, 9]. A join recognition logic...
Various techniques have been proposed for efficient evaluation of XPath expressions, where the XPath location steps are rooted in a single sequence of context nodes. Among these techniques, the staircase join allows XPath location steps along arbitrary axes to be evaluated in at most one scan over the XML document, exploiting the XPath accelerator enco...
Pathfinder/MonetDB is a collaborative effort of the University of Konstanz, the University of Twente, and the Centrum voor Wiskunde en Informatica (CWI) in Amsterdam to develop an XQuery compiler that targets an RDBMS back-end. The author of this abstract is a student at the University of Konstanz and spent six months as an intern at the CWI, designi...
The XPath accelerator encodes the tree structure of an XML document using unique pairs of integer values, the nodes' preorder and postorder traversal ranks. If these ranks are used to place the document nodes in the two-dimensional pre/post plane, it becomes apparent that the encoding preserves an important property. Any context node v divides the...
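To make the pre/post encoding tangible, the following Python sketch assigns preorder/postorder ranks to a small tree and evaluates the descendant and ancestor axes as region predicates in the pre/post plane (a minimal illustration only; the actual XPath accelerator adds further node properties and index support).

# Pre/post encoding of a tree and axis evaluation as region queries (minimal sketch).

def rank(tree):
    """tree: {node: [children]} with root named "root"; returns {node: (pre, post)}."""
    ranks, pre, post = {}, [0], [0]
    def visit(v):
        my_pre = pre[0]; pre[0] += 1
        for c in tree.get(v, []):
            visit(c)
        ranks[v] = (my_pre, post[0]); post[0] += 1
    visit("root")
    return ranks

tree = {"root": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
r = rank(tree)

def descendants(v):   # pre(u) > pre(v) and post(u) < post(v)
    return {u for u in r if r[u][0] > r[v][0] and r[u][1] < r[v][1]}

def ancestors(v):     # pre(u) < pre(v) and post(u) > post(v)
    return {u for u in r if r[u][0] < r[v][0] and r[u][1] > r[v][1]}

print(descendants("a"))   # {'c', 'd'}
print(ancestors("e"))     # {'b', 'root'}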
Relational database systems may be turned into efficient XML and XPath processors if the system is provided with a suitable relational tree encoding. This paper extends this relational XML processing stack and shows that an RDBMS can also serve as a highly efficient XQuery runtime environment. Our approach is purely relational: XQuery expressions are c...
The syntactic well-formedness constraints of XML (opening and closing tags nest properly) imply that XML processors face the challenge of efficiently handling data that takes the shape of ordered, unranked trees. Although RDBMSs have originally been designed to manage table-shaped data, we propose their use as XML and XPath processors. In our setup, th...
This work may be seen as a further proof of the versatility of the relational database model. Here, we add XQuery to the catalog of languages which RDBMSs are able to "speak" fluently. Given suitable relational encodings of sequences and ordered, unranked trees
This article is a proposal for a database index structure, the XPath accelerator, that has been specifically designed to support the evaluation of XPath path expressions. As such, the index is capable of supporting all XPath axes (including ancestor, following, preceding-sibling, descendant-or-self, etc.). This feature lets the index stand out among r...
Relational query processors derive much of their effectiveness from the awareness of specific table properties like sort order, size, or absence of duplicate tuples. This chapter applies (and adapts) this successful principle to database-supported XML and XPath processing: the relational system is made tree aware, i.e., tree properties like subtree...
This article is a proposal for a database index structure, the XPath accelerator, that has been specifically designed to support the evaluation of XPath path expressions. As such, the index is capable of supporting all XPath axes (including ancestor, following, preceding-sibling, descendant-or-self, etc.). This feature lets the index stand out among r...
The World Wide Web Consortium (W3C) is currently developing the XQuery specification to query XML data.
Relational query processors derive much of their effectiveness from the awareness of specific table properties like sort order, size, or absence of duplicate tuples. This text applies (and adapts) this successful principle to database-supported XML and XPath processing: the relational system is made tree aware, i.e., tree properties like subtree si...