Norman May

SAP Research | SAP

Dr.

About

80
Publications
45,798
Reads
1,344
Citations
Citations since 2016
32 Research Items
835 Citations
Additional affiliations
January 2003 - December 2007
Universität Mannheim

Publications (80)
Article
Full-text available
Software Testing is an established activity in the software development process to ensure and improve the quality of a software. Consequently, there exists a wide range of literature, popular information, and even multiple ISO standards covering this topic. However, we found that testing very large database management systems (DBMS) requires specia...
Article
With the advent of cloud computing, where computational resources are expensive and data movement needs to be secured and minimized, database management systems need to reconsider their architecture to accommodate such requirements. In this paper, we present our analysis, design and evaluation of an FPGA-based hardware accelerator for offloading co...
Article
The cost of DRAM contributes significantly to the operating costs of in-memory database management systems (IMDBMS). Persistent memory (PMEM) is an alternative type of byte-addressable memory that offers --- in addition to persistence --- higher capacities than DRAM at a lower price with the disadvantage of increased latencies and reduced bandwidth...
Article
Full-text available
In our initial DaMoN paper, we set out the goal to revisit the results of “Staring into the Abyss [...] of Concurrency Control with [1000] Cores” (Yu in Proc. VLDB Endow 8: 209-220, 2014). Against their assumption, today we do not see single-socket CPUs with 1000 cores. Instead, multi-socket hardware is prevalent today and in fact offers over 1000...
Article
Full-text available
String dictionaries constitute a large portion of the memory footprint of database applications. While strong string dictionary compression algorithms exist, these come with impractical access and compression times. Therefore, lightweight algorithms such as front coding (PFC) are favored in practice. This paper endeavors to make strong string dicti...
Preprint
Full-text available
In this paper, we propose a radical new approach for scale-out distributed DBMSs. Instead of hard-baking an architectural model, such as a shared-nothing architecture, into the distributed DBMS design, we aim for a new class of so-called architecture-less DBMSs. The main idea is that an architecture-less DBMS can mimic any architecture on a per-que...
Conference Paper
An efficient compression of integer vectors is critical in dictionary-encoded column stores like SAP HANA to keep more data in the limited and precious main memory. Past research focused on lightweight compression techniques that trade low latency of data accesses for lower compression ratios. Consequently, only few columns in a wide table benef...
Article
Full-text available
Index joins present a case of pointer-chasing code that causes data cache misses. In principle, we can hide these cache misses by overlapping them with computation: The lookups involved in an index join are parallel tasks whose execution can be interleaved, so that, when a cache miss occurs in one task, the processor executes independent instructio...
Conference Paper
Non-Volatile Memory (NVM) technologies exhibit 4X the read access latency of conventional DRAM. When the working set does not fit in the processor cache, this latency gap between DRAM and NVM leads to more than 2X runtime increase for queries dominated by latency-bound operations such as index joins and tuple reconstruction. We explain how to easil...
Conference Paper
String dictionaries constitute a large portion of the memory footprint of database applications. While strong string dictionary compression algorithms exist, these come with impractical access and compression times. Therefore, lightweight algorithms such as front coding are favored in practice. This paper endeavors to make strong string dictionary...
Preprint
Full-text available
The discipline of Enterprise Application Integration (EAI) is the centrepiece of current on-premise, cloud and device integration scenarios. However, the building blocks of integration scenarios, i.e., essentially a composition of Enterprise Integration Patterns (EIPs), are only informally described, and thus their composition takes place in an inf...
Article
Full-text available
The quality of query execution plans in database systems determines how fast a query can be executed. It has been shown that conventional query optimization still selects sub-optimal or even bad execution plans, due to errors in the cardinality estimation. Although cardinality estimation errors are an evident problem, they are in general not consid...
Conference Paper
Full-text available
Enterprise Application Integration is the centerpiece of current on-premise, cloud and device integration scenarios. We describe optimization strategies that help reduce the model complexity, and improve the process execution using design time techniques. In order to achieve this, we formalize compositions of Enterprise Integration Patterns based o...
Conference Paper
Full-text available
Cardinality estimation is a crucial task in query optimization and typically relies on heuristics and basic statistical approximations. At execution time, estimation errors might result in situations where intermediate result sizes may differ from the estimated ones, so that the originally chosen plan is not the optimal plan anymore. In this paper...
Article
Index join performance is determined by the efficiency of the lookup operation on the involved index. Although database indexes are highly optimized to leverage processor caches, main memory accesses inevitably increase lookup runtime when the index outsizes the last-level cache; hence, index join performance drops. Still, robust index join perform...
Conference Paper
Full-text available
The growing number of (cloud) applications and devices massively increases the communication rate and volume pushing integration systems to their (throughput) limits. While the usage of modern hardware like Field Programmable Gate Arrays (FPGAs) led to low latency when employed for query and event processing, application integration adds yet unexpl...
Article
Full-text available
The discipline of enterprise application integration (EAI) enables the decoupled communication between (business) applications, and thus became a cornerstone of today’s IT architectures. In 2004, the book by Hohpe and Woolf on Enterprise Integration Patterns (EIP) provided a fundamental collection of messaging patterns, denoting the building blocks...
Article
Full-text available
Maintaining and querying hierarchical data in a relational database system is an important task in many business applications. This task is especially challenging when considering dynamic use cases with a high rate of complex, possibly skewed structural updates. Labeling schemes are widely considered the indexing technique of choice for hierarchica...
Article
Non-uniform memory access (NUMA) architectures pose numerous performance challenges for main-memory column-stores in scaling up analytics on modern multi-socket multi-core servers. A NUMA-aware execution engine needs a strategy for data placement and task scheduling that prefers fast local memory accesses over remote memory accesses, and avoids an...
Article
Today's hardware architectures provide an ever-increasing number of CPU cores that can be used for running concurrent operations. A big challenge is to ensure that these operations are properly synchronized and make efficient use of the available resources. Fellow database researchers have appropriately described this problem as "staring into the a...
Article
We address the problem of expressing and evaluating computations on hierarchies represented as database tables. Engine support for such computations is very limited today, and so they are usually outsourced into stored procedures or client code. Recently, data model and SQL language extensions were proposed to conveniently represent and work with h...
Conference Paper
The integration of a growing number of distributed, heterogeneous applications is one of the main challenges of enterprise data management. Through the advent of cloud and mobile application integration, higher volumes of messages have to be processed, compared to common enterprise computing scenarios, while guaranteeing high throughput. However, n...
Article
Full-text available
Main-memory column-stores are called to efficiently use modern non-uniform memory access (NUMA) architectures to service concurrent clients on big data. The efficient usage of NUMA architectures depends on the data placement and scheduling strategy of the column-store. Most column-stores choose a static strategy that involves partitioning all data...
Article
Full-text available
The SAP HANA database extends the scope of traditional database engines as it supports data models beyond regular tables, e.g. text, graphs or hierarchies. Moreover, SAP HANA also provides developers with a more fine-grained control to define their database application logic, e.g. exposing specific operators which are difficult to express in SQL. F...
Article
Full-text available
Maintaining and querying hierarchical data in a relational database system is an important task in many business applications. This task is especially challenging when considering dynamic use cases with a high rate of complex, possibly skewed structural updates. Labeling schemes are widely considered the indexing technique of choice for hierarchica...
Article
Full-text available
Following the adoption of basic temporal features in the SQL:2011 standard, there has been a tremendous interest within the database industry in supporting bi-temporal features, as a significant number of real-life workloads would greatly benefit from efficient temporal operations. However, current implementations of bi-temporal storage systems and...
Patent
Full-text available
A system and method of performing snapshot isolation in distributed databases. Each node stores local snapshot information that enforces snapshot isolation for that node. The method includes partially processing a distributed transaction by a first node, receiving a global commit identifier from a coordinator, and continuing to process the distribu...
Conference Paper
Full-text available
In order to confirm their theoretical assumptions, physicists employ Monte-Carlo generators to produce millions of simulated particle collision events and compare them with the results of the detector experiments. The traditional, static analysis workflow of physicists involves creating and compiling a C++ program for each study, and loading large...
Chapter
Full-text available
Main-memory column-stores are called to efficiently use modern non-uniform memory access (NUMA) architectures to service concurrent clients on big data. The efficient usage of NUMA architectures depends on the data placement and scheduling strategy of the column-store. Most column-stores choose a static strategy that involves partitioning all data...
Article
Full-text available
Modern database systems employ Snapshot Isolation to implement concurrency control and isolation because it promises superior query performance compared to lock-based alternatives. Furthermore, Snapshot Isolation never blocks readers, which is an important property for modern information systems, which have mixed workloads of heavy OLAP queries and...
Conference Paper
Full-text available
The common "one size does not fit all" paradigm isolates transactional and analytical workloads into separate, specialized database systems. Operational data is periodically replicated to a data warehouse for analytics. Competitiveness of enterprises today, however, depends on real-time reporting on operational data, necessitating an integration...
Article
Full-text available
Large-scale data analysis relies on custom code both for preparing the data for analysis as well as for the core analysis algorithms. The map-reduce framework offers a simple model to parallelize custom code, but it does not integrate well with relational databases. Likewise, the literature on optimizing queries in relational databases has largely...
Article
Full-text available
Histograms that guarantee a maximum multiplicative error (q-error) for estimates may significantly improve the plan quality of query optimizers. However, the construction time for histograms with maximum q-error was too high for practical use cases. In this paper we extend this concept with a threshold, i.e., an estimate or true cardinality θ, belo...
Conference Paper
Full-text available
After more than a decade of a virtual standstill, the adoption of temporal data management features has recently picked up speed, driven by customer demand and the inclusion of temporal expressions into SQL:2011. Most of the big commercial DBMS now include support for bitemporal data and operators. In this paper, we perform a thorough analysis of...
Conference Paper
Full-text available
An increasing number of applications such as risk evaluation in banking or inventory management require support for temporal data. After more than a decade of standstill, the recent adoption of some bitemporal features in SQL:2011 has reinvigorated the support among commercial database vendors, who incorporate an increasing number of relevant bite...
Conference Paper
Full-text available
Managing temporal data is becoming increasingly important for many applications. Several database systems already support the time dimension, but provide only few temporal operators, which also often exhibit poor performance characteristics. On the academic side, a large number of algorithms and data structures have been proposed, but they often ad...
Conference Paper
Full-text available
Main-memory database systems are emerging as the new backbone of business applications. Besides flat relational data representations also hierarchical ones are essential for these modern applications; therefore we devise a new indexing and versioning approach for hierarchies that is deeply integrated into the relational kernel. We propose the Delta...
Conference Paper
Full-text available
Benchmarks are widely applied for the development and optimization of database systems. Standard benchmarks such as TPC-C and TPC-H provide a way of comparing the performance of different systems. In addition, micro benchmarks can be exploited to test a specific behavior of a system. Yet, despite all the benefits that can be derived from benchmark...
Article
Full-text available
Modern enterprise applications are currently undergoing a complete paradigm shift away from traditional transactional processing to combined analytical and transactional processing. This challenge of combining two opposing query types in a single database management system results in additional requirements for transaction management as well. In th...
Conference Paper
Full-text available
Today, not only do Internet companies such as Google, Facebook or Twitter have Big Data, but Enterprise Information Systems also store an ever-growing amount of data (called Big Enterprise Data in this paper). In a classical SAP system landscape a central data warehouse (SAP BW) is used to integrate and analyze all enterprise data. In SAP BW most of...
Conference Paper
Full-text available
Managing temporal data is becoming increasingly important for many applications. Several database systems already support the time dimension, but provide only few temporal operators, which also often exhibit poor performance characteristics. On the academic side, a large number of algorithms and data structures have been proposed, but they often a...
Conference Paper
Full-text available
MapReduce as a programming paradigm provides a simple-to-use yet very powerful abstraction encapsulated in two second-order functions: Map and Reduce. As such, they allow defining single sequentially processed tasks while at the same time hiding many of the framework details about how those tasks are parallelized and scaled out. In this paper we di...
Article
Full-text available
Requirements of enterprise applications have become much more demanding because they execute complex reports on transactional data while thousands of users may read or update records of the same data. The goal of the SAP HANA database is the integration of transactional and analytical workloads within the same database management system. To achieve...
Conference Paper
Full-text available
Intermediaries for e-services continuously gain momentum, powered by a materializing Internet of Services. However, quality of service still exhibits considerable shortcomings, as no structured process to enhance consumer satisfaction is available yet. To improve the match of delivered e-service quality and expected service quality on the consumer...
Conference Paper
Full-text available
This paper sheds light on control mechanisms to improve and automate service quality and service portfolio management in platform ecosystems. Its focus is placed on e-service value networks as found in platforms such as the Apple App Store, Facebook, Salesforce or SAP ByD. The paper differentiates between direct and indirect control mecha...
Chapter
Exchanging and analyzing ideas across different software tools and repositories is needed to implement the concepts of open innovation and holistic innovation management. However, a precise and formal definition for the concept of an idea is hard to obtain. In this paper, the authors introduce an ontology to represent ideas. This ontology provides...
Article
Full-text available
Early approaches to XQuery processing proposed proprietary techniques to optimize and evaluate XQuery statements. In this chapter, we argue for an algebraic optimization and evaluation technique for XQuery as it allows us to benefit from experience gained with relational databases. An algebraic XQuery processing method requires a translation into a...
Conference Paper
Full-text available
Service-oriented Architectures (SOA) and Web services leverage the technical value of solutions in the areas of distributed systems and cross-enterprise integration. The emergence of Internet marketplaces for business services is driving the need to describe services, not only from a technical level, but also from a business and operational perspec...