
Peter Baumann
Constructor University Bremen gGmbH · Computer Science and Electrical Engineering
PhD, Computer Science
About
207 Publications · 38,825 Reads · 2,013 Citations (since 2017)
Introduction
My research focuses on flexible, scalable services for massive multi-dimensional arrays in all aspects: theory, query languages and their optimization, architectures, applications, and standardization. With our rasdaman system (www.rasdaman.org), we have pioneered the field of Array Databases. Further, we actively design international Big Data standards, such as ISO SQL/MDA (SQL extension with Multi-Dimensional Arrays) and the OGC "Big Geo Datacube" standard, WCS.
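WCPS, mentioned above, is a declarative query language over spatio-temporal coverages. As a hedged illustration (the coverage name AvgLandTemp and the axis labels Lat/Long/ansi are hypothetical placeholders, not a concrete deployment), a WCPS request extracting a one-year timeseries at a single point might look like:

```
for $c in ( AvgLandTemp )
return encode(
    $c[ Lat(53.08), Long(8.80), ansi("2014-01" : "2014-12") ],
    "text/csv"
)
```

Such a query is sent to a WCPS-enabled server, which evaluates the subsetting server-side and returns only the extracted values rather than the full dataset.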
Additional affiliations
September 2012 - October 2014
November 2010 - present: rasdaman GmbH, Founder & CEO
August 2004 - November 2015
Publications (207)
Datenwürfel mit rasdaman (Datacubes with rasdaman)
The world of data and services, or rather of geodata and geodata services: Open Data could hardly work if geodata could not be transported over the Internet with the help of geodata services. Now there appears to be a limit to this. Large data volumes, Big Data, are "too big to tran...
This paper provides an in-depth survey on the integration of machine learning and array databases. First, machine learning support in modern database management systems is introduced. From straightforward implementations of linear algebra operations in SQL to machine learning capabilities of specialized database managers designed to process specific...
Rapid environmental changes due to climate change call for innovative technological approaches, and the same is required for security missions. Essentially, organizations like NATO need advanced capabilities allowing geographically distributed joint mission forces with all their information sources and consumers to (i) operate off the same map...
Machine learning (ML) applications in weather and climate are gaining momentum as big data and the immense increase in High-performance computing (HPC) power are paving the way. Ensuring FAIR data and reproducible ML practices are significant challenges for Earth system researchers. Even though the FAIR principle is well known to many scientists, r...
In the era of ubiquitous data collection and generation, demands are high to make these data accessible as widely as possible, with as little effort and as much power and flexibility as ever possible. On Earth data, this holds in particular for pixel data and point clouds, some of the main “Big Data” today. Coverages represent a unifying concept fo...
Multi-dimensional arrays (also known as raster data or gridded data) play a key role in many, if not all, science and engineering domains where they typically represent spatio-temporal sensor, image, simulation output, or statistics “datacubes”. As classic database technology does not support arrays adequately, such data today are maintained mostly...
Machine Learning is increasingly being applied to many different application domains. From cancer detection to weather forecasting, a large number of different applications leverage machine learning algorithms to get faster and more accurate results over huge datasets. Although many of these datasets are mainly composed of array data, a vast majority...
Flexible, scalable services on massive geo data receive much attention today. In particular, the OGC Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS) standards suites have established a best practice for versatile access and analytics on spatio-temporal "Big Data".
In this workshop, the participants get a deep dive into the sta...
In its quest for a common European Spatial Data Infrastructure INSPIRE has also addressed the category of spatio-temporally extended coverages, in particular: raster data. The INSPIRE definition of coverages is similar to the OGC and ISO standards, but not identical. This deviation from the common standards disallows using standard off-the-shelf so...
Datacubes form an emerging paradigm in the quest for providing EO data ready for spatial and temporal analysis; this concept, which generalizes the concept of seamless maps from 2-D to n-D, is based on preprocessing incoming data so as to integrate all data from one sensor into one logical array, say 3-D x/y/t for image timeseries or 4-D x/y/z/t fo...
Map browsers currently in place present maps and geospatial information using common image formats such as JPEG or PNG, usually created from a service on demand. This is a clear approach for a simple visualization map browser but prevents the browser from modifying the visualization since the content of the image file represents the intensity of co...
We demonstrate the rasdaman ("raster data manager") scalable datacube engine in a series of multi-dimensional live scenarios of spatio-temporal datacube analytics, distributed processing in federations, as well as simple, rapid construction of datacubes.
The goal of the BigPicture project is to enhance satellite-based information with empirical in-situ ground-truthing. In this approach, the symptoms detected from remote sensing data are a mere starting point where additional data -- like long-term time series, location of unusually growing areas, and weather conditions -- get mixed in to ultimately obt...
With the unprecedented increase of orbital sensor, in situ measurement, and simulation data there is a rich, yet not leveraged potential for obtaining insights from dissecting datasets and rejoining them with other datasets. Obviously, the goal is to allow users to “ask any question, any time, on any size”, thereby enabling them to “build their own pro...
This paper provides a structure to the recently intensified discussion around 'data cubes' as a means to facilitate management and analysis of very large volumes of structured geospatial data. The goal is to arrive at a widely agreed and harmonised definition of a 'data cube'. To this end, we propose an approach that deconstructs the 'data cube' co...
Presentation about the complexity of DataCube systems and the challenges of making them interoperable.
Multidimensional arrays represent a core underlying structure of manifold science and engineering data. It is generally recognized today, therefore, that arrays have an essential role in Big Data and should become an integral part of the overall data type orchestration in information systems. This Technical Report discusses the support for Multidim...
Array databases are used to manage and query large N-dimensional arrays, such as sensor data, simulation models and imagery, as well as various time-series. Modern database systems and database applications make extensive use of caching techniques to improve performance. Research on array databases on the other hand has not explored the potential b...
Array databases have set out to close an important gap in data management, as multi-dimensional arrays play a key role in science and engineering data and beyond. Even more, arrays regularly contribute to the “Big Data” deluge, such as satellite images, climate simulation output, medical image modalities, cosmological simulation data, and datacubes...
Recent trends on big Earth-observing (EO) data lead to some questions that the Earth science community needs to address. Are we experiencing a paradigm shift in Earth science research now? How can we better utilize the explosion of technology maturation to create new forms of EO data processing? Can we summarize the existing methodologies and techn...
With the deluge of scientific big data affecting a large variety of research institutions, support for large multidimensional arrays has gained traction in the database community in the past decade. Array databases aim to cover the gap left by traditional relational database systems in the domains of large scientific data by enabling researchers to...
Never before was it so easy and inexpensive to gather data in amounts that were beyond imagination only a few years ago. However, as we are all aware, this richness in data goes hand in hand with a poverty of insight, as data understanding cannot keep up with this data deluge. Today, this phenomenon is not confined to highly specialized ap...
Spatio-temporal grid data form a core structure in Earth and Space sciences alike. While Array Databases have set out to support this information category they only offer integer indexing, corresponding to equidistant grids. However, often grids in reality have irregular structures, such as raw satellite swath data.
We present an approach to modeli...
With the unprecedented availability of continuously updated measured and generated data there is an immense potential for getting new and timely insights – yet, the value is not fully leveraged as of today. The quest is up for high-level service interfaces for dissecting datasets and rejoining them with other datasets – ultimately, to allow users t...
Multidimensional array data, such as remote-sensing imagery and timeseries, climate model simulations, telescope observations, and medical images, contribute massively to virtually all science and engineering domains, and hence play a key role in "Big Data" challenges. Pure array storage management and analytics is relatively well understood today....
Support for large arrays has been increasingly gaining attention by the database community. Array databases are a quickly expanding category of database management systems that treat large, multidimensional array data as first-class database citizens, allowing convenient and efficient storage and retrieval. Large array data on its own, however, is...
Flexible, scalable services on massive geo data receive much attention today. In particular, the OGC Web Coverage Service (WCS) standards suite has established a best practice for versatile access and retrieval on spatio-temporal "Big Data". Fewer efforts have been devoted, though, to an easy-to-use, standardized way of maintaining a service's offe...
In this abstract we present an approach to analyze a Digital Terrain Model (DTM) located at the Lunar south pole using a web client. The aim of this paper is to describe the structure of the project which involves a server (database) and a client (web interface) side. A case study is proposed where the tool will be evaluated to calculate illuminati...
Earth-Science data are composite, multi-dimensional and of significant size, and as such, continue to pose a number of ongoing problems regarding their management. With new and diverse information sources emerging as well as rates of generated data continuously increasing, a persistent challenge becomes more pressing: To make the information existi...
Distributed information infrastructures are increasingly used in the geospatial domain. In the infrastructures, data are being collected by distributed sensor services, served by distributed geospatial data services, transformed by processing services and workflows, and consumed by smart clients. Consequently, Geographical Information Systems (GISs...
The global Earth Science Systems (ESS) cooperation requires both flexible and interoperable Web Service support built on large varieties of Earth Observation archives. Given the complexity and dynamics of each observation and the large number of disciplines involved, the Open Geospatial Consortium (OGC) proposed a modular standardization approach to facilitat...
Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated throug...
Multidimensional array data, including satellite images and weather simulations in the Earth Science, confocal microscopy and CAT scans in the Life Science, as well as telescope and cosmological observations in Space science, is traditionally the type of data seriously contributing to “Big Data”. Traditionally, the database community has neglected...
In this paper, we present a topological neighborhood expression which allows us to express arbitrary neighborhoods around cells in unstructured meshes. We show that the expression can be evaluated by traversing the connectivity information of the meshes. We implemented two algebraic operators which use the expression to compute neighbors of ce...
The ever growing amount of information collected by scientific instruments and the presence of descriptive metadata accompanying them calls for a unified way of querying over array and semi-structured data. We present xWCPS, a novel query language that bridges the path between these two different worlds, enhancing the expressiveness and user-friend...
This contribution introduces the forthcoming extension of the ISO SQL standard for multi-dimensional arrays, SQL/MDA. We present concepts, the language, and highlight how it can be implemented in a scalable manner. Examples used stem from Earth Observation and related domains.
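As a rough sketch of what the SQL/MDA extension adds (the table and column names here are hypothetical, and the exact grammar is defined by the published ISO standard), a 2-D integer array column can be declared and subset directly in SQL:

```
-- declare a table with a 2-D array-valued column (sketch)
CREATE TABLE LandsatScenes (
    id    INTEGER NOT NULL,
    scene INTEGER MDARRAY [ 0:4999, 0:4999 ]
);

-- retrieve a 100 x 100 cutout from every scene (sketch)
SELECT id, scene[ 0:99, 0:99 ]
FROM   LandsatScenes;
```

The key point is that arrays become first-class SQL values, so subsetting and cell-wise operations can be pushed into the database engine rather than performed client-side.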
Ensuring long-term accessibility of Earth Science archive data is a recurrent issue for data centers. Heterogeneity of data adds particular challenges. The Data Virtualisation Toolkit (DVT) has been developed by the SCIDIP-ES project to support long-term access and use of heterogeneous Earth Science (ES) data in a format-independent manner. DVT pro...
Archiving models employed in Multi-disciplinary Earth System Science research tend to be very heterogeneous, as recognized by the “Variability” Aspect of the common Big Data “Four V” definition. The information being preserved is at constant risk of obsolescence due to continuous technology and community knowledge changes and development. Accessing...
Big Data pose special challenges on geo data, touching upon all the V keywords, like Volume, Velocity, Variety, and Veracity. Simultaneously, demands are massively increasing, from the traditional file download to allowing customers to build their own spatio-temporal product on the fly. This seminar introduces to the key Big Geo Data standards of O...
Rasdaman ("raster data manager") is the pioneer in Array Database Systems, the next generation in scalable scientific data services: it provides agile analytics on massive multidimensional raster data ("arrays"), such as regular and irregular spatio-temporal grids. An SQL-style query language allows users to flexibly build their own product in a "mi...
Arrays are among those data types which contribute the most to Big Data -- examples include satellite images and weather simulation output in the Earth sciences, confocal microscopy and CAT scans in the Life sciences, as well as telescope and cosmological observations in Space science, to name but a few. Traditionally, the database community has ne...
Since array data of arbitrary dimensionality appears in massive amounts in a wide range of application domains, such as geographic information systems, climate simulations, and medical imaging, it has become crucial to build scalable systems for complex query answering in real time. Cloud architectures can be expected to significantly speed up arra...
How to handle and publish multi-dimensional gridded big data using Array DBMS technology and OGC open standards.
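In rasdaman's own query language, rasql, such gridded data are served through SQL-style queries. As a minimal, hedged sketch (the collection name SatImage is a hypothetical placeholder), a cutout combined with a cell-wise operation might read:

```
-- select a 100 x 100 cutout, brighten it cell-wise, encode as PNG (sketch)
select encode( c[ 0:99, 0:99 ] * 2, "png" )
from   SatImage as c
```

A WCS or WCPS frontend can then map incoming OGC requests onto such queries, which is one common pattern for publishing gridded data through open standards.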
The continuous growth of remotely sensed data raises the need for efficient ways of accessing data archives. The classical model of accessing remote sensing (satellite) archives via distribution of large files is increasingly making way for a more dynamic and interactive data service. A challenge, though, is interoperability of such services, in pa...
In this work-in-progress paper, we model scientific meshes as a multi-graph in Neo4j graph database using the graph property model. We conduct experiments to measure the performance of the graph database solution in processing mesh queries and compare it with GrAL mesh library and PostgreSQL database on synthetic and real mesh datasets. The experim...
Big Data are a central challenge today in science and industry. Typically, Big Data are characterized from application perspectives. From a data structure perspective, among the core structures appearing are sets, graphs, and arrays. In particular in science and engineering we find arrays being a main contributor to data volumes. In fact, large, mu...
Image analysis plays an important role both in medical diagnostics and in biology. The main reasons that prevent the creation of objective and reliable methods of analysis of biomedical images are the high variability and heterogeneity of the biological material, distortion introduced by the experimental procedures, and the large size of the images...
Modern sensors, such as hyperspectral cameras, deliver massive amounts of data. On board of satellites, the high volume is paired with low bandwidth and part-time availability, during overpasses. This leads to well-known availability problems and bottlenecks in today's remote sensing.
We address this challenge by enhancing the on-board system with...