Peter Baumann

Constructor University Bremen gGmbH · Computer Science and Electrical Engineering

PhD, Computer Science

About

207 Publications
38,825 Reads
2,013 Citations
Citations since 2017: 41 research items, 963 citations
[Chart: citations per year, 2017-2023]
Introduction
My research focuses on flexible, scalable services for massive multi-dimensional arrays in all aspects: theory, query languages and their optimization, architectures, applications, and standardization. With our rasdaman system (www.rasdaman.org), we have pioneered the field of Array Databases. Further, we actively design international Big Data standards, such as ISO SQL/MDA (SQL extension with Multi-Dimensional Arrays) and the OGC "Big Geo Datacube" standard, WCS.
Additional affiliations
September 2012 - October 2014
Constructor University Bremen gGmbH
Position
  • Managing Director
November 2010 - present
rasdaman GmbH
Position
  • Founder & CEO
August 2004 - November 2015
Constructor University Bremen gGmbH
Position
  • Professor

Publications (207)
Article
Full-text available
Datenwürfel mit rasdaman (Datacubes with rasdaman). The world of data and services, or more precisely, of geodata and geodata services. Open Data could hardly work if geodata could not be transported over the Internet by means of geodata services. Now there seems to be a limit to this. Large data volumes (Big Data) are "too big to tran...
Article
Full-text available
This paper provides an in-depth survey of the integration of machine learning and array databases. First, machine learning support in modern database management systems is introduced. From straightforward implementations of linear algebra operations in SQL to machine learning capabilities of specialized database managers designed to process specific...
Conference Paper
Full-text available
Rapid environmental changes due to climate change call for innovative technological approaches, and the same holds for security missions. Essentially, organizations like NATO need advanced capabilities allowing geographically distributed joint mission forces with all their information sources and consumers to (i) operate off the same map...
Article
Full-text available
Machine learning (ML) applications in weather and climate are gaining momentum as big data and the immense increase in high-performance computing (HPC) power are paving the way. Ensuring FAIR data and reproducible ML practices are significant challenges for Earth system researchers. Even though the FAIR principles are well known to many scientists, r...
Article
Full-text available
In the era of ubiquitous data collection and generation, demands are high to make these data accessible as widely as possible, with as little effort and as much power and flexibility as possible. For Earth data, this holds in particular for pixel data and point clouds, which are among the main sources of "Big Data" today. Coverages represent a unifying concept fo...
Article
Full-text available
Multi-dimensional arrays (also known as raster data or gridded data) play a key role in many, if not all, science and engineering domains, where they typically represent spatio-temporal sensor, image, simulation output, or statistics “datacubes”. As classic database technology does not support arrays adequately, such data today are maintained mostly...
Conference Paper
Machine Learning is increasingly being applied to many different application domains. From cancer detection to weather forecasting, a large number of applications leverage machine learning algorithms to get faster and more accurate results over huge datasets. Although many of these datasets are mainly composed of array data, a vast majority...
Conference Paper
Full-text available
Flexible, scalable services on massive geo data receive much attention today. In particular, the OGC Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS) standards suites have established a best practice for versatile access and analytics on spatio-temporal "Big Data". In this workshop, the participants get a deep dive into the sta...
Article
Full-text available
In its quest for a common European Spatial Data Infrastructure INSPIRE has also addressed the category of spatio-temporally extended coverages, in particular: raster data. The INSPIRE definition of coverages is similar to the OGC and ISO standards, but not identical. This deviation from the common standards disallows using standard off-the-shelf so...
Chapter
Datacubes form an emerging paradigm in the quest for providing EO data ready for spatial and temporal analysis; this concept, which generalizes the concept of seamless maps from 2-D to n-D, is based on preprocessing incoming data so as to integrate all data from one sensor into one logical array, say 3-D x/y/t for image timeseries or 4-D x/y/z/t fo...
Chapter
Current map browsers present maps and geospatial information using common image formats such as JPEG or PNG, usually created on demand by a service. This approach works well for a simple visualization map browser but prevents the browser from modifying the visualization, since the content of the image file represents the intensity of co...
Conference Paper
Full-text available
We demonstrate the rasdaman ("raster data manager") scalable datacube engine in a series of multi-dimensional live scenarios of spatio-temporal datacube analytics, distributed processing in federations, as well as simple, rapid construction of datacubes.
Conference Paper
Full-text available
The goal of the BigPicture project is to enhance satellite-based information with empirical in-situ ground-truthing. In this approach, the symptoms detected from remote sensing data are merely a starting point where additional data (such as long-term time series, locations of unusually growing areas, and weather conditions) get mixed in to ultimately obt...
Chapter
Full-text available
With the unprecedented increase in orbital sensor, in situ measurement, and simulation data, there is a rich, yet untapped, potential for obtaining insights from dissecting datasets and rejoining them with other datasets. Obviously, the goal is to allow users to “ask any question, any time, on any size”, thereby enabling them to “build their own pro...
Conference Paper
Full-text available
Conference Paper
Full-text available
This paper provides a structure to the recently intensified discussion around 'data cubes' as a means to facilitate management and analysis of very large volumes of structured geospatial data. The goal is to arrive at a widely agreed and harmonised definition of a 'data cube'. To this end, we propose an approach that deconstructs the 'data cube' co...
Presentation
Full-text available
Presentation about the complexity of DataCube systems and the challenges of making them interoperable
Technical Report
Full-text available
Multidimensional arrays represent a core underlying structure of a wide variety of science and engineering data. It is generally recognized today, therefore, that arrays have an essential role in Big Data and should become an integral part of the overall data type orchestration in information systems. This Technical Report discusses the support for Multidim...
Conference Paper
Array databases are used to manage and query large N-dimensional arrays, such as sensor data, simulation models and imagery, as well as various time-series. Modern database systems and database applications make extensive use of caching techniques to improve performance. Research on array databases, on the other hand, has not explored the potential b...
Conference Paper
Array databases have set out to close an important gap in data management, as multi-dimensional arrays play a key role in science and engineering data and beyond. Even more, arrays regularly contribute to the “Big Data” deluge, such as satellite images, climate simulation output, medical image modalities, cosmological simulation data, and datacubes...
Article
Full-text available
Recent trends on big Earth-observing (EO) data lead to some questions that the Earth science community needs to address. Are we experiencing a paradigm shift in Earth science research now? How can we better utilize the explosion of technology maturation to create new forms of EO data processing? Can we summarize the existing methodologies and techn...
Conference Paper
With the deluge of scientific big data affecting a large variety of research institutions, support for large multidimensional arrays has gained traction in the database community in the past decade. Array databases aim to cover the gap left by traditional relational database systems in the domains of large scientific data by enabling researchers to...
Article
Never before has it been so easy and inexpensive to gather data in amounts which were beyond imagination only a few years ago. However, as we are all aware, this richness in data goes hand in hand with a poverty of insight, as data understanding cannot keep up with this data deluge. Today, this phenomenon is not confined to highly specialized ap...
Conference Paper
Spatio-temporal grid data form a core structure in Earth and Space sciences alike. While Array Databases have set out to support this information category, they only offer integer indexing, corresponding to equidistant grids. However, real-world grids often have irregular structures, such as raw satellite swath data. We present an approach to modeli...
Article
Full-text available
With the unprecedented availability of continuously updated measured and generated data, there is an immense potential for gaining new and timely insights; yet this value is not fully leveraged today. The quest is on for high-level service interfaces for dissecting datasets and rejoining them with other datasets, ultimately to allow users t...
Conference Paper
Multidimensional array data, such as remote-sensing imagery and timeseries, climate model simulations, telescope observations, and medical images, contribute massively to virtually all science and engineering domains, and hence play a key role in "Big Data" challenges. Pure array storage management and analytics are relatively well understood today....
Conference Paper
Support for large arrays has been gaining increasing attention from the database community. Array databases are a quickly expanding category of database management systems that treat large, multidimensional array data as first-class database citizens, allowing convenient and efficient storage and retrieval. Large array data on its own, however, is...
Conference Paper
Flexible, scalable services on massive geo data receive much attention today. In particular, the OGC Web Coverage Service (WCS) standards suite has established a best practice for versatile access and retrieval on spatio-temporal "Big Data". Fewer efforts have been devoted, though, to an easy-to-use, standardized way of maintaining a service's offe...
Conference Paper
Full-text available
In this abstract we present an approach to analyze a Digital Terrain Model (DTM) of the lunar south pole using a web client. The aim of this paper is to describe the structure of the project, which involves a server (database) side and a client (web interface) side. A case study is proposed in which the tool will be evaluated to calculate illuminati...
Article
Full-text available
Earth-Science data are composite, multi-dimensional and of significant size, and as such, continue to pose a number of ongoing problems regarding their management. With new and diverse information sources emerging as well as rates of generated data continuously increasing, a persistent challenge becomes more pressing: To make the information existi...
Article
Full-text available
Distributed information infrastructures are increasingly used in the geospatial domain. In the infrastructures, data are being collected by distributed sensor services, served by distributed geospatial data services, transformed by processing services and workflows, and consumed by smart clients. Consequently, Geographical Information Systems (GISs...
Conference Paper
The global Earth Science Systems (ESS) cooperation requires both flexible and interoperable Web Service support built on large varieties of Earth Observation archives. Given the complexity and dynamics of each observation and the large number of disciplines involved, Open GIS Consortium (OGC) proposed a modular standardization approach to facilitat...
Article
Full-text available
Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated throug...
Chapter
Multidimensional array data, including satellite images and weather simulations in the Earth Sciences, confocal microscopy and CAT scans in the Life Sciences, as well as telescope and cosmological observations in Space science, has long been a major contributor to “Big Data”. Traditionally, the database community has neglected...
Article
In this paper, we present a topological neighborhood expression which allows us to express arbitrary neighborhoods around cells in unstructured meshes. We show that the expression can be evaluated by traversing the connectivity information of the meshes. We implemented two algebraic operators which use the expression to compute neighbors of ce...
Conference Paper
The ever-growing amount of information collected by scientific instruments and the presence of descriptive metadata accompanying them call for a unified way of querying over array and semi-structured data. We present xWCPS, a novel query language that bridges the gap between these two different worlds, enhancing the expressiveness and user-friend...
Conference Paper
Full-text available
This contribution introduces the forthcoming extension of the ISO SQL standard for multi-dimensional arrays, SQL/MDA. We present concepts, the language, and highlight how it can be implemented in a scalable manner. Examples used stem from Earth Observation and related domains.
Article
Full-text available
Ensuring long-term accessibility of Earth Science archive data is a recurrent issue for data centers. Heterogeneity of data adds particular challenges. The Data Virtualisation Toolkit (DVT) has been developed by the SCIDIP-ES project to support long-term access and use of heterogeneous Earth Science (ES) data in a format-independent manner. DVT pro...
Article
Archiving models employed in multidisciplinary Earth System Science research tend to be very heterogeneous, as recognized by the “Variability” aspect of the common Big Data “Four V” definition. The information being preserved is at constant risk of obsolescence due to continuous changes and developments in technology and community knowledge. Accessing...
Conference Paper
Full-text available
Big Data pose special challenges for geo data, touching upon all the V keywords, such as Volume, Velocity, Variety, and Veracity. Simultaneously, demands are increasing massively, from traditional file download to allowing customers to build their own spatio-temporal products on the fly. This seminar introduces the key Big Geo Data standards of O...
Conference Paper
Rasdaman ("raster data manager") is the pioneer in Array Database Systems, the next generation in scalable scientific data services: it provides agile analytics on massive multidimensional raster data ("arrays"), such as regular and irregular spatio-temporal grids. An SQL-style query language allows users to flexibly build their own product in a "mi...
Conference Paper
Full-text available
Arrays are among those data types which contribute the most to Big Data: examples include satellite images and weather simulation output in the Earth sciences, confocal microscopy and CAT scans in the Life sciences, as well as telescope and cosmological observations in Space science, to name but a few. Traditionally, the database community has ne...
Article
Since array data of arbitrary dimensionality appears in massive amounts in a wide range of application domains, such as geographic information systems, climate simulations, and medical imaging, it has become crucial to build scalable systems for complex query answering in real time. Cloud architectures can be expected to significantly speed up arra...
Technical Report
Full-text available
How to handle and publish multi-dimensional gridded big data using Array DBMS technology and OGC open standards.
Conference Paper
Full-text available
The continuous growth of remotely sensed data raises the need for efficient ways of accessing data archives. The classical model of accessing remote sensing (satellite) archives via distribution of large files is increasingly making way for a more dynamic and interactive data service. A challenge, though, is interoperability of such services, in pa...
Conference Paper
Full-text available
In this work-in-progress paper, we model scientific meshes as a multi-graph in Neo4j graph database using the graph property model. We conduct experiments to measure the performance of the graph database solution in processing mesh queries and compare it with GrAL mesh library and PostgreSQL database on synthetic and real mesh datasets. The experim...
Chapter
Full-text available
Big Data are a central challenge today in science and industry. Typically, Big Data are characterized from application perspectives. From a data structure perspective, among the core structures appearing are sets, graphs, and arrays. In particular in science and engineering we find arrays being a main contributor to data volumes. In fact, large, mu...
Article
Image analysis plays an important role both in medical diagnostics and in biology. The main reasons that prevent the creation of objective and reliable methods of analysis of biomedical images are the high variability and heterogeneity of the biological material, distortion introduced by the experimental procedures, and the large size of the images...
Conference Paper
Full-text available
Modern sensors, such as hyperspectral cameras, deliver massive amounts of data. On board satellites, the high volume is paired with low bandwidth and only part-time availability during overpasses. This leads to well-known availability problems and bottlenecks in today's remote sensing. We address this challenge by enhancing the on-board system with...