THE SIX FACES OF THE DATA CUBE
Peter Strobl1, Peter Baumann2, Adam Lewis3, Zoltan Szantoi1,4, Brian Killough5,
Matthew Purss3, Max Craglia1, Stefano Nativi1,6, Alex Held7, Trevor Dhu3
1European Commission, Joint Research Centre, Ispra, Italy; 2Jacobs University, Bremen, Germany;
3Geoscience Australia, Canberra, Australia; 4Stellenbosch University, South Africa;
5NASA Langley Research Center, Hampton, United States; 6National Research Council of Italy, Rome, Italy;
7Commonwealth Scientific Industrial Research Organisation (CSIRO), Canberra, Australia
ABSTRACT
This paper provides a structure to the recently intensified discussion around ‘data cubes’ as a means to facilitate management and analysis of very large volumes of structured geospatial data. The goal is to arrive at a widely agreed and harmonised definition of a ‘data cube’. To this end, we propose an approach that deconstructs the ‘data cube’ concept into distinct aspects. We have identified six such aspects, which we refer to as the 6 faces of the data cube. More than a pleasing analogy, these 6 faces are fairly independent, and hence ‘orthogonal’ domains. They should allow breaking down the description and handling of data cubes into meaningful and manageable ‘parts’, which, however, only when seen holistically, make it possible to harness the full potential of this multidisciplinary infrastructure.
Index Terms: data cube, structured data, data infrastructure, geospatial data, big data, standardisation, WCS, CIS, INSPIRE, OGC, ISO
1. INTRODUCTION
The term ‘data cube’ was originally used in Online Analytical Processing (OLAP) of business and statistics data; technically speaking, such a data cube represents a multi-dimensional array together with metadata describing the semantics of axes, coordinates, and cells. More recently, data cubes have emerged in a geospatial context [1,2] as an approach to the management and analysis of these large and rapidly growing datasets. While the term ‘data cube’ was used as early as the 1980s, when the first imaging spectrometers produced ‘hyperspectral data cubes’, the technology for efficiently storing and serving data cubes was not yet ready. Geospatial data cubes are typically densely populated, whereas OLAP data cubes are typically sparse.
A generic requirement remains that data can only be organised as a ‘cube’ if they have inherent attributes (usually referred to as coordinates) according to which they can be ordered. A data cube may have horizontal and vertical spatial axes, temporal axes, or any other application-dependent dimensions. For geospatial data cubes, at least one of those should be non-spatial.
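To make this concrete, the notion of a multi-dimensional array with axis semantics can be sketched as follows; the structure, field names, and values are purely illustrative and not part of any standard:

```python
from dataclasses import dataclass, field

@dataclass
class Axis:
    name: str          # e.g. "lon", "lat", "time"
    unit: str          # e.g. "degrees_east", "ISO 8601 date"
    coordinates: list  # ordered coordinate values along this axis

@dataclass
class DataCube:
    axes: list                                  # one Axis per dimension
    cells: dict = field(default_factory=dict)   # sparse map: coord tuple -> value

    def set(self, coords, value):
        self.cells[coords] = value

    def get(self, coords):
        return self.cells.get(coords)

# A tiny geospatial cube: two spatial axes plus one temporal axis
cube = DataCube(axes=[
    Axis("lon", "degrees_east", [10.0, 10.1]),
    Axis("lat", "degrees_north", [45.0, 45.1]),
    Axis("time", "ISO 8601 date", ["2017-11-28", "2017-11-29"]),
])
cube.set((10.0, 45.0, "2017-11-28"), 0.42)   # e.g. a reflectance value
print(cube.get((10.0, 45.0, "2017-11-28")))  # 0.42
```

The metadata attached to each axis is what distinguishes a data cube from a bare array: it gives every cell an addressable position in space and time.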
2. DEFINING THE CUBE
Similar to the term ‘big data’, for which no consistent definition has yet emerged, we find the notion of ‘data cube’ still varying across the literature and often dependent on the context in which it is used. Whilst ‘big data’ is an expression at a sufficiently high and abstract level that a certain room for interpretation is not problematic, discussions around data cubes will suffer unless further structure is given to this evolving concept.
For us, a Geospatial Data Cube (GDC) is based on regularly or irregularly gridded, spatial and/or temporal data with n dimensions (or axes), and is characterised by the presence of the 6 faces that we explore in this paper. As such, it complements the conceptual view of the ‘Datacube Manifesto’ [4] with a holistic system view, whose aim is to raise awareness of all necessary aspects of such an infrastructure.
3. DISSECTING THE CUBE
The purpose of a GDC is to allow ingestion, storage, provision, and analysis of structured geospatial data. To do so, it has to cover several technical aspects, which we call faces. Individually, each face is a well-established domain within the data sciences, allowing the respective experts to enter the discussion at the right point. As an infrastructure, however, a data cube can unfold its full potential only if all the following ‘faces’ are comprehensively covered and well orchestrated.
3.1. Parameter Model
The semantics of a cube cell value is described by a parameter model, which allows understanding the information stored in each thematic layer of the cube. This includes the parameterisation of the property and its quality, as well as the associated metadata necessary for the analysis. The Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) Common Data Model (CDM) [3] defines important elements of parameter models. Well-documented implementations of such models for various themes, such as terrain elevation [5], are given in the INSPIRE data specifications. However, incorporating data
[Data Cubes and Multidimensional Arrays, Proc. of the 2017 conference on Big Data from Space (BiDS’17), Toulouse, France, 28–30 November 2017, doi: 10.2760/383579]
describing the same parameter (e.g. radiance imagery) but of various origins into a geospatial data cube remains a challenge even where such models are applied, owing to differences among the collecting sensors, image processing chains, and algorithms used. Such geospatial (raster) data therefore need to be either pre-processed with approved algorithms or, preferably, produced directly by the corresponding instrument owner so that they fit into the data cube structure. The latter, preferred option is advocated and endorsed by the Committee on Earth Observation Satellites (CEOS). Such data, called “Analysis Ready Data” or ARD, would come from CEOS member space agencies and fulfil a minimum set of criteria, such as consistent parameter models and approved algorithms, thus greatly facilitating the compilation of data cubes and the exchange of data among them. Direct or automatic multi-sensor data fusion, however, also calls for harmonised sensor characteristics, such as spectral band definitions, and for the availability and consistency of ancillary data such as Digital Elevation Models.
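A minimal sketch of such a parameter model, with entirely illustrative field names and values (real models follow e.g. the OGC SWE CDM [3]):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParameterModel:
    # Illustrative fields only; not taken from any standard
    name: str           # observed property, e.g. "surface_reflectance"
    unit: str           # unit of measure, e.g. "1" (dimensionless)
    sensor: str         # collecting instrument
    algorithm: str      # processing algorithm / version
    uncertainty: float  # estimated measurement uncertainty

def compatible(a: ParameterModel, b: ParameterModel) -> bool:
    """Two thematic layers can be fused directly only if they describe
    the same property in the same unit (sensor/algorithm may differ)."""
    return a.name == b.name and a.unit == b.unit

s2 = ParameterModel("surface_reflectance", "1", "Sentinel-2 MSI", "Sen2Cor 2.4", 0.03)
l8 = ParameterModel("surface_reflectance", "1", "Landsat-8 OLI", "LaSRC 1.3", 0.05)
print(compatible(s2, l8))  # True: same parameter, different origin
```

This is the spirit of ARD: harmonise the parameter model up front so that layers from different missions become directly comparable inside the cube.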
3.2. Data Representation
Data representation is the way in which a parameter is discretised and semantically encoded along the different axes or dimensions of the cube, such as space, time, and thematic properties. A given parameter might be represented in different ways, and the same representation scheme might be used for different parameters. Depending on the representation type, a specific set of metadata needs to be supplied, including e.g. range, interval, scale, precision, or reference. The OGC SWE-CDM contains a comprehensive overview of representation types [3].
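As an illustration, a representation descriptor carrying such metadata might look as follows; the class and field names are ours, only loosely inspired by SWE Common:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuantityRepresentation:
    """Illustrative subset of the metadata describing how a continuous
    quantity is stored; field names are ours, not the standard's."""
    value_range: tuple  # (min, max) of valid decoded values
    scale: float        # factor applied to stored integers
    offset: float       # additive offset after scaling
    precision: float    # smallest meaningful difference

    def decode(self, stored: int) -> float:
        return stored * self.scale + self.offset

# 16-bit reflectance scaled by 1/10000, as commonly done for EO products
refl = QuantityRepresentation((0.0, 1.0), 1e-4, 0.0, 1e-4)
value = refl.decode(4200)  # ≈ 0.42
```

Decoupling the stored encoding (integers, scale, offset) from the physical quantity is what lets the same representation scheme serve many different parameters.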
Discretisation in the spatial domain is most familiar in the form of gridding [6]. ISO and OGC today base most of their grid definitions on the EPSG catalogue of projections, which either limits the respective grids to regional coverage or induces considerable spatial distortion. An example of a common (quasi-)global spatial grid system is WMTS, which is in fact a mixture of projection, grid definition, and tiling schema. A relatively new concept is promoted by the recent OGC standard for Discrete Global Grid Systems (DGGS), which aims at overcoming the limitations of planar projections by defining hierarchical grids directly on the ellipsoid.
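For the widely used Web Mercator tiling scheme underlying WMTS-style services, the mapping from geographic coordinates to tile indices can be sketched with the standard ‘slippy map’ formulas, shown here purely for illustration:

```python
import math

def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Map a WGS84 lon/lat to tile indices in the common Web Mercator
    tiling scheme, which has 2^zoom x 2^zoom tiles at each zoom level."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

print(lonlat_to_tile(1.44, 43.60, 10))  # Toulouse at zoom level 10
```

The mixture the text describes is visible here: a projection (Web Mercator), a grid definition (the 2^zoom subdivision), and a tiling schema (the x/y indices) are folded into one formula.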
In other areas standards are often still missing, and the
representation of observation-level metadata such as
measurement quality and uncertainty is in its infancy.
3.3. Data Organisation
The cell values generated by the discretisation of the parameter need to be physically arranged and stored in a machine-readable way. This encompasses issues like file formats, file systems, and database structures. OGC CIS [6], which has also been adopted as ISO 19123-2, establishes how representation can be based on ASCII (such as GML, JSON, or RDF), binary (such as GeoTIFF or NetCDF), or a mix of both embedded in some “container format” (such as zip or GeoPackage). Furthermore, the data cubes representing “Big Data” typically require data to be partitioned (also called tiling), and they need to be amenable to streaming (mainly in the case of time series); both are supported in the current version, CIS 1.1. Furtado [7] provides a general analysis of multi-dimensional partitioning.
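The partitioning (tiling) of an n-dimensional array can be illustrated by enumerating tile origins; this is a toy sketch, not any particular system's layout:

```python
from itertools import product

def tile_origins(shape, tile):
    """Enumerate the origin (corner) index of each partition when an
    n-D array of the given shape is split into regular tiles."""
    ranges = [range(0, s, t) for s, t in zip(shape, tile)]
    return list(product(*ranges))

# A 4x4x2 cube split into 2x2x2 tiles -> 2*2*1 = 4 partitions
print(tile_origins((4, 4, 2), (2, 2, 2)))
```

The choice of tile shape matters: tiles elongated along the time axis favour time-series streaming, while spatially square tiles favour map-style access.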
Fig. 1. The data-oriented faces of the Geospatial Data Cube.
3.4. Infrastructure
The data storage units must be hosted by an IT infrastructure, or ‘hardware’, that also allows their handling. This could be a centralised or distributed setup of storage and processing devices. Rapid data access and transfer between storage and processing instances are important criteria [2], particularly for very large spatio-temporal datasets.
The volume and growth of geospatial data require significant financial and logistical investments to offer competitive services that attract and retain users. Among the many supercomputing facilities that have started offering geospatial data and services over recent years are industrial initiatives such as Google Earth Engine [8] and Amazon Web Services. Others are publicly funded and operated, such as the Australian Geospatial Data Cube [2], the Technical University of Vienna’s Earth Observation Data Centre (EODC) [9], and the JRC Earth Observation Data Processing Platform (JEODPP) at the European Commission’s (EC) Joint Research Centre [10]. In the frame of the Copernicus programme, the EC is about to fund various consortia uniting
public and private entities to serve as ‘Data Information and
Access Systems’ (DIAS) [11,12].
While all these initiatives also show commitment to other aspects of data cubes, their main investments appear to be directed towards the IT infrastructure. The success of these investments, however, will largely depend on the functionality of these infrastructures, for which the other faces described here must also be duly covered.
Fig. 2. The functionality-oriented faces of the Geospatial Data Cube.
3.5. Access and Analysis
Within the infrastructure, a wide range of functionality must be implemented in software to access, manipulate, and analyse the stored data (and metadata) and to ingest new products into the data cube. These functionalities must be documented and made available to users by means of APIs and other interactive interfaces (GUIs). Between the user API (front end) and the file manipulation routines (back end), one or several layers of software are conceivable.
One of these layers could consist of common GIS tools (e.g. QGIS, ArcGIS), and OGC Web Coverage Services (WCS) can be used to connect these with the data cube. A recent example of an API and GUI has been demonstrated by the CEOS Open Data Cube initiative (http://tinyurl.com/datacubeui).
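For illustration, a WCS 2.0 GetCoverage request in key-value-pair form could be assembled as follows; the endpoint and coverage name are hypothetical:

```python
from urllib.parse import urlencode

def getcoverage_url(endpoint, coverage_id, subsets, fmt="image/tiff"):
    """Build a WCS 2.0 KVP GetCoverage request restricted to the given
    axis subsets; endpoint and coverage identifier are placeholders."""
    params = [
        ("service", "WCS"), ("version", "2.0.1"),
        ("request", "GetCoverage"), ("coverageId", coverage_id),
        ("format", fmt),
    ]
    params += [("subset", f"{axis}({lo},{hi})") for axis, lo, hi in subsets]
    return endpoint + "?" + urlencode(params)

url = getcoverage_url(
    "https://example.org/wcs", "S2_L2A",
    [("Lat", 43.5, 43.7), ("Long", 1.3, 1.5)],
)
print(url)
```

Any client able to issue such a URL, from a browser to a script, can extract an arbitrary spatio-temporal subset without downloading the whole cube.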
An existing standard defining a GDC analytics language is the OGC Web Coverage Processing Service (WCPS) [13]. Additional recent attempts to establish such languages have been made by OPeNDAP, Google Earth Engine [8], and others.
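A WCPS query that ‘ships code to the data’, e.g. averaging one band over a spatial subset, might be composed as follows; the coverage, band, and axis names are illustrative:

```python
def wcps_band_mean(coverage, band, bbox):
    """Compose a WCPS query that averages one band of a coverage over a
    spatial subset (coverage/band/axis names are placeholders)."""
    lat0, lat1, lon0, lon1 = bbox
    return (
        f"for $c in ({coverage}) "
        f"return avg($c.{band}[Lat({lat0}:{lat1}), Long({lon0}:{lon1})])"
    )

q = wcps_band_mean("S2_L2A", "red", (43.5, 43.7, 1.3, 1.5))
print(q)
```

Only the query string travels to the server; the server evaluates it next to the data and returns a single scalar rather than gigabytes of pixels.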
As substantial processing is being shifted to the data cube host, anticipatory cost estimation as well as access rights and security will be of high concern when it comes to granting access to data and to analysis power. Given the size of data cubes, it will often not be sufficient to make a binary access decision for the whole cube; particular regions, collections, etc. may need to be guarded separately. Costs for accessing, processing, and transferring data should be determined prior to execution, so that the host can decide on admissibility, users can be warned, and disproportionate requests rejected.
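Such a pre-execution cost estimate could be as simple as the following sketch; all rates and prices are made-up placeholders:

```python
def request_cost(cell_counts, bytes_per_cell=4, price_per_gib=0.02):
    """Rough pre-execution cost estimate for a cube request:
    cells touched per axis -> data volume -> price.
    bytes_per_cell and price_per_gib are placeholder values."""
    cells = 1
    for n in cell_counts:
        cells *= n
    volume_gib = cells * bytes_per_cell / 2**30
    return volume_gib * price_per_gib

# e.g. a 10000 x 10000 pixel region over 365 time steps
cost = request_cost([10000, 10000, 365])
```

Because the estimate depends only on the request's axis subsets, the host can compute it before touching any data and reject or flag disproportionate requests.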
3.6. Interoperability
Interoperability and scalable fusion of spatial information across different data cubes are crucial, and highly dependent on the use of robust international standards governing the access and transfer protocols for communication between client and server, as well as among different servers.
ISO 19123 (which is identical to OGC Abstract Topic 6) defines an abstract data cube model as part of its coverage concept; due to its level of abstraction, however, it is not by itself interoperable. Its sister standard, OGC CIS 1.1 / ISO 19123-2, establishes concrete encodings that allow re-encoding coverages from one format into another, so that a well-defined, format-independent data cube exchange is possible, though at the cost of additional interpolation and resampling.
The corresponding service model is provided by the OGC Web Coverage Service (WCS) [14], which has been adopted by INSPIRE and is on the adoption plan of ISO. A large and growing number of open-source and proprietary implementations support WCS, so that interoperable access to data cubes is possible today through a wide range of tools, including map navigation (e.g. OpenLayers, Leaflet), Web GIS (e.g. QGIS, ArcGIS), visualisation (e.g. NASA WorldWind, Cesium), and analytics (e.g. Python and R); see the examples in the Jupyter notebook at [15]. This allows users to remain in the comfort zone of their tools while accessing data cubes stored in rasdaman, GeoServer, MapServer, ArcGIS, and other WCS-enabled engines.
Further, the Web Coverage Processing Service (WCPS) geo
datacube analytics language standard provides a means for
“shipping code to data” in an unambiguous, semantically
well-defined manner [13].
Since 2012, the intercontinental EarthServer initiative (http://www.earthserver.eu) has been establishing agile datacube analytics on 3D x/y/t image time series and 4D x/y/z/t weather data, based on the rasdaman Array Database System (http://www.rasdaman.org). The largest installation, EO Data Service (www.eodataservice.org), has recently passed the 1 Petabyte mark; ECMWF, within EarthServer, is working on unleashing its 220 PB climate archive. Currently, many more stakeholders, such as the Committee on Earth Observation Satellites (CEOS) and the W3C, have started working on data cubes. Consistency among these and the established OGC / ISO / INSPIRE standards will be key to success. Barriers to interoperability, on the other hand, will inevitably lead to silo effects undermining the multidisciplinary concept and potential of data cubes.
4. OUTLOOK
The future success of (geospatial) data cubes will certainly not depend on the existence of a widely agreed definition alone. But it is likely that a well-structured discussion and widespread agreement on the key features of data cubes will enable much faster convergence, increased interoperability, and more rapid progress at the global level.
Valuable technology contributions can be expected from the field of Array Databases, which is working on flexible, scalable query services on massive arrays, backed by the existing OGC Web Coverage Processing Service (WCPS) [13] and the forthcoming ISO Array SQL [16] standards. However, users should not need to learn new languages each time they work on another platform; they should be able to use their existing tools and scripts (e.g., Python and R for analysis), which can be coupled through the abovementioned languages acting as hidden, standards-based client/server APIs.
Ultimately, these efforts should go beyond the mere exchange of data and move us towards compatibility and consistency of the available information and of the way it can be accessed and analysed.
5. REFERENCES
[1] Salehi, M., Bédard, Y., Mostafavi, M., Brodeur, J., 2007, “From transactional spatial databases integrity constraints to spatial data cubes integrity constraints”, Proc. of the 5th International Symposium on Spatial Data Quality.
[2] Lewis, A., et al., 2017, “The Australian Geoscience Data Cube: Foundations and lessons learned”, Remote Sensing of Environment, http://dx.doi.org/10.1016/j.rse.2017.03.015
[3] Robin, A. (Ed.), 2011, “SWE Common Data Model Encoding Standard”, OGC, http://www.opengeospatial.org/standards/swecommon
[4] Baumann, P., 2017, “The Datacube Manifesto”, http://www.earthserver.eu/tech/datacube-manifesto
[5] INSPIRE Data Specification on Elevation, Technical Guidelines, https://inspire.ec.europa.eu/file/1530/download?token=pq85sbLG
[6] Baumann, P., Hirschorn, E., Maso, J., 2017, “Coverage Implementation Schema”, version 1.1, OGC, https://portal.opengeospatial.org/files/?artifact_id=48553
[7] Furtado, P., et al., 1999, “Storage of Multidimensional Arrays Based on Arbitrary Tiling”, ICDE’99, Sydney, Australia.
[8] Gorelick, N., et al., 2016, “Google Earth Engine: Planetary-scale geospatial analysis for everyone”, Remote Sensing of Environment, http://dx.doi.org/10.1016/j.rse.2017.06.031
[9] Wagner, W., 2015, “Big Data infrastructures for processing Sentinel data”, in: D. Fritsch (Ed.), Photogrammetric Week 2015, Wichmann/VDE, Berlin/Offenbach, 93-104.
[10] Soille, P., et al., 2017, “The JRC Earth Observation Data and Processing Platform”, Big Data from Space (BiDS’17), this issue.
[11] http://copernicus.eu/news/upcoming-copernicus-data-and-information-access-services-dias
[12] Schick, M., 2017, “EUMETSAT, ECMWF & MERCATOR OCÉAN partners DIAS”.
[13] Baumann, P., 2009, “Web Coverage Processing Service (WCPS) Language Interface Standard”, OGC, http://www.opengeospatial.org/standards/wcps
[14] Baumann, P., 2012, “OGC Web Coverage Service (WCS) Core”, OGC, https://portal.opengeospatial.org/files/09-110r4
[15] Clements, O., et al., 2017, “Improving access to big data through OGC standard interfaces”, https://nbviewer.jupyter.org/github/earthserver-eu/INSPIRE-notebooks/blob/master/index.ipynb
[16] Misev, D., et al., 2015, “A Database Language More Suitable for the Earth System Sciences”, in: G. Lohmann et al. (Eds.), Towards an Interdisciplinary Approach in Earth System Science, Springer, doi:10.1007/978-3-319-13865-7
Data Cubes and Multidimensional Arrays
Proc. of the 2017 conference on
Big Data from Space (BiDS’17) doi: 10.2760/383579
35 Toulouse, France
28–30 November 2017
... The term "data cube" typically refers to a multidimensional array of values along with metadata that describes the meaning of the axes, coordinates, and cells [12]. Fig. 1 shows a "cube" as a metaphor illustrating a data structure that can be one-dimensional to multidimensional. ...
... Spatial data cube is characterized by the presence of the six faces shown in Fig. 2. Six faces of a spatial data cube could be classified within two categories: data-oriented faces (parameter model, data representation, and data organization) and functionality-oriented faces (infrastructure, access & analysis, and interoperability) [12]. ...
... The implementation of reliable international standards is essential for the interoperability and scalable merging of spatial information in various data cubes. For communication between client and server as well as between various servers, these standards specify access and transfer protocols [12]. III. ...
Conference Paper
In the last decade, an upsurge of the free and open data access policy for Earth observation (EO) and Geographic Information System (GIS) data took place. It is practically impossible to manually collect, analyse, and process data since there is so much remotely sensed data available for every area of the Earth's surface. As a result, data cubes are now a new option for handling Big Data thanks to technological advancements. As a technological and analytical approach, the data cube concept provides a way to access, manage and analyse enormous amounts of EO data. This paper’s main contribution is giving insights and guidance for choosing the most suitable data cube solution by comparing Open Data Cube (ODC), Euro Data Cube (EDC), Rasdaman, and Data Cube on Demand (DCoD) solutions based on provided satellite data, type of data storage, data uploading possibility by the user, available standardized web services, backend services for developers, and financial aspect. There is no best data cube solution in general, but depending on a size and scope of the project, available resources, storage type and integration costs, stated assets and liabilities of introduced data cubes should be taken into account and the most suitable solution can be chosen.
... The potential impact of the cognitive problem of 'ARD ⊂ EO-IU ⊂ CV ⊂ AGI' on the RS community is highlighted by recalling here that the notion of ARD has been strictly coupled with the concept of EO (raster-based) data cube, proposed as innovative midstream EO technology by the RS community in recent years (Open Data Cube, 2020;Baumann, 2017;CEOS, 2020;Giuliani et al., 2017;Giuliani, Chatenoux, Piller, Moser, & Lacroix, 2020;Lewis et al., 2017;Strobl et al., 2017). ...
... Unfortunately, a community-agreed definition of EO (raster-based) data cube does not exist yet, although several recommendations and implementations have been made (Open Data Cube, 2020; Baumann, 2017; CEOS -Committee on Earth Observation Satellites, 2020; Giuliani et al., 2017Giuliani et al., , 2020Lewis et al., 2017;Strobl et al., 2017). A community-agreed definition of ARD, to be adopted as standard baseline in EO data cube implementations, does not exist either. ...
... Unfortunately, existing ESA TEPs and the ESA Climate Change Initiative are parallel projects lacking inter-platform operability, in contrast with the first principles of a new Space Economy 4.0 (Mazzucato & Robinson, 2017). In practice, each of these ESA EO big data processing chains specializes from the start, instead of starting a "vertical" user-and domain-specific specialization/competition second stage following a "horizontal" (enabling) harmonized/ interoperable/ cooperative EO data processing first stage, encompassing the RTD of a general-purpose user-and application-independent multi-source Level 2/ARD product generation ( Vermote & Saleous, 2007), in combination with a harmonized/interoperable EO ARD cube management system (Open Data Cube, 2020; Baumann, 2017; CEOS -Committee on Earth Observation Satellites, 2020; Giuliani et al., 2017Giuliani et al., , 2020Lewis et al., 2017;Strobl et al., 2017). ...
Article
Full-text available
Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this paper consists of two parts. In the previous Part 1, existing EO optical sensory image-derived Level 2/Analysis Ready Data (ARD) products and processes are critically compared, to overcome their lack of harmonization/ standardization/ interoperability and suitability in a new notion of Space Economy 4.0. In the present Part 2, original contributions comprise, at the Marr five levels of system understanding: (1) an innovative, but realistic EO optical sensory image-derived semantics-enriched ARD co-product pair requirements specification. First, in the pursuit of third-level semantic/ontological interoperability, a novel ARD symbolic (categorical and semantic) co-product, known as Scene Classification Map (SCM), adopts an augmented Cloud versus Not-Cloud taxonomy, whose Not-Cloud class legend complies with the standard fully-nested Land Cover Classification System’s Dichotomous Phase taxonomy proposed by the United Nations Food and Agriculture Organization. Second, a novel ARD subsymbolic numerical co-product, specifically, a panchromatic or multi-spectral EO image whose dimensionless digital numbers are radiometrically calibrated into a physical unit of radiometric measure, ranging from top-of-atmosphere reflectance to surface reflectance and surface albedo values, in a five-stage radiometric correction sequence. (2) An original ARD process requirements specification. (3) An innovative ARD processing system design (architecture), where stepwi se SCM generation and stepwise SCM-conditional EO optical image radiometric correction are alternated in sequence. 
(4) An original modular hierarchical hybrid (combined deductive and inductive) computer vision subsystem design, provided with feedback loops, where software solutions at the Marr two shallowest levels of system understanding, specifically, algorithm and implementation, are selected from the scientific literature, to benefit from their technology readiness level as proof of feasibility, required in addition to proven suitability. To be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers, the proposed EO optical sensory image-derived semantics-enriched ARD product-pair and process reference standard is highlighted as linchpin for success of a new notion of Space Economy 4.0.
... Geospatial data cubes are defined as multi-dimensional data structures based on regular or irregular grids (represented as arrays), often containing spatio-temporal data with n-dimensions (Strobl et al., 2017;Baumann et al., 2018). The structured manner of representing spatiotemporal data has become an intuitive way to organise big EO data, usually in raster or gridded formats, with minimum two spatial dimensions, i.e., x / y or latitude / longitude. ...
Article
Full-text available
The amount of geospatial data generated, in particular from segmentation techniques applied to Earth observation (EO) data, is rapidly increasing. This, in combination with the rising popularity of EO data cubes for time series analysis, results in a need to adequately structure, represent and further analyse data coming from segmentation approaches. In this study, we explore the use of vector data cubes for the structuring and analysis of features that evolve in space and time with a particular focus on geomorphological features due to their high spatio-temporal variability. Vector data cubes are multi-dimensional data structures that often contain spatio-temporal data with n-dimensions, with a geometry as the minimum spatial dimension and time as the temporal dimension. We consider two vector data cube formats, i.e., array and tabular, and further extend their conceptualisation to contain features that evolve in space and time. We showcase our implementation for two geomorphological features, the Fagradalsfjall lava flow in Iceland and the Butangbunasi landslide and landslide-dammed lake in Taiwan. Finally, we discuss the potential and limitations of vector data cubes, regarding their technical implementation and application to geomorphology, and further outline the future research directions.
... Afterwards, the system is capable of handling requests for data that is inside a geospatial bounding box and a time interval and provide an agile response containing lists of elements (e.g. tiles) that comply to these constraints [8]. ...
Article
Full-text available
Earth Observation (EO) videos are undergoing rapid expansion due to the swift advancements in aerial, spaceborne, and ground remote sensing technologies enabling the continuous capture of imagery of the Earth's surface. Compared to traditional image-based EO data, EO videos offer persistent Earth observations, rendering a promising observation resource across diverse applications, including climate monitoring and hazard assessment. The continuous observation capability introduces a challenge to the community, i.e., how to effectively manage and fully harness the value of the substantial volume of EO videos. In this paper, we propose a novel approach leveraging a spatiotemporal data cube with EO video management to facilitate analysis. It suggests an Analysis Ready Data (ARD) for EO videos, termed as Analysis Ready Video Data (ARVD), which is incorporated into an Earth Video Cube (EVC). The ARD includes semantics at frame, object/trajectory, and event levels. The paper presents the cube data organization and query processing for EO videos. A prototype system is implemented to demonstrate the applicability of the approach.
... As an open-source, flexible and promising data cube framework, Open Data Cube (ODC) [63] has been used in numerous national or regional data cubes projects, such as SDC [35], CDCol [64], ARDC [65], and CDC [45]. It organizes and manages data with a large-scale multidimensional array [66][67] that can support seamless spatial, temporal, spectral, and feature analysis [68] [69]. So users can access RS data based on spatio-temporal coordinates rather than the traditional single "scene" file [24][30] [70]. ...
Article
Full-text available
With the booming of high-resolution Earth observation and Open-Data efforts, petabyte-scale Earth observation data have been available for free access. Due to the unprecedented availability of big data deluge, regional to global spatio-temporal analysis has been significantly challenged with the huge computational barriers, the tedious cycles of “download-preprocess-store-analyze” leading to excessive data downloading overhead, and the acquisition-oriented 2D file-based structure which is not fit for spatio-temporal analysis. The Earth Observation Data Cube (EODC) paradigm revolutionizes the traditional way of storing, managing, and analyzing spatio-temporal RS data, and solves problems of easy-to-use of RS data to a certain extent. However, different EODC solutions are becoming “Information Silos”. Therefore, the sharing and joint use of remote sensing (RS) data across EODCs have become extremely challenging. To address the above challenges, we proposed a method of in-memory distributed data cube auto-discovery and retrieval across Clouds. We construct a distributed in-memory data orchestration across Clouds to shield the heterogeneity of the EODC storage solutions, solving “Information Silos” problems. And we put forward a Larger-sites-first and Spatio-temporal Aware RS data discovery strategy, which can automatically discover data across Clouds for requirements. Based on the data cube paradigm, this paper proposes a Quality-first data filtering strategy, which can help users to filter out high-quality data covering the target spatio-temporal range from the huge amount of data, and solve the problem of data cube joint retrieval and efficient use across Clouds. In addition, we have confirmed that our method is effective and efficient through comparative experiments.
... Data gaps were filled using image-based predictive mapping [42], integrating remote sensing (Sentinel-1 and -2) and environmental data (soil information and topographic indices from a national DEM) using a Random Forest [43] classifier. The integration of the different databases in a theoretical data cube [44,45] is an example of the application of a relatively new concept in the use of geospatial data. The map and its creation process are described in detail in [36]. ...
Article
Full-text available
Mapping and assessing ecosystem services (ES) projects at the national level have been implemented recently in the European Union in order to comply with the targets set out in the EU’s Biodiversity Strategy for 2020 and later in the Strategy for 2030. In Hungary this work has just been accomplished in a large-scale six-year project. The Hungarian assessment was structured along the ES cascade with each level described by a set of indicators. We present the selected and quantified indicators for 12 ES. For the assessment of cascade level 4, human well-being, a set of relevant well-being dimensions were selected. The whole process was supported by several forms of involvement, interviews, consultations and workshops and in thematic working groups performing the ES quantifications, followed by building scenarios and synthesizing maps and results. Here we give an overview of the main steps and results of the assessment, discuss related conceptual issues and recommend solutions that may be of international relevance. We refine some definitions of the cascade levels and suggest theoretical extensions to the cascade model. By finding a common basis for ES assessments and especially for national ones, we can ensure better comparability of results and better adoption in decision making.
... In addition to the satellite images, the data repository provides internal access to UAV images currently covering the shorelines of Lake Sevan. The Armenian Data Cube uses the power of the Open Data Cube (ODC) analytical framework (http://www.opendatacube.org) to address the societal and scientific challenges, giving a better picture of the land resources and changes (Strobl et al. 2017). At its core, the ODC comprises Python libraries and a PostgreSQL database that helps work with geospatial raster data. ...
Article
Full-text available
Coastal management has a critical role in estimating coastal environmental and socio-economic dynamics, providing various vital regional and local services. Remote sensing Earth observations are essential for detecting and monitoring shorelines. UAVs combined with satellite remote sensing address the shoreline delineation problem, detecting the shoreline and identifying shoreline zones. This paper presents a shoreline delineation service utilizing UAV and Sentinel-2 images within a Data Cube environment for monitoring coastal areas. The Band Ratio, McFeeters, MNDWI1 and MNDWI2 algorithms have been implemented in the service to analyze the accuracy of each algorithm by comparing satellite- and UAV-derived shorelines. As a case study, shoreline delineation for Lake Sevan, one of the largest high-altitude freshwater lakes in Eurasia, has been carried out using the service. The MNDWI2 algorithm showed the best accuracy for Lake Sevan shoreline delineation.
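The water indices named above follow a common normalized-difference pattern; a minimal sketch of the generic MNDWI (Xu, 2006) is shown below. The MNDWI1/MNDWI2 variants in the cited service presumably differ only in which SWIR band of Sentinel-2 is used; the band choice and threshold here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def mndwi(green, swir):
    """Modified Normalized Difference Water Index:
    MNDWI = (Green - SWIR) / (Green + SWIR).
    A small epsilon guards against division by zero."""
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    return (green - swir) / (green + swir + 1e-12)

def water_mask(green, swir, threshold=0.0):
    """Classify pixels with MNDWI above the threshold as water;
    the shoreline is the boundary of this mask."""
    return mndwi(green, swir) > threshold
```

On reflectance arrays, water pixels (bright in green, dark in SWIR) score positive, while land and vegetation score negative, so a zero threshold is a common starting point.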
... The systematic and regular provision of Analysis-Ready Data (ARD) can significantly reduce the burden of EO data usage. To be considered as ARD, data should be processed according to a minimum set of requirements (e.g., radiometric and geometric calibration; atmospheric correction; metadata description) and organized in a way that allows immediate analysis without additional effort [124,125]. In Switzerland, more than 37 years of satellite EO Analysis-Ready Data over Switzerland are made available by the Swiss Data Cube [126,127]. ...
Article
Full-text available
High spatial and thematic resolution Land Use/Cover (LU/LC) maps are central for accurate watershed analyses, improved species and habitat distribution modeling, ecosystem services assessment, robust assessments of LU/LC changes, and calculation of indices. Downscaled LU/LC maps for Switzerland were obtained for three time periods by blending two inputs: the Swiss topographic base map at a 1:25,000 scale and the national LU/LC statistics obtained from aerial photointerpretation on a 100 m regular lattice of points. The spatial resolution of the resulting LU/LC map was improved by a factor of 16 to reach a resolution of 25 m, while the thematic resolution was increased from 29 (in the base map) to 62 land use categories. The method combines a simple inverse distance spatial weighting of the 36 nearest neighbors' information and an expert system of correspondence between input base map categories and possible output LU/LC types. The developed algorithm, written in Python, reads and writes gridded layers of more than 64 million pixels. Given the size of the analyzed area, a High-Performance Computing (HPC) cluster was used to parallelize the data and the analysis and to obtain results more efficiently. The method presented in this study is a generalizable approach that can be used to downscale different types of geographic information.
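The numeric core of the weighting step can be sketched as plain inverse-distance interpolation over the k nearest lattice points. This is only the IDW component; the expert-system mapping between input and output categories described above is omitted, and the brute-force neighbour search (a KD-tree would be used at the scale of 64 million pixels) and all names are illustrative.

```python
import numpy as np

def idw_downscale(points, values, grid_xy, k=36, power=2.0):
    """Inverse-distance-weighted interpolation of point observations
    (e.g. a 100 m LU/LC lattice) onto a finer grid, using the k
    nearest neighbours of each output cell."""
    points = np.asarray(points, dtype=float)    # (n, 2) sample locations
    values = np.asarray(values, dtype=float)    # (n,)   sample values
    grid_xy = np.asarray(grid_xy, dtype=float)  # (m, 2) target cell centres
    out = np.empty(len(grid_xy))
    for i, p in enumerate(grid_xy):
        d = np.hypot(*(points - p).T)           # distances to all samples
        idx = np.argsort(d)[:k]                 # k nearest neighbours
        w = 1.0 / np.maximum(d[idx], 1e-12) ** power
        out[i] = np.sum(w * values[idx]) / np.sum(w)
    return out
```

Clipping the distance at a tiny epsilon makes the interpolator exact (up to rounding) at the sample locations themselves.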
Article
Full-text available
Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this two-part paper identifies an innovative, but realistic EO optical sensory image-derived semantics-enriched Analysis Ready Data (ARD) product-pair and process gold standard as linchpin for success of a new notion of Space Economy 4.0. To be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers, it is regarded as necessary-but-not-sufficient “horizontal” (enabling) precondition for: (I) Transforming existing EO big raster-based data cubes at the midstream segment, typically affected by the so-called data-rich information-poor syndrome, into a new generation of semantics-enabled EO big raster-based numerical data and vector-based categorical (symbolic, semi-symbolic or subsymbolic) information cube management systems, eligible for semantic content-based image retrieval and semantics-enabled information/knowledge discovery. (II) Boosting the downstream segment in the development of an ever-increasing ensemble of “vertical” (deep and narrow, user-specific and domain-dependent) value–adding information products and services, suitable for a potentially huge worldwide market of institutional and private end-users of space technology. For the sake of readability, this paper consists of two parts. In the present Part 1, first, background notions in the remote sensing metascience domain are critically revised for harmonization across the multi-disciplinary domain of cognitive science. In short, keyword “information” is disambiguated into the two complementary notions of quantitative/unequivocal information-as-thing and qualitative/equivocal/inherently ill-posed information-as-data-interpretation. Moreover, buzzword “artificial intelligence” is disambiguated into the two better-constrained notions of Artificial Narrow Intelligence as part-without-inheritance-of AGI. 
Second, based on a better-defined and better-understood vocabulary of multidisciplinary terms, existing EO optical sensory image-derived Level 2/ARD products and processes are investigated at the Marr five levels of understanding of an information processing system. To overcome their drawbacks, an innovative, but realistic EO optical sensory image-derived semantics-enriched ARD product-pair and process gold standard is proposed in the subsequent Part 2.
Article
Data management and analysis are challenging with big Earth observation (EO) data. Expanding upon the rising promises of data cubes for analysis-ready big EO data, we propose a new geospatial infrastructure layered over a data cube to facilitate big EO data management and analysis. Compared to previous work on data cubes, the proposed infrastructure, GeoCube, extends the capacity of data cubes to multi-source big vector and raster data. GeoCube is developed in terms of three major efforts: formalize cube dimensions for multi-source geospatial data, process geospatial data query along these dimensions, and organize cube data for high-performance geoprocessing. This strategy improves EO data cube management and keeps connections with the business intelligence cube, which provides supplementary information for EO data cube processing. The paper highlights the major efforts and key research contributions to online analytical processing for dimension formalization, distributed cube objects for tiles, and artificial intelligence enabled prediction of computational intensity for data cube processing. Case studies with data from Landsat, Gaofen, and OpenStreetMap demonstrate the capabilities and applicability of the proposed infrastructure.
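To make the idea of "formalized cube dimensions" concrete, the following toy sketch indexes tiles along three dimensions (product, time, spatial extent) and answers a query by filtering along each. This is not GeoCube's actual data model or API, only an illustration of querying a cube along formalized dimensions; all names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record: one tile of a cube, indexed along formalized dimensions.
# extent = (xmin, ymin, xmax, ymax)
Tile = namedtuple("Tile", "product time extent data")

def query(tiles, product, time_range, bbox):
    """Select cube tiles along three formalized dimensions:
    product (what), time (when) and spatial extent (where)."""
    t0, t1 = time_range
    xmin, ymin, xmax, ymax = bbox

    def intersects(e):
        # Axis-aligned bounding-box overlap test.
        return not (e[2] < xmin or e[0] > xmax or e[3] < ymin or e[1] > ymax)

    return [t for t in tiles
            if t.product == product and t0 <= t.time <= t1 and intersects(t.extent)]
```

In a real system each dimension would be backed by an index (e.g. a spatial index for the extent dimension) instead of a linear scan.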
Conference Paper
Full-text available
The JRC Earth Observation Data and Processing Platform (JEODPP) is a versatile petabyte-scale platform that serves the needs of a wide variety of projects. This is achieved by providing a cluster environment for batch processing, a web-based remote desktop access with a variety of software suites, and a web-based interactive visualisation and analysis ecosystem. These three layers are complementary and all rely on a common hardware layer where the data is co-located with the processing services. The versatility of the platform is illustrated by a series of applications running on the JEODPP.
Article
Full-text available
Google Earth Engine is a cloud-based platform for planetary-scale geospatial analysis that brings Google's massive computational capabilities to bear on a variety of high-impact societal issues including deforestation, drought, disaster, disease, food security, water management, climate monitoring and environmental protection. It is unique in the field as an integrated platform designed to empower not only traditional remote sensing scientists, but also a much wider audience that lacks the technical capacity needed to utilize traditional supercomputers or large-scale commodity cloud computing resources.
Article
Full-text available
The Australian Geoscience Data Cube (AGDC) aims to realise the full potential of Earth observation data holdings by addressing the Big Data challenges of volume, velocity, and variety that otherwise limit the usefulness of Earth observation data. There have been several iterations, and AGDC version 2 is a major advance on previous work. The foundations and core components of the AGDC are: (1) data preparation, including geometric and radiometric corrections to Earth observation data to produce standardised surface reflectance measurements that support time-series analysis, and collection management systems which track the provenance of each Data Cube product and formalise re-processing decisions; (2) the software environment used to manage and interact with the data; and (3) the supporting high performance computing environment provided by the Australian National Computational Infrastructure (NCI). A growing number of examples demonstrate that our data cube approach allows analysts to extract rich new information from Earth observation time series, including through new methods that draw on the full spatial and temporal coverage of the Earth observation archives. To enable easy uptake of the AGDC, and to facilitate future cooperative development, our code is developed under an open-source Apache License, Version 2.0. This open-source approach is enabling other organisations, including the Committee on Earth Observation Satellites (CEOS), to explore the use of similar data cubes in developing countries.
Conference Paper
Full-text available
Spatial multidimensional databases (also called "spatial datacubes") are the cornerstone of the emerging Spatial On-Line Analytical Processing technology (SOLAP). They are aimed at supporting Geographic Knowledge Discovery (GKD) as well as certain types of spatial decision-making. Although these technologies seem promising at first glance, they may provide unreliable results if one does not consider the quality of spatio-temporal data. In traditional spatial databases, spatial integrity constraints have been employed to improve internal quality of spatial data. However, spatial datacubes require additional integrity constraints in comparison to the traditional databases found into transactional GIS systems. These extra constraints concern the supplementary information included in these datacubes, such as spatial dimensions and hierarchies, aggregated data, multidimensional cross-tabulation of data, and the existence of a temporal dimension with several levels of granularity. This paper presents the characteristics of spatial datacubes that differentiate them from transactional spatial databases from a spatial integrity constraint perspective. Based on these characteristics, we propose fundamental considerations for the classification of these integrity constraints and for the use of integrity constraint specification languages tailored for geospatial datacubes. Finally, the paper concludes and addresses some questions that contribute to a research agenda for the definition of spatial integrity constraints in spatial datacubes.
Conference Paper
Full-text available
Storage management of multidimensional arrays aims at supporting the array model needed by applications and ensuring fast execution of access operations. Current approaches to storing multidimensional arrays rely on partitioning data into chunks (equally sized subarrays). Regular partitioning, however, does not adapt to access patterns, leading to suboptimal access performance. We propose a storage approach for multidimensional discrete data (MDD) based on multidimensional arbitrary tiling. Tiling is arbitrary in that any partitioning into disjoint multidimensional intervals, as well as incomplete coverage of n-D space and gradual growth of MDDs, is supported. The proposed approach allows the storage structure to be configured according to user access patterns through tunable tiling strategies. We describe four strategies and their respective tiling algorithms, and present performance measurements which show their effectiveness in reducing disk access and post-processing times for range queries.
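The essential mechanism of arbitrary tiling can be sketched in a few lines: store the array as a set of disjoint n-D interval tiles and answer a range query by intersecting the query box with each tile's interval, touching only the overlapping tiles. This is a minimal illustration of the concept, not the storage-manager implementation the paper describes; no tiling strategy, paging, or index is included.

```python
def intersect(a, b):
    """Intersection of two n-D integer intervals, or None if disjoint.
    Each interval is a list of (lo, hi) pairs, hi exclusive."""
    out = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo >= hi:
            return None
        out.append((lo, hi))
    return out

class TiledArray:
    """Minimal sketch of arbitrary (non-regular) tiling: the array is
    stored as disjoint n-D tiles; a range query reads only the tiles
    that actually overlap the query box."""

    def __init__(self):
        self.tiles = []  # list of (interval, payload) pairs

    def add_tile(self, interval, payload):
        self.tiles.append((interval, payload))

    def range_query(self, box):
        """Return (clipped interval, payload) for every stored tile
        that overlaps the query box."""
        hits = []
        for interval, payload in self.tiles:
            clip = intersect(interval, box)
            if clip is not None:
                hits.append((clip, payload))
        return hits
```

Because tiles may have any shape and need not cover the full n-D space, a query simply skips regions with no stored tile, which is exactly what makes access-pattern-adapted tiling possible.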
Robin, A. (Ed.), 2011, SWE CDM Encoding Standard, OGC, http://www.opengeospatial.org/standards/swecommon
Baumann, P., 2017, The Datacube Manifesto, http://www.earthserver.eu/tech/datacube-manifesto
Baumann, P., Hirschorn, E., Maso, J., 2017, Coverage Implementation Schema, version 1.1, OGC, https://portal.opengeospatial.org/files/?artifact_id=48553
Wagner, W., 2015, Big Data Infrastructures for Processing Sentinel Data, in: Photogrammetric Week 2015, Dieter Fritsch (Ed.), Wichmann/VDE, Berlin/Offenbach, 93-104
Soille, P., et al., 2017, The JRC Earth Observation Data and Processing Platform, Big Data from Space BiDS'17, this issue
[11] http://copernicus.eu/news/upcoming-copernicus-data-and-information-access-services-dias
[12] Schick, M., 2017, EUMETSAT, ECMWF & MERCATOR OCÉAN partners DIAS
[13] Baumann, P., 2009, Web Coverage Processing Service (WCPS) Language Interface Standard, OGC, http://www.opengeospatial.org/standards/wcps
[14] Baumann, P., 2012, OGC Web Coverage Service (WCS) Core, OGC, https://portal.opengeospatial.org/files/09-110r4