
RESEARCH ARTICLE
STARE into the future of GeoData integrative analysis
Michael L. Rilee (1,2) · Kwo-Sen Kuo (1,3) · James Frew (4) · James Gallagher (5) · Niklas Griessbaum (4) · Kodi Neumiller (5) · Robert E. Wolfe (1)
Received: 25 February 2020 /Accepted: 4 January 2021
© The Author(s), under exclusive licence to Springer-Verlag GmbH, DE part of Springer Nature 2021
Abstract
Different kinds of observations feature different strengths, e.g. visible-infrared imagery for clouds and radar for precipitation, and, when integrated, better constrain scientific models and hypotheses. Even critical, fundamental operations such as cross-calibrations of related sensors operating on different platforms or orbits, e.g. spacecraft and aircraft, are integrative analyses. The great variety of Earth Science data types and the spatiotemporal irregularity of important low-level (ungridded) data have so far made their integration a customized, tedious process which scales in neither variety nor volume. Generic, higher-level (gridded) data products are easier to use, at the cost of being farther from the original observations and having to settle for grids, interpolation assumptions, and uncertainties that limit their applicability. The root cause of the difficulty in scalably bringing together diverse data is the current rectilinear geo-partitioning of Earth Science data into conventional arrays indexed using consecutive integer indices and then packaged into files. Such indices suffice for archival, search, and retrieval, but lack a common geospatial semantics, which is mitigated by adding on floating-point encoded longitude-latitude information for registration. An alternative to floating-point, the SpatioTemporal Adaptive Resolution Encoding (STARE) provides an integer encoding for geo-spatiotemporal location and neighborhood that transcends the use of files and native array indexing, allowing diverse data to be organized on scalable, distributed computing and storage platforms.
Keywords: STARE · Big data · Geolocation · DGGS · Data fusion · Integration
Introduction
The objective of Earth Science (ES) is to provide descriptions and explanations and, if possible, predictions of phenomena on the earth and earth-like planets (Kleinhans et al. 2010). Presently, prediction performance of ES numerical models is perhaps the ultimate evaluator for the effectiveness of our understanding. As a natural system, however, Earth is a nonlinear, complex, and open system replete with emergent phenomena that display novel properties and are qualitatively different from the properties from which they emerge. For example, the fractal appearance of many natural conditions, emerging from nonlinear interactions governed by basic laws of physics and chemistry, is now a popular example of emergent phenomena. Emergence is characteristic of the breakdown of aggregativity, a state in which the whole is nothing more than the sum of its parts (Wimsatt 1997; Humphreys 2008).
The nonlinearity, complexity, and openness of the Earth System lead to the problem of non-uniqueness and underdetermination in our attempt to simulate it. Non-uniqueness is manifested in that more than one model implementation (and thus more than one embodiment of conceptualized hypotheses) may produce output matching a given set of observations (Konikow and Bredehoeft 1992), whereas underdetermination, equivalently but from a different perspective, states that the evidence available at a given time is insufficient to determine what belief we should hold in response to it (Stanford 2017). In a nutshell, non-uniqueness and underdetermination testify to the fact that multiple hypotheses may be supported by the available evidence. Thus, non-uniqueness and underdetermination make verifying and validating (Lee et al. 2016), in the strong sense of these words,
* Michael L. Rilee, mike@rilee.net
1 NASA Goddard Space Flight Center, Greenbelt, MD, USA
2 Rilee Systems Technologies LLC, Derwood, MD, USA
3 ESSIC, University of Maryland, College Park, MD, USA
4 Bren School of Environmental Science & Management, University of California, Santa Barbara, CA, USA
5 OPeNDAP, Inc., Narragansett, RI, USA
https://doi.org/10.1007/s12145-021-00568-8
Published online: 29 January 2021
Earth Science Informatics (2022) 15:1495–1512
... Thus, geolocated data can be encoded to ~7 cm precision with an approximate associated neighborhood, while notably avoiding the polar quirks of a conventional floating-point regular longitude-latitude (lon-lat) mesh. Please see [1] for details. STARE HTM 64-bit integer indices in effect index a large set of solid angles, with their approximate sizes (in steradians) incorporated in the lowest 5 bits, whereas two angular variables, e.g., longitude and latitude in geolocation, specify only a direction without any notion of neighborhood, an essential property for scientific analyses of Earth data. ...
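The idea of carrying a resolution field in the lowest 5 bits of an integer index can be sketched as follows. This is only an illustration: the function names are hypothetical, the meaning assigned to the upper bits is an assumption, and the actual STARE bit layout is the one described in [1], not this one.

```python
# Toy sketch: an integer index whose lowest 5 bits hold a resolution level.
# Only "resolution in the lowest 5 bits" comes from the text; everything
# else (names, use of the upper bits) is an assumption for illustration.
LEVEL_BITS = 5
LEVEL_MASK = (1 << LEVEL_BITS) - 1  # 0b11111, so levels 0..31 fit


def with_level(index: int, level: int) -> int:
    """Return `index` with its lowest 5 bits replaced by `level`."""
    if not 0 <= level <= LEVEL_MASK:
        raise ValueError("level must fit in 5 bits (0..31)")
    return (index & ~LEVEL_MASK) | level


def level_of(index: int) -> int:
    """Extract the resolution level from the lowest 5 bits."""
    return index & LEVEL_MASK


def location_of(index: int) -> int:
    """Extract the location part by clearing the resolution bits."""
    return index & ~LEVEL_MASK
```

Because the resolution travels inside the same integer as the location, a single value describes both where a datum is and roughly how large its neighborhood is, which a bare longitude-latitude pair cannot do.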
... The temporal encoding is similar to the spatial encoding, but is complicated by the fact that calendrical partitions, i.e., years, months, days, etc., are neither regular subdivisions nor constant throughout time. For example, months don't have the same number of days, and leap-years (and seconds!) add to the complexity [1]. The range of temporal intervals of interest stretches from instrument-driven milliseconds, to spacecraft-driven orbital periods, to geophysical periods (e.g., days or seasons), or even climatological time periods seen in large-scale observational studies or model simulations. ...
... Once a time, forward and backward resolutions, and an HCP scale/type label are encoded into a 64-bit number, the idea is to perform as many operations as possible using integer- and bit-level operations, e.g., temporal interval overlaps or matching. See [1] for more information about STARE-HCP and translation to and from common time encodings. ...
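A minimal sketch of an integer-only temporal overlap test is given below. The bit layout is invented for illustration (a millisecond count followed by two 6-bit fields giving the backward and forward extents as powers of two) and omits the scale/type label; it is not the STARE-HCP layout, for which see [1].

```python
# Toy temporal index: [ time in ms | backward exponent | forward exponent ].
# Hypothetical layout, chosen only to show overlap tests done with shifts,
# masks, and integer comparisons.
RES_BITS = 6
RES_MASK = (1 << RES_BITS) - 1


def encode(t_ms: int, backward: int, forward: int) -> int:
    """Pack a time and two resolution exponents into one integer."""
    if t_ms < 0 or not (0 <= backward <= RES_MASK and 0 <= forward <= RES_MASK):
        raise ValueError("field out of range")
    return (t_ms << (2 * RES_BITS)) | (backward << RES_BITS) | forward


def interval(tiv: int) -> tuple[int, int]:
    """Unpack to the closed interval [t - 2**backward, t + 2**forward]."""
    forward = tiv & RES_MASK
    backward = (tiv >> RES_BITS) & RES_MASK
    t_ms = tiv >> (2 * RES_BITS)
    return t_ms - (1 << backward), t_ms + (1 << forward)


def overlaps(a: int, b: int) -> bool:
    """True when the two encoded intervals share at least one instant."""
    a_lo, a_hi = interval(a)
    b_lo, b_hi = interval(b)
    return a_lo <= b_hi and b_lo <= a_hi
```

For instance, under this toy layout `encode(10_000, 10, 10)` spans 8976 to 11024 ms and overlaps `encode(11_500, 9, 9)`, which spans 10988 to 12012 ms.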
Research
Abstract. We have designed the SpatioTemporal Adaptive-Resolution Encoding, STARE, specifically to harmonize geo-spatiotemporal data for Big-Data scalability, especially on distributed compute-storage resources. In order to achieve such harmonization, the notion of extent, or neighborhood, for a data point, an attribute of Earth data crucial for scientific analyses, must be incorporated and designed into the encoding. Thus, each tuple of spatial and temporal STARE-encoded indices uniquely indexes a spatiotemporal volume element (voxel) of a designated spatiotemporal extent. An aggregated set of these voxels, possibly of varying spatiotemporal extents and from diverse data, can tessellate and specify an entity spanning an irregular spatiotemporal volume, much like a set of triangles of varying sizes can tessellate and specify an irregular planar region. Such a spatiotemporal tessellation with STARE indices forms a basis for capturing individual episodes of geophysical phenomena, motivating a moving object database for geophysical phenomenon episodes and enabling scalable, comprehensive, and more reproducible phenomenon-based investigations. In particular, we use Extratropical Cyclone and Blizzard as example phenomena to demonstrate the phenomenon hierarchy concept.
Conference Paper
Scaling up volume and variety in Big Earth Science Data is particularly difficult when combining low-level, ungridded data, such as swath observations obtained with, for example, Moderate Resolution Imaging Spectroradiometers (MODIS). A unified way to index and combine data with different geo-spatiotemporal layouts and incomparable native array formatting is required for scalable integrative analyses based on data at its full instrument resolution, that is, without extra interpolation (or extrapolation) onto a common grid. The SpatioTemporal Adaptive Resolution Encoding (STARE) uses the Hierarchical Triangular Mesh (HTM) and the Hierarchical Calendrical Partitioning (HCP), recursive partitionings of solid angle and time into tree data structures, to encode spatiotemporal neighborhoods as sets of integers. Regions sharing common paths through the STARE tree hierarchy have similar index values, which can then serve as keys in algorithms and data structures supporting scalable integrative analyses. Thus, STARE co-aligns data in both physical (spatiotemporal) and cyber (memory) spaces, providing a means for marshalling computing resources and conducting analysis with minimum data movement, addressing volume scalability while simultaneously unifying diverse data for variety scaling. In this paper, we demonstrate how easy it is to use the Python STARE API (PySTARE) and the parallel programming platform Dask to integrate MODIS and Geostationary Operational Environmental Satellite (GOES) data, datasets with very different geo-spatiotemporal characteristics.
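The statement that regions sharing common paths through the hierarchy have similar index values can be illustrated with a toy quadtree encoding. The sketch assumes four children per node and 2 bits per level, uses a hypothetical fixed depth, and ignores HTM's root triangles and STARE's actual bit layout; it does not use the PySTARE API.

```python
# Toy hierarchical index: a root-to-node path of child numbers (0..3) is
# packed 2 bits per level and left-justified to a fixed depth.
MAX_DEPTH = 8  # hypothetical depth, chosen only for this example


def path_to_index(path: list[int], max_depth: int = MAX_DEPTH) -> int:
    """Pack a path into an integer, padding unused deeper levels with zeros."""
    if len(path) > max_depth:
        raise ValueError("path is deeper than max_depth")
    index = 0
    for child in path:
        if child not in (0, 1, 2, 3):
            raise ValueError("child numbers must be 0..3")
        index = (index << 2) | child
    return index << (2 * (max_depth - len(path)))


def shared_depth(a: int, b: int, max_depth: int = MAX_DEPTH) -> int:
    """Count how many leading levels two packed indices have in common."""
    depth = 0
    for level in range(max_depth):
        shift = 2 * (max_depth - 1 - level)
        if (a >> shift) & 3 != (b >> shift) & 3:
            break
        depth += 1
    return depth
```

In this encoding two regions that share a long path prefix differ only in their low-order bits, so they sort next to each other and make natural keys for partitioning data across workers.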
Conference Paper
A Big Earth Data platform has been constructed based on a parallel distributed database management system, SciDB, to demonstrate visual analytics with interactive animation on diverse datasets. This high-performing capability is achieved by exploiting transparent multimodal parallelization, largely enabled by a unifying indexing scheme, STARE, that provides unparalleled variety scaling. Such a platform not only supports effortless interactive data exploration and analysis but also has the potential to systemize machine learning undertakings with diverse and voluminous Earth Science data.
Conference Paper
We have devised and implemented a key technology, SpatioTemporal Adaptive-Resolution Encoding (STARE), in an array database management system, i.e. SciDB, to achieve unparalleled variety scaling for Big Earth Data, enabling rapid-response visual analytics. STARE not only serves as a unifying data representation homogenizing diverse varieties of Earth Science Datasets, but also supports spatiotemporal data placement alignment of these datasets to optimize a major class of Earth Science data analyses, i.e. those requiring spatiotemporal coincidence. Using STARE, we tailor a data partitioning and distribution strategy for the data access patterns of our scientific analysis, leading to optimal use of distributed resources. With STARE, rapid-response visual analytics are made possible through a high-level query interface, allowing geoscientists to perform data exploration visually, intuitively and interactively. We envision a system based on these innovations to relieve geoscientists of most laborious data management chores so that they may focus better on scientific issues and investigations. A significant boost in scientific productivity may thus be expected. We demonstrate these advantages with a prototypical system including comparisons to alternatives.
Conference Paper
As a universal geoscience data representation, the Spatio-Temporal Adaptive-Resolution Encoding, STARE, is bringing about unprecedented interoperability to all Earth Science data. In its spatial component, STARE contracts the usual two-dimensional, i.e. latitude and longitude, geolocation into a one-dimensional, hierarchical index. The STARE geolocation index follows the quadfurcation scheme of the well-established hierarchical triangular mesh (HTM) used in astronomy but with an innovative bit-field arrangement that includes approximate data resolution information to enable efficient geospatial set operations. STARE's temporal component is also hierarchical with bit fields referring to conventional date-time intervals or units. STARE is designed for geo-spatiotemporal data placement alignment in databases (e.g. SciDB) but also supports more traditional contexts via a STARE application programming interface (API). Index Terms: interoperability, interdisciplinary analysis, universal data representation, geo-spatiotemporal indexing, data placement alignment, array database
Conference Paper
We have implemented an updated Hierarchical Triangular Mesh (HTM) as the basis for a unified data model and an indexing scheme for geoscience data to address the variety challenge of Big Earth Data. In the absence of variety, the volume challenge of Big Data is relatively easily addressable with parallel processing. The more important challenge in achieving optimal value with a Big Data solution for Earth Science (ES) data analysis, however, is being able to achieve good scalability with variety. With HTM unifying at least the three popular data models, i.e. Grid, Swath, and Point, used by current ES data products, data preparation time for integrative analysis of diverse datasets can be drastically reduced and better variety scaling can be achieved. HTM is also an indexing scheme, and when applied to all ES datasets, data placement alignment (or co-location) on the shared-nothing architecture, on which most Big Data systems are based, is guaranteed and better performance is ensured. With HTM, most geospatial set operations become integer interval operations, with further performance advantages.
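How a geospatial set operation can reduce to an integer interval operation is sketched below with a toy quadtree. The assumptions are the same kind as in any simplified model: four children per node, 2 bits per level, a hypothetical fixed depth, and none of HTM's real root triangles or bit fields.

```python
# Toy sketch: every node of a quadtree covers one contiguous range of
# deepest-level indices, so intersecting two regions is interval arithmetic.
MAX_DEPTH = 8  # hypothetical depth, chosen only for this example


def node_interval(path: list[int], max_depth: int = MAX_DEPTH) -> tuple[int, int]:
    """Closed interval [lo, hi] of deepest-level indices lying under a node."""
    if len(path) > max_depth:
        raise ValueError("path is deeper than max_depth")
    lo = 0
    for child in path:
        if child not in (0, 1, 2, 3):
            raise ValueError("child numbers must be 0..3")
        lo = (lo << 2) | child
    free_bits = 2 * (max_depth - len(path))
    lo <<= free_bits
    hi = lo | ((1 << free_bits) - 1)
    return lo, hi


def intersect(a: tuple[int, int], b: tuple[int, int]):
    """Intersection of two closed intervals, or None when they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None
```

A coarse region and one of its descendants intersect in exactly the descendant's interval, while regions on different branches yield `None`; no floating-point geometry is involved in either case.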
Article
The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), is the latest atmospheric reanalysis of the modern satellite era produced by NASA's Global Modeling and Assimilation Office (GMAO). MERRA-2 assimilates observation types not available to its predecessor, MERRA, and includes updates to the Goddard Earth Observing System (GEOS) model and analysis scheme so as to provide a viable ongoing climate analysis beyond MERRA's terminus. While addressing known limitations of MERRA, MERRA-2 is also intended to be a development milestone for a future integrated Earth system analysis (IESA) currently under development at GMAO. This paper provides an overview of the MERRA-2 system and various performance metrics. Among the advances in MERRA-2 relevant to IESA are the assimilation of aerosol observations, several improvements to the representation of the stratosphere including ozone, and improved representations of cryospheric processes. Other improvements in the quality of MERRA-2 compared with MERRA include the reduction of some spurious trends and jumps related to changes in the observing system and reduced biases and imbalances in aspects of the water cycle. Remaining deficiencies are also identified. Production of MERRA-2 began in June 2014 in four processing streams and converged to a single near-real-time stream in mid-2015. MERRA-2 products are accessible online through the NASA Goddard Earth Sciences Data Information Services Center (GES DISC).