About
124 Publications
11,674 Reads
917 Citations (since 2017)
Additional affiliations
July 1997 - October 2000
Publications (124)
Key Points
The main outcome of Fenni et al. (2021, https://doi.org/10.1029/2020jd034172) is improved accuracy and reduced computational cost for MIDAS through the use of higher-order quadrature schemes
The major conclusions of Fenni et al. (2021, https://doi.org/10.1029/2020jd034172) are validated on a large set of realistic, arbitrarily shaped ice aggregates
An outstanding challenge in modeling the radiative properties of stratiform rain systems is the accurate representation of the mixed-phase hydrometeors present in the melting layer. Ice spheres coated with meltwater and mixed-dielectric spheroids have been used as rough approximations, but more realistic shapes are needed to improve the ac...
Different kinds of observations feature different strengths, e.g. visible-infrared imagery for clouds and radar for precipitation, and, when integrated, better constrain scientific models and hypotheses. Even critical, fundamental operations such as cross-calibrations of related sensors operating on different platforms or orbits, e.g. spacecraft an...
In the present paper, we investigate the optimal quadrature to retrieve the full-direction scattering behavior of a realistic arbitrarily-shaped ice particle from a limited number of pre-calculated incident directions. Our motivation is to enable accurate recovery of single scattering properties of realistic hydrometeors at any incident direction...
Deep Neural Networks (DNNs) have performed admirably in classification tasks. However, the characterization of their classification uncertainties, required for certain applications, has been lacking. In this work, we investigate the issue by assessing DNNs’ ability to estimate conditional probabilities and propose a framework for systematic uncerta...
SpatioTemporal Adaptive Resolution Encoding (STARE) is an integer encoding scheme for spatial and temporal coordinates and volumes. The spatial element of STARE encodes the traversal through a recursive partitioning (quadfurcation) of spherical triangles (or trixels) on the unit sphere to index discrete solid angles. In addition, the index c...
We evaluate several high‐order quadrature schemes for accuracy and efficacy in obtaining orientation‐averaged single‐scattering properties (SSPs). We use the highly efficient MIDAS to perform electromagnetic scattering calculations to evaluate the gain in efficiency from these schemes. MIDAS is shown to be superior to DDSCAT, a popular discrete dip...
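As a toy illustration of orientation averaging by quadrature, the sketch below averages a test function over the sphere using Gauss-Legendre nodes in cos θ and a uniform rule in the periodic φ. This is a generic scheme for illustration, not the specific higher-order schemes evaluated in the paper, and the function names are hypothetical:

```python
import math

def gauss_legendre(n):
    # Newton iteration on the Legendre polynomial recurrence to get
    # the n nodes and weights on [-1, 1]
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # initial guess
        for _ in range(100):
            p0, p1 = 1.0, x
            for k in range(2, n + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = n * (x * p1 - p0) / (x * x - 1.0)
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def sphere_average(f, n_theta, n_phi):
    # average f(theta, phi) over the unit sphere: Gauss-Legendre in
    # mu = cos(theta), uniform (trapezoidal) rule in phi
    mus, ws = gauss_legendre(n_theta)
    total = 0.0
    for mu, w in zip(mus, ws):
        theta = math.acos(mu)
        for j in range(n_phi):
            total += w * f(theta, 2.0 * math.pi * j / n_phi) / n_phi
    return total / 2.0  # mu-weights sum to 2
```

A Gauss rule with n nodes integrates polynomials in μ up to degree 2n-1 exactly, which is why higher-order schemes can recover orientation averages from far fewer pre-computed directions than naive uniform sampling.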
Abstract. We have designed the SpatioTemporal Adaptive-Resolution Encoding, STARE, specifically to harmonize geo-spatiotemporal data for Big-Data scalability, especially on distributed compute-storage resources. In order to achieve such harmonization, the notion of extent, or neighborhood, for a data point, an attribute of Earth data crucial for s...
Scaling up volume and variety in Big Earth Science Data is particularly difficult when combining low-level, ungridded data, such as swath observations obtained with, for example, Moderate Resolution Imaging Spectroradiometers (MODIS). A unified way to index and combine data with different geo-spatiotemporal layouts and incomparable native array for...
The only effective strategy to address the volume challenge of Big Data is “parallel processing”, e.g. employing a cluster of computers (nodes), in which a large volume of data is partitioned and distributed to the cluster nodes. Each of the cluster nodes processes a small portion of the whole volume. The nodes, working in tandem, can therefore col...
This article illustrates how multi-frequency radar observations can refine the mass-size parametrization of frozen hydrometeors in scattering models and improve the correlation between the radar observations and in situ measurements of microphysical properties of ice and snow. The data presented in this article were collected during the GCPEx (2012...
STARE-based hierarchical data packaging facilitates spatiotemporal colocation of data on computing nodes to realize optimal efficiency, by minimizing unnecessary cross-nodal communication when integratively analyzing diverse data with parallelization.
Accepting computer modeling as science often hinges on one's concept of believability. Complex models of complex things put a particular strain on believability, and in some cases, create intense and widespread discord. In many ways this problem is akin to that of a Turing test; scientists are asked to compare two very complicated entities (say cli...
The most common challenge to the systematic and routine comparisons of data from model outputs/analyses with NASA remote sensing data is perhaps that of homogenizing the different geometries involved, resulting from model grid systems employed and observational geometric characteristics associated with the instruments and/or satellite platforms. Th...
A Big Earth Data platform has been constructed based on a parallel distributed database management system, SciDB, to demonstrate visual analytics with interactive animation on diverse datasets. This high-performing capability is achieved by exploiting transparent multimodal parallelization, largely enabled by a unifying indexing scheme, STARE, that...
The challenge of Big Data may be succinctly summarized as: Achieving optimal scalability on data volume and variety to obtain analysis results with desirable speed (velocity). "Interactivity" is perhaps the analysis speed most desired by science researchers. Such high performance is obviously unattainable without employing parallel processing. It i...
An accurate representation of the electromagnetic (EM) behavior of precipitation particles requires modeling of realistic complex geometry and a numerically efficient technique to calculate averaged scattering properties over multiple random target orientations. The discrete dipole approximation is commonly used to compute scattering and absorption...
Existing pathways for bringing together massive, diverse Earth Science datasets for integrated analyses burden end users with data packaging and management details irrelevant to their domain goals. The major data repositories focus on archival, discovery, and dissemination of products (files) in a standardized manner. End users must download and the...
The innovative SpatioTemporal Adaptive-Resolution Encoding, STARE, consists of two components, geospatial and temporal. Its geospatial component encodes the geolocation (latitude and longitude) and the approximate spatial resolution of an Earth Science data point in a 64-bit integer, whereas its temporal component encodes the time along with the...
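The idea of folding location and resolution into a single 64-bit integer can be sketched as below. This is emphatically not the actual STARE bit layout (STARE uses a hierarchical trixel index, not raw quantized lat/lon); the field widths here are invented purely to illustrate the packing concept:

```python
def pack(lat, lon, level):
    # quantize latitude and longitude to 27 bits each and keep 5 bits
    # for a resolution level: |27 lat|27 lon|5 level| = 59 bits used
    qlat = int((lat + 90.0) / 180.0 * ((1 << 27) - 1))
    qlon = int((lon + 180.0) / 360.0 * ((1 << 27) - 1))
    return (qlat << 32) | (qlon << 5) | level

def unpack(sid):
    # reverse the packing above
    level = sid & 0x1F
    qlon = (sid >> 5) & ((1 << 27) - 1)
    qlat = sid >> 32
    lat = qlat / ((1 << 27) - 1) * 180.0 - 90.0
    lon = qlon / ((1 << 27) - 1) * 360.0 - 180.0
    return lat, lon, level
```

Packing location and resolution into one integer is what allows ordinary integer comparisons and sorts, rather than floating-point geometry, to drive colocation and joins.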
Since the establishment of data archive centers and the standardization of file formats, scientists are required to search metadata catalogs for data needed and download the data files to their local machines to carry out data analysis. This approach has facilitated data discovery and access for decades, but it inevitably leads to data transfer fro...
Recent advances in database technology have led to systems optimized for managing petabyte-scale multidimensional arrays. These array databases are a good fit for subsets of the Earth's surface that can be projected into a rectangular coordinate system with acceptable geometric fidelity. However, for global analyses, array databases must address th...
Data preparation is the necessary but unproductive part of machine learning that takes disproportionate effort and time. We introduce the technologies that we have developed to simplify and accelerate data preparation and ease geoscience machine learning using large volumes and varieties of data.
We have devised and implemented a key technology, SpatioTemporal Adaptive-Resolution Encoding (STARE), in an array database management system, i.e. SciDB, to achieve unparalleled variety scaling for Big Earth Data, enabling rapid-response visual analytics. STARE not only serves as a unifying data representation homogenizing diverse varieties of Ear...
As a universal geoscience data representation, the Spatio-Temporal Adaptive-Resolution Encoding, STARE, is bringing about unprecedented interoperability to all Earth Science data. In its spatial component, STARE contracts the usual two-dimensional, i.e. latitude and longitude, geolocation into a one-dimensional, hierarchical index. The STARE geolo...
What: The workshop gathered almost 50 scientists from Europe and the United States to discuss the progress towards developing electromagnetic scattering databases for ice and snow particles in the microwave region, their applications, the physical approximations used to compute these scattering propert...
Among all the V’s of Big Data challenges, such as Volume, Variety, Velocity, Veracity, etc., we believe Value is the ultimate determinant, because a system delivering better value has a competitive edge over others. Although it is not straightforward to assess the value of scientific endeavors, we believe the ratio of scientific productivity increa...
There is little doubt that machine learning techniques possess the potential to extract information, knowledge, and even predictions in a timely manner from seas of Earth Science (ES) data, and to help answer some of the pressing questions regarding the intricate systems of our planet. However, it is reported that data preparation, necessitated by subsetting...
Integrating Earth Science data from diverse sources such as satellite imagery and simulation output can be expensive and time-consuming, limiting scientific inquiry and the quality of our analyses. Reducing these costs will improve innovation and quality in science. The current Earth Science data infrastructure focuses on downloading data based on...
A superbly optimized method has been developed for a Big Data technology, whereby events of Earth Science phenomena can be identified and tracked with unprecedented efficiency. An event in this context is an episode of a phenomenon evolving in both space and time and not just a snapshot in time. This innovative method is applied to 36 years of both...
While Big Data technologies are transforming our ability to analyze ever larger volumes of Earth science data, practical constraints continue to limit our ability to compare data across datasets from different sources in an efficient and robust manner. Within a single data collection, invariants such as file format, grid type, and spatial resolutio...
We investigate the impact of data placement on two Big Data technologies, Spark and SciDB, with a use case from Earth Science where data arrays are multidimensional. Simultaneously, this investigation provides an opportunity to evaluate the performance of the technologies involved. Two datastores, HDFS and Cassandra, are used with Spark for our com...
We have implemented an updated Hierarchical Triangular Mesh (HTM) as the basis for a unified data model and an indexing scheme for geoscience data to address the variety challenge of Big Earth Data. In the absence of variety, the volume challenge of Big Data is relatively easily addressable with parallel processing. The more important challenge in...
We have implemented a flexible User Defined Operator (UDO) for labeling connected components of a binary mask expressed as an array in SciDB, a parallel distributed database management system based on the array data model. This UDO is able to process very large multidimensional arrays by exploiting SciDB's memory management mechanism that efficient...
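A serial, pure-Python sketch of the underlying connected-component labeling idea follows. The SciDB UDO described above operates on distributed multidimensional array chunks; this toy version shows only the 4-connected labeling of a small 2-D mask, and the function name is hypothetical:

```python
from collections import deque

def label_components(mask):
    # label 4-connected components of a 2-D binary mask (list of lists);
    # returns (labels, count) with 0 marking background cells
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1                       # new component found
                queue = deque([(r, c)])          # breadth-first flood fill
                labels[r][c] = count
                while queue:
                    i, j = queue.popleft()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and mask[ni][nj] and not labels[ni][nj]):
                            labels[ni][nj] = count
                            queue.append((ni, nj))
    return labels, count
```

The distributed version must additionally reconcile labels of components that straddle chunk boundaries, which is the part SciDB's memory management and cross-node communication make nontrivial.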
Snowstorm climatology derived from NASA MERRA reanalysis as an example of event-based virtual collections. * Used MERRA-2 data; * Snowstorms over land (not over land-ice) are identified and tracked.
Feature extraction and tracking is a fundamental operation used in many geoscience applications. In this paper, we present a scalable method for computing and tracking features on distributed memory machines for large-scale geospatial data. We carefully apply new communication schemes to minimize the data exchanged among the computing nodes in buil...
A 3D growth model is used to simulate pristine ice crystals, which are aggregated using a collection algorithm to create larger, multicrystal particles. The simulated crystals and aggregates have mass-versus-size and fractal properties that are consistent with field observations. The growth/collection model is used to generate a large database of s...
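The mass-versus-size consistency check mentioned above is conventionally a power law m = a·D^b, fit in log-log space. A minimal sketch follows; the function name is hypothetical and the coefficients in the usage example are synthetic, not values from the paper:

```python
import math

def fit_mass_dimension(diameters, masses):
    # least-squares fit of m = a * D**b, linearized as
    # log m = log a + b * log D
    xs = [math.log(d) for d in diameters]
    ys = [math.log(m) for m in masses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return a, b
```

Fitting in log space weights relative (not absolute) errors, which is appropriate when particle masses span several orders of magnitude across the size range.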
30+ years of snowstorm climatology
In this study, two different particle models describing the structure and electromagnetic properties of snow are developed and evaluated for potential use in satellite combined radar-radiometer precipitation estimation algorithms. In the first model, snow particles are assumed to be homogeneous ice-air spheres with single-scattering properties deri...
To measure precipitation from space requires an accurate estimation of the collective scattering properties of particles suspended in a precipitating column. It is well known that the complicated and typically unknowable shapes of the solid precipitation particles cause much uncertainty in the retrievals involving such particles. This remote-sensin...
The performance and ease of extensibility for two Big-Data technologies, SciDB and Hadoop/MapReduce (HD/MR), are evaluated on identical hardware for an Earth science use case of locating intersections between two NASA remote sensing satellites' ground tracks. SciDB is found to be 1.5 to 2.5 times faster than HD/MR. The performance of HD/MR appro...
http://www.earthzine.org/2014/04/04/earth-science-data-analysis-in-the-era-of-big-data/
A generalized algorithm implementation is applied to scientific data sets for establishing events, such as tornadoes, both spatially and temporally.
Data-intensive science is a scientific discovery process that is driven by knowledge extracted from large volumes of data rather than the traditional hypothesis driven discovery process. One of the key challenges in data-intensive science is development of enabling technologies to allow researchers to effectively utilize these large volumes of data...
Precipitation events (white boxes) and tornadic events (red boxes) for 24 March 2010
We have constructed ~10,000 realistic snow particles, including both pristine and aggregate types, with maximum diameter spanning ~100 micron to 15 mm. The scattering property for each of these particles has subsequently been obtained using the open source discrete-dipole-approximation (DDA) code, DDSCAT, at thirteen (13) microwave frequencies rang...
We have modified the publicly available electromagnetic scattering code called DDSCAT to handle bigger targets with shorter execution times than the original code. A big tar- get is one whose sphere equivalent radius is large compared to the incident wavelength. DDSCAT uses the discrete-dipole approximation and an accurate result requires the spaci...
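The spacing requirement alluded to above can be illustrated with the commonly cited DDA validity criterion |m|·k·d < 0.5, where m is the complex refractive index, k the wavenumber, and d the inter-dipole spacing. The sphere-equivalent idealization and the function name below are assumptions for illustration:

```python
import math

def dipoles_needed(radius_um, wavelength_um, m_abs, criterion=0.5):
    # commonly cited DDA validity criterion: |m| * k * d < criterion,
    # with k = 2*pi/wavelength and d the inter-dipole spacing; the
    # target is idealized here as a sphere of the given radius (microns)
    k = 2.0 * math.pi / wavelength_um
    d_max = criterion / (m_abs * k)              # largest allowed spacing
    volume = 4.0 / 3.0 * math.pi * radius_um ** 3
    return math.ceil(volume / d_max ** 3)        # dipole count estimate
```

Because the count scales with the cube of target size over wavelength, "big" targets in this sense drive both memory and run time, which is precisely what motivates modifying DDSCAT for larger targets and shorter execution times.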
Approximately 7000 snow aggregate particles have been synthesized, using a heuristic aggregation algorithm, from 9 realistic snowflake habits simulated using the now famous Snowfake ice crystal growth model. These particles exhibit mass-dimension relations consistent with those derived from observations. In addition, ranging from 0.1 to 3.5 mm in l...
The past decade has seen many advancements in data management of remote sensing data, with much of the data available online. However, while the data access problem is largely (though not completely) solved, remote sensing data is still sometimes difficult to find, in part because of the myriad data sources. Data can also be difficult to use, espec...
The NASA Earth Observing System Simulators Suite (NEOS3) is a modular framework of forward simulation tools for remote sensing of Earth's Atmosphere from space. It was initiated as the Instrument Simulator Suite for Atmospheric Remote Sensing (ISSARS) under the NASA Advanced Information Systems Technology (AIST) program of the Earth Science Techno...
Our AES is an ideal example of a new generation of scientific analysis tools that are empowered by the rapid growth of facilities tailored for data-intensive computing. AES will greatly reduce the effort on the part of investigators to systematically search for interesting correlations and test hypotheses while also freeing researchers from the bur...
The great majority of Earth Science events are studied using "snapshot" observations in time, mainly due to the scarcity of observations with dense temporal coverage and the lack of robust methods amenable to connecting the "snapshots". To enable the studies of these events in the four-dimensional (4D) spatiotemporal space and to demonstrate the...
Earth science information resources include data, tools, analysis workflows, results and articles as well as contextual knowledge about each one of these resources. While these information resources are shared piecemeal within the community, the full collaboration potential still eludes us. For instance, it is difficult to share the full informatio...
In the commercial software industry, unit testing frameworks have emerged as a disruptive technology that has permanently altered the process by which software is developed. Unit testing frameworks significantly reduce traditional barriers, both practical and psychological, to creating and executing tests that verify software implementations. A new...
This study improves upon an earlier, preliminary study, which used only three size bins based on maximum diameter and found that the single-scattering properties of ensembles of non-spherical precipitation particles can be better characterized by considering the non-convexity of these particles. The difficulty of retrievals involving non-spher...
Forward simulation is an indispensable tool for evaluation of precipitation retrieval algorithms as well as for studying snow/ice microphysics and their radiative properties. The main challenge of the implementation arises due to the size of the problem domain. To overcome this hurdle, assumptions need to be made to simplify complex cloud microphys...
The original objective motivating the creation of the CloudSat+TRMM intersect products (by E.A. Smith, K.-S. Kuo et al) was to provide new opportunities in research related to precipitating clouds. The data products consist of near-coincident CloudSat Cloud Profiling Radar calibrated 94-GHz reflectivity factors and detection flag, sampled every 240...
In August 2006, the NASA African Monsoon Multidisciplinary Analyses (NAMMA) field campaign was launched, providing great opportunities to characterize the frequency of African Easterly Waves (AEWs) and the evolution of their structure over continental western Africa. In this study, extended-range (30-day) high-resolution simulations with the NASA global...