Machine Learning Research
2017; 2(1): 10-18
http://www.sciencepublishinggroup.com/j/mlr
doi: 10.11648/j.mlr.20170201.12
Review Article
An Overview of Big Data Applications in Water Resources Engineering
Sirisha Adamala
Applied Engineering Department, Vignan’s Foundation for Science, Technology and Research University (VFSTRU), Vadlamudi, Andhra Pradesh
Email address:
sirisha@agfe.iitkgp.ernet.in
To cite this article:
Sirisha Adamala. An Overview of Big Data Applications in Water Resources Engineering. Machine Learning Research.
Vol. 2, No. 1, 2017, pp. 10-18. doi: 10.11648/j.mlr.20170201.12
Received: January 15, 2016; Accepted: February 3, 2017; Published: March 1, 2017
Abstract: One of the emerging challenges of the 21st century is collecting and handling ‘Big Data’. The definition of big data changes from one area to another over time. Big data, as its name implies, is largely unstructured data that is very big, arrives fast, is hard to handle, and comes in many forms. Though the applications of big data were confined to information technology before the 21st century, it is now an emerging area in almost all engineering specializations. For water managers and engineers, big data is showing big promise in many water-related applications, such as planning optimum water systems, detecting ecosystem changes through big remote sensing and geographical information system data, forecasting, predicting, and detecting natural and man-made calamities, scheduling irrigation, mitigating environmental pollution, and studying climate change impacts. This study reviews basic information about big data, applications of big data in studies related to water resources engineering, and the advantages and disadvantages of big data. Further, it presents a review of the literature on big data applications in water resources engineering.
Keywords: Big Data, Petabytes, Gigabyte, Water Resources, Climate, Artificial Neural Network, Remote Sensing
1. Introduction
Big data is a term used to describe data that is high in volume, velocity, and variety; requires new technologies and
techniques to capture, store, and analyze it; and is used to
enhance decision making, provide insight and discovery, and
support and optimize processes. Science is driven by data. Big
data is growing rapidly, currently at a projected rate of 40%
growth in the amount of global data generated per year versus
only 5% growth in global Information Technology (IT)
spending [5]. Around 90% of the world’s digitized big data
was captured over just the past two years [15]. Big Data has
the potential to revolutionize not only research, but also
science. Imagine a system in which one has access to a huge database, where one collects every detailed measure of
every event that occurs on the globe. Big Data does not arise
out of a vacuum: it is recorded from some data generating
source: scientific experiments and simulations can easily
produce petabytes (PB) of data per day. Much of this data is of
no interest, and it can be filtered and compressed by orders of
magnitude. One of the main challenges is to define these
filters in such a way that they do not discard useful
information [32]. For example, suppose one sensor reading
differs substantially from the rest: it is likely to be due to the
sensor being faulty, but how can one be sure that it is not an
artifact that deserves attention? In addition, the data collected
by these sensors most often are spatially and temporally
correlated. Furthermore, one requires “on-line” analysis
techniques that can process such streaming data on the fly,
since one cannot afford to store first and reduce afterward.
The demand for food is expected to double by 2050 as the
world’s population heads toward 9 billion people. Lack of
water is a critical constraint to increasing food production,
particularly as droughts and other consequences of climate
change are making water scarcer. For water managers, big
data is showing big promise to plan water systems optimally,
to analyze climate change impact, to detect changes in
ecosystem through remote sensing, to predict natural and
manmade calamities, to schedule irrigation plans, to mitigate
environmental pollution, etc. Many researchers have recognized the importance of big data analysis in the above-mentioned areas and have initiated research accordingly. In fact, mobilizing Big Data
to improve global water and food security was the subject of
the 2014 Water for Food Global Conference hosted by the
Robert B. Daugherty Water for Food Institute at the University
of Nebraska in association with the Bill and Melinda Gates Foundation, held October 19-22 in Seattle. Further, IBM
has started its research on big data applications for watershed
management by capturing meteorological, surface,
sub-surface and groundwater data, monitoring rain, snow, soil
moisture, water turbidity, flow rates, temperature, and
groundwater quality using different sensors. Therefore, the
study of big data in real-life water related applications helps in
preventing many natural and man-made disasters like floods,
droughts, overflow of rivers/streams containing toxic wastes,
siltation, erosion etc.
Greater data gathering and computing power are allowing
researchers to develop drought-resistant crop breeds, better
understand climate change and create models that help us to
understand risks and opportunities moving forward, among
other research goals. Climate change is a very real problem
facing our planet. The term “climate change” can cover a great
many things, some natural and some manmade, including
global warming and loss of wildlife habitat. Each of these
brings its own challenges but, increasingly, big data and
analytics are being put to use to come up with new solutions
and research methods. Climate change has long attracted attention because its adverse effects are being felt everywhere. High spatial and temporal resolution data, which are big in nature, are required to study weather impacts in the current and changing climate. Further, uncertainty in climate studies is addressed through multi-model climate ensembles, as the data are big and the models are complex.
It is evident from many studies that the climate is changing very fast. For example, over the past century the global sea level has risen by about 10 to 25 cm, Arctic sea ice has been shrinking at about 13% a decade (according to NASA), glaciers are melting, cities are experiencing recurring floods, deforestation is on the rise, and Earth’s average temperature has risen by about 1.5 °F and is projected to rise another 0.5 to 8.6 °F over the next hundred years [8]. To handle this issue, countries need a good action
plan, which should be made on the basis of accurate, real-time
or near real-time analytics. Big data and predictive analytics
can potentially provide accurate, real-time or near real-time
analytics. Given the rate at which the climate is changing, one needs to respond fast. Big data and predictive analytics
technologies have enabled stakeholders to process huge
volumes of data fast and generate accurate insights. Sensors
are collecting data on various variables such as rain, soil, and
forest cover and helping establish correlations between
datasets. It is clear that big data and predictive analytics are, and will remain, among the most important tools researchers use as they find ways to mitigate the effects of climate change. More frequent and intense weather events can be
predicted and managed with greater accuracy using big
climatic data.
Big Data analysis and modelling can reach even farmers in
poor, rural areas of developing countries through cell-phones,
providing access to weather forecasting and market
information to make better decisions and thereby helping
improve livelihoods as well as local water and food security.
Developing countries are taking on their own Big Data
projects to better plan for the future. Sri Lanka, for example,
recently began mapping many of its primary river basins to
develop a comprehensive flood and drought mitigation plan.
And the Kerala Water Authority uses IBM Big Data and
analytics technology for seamless water distribution in the city
of Thiruvananthapuram with more than 3.3 million
inhabitants.
Big data, if done responsibly, can deliver significant
benefits and efficiencies in water resources analysis, scientific
research, environment, and other specific areas. An archive of big datasets collected during historical events, shared among researchers, would be helpful in enabling
quantitative analysis between different models i.e., the use of a
common dataset for evaluations and validations of models
would help researchers in developing models with improved
quality. Especially, in the water resources area, there should be
a well-established system of depositing collected data into a
public repository, and also of creating public databases for use
by other scientists/researchers. In fact, there should be an
entire discipline of water resources that is largely devoted to
the curation and analysis of such data.
Large amounts of remote sensing data are now freely available through the NASA Open Government Initiative. Just one of NASA’s archives, the Earth Science Data and Information System (ESDIS), held 7.5 petabytes (PB) of data
with nearly 7,000 unique datasets and 1.5 million users in
2013 [31].
2. Review Studies on Big Data
Applications in Water Resources
As we know, there is a flood of big data available in the present era, but the applications of these big data are still limited in many areas. It will take some time for the analysis and application of big data to advance as far as their collection and storage. Very few studies are available that review the application of big data analysis in diverse areas other than information technology; these are mentioned below. [11] reported various techniques to analyze big datasets and addressed various big data applications in finance and business fields.
[2] performed both biophysical and economic analyses and generated 6,800 detailed future climates per emissions scenario. These climates were designed to represent
the range of possible climate outcomes for South Africa by
2050. [25] reviewed what Big Data is, which technologies are used to build a Big Data infrastructure, and how it can be useful in national development, with the prospect of improving decision-making in critical areas such as health care, employment, economic productivity, crime and security, natural disasters, and resource management.
[34] introduced a water resources and hydropower cloud GIS platform based on a service-oriented big data architecture. The platform managed varied and massive data efficiently, built on a big data framework covering survey, design, construction, environment, resettlement, and equipment and supplies. [12] addressed some real-world examples of big data analytics and explained how big data analytics differs from most supervisory control and data acquisition (SCADA) systems.
[1] described the applicability of Big Data and analyzed
the Big Data process model, which consists of storage
system, handling process, and analysis mechanism. Further,
they suggested that interest in and use of big data platforms will continue to grow, and that the applicable areas will expand beyond Information Technology (IT) to every possible sector. [24] introduced a Big Data
platform for environmental sciences and water resources
management. This Platform was designed to provide
effective tools that allow water system managers to solve
complex water resources systems, water modeling issues and
help in decision making. The Platform brings a variety of
information technology tools including stochastic aspects,
high performance computing, simulation models, hydraulic
and hydrological models, grid computing, decision tools, Big
Data analysis system, communication and diffusion system,
database management, geographic information system (GIS)
and knowledge-based expert systems. The objectives of this Big Data Open Platform were to solve and discuss water resources problems featuring a huge volume of collected, analyzed, and visualized data, to analyze
the heterogeneity of data resulting from various sources
including structured, unstructured and semi-structured data,
also to prevent and/or avoid a catastrophic event related to
floods and/or droughts, through hydraulic infrastructures
designed for such purposes or strategic planning.
[16] analyzed the challenges and opportunities that big
data bring in the context of remote sensing applications.
Furthermore, they described the most challenging issues in
managing, processing, and efficiently exploiting big data for remote sensing problems. They demonstrated two case studies on the use of big data in the aforementioned aspects. In the first test case, big data were
used to detect marine oil spills automatically using a large
archive of remote sensing and social media data together. In
the second test case, content-based information retrieval was
performed using high performance computing to extract
information from a large database of remote sensing images,
collected after the terrorist attack on the World Trade Center
in New York City on Sept. 11, 2001. [4] studied the
opportunities, challenges and benefits of incorporating big
data applications for smart cities. The results reveal that several opportunities are available for utilizing big data in smart cities; however, there are still many issues and challenges to be addressed to achieve better utilization of this
technology. [10] demonstrated a full implementation of a
travel demand model utilizing mobile phone big data in the
form of calls, Global Positioning System (GPS) traces, or real
time traffic monitoring systems as input. They mapped the flows of people within cities onto transportation infrastructure.
[35] developed a learning algorithm named ‘Deep Big
Networks (DBN)’ from a big data set and applied it to a large water distribution system (WDS) containing a dozen tanks and
several pump stations. A total of 13 Artificial Neural
Networks (ANNs) have been trained, with each producing one
output, for representing the whole system operation. In
addition, before training the ANNs, sensitivity analysis was
performed to figure out what inputs are sensitive to the desired
output. Results suggest that DBN can eliminate the
requirement for sensitivity analysis and also avoid use of
multiple ANNs to represent the whole system. Further, comparison results suggest that the DBN model outperformed the conventional ANN, being more efficient and requiring fewer computational units than shallow ANNs. [19] reviewed
180 articles related to the opportunities and threats of Big Data
Analytics for international development. The advent of Big
Data delivers a cost-effective prospect for improved
decision-making in critical development areas such as
healthcare, economic productivity and security.
[18] reviewed the significance of big data requirements in modelling integrated urban hydrology, which comprises land-use change modeling, urban drainage modeling, rainfall–runoff modeling, and urban water quality modeling. [13] used
the big data in refining the geospatial targeting of new
drought-tolerant (DT) maize varieties in Malawi,
Mozambique, Zambia, and Zimbabwe (southern Africa).
Results indicated that more than 1.0 million hectares (Mha) of
maize in the study countries was exposed to a seasonal
drought frequency exceeding 20%, while an additional 1.6 Mha experienced a drought occurrence of 10–20%. Spatial
modeling indicated that new DT varieties could give a yield
advantage of 5–40% over the commercial check variety across
drought environments while crop management and input costs
are kept equal. [21] summarized the strengths, weaknesses, and opportunities associated with big water data. He developed a
tool named “Water Quality Risk Assessment Tool”. The tool is
a model for how agencies and private developers can create
value by identifying specific data sources, processing data
with analytics, and visualizing data to address a specific user
need. Furthermore, the tool shows how data value increases when the user can interact with the information in an intuitive,
functional interface.
[14] assessed the physical quantity and value of forest
ecosystem services using oxygen release, soil and water
conservation, carbon sequestration, and air purification as
input data in Anhui Province, central-eastern China, from
2009 to 2014 using big data. Further, optimization of output
was done to identify the spatial and temporal heterogeneity
during the study period. [29] evaluated the efficiency of
China’s forest resources in terms of economic, social and
ecological indices by utilizing the big data, which was
collected from 31 inland provinces and municipalities of
China from 2005 to 2013. [30] proposed and tested a
theoretical framework to explain resilience in supply chain
networks for sustainability using unstructured Big Data,
based upon 36,422 items gathered in the form of tweets,
news, Facebook, WordPress, Instagram, Google+, and
YouTube, and structured data, via responses from 205
managers involved in disaster relief activities in the aftermath of the Nepal earthquake in 2015. The authors
used Big Data analysis, followed by a survey, which was
analyzed using content analysis and confirmatory factor
analysis (CFA).
[9] addressed the big data challenges of climate science,
using a notion of Climate Analytics-as-a-Service (CAaaS),
which is enabled by Cloud Computing. A subset of
cloud-enabled CAaaS ‘Modern-Era Retrospective Analysis
for Research and Applications (MERRA)’ was developed,
which integrates observational data with numerical models to
produce a global temporally and spatially consistent synthesis
of 26 key climate variables. It represents a type of data product
that is of growing importance to scientists doing climate
change research and a wide range of decision support
applications. [20] documented the improved natural resource
management in supporting the societal development and
environmental sustainability of China’s ecosystem using big
data support. They have also proposed policies and
decision-support models to provide optimization of
sustainable technologies. [6] explored the factors that influence ecological land change using relevant big data sets, including spatial land data, soil data, DEM, climatic data,
and socio-economic data during the period of 2000–2005 in
China's Beijing–Tianjin–Hebei Region. The results showed
that the factors influencing different types of ecological land
change have substantial differences.
3. Theoretical Considerations
Big data is characterized by five features: Volume, Velocity,
Variety, Veracity, and Value defined as five “V” dimensions.
The term ‘Volume’ means the amount or quantity of data that has been created from all sources (e.g., terabytes (TB = 1024 gigabytes), petabytes (PB = 1024 TB), and exabytes (EB = 1024 PB)).
‘Velocity’ represents the rate at which data is created,
stored, analyzed and processed. For example: the
velocity of big data in remote sensing involves not only
generation of data at a rapid growing rate, but also
efficiency of data processing and analysis. In other
words, the data should be analyzed in a (nearly) real or a
reasonable time to achieve a given task, e.g., seconds can
save hundreds or thousands of lives in an earthquake.
‘Variety’ indicates that there are various types of data,
and they could be classified to structured,
semi-structured, and unstructured data sets depending on
the sort of structure. For example, big remote sensing
data consist of multisource (laser, radar, optical, etc.),
multi-temporal (collected on different dates), and
multi-resolution (different spatial resolution) data, as
well as data from different disciplines depending on
several application domains.
‘Veracity’ can be described as the big noise in big data,
which refers to the accuracy and truthfulness of the
captured data and the meaningfulness of the results
generated from the data for certain problems.
‘Value’ refers to the possible advantage that big data can offer based on good data collection, management, and analysis. Some more Vs of big data are ‘Volatility’, which refers to the retention policy of the structured data implemented from different sources, and ‘Validity’, which refers to the correctness, accuracy, and validation of the data.
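The binary unit ladder in the ‘Volume’ dimension is easy to misread; as a small illustrative sketch (pure Python, with hypothetical helper names such as `to_bytes`), the conversions can be checked directly:

```python
# Illustrative sketch of the binary unit ladder from the 'Volume'
# dimension: 1 TB = 1024 GB, 1 PB = 1024 TB, 1 EB = 1024 PB.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes (base 1024)."""
    return value * 1024 ** UNITS.index(unit)

def human_readable(n_bytes):
    """Express a byte count in the largest unit whose value is >= 1."""
    for unit in reversed(UNITS):
        scaled = n_bytes / 1024 ** UNITS.index(unit)
        if scaled >= 1:
            return f"{scaled:g} {unit}"
    return "0 B"

# The 7.5 PB ESDIS archive mentioned above, expressed in gigabytes:
print(to_bytes(7.5, "PB") / 1024 ** 3)       # 7864320.0
print(human_readable(to_bytes(1024, "TB")))  # 1 PB
```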
3.1. Big Data Analysis
Recently, due to the increased use of the internet, digital devices, mobile phones, sensors, social networking sites, satellite images, etc., large amounts of big data (petabytes or more) are produced every minute. One can utilize these data for
decision making, trend finding, weather forecast, planning
optimum resources management, etc. For this purpose the big
data present in the world has to be stored and analyzed.
Big data is creating a new generation of decision support
data management. A key to deriving value from big data is the
use of analytics. Collecting and storing big data creates little
value; it is only data infrastructure at this point. The collected
big data should be analyzed to derive a value or decision
before drawing any results. Big data and analytics are
intertwined, but analytics is not new. Many analytic
techniques, such as regression analysis, simulation, and
machine learning, have been available for many years.
Analysis of Big Data can be helpful in improving the
decision-making in critical development areas such as,
environment, remote sensing, artificial intelligence, natural
disaster and resource management, etc. The challenge of big
data in water resources involves not only dealing with high
volumes of data. In particular, challenges on data acquisition,
storage, management and analysis are also related to water
resources problems involving big data. By analyzing this data,
organizations are able to learn trends about the data they are
measuring, as well as the people generating the data. The hope for big data analysis is to provide more customized services and increased efficiency in whatever industry the data are collected from.
Analyzing big data helps answer the following questions:
What happened?
Why did it happen?
What will happen?
How can we make it happen?
The enormous volumes of data require automated or
semi-automated analysis techniques to detect patterns,
identify anomalies, and extract knowledge. Again, the “trick”
is in the software algorithms: new forms of computation,
combining statistical analysis, optimization, and artificial
intelligence, are able to construct statistical models from
large collections of data and to infer how the system should
respond to new data.
Maintaining and analyzing the sheer size of the data is a major challenge; finding the right analytical tools and smart systems to sit on top of the data and interpret it in order to assist management in decision-making processes is another. The analysis of Big Data involves multiple
distinct phases, each of which introduces challenges. Many
people unfortunately focus just on the analysis/modeling
phase: while that phase is crucial, it is of little use without the
other phases of the data analysis. Even in the analysis phase,
which has received much attention, there are poorly
understood complexities in the context of multi-tenanted
clusters where several users’ programs run concurrently.
Many significant challenges extend beyond the analysis
phase. For example, Big Data has to be managed in context,
which may be noisy, heterogeneous and not include an
upfront model. Doing so raises the need to track provenance
and to handle uncertainty and error: topics that are crucial to
success, and yet rarely mentioned in the same breath as Big
Data.
3.2. Big Data Analyzing Techniques
A/B testing (also called split testing or bucket testing): A technique in which a control group is compared with a variety of test groups in order to determine which treatments (i.e., changes) will improve a given objective variable, e.g., crop yield [3].
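As a minimal sketch of the idea (the yield figures and group sizes below are hypothetical, not taken from [3]), an A/B test reduces to comparing the objective variable between a control group and a treatment group, e.g., with Welch’s t statistic:

```python
import statistics

# Hypothetical A/B test: does a new irrigation schedule (treatment)
# improve crop yield (t/ha) over the current schedule (control)?
control   = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8]   # made-up plot yields
treatment = [4.6, 4.4, 4.8, 4.5, 4.3, 4.7]

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(b) - statistics.mean(a)) / (va / len(a) + vb / len(b)) ** 0.5

lift = statistics.mean(treatment) - statistics.mean(control)
t = welch_t(control, treatment)
print(f"mean lift: {lift:.2f} t/ha, t = {t:.2f}")
```

A large positive t suggests the treatment's lift is unlikely to be due to chance; a proper test would also compute degrees of freedom and a p-value.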
Association rule learning (or fuzzy learning): A technique consisting of a variety of algorithms for discovering interesting relationships, i.e., “association rules,” among variables in large or big databases.
Classification: A set of techniques to identify the categories to which new data points belong, based on a training set containing data points that have already been categorized. These techniques are divided into supervised learning and unsupervised learning. Supervised learning is a set of machine learning techniques that infer a function or relationship from a set of training data. Unsupervised learning is a set of machine learning techniques that find hidden structure in unlabeled data.
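A minimal supervised-classification sketch (pure Python; the flood-risk features and labels are invented for illustration) shows how a categorized training set is used to label new data points, here with a simple 1-nearest-neighbour rule:

```python
# Hypothetical training set: feature = [rainfall index, soil-moisture
# index], label = flood-risk category already assigned by experts.
train = [
    ([0.9, 0.8], "high"), ([0.8, 0.9], "high"), ([0.7, 0.7], "high"),
    ([0.2, 0.1], "low"),  ([0.1, 0.3], "low"),  ([0.3, 0.2], "low"),
]

def classify(point):
    """Label a new point with the class of its nearest training point."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(train, key=lambda ex: dist2(ex[0], point))[1]

print(classify([0.85, 0.75]))  # high
print(classify([0.15, 0.25]))  # low
```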
Cluster analysis: A statistical method for classifying
objects that splits a diverse group into smaller groups of
similar objects, whose characteristics of similarity are not
known in advance. This is a type of unsupervised learning
because training data are not used. This technique is in
contrast to supervised learning.
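As an illustrative counterpart in pure Python (the turbidity readings are hypothetical), a tiny k-means run splits a diverse set of sensor readings into groups of similar values without any training labels:

```python
import random

# Minimal k-means sketch (unsupervised clustering) on 1-D readings.
def kmeans_1d(values, k, iters=20, seed=0):
    random.seed(seed)
    centers = random.sample(values, k)
    for _ in range(iters):
        # assignment step: each value joins its nearest center's group
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # update step: each center moves to the mean of its group
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

# Hypothetical turbidity readings from two regimes (clear vs. flood flow):
readings = [1.1, 0.9, 1.3, 1.0, 9.8, 10.2, 10.0, 9.9]
print(kmeans_1d(readings, k=2))
```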
Crowd sourcing: A technique for collecting data submitted
by a large group of people or community (i.e., the “crowd”)
through an open call, usually through networked media such
as the Web.
Data fusion and data integration: A set of techniques that
integrate and analyze data from multiple sources in order to
develop insights in ways that are more efficient and
potentially more accurate than if they were developed by
analyzing a single source of data. Signal processing
techniques can be used to implement some types of data
fusion. One example of an application is sensor data from the
Internet of Things being combined to develop an integrated
perspective on the performance of a complex distributed
system such as an oil refinery.
Data mining: A set of techniques to extract patterns from
large datasets by combining methods from statistics and
machine learning with database management. These
techniques include association rule learning, cluster analysis,
classification, and regression [35].
Ensemble learning: Using multiple predictive models (each
developed using statistics and/or machine learning) to obtain
better predictive performance than could be obtained from any
of the constituent models. This is a type of supervised
learning.
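A minimal sketch (with hypothetical streamflow predictions) illustrates why ensembles help: averaging several imperfect models lets their uncorrelated errors partly cancel:

```python
import statistics

# Hypothetical predictions of a streamflow value (m3/s) by five
# independently developed models; the 'true' value is for illustration.
true_flow = 120.0
model_predictions = [112.0, 131.0, 118.0, 125.0, 115.0]

ensemble = statistics.mean(model_predictions)     # simple averaging ensemble
errors = [abs(p - true_flow) for p in model_predictions]
print(f"mean single-model error: {statistics.mean(errors):.1f}")
print(f"ensemble error: {abs(ensemble - true_flow):.1f}")
```

Here the averaged prediction is closer to the true value than any individual model, which is the basic motivation for ensemble learning.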
Genetic algorithms: A technique used for optimization that
is inspired by the process of natural evolution or “survival of
the fittest.” In this technique, potential solutions are encoded
as “chromosomes” that can combine and mutate. These
individual chromosomes are selected for survival within a
modeled “environment” that determines the fitness or
performance of each individual in the population. Often
described as a type of “evolutionary algorithm,” these
algorithms are well-suited for solving nonlinear problems.
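As a compact illustration (the fitness function below is invented, peaked at x = 3, and is not from any cited study), the selection, crossover, and mutation loop can be sketched in a few lines:

```python
import random

# Genetic-algorithm sketch: evolve a population of candidate solutions
# ("chromosomes", here just floats) toward the maximum of a nonlinear
# fitness function.
def fitness(x):
    return -(x - 3.0) ** 2          # illustrative: maximum at x = 3

def evolve(generations=60, pop_size=30, seed=1):
    random.seed(seed)
    pop = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # selection: keep the fitter half ("survival of the fittest")
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        # crossover (average two parents) + mutation (small random nudge)
        children = [
            (random.choice(survivors) + random.choice(survivors)) / 2
            + random.gauss(0, 0.1)
            for _ in range(pop_size - len(survivors))
        ]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(round(best, 2))  # close to 3.0
```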
Machine learning: A subspecialty of computer science
(within a field historically called “artificial intelligence”)
concerned with the design and development of algorithms that
allow computers to evolve behaviors based on empirical data.
A major focus of machine learning research is to automatically
learn to recognize complex patterns and make intelligent
decisions based on data.
Natural language processing: A set of techniques from a
subspecialty of computer science (within a field historically
called “artificial intelligence”) and linguistics that uses
computer algorithms to analyze human (natural) language.
Neural networks: Computational models, inspired by the
structure and workings of biological neural networks (i.e., the
cells and connections within a brain), that find patterns in data.
Neural networks are well-suited for finding nonlinear patterns.
They can be used for pattern recognition and optimization.
Some neural network applications involve supervised learning
and others involve unsupervised learning.
Network analysis: A set of techniques used to characterize
relationships among discrete nodes in a graph or a network. In
social network analysis, connections between individuals in a
community or organization are analyzed, e.g., how
information travels, or who has the most influence over
whom.
Optimization: A portfolio of numerical techniques used to
redesign complex systems and processes to improve their
performance according to one or more objective measures
(e.g., cost, speed, or reliability). Genetic algorithms are an
example of an optimization technique.
Pattern recognition: A set of machine learning techniques
that assign some sort of output value (or label) to a given input
value (or instance) according to a specific algorithm.
Classification techniques are an example.
Predictive modeling: A set of techniques in which a
mathematical model is created or chosen to best predict the
probability of an outcome.
Regression: A set of statistical techniques to determine how
the value of the dependent variable changes when one or more
independent variables are modified. Often used for forecasting
or prediction.
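As a sketch (with hypothetical rainfall–runoff pairs constructed to be exactly linear, so the result is checkable by hand), ordinary least squares recovers how the dependent variable changes with the independent one:

```python
# Ordinary least squares fit of runoff against rainfall.
# Hypothetical data, built to satisfy runoff = 0.5*rainfall + 2 exactly.
rainfall = [10.0, 20.0, 30.0, 40.0, 50.0]   # mm (independent variable)
runoff   = [ 7.0, 12.0, 17.0, 22.0, 27.0]   # mm (dependent variable)

n = len(rainfall)
mean_x = sum(rainfall) / n
mean_y = sum(runoff) / n
# slope = covariance(x, y) / variance(x); intercept from the means
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(rainfall, runoff)) \
        / sum((x - mean_x) ** 2 for x in rainfall)
intercept = mean_y - slope * mean_x
print(slope, intercept)  # 0.5 2.0
```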
Sentiment analysis: Application of natural language
processing and other analytic techniques to identify and
extract subjective information from source text material. Key
aspects of these analyses include identifying the feature,
aspect, or product about which a sentiment is being expressed,
and determining the type, “polarity” (i.e., positive, negative,
or neutral) and the degree and strength of the sentiment.
Signal processing: A set of techniques from electrical
engineering and applied mathematics originally developed to
analyze discrete and continuous signals, i.e., representations
of analog physical quantities (even if represented digitally)
such as radio signals, sounds, and images. This category
includes techniques from signal detection theory, which
quantifies the ability to discern between signal and noise.
Sample applications include modeling for time series analysis
or implementing data fusion to determine a more precise
reading by combining data from a set of less precise data
sources (i.e., extracting the signal from the noise).
Spatial analysis: A set of techniques, some applied from
statistics, which analyze the topological, geometric, or
geographic properties encoded in a data set. Often the data for
spatial analysis come from geographic information systems
(GIS) that capture data including location information, e.g.,
addresses or latitude/longitude coordinates.
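A basic building block of such spatial analysis is computing great-circle distance from latitude/longitude coordinates; the standard haversine formula is sketched below, with hypothetical gauge locations.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical: distance between two stream gauges
d = haversine_km(17.0, 80.0, 17.5, 80.5)
```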
Statistics: The science of the collection, organization, and
interpretation of data, including the design of surveys and
experiments. Statistical techniques are often used to make
judgments about what relationships between variables could
have occurred by chance (the “null hypothesis”), and what
relationships between variables likely result from some kind
of underlying causal relationship (i.e., that are “statistically
significant”). Statistical techniques are also used to reduce the
likelihood of Type I errors (“false positives”) and Type II
errors (“false negatives”). An example of an application is
A/B testing to determine what types of marketing material will
most increase revenue.
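The A/B-testing example above can be sketched with a two-proportion z-test; an absolute z beyond roughly 1.96 corresponds to significance at the 5% level. The conversion counts below are hypothetical.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-proportion z-test statistic, the calculation behind a
    simple A/B test: does variant B's rate differ from variant A's?"""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)   # pooled rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical campaign: 120/1000 vs. 150/1000 conversions
z = two_proportion_z(120, 1000, 150, 1000)
```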
Simulation: Modeling the behavior of complex systems,
often used for forecasting, predicting and scenario planning.
Monte Carlo simulations, for example, are a class of
algorithms that rely on repeated random sampling, i.e.,
running thousands of simulations, each based on different
assumptions. The result is a histogram that gives a probability
distribution of outcomes.
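A minimal Monte Carlo sketch of this idea: draw many synthetic annual peak flows from an assumed distribution and estimate the probability of exceeding a flood threshold. The lognormal parameters and threshold are illustrative assumptions.

```python
import random

def monte_carlo_exceedance(mu, sigma, threshold, runs=100_000, seed=1):
    """Repeated random sampling: draw synthetic annual peak flows from
    an assumed lognormal distribution and count how often the flood
    threshold is exceeded."""
    rng = random.Random(seed)
    hits = sum(rng.lognormvariate(mu, sigma) > threshold for _ in range(runs))
    return hits / runs

# Hypothetical: peak flow ~ lognormal(mu=5, sigma=0.5),
# levee capacity 300 m^3/s
p_flood = monte_carlo_exceedance(5.0, 0.5, 300.0)
```

Collecting the sampled values into bins, rather than a single exceedance count, yields the histogram of outcomes described above.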
Time series analysis: A set of techniques from both statistics
and signal processing for analyzing sequences of data points,
representing values at successive times, to extract meaningful
characteristics from the data.
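One simple time-series operation is the moving average, which smooths short-lived spikes so the underlying trend is easier to see; the inflow values below are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average: each output is the mean of the most
    recent `window` values, so the result is shorter than the input."""
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]

# Hypothetical monthly reservoir inflows (m^3/s); 40 is a flood spike
flows = [10, 12, 14, 40, 15, 13, 11]
smooth = moving_average(flows, 3)
```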
Visualization: Techniques used for creating images,
diagrams, or animations to communicate, understand, and
improve the results of big data analyses.
3.3. Applications of Big Data
Big data has become a popular topic and a comparatively new
technological focus across many disciplines, including agriculture,
remote sensing, climate change, water resources, and environmental
science. Researchers are eager to discover how captured data, and
the analysis of it, can transform the management of our most
precious resource: water. A central goal in big data applications
is to identify the right data to solve the problem at hand,
particularly problems that are difficult or impossible to address
with traditional data. In general, big data are of no value until
they are put to use. Some of the many applications of big data are
outlined below.
Automated sensor and monitoring systems provide large amounts
of real-time flow data. For example, automated sensors in
irrigation systems produce climate (temperature, radiation,
wind speed, and humidity), crop (crop height, plant density,
leaf area index, etc.), and soil (moisture content,
infiltration, etc.) data at intervals ranging from seconds and
minutes to hours. These data can be stored and analyzed to
understand an irrigation system and to automatically switch a
water source on or off. Data generated by sensors must be
processed in real time for immediate action; however,
developing and validating models against real-time data
remains a challenging task.
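The on/off automation described above can be sketched as a hysteresis controller over streaming soil-moisture readings: using two thresholds instead of one prevents sensor noise from rapidly cycling the pump. The thresholds and readings are hypothetical.

```python
def irrigation_controller(moisture_stream, on_below=20.0, off_above=30.0):
    """Hysteresis switch driven by streaming soil-moisture readings (%):
    turn the pump ON below one threshold and OFF above another."""
    pump_on = False
    actions = []
    for m in moisture_stream:
        if not pump_on and m < on_below:
            pump_on = True       # soil too dry: start irrigating
        elif pump_on and m > off_above:
            pump_on = False      # soil wet enough: stop irrigating
        actions.append(pump_on)
    return actions

# Hypothetical minute-by-minute soil-moisture readings
states = irrigation_controller([25, 19, 18, 24, 29, 31, 33])
```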
A tremendous amount of geospatial data is created by cell
phones, GPS receivers, and spaceborne or airborne sensors.
These data support many applications, such as pinpointing the
latitude and longitude of remote locations, natural hazard
monitoring, global climate change studies, and urban planning.
A Global Climate Model (GCM) must analyze a variety of
atmospheric and ocean models (big data), integrating many
heterogeneous big data sources and software tools to achieve
global weather prediction.
The major natural threats faced by developing countries are
floods, droughts, earthquakes, and other disasters. Big data
can prove fruitful in many instances in coping with nature's
unpredictability. Applications can be developed that provide
accurate statistics on rainfall, humidity, water resources,
snowfall, etc. by analyzing big data gathered historically,
from satellite images, or through other digital media.
Sensors, robotics and computational technology in terms
of big data can be used to track river and estuary
ecosystems, which help officials to monitor water quality
and supply through the movement of chemical
constituents and large volumes of underwater acoustic
big data that tracks the behavior of fish and marine
mammal species [26].
Big data can be useful in finding new sources of natural
resources such as coal, gold, and iron ore. Big data
applications can also help governments make sound estimates
of natural resource use, warn coastal populations about high
tides, and inform farmers about expected rainfall, which crops
to cultivate, which animal or plant species to protect, and
which natural resources to conserve.
Information processed by big data applications need not be
personal: data generated by sensors monitoring natural or
atmospheric phenomena, such as weather or pollution, should be
globally accessible.
Crop assessment and yield forecasting [27, 17] requires
big data analysis.
Big data must be aggregated from diverse sources in huge
volumes and fed into models that support decision-making in
minutes rather than weeks or months. This is a major challenge
for petabyte-scale or larger data inputs, for instance in
hazard-monitoring applications.
Big data techniques have been successfully used in many
application areas: agriculture (e.g., food security and
pasture monitoring), oceanography (e.g., ship detection, oil
spill detection), urban planning, management, and
sustainability [33, 23], human settlements (both urban and
rural), grape productivity [22], water quality monitoring,
energy assessment, disease monitoring, ecosystem assessment,
global warming and global change, global forest resources
assessment, land development and use [28], and ancient site
discovery (for instance, a hidden relic site can be found
using high-resolution remote sensing data in dense forest
lacking modern infrastructure, an almost insurmountable
barrier for field archaeologists).
With the availability of big data, one can predict the trend
of an event (e.g., a flood) on an hourly basis rather than
over days, weeks, or years.
Big data helps in preventing man-made disasters, such as
sudden drops in water quality that might otherwise go
undetected until they are reported in the media or until the
outbreak of a contagious disease.
Big data can provide weather information that improves a
country's agriculture, better informs people of possibly
hazardous conditions, and enables better management of energy
utilization through more accurate demand predictions.
Big data can help in all four phases of disaster management:
prevention, preparedness, response, and recovery. Two major
sources of big data, dedicated sensor networks (e.g.,
earthquake detection using seismometers) and multi-purpose
sensor networks (e.g., social media such as Twitter on
smartphones), have both demonstrated their usefulness in
disasters such as major earthquakes.
Big data generated by seismometers and other sources, in the
form of seismic intensity maps, wind velocity maps, tsunami
estimation maps, radiation distribution maps, and rainfall
maps, are continuously fed into geological models that can
locate earthquake epicenters precisely within seconds of
detecting earthquake vibrations.
Understanding the environment requires collecting and
analyzing data from thousands of sensors monitoring air and
water quality and meteorological conditions. These
measurements can then be used to drive climate and groundwater
simulations, yielding reliable methods for predicting the
effects of long-term trends such as increased CO2 emissions
and the use of chemical fertilizers.
Data feeds from pumping stations, sewage plants, and
reservoirs deliver a range of measurements to central control
systems. This helps in spotting flash floods by monitoring
reservoir water levels in a timely manner, while GPS assists
in locating the flood site for follow-up warning and rescue
operations.
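The flash-flood spotting described above can be sketched as a rate-of-rise check over consecutive reservoir-stage readings; the threshold and stage values below are hypothetical.

```python
def flash_flood_alert(levels, rise_per_step=0.3):
    """Return the index of the first reading at which stage rises faster
    than `rise_per_step` between consecutive readings, or None."""
    for i in range(1, len(levels)):
        if levels[i] - levels[i - 1] > rise_per_step:
            return i
    return None

# Hypothetical 15-minute stage readings (m); rapid rise at the 4th value
alert = flash_flood_alert([2.0, 2.1, 2.2, 2.9, 3.6])
```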
Big data helps improve precision farming, which reduces water
requirements and allows increases in both crop volume and the
nutritional value of the food grown.
3.4. Disadvantages of Big Data Analysis
Inaccurate information drawn from big data analytics can
result in rescuers and supplies being sent to the wrong
places, wasting limited resources. False information can also
misguide the public and increase their stress levels.
The quantity of information now available to individuals
and organizations is unprecedented in human history,
and the rate of information generation continues to grow
exponentially. Yet, the sheer volume of information is in
danger of creating more noise than value, and as a result
limiting its effective use.
Disasters affect every country on Earth, and effective
disaster management is a global challenge. This is
particularly true of large-scale disasters affecting many
countries (e.g., the 2004 Indian Ocean earthquake and tsunami)
and multi-hazard events such as earthquakes followed by
landslides. Tools usable across many countries will have broad
impact, helping the world population as well as government
agencies and non-governmental organizations. At the same time,
cultural, social, and linguistic differences among countries
in emergency response and recovery will affect the big data
used for disaster management.
Having the ability to analyze big data is of limited value if
users cannot understand the analysis. Ultimately, a
decision-maker, presented with the results of an analysis, has
to interpret them. This interpretation cannot happen in a
vacuum: it usually involves examining the assumptions made and
retracing the analysis.
There are many possible sources of error: computer systems can
have bugs, models almost always rest on assumptions, and
results can be based on erroneous data. A human must always be
in the loop to understand and verify the results produced by
the computer; this is a particular challenge with big data
because of its complexity.
Privacy and security of some of the data constitute a further
constraint.
4. Summary and Conclusions
This paper has reviewed the definition and scale of big data,
previous studies applying big data to the water sector,
techniques for analyzing big data, the advantages and
disadvantages of using such data, and applications in various
fields. Although an extensive review of big data applications
across water-resources-related subjects has been presented
here, much remains to be learned about how to use and analyze
big data.
References
[1] B. T. Chun, S. H. Lee, A study on big data processing
mechanism and applicability. International Journal of Software
Engineering and Its Applications, 8 (8) (2014) 73-82.
[2] C. A. Schlosser, X. Gao, K. Strzepek, A. Sokolov, C. E. Forest,
S. Awadalla, W. Farmer. Quantifying the likelihood of regional
climate change: A hybridized approach. J Clim, 26 (10) (2012)
3394-3414.
[3] C. Thota, G. Manogaran, D. Lopez, V. Vijayakumar, Big data
security framework for distributed cloud data centers. In
Cybersecurity Breaches and Issues Surrounding Online Threat
Protection, IGI Global (2017) 288-310.
[4] E. Al Nuaimi, H. Al Neyadi, N. Mohamed, J. Al-Jaroodi,
Applications of big data to smart cities. Journal of Internet
Services and Applications. 6 (1) (2015) 1-15.
[5] H. D. Guo, L. Zhang, L. W. Zhu, Earth observation big data for
climate change research. Advances in Climate Change
Research, 6 (2) (2015) 108-117.
[6] H. Xie, Y. He, X. Xie, Exploring the factors influencing
ecological land change for China's Beijing–Tianjin–Hebei
region using big data. Journal of Cleaner Production, 142
(2017), 677-687.
[7] I. Becker-Reshef, C. Justice, B. Doorn, C. Reynolds, A.
Anyamba, C. Tucker, S. Korontzi, NASA contribution to the
group on earth observation (GEO) global agricultural
monitoring system of systems. The Earth Obs. 21 (2009) 24-29.
[8] IPCC, In: Solomon S (ed) Climate change: the physical science
basis. Contribution of Working Group I to the Fourth
Assessment Report of the Intergovernmental Panel on Climate
Change. Cambridge University Press, Cambridge, (2007).
[9] J. L. Schnase, D. Q. Duffy, G. S. Tamkin, D. Nadeau, J. H.
Thompson, C. M. Grieg, ... W. P. Webster, MERRA analytic
services: Meeting the big data challenges of climate science
through cloud-enabled climate analytics-as-a-service.
Computers, Environment and Urban Systems, 61 (2017),
198-211.
[10] J. L. Toole, et al. The path most traveled: Travel demand
estimation using big data resources. Transport. Res. Part C,
http://dx.doi.org/10.1016/j.trc.2015.04.022, (2015).
[11] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C.
Roxburgh, A. H. Byers, Big data: The next frontier for
innovation, competition, and productivity, McKinsey Global
Institute, (2011).
[12] J. Neemann, D. Roberts, P. Kenel, A. Chastain-Howley, S.
Stallard, Manager to manager: Will data analytics change
the way we deliver water? Journal (American Water Works
Association), 105 (11) (2013) 25-27.
[13] K. Tesfaye, K. Sonder, J. Caims, C. Magorokosho, A. Tarekegn,
G. T. Kassie, ... O. Erenstein, Targeting drought-tolerant maize
varieties in southern Africa: a geospatial crop modeling
approach using big data, (2016).
[14] L. Li, T. Hao, T. Chi, Evaluation on China’s forestry resources
efficiency based on big data. Journal of Cleaner Production,
142 (2017a), 513-523.
[15] M. Bernard, Big data: using smart big data, analytics and
metrics to make better decisions and improve performance.
Chichester: John Wiley & Sons, (2015).
[16] M. Chi, A. Plaza, J. A. Benediktsson, Z. Sun, J. Shen, Y. Zhu,
Big data for remote sensing: challenges and opportunities.
Proceedings of the IEEE, (2015).
[17] M. D. Steven, J. A. Clark, Applications of remote sensing in
agriculture. Butterworth-Heinemann, (1991).
[18] M. G. Hutchins, S. J. McGrane, J. D. Miller, A. HagenZanker,
T. R. Kjeldsen, S. J. Dadson, C. S. Rowland, Integrated
modeling in urban hydrology: reviewing the role of monitoring
technology in overcoming the issue of ‘big data’ requirements.
Wiley Interdisciplinary Reviews: Water, 4 (1) (2017).
[19] M. Hilbert, Big data for development: A review of promises
and challenges. Development Policy Review, 34 (1) (2016),
135-174.
[20] M. Song, L. Cen, Z. Zheng, R. Fisher, X. Liang, Y. Wang, D.
Huisingh, How would big data support societal development
and environmental sustainability? Insights and practices.
Journal of Cleaner Production, 142 (2017), 489-500.
[21] M. Ziman, Data intelligence for improved water resource
management. Masters project: Nicholas School of the
Environment of Duke University, (2016).
[22] N. Dokoozlian, Big data and the productivity challenge for
wine grapes. In Agricultural Outlook Forum (No. 236854).
United States Department of Agriculture, (2016).
[23] P. Gamba, M. Herold, Global mapping of human settlement:
Experiences, datasets, and prospects. CRC Press, (2009).
[24] R. Chalh, Z. Bakkoury, D. Ouazar, M. D. Hasnaoui, Big data
open platform for water resources management. In Cloud
Technologies and Applications (CloudTech), International
Conference on IEEE, (2015) 1-8.
[25] R. Panicker, Adoption of big data technology for the
development of developing countries. In Proceedings of
National Conference on New Horizons in IT-NCNHIT, (2013)
219.
[26] S. L. LaDeau, B. A. Han, E. J. Rosi-Marshall, K. C. Weathers,
The next decade of big data in ecosystem science. Ecosystems,
(2017) 1-10.
[27] S. Liaghat, S. Balasundram, A review: The role of remote
sensing in precision agriculture. American Journal of
Agricultural and Biological Sciences, 5 (1) (2010) 50-55.
[28] S. Martinuzzia, W. A. Goulda, O. M. R. Gonzaleza, Land
development, land use, and urban sprawl in Puerto Rico
integrating remote sensing and population census data.
Landscape and Urban Planning, 79 (2007) 288-297.
[29] T. Li, Y. Cui, A. Liu, Spatiotemporal dynamic analysis of
forest ecosystem services using “big data”: A case study of
Anhui province, central-eastern China. Journal of Cleaner
Production, 142 (2017b), 589-599.
[30] T. Papadopoulos, A. Gunasekaran, R. Dubey, N. Altay, S. J.
Childe, S. Fosso-Wamba,. The role of Big Data in explaining
disaster resilience in supply chains for sustainability. Journal of
Cleaner Production, 142 (2017), 1108-1118.
[31] T. W. Crowther, et al. Mapping tree density at a global scale.
Nature, 525 (2015) 201-205.
[32] V. Mayer-Schonberger, K. Cukier, Big data: A revolution that
will transform how we live, work, and think. Houghton Mifflin
Harcourt, (2013).
[33] X. Deng, J. Huang, S. Rozelle, E. Uchida, Growth, population
and industrialization, and urban land expansion of China. J.
Urb. Econ, 63 (2008) 96-115.
[34] X. Wang, Z. Sun, The design of water resources and
hydropower cloud GIS platform based on big data. In
Geo-Informatics in Resource Management and Sustainable
Ecosystem, Springer Berlin Heidelberg, (2013) 313-322.
[35] X. Wu, X. Zhu, G. Q. Wu, W. Ding, Data mining with big data.
IEEE transactions on knowledge and data engineering, 26 (1)
(2014) 97-107.
[36] Z. Y. Wu, M. El-Maghraby, S. Pathak, Applications of deep
learning for smart water networks. Procedia Engineering, 119
(2015) 479-485.
... The rise of advanced monitoring technologies has significantly augmented the volume of data in WQ management, creating challenges in the management, processing, and extraction of insights from large datasets. Prominent issues include the necessity to address the substantial and rapid influx of data generated by numerous sensors, which necessitates the implementation of scalable processing solutions (Adamala 2017). Ensuring data quality and standardization across various sources is imperative for conducting reliable analyses. ...
Article
Full-text available
Water quality monitoring is a fundamental component of sustainable management of water resources. This review outlines the progression of water quality assessment, transitioning from traditional methodologies to advanced technologies such as remote sensing, nanotechnology, sensor networks, and artificial intelligence. It emphasizes the transformative potential of integrating big data analytics to enhance accuracy, transparency, and decision-making processes. The study addresses critical global issues, including transboundary water conflicts, complexities in data management and the emergence of contaminants. Additionally, the study employs case studies to illustrate the practical application of these technologies, offering actionable recommendations for their integration into sustainability governance frameworks. This review highlights the interconnectedness of advanced technologies, community engagement, and regulatory frameworks in pursuing sustainable water resource management, serving as a guide for researchers, policymakers, practitioners and communities dedicated to preserving water resources.
... Big data analytics could help to analyze and examine a large amount of big data. The use of big data can help maintain the equipment that is extensively used or operated on daily to have a better collection of data, which is a significant benefit in the water industry [12][13][14]. In other words, big data analytics can help generate fragmented, messy, and unparalleled data from different treatment facilities into meaningful information [11]. ...
Article
Industrial Revolution 4.0 (IR 4.0) is the revolution that promotes customized and flexible mass production technologies in automation and manufacturing processes. The application of industrial revolution 4.0, such as the Internet of Things (IoT), Cyber-Physical System (CPS), and big data, are used in the water industry to provide better water quality, accurate real-time data, etc. This paper compiles IR 4.0 technology applications, advantages, disadvantages, impacts, and challenges in the water sector, including water supply, wastewater treatment, and stormwater management. It is found that the current water industry is facing a few challenges, such as managing water services, balancing water costs with energy prices, maintaining aging water and wastewater infrastructure, and environmental issues. However, with the application of numerous IR 4.0 innovations, all the challenges can be solved or minimized. Meanwhile, the advantages and impacts of adopting IR 4.0 in water industries are thoroughly discussed. Despite the numerous advantages of IR 4.0, significant drawbacks have restricted the adoption of IR 4.0 in the water industry. The high start-up cost, reluctance, uncertainty, and problems of adopting IR 4.0 are some key challenges that need to be addressed. Generally, IR 4.0 provides more benefits to the community as it helps improve the water system’s performance and efficiencies and creates a more sustainable environment.
... Big Data applications in water resources will require high cost of management and computation [125,126], fast spatial data-mining will be increasingly essential in determining the ...
Article
Full-text available
Hydrology is a mature physical science based on application of first principles. However, the water system is complex and its study requires analysis of increasingly large data available from conventional and novel remote sensing and IoT sensor technologies. New data-driven approaches like Artificial Intelligence (AI) and Machine Learning (ML) are attracting much “hype” despite their apparent limitations (transparency, interpretability, ethics). Some AI/ML applications lack in addressing explicitly important hydrological questions, focusing mainly on “black-box” prediction without providing mechanistic insights. We present a typology of four main types of hydrological problems based on their dominant space and time scales, review their current tools and challenges, and identify important opportunities for AI/ML in hydrology around three main topics: data management, insights and knowledge extraction, and modelling structure. Instead of just for prediction, we propose that AI/ML can be a powerful inductive and exploratory dimension-reduction tool within the rich hydrological toolchest to support the development of new theories that address standing gaps in changing hydrological systems. AI/ML can incorporate other forms of structured and non-structured data and traditional knowledge typically not considered in process-based models. This can help us further advance process-based understanding, forecasting and management of hydrological systems, particularly at larger integrated system scales with big models. We call for reimagining the original definition of AI in hydrology to incorporate not only today’s main focus on learning, but on decision analytics and action rules, and on development of autonomous machines in a continuous cycle of learning and refinement in the context of strong ethical, legal, social, and economic constrains. 
For this, transdisciplinary communities of knowledge and practice will need to be forged with strong investment from the public sector and private engagement to protect water as a common good under accelerated demand and environmental change.
... Due to the ageing of infrastructure and population expansion, improving sustainable water management is a crucial and increasingly important issue [1]. By 2050, food expectancy is estimated to double due to the increase in world's population, which might exceed 9 billion people [2]. Insufficient water or inadequate water management systems will significantly constrain the increase in food production. ...
Article
Full-text available
The current work is an illustration of an empirical investigation conducted on a pharmaceutical water treatment plant that subsequently proposes potential ICT implications for optimizing the plant’s conventional operating procedures and improving production efficiency. Typically, the pilot plant incorporates a standard infrastructure for maintaining quality and production goals. In the study, a schematic of the reverse osmosis section of the pilot treatment plant was developed. A mathematical modeling and process simulation approach was adopted to carry out the linear process investigation and validation of key performance parameters. The study’s findings reveal that the performance and lifecycle of the RO treatment unit are primarily determined via the structured pre-treatment filtering procedures, including critical parameters such as volumetric flowrate, solute concentrations, and differential pressure across the membrane. These operational parameters were also found to be instrumental in increasing plant production and improving equipment efficiency. Based on our results, the study proposes cost-effective ICT implications for plant managers through which pilot organization can substantially save on their annual water and energy consumption.
Article
Water, a priceless gift from nature, acts as Earth's matrix, medium, and life-sustaining substance. While the planet is predominantly covered by water, only 3% is available as freshwater, with 99% of that sourced underground. This groundwater supplies nearly half of the global population. Unfortunately, many areas have experienced recent pollution and overexploitation of this precious resource, adversely affecting the development, sustainability, and economy of people and the planet. Therefore, the evaluation and prediction of Groundwater Quality become indispensable for effective water resource management. Nevertheless, with the continuous advancement of technology, the sheer magnitude of data in Groundwater Science surpasses the capabilities of traditional methods to store, process, and analyse it accurately, leading to erroneous assessments and predictions. Machine Learning is among the promising advanced techniques for processing and extracting new insights from such “Big Data”. This paper explores the scope of Big Data and Machine Learning algorithms for Ground Water Quality Assessment and Prediction (GWQAP). The primary objective of this paper is to identify the impact of Big Data and the effectiveness of Machine learning models in GWQAP. This paper discusses the significance of different Big Data techniques and Machine Learning algorithms for GWQAP. It includes a systematic review of various recently deployed Big Data and Machine Learning applications for Groundwater Quality Management. It also highlights the challenges and future scope of Big Data and Machine Learning in Groundwater Quality Management. Ultimately, this paper is the first step towards enhancing our understanding towards Ground Water Resource Management through Big Data and Machine Learning applications. According to the study, Big Data and Machine Learning can substantially impact water resource management and analysis. 
Big Data ensures new possibilities for data-driven discovery and decision-making if correctly assessed and managed.
Article
This study presents a comprehensive framework for analyzing water productivity products provided by the FAO Water Productivity Open-access portal (WaPOR), focusing on various crops cultivated in both rainfed and irrigated areas within a semi-arid basin in Iran. Two indices, namely Gross Water Productivity (GWP) and Net Water Productivity (NWP), were introduced to quantify water productivity across crop fields. However, these indices may mislead decision-makers, because they aggregate water productivity for all crops and exacerbate the challenges posed by water scarcity. Therefore, mapping crop types seems necessary to enhance the interpretation of these indices and develop a dimensionless index for comparing different crops. The results demonstrated a fundamental change when comparing dimensionless water productivity with GWP and NWP products. Surprisingly, some pixels initially exhibiting high water productivity ranked as low water-productive pixels based on the derived dimensionless index, and vice versa. Based on dimensionless indicators, rainfed crops, particularly rainfed cereals, ranked as the most water-productive crops. The areas with dimensionless values below 0.5 warrant heightened attention to curtail non-beneficial water consumption and elevate water productivity. This research emphasizes the significance of mapping cultivation types as supplementary layers to facilitate precise, data-driven decision-making and enable comparisons of crops based on dimensionless water productivity indices.
Chapter
Water Informatics is an interdisciplinary branch of Information Science and Technology dedicated in collection, selection, organization, processing, management and dissemination of information related to water and allied concern. Water Informatics is close with the Hydroinformatics expecting few additional and different features. Water Informatics is considered as a branch of Environmental Informatics. It has become an interdisciplinary field of study concerning with water of different sources and their proper management using different sub-technologies of IT and Computing. The concept and features including stakeholders of Water Informatics is changing rapidly due to its emergence. Many Countries have started initiative on developing Water Informatics practice for better water systems of the country concerned, even many developing countries also join in such race. Gradually technologies are also changing viz. Cloud Computing, Big Data, Advanced Geo Information Systems, Remote Sensing and so on. Use of artificial intelligence, machine learning and Internet of Things (IoT) are also considered as latest addition in this space. This chapter is about Water Informatics including illustrating its foundation, basic scope and emergence, rapid and growing technologies in water management. The chapter also focuses on educational, training opportunities available and to be made in this promising interdisciplinary field. This Chapter also illustrated few aspects of National Water Informatics Centre, Government of India as a short overview.
Article
Experimental catchments are fundamental elements in hydrological sciences as they provide key data for putting forward and testing hypotheses, developing theories, constraining models, and making predictions. Significant progress in catchment hydrology stemmed from field measurements but increasing costs and risks associated with field work and the availability of big data based on remote sensing, machine learning, and a plethora of models, as well as observations deriving from previous and current sites, raises questions on whether running an experimental catchment still provides individual and community benefits as in the past. In this commentary, I highlight the advantages of keeping experimental catchments alive and propose a personal 10‐step “recipe” to set up a new experimental catchment and manage and sustain it in the long term. These suggestions can be useful both to young and less young researchers who are open to facing the challenge of measuring processes in the field and are willing to offer the scientific community new experimental evidence for advancing our current knowledge in catchment hydrology.
Book
Full-text available
Forward by Henrik von Scheel. Convert the promise of big data into real world results There is so much buzz around big data. We all need to know what it is and how it works - that much is obvious. But is a basic understanding of the theory enough to hold your own in strategy meetings? Probably. But what will set you apart from the rest is actually knowing how to USE big data to get solid, real-world business results - and putting that in place to improve performance. Big Data will give you a clear understanding, blueprint, and step-by-step approach to building your own big data strategy. This is a well-needed practical introduction to actually putting the topic into practice. Illustrated with numerous real-world examples from a cross section of companies and organisations, Big Data will take you through the five steps of the SMART model: Start with Strategy, Measure Metrics and Data, Apply Analytics, Report Results, Transform. Discusses how companies need to clearly define what it is they need to know - Outlines how companies can collect relevant data and measure the metrics that will help them answer their most important business questions Addresses how the results of big data analytics can be visualised and communicated to ensure key decisions-makers understand them - Discusses how companies need to clearly define what it is they need to know - Includes many high-profile case studies from the author's work with some of the world's best known brands
Article
Full-text available
Ecosystem scientists will increasingly be called on to inform forecasts and define uncertainty about how changing planet conditions affect human well-being. We should be prepared to leverage the best tools available, including big data. Use of the term ‘big data’ implies an approach that includes capacity to aggregate, search, cross-reference, and mine large volumes of data to generate new understanding that can inform decision-making about emergent properties of complex systems. Although big-data approaches are not a panacea, there are large-scale environmental questions for which big data are well suited, even necessary. Ecosystems are complex biophysical systems that are not easily defined by any one data type, location, or time. Understanding complex ecosystem properties is data intensive along axes of volume (size of data), velocity (frequency of data), and variety (diversity of data types). Ecosystem scientists have employed impressive technology for generating high-frequency, large-volume data streams. Yet important challenges remain in both theoretical and infrastructural development to support visualization and analysis of large and diverse data. The way forward includes greater support for network science approaches, and for development of big-data infrastructure that includes capacity for visualization and analysis of integrated data products. Likewise, a new paradigm of cross-disciplinary training and professional evaluation is needed to increase the human capital to fully exploit big-data analytics in a way that is sustainable and adaptable to emerging disciplinary needs.
Article
Full-text available
Increasingly, the application of models in urban hydrology has undergone a shift toward integrated structures that recognize the interconnected nature of the urban landscape and both the natural and engineered water cycles. Improvements in computational processing during the past few decades have enabled the application of multiple, connected model structures that link previously disparate systems together, incorporating feedbacks and connections. Many applications of integrated models look to assess the impacts of environmental change on the physical dynamics and quality of landscapes. Whilst these integrated structures provide a more robust representation of natural dynamics, they often place considerable data requirements on the user, whereby data are required at contrasting spatial and temporal scales which can often transcend multiple disciplines. Concomitantly, our ability to observe complex, natural phenomena at contrasting scales has improved considerably with the advent of increasingly novel monitoring technologies. This has provided a pathway for reducing model uncertainty and improving our confidence in modeled outputs by implementing suitable monitoring regimes. This commentary assesses how component models of an exemplar integrated model have advanced over the past few decades, with a critical focus on the role of monitoring technologies that have enabled better identification of the key physical processes. This reduces the uncertainty of processes at contrasting spatial and temporal scales, through a better characterization of feedbacks, which then enhances the utility of integrated model applications. WIREs Water 2017, 4:e1177. doi: 10.1002/wat2.1177
Chapter
The rapid development of data generation sources such as digital sensors, networks, and smart devices, along with their extensive use, is leading to the creation of huge databases and has coined the term Big Data. Cloud computing enables computing resources such as hardware, storage space, and computing tools to be provided as IT services in a pay-as-you-go fashion with high efficiency and effectiveness. Cloud-based technologies, with advantages over traditional platforms, are rapidly being adopted as potential hosts for big data. However, privacy and security are among the major issues in cloud computing due to its availability with very limited user-side control. This chapter proposes a security architecture to protect the data and applications deployed in a cloud environment with big data technology. The chapter discusses the security issues for big data in cloud computing and proposes a Meta Cloud Data Storage architecture to protect big data in the cloud computing environment.
Article
Although a large number of reviews have studied forest ecosystem services at different scales, few have concentrated on accounting for the spatiotemporal dynamic changes of forest ecosystem services in a region. To make a more accurate accounting, this paper draws on extensive data sets that assess the physical quantity and value of forest ecosystem services in Anhui province from 2009 to 2014. Regulating and supporting services such as water conservation, carbon sequestration and oxygen release, soil conservation, and air purification are included in the paper as variables. An optimization analysis was further conducted in order to identify the spatial and temporal heterogeneity during the study period. We establish a dynamic ecosystem services assessment model, with a coefficient of spatial heterogeneity and coefficients of scarce resources and social development, to analyze the trend of the value in Anhui province. This model is designed for comparisons of forest services at different scales in Anhui province. The results demonstrate that the total value of forest ecosystem services exceeds previous estimates when spatiotemporal visualization is used. The implication is that the evaluation of forest ecosystem services should be measured more accurately with the help of big data in the future.
Article
Every day a large number of Earth observation (EO) spaceborne and airborne sensors from many different countries provide a massive amount of remotely sensed data. Those data are used for different applications, such as natural hazard monitoring, global climate change, and urban planning. The applications are data driven and mostly interdisciplinary. Based on this, it can truly be stated that we are now living in the age of big remote sensing data. Furthermore, these data are becoming an economic asset and a new important resource in many applications. In this paper, we specifically analyze the challenges and opportunities that big data bring in the context of remote sensing applications. Our focus is to analyze what exactly big data mean in remote sensing applications and how big data can provide added value in this context. Furthermore, this paper describes the most challenging issues in managing, processing, and efficiently exploiting big data for remote sensing problems. In order to illustrate the aforementioned aspects, two case studies discussing the use of big data in remote sensing are demonstrated. In the first test case, big data are used to automatically detect marine oil spills using a large archive of remote sensing data. In the second test case, content-based information retrieval is performed using high-performance computing (HPC) to extract information from a large database of remote sensing images, collected after the terrorist attack on the World Trade Center in New York City. Both cases are used to illustrate the significant challenges and opportunities brought by the use of big data in remote sensing applications.
Article
The purpose of this paper is to propose and test a theoretical framework to explain resilience in supply chain networks for sustainability using unstructured Big Data, based upon 36,422 items gathered in the form of tweets, news, Facebook, WordPress, Instagram, Google+, and YouTube posts, and structured data, via responses from 205 managers involved in disaster relief activities in the aftermath of the Nepal earthquake in 2015. The paper uses Big Data analysis, followed by a survey which was analyzed using content analysis and confirmatory factor analysis (CFA). The results of the analysis suggest that swift trust, information sharing, and public-private partnerships are critical enablers of resilience in supply chain networks. The current study used cross-sectional data; however, the hypotheses of the study can be tested using longitudinal data to attempt to establish causality. The article advances the literature on resilience in disaster supply chain networks for sustainability in that (i) it suggests the use of Big Data analysis to propose and test particular frameworks in the context of resilient supply chains that enable sustainability; (ii) it argues that swift trust, public-private partnerships, and quality information sharing link to resilience in supply chain networks; and (iii) it uses the context of Nepal, at the moment of the disaster relief activities, to provide contemporaneous perceptions of the phenomenon as it took place.