
Big IoT Data Analytics: Architecture, Opportunities, and Open Research Challenges

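Authors: Mohsen Marjani, Fariza Nasaruddin*, Abdullah Gani (Fellow, IEEE), Ahmad Karim, Ibrahim Abaker Targio Hashem*, Aisha Siddiqa, and Ibrar Yaqoob (Member, IEEE)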

2169-3536 (c) 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2017.2689040, IEEE Access
Abstract: Voluminous amounts of data have been produced over the past decade as Internet of things (IoT) devices have become increasingly miniaturized. However, such data are not useful without analytic power. Numerous big data, IoT, and analytics solutions have enabled people to obtain valuable insight into the large volumes of data generated by IoT devices. However, these solutions are still in their infancy, and the domain lacks a comprehensive survey. This study investigates state-of-the-art research efforts directed toward big IoT data analytics and explains the relationship between big data analytics and IoT. Moreover, this study adds value by proposing a new architecture for big IoT data analytics. Furthermore, big IoT data analytic types, methods, and technologies for big data mining are discussed, and numerous notable use cases are presented. Several opportunities brought by data analytics in the IoT paradigm are then discussed. Lastly, open research challenges, such as privacy, big data mining, visualization, and integration, are presented as future research directions.
Index Terms: Big data, Internet of things, Data analytics, Distributed computing, Smart city.
I. INTRODUCTION
The development of big data and the Internet of things (IoT) is rapidly accelerating and affecting all areas of technology and business by increasing the benefits for organizations and individuals. The growth of data produced via IoT has played a major role in the big data landscape. Big data can be characterized by three aspects: (a) volume, (b) variety, and (c) velocity [1]. These categories were first introduced by Gartner to describe the elements of big data challenges [2]. The capability to analyze and utilize huge amounts of IoT data presents immense opportunities, including applications in smart cities, smart transport and grid systems, energy smart meters, and remote patient healthcare monitoring devices.
Mohsen Marjani, Abdullah Gani, Aisha Siddiqa, Ibrahim Abaker Targio Hashem, and Ibrar Yaqoob are with the Department of Computer System and Technology, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia (marjanimohsen@gmail.com). Fariza Nasaruddin is with the Department of Information System, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. Ahmad Karim is with the Department of Information Technology, Bahauddin Zakariya University, Multan, Pakistan.
The widespread popularity of IoT has made big data analytics challenging because data are collected and processed through many different sensors in the IoT environment. An International Data Corporation (IDC) report indicates that the big data market will reach over US$125 billion by 2019 [3]. IoT big data analytics can be defined as the process of examining a variety of IoT data [4] to reveal trends, unseen patterns, hidden correlations, and new information [5]. Companies and individuals can benefit from analyzing large amounts of data and managing the huge amounts of information that can affect businesses [6]. Therefore, IoT big data analytics aims to help business associations and other organizations achieve an improved understanding of data and thus make efficient and well-informed decisions. Big data analytics enables data miners and scientists to analyze huge amounts of unstructured data that cannot be harnessed using traditional tools [5]. Moreover, big data analytics aims to rapidly extract useful knowledge using data mining techniques that help in making predictions, identifying recent trends, finding hidden information, and making decisions [7].
Data mining techniques are widely deployed for both problem-specific methods and generalized data analytics; accordingly, statistical and machine learning methods are utilized. IoT data differ from ordinary big data collected by conventional systems because of the various sensors and objects involved in data collection; their distinguishing characteristics include heterogeneity, noise, variety, and rapid growth. Statistics [8] show that the number of sensors will increase by 1 trillion by 2030, which will in turn accelerate the growth of big data. Introducing data analytics and IoT into big data requires huge resources, and IoT has the capability to offer an excellent solution: IoT services provide appropriate resources and platforms for intensive applications, enabling effective communication among the various deployed applications. This arrangement is suitable for fulfilling the requirements of IoT applications and can reduce some of the future challenges of big data analytics. This technological amalgamation therefore improves the prospects for implementing IoT.
Moreover, implementing IoT and big data integration solutions can help address issues in storage, processing, data analytics, and visualization tools. It can also help improve collaboration and communication among various objects in a smart city [9]. Application areas, such as smart ecological environments, smart traffic, smart grids, intelligent buildings, and logistic intelligent management, can benefit from this arrangement. Many studies on big data have focused on big data management; in particular, big data analytics has been surveyed [10, 11]. In contrast, this survey focuses on IoT big data in the context of analyzing huge amounts of data. The contributions of this survey are as follows.
a) State-of-the-art research efforts conducted in terms
of big data analytics are investigated.
b) An architecture for big IoT data analytics is
proposed.
c) Several unprecedented opportunities brought by
data analytics in the IoT domain are introduced.
d) Credible use cases are presented.
e) Research challenges that remain to be addressed
are identified and discussed.
These contributions are presented in Sections III to VI, and the conclusion is provided in Section VII.
II. OVERVIEW OF IOT AND BIG DATA
This section provides an overview of IoT technologies and big data before the detailed discussion.
A. IoT
IoT offers a platform for sensors and devices to communicate seamlessly within a smart environment and enables convenient information sharing across platforms. The recent adoption of different wireless technologies positions IoT as the next revolutionary technology, benefiting from the full opportunities offered by Internet technology. IoT has recently been adopted in smart cities, with interest in developing intelligent systems such as smart office, smart retail, smart agriculture, smart water, smart transportation, smart healthcare, and smart energy [12, 13].
IoT has emerged as a new trend in the last few years, in which mobile devices, transportation facilities, public facilities, and home appliances can all be used as data acquisition equipment. Surrounding electronic equipment that facilitates daily life operations, such as wristwatches, vending machines, emergency alarms, and garage doors, as well as home appliances, such as refrigerators, microwave ovens, air conditioners, and water heaters, are connected to an IoT network and can be controlled remotely. Ciufo [14] stated that these devices talk to one another and to central controlling devices. Such devices deployed in different areas may collect various kinds of data, such as geographical, astronomical, environmental, and logistical data.
A large number of communication devices in the IoT paradigm are embedded into sensor devices in the real world. Data collecting devices sense data and transmit them using embedded communication devices. The continuum of devices and objects is interconnected through a variety of communication solutions, such as Bluetooth, WiFi, ZigBee, and GSM. These communication devices transmit data and receive commands from remotely controlled devices, which allows direct integration with the physical world through computer-based systems to improve living standards.
Over 50 billion devices, ranging from smartphones and laptops to sensors and game consoles, are anticipated to be connected to the Internet through several heterogeneous access networks enabled by technologies such as radio frequency identification (RFID) and wireless sensor networks. The authors of [15] noted that IoT can be viewed through three paradigms: Internet-oriented, sensors, and knowledge [16].
B. Big data
The volume of data generated by sensors, devices, social media, healthcare applications, temperature sensors, and various other software applications and digital devices that continuously produce large amounts of structured, unstructured, or semi-structured data is increasing sharply. This massive data generation results in "big data" [17]. Traditional database systems are inefficient at storing, processing, and analyzing rapidly growing amounts of data, or big data [18]. The term "big data" has been used in the previous literature but is relatively new in business and IT [19]. In one prominent big data study, "Big data: The next frontier for innovation, competition, and productivity," the McKinsey Global Institute [20] defined big data as data sets whose size is beyond the ability of typical database software tools to capture, store, process, and analyze [18]. The Digital Universe study [21] describes big data technologies as a new generation of technologies and architectures designed to extract value from massive volumes of data in various formats by enabling high-velocity capture, discovery, and analysis. That study also characterizes big data in terms of three aspects: (a) data sources, (b) data analytics, and (c) the presentation of the analytics results. This definition uses the 3Vs (volume, variety, velocity) model proposed by Gartner [2]. The model highlights an e-commerce trend in data management that faces challenges in managing the volume or size of data, the variety or different sources of data, and the velocity or speed of data creation. Some studies declare volume to be the main characteristic of big data without providing a precise definition [22]. However, other researchers have introduced additional characteristics for big data, such as veracity, value, variability, and complexity [23, 24]. The 3Vs model, or one of its derivatives, is the most common description of the term big data.
III. BIG DATA ANALYTICS
Big data analytics involves the processes of searching a database, mining, and analyzing data, with the aim of improving company performance [25].
Big data analytics is the process of examining large data
sets that contain a variety of data types [4] to reveal unseen
patterns, hidden correlations, market trends, customer
preferences, and other useful business information [5]. The
capability to analyze large amounts of data can help an
organization deal with considerable information that can
affect the business [6]. Therefore, the main objective of big data analytics is to help business associations gain an improved understanding of data and thus make efficient and well-informed decisions. Big data analytics enables data miners and scientists to analyze large volumes of data that may not be harnessed using traditional tools [5].
Big data analytics requires technologies and tools that can transform large amounts of structured, unstructured, and semi-structured data into a more understandable data and metadata format for analytical processes. The algorithms used in these analytical tools must discover patterns, trends, and correlations over a variety of time horizons in the data [26]. After analyzing the data, these tools visualize the findings in tables, graphs, and spatial charts for efficient decision making. Nevertheless, big data analysis remains a serious challenge for many applications because of data complexity and the scalability of the underlying algorithms that support such processes [27].
Talia (2013) highlighted that obtaining helpful information from big data analysis is a critical matter that requires scalable analytical algorithms and techniques to return timely results, whereas current techniques and algorithms are inefficient at handling big data analytics. Therefore, large infrastructure and additional applications are necessary to support data parallelism. Moreover, high-speed data streams received from different data sources arrive in different formats, which makes integrating multiple sources into analytics solutions critical [28]. Hence, the central challenge lies in the performance of the algorithms currently used in big data analysis, which does not increase linearly with the rapid growth of computational resources [19].
Big data analytics processes consume considerable time to provide feedback and guidelines to users, and only a few tools [29] can process huge data sets within a reasonable amount of processing time. Most of the remaining tools rely on complicated trial-and-error methods to deal with massive data sets and data heterogeneity [30]. Nonetheless, dedicated big data analytics systems do exist. For example, the Exploratory Data Analysis Environment [31] is a big data visual analytics system used to analyze complex earth system simulations with large numbers of data sets.
A. Existing analytics systems
Different analytic types are used according to the
requirements of IoT applications [32]. These analytic types
are discussed in this subsection under real-time, off-line,
memory-level, business intelligence (BI) level, and massive
level analytics categories. Moreover, a comparison based on
analytics types and their levels is presented in Table 1.
Real-time analytics is typically performed on data collected from sensors. In this situation, data change constantly, and rapid data analytics techniques are required to obtain an analytical result within a short period. Two architectures have been proposed for real-time analysis: parallel processing clusters using traditional relational databases and memory-based computing platforms [33]. Greenplum [34] and HANA [35] are examples of real-time analytics architectures.
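As a minimal sketch of the real-time pattern described above, assuming readings arrive one at a time from a single sensor, a sliding-window mean can be updated incrementally so that each new value costs constant time. The class name and window size are hypothetical, and the sketch does not model Greenplum or HANA; it only illustrates the incremental style of computation that constantly changing sensor data calls for.

```python
from collections import deque


class SlidingWindowMean:
    """Rolling mean over the most recent `size` sensor readings (hypothetical helper)."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window = deque()  # readings currently inside the window
        self.total = 0.0       # running sum of the window, avoids re-scanning

    def update(self, value):
        """Add one reading and return the mean of the current window."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            # Evict the oldest reading so only the last `size` readings remain.
            self.total -= self.window.popleft()
        return self.total / len(self.window)


if __name__ == "__main__":
    monitor = SlidingWindowMean(size=3)
    for reading in [20.0, 22.0, 24.0, 30.0]:
        print(round(monitor.update(reading), 2))
```

Keeping a running total rather than recomputing the sum is the design choice that makes each update independent of the window length.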
Off-line analytics is used when a quick response is not required [32]. For example, many Internet enterprises use a Hadoop-based off-line analytics architecture to reduce the cost of data format conversion [36]. Such analytics improves data acquisition efficiency. Scribe [37], Kafka [38], Timetunnel [39], and Chukwa [40] are examples of architectures that conduct off-line analytics and can satisfy the demands of data acquisition.
Memory-level analytics is applied when the size of data is smaller than the memory of a cluster [32]. To date, the memory of clusters has reached the terabyte (TB) level [41]. Therefore, internal database technologies are required to improve analytical efficiency. Memory-level analytics is suitable for conducting real-time analysis. MongoDB [42] is an example of this architecture.
BI analytics is adopted when the size of data is larger than the memory level; in this case, data may be imported into the BI analysis environment [43]. BI analytics currently supports TB-level data [32]. Moreover, BI can help discover strategic business opportunities from the flood of data. In addition, BI analytics allows large data volumes to be interpreted easily. Identifying new opportunities and implementing an effective strategy can provide a competitive market advantage and long-term stability.
Massive analytics is applied when the size of data is greater than the entire capacity of BI analysis products and traditional databases [44]. Massive analytics uses the Hadoop distributed file system (HDFS) for data storage and MapReduce for data analysis. Massive analytics helps create the business foundation and increases market competitiveness by extracting meaningful values from data. Moreover, massive analytics obtains accurate data that reduce the risks involved in making any business decision. In addition, massive analytics provides services effectively.
TABLE 1: COMPARISON OF DIFFERENT ANALYTICS TYPES AND THEIR LEVELS
Analytic Type/Level | Specified Use | Existing Architectures/Tools
Real time [33] | To analyze the large amounts of data generated by sensors | Greenplum, HANA
Offline [36] | For applications without strict response-time requirements | Scribe, Kafka, Timetunnel, Chukwa
Memory level [41] | When the total data volume is smaller than the maximum memory of the cluster | MongoDB
Business intelligence level [43] | When the data scale surpasses the memory level | Data analysis plans
Massive level [44] | When the data scale totally surpasses the capacity of business intelligence products and traditional databases | MapReduce
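The selection rules in Table 1 can be paraphrased as a small decision function. This is a hypothetical sketch: the thresholds (cluster memory and BI capacity, in TB) are caller-supplied assumptions rather than values from any product, and response-time needs (real-time versus off-line) are treated separately from data scale (memory, BI, or massive level).

```python
def classify_workload(data_tb, cluster_memory_tb, bi_capacity_tb, needs_fast_response):
    """Return (latency class, scale level) following the rules of Table 1.

    All sizes are in terabytes; the thresholds are illustrative inputs.
    """
    if cluster_memory_tb > bi_capacity_tb:
        raise ValueError("cluster memory is expected not to exceed BI capacity")
    latency = "real-time" if needs_fast_response else "off-line"
    if data_tb < cluster_memory_tb:
        scale = "memory level"   # data fit in cluster memory
    elif data_tb <= bi_capacity_tb:
        scale = "BI level"       # beyond memory, within BI product capacity
    else:
        scale = "massive level"  # beyond BI products and traditional databases
    return latency, scale


if __name__ == "__main__":
    print(classify_workload(0.5, 1.0, 10.0, True))   # ('real-time', 'memory level')
    print(classify_workload(50.0, 1.0, 10.0, False))  # ('off-line', 'massive level')
```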
B. Relationship between IoT and big data analytics
Big data analytics is rapidly emerging as a key IoT initiative to improve decision making. One of the most prominent features of IoT is its analysis of information about "connected things." Big data analytics in IoT requires processing a large amount of data on the fly and storing the data in various storage technologies. Given that much of the unstructured data are gathered directly from web-enabled "things," big data implementations will necessitate lightning-fast analytics with large queries to allow organizations to gain rapid insights, make quick decisions, and interact with people and other devices. The interconnection of sensing and actuating devices provides the capability to share information across platforms through a unified architecture and to develop a common operating picture that enables innovative applications.
Fig. 1. Relationship between IoT and big data analytics
The need to adopt big data in IoT applications is compelling. These two technologies have already been recognized in the fields of IT and business. Although the development of big data is already lagging, these technologies are interdependent and should be jointly developed. In general, the deployment of IoT increases the amount of data in both quantity and category, thereby offering opportunities for the application and development of big data analytics. Moreover, the application of big data technologies in IoT accelerates the research advances and business models of IoT. The relationship between IoT and big data, which is shown in Figure 1, can be divided into three steps that enable the management of IoT data. The first step comprises managing IoT data sources, where connected sensor devices use applications to interact with one another. For example, the interaction of devices such as CCTV cameras, smart traffic lights, and smart home devices generates large amounts of data in different formats. These data can be stored in low-cost commodity storage in the cloud. In the second step, the generated data, which are called "big data" because of their volume, velocity, and variety, are stored in big data files in shared, distributed, fault-tolerant databases. The last step applies analytics tools, such as MapReduce, Spark, Splunk, and Skytree, that can analyze the stored big IoT data sets. The four levels of analytics start from training data and then move on to analytics tools, queries, and reports.
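To make the last step more concrete, the MapReduce programming model can be simulated in a single process. This is a hypothetical sketch, not the Hadoop or Spark API: the record format ("device_id,value") and the device names are assumptions, and a real cluster would run the map and reduce functions in parallel on many nodes and perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Parse raw 'device_id,value' lines into (key, value) pairs."""
    for line in records:
        device_id, value = line.split(",")
        yield device_id.strip(), float(value)


def shuffle_phase(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce each key's values to their mean."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


if __name__ == "__main__":
    raw = ["cctv-1,10", "traffic-light-7,3.5", "cctv-1,14"]
    print(reduce_phase(shuffle_phase(map_phase(raw))))
```

Because the map function looks at one record at a time and the reduce function at one key at a time, both can be distributed without coordination, which is what makes the model suit large IoT data sets.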
C. Big data analytics methods
Big data analytics aims to rapidly extract useful knowledge that helps in making predictions, identifying recent trends, finding hidden information, and ultimately, making decisions [7]. Data mining techniques are widely deployed for both problem-specific methods and generalized data analytics; accordingly, statistical and machine learning methods are utilized. The evolution of big data also changes analytics requirements. Efficient mechanisms are required in all aspects of big data management [30], such as capture, storage, preprocessing, and analysis. For our discussion, however, the key requirement is that big data analytics achieve the same or faster processing speed than traditional data analytics, at minimum cost, for high-volume, high-velocity, and high-variety data [45].
Various solutions are available for big data analytics, and these solutions are continuously being developed and improved to suit new big data trends. Data mining plays an important role in analytics, and most techniques are developed using data mining algorithms tailored to a particular scenario. Knowledge of the available big data analytics options is crucial when evaluating and choosing an appropriate approach for decision making. In this section, we present several methods that can be implemented for various big data case studies; some of these analytics methods are efficient for big IoT data analytics. Large and diverse data sets are commonly assumed to contribute more to big data insights. However, this belief is not always valid because more data may contain more ambiguities and abnormalities [7].
We present big data analytics methods under the classification, clustering, association rule mining, and prediction categories. Figure 2 depicts and summarizes these categories. Each category is a data mining function that involves many methods and algorithms to fulfill information extraction and analysis requirements. For example, Bayesian networks, support vector machines (SVM), and k-nearest neighbor (KNN) are classification methods. Similarly, partitioning, hierarchical clustering, and co-occurrence are widespread clustering methods. Association rule mining and prediction also comprise significant methods.
Fig. 2. Overview of big data analytics methods
Classification is a supervised learning approach that uses prior knowledge as training data to classify data objects into groups [46]. A predefined category is assigned to an object, thereby achieving the objective of predicting a group or class for that object (see Figure 2). Finding unknown or hidden patterns is more challenging for big IoT data, and extracting valuable information from large data sets to improve decision making is a critical task. A Bayesian network is a classification method that offers model interpretability. Bayesian networks are efficient for analyzing the complex data structures revealed through big data, rather than only traditional structured data formats. These networks are directed acyclic graphs in which nodes are random variables and edges denote conditional dependencies [47]. Naïve, selective naïve, and semi-naïve Bayes, as well as Bayes multi-nets, are the proposed categories for classification [48].
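Naive Bayes is the simplest of the Bayesian classifiers listed above: it corresponds to a Bayesian network in which the class node is the only parent of every feature node, so features are assumed to be conditionally independent given the class. The following minimal sketch assumes categorical features and uses add-one (Laplace) smoothing; the class name and the smart-home features and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[label][feature_index][value] -> occurrences in the training data
        self.counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values seen for each feature, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        """Return the label with the highest log posterior (up to a constant)."""
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / self.total)  # log prior P(label)
            for i, value in enumerate(row):
                k = len(self.values[i])
                # Smoothed log likelihood P(value | label); never log(0).
                score += math.log((self.counts[label][i][value] + 1) / (n_label + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    rows = [("high", "empty"), ("high", "occupied"), ("low", "occupied"),
            ("low", "empty"), ("high", "empty")]
    labels = ["alert", "normal", "normal", "normal", "alert"]
    model = CategoricalNaiveBayes().fit(rows, labels)
    print(model.predict(("high", "empty")), model.predict(("low", "occupied")))
```

Working in log space avoids numerical underflow when many feature probabilities are multiplied, which matters once the number of features grows.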
Analyzing data patterns and creating groups are efficiently performed using SVM, which is another classification approach for big data analytics. SVM utilizes statistical learning theory to analyze data patterns and create groups. Applications of SVM classification in big data analytics include text classification [49], pattern matching [50], health diagnostics [51], and commerce. Similarly, KNN is typically designed to provide efficient mechanisms for finding hidden patterns in big data sets, such that retrieved objects are similar to the predefined category [52]. Further work has improved the KNN algorithm for applications in anomaly detection [53], high-dimensional data [54], and scientific experiments [55]. Classification has other extensions that adopt a large number of artificial intelligence and data mining techniques. Consequently, classification is one of the most widespread data mining techniques for big data analytics.
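The KNN decision rule can be sketched in a few lines: a query is assigned the majority label among its k nearest training points under Euclidean distance. The function name and the temperature/humidity readings below are hypothetical, and a big data deployment would replace the full sort with an index structure; the sketch only shows the decision rule itself.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    `train` is a list of (feature_vector, label) pairs. Ties in the vote go to
    the label encountered first, i.e., the one held by the nearer neighbor.
    """
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and len(train)")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # (temperature in deg C, relative humidity in %) -> sensor state (hypothetical)
    train = [((21.0, 40.0), "normal"), ((22.0, 42.0), "normal"),
             ((20.5, 39.0), "normal"), ((35.0, 90.0), "faulty"),
             ((36.0, 88.0), "faulty")]
    print(knn_predict(train, (21.5, 41.0)), knn_predict(train, (34.0, 85.0)))
```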
Clustering is another data mining technique used as a big data analytics method. Contrary to classification, clustering uses an unsupervised learning approach and creates groups for given objects based on their distinctive, meaningful features [56]. As presented in Figure 2, grouping a large number of objects into clusters simplifies data manipulation. The well-known methods used for clustering are hierarchical clustering and partitioning. The hierarchical clustering approach repeatedly combines small clusters of data objects to form a hierarchical tree, producing agglomerative clusters. Divisive clusters are created in the opposite manner, by dividing a single cluster that contains all data objects into smaller appropriate clusters [57].
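The agglomerative direction can be sketched as follows, assuming single linkage (the distance between two clusters is that of their closest members) and stopping once a requested number of clusters remains. Both choices, and the function names, are assumptions for illustration; other linkage criteria and stopping rules are common, and this naive version scales poorly (roughly cubic in the number of points), so it is not suited to big data as written.

```python
import math
from itertools import combinations


def single_linkage(a, b):
    """Distance between the closest pair of points drawn from clusters a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, num_clusters):
    """Merge the two closest clusters until `num_clusters` clusters remain."""
    if not 1 <= num_clusters <= len(points):
        raise ValueError("num_clusters must be between 1 and len(points)")
    clusters = [[p] for p in points]  # start with one singleton cluster per point
    while len(clusters) > num_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


if __name__ == "__main__":
    readings = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.5, 8.2), (1.0, 1.5)]
    print(agglomerative(readings, num_clusters=2))
```

Recording the sequence of merges instead of stopping early would yield the full hierarchical tree described in the text.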
Market analysis and business decision making are among the most significant applications of big data analytics. Association rule mining involves identifying interesting relationships among different objects, events, or other entities to analyze market trends, consumer buying behavior, and product demand predictions (see Figure 2). Association rule mining [58] focuses on identifying and creating rules based on the frequency of occurrences for numeric and non-numeric data. Under association rules, data processing is performed in two manners. First, sequential data processing uses Apriori-based algorithms, such as MSPS [59] and LAPIN-SPAM [60], to identify interaction associations. Second, temporal sequence analysis uses algorithms to analyze event patterns in continuous data.
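The support and confidence measures underlying association rule mining can be illustrated with a minimal sketch. This is not MSPS or LAPIN-SPAM, which mine sequential patterns; it is a simplified Apriori-style pass, assumed for illustration, that only considers rules between single items and applies Apriori's pruning property (a pair can be frequent only if both of its items are). The function name and the transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    # Apriori pruning: only individually frequent items can form frequent pairs.
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair
        for t in transactions
        for pair in combinations(sorted(set(t) & frequent), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter", "milk"},
               {"milk", "butter"}, {"bread", "milk"}, {"eggs"}]
    for rule in pair_rules(baskets, min_support=0.4, min_confidence=0.7):
        print(rule)
```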
Predictive analytics uses historical data, known as training data, to determine results such as trends or behavior in data. SVM and fuzzy logic algorithms are used to identify relationships between independent and dependent variables and to obtain regression curves for predictions, such as those of natural disasters. Furthermore, customer buying predictions and social media trends are analyzed through predictive analytics [61] (see Table 2). In the case of big data analytics, processing requirements are modified according to the nature and volume of data. Fast data access and mining methods for structured and unstructured data are major concerns in big data analytics. Furthermore, data representation is a significant requirement in big data analytics. Time series analysis reduces the high dimensionality associated with big data and offers representations for improved decision making. Research related to time series representation includes ARMA [62], bitmaps [63], and wavelet functions [64].
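The idea of fitting a regression curve to historical (training) data and extrapolating it can be sketched with ordinary least squares on a straight line. This is a deliberately simpler technique than the SVM and fuzzy logic methods named above, swapped in only to show the fit-then-predict workflow; the function names and the hourly readings are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x (extrapolation beyond the data is a guess)."""
    return intercept + slope * x


if __name__ == "__main__":
    hours = [0, 1, 2, 3]
    readings = [10.0, 12.0, 14.0, 16.0]
    slope, intercept = fit_line(hours, readings)
    print(slope, intercept, predict(slope, intercept, 4))
```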
The big data analytics methods discussed in this section are widely adopted in many application areas of big data, such as disaster management, healthcare, business, industry, and e-governance. Table 2 presents the application areas of the big data mining functionalities elaborated in this section; a check mark (✓) indicates support for an application, whereas "-" denotes that it is not obvious whether the method supports the application. In particular, Table 2 shows that classification methods are suitable for medical imaging, industry, speech recognition, natural language processing, and e-governance. Clustering and association rule-based data analytics methods are applicable to industry and e-governance and are well adopted in healthcare, e-commerce, and bioinformatics. Predictive analytics is useful for disaster and market predictions, whereas time series analysis is used in disaster forecasting, medical imaging, speech recognition, social network analysis, and e-governance.
TABLE 2: APPLICATIONS OF BIG DATA MINING FOR IOT
Method
Applications
Disaster management
Healthcare
Medical Imaging
Human Genetics
Market Analysis
Industry
Speech Recognition
Bioinformatics
NLP
Social Network Analysis
e-governance
Classification [46]
-
-
-
-
-
-
Clustering [57]
-
-
-
Association rule[58, 65]
-
-
-
-
-
-
Prediction [61]
-
-
-
-
-
-
-
-
Time Series [62] [63] [64]
-
-
-
-
-
-
✓ has support
- not obvious
D. IoT architecture for big data analytics
The architectural concept of IoT has several definitions
based on IoT domain abstraction and identification. It offers
a reference model that defines relationships among various
IoT verticals, such as smart traffic, smart home, smart
transportation, and smart health. The architecture for big
data analytics offers a design for data abstraction.
Furthermore, such a standard provides a reference
architecture that builds upon the reference model. Many IoT
architectures are found in the literature [66] [67] [13]. For
example, [13] offered a cloud-centric IoT architecture with
cloud computing at the center and modeled end-to-end
interaction among various stakeholders; we use it for
comparison with the proposed IoT architecture. This
architecture is achieved by seamless ubiquitous sensing, data
analytics, and information representation, with IoT as the
unifying architecture. However, that architecture focuses on
IoT with regard to communications. To our knowledge, our
proposed architecture, which integrates IoT and big data
analytics, has not been studied in the current literature.
Figure 3 illustrates the IoT architecture and big data
analytics. In this figure, the sensor layer contains all the
sensor devices and objects, which are connected through a
wireless network. This wireless network can use RFID,
WiFi, ultra-wideband, ZigBee, or Bluetooth. The IoT
gateway connects this network to the Internet and various
web services. The upper layer concerns big data analytics,
where large amounts of data received from sensors are
stored in the cloud and accessed through big data analytics
applications. These applications provide API management
and a dashboard for interacting with the processing engine.
Fig. 3. IoT architecture and big data analytics (components: IoT devices, network devices, IoT gateway, cloud storage, and big data analytics with API management and dashboard)
A novel meta-model-based approach for integrating IoT
architecture objects is proposed. The concept is
semi-automatically federated into a holistic digital enterprise
architecture environment. The main objective is to provide
adequate decision support for complex business and IT
environments, including architecture management and the
development of assessment systems. Thus, architectural
decisions for IoT are closely connected with code
implementation, which allows users to understand the
integration of enterprise architecture management with IoT.
IV. USE CASES
This section presents a number of use cases for big IoT data
analytics. Although many use cases are relevant to IoT
applications, we chose those that are most commonly used in
IoT applications and that generate large amounts of data for
analytics.
A. Smart metering
Smart metering is an IoT use case that generates a large
amount of data from different sources, such as smart grids,
tank levels, water flows, and silo stock calculations, whose
processing can take a long time even on a dedicated and
powerful machine [68]. A smart meter is a device that
electronically records electric energy consumption and
communicates the data between the meter and the control
system. Collecting and analyzing smart meter data in an IoT
environment assists decision makers in predicting
electricity consumption. Furthermore, smart meter analytics
can be used to forecast demand in order to prevent crises and
satisfy strategic objectives through specific pricing plans.
Thus, utility companies must be capable of high-volume data
management and advanced analytics designed to transform
data into actionable insights.
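Consumption forecasting can be illustrated with a seasonal-average baseline. The forecast for each hour of the next day is the mean of that hour over the most recent complete days. This is our own simple baseline, not the method of [68]. It assumes hourly readings that start at hour 0, and the function name and defaults are hypothetical.

```python
from statistics import mean

def seasonal_average_forecast(hourly_kwh, season=24, lookback_days=7):
    """Forecast the next day's hourly consumption as the mean of the same hour
    over the most recent complete days (a seasonal-average baseline).

    Assumes hourly readings that start at hour 0 of the first day."""
    n_days = len(hourly_kwh) // season
    if n_days == 0:
        raise ValueError("need at least one complete day of readings")
    days = min(n_days, lookback_days)
    start = (n_days - days) * season          # first hour of the lookback window
    window = hourly_kwh[start:n_days * season]
    return [mean(window[d * season + h] for d in range(days)) for h in range(season)]

history = [1.0] * 24 + [3.0] * 24             # two days of hypothetical meter data
forecast = seasonal_average_forecast(history)  # each hour: mean of 1.0 and 3.0
```

A baseline like this is mainly useful as a reference point for comparing richer models, such as models that also use weather, calendar, and tariff data.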
B. Smart transportation
A smart transportation system is an IoT-based use case that
aims to support the smart city concept. A smart
transportation system intends to deploy powerful and
advanced communication technologies for the management
of smart cities. Traditional transportation systems, which are
based on image processing, are affected by weather
conditions, such as heavy rains and thick fog. Consequently,
the captured image may not be clearly visible. The design of
an e-plate system [69] using RFID technology provides a
good solution for intelligent monitoring, tracking, and
identification of vehicles. Moreover, introducing IoT into
vehicular technologies will enable traffic congestion
management to perform significantly better than the
existing infrastructure. This technology can improve existing
traffic systems by allowing vehicles to communicate
effectively with one another in a systematic manner without
human intervention.
Satellite navigation systems and sensors can also be applied
in trucks, ships, and airplanes in real time. The routing of
these vehicles can be optimized by using the large body of
available public data, such as traffic jams, road conditions,
delivery addresses, weather conditions, and locations of
refueling stations. For example, if a delivery address changes
at runtime, the route and cost can be recalculated, optimized,
and passed on to drivers in real time. Sensors incorporated
into these vehicles can also provide real-time information to
measure engine health, determine whether equipment
requires maintenance, and predict errors [70].
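Route recalculation can be sketched with Dijkstra's shortest-path algorithm over a road graph whose edge weights are travel times. A live traffic update raises the weight of congested edges before the route is recomputed. The graph format, the apply_traffic helper, and the example values are hypothetical.

```python
import heapq

def shortest_route(graph, source, target):
    """Dijkstra's algorithm. `graph` maps node -> {neighbor: travel_minutes}.
    Returns (path, total_minutes), or (None, inf) if the target is unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                      # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

def apply_traffic(graph, delays):
    """Copy `graph`, adding extra minutes to congested edges.
    `delays` maps (u, v) -> additional minutes; each edge must already exist."""
    updated = {u: dict(nbrs) for u, nbrs in graph.items()}
    for (u, v), extra in delays.items():
        updated[u][v] += extra
    return updated

roads = {"depot": {"A": 10, "B": 15}, "A": {"customer": 10}, "B": {"customer": 10}}
congested = apply_traffic(roads, {("A", "customer"): 15})
```

On the example graph, the route goes through A (20 minutes). After a 15-minute delay is reported on the A-to-customer segment, the route through B (25 minutes) is chosen instead. A changed delivery address is handled the same way, by calling shortest_route with the new target.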
C. Smart supply chains
Embedded sensor technologies can communicate
bidirectionally and provide remote accessibility to over 1
million elevators worldwide [71]. The captured data are
used by on- and off-site technicians to run diagnostics and
choose repair options, which results in increased machine
uptime and enhanced customer service. Ultimately, big IoT
data analytics allows a supply chain to execute decisions and
control the external environment. IoT-enabled factory
equipment will be able to communicate data parameters
(e.g., machine utilization and temperature) and optimize
performance by changing equipment settings or process
workflow [72]. In-transit
visibility is another use case that will play a vital role in
future supply chains in the presence of IoT infrastructure.
Key technologies used by in-transit visibility are RFIDs and
cloud-based Global Positioning System (GPS), which
provide location, identity, and other tracking information.
These data will be the backbone of supply chains supported
by IoT technologies. The information gathered by
equipment will provide detailed visibility of an item shipped
from a manufacturer to a retailer. Data collected via RFID
and GPS technologies will allow supply chain managers to
automate shipments and provide accurate delivery
information by predicting the time of arrival. Similarly,
managers will be able to monitor other information, such as
temperature, which can affect the quality of in-transit
products.
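Predicting the time of arrival from GPS data can be sketched under strong simplifying assumptions. The remaining great-circle distance is divided by the average speed observed so far. The fix format (hours since departure, latitude, longitude) and the function names are hypothetical. A real system would use road distances and traffic-aware speeds.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def estimate_eta_hours(fixes, destination):
    """fixes: list of (hours_since_departure, lat, lon) GPS readings, oldest first.
    Estimates average speed from the distance covered so far, then divides
    the remaining straight-line distance to `destination` (lat, lon) by it."""
    if len(fixes) < 2:
        raise ValueError("need at least two GPS fixes")
    travelled = sum(haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:]))
    elapsed = fixes[-1][0] - fixes[0][0]
    if elapsed <= 0 or travelled == 0:
        raise ValueError("cannot estimate speed from these fixes")
    speed_kmh = travelled / elapsed
    remaining_km = haversine_km(fixes[-1][1], fixes[-1][2], *destination)
    return remaining_km / speed_kmh
```

For example, a shipment that moved one degree of latitude (about 111 km) in its first hour and still has two degrees to go gets an estimate of about 2 hours.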
D. Smart agriculture
Smart agriculture is a beneficial use case in big IoT data
analytics. Sensors are the actors in the smart agriculture use
case. They are installed in fields to obtain data on soil
moisture, plant trunk diameter, microclimate conditions, and
humidity, as well as to forecast weather. Sensors transmit the
obtained data using network and communication devices.
These data pass through an IoT gateway and the Internet to
reach the analytics layer shown in Fig. 3. The analytics layer
processes the data obtained from the sensor network to issue
commands. Automatic climate
control according to harvesting requirements, timely and
controlled irrigation, and humidity control for fungus
prevention are examples of actions performed based on big
data analytics recommendations.
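The step from analytics to commands can be illustrated with a simple rule-based controller. The reading fields and thresholds below are hypothetical placeholders. Actual values depend on the crop, soil, and climate, and a deployed system would usually derive them from the analytics layer rather than hard-code them.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical thresholds; real values depend on the crop, soil, and climate.
MOISTURE_TARGET_PCT = 30.0       # irrigate below this soil moisture
RAIN_SKIP_MM = 5.0               # skip irrigation if at least this much rain is forecast
FUNGUS_RISK_HUMIDITY_PCT = 85.0  # ventilate at or above this relative humidity

@dataclass
class FieldReading:
    soil_moisture_pct: float     # volumetric soil water content, %
    air_humidity_pct: float      # relative humidity, %
    rain_forecast_mm: float      # forecast rainfall for the next 24 h

def decide_commands(r: FieldReading) -> list[str]:
    """Turn one aggregated field reading into actuator commands."""
    commands = []
    if r.soil_moisture_pct < MOISTURE_TARGET_PCT and r.rain_forecast_mm < RAIN_SKIP_MM:
        commands.append("irrigate")   # timely, controlled irrigation
    if r.air_humidity_pct >= FUNGUS_RISK_HUMIDITY_PCT:
        commands.append("ventilate")  # humidity control for fungus prevention
    return commands
```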
E. Smart grid
The smart grid is a new generation of power grid in which
the management and distribution of electricity between
suppliers and consumers are upgraded using two-way
communication technologies and computing capabilities to
improve reliability, safety, and efficiency through real-time
control and monitoring [73, 74]. One of the major challenges
in a power system is integrating renewable and decentralized
energy. Electricity systems require a smart grid to manage the
volatile behavior of distributed energy resources (DERs)
[75]. In addition, most energy systems have to follow
governmental laws and regulations, as well as consider
business analysis and potential legal constraints [76]. Grid
sensors and devices continuously and rapidly generate data
related to control loops and protection, which require
real-time processing and analytics along with
machine-to-machine (M2M) or human-machine interface
(HMI) interactions to issue control commands to the system.
The system must also fulfill visualization and reporting
requirements.
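One real-time analytics task on grid sensor streams is flagging abnormal readings as they arrive. The sketch below uses a sliding-window z-score over recent values. The detector class, its window size, and its threshold are illustrative assumptions, not a method from the cited works.

```python
from collections import deque
from statistics import mean, pstdev

class SlidingZScoreDetector:
    """Flags a reading whose z-score against the last `window` readings exceeds `threshold`."""

    def __init__(self, window=60, threshold=4.0):
        self.buffer = deque(maxlen=window)  # oldest readings drop out automatically
        self.threshold = threshold

    def update(self, value):
        """Return True if `value` is anomalous relative to the current window."""
        alarm = False
        if len(self.buffer) >= 2:
            mu = mean(self.buffer)
            sigma = pstdev(self.buffer)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                alarm = True
        self.buffer.append(value)           # flagged values also enter the window here
        return alarm
```

If the detector receives grid-frequency samples that oscillate narrowly around 50 Hz, it raises no alarm, but a sudden 50.5 Hz sample is flagged. In this sketch a flagged value still enters the window. Excluding it instead would keep a single fault from inflating the baseline.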
F. Smart traffic light system
The smart traffic light system consists of nodes that locally
interact with IoT sensors and devices to detect the presence
of vehicles, bikers, and pedestrians. These nodes
communicate with neighboring traffic lights to measure the
speed and distance of approaching road users and to manage
green signals [77]. IoT data gathered by the system require
real-time analytics processing to perform necessary tasks,
such as changing the timing cycles according to traffic
conditions, sending informative signals to neighboring
nodes, and detecting approaching vehicles with IoT sensors
and devices to prevent long queues or accidents. Moreover,
smart traffic light systems can send
their collected IoT data to cloud storage for further
analytics. Table 3 presents the use cases of IoT big data
analytics.
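Changing timing cycles according to traffic conditions can be sketched by splitting a fixed cycle among approaches in proportion to their queue lengths, with a guaranteed minimum green time for each approach. The function, the 90 s cycle, and the 10 s minimum are hypothetical. The sketch ignores amber and all-red intervals and coordination with neighboring nodes.

```python
def split_green_time(queues, cycle_s=90.0, min_green_s=10.0):
    """Allocate green seconds per approach in proportion to queued vehicles,
    guaranteeing each approach at least `min_green_s` seconds."""
    n = len(queues)
    if n == 0:
        return {}
    spare = cycle_s - n * min_green_s          # time left after the minimum greens
    if spare < 0:
        raise ValueError("cycle too short for the minimum green times")
    total = sum(queues.values())
    if total == 0:
        return {k: cycle_s / n for k in queues}  # no demand: split evenly
    return {k: min_green_s + spare * q / total for k, q in queues.items()}

plan = split_green_time({"north": 12, "south": 4, "east": 0, "west": 4})
```

For the example queues, the plan gives north 40 s, south and west 20 s each, and east the 10 s minimum. This totals the 90 s cycle.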
TABLE 3: COMPARISON OF IOT BIG DATA ANALYTICS USE CASES
Use case | Benefits | IoT devices | Data source | Big data analytics applications
Smart metering [68] | Predict electricity consumption | Sensors | Text | Hadoop
Smart transportation [69] [70] | Improve existing traffic system by which vehicles can effectively communicate with one another in a systematic manner without human intervention | Sensors, cameras | Text, video, audio | Hadoop, Spark, Hive
Smart supply chains [71] [72] | Allow a supply chain to execute decisions and control the external environment | Sensors, mobile devices | Text, image | Hadoop
Smart agriculture [12, 13] | Obtain moisture level of soil, trunk diameter of plants, microclimate condition and humidity level; forecast weather | Sensors | Text, image | Hadoop
Smart grid [73, 74] [75] [76] | Improves reliability, safety, and efficiency, along with real-time control and monitoring | Sensors | Text | Hadoop
Smart traffic [77] | Detect the presence of vehicles, bikers, and pedestrians | Cameras | Video, image | Hadoop, Spark
As shown in Table 3, most use cases are related to M2M
communication technologies and reduce the role of human
interaction. These technologies also use prediction methods
and decision-making techniques to improve real-time
control, monitoring, and performance. Textual data are
among the most common data types generated by IoT
devices, which are mostly sensors and cameras. Text-based
data are well suited to storage and analysis with distributed
frameworks such as Hadoop.
V. OPPORTUNITIES
IoT is currently considered one of the most profound
transitions in technology, and it provides several
opportunities for big data analytics. Figure 4 shows examples
of the use cases and opportunities discussed in Sections IV
and V.
Fig. 4. Example of use cases and opportunities for big IoT data analytics architecture (big IoT data analytics linked to e-commerce, smart cities, retail & logistics, and healthcare)
A. E-commerce
Big IoT data analytics offers well-designed tools to process
real-time big data, which produce timely results for decision
making. Big IoT data exhibit heterogeneity, increasing
volume, and real-time data processing features. The
convergence of big data with IoT brings new challenges and
opportunities to build a smart environment. Big IoT data
analytics has widespread applications in nearly every
industry. In e-commerce, its main successes include revenue
growth, a larger customer base, more accurate sales
forecasts, product optimization, risk management, and
improved customer segmentation.
B. Smart cities
Big data collected from smart cities offer new opportunities
in which efficiency gains can be achieved through an
appropriate analytics platform/infrastructure to analyze big
IoT data. Various devices connect to the Internet in a smart
environment and share information. Moreover, the cost of
storing data has been reduced dramatically since the advent
of cloud computing, and analysis capabilities have made huge
leaps. Thus, big data in a smart city can potentially transform
every sector of a nation's economy. Hadoop with the YARN
resource manager is a recent advance in big data technology
that supports numerous workloads, real-time processing, and
streaming data ingestion.
C. Retail and logistics
IoT is expected to play a key role as an emerging
technology in the area of retail and logistics. In logistics,
RFID keeps track of containers, pallets, and crates. In
addition, considerable advancements in IoT technologies can
benefit retailers in several ways. However, IoT devices
generate large amounts of data daily, so powerful data
analytics is needed to help enterprises gain insights from the
voluminous data produced through IoT technologies.
Applying data analytics to logistics data sets can improve the
shipment experience of customers. Moreover, retail
companies can earn additional profit by analyzing customer
data to predict the trends and demands of goods. Such
analysis also allows retailers to optimize pricing plans and
plan seasonal promotions efficiently to maximize profit.
D. Healthcare
Recent years have witnessed tremendous growth in smart
health monitoring devices. These devices generate
enormous amounts of data. Thus, applying data analytics to
data collected from fetal monitors, electrocardiograms,
temperature monitors, or blood glucose level monitors can
help healthcare specialists efficiently assess the physical
conditions of patients. Moreover, data analytics enables
healthcare professionals to diagnose serious diseases in their
early stages to help save lives. Furthermore, data analytics
improves the clinical quality of care and ensures the safety
of patients. In addition, physician profiles can be reviewed
by examining patients' treatment histories, which can
improve customer satisfaction, acquisition, and retention.
VI. OPEN CHALLENGES AND FUTURE DIRECTIONS
IoT and big data analytics have been extensively accepted
by many organizations. However, these technologies are
still in their early stages. Several existing research
challenges have not yet been addressed. This section
presents several challenges in the field of big IoT data
analytics.
A. Privacy
Privacy issues arise when big data analytics tools are used
to infer or restore personal information from a compromised
system, even when the data are generated by anonymous
users. With the proliferation of big data analytics
technologies used in big IoT data, privacy has become a core
problem in the data mining domain. Consequently, most
people are reluctant to rely on systems that do not provide
solid service-level agreement (SLA) conditions regarding the
theft or misuse of personal information. The sensitive
information of users has to be secured and protected from
external interference. Although temporary identification,
anonymity, and encryption provide several ways to enforce
data privacy, decisions still have to be made with regard to
ethical factors, such as which generated big IoT data to use,
how to use them, and why [7].
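Temporary identification can be sketched as keyed pseudonymization. Each device ID is replaced with an HMAC-SHA256 digest under a secret key, and rotating the key periodically prevents pseudonyms from being linked across periods. The function names and the truncation to 16 hex characters are assumptions. Pseudonymization alone does not guarantee anonymity, because other attributes in the data may still allow re-identification.

```python
import hashlib
import hmac
import secrets

def make_pseudonymizer(key: bytes):
    """Return a function mapping a device ID to a stable pseudonym under `key`."""
    def pseudonym(device_id: str) -> str:
        mac = hmac.new(key, device_id.encode("utf-8"), hashlib.sha256)
        return mac.hexdigest()[:16]   # truncated to 64 bits for readability
    return pseudonym

# A fresh random key per reporting period gives temporary identifiers:
# a device keeps the same pseudonym only while the key is unchanged.
period_key = secrets.token_bytes(32)
pseudonym = make_pseudonymizer(period_key)
```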
Another security risk associated with IoT data is the
heterogeneity of the types of devices used and the nature of
generated data, such as raw devices, data types, and
communication protocols. These devices can have different
sizes and shapes outside the network and are designed to
communicate with cooperative applications. Thus, to
authenticate these devices, an IoT system should assign each
device a non-repudiable identity. Moreover, enterprises
should maintain a meta-repository of these connected devices
for auditing purposes. This heterogeneous IoT architecture is
new to security professionals, which increases security risks.
Consequently, any attack in this scenario compromises
system security and disconnects interconnected devices.
In the context of big IoT data, security and privacy are the
key challenges in processing and storing huge amounts of
data. Moreover, these systems rely heavily on third-party
services and infrastructure to perform critical operations and
host private data. The exponential growth in data rates
therefore makes it difficult to secure each and every portion
of critical data. As previously discussed, existing security
solutions (Karim, 2016) are no longer applicable to providing
complete security in big IoT data scenarios. Existing
algorithms are not designed for the dynamic observation of
data and thus cannot be applied effectively. Legacy data
security solutions are specifically designed for static data
sets, whereas current data requirements change dynamically
(Lafuente, 2015). Thus, deploying these security solutions is
difficult for dynamically increasing data. In addition,
legislative and regulatory issues should be considered when
signing SLAs.
With regard to data generated through IoT, the following
security problems can emerge [78]: (a) timely updates -
difficulty in keeping systems up to date; (b) incident
management - difficulty in distinguishing suspicious traffic
patterns from legitimate ones and possible failure to capture
unidentifiable incidents; (c) interoperability - proprietary and
vendor-specific procedures make it difficult to find hidden or
zero-day attacks; and (d) protocol convergence - although
IPv6 is compatible with the latest specifications, it has yet to
be fully deployed, so security rules written for IPv4 may not
protect IPv6 traffic.
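The IPv4/IPv6 rule gap can be illustrated with Python's standard ipaddress module. A blocklist written only with IPv4 networks does not match the same host when it appears as an IPv4-mapped IPv6 address. The blocklist contents and function names are hypothetical, and 203.0.113.0/24 is a documentation address range.

```python
import ipaddress

# Hypothetical blocklist written only with IPv4 networks (documentation range).
IPV4_BLOCKLIST = [ipaddress.ip_network("203.0.113.0/24")]

def is_blocked_naive(addr: str) -> bool:
    """Match the address against the IPv4 rules as-is."""
    ip = ipaddress.ip_address(addr)
    # An IPv6 address is never "in" an IPv4 network, so it silently passes.
    return any(ip in net for net in IPV4_BLOCKLIST)

def is_blocked_mapped(addr: str) -> bool:
    """Unwrap IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) before matching."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in net for net in IPV4_BLOCKLIST)
```

For the address ::ffff:203.0.113.7, the naive check returns False while the mapped check returns True. Native IPv6 addresses still need IPv6 rules of their own.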
At present, no single solution addresses these challenges
and manages the security and privacy of interconnected
devices. However, the following guidelines can help
overcome these difficulties. (a) First, a truly open ecosystem
with standard APIs is necessary to avoid interoperability and
reliability problems. (b) Second, devices must be well
protected while communicating with peers. (c) Third,
devices should have the best security practices built in to
protect against common security and privacy threats.
B. Data mining
Data mining methods provide efficient and best-fitting
predictive or descriptive solutions for big data that can also
be generalized for new data [79]. The evolution of big IoT
data and cloud computing platforms has brought the
challenges of data exploration and information extraction
[80]. For the overall big IoT data architecture, Figure 5
presents the primary challenges related to processing and
data mining.
Fig. 5. Big data mining issues in IoT
Exhaustive data reads/writes: The high-volume, high-
velocity, and high-variety qualities of big IoT data challenge
exploration, integration, heterogeneous communication, and
extraction processes. The size and heterogeneity of data
impose new data mining requirements, and diversity in data
sources also poses a challenge [81-83]. Furthermore,
compared with small data sets, large data sets comprise
more abnormalities and ambiguities that require additional
preprocessing steps, such as cleansing, reduction, and
transmission [23, 84]. Another issue lies in extracting
accurate and useful information from large volumes of
diverse data. Consequently, obtaining accurate information
from complex data requires analyzing data properties and
finding associations among different data points.
Researchers have introduced parallel and sequential
programming models and proposed different algorithms to
minimize query response time while dealing with big data.
Moreover, researchers have applied existing data mining
algorithms in different ways to (a) improve single-source
knowledge discovery, (b) implement data mining methods
for multi-source platforms, and (c) study and analyze
dynamic data mining methods and stream data [85]. For
example, a parallel k-means algorithm [86] and parallel
association rule mining methods [65] have been introduced.
However, algorithms compatible with the latest parallel
architectures still need to be devised. Moreover,
synchronization issues may occur in parallel computing when
information is exchanged among different data mining
methods. This bottleneck of data mining methods has become
an open issue in big IoT data analytics that should be
addressed.
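The structure that makes k-means parallelizable can be sketched in map/reduce form. Each data chunk independently produces per-cluster sums and counts (map), and these partial results are added up to recompute the centroids (reduce). This is a generic illustration, not the specific algorithm of [86]. The function names are ours, and chunks are processed with the built-in map by default.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def partial_sums(task):
    """Map step: per-cluster coordinate sums and counts for one data chunk."""
    chunk, centroids = task
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        c = nearest(p, centroids)
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return sums, counts

def parallel_kmeans(points, k, n_chunks=4, max_iter=20, seed=0, mapper=map):
    """k-means in map/reduce form; `mapper` can be a pool's map to run chunks concurrently."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    dim = len(centroids[0])
    for _ in range(max_iter):
        # Map: every chunk is summarized independently of the others.
        results = list(mapper(partial_sums, [(chunk, centroids) for chunk in chunks]))
        # Reduce: add up partial sums and counts, then recompute the centroids.
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        for sums, counts in results:
            for j in range(k):
                total_counts[j] += counts[j]
                for d in range(dim):
                    total_sums[j][d] += sums[j][d]
        new_centroids = [
            tuple(s / total_counts[j] for s in total_sums[j]) if total_counts[j] else centroids[j]
            for j in range(k)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids
```

Passing mapper=pool.map from a concurrent.futures executor runs the map step concurrently without changing the result. The synchronization point discussed above also appears here: each iteration must wait for all chunks before the centroids can be updated.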
C. Visualization
Visualization is an important component of big data
analytics, particularly for IoT systems, which generate
enormous amounts of data. Visualization reveals underlying
trends and gives a complete picture of parsed data; however,
it is difficult to conduct because of the large size and high
dimensionality of big data. Therefore, big data analytics and
visualization should work seamlessly to obtain the best
results from IoT applications in big data. Visualization of
heterogeneous and diverse data (unstructured, structured, and
semi-structured) is particularly challenging, and designing a
visualization solution that is compatible with advanced big
data indexing frameworks is difficult. Response time is also
an important factor in big IoT data analytics. Consequently,
cloud computing architectures supported with rich GUI
facilities can be deployed to obtain better insights into big
IoT data trends [87].
[Fig. 5 content: Data, Knowledge Discovery, Processing, Volume, Accessibility, Correctness, Heterogeneity, Data Source, Data Type, Complexity, Sequential, Parallel]
Different dimensionality reduction methods have been
introduced as a result of complex and high-dimensional big
IoT data [88, 89]. However, these methods are not suitable
for all types of data. When fine-grained dimensions are
visualized effectively, the probability of identifying
observable correlations, patterns, and outliers is high [90].
Moreover, because of power and bandwidth constraints, data
should be kept locally to obtain usable information
efficiently. In addition, visualization software should exploit
locality of reference to achieve efficient outcomes in an IoT
environment. Given that the amount of big IoT data is
increasing rapidly, visualization requires massive
parallelization, which is challenging. Decomposing a problem
into manageable independent tasks that allow queries to run
concurrently is therefore a challenge for parallel
visualization algorithms [91].
At present, most big data visualization tools used for IoT
exhibit poor performance in terms of functionality,
scalability, and response time. Handling uncertainty so as to
provide effective uncertainty-aware visualization during the
visual analytics process poses a considerable challenge [32].
Furthermore, several important issues are discussed in [92]:
(a) visual noise - most objects in a data set are closely related
to one another, so users may perceive different results as
being of the same type; (b) information loss - applying
reduction methods to visible data sets can cause information
loss; (c) large image observation - data visualization tools
have inherent problems with respect to aspect ratio, device
resolution, and physical perception limits; (d) frequently
changing images - users will not notice rapid data changes in
an output; and (e) high performance requirements - these are
imposed because data are generated dynamically in an IoT
environment. Moreover, methods supported by advanced
analytics enable interactive graphics on laptops, desktops, or
mobile devices, such as smartphones and tablets [93].
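The information-loss issue can be illustrated with a simple bucketed reduction of a long sensor series for plotting. The sketch below keeps both the mean and the minimum and maximum of each bucket. The mean alone hides a short spike that the min/max envelope preserves. The function name and bucket size are hypothetical.

```python
def downsample(values, bucket_size):
    """Reduce a long series for plotting. Returns (means, minmax): one average
    per bucket, and each bucket's (min, max) envelope."""
    means, minmax = [], []
    for i in range(0, len(values), bucket_size):
        bucket = values[i:i + bucket_size]
        means.append(sum(bucket) / len(bucket))
        minmax.append((min(bucket), max(bucket)))
    return means, minmax

series = [20.0] * 99 + [95.0]          # hypothetical temperature trace with one spike
means, envelope = downsample(series, 50)
```

In the example, the spike of 95 raises the second bucket's mean only to 21.5, while its maximum still shows 95.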
Real-time analytics is another consideration highlighted in
IoT architectures. Several guidelines on visualization in big
data are presented in [94]: (a) data awareness - appropriate
domain expertise; (b) data quality - cleaning data using
information management or data governance policies; (c)
meaningful results - data clustering is used to provide a
high-level abstraction that makes smaller groups of data
visible; and (d) outliers - outliers should be removed from the
data or treated as a separate entity. The authors of [95]
suggested that visualization should adhere to the following
guidelines: (a) the system should pay special attention to
metadata, (b) visualization software should be interactive
and should require maximum user involvement, and (c) tools
should be built around the dynamic nature of the generated
data.
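Treating outliers as a separate entity can be sketched with the interquartile-range (IQR) rule. The rule moves values outside [Q1 - k·IQR, Q3 + k·IQR] into their own group. We chose this rule and the conventional k = 1.5 for illustration, and it is not necessarily the test intended in [94].

```python
from statistics import quantiles

def split_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using the IQR rule."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if lo <= v <= hi]
    outliers = [v for v in values if v < lo or v > hi]
    return inliers, outliers
```

For the readings 10 to 16 plus a stray 100, the quartiles are 11.25 and 15.75. This places the upper fence at 22.5, so only 100 is moved to the outlier group.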
D. Integration
Integration refers to having a uniform view of data in
different formats. Data integration provides a single view of
the data arriving from different sources and combines the
views of the data [96]. It includes all processes involved in
collecting data from different sources, as well as storing the
data and providing them with a unified view. At every
moment, different forms of data are continuously generated
by social media, IoT, and other communication and
telecommunication approaches. The produced data can be
categorized into three groups: (a) structured data, such as
data stored in traditional database systems, including tables
with rows and columns; (b) semi-structured data, such as
HTML, XML, and JSON files; and (c) unstructured data,
such as videos, audio, and images. Good data offer good
information; however, this relationship is only achieved
through data integration [97]. Integrating diverse data types
is a complex task when merging different systems or
applications [98]. Handling overlapping data, increasing
performance and scalability, and enabling real-time data
access are among the challenges associated with data
integration that should be addressed in the future.
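The following minimal sketch shows integration into a unified view, assuming two hypothetical sources. One is a structured CSV feed, and the other is a semi-structured JSON feed that nests the device ID and reports temperature in Fahrenheit. Both are mapped to one schema, and units are harmonized. Overlapping records (same device and timestamp) are kept once, and the first source takes precedence. All field names are illustrative.

```python
import csv
import io
import json

def from_csv(text):
    """Structured source: CSV with columns id, ts, temp_c (hypothetical layout)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"device_id": row["id"], "timestamp": row["ts"],
               "temperature_c": float(row["temp_c"])}

def from_json(text):
    """Semi-structured source: list of nested objects; temperature in Fahrenheit."""
    for obj in json.loads(text):
        f = obj.get("reading", {}).get("temp_f")
        yield {"device_id": obj["device"]["id"], "timestamp": obj["time"],
               "temperature_c": None if f is None else round((f - 32) * 5 / 9, 2)}

def integrate(*sources):
    """Merge records into one view; duplicates keyed on (device_id, timestamp)
    are dropped, so earlier sources take precedence."""
    merged = {}
    for source in sources:
        for rec in source:
            merged.setdefault((rec["device_id"], rec["timestamp"]), rec)
    return [merged[key] for key in sorted(merged)]

csv_text = "id,ts,temp_c\ns1,2016-01-01T00:00,21.5\n"
json_text = ('[{"device": {"id": "s2"}, "time": "2016-01-01T00:00", "reading": {"temp_f": 68}},'
             ' {"device": {"id": "s1"}, "time": "2016-01-01T00:00", "reading": {"temp_f": 70.7}}]')
unified = integrate(from_csv(csv_text), from_json(json_text))
```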
Another challenge is to adjust structures in semi-structured
and unstructured data before integrating and analyzing these
types of data [99]. Information, such as entities and
relationships, can be extracted from textual data by using
available technologies in the areas of text mining, machine
learning, natural language processing, and information
extraction. However, new technologies should be developed
to extract information from images, videos, and other
non-text formats of unstructured data [99]. Text mining is
expected to be conducted by applying several specialized
extractors to the same text. Hence, managing and integrating
the different extraction results from a given data source
requires other techniques [100].
VII. CONCLUSION
The growth rate of data production has increased drastically
over the past years with the proliferation of smart and
sensor devices. The interaction between IoT and big data is
currently at a stage where processing, transforming, and
analyzing large amounts of data at a high frequency are
necessary. We conducted this survey in the context of big
IoT data analytics. First, we explored recent analytics
solutions. The relationship between big data analytics and
IoT was also discussed. Moreover, we proposed an
architecture for big IoT data analytics. Furthermore, big
data analytics types, methods, and technologies for big data
mining were presented. Some credible use cases were also
provided. In addition, we explored the domain by discussing
various opportunities brought about by data analytics in the
IoT paradigm. Several open research challenges were
discussed as future research directions. Finally, we
concluded that existing big IoT data analytics solutions
remained in their early stages of development. In the future,
real-time analytics solutions that can provide quick insights
will be required.
ACKNOWLEDGMENT
This work is fully funded by the Malaysian Ministry of Higher
Education under University Malaya Research Grant
(UMRG) Project/Program UM.0000168/HRU.RP.IT and
RP029D-14AET.
References
1. Tiainen, P., New opportunities in electrical
engineering as a result of the emergence of the
Internet of Things. 2016.
2. Beyer, M., Gartner Says Solving 'Big
Data' Challenge Involves More Than Just Managing
Volumes of Data. Gartner. Archived from the
original on, 2011. 10.
3. Gantz, J. and D. Reinsel, Extracting value from
chaos. IDC iview, 2011. 1142: p. 1-12.
4. Mital, R., J. Coughlin, and M. Canaday. Using Big
Data Technologies and Analytics to Predict Sensor
Anomalies. in Proceedings of the Advanced Maui
Optical and Space Surveillance Technologies
Conference, held in Wailea, Maui, Hawaii,
September 15-18, 2014, Ed.: S. Ryan, The Maui
Economic Development Board, id. 84. 2015.
5. Golchha, N., Big Data: The information revolution.
IJAR, 2015. 1(12): p. 791-794.
6. Russom, P., Big data analytics. TDWI Best
Practices Report, Fourth Quarter, 2011.
7. Tsai, C.-W., et al., Big data analytics: a survey.
Journal of Big Data, 2015. 2(1): p. 1-32.
8. Chen, M., et al., Related Technologies, in Big
Data. 2014, Springer. p. 11-18.
9. Khan, Z., A. Anjum, and S.L. Kiani. Cloud based
big data analytics for smart future cities. in
Proceedings of the 2013 IEEE/ACM 6th
international conference on utility and cloud
computing. 2013. IEEE Computer Society.
10. Russom, P., Big data analytics. TDWI Best
Practices Report, Fourth Quarter, 2011: p. 1-35.
11. LaValle, S., et al., Big data, analytics and the path
from insights to value. MIT sloan management
review, 2011. 52(2): p. 21.
12. Al Nuaimi, E., et al., Applications of big data to
smart cities. Journal of Internet Services and
Applications, 2015. 6.
13. Gubbi, J., et al., Internet of Things (IoT): A vision,
architectural elements, and future directions.
Future Generation Computer Systems, 2013. 29(7):
p. 1645-1660.
14. Ciufo, C.A. Industrial equipment talking on the
IoT? Better get a gateway (device). 2014 [cited
7-8-2016]; Available from:
http://eecatalog.com/caciufo/2014/07/15/iot-
gateway-adlink/.
15. Atzori, L., A. Iera, and G. Morabito, The internet
of things: A survey. Computer networks, 2010.
54(15): p. 2787-2805.
16. Hsieh, H.-C. and C.-H. Lai. Internet of things
architecture based on integrated plc and 3g
communication networks. in Parallel and
Distributed Systems (ICPADS), 2011 IEEE 17th
International Conference on. 2011. IEEE.
17. Kambatla, K., et al., Trends in big data analytics.
Journal of Parallel and Distributed Computing,
2014. 74(7): p. 2561-2573.
18. Manyika, J., et al., Big data: The next frontier for
innovation, competition, and productivity. 2011.
19. Hashem, I.A.T., et al., The rise of "big data" on
cloud computing: Review and open research
issues. Information Systems, 2015. 47: p. 98-115.
20. Ali, W.B., Big Data-Driven Smart Policing: Big
Data-Based Patrol Car Dispatching. Journal of
Geotechnical and Transportation Engineering,
2016. 1(2).
21. Gantz, J. and D. Reinsel, The Digital
Universe in 2020: Big Data, Bigger Digital
Shadows, and Biggest Growth in the Far East.
Study report, IDC, 2012.
22. Borkar, V., M.J. Carey, and C. Li. Inside "Big
Data management": Ogres, Onions, or Parfaits? in
Proceedings of the 15th International Conference
on Extending Database Technology (EDBT '12).
2012. p. 3-14.
23. Gani, A., et al., A survey on indexing techniques
for big data: taxonomy and performance
evaluation. Knowledge and Information Systems,
2016. 46(2): p. 241-284.
24. Paul, A., et al., Video search and indexing with
reinforcement agent for interactive multimedia
services. ACM Trans. Embed. Comput. Syst.,
2013. 12(2): p. 1-16.
25. Kwon, O., N. Lee, and B. Shin, Data quality
management, data usage experience and
acquisition intention of big data analytics.
International Journal of Information Management,
2014. 34(3): p. 387-394.
26. Oswal, S. and S. Koul. Big Data Analytic and
Visualization On Mobile Devices. in Proceedings
of National Conference on New Horizons in IT-
NCNHIT. 2013.
27. Candela, L., D. Castelli, and P. Pagano, Managing
big data through hybrid data infrastructures.
ERCIM News, 2012. 89: p. 37-38.
28. Assunção, M.D., et al., Big Data Computing and
Clouds: Challenges, Solutions, and Future
Directions. arXiv preprint arXiv:1312.4722, 2013.
29. Singh, D. and C.K. Reddy, A survey on platforms
for big data analytics. Journal of Big Data, 2014.
2(1): p. 1.
30. Siddiqa, A., et al., A Survey of Big Data
Management: Taxonomy and State-of-the-Art.
Journal of Network and Computer Applications,
2016.
31. Steed, C.A., et al., Big data visual analytics for
exploratory earth system simulation analysis.
Computers & Geosciences, 2013. 61: p. 71-82.
32. Chen, C.P. and C.-Y. Zhang, Data-intensive
applications, challenges, techniques and
technologies: A survey on Big Data. Information
Sciences, 2014. 275: p. 314-347.
33. Pfaffl, M.W., A new mathematical model for
relative quantification in real-time RT-PCR.
Nucleic acids research, 2001. 29(9): p. e45-e45.
34. Waas, F.M. Beyond conventional data
warehousing: massively parallel data processing
with Greenplum database. in International
Workshop on Business Intelligence for the Real-
Time Enterprise. 2008. Springer.
35. Färber, F., et al., SAP HANA database: data
management for modern business applications.
ACM Sigmod Record, 2012. 40(4): p. 45-51.
36. Cheng, M., et al., Mu rhythm-based cursor control:
an offline analysis. Clinical Neurophysiology,
2004. 115(4): p. 745-751.
37. Castro, M., et al., SCRIBE: A large-scale and
decentralized application-level multicast
infrastructure. IEEE Journal on Selected Areas in
communications, 2002. 20(8): p. 1489-1499.
38. Kreps, J., N. Narkhede, and J. Rao. Kafka: A
distributed messaging system for log processing. in
Proceedings of the NetDB. 2011.
39. Notsu, H., et al. Time-tunnel: Visual analysis tool
for time-series numerical data and its extension
toward parallel coordinates. in International
Conference on Computer Graphics, Imaging and
Visualization (CGIV'05). 2005. IEEE.
40. Rabkin, A. and R.H. Katz. Chukwa: A System for
Reliable Large-Scale Log Collection. in LISA.
2010.
41. Hong, S. and H. Kim. An analytical model for a
GPU architecture with memory-level and thread-
level parallelism awareness. in ACM SIGARCH
Computer Architecture News. 2009. ACM.
42. Chodorow, K., MongoDB: the definitive guide.
2013: O'Reilly Media, Inc.
43. Jourdan, Z., R.K. Rainer, and T.E. Marshall,
Business Intelligence: An Analysis of the
Literature. Information Systems Management,
2008. 25(2): p. 121-131.
44. Bifet, A., et al., Moa: Massive online analysis. The
Journal of Machine Learning Research, 2010. 11:
p. 1601-1604.
45. Mukhopadhyay, A., et al., A survey of
multiobjective evolutionary algorithms for data
mining: Part I. IEEE Transactions on Evolutionary
Computation, 2014. 18(1): p. 4-19.
46. Estivill-Castro, V., Why so many clustering
algorithms: a position paper. ACM SIGKDD
Explorations Newsletter, 2002. 4(1): p. 65-75.
47. Bielza, C. and P. Larrañaga, Discrete Bayesian
network classifiers: a survey. ACM Computing
Surveys (CSUR), 2014. 47(1): p. 5.
48. Chen, F., et al., Data mining for the internet of
things: literature review and challenges.
International Journal of Distributed Sensor
Networks, 2015. 2015: p. 12.
49. Luss, R. and A. d'Aspremont, Predicting abnormal
returns from news using text classification.
Quantitative Finance, 2015. 15(6): p. 999-1012.
50. Melin, P. and O. Castillo, A review on type-2
fuzzy logic applications in clustering, classification
and pattern recognition. Applied soft computing,
2014. 21: p. 568-577.
51. Soualhi, A., K. Medjaher, and N. Zerhouni,
Bearing health monitoring based on Hilbert-Huang
transform, support vector machine, and regression.
IEEE Transactions on Instrumentation and
Measurement, 2015. 64(1): p. 52-62.
52. Larose, D.T., k‐Nearest Neighbor Algorithm.
Discovering Knowledge in Data: An Introduction
to Data Mining, 2005: p. 90-106.
53. Su, M.-Y., Real-time anomaly detection systems
for Denial-of-Service attacks by weighted k-
nearest-neighbor classifiers. Expert Systems with
Applications, 2011. 38(4): p. 3492-3498.
54. Muja, M. and D.G. Lowe, Scalable nearest
neighbor algorithms for high dimensional data.
IEEE Transactions on Pattern Analysis and
Machine Intelligence, 2014. 36(11): p. 2227-2240.
55. Hu, C., et al., Data-driven method based on particle
swarm optimization and k-nearest neighbor
regression for estimating capacity of lithium-ion
battery. Applied Energy, 2014. 129: p. 49-55.
56. Srivastava, K., et al., Data mining using
hierarchical agglomerative clustering algorithm in
distributed cloud computing environment.
International Journal of Computer Theory and
Engineering, 2013. 5(3): p. 520.
57. Berkhin, P., A survey of clustering data mining
techniques, in Grouping multidimensional data.
2006, Springer. p. 25-71.
58. Gosain, A. and M. Bhugra. A comprehensive
survey of association rules on quantitative data in
data mining. in Information & Communication
Technologies (ICT), 2013 IEEE Conference on.
2013. IEEE.
59. Fitzwater, M., Efficient mining of maximal
sequential patterns using multiple samples. 2005.
60. Yang, Z. and M. Kitsuregawa. LAPIN-SPAM: An
improved algorithm for mining sequential pattern.
in 21st International Conference on Data
Engineering Workshops (ICDEW'05). 2005. IEEE.
61. Gandomi, A. and M. Haider, Beyond the hype: Big
data concepts, methods, and analytics. International
Journal of Information Management, 2015. 35(2):
p. 137-144.
62. Kalpakis, K., D. Gada, and V. Puttagunta. Distance
measures for effective clustering of ARIMA time-
series. in Data Mining, 2001. ICDM 2001,
Proceedings IEEE International Conference on.
2001. IEEE.
63. Kumar, N., et al. Time-series Bitmaps: a Practical
Visualization Tool for Working with Large Time
Series Databases. in SDM. 2005. SIAM.
64. Ryan, D., High performance discovery in time
series: techniques and case studies. 2013: Springer
Science & Business Media.
65. Wu, X. and S. Zhang, Synthesizing high-frequency
rules from different data sources. IEEE
Transactions on Knowledge and Data Engineering,
2003. 15(2): p. 353-367.
66. Duan, R., X. Chen, and T. Xing. A QoS
architecture for IOT. in Internet of Things
(iThings/CPSCom), 2011 International Conference
on and 4th International Conference on Cyber,
Physical and Social Computing. 2011. IEEE.
67. Zhang, Y., et al., ICN based Architecture for IoT.
IRTF contribution, October, 2013.
68. Darby, S., Smart metering: what potential for
householder engagement? Building Research &
Information, 2010. 38(5): p. 442-457.
69. Rahman, T.A. and S.K.A. Rahim. RFID vehicle
plate number (e-plate) for tracking and
management system. in Parallel and Distributed
Systems (ICPADS), 2013 International Conference
on. 2013. IEEE.
70. Sherly, J. and D. Somasundareswari, Internet
of Things Based Smart
Transportation Systems. 2015.
71. Tohamy, N., What you need to know about the
Internet of Things. MHD Supply Chain Solutions,
2015. 45(3): p. 32.
72. Pettey, C. Five Ways the Internet of Things Will
Benefit the Supply Chain. 2015 [cited 2016];
Available from:
http://www.gartner.com/smarterwithgartner/five-
ways-the-internet-of-things-will-benefit-the-
supply-chain-2/.
73. Yan, Y., et al., A survey on smart grid
communication infrastructures: Motivations,
requirements and challenges. IEEE
communications surveys & tutorials, 2013. 15(1):
p. 5-20.
74. Bera, S., S. Misra, and J.J. Rodrigues, Cloud
computing applications for smart grid: A survey.
IEEE Transactions on Parallel and Distributed
Systems, 2015. 26(5): p. 1477-1494.
75. Dethlefs, T., et al. Energy Service Description for
Capabilities of Distributed Energy Resources. in
DA-CH Conference on Energy Informatics. 2015.
Springer.
76. Neureiter, C., et al. A Standards-based Approach
for Domain Specific Modelling of Smart Grid
System Architectures. in Proceedings of the 11th
International Conference on System of Systems
Engineering (SoSE), Kongsberg, Norway. 2016.
77. Bonomi, F., et al. Fog computing and its role in the
internet of things. in Proceedings of the first
edition of the MCC workshop on Mobile cloud
computing. 2012. ACM.
78. Steinklauber, K. Data Protection in the Internet of
Things. 2014 [cited 2016 20 June]; Available
from: https://securityintelligence.com/data-
protection-in-the-internet-of-things.
79. Mukhopadhyay, A., et al., A survey of
multiobjective evolutionary algorithms for data
mining: Part I. IEEE Transactions on Evolutionary
Computation, 2014. 18(1): p. 4-19.
80. Hu, T., et al. A survey of mass data mining based
on cloud-computing. in Anti-counterfeiting,
Security, and Identification. 2012. IEEE.
81. Sun, Y., et al., Mining knowledge from
interconnected data: a heterogeneous information
network analysis approach. Proceedings of the
VLDB Endowment, 2012. 5(12): p. 2022-2023.
82. Chen, M., et al., Itinerary planning for energy-
efficient agent communications in wireless sensor
networks. IEEE Transactions on Vehicular
Technology, 2011. 60(7): p. 3290-3299.
83. Zhang, D., et al., A Taxonomy of Agent
Technologies for Ubiquitous Computing
Environments. TIIS, 2012. 6(2): p. 547-565.
84. Chen, M., V.C. Leung, and S. Mao, Directional
controlled fusion in wireless sensor networks.
Mobile Networks and Applications, 2009. 14(2): p.
220-229.
85. Wu, X., et al., Data mining with big data. IEEE
transactions on knowledge and data engineering,
2014. 26(1): p. 97-107.
86. Su, K., et al., A logical framework for identifying
quality knowledge from different data sources.
Decision Support Systems, 2006. 42(3): p. 1673-
1683.
87. Wang, L., G. Wang, and C.A. Alexander, Big data
and visualization: methods, challenges and
technology progress. Digital Technologies, 2015.
1(1): p. 33-38.
88. Azar, A.T. and A.E. Hassanien, Dimensionality
reduction of medical big data using neural-fuzzy
classifier. Soft computing, 2015. 19(4): p. 1115-
1127.
89. Popov, V.L. and M. Heß, Method of
dimensionality reduction in contact mechanics and
friction. 2015: Springer.
90. Donalek, C., et al. Immersive and collaborative
data visualization using virtual reality platforms. in
Big Data (Big Data), 2014 IEEE International
Conference on. 2014. IEEE.
91. Childs, H., et al., Research Challenges for
Visualization Software. Computer, 2013. 46(5):
p. 34-42.
92. Gorodov, E.Y. and V.V. Gubarev, Analytical
review of data visualization methods in application
to big data. Journal of Electrical and Computer
Engineering, 2013. 2013: p. 22.
93. Intel IT Center. Big Data Visualization: Turning Big
Data Into Big Insights [white paper]. 2013 [cited
2016 June 20]; Available from:
https://future.transport.nsw.gov.au/wp-
content/uploads/2016/02/big-data-visualization-
turning-big-data-into-big-insights.pdf.
94. SAS Institute Inc. Five big data challenges and how to
overcome them with visual analytics. 2013 [cited
2016 June 20]; Available from:
https://www.sas.com/resources/asset/five-big-data-
challenges-article.pdf.
95. Simon, P., The visual organization: data
visualization, Big Data, and the quest for better
decisions. 2014: John Wiley & Sons.
96. Ahamed, B.B., T. Ramkumar, and S. Hariharan.
Data Integration Progression in Large Data Source
Using Mapping Affinity. in Advanced Software
Engineering and Its Applications (ASEA), 2014
7th International Conference on. 2014.
97. Liu, J. and X. Zhang, Data integration in fuzzy
XML documents. Information Sciences, 2014. 280:
p. 82-97.
98. Ma'ayan, A., et al., Lean Big Data integration in
systems biology and systems pharmacology.
Trends in pharmacological sciences, 2014. 35(9):
p. 450-460.
99. Agrawal, D., et al., Challenges and Opportunities
with Big Data: A community white paper developed
by leading researchers across the United States.
Whitepaper, Computing Community Consortium, 2012.
100. Agrawal, R., et al., The Claremont Report on
Database Research. Communications of the ACM,
2009. 52(6): p. 56-65.