Big Data in IoT

July 2019

DOI:10.1109/ICCCNT45670.2019.8944495

Conference: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT)

Authors:

Shivanjali Khare

University of New Haven

Michael Totaro

University of Louisiana at Lafayette

Big Data in IoT

1st Shivanjali Khare

University of Louisiana at Lafayette

Center for Advanced Computer Studies

Lafayette, USA

sxk7139@louisiana.edu

2nd Michael Totaro

University of Louisiana at Lafayette

Center for Advanced Computer Studies

Lafayette, USA

mwt3774@louisiana.edu

Abstract—The Internet of Things is generating an enormous

amount of data. Analyzing and managing that data requires programming and statistical approaches. Big Data technology operates on this massive data and drives new products, applications, research, and developments that improve decision making. In this paper, we explore Big Data in IoT-driven technologies and the issue of the four V’s in Big Data. We also highlight the importance of pre-processing, meta-data, data storage formats, and data management, and show how Big Data is closely associated with IoT technologies. Today, with the rapid growth of IoT, everything is connected. To stay ahead of demand, new technologies such as Cloud Computing and Edge Computing are transforming IoT organizations. This paper discusses the layers of the IoT reference model in which edge computing operates to achieve low-latency, more efficient solutions. It also reviews the IoT reference model layers that are associated with cloud computing, the structure of cloud computing architecture, data acquisition, and data cleaning, and discusses cloud-based IoT platforms such as AWS, Google Cloud IoT, Microsoft Azure, and Cisco IoT Cloud. We examine the importance of Big Data visualization and give insights into various visualization tools and techniques. Lastly, this paper addresses significant challenges of Big Data in IoT, security issues, and future research directions.

Index Terms—IoT, Big Data, IoT security, meta-data, pre-

processing, Edge Computing, Cloud Computing, Data cleaning,

data acquisition, data visualization.

I. INTRODUCTION

The Internet of Things (IoT) is generating massive quantities of data every second. Bernard Marr in [1] describes the growth in data creation over recent years. The Internet generates a massive amount of data every day through services such as web searches and social-media platforms such as Facebook and Instagram. IoT is accelerating this growth by connecting physical devices (sensors) to the Internet, providing a variety of services to its users while collecting different kinds of data. IoT therefore involves data management and data analysis techniques, and this analysis requires a dedicated approach. Many organizations collect the data generated by IoT devices and use the resulting insights for smart decision-making. Kashmir Hill in [2] cites an example in which a US-based store, Target, was able to detect that customers were pregnant by analyzing their credit card purchases and routine shopping against historical data, and to advertise to them accordingly.

IoT has many applications, for example in healthcare, manufacturing, industrial IoT, smart homes, and smart cities. IoT devices require the right kind of sensors to be deployed in the right areas to capture the data. The collected data varies with the service provided by the IoT device. IoT sensors have a few restrictions, such as environmental sensitivity and distance limitations. IoT sensors gather information from the environment and forward it to a central node, where data analysis takes place and from which the information is forwarded to another node. Consider a smart home that contains multiple IoT devices such as thermostats, smart lighting systems, smart door locks, smart gardening systems, and personal assistants. Across the entire house, groups of nodes pass information to the main server, which stores this information or communicates it to the cloud. The user should be aware of the restrictions imposed by the sensors, which affect the data analysis, in order to avoid inaccurate or bad data.

IoT devices collect data continuously, and gathering this data leads to remarkable observations. Big Data techniques analyze these data sets and extract meaningful information from the collected data. Various online sources provide open-access data collections [3] [4].

The objective of this paper is to highlight the association between Big Data and IoT and to describe how the data collected by IoT devices is processed and analyzed. This paper discusses big data management techniques at various stages, such as collection, processing, and analysis. It also provides a survey of existing IoT-related technologies such as cloud and edge computing. Finally, the paper covers several aspects that are not addressed in current survey papers, along with some new challenges and directions for future research.

II. RELATED WORK

A review of the IoT literature suggests that there is considerable interest in the field of IoT systems [5] [6] [7] [8]. These studies, however, center their research entirely on architecture, applications, and investments. Yunhao et al. in [9] reviewed the state of the art of big data: they introduced the general background and examined several applications and related technologies. Archenaa et al. in [10] focus on the importance of performing big data analysis on the data collected by the healthcare and government sectors. In contrast, our work focuses on the techniques and mechanisms applied to data collected by IoT devices and establishes a correlation between them. Table I outlines the topics covered in this paper.

IEEE - 45670

10th ICCCNT 2019

July 6-8, 2019, IIT - Kanpu

Kanpur, India

TABLE I

STRUCTURE OF THE PAPER

III. BIG DATA

Gathering such massive data involves storing the data generated by multiple technology nodes. IoT networks operate depending on the analysis of this data. The network generates different data types, noise, and some redundant data. Since IoT sensors and devices degrade over time, Big Data organizations must reduce the risk of errors in order to maintain accurate decision making. Wetzkar et al. in [11] show various examples from Industrial IoT (IIoT) in which they faced difficulties in identifying, analyzing, and troubleshooting failures. There is a need for automatic data collection and automatic error correction. Traditionally, big data involves four dimensions, also known as the four V’s. They are:

1) Volume: amount of data

2) Variety: different types of structured and unstructured data

3) Velocity: speed at which data is produced and processed

4) Veracity: truthfulness of the data

Some researchers list these issues of Big Data as three V’s by removing Veracity, or extend the list with further issues such as Value and Validity. Ishwarappa and Anuradha in [12] considered five V’s, whereas Khan in [13] considered ten V’s as Big Data issues.

A. Volume

IoT devices store massive amounts of data such as employee records, stock information, invoices, purchase history, and card details, along with additional information such as location details. Such additional information is called meta-data, and it helps to contextualize the knowledge. The majority of large organizations invest in cutting-edge databases, data management firms, distributed systems, and cloud storage for storing digital information. The quantity of data generated and collected by IoT devices matters because all of the data needs to be measured, stored, or transmitted to other nodes. This has become a challenge: the amount of data is now so large that traditional database technology is no longer adequate.

B. Variety

Big data involves gathering target data from a wide range of sources simultaneously. IoT data includes data from different kinds of sensors as well as non-numerical items such as mp3 and mp4 files and radio signals. Handling this variety of data is a challenge. The meta-data should be stored in the correct context with the collected data and should allow future data collections to be associated automatically. Another issue, given the current state of the art of IoT and the pace of change in techniques, is the ability of storage software to adapt to these changes, for example a change in the video quality or format produced by sensors.

C. Velocity

The data produced by sensors or other inputs in IoT devices arrives at an extremely high rate. This high velocity of data production and collection is challenging because the data must be handled promptly to make room for new data. Moreover, the velocity of data production is not constant. It changes over time; for example, a company’s sales increase during a promotional offer period. Gandomi and Haider in [14] discuss the importance of time here. In such situations, appropriate planning, processing power, and storage are needed to avoid data loss and system outages. Although such a commitment of computing power may be expensive, it should be planned ahead of time to increase the revenue of an organization.
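The effect of a changing arrival rate can be illustrated with a small simulation. The sketch below is an illustration only, not a method from the cited work: the function name `simulate_buffer`, the tick-based model, and all numbers are assumptions chosen to show how a burst that exceeds processing capacity leads to data loss unless buffer space is planned for.

```python
from collections import deque
from typing import Deque, List, Tuple


def simulate_buffer(arrivals: List[int], capacity: int, per_tick: int) -> Tuple[int, int]:
    """Return (processed, dropped) message counts for a bounded buffer.

    arrivals[i] is the number of messages arriving in tick i; per_tick is
    how many messages the consumer can process in one tick.
    """
    buffer: Deque[int] = deque()
    processed = dropped = 0
    for tick, count in enumerate(arrivals):
        for _ in range(count):
            if len(buffer) < capacity:
                buffer.append(tick)
            else:
                dropped += 1  # buffer full: the new message is lost
        for _ in range(min(per_tick, len(buffer))):
            buffer.popleft()
            processed += 1
    return processed, dropped


# Hypothetical load: a steady trickle with one burst in the third tick.
burst = [2, 2, 10, 2, 2]
print(simulate_buffer(burst, capacity=5, per_tick=3))
print(simulate_buffer(burst, capacity=12, per_tick=3))
```

With these made-up numbers, the burst overflows the smaller buffer and messages are dropped, while the larger buffer absorbs the burst without loss.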

D. Veracity

IoT sensor measurements leave little margin for error, yet wireless sensors can suffer communication errors and hardware failures caused by shifts in the environment, animals, or other factors. As such, it is essential that data is properly stored, accurate, and complete. The truthfulness of data forms the basis of many business decisions, so it is necessary to differentiate between reliable and unreliable data.
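One simple way to separate reliable from unreliable readings is a plausibility check. The following is a minimal sketch under stated assumptions: the `Reading` type, the `split_by_reliability` helper, the use of `None` for a lost sample, and the temperature range are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    sensor_id: str
    value: Optional[float]  # None models a dropped or corrupted sample


def split_by_reliability(
    readings: List[Reading], low: float, high: float
) -> Tuple[List[Reading], List[Reading]]:
    """Separate readings into (reliable, unreliable) using a plausibility range."""
    reliable, unreliable = [], []
    for r in readings:
        if r.value is not None and low <= r.value <= high:
            reliable.append(r)
        else:
            unreliable.append(r)
    return reliable, unreliable


# Hypothetical indoor temperature samples in degrees Celsius.
samples = [
    Reading("t1", 21.5),
    Reading("t1", None),   # communication error
    Reading("t2", 180.0),  # implausible value, e.g. a hardware fault
    Reading("t2", 22.1),
]
good, bad = split_by_reliability(samples, low=-10.0, high=50.0)
print(len(good), "reliable,", len(bad), "unreliable")
```

A range check like this catches only gross errors; a real deployment would likely combine it with domain-specific rules.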

IV. INTELLIGENT DATA PROCESSING

One common solution to the problems encountered during data collection and the use of big data in IoT is the intelligent use of software. Two general approaches to intelligent data processing are pre-processing and meta-data creation.

A. Pre-processing

The data collected by IoT sensors is often sent to a different location and processed there. The large amount of data produced needs to reach the processing location quickly, and data can be lost entirely or in part if there is latency. Baker et al. in [15] discuss medical emergency situations in which such communication delays can have detrimental effects on patients. In many situations, only the data regarding a particular event is required for further processing. Pre-processing helps to reduce the volume of data: it moves the processing function closer to the sensors and reduces the amount of data to be sent. Smart sensors in IoT use built-in resources to pre-process data before forwarding it. Antonini et al. in [16] present a design framework for smart audio sensors. These smart sensors perform computations locally on raw audio streams and then transmit the extracted features wirelessly to an IoT gateway.
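The idea of reducing volume close to the sensor can be sketched as follows. This is not the framework of [16]; it is a minimal illustration that assumes fixed-size windows and a handful of summary statistics as the "features", and the names `summarize_window` and `preprocess` are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def summarize_window(samples: List[float]) -> Dict[str, float]:
    """Reduce a window of raw samples to a few summary features."""
    if not samples:
        raise ValueError("window must contain at least one sample")
    return {
        "count": float(len(samples)),
        "min": min(samples),
        "max": max(samples),
        "mean": mean(samples),
    }


def preprocess(stream: List[float], window: int) -> List[Dict[str, float]]:
    """Split a raw stream into fixed-size windows and summarize each one."""
    return [
        summarize_window(stream[i:i + window])
        for i in range(0, len(stream), window)
    ]


# Hypothetical raw samples; only the summaries would be transmitted.
raw = [20.0, 20.5, 21.0, 21.5, 30.0, 30.5, 31.0, 31.5]
features = preprocess(raw, window=4)
print(len(raw), "raw samples ->", len(features), "summaries")
```

The window size trades bandwidth against detail: larger windows send less data but hide short-lived events.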

B. Meta-data creation

After processing, data is stored so that it can be used again. Meta-data is used to put the stored data into context. When needed, the stored data is queried for information. Given the variety and volume of data, it can take a considerable amount of resources to process that data again. Meta-data speeds up this process by adding data that describes or references the stored data. Park et al. in [17] proposed a conceptual meta-data model for sensor data abstraction in IoT environments. This model helps to create a structured format for the low-level context and supports higher-level abstraction procedures. Dawes et al. in [18] describe a deployable system intended to bridge gaps in data management. They propose a tiered meta-data recording system that uses a non-semantic and a semantic wiki related to a single sensor. Stevens in [19] discusses the importance of meta-data in big data analytics.
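To make the role of meta-data concrete, the sketch below answers a query from a small meta-data index instead of scanning stored readings. It does not reproduce the models in [17] or [18]; the `SensorMetadata` fields and the `find_sensors` helper are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class SensorMetadata:
    sensor_id: str
    quantity: str          # what is measured, e.g. "temperature"
    unit: str              # unit of the stored values
    location: str          # where the sensor is deployed
    sample_rate_hz: float  # how often the sensor reports


def find_sensors(index: Dict[str, SensorMetadata], quantity: str, location: str) -> List[str]:
    """Answer a query from meta-data alone, without touching stored readings."""
    return sorted(
        sid for sid, md in index.items()
        if md.quantity == quantity and md.location == location
    )


# Hypothetical meta-data index for a small smart home.
index = {
    "s1": SensorMetadata("s1", "temperature", "celsius", "kitchen", 1.0),
    "s2": SensorMetadata("s2", "humidity", "percent", "kitchen", 0.5),
    "s3": SensorMetadata("s3", "temperature", "celsius", "garage", 1.0),
}
print(find_sensors(index, "temperature", "kitchen"))
print(json.dumps(asdict(index["s1"])))
```

Only the readings of the matching sensors then need to be loaded, which is where the saving comes from.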

V. DATA STORAGE FORMATS AND DATABASES

Relational databases are used to store data in traditional technical environments. They are used extensively and dominate most commercial data storage. The characteristics of IoT data, however, make traditional relational data management impractical: using a relational database can make querying slow overall and might result in delayed responses.

A. Structured databases

An IoT program can be made more flexible by imposing fewer restrictions, but this often makes the system less efficient. It is necessary to consider such trade-offs while developing an IoT system. The relationships among data elements establish the structure of a database, which makes it efficient for storage and querying. A structured database, however, lacks the flexibility that modern software methodologies demand.

IoT devices have advanced technically and are able to communicate with almost any “thing.” This requires expanding the network to accommodate more devices and their software, which is known as horizontal scalability. With a relational database, it becomes difficult to spread the data across multiple clusters of machines. Sarkar et al. in [20] proposed an architecture to tackle the issue of scalability.

B. Unstructured data storage

Modern data has made relational data management less efficient. Unstructured (also referred to as document store) and semi-structured databases have been developed to meet the needs of the different types of data collected by IoT devices. Kumar in [21] discusses various techniques for maintaining unstructured data in IoT. According to Alnsari et al. in [22], massive developments in information technology have created a need for solutions that enable unstructured data management and analysis.

A new range of NoSQL databases, such as MongoDB, is becoming more significant in IoT development. These unstructured database platforms have proven effective in many IoT applications. NoSQL databases are non-relational and can efficiently store key-value pairs, wide columns, search-engine data, and so on, which makes them well suited to big data use and in particular to IoT device development. Serdar in [23] discusses NoSQL in detail and outlines advantages such as flexibility and horizontal scalability.
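The flexibility of a document-style store can be illustrated without any database at all. The sketch below uses plain Python dictionaries as stand-ins for documents; it does not use the MongoDB API, and the `find` helper and the device records are hypothetical.

```python
import json
from typing import Any, Dict, List

# Each document is self-describing; fields differ from device to device.
documents: List[Dict[str, Any]] = [
    {"device": "thermostat-1", "type": "temperature", "value": 21.5, "unit": "celsius"},
    {"device": "lock-1", "type": "door_event", "state": "locked", "user": "alice"},
    {"device": "cam-1", "type": "video", "format": "mp4", "resolution": [1920, 1080]},
]


def find(docs: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return documents whose fields match every given key-value pair."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# A new device with a previously unseen field needs no schema change.
documents.append({"device": "cam-2", "type": "video", "format": "mp4", "fps": 30})

videos = find(documents, type="video", format="mp4")
print([d["device"] for d in videos])
# Documents serialize naturally to JSON-like formats.
print(json.dumps(documents[1]))
```

In a relational table, the same change would typically require altering the schema or adding sparse columns, which is the rigidity described above.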

VI. DATA MANAGEMENT

Collecting and utilizing data can be useful, but it also carries many risks and responsibilities. There are legal and ethical issues involved in collecting data without consent. Poorly protected data can also lead to data breaches, which damage individuals’ privacy. Guan et al. in [24] discuss how hackers can access IoT data through multiple sources and use it for illegal gain.

A. IoT device security

IoT devices that are accessible via the network should require some form of credentials to connect. Unfortunately, this is often not the case. Many IoT devices are shipped without any authentication or with default credentials that are highly insecure. In many cases, even devices that come with complex authentication details do not include instructions for changing the credentials, which leaves them vulnerable to attack once the credentials become known. IoT devices have thus become ideal targets for hackers. As the number of IoT devices grows, so does the risk that attackers will enlist them in a botnet. Compromised IoT devices can be used for purposes such as distributed denial of service (DDoS) attacks. This reduces the performance of the device and can lead to the network being blacklisted for hosting malicious attacks.

Cluley in [25] describes the vulnerability of IoT devices to the Mirai botnet and stresses the importance of changing the default password of one’s IoT devices. Greene in [26] points to a huge DDoS attack involving a botnet and IoT devices such as cameras, lightbulbs, and thermostats. The use of default or weak passwords makes IoT devices more susceptible to such attacks.

VII. DATA AT THE EDGE

A. IoT reference model

According to Cisco’s IoT reference model in Figure 1 [27], data is in motion in the lower layers of IoT. This dynamic data comes from the sensors, and there is a continuous communication of messages to actuators. Recent advancements in IoT architecture have added more processing near the edge of the IoT network. Edge computing pushes intelligent processing capabilities closer to the network edge, which adds flexibility and makes the system much more responsive. There is a slight difference between edge and fog computing. Fog computing pushes the intelligence to the fog node, which resides in the local area network, close to the data. At this node, some of the information might be transmitted to the cloud. The edge node, in contrast, pushes the data directly to the “thing.” In some cases, the key data is transmitted to the cloud for further analysis.

Fig. 1. Internet of Things (IoT) reference model [27]

B. Data acquisition

The sensors in Level 1 of the IoT Reference Model [27] are the key sources of data in the IoT system. The sensors or “things” (such as computers) are connected to the Internet. IoT gateways provide an access route to the Internet for devices without an IP address (such as lights, locks, and gates). A gateway bridges sensors, actuators, and the Internet or an intranet using different communication technologies. These technologies differ in connectivity types, interfaces, and protocols. For example, IoT devices commonly use technologies such as Bluetooth LE, ZigBee, and Z-Wave. Given the volume of data collected by sensors, data filtration reduces the amount of data that is forwarded to the back end for further processing or analysis. Edge computing also helps to secure the IoT gateway.
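Data filtration at a gateway can take many forms. One common and simple form, shown here purely as an assumed example, is a deadband filter that forwards a reading only when it differs enough from the last forwarded value. The function name `deadband_filter` and the threshold are hypothetical.

```python
from typing import Iterable, List, Optional


def deadband_filter(values: Iterable[float], threshold: float) -> List[float]:
    """Forward a value only if it differs from the last forwarded value
    by at least `threshold`; the first value is always forwarded."""
    forwarded: List[float] = []
    last: Optional[float] = None
    for v in values:
        if last is None or abs(v - last) >= threshold:
            forwarded.append(v)
            last = v
    return forwarded


# Hypothetical temperature readings arriving at the gateway.
readings = [20.0, 20.1, 20.2, 21.0, 21.1, 25.0, 25.0, 24.9]
sent = deadband_filter(readings, threshold=0.5)
print(len(readings), "readings received,", len(sent), "forwarded:", sent)
```

The threshold decides how much detail the back end loses, so it should be chosen with the application's accuracy needs in mind.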

A basic IoT architecture connects devices directly to the cloud for processing and analysis. In Figure 2, all data from the sensor is sent to the cloud, which creates unnecessary traffic and security risk. Waiting for messages to travel to and from the cloud increases latency, which might affect real-time responses and may not be acceptable in emergency situations. Storing and processing all of the data in the cloud also requires resources, which is expensive. From Figure 2, we can estimate the total latency as the sum of the latencies of each part of the network:

Latency = T1 + T2 + T3 + T4

With a gateway, T2, T3, and T4 are replaced by much faster interactions, and data is transmitted to the cloud only when needed. Reducing the transmitted data in this way requires considering the rate and type of data. Sensors in the IoT system collect a huge amount and variety of data, so the combination of the four V’s must be considered when deciding what to transmit. Cisco in [28] describes how devices send the right data to the cloud for big data analytics and storage.
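The latency estimate above can be worked through with numbers. The values below are invented for illustration and do not come from the paper or from Figure 2; in particular, modeling the gateway as a single local step `t_local` that stands in for T2, T3, and T4 is an assumption.

```python
from typing import Sequence


def total_latency(segments_ms: Sequence[float]) -> float:
    """Total response latency as the sum of per-segment delays (milliseconds)."""
    return sum(segments_ms)


# Hypothetical per-segment delays T1..T4 in milliseconds (not from the paper).
t1, t2, t3, t4 = 5.0, 40.0, 60.0, 40.0
cloud_path = total_latency([t1, t2, t3, t4])

# Assumption: with a gateway, T2..T4 are replaced by one faster local step.
t_local = 10.0
gateway_path = total_latency([t1, t_local])

print(f"cloud path: {cloud_path} ms, gateway path: {gateway_path} ms")
```

Under these assumed values the gateway path is much shorter, but the actual gain depends on the network and on how often data must still travel to the cloud.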

VIII. DATA IN THE CLOUD

A. IoT reference model

In the IoT reference model in Figure 1, the accumulated data is abstracted for analysis at level 5, which involves running queries on the data sets. Prior to the analysis, the data is first cleaned using techniques such as normalization and standardization, and it is then made available to level 6. Here, the software applications of IoT devices provide back-end support for users, generating business intelligence reports, analytics for decision-making, system management, and other functions used to control the IoT system. Level 7 involves collaboration and processes that extend beyond the IoT network and application.

Fig. 2. Total time for an IoT response

B. Data cleaning

Before the data collected by the sensors is ready for analysis, the raw data must be cleaned to make it technically correct and consistent. Cleaning should be done systematically and be well documented to allow reproducibility and possible automation. Jonge et al. in [29] explain the steps involved in improving and refining data. The collected data comes with some identification. To reach technical correctness, this raw data is encoded, decoded, converted, stripped, tagged, and combined with meta-data. After this processing, the data may still be inconsistent or contain unexpected values. Resolving these remaining errors requires domain knowledge of the IoT device. This processing is required before analysis in level 5.
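Two of the cleaning techniques named in this section, normalization and standardization, can be sketched in a few lines. This is a minimal illustration rather than the procedure of [29]; representing missing samples as `None` and the helper names are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Optional


def drop_missing(values: List[Optional[float]]) -> List[float]:
    """Remove missing samples (represented here as None)."""
    return [v for v in values if v is not None]


def normalize(values: List[float]) -> List[float]:
    """Min-max normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values: List[float]) -> List[float]:
    """Z-score standardization: zero mean, unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw readings with two missing samples.
raw = [10.0, None, 20.0, 30.0, None, 40.0]
clean = drop_missing(raw)
print(normalize(clean))
print(standardize(clean))
```

Keeping each step as a separate, documented function supports the reproducibility and automation that the section calls for.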

C. Why is data stored in cloud?

Cloud service models such as Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) allow organizations to avoid the need for in-house equipment, power, networking, and IT support. The cloud, as a part of the Internet, can be accessed from anywhere and can shrink and grow according to consumer demand. Clouds can be public or private. Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform are public clouds, whereas private clouds sit within the security firewalls of an organization.

IoT devices have relatively little storage and processing power. The big data generated by IoT devices is stored, aggregated, processed, and analyzed in the cloud. Moving the data to the cloud gives access to seemingly “infinite” processing and storage capabilities. Underneath the cloud are data centers with numerous servers, or host computers. Each host computer runs multiple Virtual Machine (VM) instances as applications on the actual hardware, each appearing as a separate machine. The specification of these instances is taken into account in billing, so the organization pays for the additional resources it uses. VMs are an example of IaaS, whereas web hosting, blog hosting, and IoT platforms are examples of SaaS, which is more expensive than basic IaaS.

D. Cloud architecture

Figure 3 shows the IBM cloud reference architecture [30]. Cloud services such as IaaS, PaaS, and SaaS are at the top left, while the physical infrastructure is in the lower section. Users interact with the cloud services through consumer tools and in-house IT. The service creation tools help sustain cloud resources along with important non-functional aspects such as security, performance, resilience, consumability, compliance, and overall governance. Cloud architecture can be virtualized across many data centers.

Fig. 3. IBM Cloud reference architecture [30]

E. Cloud based IoT platforms

IoT platforms include a dashboard to display and control devices. Additional features such as data collection, data management, testing, software updates, and inventory management are also common.

Amazon Web Services (AWS) [31] offers an IoT platform that includes a wide range of tools and services to deploy, set up, and manage IoT solutions. It consists of four main products:

• AWS IoT Core - the base on which to build an IoT application

• AWS IoT Device Management - allows easy addition and organization of devices

• AWS IoT Analytics - provides a service for automated analytics of massive amounts of varied IoT data, including different data types

• AWS IoT Device Defender - supports the security mechanisms of IoT systems

The AWS platform provides a scalable and secure environment for IoT systems.

Google Cloud IoT [32] builds and manages IoT systems of any size and complexity. This cloud service includes:

• Cloud IoT Core - connects various devices and collects their data

• Cloud Pub/Sub - provides real-time stream analytics and processes event data

• Cloud Machine Learning Engine (ML) - allows ML models to be built using data received from IoT devices

Google Cloud IoT includes a number of services that can be useful for building a comprehensive connected network.

Microsoft Azure IoT Suite [33] provides security mechanisms, easy integration, and scalability. The Suite can connect to many devices from different manufacturers, collect data for analytics, and use the IoT data for machine learning purposes. It also offers preconfigured and customizable solutions to match the requirements of a project.

Cisco IoT Cloud Connect [34] provides a convenient end-to-end platform for mobile, cloud-based IoT solutions. This service supports data and voice communication, customization of IoT applications, and various monetization opportunities. It offers a complete package of monitoring functions, device management, advanced security measures, and scalability. With the growth of IoT devices, Cisco developed the Kinetic platform, which supports edge and fog computing. The Kinetic platform manages IoT devices and gateways by supporting data reduction, event processing, response, and data transfer to the cloud.

IX. IOT AND BIG DATA VISUALIZATION

After collection and analysis, the big data generated by IoT devices has to be represented visually so that humans can understand the analyses intuitively. Visualization often yields additional insights or interpretations from a data set, providing more meaningful information. Presenting convincing graphics of the data also helps to communicate the results to a wider audience. Many algorithms and statistical methods are applied to large-scale, high-dimensional, varied data to help visualize those data sets. The relationships between geometric objects within a data set are established using various parameters. Data visualization has therefore become an important strategy for many business organizations seeking to increase revenue by improving decision making. Several very powerful data visualization tools and techniques have been developed for IoT applications.

A. Data Visualization Techniques

Techniques such as simple plots, charts, maps, line or bar graphs, diagrams, and matrices can be a very powerful way of highlighting inconsistencies in a data set. They make complex tables and numerical summaries easier to interpret and the results easier to understand. Several techniques, such as matrix methods in data mining, aggregation of attributes, and dimensionality reduction [35], [36], are widely used. Big data visualization cannot be approached using conventional techniques alone. Wang et al. in [37] propose a method called Discriminative Generalized Eigendecomposition (DGE), based on the separation of multi-dimensional features, that could be useful in finding better discriminant vectors. This method deals with both Gaussian and non-Gaussian distributions. Zhong et al. in [38] proposed an RFID-Cuboid model that visualizes real-time big data from the cloud. This model can be used by end users for their daily operations in a practical and feasible way.
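Aggregation is one of the simpler ways to make a large data set plottable. The sketch below is an assumed example, not a technique from [35] to [38]: it averages timestamped readings into fixed-width time buckets so that a chart shows one point per bucket. The function name `downsample` and the sample data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def downsample(points: List[Tuple[float, float]], bucket: float) -> List[Tuple[float, float]]:
    """Aggregate (timestamp, value) points into fixed-width time buckets.

    Returns one (bucket_start, mean_value) pair per non-empty bucket,
    ordered by time.
    """
    buckets: Dict[float, List[float]] = {}
    for t, v in points:
        start = (t // bucket) * bucket
        buckets.setdefault(start, []).append(v)
    return [(start, mean(vs)) for start, vs in sorted(buckets.items())]


# Hypothetical readings: (seconds since start, value).
points = [(0, 1.0), (10, 3.0), (59, 2.0), (60, 10.0), (130, 4.0), (150, 6.0)]
for start, avg in downsample(points, bucket=60):
    print(start, avg)
```

The resulting pairs can be passed to any plotting tool; the bucket width controls how much detail the chart retains.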

B. Data visualization Tools

It is important to choose an appropriate visualization tool in order to utilize the full potential of the collected data. Before exploring different visualization platforms, an organization should identify its end goals and purpose, keep its target audience in mind, and concentrate on how to make the content more appealing. Sophisticated tools such as Plotly [39] and Sisense [40] provide some level of data analytics along with data visualization. Plotly enables the user to build charts using the R or Python programming languages. It supports building custom web applications in Python and provides access to open-source libraries for R, Python, and JavaScript. Sisense is a cloud-based platform with an easy-to-use drag-and-drop interface. It supports natural language queries and can handle multiple data sources. Tableau [41] is a leading data visualization tool with an easy interface and interactive visualizations. Many large organizations rely on Tableau to derive meaning from their collected data. It has features such as automatic updates, quick sharing, and smart dashboards. Several other tools can deal with massive and complex IoT data. Microsoft Azure and Power BI [42] are tools that can handle real-time data of widely varying volume and type. They provide broad integration capabilities, a reasonable learning curve, and a drag-and-drop interface. Kibana [43], part of the ELK stack, is another tool that provides advanced analytics, such as exploring correlations between different observations and machine learning features to identify relationships between data events. Grafana [44] provides services to query, visualize, and create alerts and notifications, along with several other capabilities.

X. CHALLENGES AND FUTURE RESEARCH

With the rate of data growth and the expansion of IoT networks, it is important to have accurate data about the environment. Organizations should acquire a specific skill set to deal with the analysis of big data. The data collected by organizations should be well structured and made compatible for use. To meet the demand for accurate data, it is necessary to connect a wide range of devices at any point and at any time. There is therefore a need for investment in sensors, data security, and analytical capability to meet supply chain demands. The collection, processing, analysis, and visualization of data sets is a challenging task. Analysis based on specific data formats can limit the usefulness of the results. It is important to have full knowledge of the IoT domain in order to decide on the structure and format of the data collected by the sensors. Lack of this knowledge might result in dirty or garbage data, which can be costly. The four V’s also pose a challenge when dealing with big data in IoT.

Nerkar et al. in [45] discuss data isolation in cloud computing as another challenge. Common resources shared in a cloud platform may cause inconsistency and latency in data content. Erway et al. in [46] describe the challenge of efficiently proving the integrity of data stored on dishonest cloud servers. Patil et al. in [47] address security and privacy challenges as applied to the healthcare industry. As IoT devices collect and analyze data in a decentralized model, performing exhaustive analysis operations while preserving privacy might be a challenge.

Even though current technologies have achieved great results, much remains to be done to address security and privacy concerns for the data collected by IoT devices. The communication overheads between IoT devices that lead to latency must be reduced to achieve efficient results. As the volume of data grows, storage overhead on the servers grows as well. Consumers who use IoT devices for personal use might lack the technical knowledge required to understand or handle the software requirements of the device. Some IoT devices and their software do not give users accurate enough information to make informed decisions about consent. IoT software for personal use should be user-friendly and should always request the user’s consent before sharing data or making any decision.

XI. CONCLUSION

IoT has transformed many domains such as healthcare, infrastructure, manufacturing, retail, and personal use. As the data collected by IoT devices grew, it became necessary to analyze this Big Data. Big Data has recently become more prominent in IT, where it helps in product optimization, improves decision making, and saves energy. As a result, Big Data has contributed substantially to IoT technology. Given the huge amount of complex data produced by IoT devices, the analysis and visualization of that data have helped organizations meet demand and gain real-time business insights. In addition, edge computing and cloud computing play highly important roles in aggregating large amounts of data and managing big data from anywhere in the world.

This paper restricts itself to big data techniques in IoT,
but these techniques themselves remain promising for future
research. In this paper, we discussed the issue of the four V's in Big
Data and how they relate to IoT. We discussed various
data structures and data management approaches that should
be used when managing Big Data in IoT. We discussed which
layers of the IoT reference model operate with respect to
existing and developing technologies such as edge and cloud
computing. We also presented various cloud-based IoT platforms,
their key features, and how they help organizations
handle massive and complicated big data. We discussed how
data visualization helps to interpret the meaning
of data, along with several visualization tools and techniques.
Lastly, we presented several challenges and directions for future
research.

IEEE - 45670

10th ICCCNT 2019

July 6-8, 2019, IIT - Kanpur

Kanpur, India

REFERENCES

[1] Bernard Marr. How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read. Forbes, October 06 2017. [Accessed on: 06/01/2019].

[2] Kashmir Hill. How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did. The Not-So Private Parts: where technology and privacy collide, Feb 16 2012. [Accessed on: 06/01/2019].

[3] DATA.GOV. The home of the U.S. Government's open data. [Accessed on: 06/01/2019].

[4] Amazon Web Services. Registry of Open Data on AWS. [Accessed on: 06/01/2019].

[5] Jayavardhana Gubbi, Rajkumar Buyya, Slaven Marusic, and Marimuthu Palaniswami. Internet of Things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7):1645–1660, 2013. [Accessed on: 06/01/2019].

[6] In Lee and Kyoochun Lee. The Internet of Things (IoT): Applications, investments, and challenges for enterprises. volume 58, pages 431–440, 2015. [Accessed on: 06/01/2019].

[7] H. Arasteh, V. Hosseinnezhad, V. Loia, A. Tommasetti, O. Troisi, M. Shafie-khah, and P. Siano. IoT-based smart cities: A survey. pages 1–6, June 2016. [Accessed on: 06/01/2019].

[8] Somayya Madakam, R Ramaswamy, and Siddharth Tripathi. Internet of Things (IoT): A literature review. Journal of Computer and Communications, 3(05):164, 2015. [Accessed on: 06/01/2019].

[9] Min Chen, Shiwen Mao, and Yunhao Liu. Big Data: A Survey. Mobile Networks and Applications, 19(2):171–209, Apr 2014. [Accessed on: 06/01/2019].

[10] J. Archenaa and E.A. Mary Anita. A Survey of Big Data Analytics in Healthcare and Government. Procedia Computer Science, 50:408–413, 2015. [Accessed on: 06/01/2019].

[11] Ulf Wetzker, Ingmar Splitt, Marco Zimmerling, Kay Römer, and Carlo Alberto Boano. Troubleshooting Wireless Coexistence Problems in the Industrial Internet of Things. 08 2016. [Accessed on: 06/01/2019].

[12] Ishwarappa and J. Anuradha. A Brief Introduction on Big Data 5Vs Characteristics and Hadoop Technology, volume 48. 2015. International Conference on Computer, Communication and Convergence (ICCC 2015). [Accessed on: 06/01/2019].

[13] Nawsher Khan, Mohammed Alsaqer, Habib Shah, Gran Badsha, Aftab Ahmad Abbasi, and Solmaz Salehian. The 10 Vs, Issues and Challenges of Big Data. pages 52–56, 03 2018. [Accessed on: 06/01/2019].

[14] Amir Gandomi and Murtaza Haider. Beyond the hype: Big data concepts, methods, and analytics, volume 35. 2015. [Accessed on: 06/01/2019].

[15] S. B. Baker, W. Xiang, and I. Atkinson. Internet of Things for Smart Healthcare: Technologies, Challenges, and Opportunities, volume 5. 2017. [Accessed on: 06/01/2019].

[16] Mattia Antonini, Massimo Vecchio, Fabio Antonelli, Pietro Ducange, and Charith Perera. Smart Audio Sensors in the Internet of Things Edge for Anomaly Detection, volume PP. 10 2018. [Accessed on: 06/01/2019].

[17] Yoosang Park, Jongsun Choi, and Jaeyoung Choi. Conceptual metadata model for sensor data abstraction in IoT environments, volume 383. 07 2018. [Accessed on: 06/01/2019].

[18] Nicholas Dawes, K Ashwin Kumar, Sebastian Michel, Karl Aberer, and Michael Lehning. Sensor Metadata Management and Its Application in Collaborative Environmental Research. pages 143–150, 01 2009. [Accessed on: 06/01/2019].

[19] John P. Stevens. Why you need metadata for Big Data success. April 6 2016. [Accessed on: 06/01/2019].

[20] C. Sarkar, S. N. A. U. Nambi, R. V. Prasad, and A. Rahim. A scalable distributed architecture towards unifying IoT applications. In 2014 IEEE World Forum on Internet of Things (WF-IoT), pages 508–513, March 2014. [Accessed on: 06/01/2019].

[21] Sunil Kumar Mishra M. Handling the Unstructured Data in IOT. In International Science Press, pages 377–384, March 2016. [Accessed on: 06/01/2019].

[22] Zainab Alansari, Safeeullah Soomro, Mohammad Riyaz Belgaum, and Shahaboddin Shamshirband. A New Conceptual Model for BYOD Organizational Adoption. In Asian Journal of Scientific Research, volume 10, pages 400–405, 2017. [Accessed on: 06/01/2019].

[23] Serdar Yegulalp. What is NoSQL? NoSQL databases explained. In InfoWorld, From IDG, volume 10, Dec 7 2017. [Accessed on: 06/01/2019].

[24] Le Guan, Jun Xu, Shuai Wang, Xinyu Xing, Lin Lin, Heqing Huang, Peng Liu, and Wenke Lee. From Physical to Cyber: Escalating Protection for Personalized Auto Insurance. pages 42–55, November 14-16 2016. [Accessed on: 06/01/2019].

[25] Graham Cluley. These 60 dumb passwords can hijack over 500,000 IoT devices into the Mirai botnet. October 10 2016. [Accessed on: 06/01/2019].

[26] Tim Greene. Largest DDoS attack ever delivered by botnet of hijacked IoT devices. September 23 2016. [Accessed on: 06/01/2019].

[27] Cisco Systems. The Internet of Things Reference Model. White Paper, 2014. [Accessed on: 06/01/2019].

[28] Cisco Systems. Cisco Fog Computing Solutions: Unleash the Power of the Internet of Things. White Paper, 2015. [Accessed on: 06/01/2019].

[29] Edwin de Jonge and Mark van der Loo. An introduction to data cleaning with R. 2013. [Accessed on: 06/01/2019].

[30] Franck Barillaud, Chuck Calio, and John A. Jacobson. IBM cloud technologies: How they all fit together. June 9 2015. [Accessed on: 06/01/2019].

[31] Amazon Web Services. AWS IoT Core. [Accessed on: 06/01/2019].

[32] Google Cloud IoT. Google Cloud IoT. [Accessed on: 06/01/2019].

[33] Microsoft Azure. Azure IoT Hub. [Accessed on: 06/01/2019].

[34] Cisco IoT. Internet of Things (IoT). [Accessed on: 06/01/2019].

[35] L. Eldén. Matrix Methods in Data Mining and Pattern Recognition. SIAM (2007). [Accessed on: 06/01/2019].

[36] D. Skillicorn. Understanding Complex Datasets: Data Mining with Matrix Decompositions. Chapman & Hall/CRC (2007). [Accessed on: 06/01/2019].

[37] Xiumei Wang, Weifang Liu, Jie Li, and Xinbo Gao. A novel dimensionality reduction method with discriminative generalized eigen-decomposition, volume 173. 2016.

[38] Ray Y. Zhong, Shulin Lan, Chen Xu, Qingyun Dai, and George Q. Huang. Visualization of RFID-enabled shopfloor logistics Big Data in Cloud Manufacturing, volume 84. Apr 2016. [Accessed on: 06/01/2019].

[39] Plotly. Plotly. [Accessed on: 06/01/2019].

[40] Sisense. Sisense. [Accessed on: 06/01/2019].

[41] Tableau. Tableau. [Accessed on: 06/01/2019].

[42] Microsoft. Azure and Power BI. [Accessed on: 06/01/2019].

[43] Elastic. Kibana. [Accessed on: 06/01/2019].

[44] Grafana Labs. Grafana. [Accessed on: 06/01/2019].

[45] R. R. Papalkar, P. R. Nerkar, and C. A. Dhote. Issues of concern in storage system of IoT based big data. In 2017 International Conference on Information, Communication, Instrumentation and Control (ICICIC), pages 1–6, Aug 2017. [Accessed on: 06/01/2019].

[46] Chris Erway, Alptekin Küpçü, Charalampos Papamanthou, and Roberto Tamassia. Dynamic provable data possession. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS '09, pages 213–222, New York, NY, USA, 2009. ACM. [Accessed on: 06/01/2019].

[47] H. Kupwade Patil and R. Seshadri. Big Data Security and Privacy Issues in Healthcare. pages 762–765, June 2014. [Accessed on: 06/01/2019].
