
The Boom and Bust of Big Data Analytics

Authors:
  • Aryanet Institute of Technology, Palakkad, Kerala

Abstract

Big Data is data so massive that it far exceeds the capacity of the human mind to interpret unaided. The concept of big data emerged from the business need to handle very large datasets. The datasets are processed to extract useful information. There are many challenges in solving big data problems. The upsurge in big data applications has been made feasible by advancements in computer technologies. Big Data Analytics is a paradigm for analyzing and processing big data. The analytics invariably resort to parallelism in both hardware and software, and to efficient algorithms, to generate accurate results within a reasonable time.
National Conference at N.S.S College of Engineering- NCNSSCE’18
On
Technological Advancements for Sustainability of Mankind
Dr.C.Subramanian
Ex-Director(Research)
Aryanet Institute of Technology,
Mundur, Palakkad District
chamakuzhi@gmail.com
Keywords: Big Data; Big Data Analytics; 3Vs; unstructured; structured
1. Introduction
The term ‘Big Data’, though difficult to define precisely,
refers to large volumes of data of the order of terabytes (TB)
and beyond. The science fiction movie ‘Avatar’ by James
Cameron (2009) needed about 1 PB of storage to render its
graphics. The Wayback Machine, which is a
digital archive of the World Wide Web and other
information on the Internet, was estimated to contain around
15 PB of data in 2016. The memory capacity of the human
brain is estimated to be around 2.5 PB of data. Huge
volumes of data far exceed the intellectual capacity of the human
mind to interpret them and derive useful solutions. Applications
of big data span various fields and draw on data such as access
logs, patient records, weather data, and data from the analysis
of human behavior. In most cases the data is analyzed
frequently to support real-time decision making. There are
many examples of big data affecting the lives of individuals,
companies, and governments, such as weather data and
clinical data. The sources of big data include PCs, desktop
and mobile apps, web sites, social media, scientific
experiments, and the Internet of Things (IoT). In spite of
advancements in computer technology, many
challenges exist in solving big data problems. Due to
colossal research efforts all over the world, the knowledge
of big data and analytical techniques is improving
continuously. As research on big data advances,
it is expected that users will be able to readily access big data of
interest to them by the year 2025. In the following sections,
various aspects of big data are discussed: comparison with
conventional databases, salient characteristics, applications,
predictive analytics, infrastructure and technology support,
the skills required, and the threat to privacy.
2. Databases Vs Big Data
Databases are designed to organize data so as to model
aspects of reality in a way that supports the processes that
produce business information. Big datasets are very large in size and
complexity; they may be structured or unstructured, and
time invariant or time sensitive. As an analogy, compare a train
and a bicycle. Both are means of conveyance. A train runs
long distances and cannot be easily stopped while running.
It is also very expensive to stop a train, so it is built with
wheels that never go flat. If a bicycle gets a flat tire, it can be
easily stopped and the tire fixed. The techniques for running
and maintaining these conveyances are different.
Basically, conventional databases and big data are both data
storage and processing paradigms. The main factor
differentiating the two is the size of the dataset. In either
case the data remains a collection of raw data with
limited business use until it is processed. So in both cases the
purpose of processing is to generate results that make
business decision making easier.
The tools and techniques required to process the two types
of datasets are different. Relational databases are accessed with
query languages such as SQL (Structured Query Language), while
non-relational (NoSQL) databases are accessed through their own
query interfaces. Traditional database processing
techniques and tools cannot process big datasets
within a reasonable time. So ‘big data analytics’, which includes
'predictive analytics' and resorts to massive parallelism
in both hardware and software and to efficient algorithms, is
used to process big data with acceptable time and accuracy.
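The difference in access style can be illustrated with a minimal sketch. It assumes a toy sales dataset invented for this illustration, uses Python's built-in sqlite3 module to stand in for a relational database, and uses a plain list of dictionaries to stand in for a schemaless document store. It does not represent any particular NoSQL product.

```python
import sqlite3

# Relational side: a fixed schema queried with SQL (in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 40.0)],
)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()

# Non-relational side: schemaless documents whose fields may differ per record.
# This list is a hypothetical stand-in for a document store.
documents = [
    {"region": "north", "amount": 120.0, "coupon": "X1"},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 40.0, "tags": ["web"]},
]


def total_by_region(docs):
    """Aggregate amounts per region by walking the documents in code."""
    totals = {}
    for doc in docs:
        totals[doc["region"]] = totals.get(doc["region"], 0.0) + doc["amount"]
    return totals


doc_totals = total_by_region(documents)
```

Both paths produce the same totals. The relational path declares the structure up front and lets the query language do the aggregation, whereas the document path tolerates records with differing fields and leaves the aggregation to application code.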
3. Big Data
Big data mirrors the changing world. As more things
change, more data gets generated. Big data is a new
discipline of data science. The concept of big data emerged
from the business need to handle datasets that are very large in
size and complexity. The data could be structured or
unstructured, and time invariant or time sensitive. The
phenomenal growth of big data applications is heavily
supported by advancements in computer
technologies pertaining to the memory and computational
power needed to process the data. The processing of big data
involves the analysis and modeling of large volumes of
data for the prediction of future outcomes. The constraints
on big data analytics are the avoidance of time and cost
overruns in projects and of loss of revenue due to
erroneous processing. Examples of big data projects
are: 1) handling the remarkable growth of internet access data due to
the continuous increase in the number of users and 2) weather
forecasting based on large volumes of atmospheric
parameters.
3.1. Features of Big Data
The following are the salient features of big data.
1) Massive: In many applications the size of the data is
too large to fit on a single hard disk and hence it is hosted
in the cloud. The size of the data could be in
petabytes and, for the very largest projects, in
exabytes. Table 1 shows the metric and
value in bytes of data.
Table 1 Byte Metrics of Data
Metric            Value      Bytes
Byte (B)          1          1
Kilobyte (KB)     1,024^1    1,024
Megabyte (MB)     1,024^2    1,048,576
Gigabyte (GB)     1,024^3    1,073,741,824
Terabyte (TB)     1,024^4    1,099,511,627,776
Petabyte (PB)     1,024^5    1,125,899,906,842,624
Exabyte (EB)      1,024^6    1,152,921,504,606,846,976
Zettabyte (ZB)    1,024^7    1,180,591,620,717,411,303,424
Yottabyte (YB)    1,024^8    1,208,925,819,614,629,174,706,176
The next additions to this table, on the same scale, are the
informally named Brontobyte and Geopbyte.
2) Cluttered and Unstructured: The data can
be structured, unstructured, or semi-structured.
Most real data is
unstructured and requires distinct storage and
processing paradigms. The majority of the work on
big data is converting and cleansing it to make
search and sort feasible.
3) Data as a commodity: Data can be traded at
Data Markets. Cloud-based data can be
bought by paying a fee.
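The scale in Table 1 follows a single rule: each unit is 1,024 times the previous one. The sketch below reproduces that rule and converts a raw byte count into the largest convenient unit. The function names are chosen for this illustration and do not come from any library.

```python
# Units of Table 1, in increasing order; each is 1,024 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_bytes(unit):
    """Return the number of bytes in one unit on the 1,024-based scale."""
    return 1024 ** UNITS.index(unit)


def human_readable(n_bytes):
    """Express a byte count in the largest unit that keeps the value below 1,024."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


# The Wayback Machine estimate quoted in the Introduction, expressed in bytes.
wayback_bytes = 15 * unit_bytes("PB")
wayback_label = human_readable(wayback_bytes)
```

With these definitions, `unit_bytes("PB")` gives the petabyte row of Table 1, and `human_readable` maps the 15 PB figure back to a label in petabytes.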
3.2. Technical Characteristics of Big Data
Big data is characterized by the three Vs:
1) An extreme volume of data
2) A broad variety of types of data
3) The velocity at which the data needs to be
processed and analyzed
Fig. 1 depicts the 3Vs characteristics of big data. Fig. 2
shows the details of these characteristics.
Fig. 1. Characteristics 3Vs of Big Data
Fig. 2 Details of Characteristics
3.3 Applications of Big Data
A few of the applications are enumerated below.
  • Department stores use big data to adjust the prices
    of a large number of commodities on the fly to
    improve sales.
  • Anti-terrorist programs: by using big data to
    study images, intelligence agencies can
    quickly narrow down the search to the
    suspects.
  • Identifying and catching fraudsters: by watching
    millions of transactions, patterns of fraud and
    dishonest credit card users can be detected.
  • Tailored advertising on Facebook (FB): Facebook gains
    insight into the tastes of its users and uses complex
    algorithms to watch their habits.
  • Reduction of the overall costs of a parcel service:
    United Parcel Service saves
    millions of gallons of fuel, and hence reduces costs,
    by optimizing delivery routes using big data analytics
    and mobile apps.
  • Applications of big data in life: further
    research on big data could lead to:
    (1) Doctors able to predict the heart attacks and
        strokes of individuals weeks before they
        happen,
    (2) Reduction in airplane and automobile
        crashes through predictive analysis of data on
        their mechanical parts, traffic, and
        weather patterns,
    (3) Musicians able to get insight into which
        music composition is the most pleasing
        to the changing tastes of target audiences,
    (4) Nutritionists able to predict which
        combinations of foods will help or
        aggravate a person's medical
        conditions, and
    (5) An improved online dating process through
        big data predictors of which
        personalities are compatible with a particular
        individual.
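As a small illustration of the fraud detection application listed above, the sketch below flags transactions whose amounts are far from the typical value. It is only one simple statistical approach (a modified z-score based on the median absolute deviation, with the commonly used cut-off of 3.5) and is not the method of any particular card issuer. The transaction amounts are invented for the example.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds the threshold."""
    center = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        return []  # all values are (nearly) identical; nothing stands out
    return [
        i
        for i, a in enumerate(amounts)
        if 0.6745 * abs(a - center) / mad > threshold
    ]


# Hypothetical card transactions: seven ordinary purchases and one unusual one.
transactions = [20, 25, 22, 19, 24, 21, 23, 950]
suspicious = flag_outliers(transactions)  # indices of flagged transactions
```

On real data, a rule of this kind would only produce candidates for further checks. Production systems combine many signals over millions of transactions, which is where the parallel processing discussed in this paper becomes necessary.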
4. Big Data Analytics
Big data analytics is the process of uncovering the
hidden patterns and links in a large-scale
assemblage of data, gaining insights into its contents
and trends, and predicting its future value. The analytics
can be an application of business intelligence (BI) or
predictive analytics. An advanced analytic technique
known as ‘data mining’ helps to evaluate large-scale
datasets to identify relationships, patterns, and trends.
The first step in analytics is converting massive
unstructured data into formatted data that is searchable and
sortable. By adopting the analytics paradigm,
companies get benefits such as increased sales,
improved customer service, greater efficiency, and an
overall boost in competitiveness. Data analytics can be
(1) exploratory data analysis, which identifies patterns and
relationships in data, or (2) confirmatory data analysis,
which applies statistical techniques to check the validity of an
assumption about a particular data set. There are two
types of data analysis, namely quantitative analysis of
numerical data on quantifiable variables that can be
statistically compared, and qualitative analysis applied
to non-numerical data such as video, images, and text.
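The first step named above, turning unstructured data into records that can be searched and sorted, can be sketched as follows. The log layout, field names, and sample lines are assumptions made for this illustration. Real access logs vary in format and would need their own pattern.

```python
import re
from datetime import datetime

# Hypothetical raw access-log lines; the layout is assumed for illustration.
RAW_LINES = [
    '10.0.0.7 [2018-10-25 09:15:02] "GET /index.html" 200',
    'corrupted entry without any structure',
    '10.0.0.3 [2018-10-25 09:14:47] "GET /missing.html" 404',
    '10.0.0.7 [2018-10-25 09:16:30] "POST /login" 200',
]

LINE_PATTERN = re.compile(
    r'(?P<ip>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)" (?P<status>\d{3})'
)


def parse_line(line):
    """Convert one raw line into a typed record, or None if it cannot be parsed."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S")
    record["status"] = int(record["status"])
    return record


def structure(lines):
    """Cleanse the input: keep only the lines that yield a valid record."""
    parsed = (parse_line(line) for line in lines)
    return [record for record in parsed if record is not None]


records = structure(RAW_LINES)
by_time = sorted(records, key=lambda r: r["time"])     # now sortable
errors = [r for r in records if r["status"] >= 400]    # now searchable
```

Once the lines are typed records, sorting by time and filtering by status become one-line operations. The unparseable line is dropped during cleansing rather than stopping the whole run.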
4.1 Big-data-specific technologies
Unstructured data cannot be analyzed using
traditional techniques and hence advanced analytic
environments and technologies have emerged. There
are several technologies specific to big data, such as the
Hadoop framework, MapReduce, Apache Spark, data
lakes, NoSQL databases, and in-memory databases.
Most of these technologies are open-source
software frameworks that can be used to process
large-scale datasets on an appropriate IT infrastructure such
as a clustered system.
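The MapReduce model mentioned above can be illustrated with a minimal single-machine sketch of its three phases (map, shuffle, reduce) applied to word counting. This is plain Python written for illustration and does not use the Hadoop or Spark APIs. On a real cluster, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of input splits."""
    mapped = []
    for document in documents:  # on a cluster, each split is mapped on its own node
        mapped.extend(map_phase(document))
    return reduce_phase(shuffle_phase(mapped))


counts = word_count(["big data needs big clusters", "data lakes store raw data"])
```

Because each map call looks at only one split and each reduce call at only one key, the work can be spread over many machines without the tasks needing to coordinate, which is what makes the model suitable for clustered systems.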
4.2 IT Infrastructure for Big Data
For big data applications, organizations need
to have the infrastructure in place to store data, secure
the data in storage and in transit, and provide access to
authorized users. At a high level, the infrastructure
comprises storage systems and servers for big data,
data management and integration software, business
intelligence and data analytics software, and big data
applications. Among the storage options are traditional
data warehouses, data lakes, and cloud-based storage.
Organizations are increasingly relying on cloud
computing services to fulfill their big data processing
requirements. In many applications, the devices
that gather the data and the processing environments are
already in place. With the widespread use of IoT,
big data analytics oriented to this application will have
its own specialized techniques and tools. The set of
security tools includes data encryption, user
authentication and access controls, monitoring systems,
firewalls, enterprise mobility management, and
products to protect systems and data.
5. Big Data Skills
Big data and big data analytics activities require
specific skills. Many of these skills are related to
big data technologies such as Hadoop, Spark, NoSQL
databases, in-memory databases, and analytics
software. General skills are linked to data science, data
mining, statistical and quantitative analysis, data
visualization, general-purpose programming, and data
structures and algorithms. There is also a need for
personnel with overall management skills to see big
data projects through to completion.
6. Intrusive Threat to Privacy
Big data intrudes into personal privacy. As it stands,
Google, YouTube, and Facebook already track the daily
online habits of individuals. Smartphones and
computers leave digital footprints every day, and
companies are studying these footprints. The laws to
protect such data are still evolving. What can be done to
protect an individual’s privacy? The biggest single step is
to cloak daily habits by using a VPN (virtual private network)
connection. A VPN service encrypts the traffic so that the
individual’s identity and location are at least partially
masked from trackers. This will not make anyone
100% anonymous, but a VPN will substantially reduce
how much the world can observe of an individual's
online habits.
7. Conclusions
Big data, if merely stored as a large volume of raw data,
has no useful value. It needs to be processed to make it
valuable to end users. Big data analytics is the
processing paradigm that uses special tools and techniques
to analyze large-scale data and retrieve useful information
from it. Current research on big data has only
scratched the surface, and research work is
progressing incessantly. The outcome of the research
has started transforming business scenarios in many
domains and improving the lives of individuals. Of
course, the ill effects of the threat to privacy must be
nullified.
Acknowledgements
The author acknowledges the team of specialists
with whom the author was closely associated while implementing
an ERP system in a high-tech industry. Acknowledgements
are also due to the research scholar who completed a
doctoral thesis on Big Data Analytics under the
guidance of the author.