
Big Data and Big Data Analytics: Concepts, Types and Technologies


International Journal of Research and Engineering
ISSN: 2348-7860 (O) | 2348-7852 (P) | Vol. 5 No. 9 | September-October 2018 | PP. 524-528
Copyright © 2018 by authors and International Journal of Research and Engineering
This work is licensed under the Creative Commons Attribution International License (CC BY).
Author(s): Youssra Riahi (1), Sara Riahi (2)
Affiliation(s): (1) Faculty of Informatics,
International University of Rabat, Technopolis parc, Morocco
(2) Department of Mathematics and Computer Science, Faculty of Sciences,
University of Chouaib Doukkali, Jabran Khalil Jabran Avenue, Morocco
*Corresponding Author:
Abstract - Companies are starting to realize how important
it is to have large amounts of data available in order to
make the right decisions and support their strategies. With
the development of new technologies, the Internet and social
networks, the production of digital data is constantly
growing. The term "Big Data" refers to the heterogeneous
mass of digital data produced by companies and individuals,
whose characteristics (large volume, different forms, speed
of processing) require specific and increasingly
sophisticated computer storage and analysis tools. This
article defines Big Data, presents its main concepts,
challenges and applications, and explains the importance of
Big Data Analytics.
Keywords: Big Data; Big Data Analytics; Hadoop;
Internet; Security
Digital data is produced in part by devices connected to the
Internet: smartphones, tablets and computers transmit data
about their users, and connected smart objects convey
information about how consumers use everyday objects.
Apart from connected devices, data comes from a wide range
of sources: demographic data, climate data, scientific and
medical data, energy consumption data, etc. All these data
provide information about the location of device users,
their travel, their interests, their consumption habits,
their leisure activities, their projects and so on, as well
as information on how infrastructure, machinery and
apparatus are used. With the ever-increasing number of
Internet and mobile phone users, the volume of digital data
is growing rapidly. Today we live in an Information Society
and are moving towards a Knowledge-Based Society. In order
to extract better knowledge, we need a larger amount of
data. The Information Society is a society in which
information plays a major role on the economic, cultural and
political stage. [1]
A. Definition
The term "Big Data" refers to the evolution and use of
technologies that provide the right user, at the right time,
with the right information from a mass of data that has long
been growing exponentially in our society. The challenge
lies not only in dealing with rapidly increasing volumes of
data but also in managing increasingly heterogeneous formats
and increasingly complex and interconnected data.
Because Big Data is a complex, polymorphic object, its
definition varies according to the communities interested in
it as users or providers of services. Invented by the giants
of the web, Big Data presents itself as a solution designed
to give everyone real-time access to giant databases.
Big Data is a very difficult concept to define precisely,
since the very notion of "big" in terms of data volume
varies from one area to another. It is not defined by a set
of technologies; on the contrary, it defines a category of
techniques and technologies. This is an emerging field, and
the definition keeps changing as we learn how to implement
this new paradigm and harness its value. [2]
1) Characteristics of Big Data
The term Big Data refers to datasets that are larger
(volume), more diversified, including structured,
semi-structured and unstructured data (variety), and
arriving faster (velocity) than before. These are the 3Vs.
IJRE | Vol. 5 No. 9 | September-October 2018 | Y. Riahi et al.
a) 3V:
Fig. 1. 3V Concept
-Volume: represents the amount of data generated, stored and
operated within the system. The increase in volume is
explained by the increase in the amount of data generated
and stored, but also by the need to exploit it.
-Variety: represents the growing number of data types
managed by an information system. This growth makes the
links, and the types of links, between these data more
complex. Variety also relates to the possible uses
associated with a piece of raw data.
-Velocity: represents the frequency at which data is
generated, captured and shared. The data arrives as a stream
and must be analyzed in real time.
b) 5V:
Fig. 2. 5V Concept
To this classical characterization, two other "V"s are
added:
-Veracity: the level of quality, accuracy and uncertainty of
data and data sources.
-Value: the value and potential derived from data.
Big Data generally refers to data that exceeds the typical
storage, processing, and computing capacity of conventional
databases and data analysis techniques. As a resource, Big
Data requires tools and methods that can be applied to
analyze and extract patterns from large-scale data. [3]
The analysis of structured data is evolving because of the
variety and velocity of the data being handled. It is
therefore no longer enough to analyze data and produce
reports: the wide variety of data means that the systems in
place must be capable of assisting in the analysis. This
analysis consists of automatically determining, within a
variety of rapidly changing data, the correlations between
the data in order to help exploit it.
Big Data Analytics refers to the process of collecting,
organizing and analyzing large data sets to discover
patterns and other useful information. It is a set of
technologies and techniques that require new forms of
integration to reveal the value hidden in datasets that are
different from the usual ones, more complex, and of enormous
scale. It mainly focuses on solving new problems, or old
problems in better and more effective ways.
A. Types of Big Data Analytics
a) Descriptive Analytics
It consists of asking the question: What is happening?
It is a preliminary stage of data processing that creates a
summary of historical data. Data mining methods organize the
data and help uncover patterns that offer insight.
Descriptive analytics summarizes past events and trends,
which gives a first idea of what might happen in the future.
b) Diagnostic Analytics
It consists of asking the question: Why did it happen?
Diagnostic analytics looks for the root cause of a problem.
It is used to determine why something happened. This type
attempts to find and understand the causes of events.
c) Predictive Analytics
It consists of asking the question: What is likely to happen?
It uses past data in order to predict the future. It is all about
forecasting. Predictive analytics uses many techniques like
data mining and artificial intelligence to analyze current data
and make scenarios of what might happen.
d) Prescriptive Analytics
It consists of asking the question: What should be done?
It is dedicated to finding the right action to be taken.
Descriptive analytics provides historical data, and
predictive analytics helps forecast what might happen.
Prescriptive analytics uses these parameters to find the
best action to take.
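As a rough illustration of how three of these types build on
one another, the following Python sketch applies them to a
tiny series. The sales figures, the function names and the
reorder rule are all invented for the example; diagnostic
analytics is left out because it depends on domain-specific
causes rather than on a generic formula.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented for this illustration.
sales = [100.0, 110.0, 120.0, 130.0]


def describe(values):
    """Descriptive: summarize what has happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def predict_next(values):
    """Predictive: extrapolate the fitted trend one step ahead."""
    slope, intercept = fit_line(values)
    return slope * len(values) + intercept


def prescribe(forecast, stock):
    """Prescriptive: choose an action from the forecast (toy decision rule)."""
    return "reorder" if forecast > stock else "hold"


summary = describe(sales)
forecast = predict_next(sales)
print(summary, forecast, prescribe(forecast, stock=125.0))
```

The prescriptive step consumes the output of the predictive
step, which in turn is fitted on the historical data that the
descriptive step summarizes.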
Big Data are collections of information that would have been
considered gigantic, impossible to store and process, a
decade ago.
Processing such large quantities of data requires particular
methods. A classic database management system is unable to
process this much information. Hadoop is an open source
software product (or, more accurately, a "software library
framework") that is collaboratively produced and freely
distributed by the Apache Software Foundation. Effectively,
it is a developer's toolkit designed to simplify the
building of Big Data solutions. [5]
Hadoop is used by companies with very large volumes of data
to process. Among them are web giants such as Facebook,
Twitter, LinkedIn, eBay and Amazon. Hadoop is a distributed
data processing and management system. It contains many
components, including HDFS, YARN and MapReduce. HDFS is a
distributed file system that provides high-performance
access to data across Hadoop clusters. [6]
MapReduce is a core component of the Apache Hadoop software
framework. Hadoop enables resilient, distributed processing
of massive unstructured data sets across commodity computer
clusters, in which each node of the cluster includes its own
storage. MapReduce serves two essential functions: it
parcels out work to various nodes within the cluster (map),
and it organizes and reduces the results from each node into
a cohesive answer to a query (reduce). [7]
Hadoop relies on two servers:
•JobTracker: there is only one JobTracker per Hadoop
cluster. It receives Map/Reduce tasks to run and organizes
their execution on the cluster. When you submit your code to
be executed on the Hadoop cluster, it is the JobTracker's
responsibility to build an execution plan. This execution
plan includes determining the nodes that contain data to
operate on, arranging nodes to correspond with data,
monitoring running tasks, and relaunching tasks if they
fail.
•TaskTracker: several per cluster. Each one executes the
Map/Reduce work itself (as Map and Reduce tasks with the
associated input data).
The JobTracker server is in communication with HDFS; it
knows where the Map/Reduce program input data is and
where the output data must be stored. It can thus optimize
the distribution of tasks according to the associated data.
To run a Map/Reduce program, we must:
1. Write the input data to HDFS.
2. Submit the program to the cluster's JobTracker.
3. Retrieve the output data from HDFS.
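On a real cluster, these three steps are usually carried out
with the `hdfs dfs` and `hadoop jar` command-line tools. The
Python sketch below only builds and prints the corresponding
commands; it does not run them. All paths, the jar name and
the class name are hypothetical, and passing the input and
output directories as the program's two arguments is an
assumption about how the submitted program is written.

```python
import shlex


def mapreduce_commands(local_input, hdfs_input, jar, main_class,
                       hdfs_output, local_output):
    """Return the three shell commands as argument lists; nothing is executed."""
    return [
        # 1. Write the input data to HDFS.
        ["hdfs", "dfs", "-put", local_input, hdfs_input],
        # 2. Submit the program (a jar and its main class) to the cluster.
        ["hadoop", "jar", jar, main_class, hdfs_input, hdfs_output],
        # 3. Retrieve the output data from HDFS.
        ["hdfs", "dfs", "-get", hdfs_output, local_output],
    ]


# Hypothetical names, chosen only for the illustration.
commands = mapreduce_commands(
    local_input="words.txt",
    hdfs_input="/user/demo/input",
    jar="wordcount.jar",
    main_class="WordCount",
    hdfs_output="/user/demo/output",
    local_output="results",
)
for command in commands:
    print(shlex.join(command))
```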
Fig. 3. Hadoop Architecture
All TaskTrackers continuously report their status through
heartbeat messages. If a TaskTracker fails (missing
heartbeat or failed task), the JobTracker reassigns the task
to another node.
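The heartbeat mechanism can be pictured with a small
simulation. This is a minimal sketch, not Hadoop's actual
implementation: the timeout value, the tracker and task
names, and the round-robin reassignment policy are all
assumptions made for the example.

```python
def find_failed_trackers(last_heartbeat, now, timeout):
    """Trackers whose last heartbeat is older than `timeout` seconds."""
    return sorted(t for t, seen in last_heartbeat.items() if now - seen > timeout)


def reassign_tasks(assignments, failed, healthy):
    """Move every task held by a failed tracker to a healthy one, round-robin."""
    if not healthy:
        raise RuntimeError("no healthy tracker left")
    new_assignments = dict(assignments)
    moved = 0
    for task in sorted(assignments):
        if assignments[task] in failed:
            new_assignments[task] = healthy[moved % len(healthy)]
            moved += 1
    return new_assignments


# Hypothetical cluster state: tracker name -> time of last heartbeat (seconds).
last_heartbeat = {"tt1": 100.0, "tt2": 70.0, "tt3": 99.0}
assignments = {"map-0": "tt1", "map-1": "tt2", "map-2": "tt2", "reduce-0": "tt3"}

failed = find_failed_trackers(last_heartbeat, now=101.0, timeout=10.0)
healthy = sorted(set(last_heartbeat) - set(failed))
print(failed, reassign_tasks(assignments, failed, healthy))
```

Here `tt2` has been silent for 31 seconds, so its two tasks
are handed to the remaining trackers while the other
assignments stay untouched.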
HDFS relies on three kinds of servers:
•NameNode: unique on the cluster. It stores information
about file names and their characteristics. It is the master
of HDFS and controls the slave DataNodes.
•Secondary NameNode: monitors the state of the HDFS cluster
and takes "snapshots" of the data contained in the NameNode.
If the NameNode fails, the most recent snapshot held by the
Secondary NameNode can be used to restore it; the Secondary
NameNode is a checkpointing helper rather than an automatic
standby.
•DataNode: several per cluster. Stores the contents of the
files themselves, fragmented into blocks (64 MB by default
in Hadoop 1, 128 MB since Hadoop 2).
Fig. 4. General Architecture
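To make the fragmentation into blocks concrete, the sketch
below computes how a file would be cut up. It assumes a block
size of 128 MB (the default since Hadoop 2; Hadoop 1 used
64 MB); the function name and the 300 MB file are
illustrative only.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    # Every block is full except possibly the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])                      # sizes in MB, 128 MB blocks
print(len(split_into_blocks(300 * MB, block_size=64 * MB)))  # block count, 64 MB blocks
```

The last block only occupies the space it needs, so a file
that is not a multiple of the block size ends with a shorter
block.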
MapReduce is a Java environment for writing programs
intended for YARN. Java is not the simplest language for
this: there are packages to import and class paths to
provide.
The data exchanged between Map and Reduce, throughout the
entire job, are (key, value) pairs:
-Key: any type of data: integer, text, ...
-Value: any type of data.
The two functions Map and Reduce receive and send such
pairs.
A. Map
The Map function receives an input pair and can produce any
number of output pairs: none, one or more. The types of the
inputs and outputs are as desired. This very loosely
constrained specification allows a great many things. In
general, the input pairs of Map are constituted as follows:
• The value, of type text, is one of the rows or one of the
n-tuples of the file to be processed.
• The key, of type integer, is the position of this line in
the file.
YARN launches a Map instance for each row of each file in
the data to be processed. Each instance processes the row it
has been assigned and produces output pairs.
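The behaviour of Map can be sketched in plain Python, outside
Hadoop. The example assumes input rows of the form
"city,temperature", which is an invented format; the function
names are hypothetical as well. Each row arrives as a
(position, text) pair, and the function emits one pair for a
valid row and none for a malformed one.

```python
def split_lines(text):
    """Build the (byte position, line) input pairs that Map receives."""
    pairs, offset = [], 0
    for line in text.splitlines(keepends=True):
        pairs.append((offset, line.rstrip("\r\n")))
        offset += len(line.encode("utf-8"))
    return pairs


def map_temperature(offset, line):
    """Map: yield (city, temperature) for a valid row, nothing otherwise."""
    fields = line.split(",")
    if len(fields) != 2:
        return  # malformed row: emit no pair
    city, raw = fields
    try:
        temperature = float(raw)
    except ValueError:
        return  # temperature is not a number: emit no pair
    yield city.strip(), temperature


text = "Rabat,21.5\nbad row\nCasablanca,23\n"
for offset, line in split_lines(text):
    print(offset, list(map_temperature(offset, line)))
```

The input key (the position) is ignored here, which is common:
the output key is chosen according to what Reduce will need
to group on.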
B. Reduce
The Reduce function receives a list of input pairs. These
are the pairs produced by the instances of Map. Reduce can
produce any number of output pairs, but most of the time it
produces one. The crucial point is that the input pairs
processed by an instance of Reduce all have the same key.
YARN launches a Reduce instance for each distinct key that
the Map instances have produced, and provides it only with
the pairs that have this key. This is what makes it possible
to aggregate values. Generally, Reduce performs some
processing on the values, such as summing them or
determining the largest one.
When designing a MapReduce job, we must think about the keys
and values needed for it to work. Reduce tasks receive a
list of pairs with the same key and produce a pair that
contains the expected result. This output pair can have the
same key as the input.
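A Reduce step of this kind can be sketched as follows. The
grouping function stands in for what the framework does
before calling Reduce; the data and the function names are
invented for the illustration, and "largest value per key" is
just one possible aggregation.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect the values of all pairs that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_max(key, values):
    """Reduce: one output pair per key, holding the largest value."""
    return key, max(values)


# Hypothetical pairs as they might leave the Map instances.
map_output = [("Rabat", 21.5), ("Casablanca", 23.0), ("Rabat", 24.0)]

results = [reduce_max(key, values)
           for key, values in sorted(group_by_key(map_output).items())]
print(results)
```

Each call to `reduce_max` sees only the values of a single
key, and the output pair reuses that key.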
C. Steps for a MapReduce job
1. Preprocessing of the input data, e.g. decompression of
files.
2. Split: separation of the data into blocks that can be
processed separately, formatted as (key, value) pairs, e.g.
in rows or tuples.
3. Map: application of the Map function to all the (key,
value) pairs formed from the input data; this produces other
(key, value) pairs as output.
4. Shuffle & Sort: redistribution of the data so that the
pairs produced by Map that have the same key are on the same
machine.
5. Reduce: aggregation of the pairs with the same key to get
the final result.
a) Schema explanation
1. At the beginning, YARN asks the NameNode for the location
of the data and, if necessary, has the DataNodes concerned
decompress them.
2. The Split phase consists of constructing (n-tuple number,
n-tuple) pairs to be provided to the Map tasks.
3. YARN creates Map processes on each machine containing
part of the data and feeds them the pairs held on their
machine one after another.
4. Each Map task analyzes its data and may or may not emit a
pair. This can consist of converting strings into numbers,
making calculations, and so on.
5. YARN sorts the pairs leaving Map according to their key
and sends them to the machine that runs the Reduce task
concerned by this key.
6. The Reduce tasks receive a list of pairs and perform the
reduction of the values (max, sum, avg, ...). They emit only
the final value.
b) Schema
Fig. 5. MapReduce word count process
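The word count process can be imitated on a single machine
to show how Split, Map, Shuffle & Sort and Reduce fit
together. This is a minimal in-memory sketch, not Hadoop
code: the function names are illustrative, the key built by
the split step is simply the line number, and lowercasing
the words is a choice made for the example.

```python
from itertools import groupby
from operator import itemgetter


def split(text):
    """Split: one (line number, line) pair per input row."""
    return list(enumerate(text.splitlines()))


def map_words(_key, line):
    """Map: emit (word, 1) for every word of the row."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: bring pairs with the same key together, ordered by key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def reduce_count(key, values):
    """Reduce: sum the occurrences of one word."""
    return key, sum(values)


def word_count(text):
    mapped = [pair for key, line in split(text) for pair in map_words(key, line)]
    return dict(reduce_count(key, values) for key, values in shuffle_and_sort(mapped))


print(word_count("big data\nBig Data analytics\nhadoop"))
```

On a cluster, the map calls would run in parallel on the
machines holding the data, and the shuffle would move pairs
across the network; the data flow is otherwise the same.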
YARN (Yet Another Resource Negotiator), also called
MapReduce 2.0 (MRv2), takes the place of the original
MapReduce. YARN is placed on top of HDFS (Hadoop Distributed
File System) to provide operating system capabilities for
Big Data analytics applications. This organization allows
simultaneous execution of multiple applications while
providing better tracking of the data throughout its life
cycle. It also makes it possible to mix batch, interactive
and real-time workloads.
YARN also maintains compatibility with the MapReduce
Application Programming Interface (API), requiring only a
recompile of applications already developed.
The main difference is the separation of resource management
(ResourceManager, RM) and task or application control
(ApplicationMaster, AM) into two daemons. The AM is a
framework-specific library that negotiates resources from
the RM and works with the NodeManager (NM) to run and
monitor tasks.
YARN is a mechanism for managing jobs on a cluster of
machines. It allows users to launch MapReduce jobs on data
in HDFS, monitor their progress and retrieve the messages
(logs) displayed by the programs. If necessary, YARN can
move a process from one machine to another in the event of a
failure or of progress judged too slow.
Fig. 6. General Architecture
RM (Resource Manager): the central daemon of YARN. It
manages the assignment of resources (CPU/memory) to
applications. It has two components. The Scheduler is in
charge of allocating resources to running applications, but
it does not restart tasks that fail. The Application Manager
is in charge of managing the Application Masters in the
cluster; it restarts Application Masters on different nodes
in case of failure.
NM (Node Manager): the slave daemon of YARN. The NM is
responsible for containers, monitoring their resource usage
and reporting it to the RM [10]. The NM also tracks the
status of the node on which it is running.
AM (Application Master): there is only one Application
Master per application. It negotiates resources from the RM
and works with the NM. It manages the application life
cycle. The AM acquires containers from the RM's scheduler
before contacting the corresponding NMs to start the
application's individual tasks. [11]
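The division of labour between the RM and the AM can be
pictured with a toy model. The classes below borrow the YARN
role names but are not the YARN API: the method names, the
memory-only notion of a container and the first-fit placement
policy are assumptions made for this sketch, and real YARN
schedulers are considerably more elaborate.

```python
class ResourceManager:
    """Toy scheduler: tracks free memory per node and hands out containers."""

    def __init__(self, node_memory_mb):
        self.free = dict(node_memory_mb)  # node name -> free memory (MB)

    def allocate(self, memory_mb):
        """Return the node hosting a new container, or None if nothing fits."""
        for node in sorted(self.free):  # first-fit, an assumption of this sketch
            if self.free[node] >= memory_mb:
                self.free[node] -= memory_mb
                return node
        return None


class ApplicationMaster:
    """One per application: asks the RM for containers, then places its tasks."""

    def __init__(self, resource_manager):
        self.rm = resource_manager
        self.running = {}  # task name -> node

    def run_tasks(self, tasks, memory_mb):
        """Try to start every task; return the tasks still waiting for a container."""
        pending = []
        for task in tasks:
            node = self.rm.allocate(memory_mb)
            if node is None:
                pending.append(task)
            else:
                # In YARN, the AM would now contact that node's NM to launch the task.
                self.running[task] = node
        return pending


# Hypothetical two-node cluster.
rm = ResourceManager({"node1": 2048, "node2": 1024})
am = ApplicationMaster(rm)
pending = am.run_tasks(["t1", "t2", "t3", "t4"], memory_mb=1024)
print(am.running, pending, rm.free)
```

The scheduler only hands out capacity; deciding what to do
with the task that could not be placed is left to the
application's own master, in line with the separation of
roles described above.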
YARN is an evolution of the Hadoop architecture that
offloads the JobTracker, which tended to accumulate too many
roles and thus became complex. This rethinking of the roles
also made it possible to decouple Hadoop from MapReduce, so
that Hadoop is no longer bound to it. In addition to better
scalability, this allows Hadoop to be enriched by new
frameworks covering needs that MapReduce covers poorly or
not at all.
Big Data refers to the set of digital data produced by the
use of new technologies for personal or professional
purposes. Big Data Analytics is the process of examining
these data in order to uncover hidden patterns, market
trends, customer preferences and other useful information in
order to make the right decisions. Big Data Analytics is a
fast-growing technology. It has been adopted by the most
unexpected industries and has become an industry in its own
right. However, the analysis of these data within the
framework of Big Data is a process that can sometimes seem
quite intrusive.
Analytics is a data science. Business Intelligence (BI)
takes care of the decision-making part, while Data Analytics
is the process of asking questions. Analytics tools are used
when a company needs to do forecasting and wants to know
what will happen in the future, while BI tools help to
transform those forecasts into common language [12]. Big
Data is often considered the successor to Business
Intelligence. This comparison will be discussed in future
work.
[1] Perspectives on Big Data and Big Data Analytics,
Database Systems Journal, vol. III, no. 4/2012
[2] Azzeddine Riahi, Sara Riahi, The Big Data Revolution,
Issues and Applications, IJARCSSE, Volume 5, Issue 8
[3] Najafabadi et al., Deep learning applications and
challenges in big data analytics, Journal of Big Data
(2015) 2:1, DOI 10.1186/s40537-014-0007-7
[4] Big Data Analytics: Challenges and Applications for
Text, Audio, Video, and Social Media Data, International
Journal on Soft Computing, Artificial Intelligence and
Applications (IJSCAI), Vol.5, No.1, February 2016
[5] Big Data: The definitive guide to the revolution in
business analytics, Fujitsu
[6] K