Procedia Economics and Finance 33 ( 2015 ) 277 – 286
Available online at www.sciencedirect.com
2212-5671 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of Department of Accountancy and Finance, Eastern Macedonia and Thrace Institute of Technology
doi: 10.1016/S2212-5671(15)01712-8
ScienceDirect
7th International Conference, The Economies of Balkan and Eastern Europe Countries in the Changed World, EBEEC 2015, May 8-10, 2015
Using Big Data in the Academic Environment
Banica Logica*, Radulescu Magdalena
Faculty of Economics, University of Pitesti, Targu din Vale Street, no. 1, 110040 Pitesti, Romania
Abstract
Massive amounts of data are collected across social media sites, mobile communications, business environments and institutions. In order to efficiently analyze this large quantity of raw data, the concept of Big Data was introduced. This new concept is expected to help education in the near future, by changing the way we approach the e-learning process, by encouraging the interaction between students and teachers, and by allowing the fulfilment of the individual requirements and goals of learners.
The paper discusses aspects regarding the evolution of Big Data technologies, the way of applying them to e-Learning and their influence on the academic environment. We have also designed a three-step system architecture for a consortium of universities, based on actual software solutions, with the purpose of analyzing, organizing and accessing huge data sets in the Cloud environment. We focused our research on exploring unstructured data using the graphical Gephi tool.
Keywords: Big data; E-learning; Academic environment; Cloud Computing.
JEL classification codes: C82, I21, D83.
1. Introduction
In recent years, the IT world has been facing a massive increase in the volume of produced data, mainly due to Internet services, leading to a redefinition of the term database. The new concept used for the description and
* Corresponding author. Tel.: +40-745-227-774. E-mail address: olga.banica@upit.ro
organization of enormous quantities of data, structured or unstructured, provided by companies, institutions and social media environments is Big Data.
The term was first defined in 1997 by two NASA researchers, Michael Cox and David Ellsworth (1997); in 1998, John R. Mashey, a researcher from Silicon Graphics Inc. (SGI), used the concept (1998), and one year later Bryson et al. published a paper concerning Big Data in the Communications of the Association for Computing Machinery (ACM) (1999).
Starting from 2009, research gave birth to hardware and software solutions, which merged with other technologies supporting the entire ecosystem in order to achieve the desired goals. In this way, the Cloud Computing environment provided the resources required to store and access large data volumes, and thus facilitated the symbiosis between the two. These emerging technologies became a foundation for the e-Learning industry and offered an opportunity to help higher education in the near future, by changing the way the e-learning process is approached, by encouraging the interaction between students and teachers, and by allowing them to follow the individual needs and performances of learners.
The paper aims at analysing these aspects, especially the applications of Big Data, supported by Cloud Computing, in the e-Learning process. We designed a three-step model architecture for a university e-learning system, based on actual software solutions, with the purpose of acquiring, organizing and accessing huge data sets in a cloud environment.
Our work includes three sections and a Conclusions part. Section 2 presents the three concepts involved (Big Data, Cloud Computing and the e-Learning System), summarizing the state of the art, and investigates several methods to store, filter and process large volumes of data with the help of commercially available software solutions. We also briefly present the characteristics of the Gephi software. Section 3 specifies the methodology used, by presenting the software platforms necessary for each of the three levels of the designed architectural model. Section 4 includes our solution for a system that is able to accommodate Big Learning Data in a Public Cloud environment. The main concluding remarks close the paper and suggest ways of improving our future research activity in this domain.
2. Literature Review
2.1. Big Data concept
The appropriate definition of the concept, as seen by the authors of this paper, is the following: “Big data is a massive collection of shareable data originating from any kind of private or public digital sources, which represents on its own a source for ongoing discovery, analysis, and Business Intelligence and Forecasting”, according to Banica et al. (2014).
The largest volume of data is provided by social media sites and mobile networks, but its percentage of useful information is low in comparison with other, more valuable categories of data sources, such as financial and governmental institutions, education institutions and the business environment.
Big data, in the context of e-Learning systems (also called Big Learning Data), consists of the information sources (courses, modules, experiments etc.) created by the teachers, but especially of data coming from the learners (students) throughout the education process, collected by Learning Management Systems, social networks and multimedia, as defined by the organization or the professionals.
Oracle described Big Data by four key characteristics (the four Vs): Volume, Velocity, Variety and Value, as pointed out by Dijcks (2013) and Briggs (2014). By applying these features to Big Learning Data, we further describe the content and the importance of every key characteristic in the four Vs approach:
• Volume: the size of the data. It is difficult to define the limits for Big Data, so this is a very relative aspect for every domain of application, including the education field. In our opinion, even if data originates from thousands of students in one university, the term Big Data may be used only if several higher education institutions collaborate in the information exchange and bring together learning data and learners.
• Velocity: the increasing flows of data need hardware and communication equipment able to carry more and more information, and software solutions to process them as fast as possible; Big Learning Data must ensure for
students and teachers quick access to the information needed in the educational process; for example, to correct a wrong answer in an assessment exam, to allow teachers to adjust the content of a course during class, or to answer students' questions in real time.
• Variety: Big Data is a combination of all types of formats, unstructured and multi-structured. Therefore, Big Learning Data collects, analyzes and provides information with different backgrounds to ensure better learning resources; the focus is on handling all of them uniformly, so that behavior and performance do not vary across formats.
• Value: concerns the scientific or commercial value of Big Data. While for enterprises it is important to use data originating from social media in combination with internal data in order to develop their business, for the educational environment the degree of innovation is more important. The target of Big Learning Data is to obtain a high level of education and knowledge, and to develop projects in research domains that lead, as a consequence, to new solutions in all areas (economic, financial, health, education and social).
2.2. Meeting Big Data with e-Learning in the Cloud environment
The problem of the hardware and software resources needed to store huge volumes of data may be solved by using Cloud Computing technology. Not only the business environment is interested in collecting information from unconventional data sources; government agencies, higher education institutions and other organizations also analyze and extract meaningful insights from this labyrinth of data, underlines Ferkoun (2014), be it security related, behavioral patterns of consumers or feedback on e-Learning courses.
It is difficult to assume that ordinary businesses and institutions could afford such advanced technological resources; therefore we believe that the development of Big Data relies mostly on Public and Hybrid Cloud implementations. There is a powerful symbiosis between these two technologies, considering that any Cloud Computing implementation includes a high-capacity storage solution and any Big Data platform needs to collect, analyze and process large volumes of data, from multiple sources and of various types.
In a hierarchical structure, the base could be the Cloud, which offers the resources; Big Data would come in the middle, responsible for data organization and processing; and at the top we may develop new e-Learning industry opportunities.
We presented Cloud-based Big Data scenarios in several papers and emphasized that Hybrid Clouds are often the preferred option for institutions and companies, which may use Private Clouds to manage internal structured data, while Public Clouds allow the storage of volumes of external data or archives (Big Data).
Many powerful corporations like Google, IBM, Sun, Amazon, Cisco, Intel, and Oracle have invested in a wide range of cloud-based solutions, which confirms that this is a technology they rely on and of which there are great expectations. In such a competitive environment, many aspects are taken into account: the storage capacity, the security of hosted data, the services provided, but also the subscription cost.
Cloud Computing also has several drawbacks, some of them of major importance, according to Banica et al. (2014):
• data security risks: secured access policies are required in order to keep unauthorized users away from the business data;
• data loss challenge: all databases are required to implement automatic backup and transaction-based queries, thus mitigating the chance of affecting service quality;
• system unavailability: network outages and OS crashes can negatively affect the performance of the solution, and redundant architectures must be implemented by providers.
Banica and Burtescu (2014) argue that some of the security problems have been solved, not entirely, but at a reasonable level, by new ways to ensure protection from unauthorized access to the Cloud: a layered security approach (different sets of user privileges, grouped in access roles), firewall policies with powerful rule sets for Internet and Intranet access, usage of cryptography for the data, or smart solutions for traffic filtering with automated alerting.
Big Data in the Cloud is a powerful platform for the e-Learning world, especially for higher education, in ways that we will try to emphasize in the following section.
2.3. Some motivations of introducing Big Data in e-Learning
We consider that the preferred option for universities is, as for the business environment, the Hybrid Cloud solution, which may use Private Clouds for the learning management systems (LMS), while Public Clouds are dedicated to storing and processing Big Data, consisting mainly of unstructured data from students, via social networks and other media.
Universities all over the world are using learning management systems (LMS) based on integrated collaborative software platforms. Applications like wikis, chat rooms and blogs enable teachers to continuously observe and check the progress of students, and students to communicate more efficiently among themselves and with their teachers, in order to evolve faster and better in a knowledge field.
Resource sharing and exchanges of ideas are the perfect support for a teacher who wants to know the level of knowledge of the students about the topics proposed for study. A discussion of the educational potential of collaborative software needs to start from the point of view of the involved groups of students, on one side, and from that of the learning professionals, on the other side, as pointed out by Banica (2014).
We have asked ourselves, as probably many other university professors did: if this model works for many universities, where teachers give their students learning items called Learning Objects, cooperate with them while maintaining a professional relationship by building weblogs and wikis for courses or projects, and students have an appropriate way to communicate with the teachers, why is the migration of an LMS to Big Learning Data in a Cloud environment necessary? Or, more directly, how could Big Data bring performance to the e-Learning process?
The answer is not so simple, taking into account the evolving volume of information, the liberty of expression and the candor found in social media, and the differences between university budgets compared to the strong need for equal opportunities for students around the world.
This project is not aimed at the students of a single university, but at any given consortium of educational institutions, which could enrich the knowledge of the learners, open the way for comparative analyses, and thus overcome the lack of financing for Cloud-powered Big Data from local resources.
There are many advantages of Big Data for the e-Learning process:
• from the teacher's angle: the opportunity to understand the real patterns of their students, to assess the current level of their knowledge, as Briggs (2014) says, to determine which parts were too easy or too difficult, and to improve the content by personalizing courses;
• from the point of view of students: ensuring rich communication and offering endless learning opportunities.
Big Learning Data based on Cloud computing is a new concept and its application to higher education is just starting, but we could say that its deployment involves the following important stages, according to Banica et al. (2013):
• ensuring a powerful infrastructure, including the hardware and software computing resources required for e-learning activities;
• finding a unifying solution for the Learning Objects representation, given that most universities have their own Learning Management System, implemented in a private cloud, but a different one for each institution;
• keeping the privately-owned cloud to ensure the confidentiality of scholar records, teaching personnel data and research projects;
• providing access to the educational content and full use of the collaboration tools for the students and teachers from the universities involved in the e-Learning Cloud project;
• building open-source cluster architectures for gathering and processing the unstructured information.
The amount of data is not always the reason for introducing Big Data in education; the main problem could be its heterogeneity, most of the available data being unstructured, which prevents the usage of classical relational databases. Therefore, the IT world launched a new wave of technologies capable of solving this problem, such as the Hadoop and Spark software.
High-profile vendors such as Amazon Web Services, Google, Microsoft, Rackspace and IBM offer cloud-based Hadoop and NoSQL database platforms for Big Data, underlines Vaughan (2014). Due to the services that run on these platforms, and taking advantage of the reduced costs and increased flexibility, Big Data on Cloud computing is the first choice for business companies.
That cannot be said about public institutions or educational organizations, which have a slower adoption rate and mention the lack of security as the main cause. But the motivation to use the Cloud is far more powerful for universities, which can benefit from borderless access to knowledge, information exchange between learners and educators, and empowered scientific research.
3. Methodology
In this section we will briefly present the new type of database used for storing Big Data (NoSQL), the software that allows for massively parallel computing (Hadoop), and the social network analysis (SNA) software Gephi.
NoSQL databases (or “Not Only SQL”) encompass a wide variety of unstructured data described by several models: key-value stores, graph, and document data. NoSQL data may be implemented in a distributed architecture, based on multiple nodes, able to process and store Big Data.
In the dedicated literature, there are four types of NoSQL databases, each of them having specific attributes, as mentioned by Mohamed et al. (2014):
• Key-value store (KVS) uses a hash table of keys and values for designing databases; in this table a unique key exists, with a pointer to each record of data; the key-value model is inefficient for querying and updating part of a value. Examples of key-value databases: Oracle BDB, Amazon SimpleDB;
• Document represents the next level of the key-value type, where data is a collection of key-value pairs compressed as a document in different format standards, such as XML or JSON. It is a complex category of storage that enables more efficient data querying. Examples of document databases: CouchDB, MongoDB;
• Column refers to a database structure similar to standard relational databases, data being stored as sets of columns and rows. The columns are logically grouped into column families. Column-category databases are recommended when the number of write operations exceeds reads, for example in logging. Examples of column databases: Cassandra, HBase;
• Graph designs the structures where data may be represented as a graph with interlinked elements. Instead of the rigid structure of SQL, based on tables of rows and columns, a flexible graph model with edges and nodes is used, scanning across multiple machines. In this category, social networking and maps are the main applications. Examples of graph databases: Neo4J, InfoGrid, Infinite Graph.
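The differences between these models can be made concrete with plain data structures. The following sketch (illustrative Python, not a real NoSQL client; all record names and values are invented) shapes the same learning record under the key-value, document and graph models:

```python
# Illustrative sketch of three NoSQL data models using plain Python
# structures; a real deployment would use a database client instead.
import json

# Key-value: one opaque value per unique key; efficient lookup,
# but inefficient for querying or updating part of the value.
kv_store = {"student:1042": json.dumps({"name": "Ana", "course": "Economics"})}

# Document: the value is a structured document (here JSON-like),
# so individual fields can be queried.
doc_store = {
    "students": [
        {"_id": 1042, "name": "Ana", "courses": ["Economics", "Statistics"]},
    ]
}

# Graph: nodes and edges, suited to social-network-style data.
nodes = {1042: "Ana", 2001: "Prof. X"}
edges = [(2001, 1042, "teaches")]

record = json.loads(kv_store["student:1042"])
print(record["name"])                           # Ana
print(doc_store["students"][0]["courses"][0])   # Economics
print(edges[0][2])                              # teaches
```

The key-value entry must be parsed as a whole before any field is visible, while the document and graph forms expose their structure directly, which is the trade-off the list above describes.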
Choosing a data model for a NoSQL solution depends on technical differences and working features, especially on keeping data consistency and very fast retrieval.
The evaluation of NoSQL implementations takes into account the storage capacity, the ability to use memory efficiently, the support for deployment on virtual machines and in the Cloud, but also the execution time for different operations, as indicated by Henschen (2014).
According to Baby (2014), a researcher from IGI Global's InfoSci-Dictionary, Hadoop is „an open source, Java-based programming framework that supports the processing of large data sets for scalable and distributed computing environment”.
Essential to the effectiveness of this software is doing the processing in proximity to the location where data is stored, rather than bringing the data to the computation units, thus preventing unnecessary network transfers, point out the scientists from the Yahoo Developer Network (2007). Its parallel-processing capability is better used when deployed in the Cloud, because large amounts of data stored in the cloud can be processed, queried and analyzed at high speeds.
Hadoop can be installed on any of the operating system (OS) families (Linux, Unix, Windows, Mac OS) and can be run on a single node or on a multi-node cluster. A Hadoop distribution includes two core parts: the storage component, the Hadoop Distributed File System (HDFS), and the processing component, the MapReduce engine. The base Hadoop framework also contains a module for libraries and utilities (Hadoop Common) and a module responsible for managing and scheduling cluster resources (Hadoop YARN). HDFS splits large data files into blocks
which are distributed amongst the nodes of the cluster, managed by them, and also replicated across several machines for security reasons, according to Frank Lo (2014). MapReduce is the key component that Hadoop uses to transfer blocks around a cluster, so that operations can be run in parallel on different nodes and data is processed locally.
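The map and reduce phases described above can be simulated locally to show the pattern that Hadoop parallelizes. This is a hedged sketch in plain Python; real Hadoop jobs are written against the Java API or Hadoop Streaming, and the framework itself, not user code, performs the shuffle/sort and distribution steps:

```python
# Minimal local simulation of the MapReduce pattern: map emits
# (key, value) pairs, the framework groups them by key (shuffle/sort),
# and reduce aggregates each group. On a real cluster each phase runs
# on the node that holds the data block.
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    # Emit (word, 1) for every word in the input split.
    for word in document.lower().split():
        yield (word, 1)

def reduce_phase(pairs):
    # Group pairs by key and sum the counts for each key.
    pairs = sorted(pairs, key=itemgetter(0))  # stands in for shuffle/sort
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["big data in education", "big learning data"]
pairs = [p for d in docs for p in map_phase(d)]
counts = dict(reduce_phase(pairs))
print(counts["data"])  # 2
print(counts["big"])   # 2
```

The word-count job is the canonical MapReduce example; the same split-emit-group-aggregate shape applies to the log and feedback flows discussed in this paper.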
Hadoop has the advantage that it can be used in the Cloud environment and supports distributed data processing for Big Data across clusters. Henschen (2014) suggests that today Hadoop is the preferred solution for Big Data architectures and that, practically, the two technologies have become synonymous.
The most widespread solution is the open-source Apache Hadoop distribution, but there are powerful software corporations, such as IBM and Oracle, that have their own distributions. Also, Yahoo and Facebook seem to have the largest Hadoop clusters in the world. This technology, deployed in Cloud environments, is offered as a service by companies such as Microsoft, Amazon, and Google.
Figure 1 shows the functional structure of Hadoop: the stages that data passes through from acquisition to storage in the NoSQL database.
Fig.1. Big Data Processing with Hadoop (Source: Banica et al., 2014)
An important question that our study should answer is the following: how do existing Learning Management Systems interact with Hadoop?
Obviously, the new architecture for the Learning Management Systems is designed to complement and extend the existing systems, not to replace them. As the previous versions of LMSs are based on RDBMSs and Hadoop has native support for extracting data over JDBC, one solution is dumping the entire database to HDFS and making updates there, according to Bisciglia (2009). Another option could be the transfer of the LMSs' structured data into a consolidated Data Warehouse. There are several tools for these operations, such as Apache Sqoop or an ETL (Extract, Transform, Load) process, which can collect data originating from external databases.
After the output data is stored, conversion and filtering operations can be applied and advanced analysis conducted, with different goals: finding correlations across multiple data sources, predicting an entity's behavior, or analyzing social networks.
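A minimal sketch of such a conversion and filtering step might look as follows (plain Python; the record fields and keyword are invented for illustration, not taken from an actual LMS schema):

```python
# Hedged sketch of a conversion-and-filtering step: raw, semi-structured
# records are filtered by keyword and flattened into CSV rows suitable
# for later analysis. Field names are illustrative.
import csv
import io

raw_records = [
    {"source": "forum", "user": "ana",   "text": "question about the Economics course"},
    {"source": "chat",  "user": "dan",   "text": "weekend plans"},
    {"source": "wiki",  "user": "ioana", "text": "Economics project draft"},
]

def keyword_filter(records, keyword):
    # Keep only records whose text mentions the keyword (case-insensitive).
    return [r for r in records if keyword.lower() in r["text"].lower()]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["source", "user", "text"])
writer.writeheader()
for rec in keyword_filter(raw_records, "economics"):
    writer.writerow(rec)

print(buffer.getvalue().count("\n"))  # header + 2 matching rows = 3 lines
```

The filtered CSV output is exactly the kind of intermediate file the next section feeds into SNA tools.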
In our study, we suggest that the system apply a new filter in order to search for certain keywords, and store the results in .csv, .gml, .gdf or .gexf files, or even spreadsheets, from which they can be further explored using social network analysis (SNA) software, such as Gephi. In a report of the International Institute for Sustainable Development, published in 2012, Ryan and Creech (2012) mentioned that „Social network analysis software is used to identify, represent, analyze, visualize, or simulate nodes (e.g. agents, organizations, or knowledge) and edges (relationships) from various types of input data (relational and non-relational), including mathematical models of social networks.”
In addition to providing a networked visualization, such software generates metrics, identifies subgroups in a network (clusters of actors or individuals), and highlights isolated nodes of the network.
Krebs (2013) considers centrality a commonly used measure, which refers to the importance of a node within the network and the hierarchy of the entire network. Another important measure in SNA is network density, useful for assessing the overall relationships within a network of n nodes.
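Both measures are simple to compute. The following is a minimal pure-Python sketch over an invented undirected teacher-student graph (the node names are illustrative):

```python
# Degree centrality and network density for a small undirected graph.
from collections import defaultdict

edges = [("teacher", "ana"), ("teacher", "dan"), ("ana", "dan"), ("teacher", "ioana")]

# Degree centrality: how many edges touch each node.
degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Density: edges present divided by edges possible, n*(n-1)/2 for
# an undirected graph with n nodes.
n = len(degree)
density = 2 * len(edges) / (n * (n - 1))

print(max(degree, key=degree.get))  # "teacher" is the most central node
print(round(density, 2))            # 4 edges of 6 possible -> 0.67
```

In the e-Learning setting, a high-degree node identifies the actor (teacher or student) most involved in the exchanges, while density characterizes how interconnected the class is as a whole.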
Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, up to 50,000 nodes and 1,000,000 edges. In the opinion of Bastian et al. (2009), it is an application that implements the most frequently used algorithms in descriptive statistics for networks.
After the graph is built, controls can be applied in order to select nodes and/or edges, to view their implications on the network structure, or to measure average accesses and the groups with the most frequent accesses. The graph can be undirected, representing only symmetric relations; directed, for asymmetric and symmetric relations; or weighted, representing intensities, distances or costs of relations. Gephi works with imported files in .csv, .gml, .gdf and .gexf formats, which can be obtained with software converters (e.g. Facebook or Twitter to .gdf files) from unstructured data, by applying an algorithm that maps key-value words onto nodes and their connections onto edges. The Gephi tool has been successfully used to analyze a personal page of the Facebook network.
In our case, we will analyze social network data from the point of view of a teacher, in relation with their students, and the interconnections among the students themselves.
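As a hedged illustration of preparing such data for Gephi, the sketch below writes an edge list in the CSV shape Gephi's spreadsheet import accepts (a header with Source and Target columns); the interaction records themselves are invented:

```python
# Build a Gephi-importable edge list: a CSV file whose "Source" and
# "Target" columns Gephi's import wizard recognizes. The teacher-student
# interaction records below are illustrative placeholders.
import csv

interactions = [
    ("teacher_x", "student_01"),
    ("teacher_x", "student_02"),
    ("student_01", "student_02"),  # students interacting with each other
]

with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target"])
    writer.writerows(interactions)

with open("edges.csv") as f:
    print(sum(1 for _ in f))  # 4 lines: header + 3 edges
```

Importing such a file into Gephi produces the teacher-centered graph described above, with student-to-student edges visible alongside the teacher's own links.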
4. A Model for Big Learning Data on Cloud Architecture
In this section we will introduce a three-layered Cloud-enabled Big Data architecture for e-Learning, based on a Hadoop cluster belonging to a given consortium of universities.
The main task is not Hadoop itself, as it is offered as SaaS by many cloud providers, but integrating it with the existing LMSs that the universities usually already own. Thus, our model refers to a unified architecture, using Hadoop as a data integration platform.
The power of Big Data consists in aggregating flows from social media, such as course information and availability, services, project collaboration and all gathered feedback, for the education environment.
For the moment, teachers can continue their activity without taking Big Data into account, but if they want relevant insights into their real efficiency and the progress of their students, they need to integrate this kind of solution. Thus, Big Learning Data is not an unstoppable information flow that affects operational applications, but a chance to refine the educational process and to adapt to the new requirements. The key is not sheer information volume, but complexity and diversity.
So, the LMS of each university in the consortium should accept the transfer of its Learning Objects, Student Information System and Teacher Information System into the Data Warehouse. Data Warehousing (DW) is a method of organizing structured data, built upon a relational database; therefore it needs to have all the information consistent, organized and standardized. But a DW cannot capture an important data segment (clickstream logs, sensor and location data from mobile devices, customer emails and chat transcripts), and this is the point where Big Data systems prove their importance, allowing the analysis and extraction of educational value from this unstructured information.
There are two scenarios for the integration between Hadoop and the DW, according to Dumitru (2011):
• using an ETL (Extract, Transform, Load) process to build the Data Warehouse from heterogeneous data collected by Hadoop, and then applying advanced analytics to the Data Warehouse;
• considering Hadoop as a data integration platform, able to collect data from all types of data sources and then process it in order to make the data suitable for analysis.
The model presented is intended to work with both scenarios, taking into account that its target is the educational world, where sometimes a subset of the initial data is needed (first scenario) and, most of the time, access to the actual data, not only to a subset of it, is needed (second scenario). We also focused on the third level, which begins after the data flow has been processed by the Map and Reduce functions.
For both SQL and NoSQL databases, the proposed solution involves classification software that can differentiate domains and subdomains, a categorization that is standard for Data Warehouses but must be generated for unconventional data representations. For example, a criterion could be counting word occurrences and comparing this number against Domain Dictionaries. A metadata level is added in order to direct data to specific storage locations (SQL database or unstructured datasets). The identification of trends, patterns or clusterization models is based on keywords, which trigger a parallel processing of the data stored for the required domain/subdomain, using both NoSQL and SQL systems, as we already claimed in another paper, Banica et al. (2014).
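The word-occurrence criterion mentioned above can be sketched as follows (illustrative Python; the domain dictionaries and sample texts are invented examples, not the paper's actual rule set):

```python
# Sketch of keyword-based domain classification: count how many terms
# from each Domain Dictionary occur in a document and assign the
# domain with the highest count. Dictionaries are illustrative.
domain_dictionaries = {
    "economics": {"market", "finance", "budget", "inflation"},
    "computing": {"cloud", "hadoop", "database", "cluster"},
}

def classify(text):
    words = text.lower().split()
    scores = {
        domain: sum(words.count(term) for term in terms)
        for domain, terms in domain_dictionaries.items()
    }
    return max(scores, key=scores.get)

print(classify("the hadoop cluster stores the database"))  # computing
print(classify("inflation affects the national budget"))   # economics
```

In the proposed architecture, the resulting domain label would become the metadata that routes each item to its SQL or NoSQL storage location.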
Figure 2 shows the Big Learning Data model proposed for a consortium of Romanian universities, in a three-level structure:
Fig.2. A model for Big Learning Data on Cloud architecture
• Gathering and processing all kinds of data (structured and unstructured) of interest for the universities involved in the project, with Apache Hadoop;
• Attaching Domain/Subdomain Metadata to the data (with different treatment for NoSQL and SQL databases);
• Processing filtered subsets of the NoSQL databases with SNA software, such as the Gephi tool, and finding a pattern or a trend as the response to the search requirements.
The first level is designated for collecting any type of data with software tools such as Apache Flume (unstructured data), Apache Sqoop (structured data) and ETL tools (structured data), and for processing them using a Hadoop cluster. The largest volumes of data are those originating from social media, containing a relatively low proportion of useful information, compared to the smaller volume of data from educational institutions, which contains a higher percentage of useful information. This level can be based on Public Clouds or Grid computing environments, but we find the Public Cloud approach more suitable for academic budgets.
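The mapping and reducing tasks distributed across the cluster nodes in Fig. 2 can be illustrated, outside of an actual Hadoop deployment, by a deliberately simplified pure-Python sketch of the MapReduce pattern; the sample records and the keyword counting are our own hypothetical example, not output of a real cluster:

```python
from collections import defaultdict

# Hypothetical raw records gathered at the first level
# (social media posts, LMS log lines, web pages).
records = [
    "students discuss the data mining course",
    "new e-learning platform for data mining",
    "campus event this weekend",
]

def map_task(record):
    """Emit (keyword, 1) pairs for every word in a record."""
    for word in record.lower().split():
        yield (word, 1)

def reduce_task(pairs):
    """Sum the counts emitted for each keyword."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# The shuffle step is collapsed here: all mapped pairs go to one reducer.
pairs = [pair for record in records for pair in map_task(record)]
keyword_counts = reduce_task(pairs)
print(keyword_counts["data"])  # "data" occurs in two of the three records
```

In a real Hadoop job the map and reduce functions run on separate nodes and the framework performs the shuffle; the sketch only shows the data flow.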
At the second level we propose a new software layer whose goal is to classify the data stored in the Hadoop cluster nodes into domains and subdomains, using dictionaries and a rule set that inserts specific metadata. For structured data, there are several tools for metadata management in the Data Warehouse, such as MetaStage. The core components of Hadoop itself have no special capabilities for cataloguing, indexing, or querying structured data.
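The dictionary-and-rule-set classification proposed for this layer could, in outline, look like the following sketch; the domain dictionaries, the sample document and the attach_metadata helper are illustrative assumptions rather than part of an existing tool:

```python
# Hypothetical domain/subdomain dictionaries used by the rule set.
DICTIONARIES = {
    ("Computer Science", "Databases"): {"sql", "nosql", "hadoop", "query"},
    ("Economics", "Finance"): {"budget", "finance", "market"},
}

def attach_metadata(document: str) -> dict:
    """Tag a document with the domain/subdomain whose dictionary
    matches the most words, mimicking the proposed metadata layer."""
    words = set(document.lower().split())
    best, best_hits = ("Unclassified", None), 0
    for (domain, subdomain), vocab in DICTIONARIES.items():
        hits = len(words & vocab)
        if hits > best_hits:
            best, best_hits = (domain, subdomain), hits
    domain, subdomain = best
    return {"domain": domain, "subdomain": subdomain, "text": document}

doc = attach_metadata("students query the NoSQL and SQL course examples")
print(doc["domain"], doc["subdomain"])  # Computer Science Databases
```

A production rule set would of course use curated dictionaries per university department and write the resulting metadata back into the NoSQL store or Data Warehouse.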
The third level is focused on further ways to access the information. We consider that an efficient method to query this kind of data would be to build Index Catalogues, using the accompanying metadata, and to place the original information into separate storage spaces according to its type. Our model also involves a search engine for the Index Catalogues, based on keywords and data types.
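An Index Catalogue of this kind is, at its core, an inverted index from keywords to storage locations. A minimal sketch, with invented sample entries and storage identifiers, might be:

```python
from collections import defaultdict

# Hypothetical stored items: (storage location, accompanying metadata keywords).
items = [
    ("nosql-store/doc1", ["university", "data", "mining", "course"]),
    ("sql-dw/row42",     ["university", "budget", "report"]),
    ("nosql-store/doc2", ["data", "mining", "tutorial"]),
]

# Build the index catalogue: keyword -> set of storage locations.
catalogue = defaultdict(set)
for location, keywords in items:
    for kw in keywords:
        catalogue[kw].add(location)

def search(*terms):
    """Return the storage locations matching all given keywords."""
    results = [catalogue.get(t, set()) for t in terms]
    return set.intersection(*results) if results else set()

print(sorted(search("data", "mining")))
# ['nosql-store/doc1', 'nosql-store/doc2']
```

The catalogue answers a keyword query without scanning the original data; only the matching storage locations are then fetched from the NoSQL store or the Data Warehouse.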
We are also interested in how unstructured data can be processed in order to discover useful insights for teachers and students. In NoSQL databases, a search is performed using the Index Catalogues, by specifying the search terms (such as university, course title and teacher name), and the results are stored in Gephi-compatible formats (such as .csv spreadsheets, .gml, .gdf etc.).
The Gephi node table is built starting from the main node (the teacher name), followed by lower-level nodes (students enrolled in the course or interested in that topic). The next step consists in creating the connection table, where all information related directly or indirectly to the teacher is placed.
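The node and connection tables described above can be sketched with Python's standard csv module; the teacher and student names, and the two-column edge layout, are placeholders chosen for illustration:

```python
import csv
import io

# Hypothetical search result: a teacher (main node) and students
# enrolled in, or interested in, the course.
teacher = "T. Ionescu"
students = ["A. Popescu", "B. Georgescu", "C. Stan"]

nodes_csv, edges_csv = io.StringIO(), io.StringIO()

# Node table: Gephi's spreadsheet import expects at least Id and Label.
writer = csv.writer(nodes_csv)
writer.writerow(["Id", "Label"])
writer.writerow([0, teacher])
for i, name in enumerate(students, start=1):
    writer.writerow([i, name])

# Connection (edge) table: each student node links to the main node.
writer = csv.writer(edges_csv)
writer.writerow(["Source", "Target"])
for i in range(1, len(students) + 1):
    writer.writerow([i, 0])

print(len(nodes_csv.getvalue().splitlines()))  # header row plus 4 nodes
```

In practice the two StringIO buffers would be written to nodes.csv and edges.csv and loaded through Gephi's data laboratory for visualization.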
5. Conclusions and Future Work
The traditional e-Learning architecture is obsolete, and a new data model that integrates Hadoop into existing systems is emerging in the IT world.
The solution described here is efficient because activities are separated by level and by resource, the traffic is managed by Hadoop in Clouds, and the analysis can add graphic representations to other types of results.
Considering that the Big Data in the Cloud solutions promoted by the biggest software companies are performant but also expensive, we recommend a unified learning management system based on open-source software. Thus, the evolution of open-source products in this domain will allow universities to benefit from this new trend that empowers today's education.
In our future research we intend to implement a multiple-node Hadoop cluster and evaluate its performance working with structured data from our university LMS and unstructured data from social media.
We will expand this project by approaching the second level: assessing the available tools that allow adding metadata to the relational and NoSQL databases, and creating index catalogues for domains of interest in order to improve retrieval.
We also intend to analyse the performance of several analytics tools (Level 3 of the proposed model), testing them against workloads with query and graphic interpretation of Big Learning Data.
References
Baby, N., Pethuru, R., 2014, Big Data Computing and the Reference Architecture, What is Hadoop, IGI Global, http://www.igi-global.com/dictionary/hadoop/12699
Banica, L., Burtescu, E., Stefan, C., 2014, Advanced Security Models for Cloud Infrastructures, Journal of Emerging Trends in Computing and Information Sciences, Vol. 5, No. 6, pp. 484-491
Banica, L., 2014, Different Hype Cycle Viewpoints for an e-learning System, Journal of Research & Method in Education, Vol. 4, Issue 5, pp. 88-95
Banica, L., Stefan, C., Rosca, D., Enescu, F., 2013, Moving from Learning Management Systems to the e-Learning Cloud, AWERProcedia Information Technology & Computer Science, pp. 865-874, www.awer-center.org/pitcs
Bisciglia, C., 2009, 5 Common Questions About Apache Hadoop, available at http://blog.cloudera.com/blog/2009/05/5-common-questions-about-hadoop/
Banica, L., Paun, V., Stefan, C., 2014, Big Data leverages Cloud Computing opportunities, International Journal of Computers & Technology, Vol. 13, No. 12, http://cirworld.org/journals/index.php/ijct/article/view/3036
Bastian, M., Heymann, S., Jacomy, M., 2009, Gephi: an open source software for exploring and manipulating networks, International AAAI Conference on Weblogs and Social Media, San Jose, USA
Briggs, S., 2014, Big Data in Education: Big Potential or Big Mistake?, http://www.opencolleges.edu.au/informed/features/big-data-big-potential-or-big-mistake/
Bryson, S., Kenwright, D., Cox, M., Ellsworth, D., Haimes, R., 1999, Visually exploring gigabyte data sets in real time, http://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/
Cox, M., Ellsworth, D., 1997, Application-Controlled Demand Paging for Out-of-Core Visualization, Proceedings of the 8th IEEE Visualization '97 Conference, available at http://www.evl.uic.edu/cavern/rg/20040525_renambot/Viz/parallel_volviz/paging_outofcore_viz97.pdf
Dijcks, J., 2013, Big Data for Enterprise, http://www.oracle.com/us/products/database/big-data-for-enterprise-519135.pdf
Dumitru, A., 2011, Hadoop - Enterprise Data Warehouse Data Flow Analysis and Optimization, OSCON Open Source Convention, Portland, available at http://www.oscon.com/oscon2011/public/schedule/detail/21348
Ferkoun, M., 2014, Cloud computing and big data: An ideal combination, available at http://thoughtsoncloud.com/2014/02/cloud-computing-and-big-data-an-ideal-combination/
Frank Lo, 2014, Big Data Technology, available at https://datajobs.com/what-is-hadoop-and-nosql
Henschen, D., 2014, 16 Top Big Data Analytics Platforms, available at http://www.informationweek.com/big-data/big-data-analytics/16-top-big-data-analytics-platforms/d/d-id/1113609
Krebs, V., 2013, Social Network Analysis, A Brief Introduction, available at http://www.orgnet.com/sna.html
Mashey, J. R., 1998, Big Data and the Next Big Wave of InfraStress, Usenix conference, available at http://static.usenix.org/event/usenix99/invited_talks/mashey.pdf
Mohamed, A. M., Altrafi, O. G., Ismail, M. O., 2014, Relational vs. NoSQL Databases: A Survey, International Journal of Computer and Information Technology, Vol. 3, Issue 3, pp. 598-601
Ryan, C., Creech, H., 2012, An Experiment With Social Network Analysis, available at https://www.google.ro/?gws_rd=ssl#q=as+experiment+Network+analysis+software
Vaughan, J., 2014, Big data and cloud computing look for bigger foothold in enterprises, http://searchdatamanagement.techtarget.com//Big-data-and-cloud-computing-look-for-bigger-foothold-in-enterprises
Yahoo Developer Network, 2007, Hadoop Tutorial, available at https://developer.yahoo.com/hadoop/tutorial/