Big-data: transformation from heterogeneous data to semantically-enriched simplified data

Kaleem Razzaq Malik¹ · Tauqir Ahmad¹ · Muhammad Farhan¹,² · Muhammad Aslam¹ · Sohail Jabbar²,³ · Shehzad Khalid³ · Mucheol Kim⁴

Multimed Tools Appl (2016) 75:12727–12747, DOI 10.1007/s11042-015-2918-5
Received: 15 May 2015 / Revised: 7 August 2015 / Accepted: 24 August 2015 / Published online: 8 September 2015
© Springer Science+Business Media New York 2015
Abstract In big data, data originates in real time from many distributed and heterogeneous sources in the shape of audio, video, text and sound, which makes it massive and complex for traditional systems to handle. For better utilization, the data needs a semantically-enriched representation, but keeping that representation simple is essential. Such a representation is possible using the Resource Description Framework (RDF) introduced by the World Wide Web Consortium (W3C). Bringing and transforming data from different sources in different formats, growing at a rapid rate, into RDF form is still an open issue. It requires improvements that cover the transition of information among all applications while
* Muhammad Farhan
  farhansajid@gmail.com

Kaleem Razzaq Malik
  krmalik@gmail.com

Tauqir Ahmad
  tauqir_ahmad@hotmail.com

Muhammad Aslam
  maslam@uet.edu.pk

Sohail Jabbar
  sjabbar.research@gmail.com

Shehzad Khalid
  shehzad_khalid@hotmail.com

Mucheol Kim
  mucheol.kim@gmail.com

1 Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan
2 Department of Computer Science, COMSATS Institute of Information Technology, Sahiwal, Pakistan
3 Department of Computer Science, Bahria University, Islamabad, Pakistan
4 Department of Multimedia, Sungkyul University, Anyang-si 430-742, Republic of Korea
introducing simplicity to reduce the complexity of storing data. With the improvements proposed here, big data is transformed first into Extensible Markup Language (XML) and then into linked RDF triples in real time, which is highly needed to make the transformation more data friendly. In this study we have developed a process that translates data without any type of information loss. This requires managing data and metadata in such a way that they do not increase complexity and keep strong linkage between them. The metadata is kept generalized so that it remains useful rather than being dedicated to a specific type of data source. The study includes a model explaining the functionality of this process and the corresponding algorithms showing how it is implemented. A case study is used to show the transformation of relational-database textual data into RDF, and at the end the results are discussed.
Keywords Resource Description Framework Schema (RDFS) · Big data · Data representation
1 Introduction
The constant increase in the volume and detail of data captured by organizations, for example through the rise of social media, the Internet of Things (IoT), and multimedia, has produced an overwhelming stream of data in either structured or unstructured form. Data creation is happening at a record rate, referred to here as big data, and has emerged as a widely recognized trend. Big data is attracting attention from academia, government, and industry [15]. To comprehend data and to gain knowledge from it, the data must be sorted, transformed, merged and processed both statistically and analytically. The potential growth of databases in all areas makes the analysis of huge amounts of increasingly complex data more difficult and less clear. A notable challenge for researchers and practitioners is that this growth rate exceeds their capacity to design fitting data representations for data analysis and analysis-intensive workloads [20].
Today, semantic web technology is represented by the still-evolving Web Ontology Language (OWL). ('Ontology' is often used to refer to any component of knowledge representation.) The growth of the semantic web includes OWL and its companions, for example RDF (the 'Resource Description Framework'). RDF is introduced by the W3C as a standard structure for representing semantic information, linking data hierarchically at the level of metadata. Generally, RDF is used to represent information and resources on the web, where these resources need to be interpreted by machines with the use of reasoning and rules [25]. RDF represents data in hierarchical, classification-based relationships [19]. These RDF-based data classifications cannot capture the complete semantics of a relationship, but it is really hard to handle the complexity of higher-level predicate languages, for example OWL [26]. This makes RDF, with some updates, the most suitable and simple data representation for big data.
There are multiple strategies available for transforming data into an ontology. One is to use the intermediate standard XML as a means to transform into any other platform-specific data form. Another is to transform ERD graphs into ontology graphs or class diagrams [11]. Regular expressions, a mathematical approach, can also be used to convert between a relational database and OWL. Yet another approach focuses on storing the triples taken from RDF, in the resource-property-value shape defined in the W3C standard, in a database for storage purposes [24]. Under the umbrella of big data, representation plays a major role in the analysis, storage, retrieval, purification and visualization of data at distributed and huge scale [10, 21]. This representation is done using the semantic-web language RDF. The term data science describes the transformation of data-based information into knowledge [29, 33]; such knowledge can be achieved through data linkage capable of integrating heterogeneous data structures. In a nutshell, this hints at what big data representation means: knowing how data is found on the semantic web as linked ontologies [6, 40].
Linked Data aims to build a web of connected data using unique identifiers, connecting linked open data through the HTTP protocol for linkage and collaboration. Advances in the Semantic Web have made ontology another helpful resource for describing multimedia semantics [16, 27]. An ontology constructs a formal and explicit representation of semantic orders for concepts and their connections in data-intensive events, and permits reasoning to infer implicit information [9].
Nowadays, the use of graph-oriented representations and rich semantic vocabularies is gaining momentum. On the one hand, graphs are adaptable models for integrating information with different degrees of structure, and they enable this heterogeneous information to be connected in a uniform manner [5, 36]. On the other hand, vocabularies describe what the information means. The most practical trend along this line proposes the use of the Resource Description Framework (RDF), a standard model for information encoding, together with semantic technologies for the publication, exchange, and consumption of this Big Semantic Data at global scale. The primary information flows performed on this web of data are production, exchange, and consumption [4, 23, 35].
The rest of the paper is organized as follows. Section 2 gives the related work on big data and semantic linkage systems. The problem definition and mathematical modeling are presented in Sections 3 and 4. Section 5 proposes the methodology and Section 6 the corresponding algorithms. A case study is presented in Section 7. Conclusions are drawn in the last section.
2 Literature review
To understand in depth how all these approaches relate to their common goal, their data models must be studied, along with what each is made to achieve. Doing so gives a much more reliable picture of which approach can surpass the others in gaining better results. On the ontology end, the Resource Description Framework (RDF) model uses directed labeled graphs (DLG) due to the similarity between the two, but differs when it comes to actually defining a DLG by providing multiple routes to nodes [12, 37, 38].
Big data brings about the conjunction of three V's, depicting complex and huge data challenges and opportunities in three dimensions. The first V is Volume, the clearest component, owing to the huge amount of information that is continuously accumulated and stored in enormous data sets. These data sets are exposed for diverse purposes and uses. Scalability is the significant task associated with big data volume, considering that effective storage tools are the first essential in this circumstance [1].
The second V is Velocity. Storage choices essentially influence data retrieval, the ultimate objective for the client, who expects it to be executed at the fastest possible speed, especially in the case of real-time arrangements [7, 14]. Velocity describes how data flows, at high rates, in an increasingly distributed environment. Nowadays, velocity increases in a manner comparable to volume. Streaming data processing is the primary challenge tied to this dimension, because specific storage is required not only for effective volume management but also for real-time response. Variety refers to the different degrees of structure (or lack thereof) within the source data. This is essentially because big data may originate from numerous sources (e.g., sciences, politics, economy, social networks, or web server logs, among others) and each one carries its own particular semantics, hence the data follows a particular structural model [24, 41].
The third V is Variety. The fundamental challenge of big data variety is to attain an effective mechanism for connecting various classes of data that differ in internal structure. Linked Data is about using the WWW to unite related data that wasn't previously connected, or using the WWW to lower the barriers to linking data already connected by other means. More specifically, Wikipedia characterizes Linked Data as 'a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs (Uniform Resource Identifiers) and RDF' [3].
To achieve speedy, simple and effective data capture, sharing, searching, analysis, transfer and visualization, a unified, linked, and less complex data representation, such as RDF, is needed [1]. Researchers have made efforts to transform data banks into DTD (Document Type Definition) or RDFS (Resource Description Framework Schema), either partly or completely. By observing the identifiers in an XML document, a machine can choose better-fitting tags for the adequate role as class or property [39]. An XML document can thus be transformed to improve its capability of being interpreted as RDF. It is essential to keep XML's original structure unaffected during the transformation process, for improved results and handling of the data [28].
Transformation requires mapping two different data models onto a common agreement. Different transformation techniques have been introduced; state-of-the-art techniques include Direct Mapping, R2O, eD2R, Relational.OWL, D2RQ, Triplify, R2RML, and R3M [17]. Another representation of big data is Header Dictionary Triples (HDT), an advanced form of RDF [1]. HDT is adopted for publishing and exchanging RDF-based data translated from heterogeneous data. The transformation process of this study can prove useful in developing simplified RDF data for the fastest production of results when interpreting huge data.
3 Problem
This study looks at problems at the level of data collection and representation when discussing big data. Big data comes mostly in the form of video, audio, and textual content, either linked or scattered [22, 31, 32]. This data is further used for real-time prediction or analysis to cater for issues of any kind; these issues mostly concern geographical and biological matters and threats [2, 30]. It thus becomes ever more important to evolve data according to the needs of big data, so that less complexity is introduced when data arrives from distributed sources in real time and grows simultaneously. We propose improvements in XML and RDF to overcome the issues of rapid updates while keeping linkage maintained [12]. We also attempt to examine the problem through mathematical modeling. Finally, some results are discussed to assess the outcome.
4 Mathematical representation
Most data in big data appears in four formats: video, audio, text, and images. Table 1 lists the terminology used in the mathematical modeling.
Now let us first define the generalized form of XML data for a specific piece of information:

$$X = \frac{\ln(t)}{\ln(2)} + t, \qquad t = e^{X\ln(2) - W\left(\ln(2)\,e^{X\ln(2)}\right)}, \qquad \because t \in T$$

where t belongs to the family of XML tag sets and W denotes the Lambert W function. Now let us take a function X_t:

$$X_t = \left(\frac{k\ln(k)}{\ln(2)} + n - k\right) t, \qquad \because k, n \in \mathbb{N} \qquad (1)$$

Here k log₂(k) represents all tags with opening and closing parts, whereas n − k are the remaining single tags in the XML representation of some specific piece of information.

Equation (1), X_t, can provide a complete set of tags to represent specific information. If α = k log₂(k) + (n − k), then

$$\alpha = \frac{k\ln(k)}{\ln(2)} + n - k \qquad (2)$$

By putting the value of Eq. (2) into Eq. (1),

$$X_t = \alpha t \qquad (3)$$
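As a quick worked illustration of Eqs. (1)–(3) (the numbers are ours, not the paper's): an XML fragment with k = 4 paired tags out of n = 6 tags in total gives α = 4 log₂(4) + (6 − 4) = 8 + 2 = 10, so by Eq. (3) the tag function reduces to X_t = 10t.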
Table 1 Definitions of terms used in mathematical modeling

Notation   Description
V          A set of all video contents
W          A set of all words
I          A set of all images
A          A set of all sound data
T          XML set of interlinked data using tags
t          Tag representing an element of XML
k          Total number of tags having opening and closing parts
n          Total number of tags needed to represent a piece of information
S          Subject of an RDF triple
R          Resource for an RDF triple
P          Predicate of an RDF triple
For a constant increase of data over time, Eq. (3) becomes

$$X_t = \alpha t + 2\alpha t + 3\alpha t + 4\alpha t + \dots + m\alpha t$$
$$X_t = \tfrac{1}{2}\, m(m-1)\,\lambda\,\alpha t \qquad (4)$$

If the change is constant, then the constant factor λ lies between

$$0 < \lambda \le 1 \qquad (5)$$

where m ∈ ℕ and m is the maximum change which can occur in an instance:

$$X_t = \begin{cases} \alpha t & \lambda \to 0 \\ \frac{m(m-1)}{2}\,\lambda\,\alpha t & \lambda \le 1 \end{cases} \qquad (6)$$
Big data, when translated into XML form, contains values and schema for all types of content, which must be taken care of. Eq. (6) shows the importance of the change factor, in our case λ, which remains ineffective when closest to zero. For the set of videos V, tags can be represented as t_v with XML function X_{t_v}; similarly, tags for the set of audio A can be represented as t_a with XML function X_{t_a}, tags for the set of words as t_w with X_{t_w}, and finally tags for the set of images as t_i with X_{t_i}:

$$X_{t_v} = \begin{cases} \alpha t_v & \lambda \to 0 \\ \frac{m(m-1)}{2}\,\lambda\,\alpha t_v & \lambda \le 1 \end{cases} \qquad
X_{t_a} = \begin{cases} \alpha t_a & \lambda \to 0 \\ \frac{m(m-1)}{2}\,\lambda\,\alpha t_a & \lambda \le 1 \end{cases}$$

$$X_{t_w} = \begin{cases} \alpha t_w & \lambda \to 0 \\ \frac{m(m-1)}{2}\,\lambda\,\alpha t_w & \lambda \le 1 \end{cases} \qquad
X_{t_i} = \begin{cases} \alpha t_i & \lambda \to 0 \\ \frac{m(m-1)}{2}\,\lambda\,\alpha t_i & \lambda \le 1 \end{cases}$$

$$X_{\text{bigdata}} = X_{t_v} + X_{t_a} + X_{t_w} + X_{t_i} \qquad (7)$$
Equation (7) is the simplified form of XML data translated for data represented in any type of incoming big data content. Here we can also say that

$$X_{t_v} \approx X_{t_a} + X_{t_w} + X_{t_i} \qquad (8)$$

Equation (8) can only be true if the data comes from the same source, in which case Eq. (7) becomes

$$X_{\text{bigdata}} = X_{t_v} + X_{t_v} = 2X_{t_v}$$

Let T be the set of all tag sets {T_1, T_2, T_3, T_4, …, T_n}, where each element T_i of T is a set with T_i ⊆ T.
In RDF, each tag set T_i is transformed into multiple linked triples (S, P, O), forming a set R_i:

$$R_i = \{(S_1, P_1, O_1), (S_2, P_2, O_2), (S_3, P_3, O_3), \dots, (S_k, P_k, O_k)\} \qquad (9)$$

where k is the number of possible triples for the corresponding tag set. The complete RDF set can be seen as

$$R = \{R_1, R_2, R_3, R_4, \dots, R_m\}, \qquad \because m \in \mathbb{N}$$

Here m is the natural number representing the maximum set produced for a specific piece of XML data.
According to Eqs. (5) and (9) it can be said that

$$R_T = R + \lambda R' \qquad (10)$$

In Eq. (10), λ is the same change factor defined in Eq. (4), and R′ is the set of new RDF triples added to the old set R at instance T.

A rise in the value of λ appears due to conflicts, duplications, rapidly increasing linkage and collisions in the data produced in RDF form. It can be reduced by decreasing the degree of complexity of the transformation at the level of big data representation. A controlling factor is needed which can link data positively at the time it is produced; this control can be achieved at the point of XML creation. Each datum can further use classification from its metadata to see the linked data's origin and purpose [13, 18, 34].
5 Methodology to overcome issues
5.1 Implementation and process
When trying to stay more in control of the transformation (as shown in Fig. 1), going from RDB Schema to XML Schema works better than going from RDB Schema to DTD, since XML Schema comes with a rich interface to tailor the transformation to need. Among semi-structured representations, the richer approach is preferable, being designed to gain better results and to accommodate future needs [8].
[Fig. 1 Two approaches with a structural-level difference in the transformation process: RDBS → DTD (basic, semi-structured) and RDBS → XML Schema (richer, near to structured)]

Figure 2 then shows the complete process of how the bidirectional transformation is performed, as the input-process-output of our research work. In Fig. 2, heterogeneous data is sent to the XML transformation system, which builds an XML document from the incoming data. This document is further transformed into RDFS through another transformation system. The information thereby takes the form of linked data, which is ready for inference. When the data is passed to the inference engine in the shape of linked data, new results are generated against rules and predicates. This newly generated data can be of any form and size, so it can either be passed through the system again to obtain more results or be used by the inference engine directly, depending on the nature of the data.
In Fig. 3, a big data source produces data that is collected by the XML transformation process. This process generates the XML equivalent of the data along with its metadata (data about data), which is necessary for the data representation. The step is important for keeping track of data without introducing complexities. To resolve the transformation issues through simplicity and standardization of the XML-centered data model, either form of XML document description, i.e., DTD or XML Schema, can be used. In the next section, transformation algorithms 1 and 2 cover this process of XML transformation using DTD or XML Schema respectively.
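To make this XML transformation step concrete, the following minimal Python sketch (our own illustration; the element names and metadata fields are assumptions, not prescribed by the paper) wraps one incoming datum together with its generalized metadata into XML:

import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def datum_to_xml(record, source_id, content_type):
    """Wrap one incoming datum in XML together with generalized metadata
    (origin, content type, time), keeping data and metadata linked."""
    item = ET.Element("item")
    meta = ET.SubElement(item, "metadata")
    ET.SubElement(meta, "source").text = source_id      # linked-data origin
    ET.SubElement(meta, "type").text = content_type     # video/audio/text/image
    ET.SubElement(meta, "timestamp").text = datetime.now(timezone.utc).isoformat()
    data = ET.SubElement(item, "data")
    for field, value in record.items():
        ET.SubElement(data, str(field)).text = str(value)
    return ET.tostring(item, encoding="unicode")

print(datum_to_xml({"title": "order", "amount": 42.0}, "bank-db-1", "text"))

Keeping the metadata element generic (source, type, timestamp) rather than tailored to one data source is what keeps the representation reusable across video, audio, text and image content.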
6 Algorithms
The following algorithms are made to simplify and transform heterogeneous data in order to reduce the issues discussed in the sections above. Each of the algorithms in this manuscript can further be divided into two steps: (i) a preprocessing part and (ii) a transformation part. The preprocessing portion performs the activities that bring the incoming data into a form ready for the transformation process to execute. The transformation part, the main portion, is where the mapping and transformation between the different data models happen.
[Fig. 2 Complete process of big data from heterogeneous data getting transformed and inferred: heterogeneous data → transformation (data to XML Schema) → XML Schema → transformation (XML Schema to RDFS) → RDFS → linked data → inference engine → data linkage in knowledge base]
[Fig. 3 Data representation of big data at the level of XML-based transformation: a big data source feeds data into the XML transformation, which produces XML, metadata XML, and the XML links between them]
Algorithm 1: Big data to DTD
Transforming big data into DTD has a trick of its own, as reflected in Table 1 and the mapping discussion of this paper showing how these technologies relate to each other. The following algorithm provides the detailed implementation of a transformation from a given RDBS into DTD.
Algorithm 1 first creates an element representing the data file and adds to this element, as parameters, all the files contained by the big data source. Then, using the ATTLIST tag, each defined element stores all the fields of its table as attributes. An attribute of kind ID can represent a primary key, and similarly IDREF represents a reference. The remaining attributes are given the data type PCDATA, which is equivalent to a string data type.

Algorithm 1 takes the data file of big data as input and transforms it into the DTD of an XML document as output. The algorithm contains two nested loops. The outer loop is responsible for the main representative fields of the file, transforming them into the corresponding elements of the DTD document. The inner loop transforms every nested field of a representative into attributes of the corresponding element. The test cases discussed below then determine which field becomes an identifier, a reference to another resource, or an attribute of the resource itself, using the ID, IDREF and PCDATA keywords of DTD schema syntax.
There exist multiple test cases in algorithm 1, as follows:

Case 1 Primary and reference indexed data

These indexes concern the identification and allocation of resource dependencies. When a file is transformed into DTD, ID is used for within-file identifiers and IDREF for the identifiers of the data file's resource dependencies.

Case 2 Simple field data

All data other than indexes is transformed into simple attributes of type PCDATA.
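A minimal Python sketch of this procedure is given below; the schema dictionary layout and the field-kind labels ('primary', 'reference', 'simple') are our assumptions, since the paper's pseudocode listing is not reproduced here:

def fields_to_dtd(file_name, fields):
    """One element per file; every field becomes an attribute in its ATTLIST."""
    attrs = []
    for name, kind in fields:                 # inner loop over nested fields
        if kind == "primary":                 # Case 1: primary index -> ID
            attrs.append(f"{name} ID #REQUIRED")
        elif kind == "reference":             # Case 1: reference index -> IDREF
            attrs.append(f"{name} IDREF #REQUIRED")
        else:                                 # Case 2: simple field -> CDATA
            attrs.append(f"{name} CDATA #IMPLIED")
    return (f"<!ELEMENT {file_name} (#PCDATA)>\n"
            f"<!ATTLIST {file_name}\n  " + "\n  ".join(attrs) + ">")

def bigdata_to_dtd(source):
    """Outer loop: a root element lists all files of the big data source."""
    children = ", ".join(f"{f['name']}+" for f in source["files"])
    parts = [f"<!ELEMENT {source['name']} ({children})>"]
    parts += [fields_to_dtd(f["name"], f["fields"]) for f in source["files"]]
    return "\n".join(parts)

# The bankorder example of the case study (cf. Figs. 4 and 6):
bank = {"name": "Bank", "files": [{"name": "BankOrder", "fields": [
    ("OrderId", "primary"), ("PId", "reference"), ("Title", "simple"),
    ("Branch", "simple"), ("City", "simple"), ("Amount", "simple")]}]}
print(bigdata_to_dtd(bank))

Running the sketch on the bankorder example prints a DTD of the same general shape as Fig. 6.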
bankorder (
    orderid text,
    title text,
    branch text,
    city text,
    amount float
);

Fig. 4 Schema of the RDB relation taken as an example to show the results of the transformation process
Algorithm 2: Big data to XML Schema
This second algorithm handles the transformation of big data when the XML document is generated using an XML Schema. In this case we only consider complex elements to represent a file, along with its primary and foreign indexing relationships as attributes.

Algorithm 2 takes the data file of big data as input and transforms it into the XML Schema of an XML document as output. Again the algorithm contains two nested loops. The outer loop is responsible for the main representative fields of the file, transforming them into the corresponding elements of the XML Schema document. The inner loop transforms every nested field of a representative into attributes of the corresponding XML elements. The test cases discussed below then determine which field becomes an identifier, a reference to another resource, or an attribute of the resource itself, using attributes marked 'use=required' and maintaining an array to reproduce all referenced resources in XML Schema syntax.
There exist multiple test cases in algorithm 2:

Case 1 Primary and reference indexed data

These indexes concern the identification and allocation of resource dependencies. When a file is transformed into XML tags, the within-file identifiers become attributes marked as required, and attributes carrying extra information serve as resource-dependency identifiers for the data file.

Case 2 Simple field data

All data other than indexes is transformed into a complex element containing elements of a type matching the resource file.
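Under the same caveats as before, a minimal sketch of algorithm 2, with an assumed SQL-to-XSD type mapping, might look as follows:

XSD_TYPES = {"text": "xs:string", "float": "xs:decimal"}  # assumed mapping

def file_to_xsd(file_name, fields):
    """Case 1 makes primary keys required attributes; Case 2 (simple fields)
    become child elements inside the complex type's sequence."""
    children, attributes = [], []
    for name, sql_type, kind in fields:
        if kind == "primary":
            attributes.append(
                f'<xs:attribute name="{name}" type="xs:string" use="required"/>')
        else:
            children.append(
                f'<xs:element name="{name}" type="{XSD_TYPES[sql_type]}"/>')
    lines = [f'<xs:element name="{file_name}">', "  <xs:complexType>",
             "    <xs:sequence>"]
    lines += ["      " + c for c in children]
    lines.append("    </xs:sequence>")
    lines += ["    " + a for a in attributes]
    lines += ["  </xs:complexType>", "</xs:element>"]
    return "\n".join(lines)

# Usage with the bankorder relation of Fig. 4 (output mirrors Fig. 5):
print(file_to_xsd("bankorder", [
    ("orderid", "text", "primary"), ("title", "text", "simple"),
    ("branch", "text", "simple"), ("city", "text", "simple"),
    ("amount", "float", "simple")]))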
<xs:element name="bankorder">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="branch" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="amount" type="xs:decimal"/>
    </xs:sequence>
    <xs:attribute name="orderid" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

Fig. 5 bankorder schema transformed using the transformation algorithm between RDBS and XML Schema
<!ELEMENT Bank (BankOrder+)>
<!ELEMENT BankOrder (#PCDATA)>
<!ATTLIST BankOrder OrderId ID #REQUIRED
                    PId IDREF #REQUIRED
                    Title CDATA #FIXED
                    Branch CDATA #FIXED
                    City CDATA #FIXED
                    Amount CDATA #FIXED>

Fig. 6 bankorder schema transformed using the transformation algorithm between RDBS and DTD
Algorithm 3: DTD to RDFS
In the above algorithm, the root element is made the root class; then, through a looping mechanism, a class is created for each table found in the root element's attribute list. Each element found in the list becomes a property of its class, and a data type, in this case always a string, is assigned to the property against each type found. In this way we can generate triples representing the DTD and, indirectly, our source files. Furthermore, this list can be translated into a directed graph of semantic web resources.

Subject       Predicate       Object
BankDB        rdfs:Class      rdf:resource
BankOrder     rdfs:Class      BankDB
BankOrder     rdf:Property    ID
ID            rdfs:DataType   String
ID            rdfs:range      BankOrder

Fig. 7 A sample 'bankorder' triples list generated using a transformation algorithm between DTD and RDFS / XML Schema and RDFS
There exist multiple test cases in algorithm 3, as follows:

Case 1 Range and domain of triples

The range in hierarchical triples represents an identifier, whereas the domain is a reference-based identifier corresponding to the domain or area to which the information belongs.

Case 2 Simple triples

All other information is translated into triples related to the corresponding XML information of the data.
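The following sketch reconstructs the spirit of algorithm 3 from this description; the regex-based DTD parsing and the helper names are our assumptions rather than the paper's own code:

import re

def dtd_to_triples(dtd_text, root_class="BankDB"):
    """Emit (subject, predicate, object) triples: the root becomes the root
    class, each element a class, each ATTLIST attribute a string property."""
    triples = [(root_class, "rdfs:Class", "rdf:resource")]
    for element in re.findall(r"<!ELEMENT\s+(\w+)", dtd_text):
        if element != root_class:
            triples.append((element, "rdfs:Class", root_class))
    for element, body in re.findall(r"<!ATTLIST\s+(\w+)([^>]*)>", dtd_text):
        for attr, kind in re.findall(r"(\w+)\s+(IDREF|ID|CDATA)", body):
            triples.append((element, "rdf:Property", attr))
            triples.append((attr, "rdfs:DataType", "String"))  # strings only
            if kind in ("ID", "IDREF"):   # Case 1: range links the identifier
                triples.append((attr, "rdfs:range", element))
    return triples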
Algorithm 4: XML Schema to RDFS
In this last algorithm, the document name is used to represent the root class; then, through a looping mechanism, a class is created for each table found as a complex element. Each element found becomes a property of its class, and a data type is assigned against each type of that property. In this way we can generate triples representing the XML Schema and, indirectly, our source files. Furthermore, this list can also be translated into a directed graph of semantic web resources, as was done in the case of the DTD. The test cases for this algorithm are the same as for algorithm 3.
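As with the DTD variant, a compact, assumption-laden sketch of algorithm 4 can be written with Python's standard xml.etree module:

import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace

def xsd_to_triples(xsd_text, doc_name="BankDB"):
    """The document name becomes the root class; each complex element becomes
    a class, and its named descendants (elements/attributes) its properties."""
    triples = [(doc_name, "rdfs:Class", "rdf:resource")]
    root = ET.fromstring(xsd_text)
    for elem in root.iter(XS + "element"):
        if elem.find(XS + "complexType") is None:
            continue                 # only complex elements represent files
        cls = elem.get("name")
        triples.append((cls, "rdfs:Class", doc_name))
        for child in elem.iter():
            name = child.get("name")
            if child is not elem and name:
                triples.append((cls, "rdf:Property", name))
                triples.append((name, "rdfs:DataType", "String"))
    return triples

# Usage with a fragment of the Fig. 5 schema:
xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="bankorder"><xs:complexType>
    <xs:sequence><xs:element name="title" type="xs:string"/></xs:sequence>
    <xs:attribute name="orderid" type="xs:string" use="required"/>
  </xs:complexType></xs:element>
</xs:schema>"""
for t in xsd_to_triples(xsd):
    print(*t)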
These algorithms show that our solution covers two possible paths. Both start from the raw data state and pass into a common standard form, XML; after performing the transformation we end at the RDFS state, useful for analysis and inference on the given data. The intermediate state can be of either DTD or XML Schema type for the XML document. The four related algorithms presented in this section cover the complete transformation mechanism discussed in Fig. 2.
7 Case study
To show the working of the elaborated transformation process for big data on a small sample, let us first look at the source data format, an RDBS: the schema of our example relation, named 'bankorder', is given in Fig. 4.

Using algorithms 1 and 2 from the implementation section, the RDBS shown in Fig. 4 is transformed into the results shown in Figs. 5 and 6. Through this we have gained an intermediate format understood by web-based technologies, in the shape of DTD and XML Schema.

The remaining task is to transform these results into RDFS format, which in our case is done using algorithms 3 and 4 from the implementation section. The resulting RDFS (not the complete list of triples gained) is shown in Fig. 7. Fig. 8 then shows a directed graph of the resources gained through the complete triple list.
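For illustration, the hypothetical sketches accompanying algorithms 1 and 3 above can be chained to walk this pipeline end to end:

bank_dtd = bigdata_to_dtd(bank)              # RDBS (Fig. 4) -> DTD (Fig. 6)
bank_triples = dtd_to_triples(bank_dtd, root_class="Bank")  # DTD -> RDFS (Fig. 7)
for s, p, o in bank_triples:
    print(s, p, o)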
Fig. 9 shows how algorithm 3 could further be improved to handle constraints, which could be done using simple type tags.
<xs:simpleType name="orderby">
  <xs:restriction base="xs:string">
    <xs:pattern value="[a-z]{10}"/>
  </xs:restriction>
</xs:simpleType>

Fig. 9 An example of implementing constraints using XML Schema
[Fig. 8 The 'bankorder' triples list translated into a directed graph: nodes BankDB, BankOrder and ID, connected by rdfs:Class, rdf:Property and rdfs:range edges]
8 Conclusion and future concerns
Recent research demonstrates that data assets in the wild are growing at an amazing rate. The rapid increase in the number of data assets has created a critical need to develop intelligent techniques to organize and process them. In this paper, a semantic representation model is used for organizing incoming data assets. Mathematical modeling is designed to establish relations among different assets (e.g., web pages or documents in a digital library), aiming at expanding the loosely associated network with no semantics (e.g., the Web) into an association-rich network.
Bringing real-time data assets into a system requires fast response times, achieved by removing any features that only delay processing. Big data needs to be translated into a form that can be analyzed with the lowest complexity induced in the data model, for performance improvement. This data model, in our case, is RDF in its basic form. We have shown throughout the process and the mathematical modeling that a factor of simplicity can improve the response of the analysis process by reducing the delay factor. For implementation, four algorithms are given which are used for basic-level transformation of data assets into RDF form. Further improvement of these algorithms can be useful for covering large-scale data models concerning audio, video, image and textual formats.
Compiling the main idea collectively, we have shown comparatively how newly developed technologies such as RDF for the semantic web can be made compatible with the older, traditional technologies of the web and databases, namely XML and RDBMS respectively. It is important for new researchers to gain a proper understanding of their working and capabilities in order to make suitable improvements to upcoming web technologies. Issues left open include composite key handling and improving DTD to the point where it can resolve RDB constraints and data types. Issues like these can be considered future concerns of this research work.
References
1. R Akerkar (2013) Big data computing: CRC Press
2. E Antezana, Kuiper M, Mironov V (2009) Biological knowledge management: the emerging role of the
semantic web technologies. Brief Bioinform 10:392–407
3. S Auer, A-C N Ngomo, P. Frischmuth, J Klimek (2013) Linked Data in Enterprise Integration, Big Data
Computing, p. 169
4. BFdF Souza, ACO Salgado, MdCMC Batista (2013) Information Quality Criteria Analysis in Query
Reformulation in Dynamic Distributed Environments
5. Bizer C, Boncz P, Brodie ML, Erling O (2012) The meaningful use of big data: four perspectives–four
challenges. ACM SIGMOD Record 40:56–60
6. Broekstra J, Klein M, Decker S, Fensel D, van Harmelen F, Horrocks I (2001) Enabling knowledge representation on the Web by extending RDF Schema. WWW '01, May 1–5, 2001, Hong Kong
7. S Christodoulou, N Karacapilidis, M Tzagarakis, V Dimitrova, G de la Calle (2014) Data Intensiveness and
Cognitive Complexity in Contemporary Collaboration and Decision Making Settings, Mastering Data-
Intensive Collaboration Decision Making, ed: Springer, pp. 17–48
8. A Cuzzocrea, C Diamantini, L Genga, D Potena, E Storti (2014) A composite methodology for supporting
collaboration pattern discovery via semantic enrichment and multidimensional analysis, in Soft Computing
Pattern Recognition (SoCPaR), 2014 6th Int Conf, pp. 459–464
9. de Diego R, Martínez J-F, Rodríguez-Molina J, Cuerva A (2014) A semantic middleware architecture
focused on data and heterogeneity management within the smart grid. Energies 7:5953–5994
10. M Dörk (2012) Visualization for Search: Exploring Complex and Dynamic Information Spaces, Citeseer
11. A Eberhart (2003) Ontology-based Infrastructure for Intelligent Applications, Universitätsbibliothek
12. Frasincar F, Houben G, Vdovjak R, Barna P (2002) RAL: an Algebra for Querying RDF, Proc 3rd Int Conf
Web Information Syst Eng, IEEE
13. Frey JG, Bird CL (2013) Cheminformatics and the semantic web: adding value with linked data and
enhanced provenance. Wiley Interdisciplinary Rev: Computational Mol Sci 3:465–481
14. D Gentner, F van Harmelen, P Hitzler, K Janowicz, K-U Kuhnberger (2012) Cognitive approaches for the
semantic web
15. H-M Haav, P Küngas (2013) Semantic Data Interoperability: The Key Problem of Big Data, Big Data
Computing, p. 245
16. Herrmann-Krotz G, Kohlmetz D, Müller-Rowold B (2011) Publikationen. New Rev Hypermedia
Multimedia 20:53–77
17. Hert M, Reif G, Gall HC (2011) A comparison of RDB-to-RDF mapping languages, In Proc 7th Int Conf
Semantic Syst, pp. 25–32, ACM
18. Hitzler P, Janowicz K (2013) Linked data, big data, and the 4th paradigm. Semantic Web 4:233–235
19. Hsu PL, Hsieh HS, Liang JH, Chen YS (2015) Mining various semantic relationships from unstructured
user-generated web data. Web Semant Sci Serv Agents World Wide Web 31:27–38
20. Hu C, Xu Z, Liu Y, Mei L, Chen L, Luo X (2014) Semantic link network-based model for organizing
multimedia big data. Emerging Topics Comput, IEEE Trans 2(3):376–387
21. HM Jamil (2014) Mapping abstract queries to big data web resources for on-the-fly data integration and
information retrieval, in Data Engineering Workshops (ICDEW), IEEE 30th Int Conf, pp. 62–67
22. Khalili A, Auer S (2013) User interfaces for semantic authoring of textual content: a systematic literature
review. Web Semant Sci Serv Agents World Wide Web 22:1–18
23. H Kim, K Kim (2014) Semantic levels of information hierarchy for urban street navigation, Int Conf Big
Data Smart Computing (BIGCOMP), pp. 235–240
24. Kim Y, Kim B, Lim H (2006) The Index Organizations for RDF and RDF Schema, ICACT
25. Manola F, Miller E, McBride B (2004) RDF primer. W3C Recommendation 10:1–107
26. Manuja M, Garg D (2011) Semantic web mining of un-structured data: challenges and opportunities. Int J
Eng (IJE) 5(3):268
27. Margara A, Urbani J, van Harmelen F, Bal H (2014) Streaming the web: reasoning over dynamic data. Web
Semant Sci Serv Agents World Wide Web 25:24–44
28. Martens W, Neven F, Schwentick T, Bex GJ (2006) Expressiveness and complexity of XML schema. ACM
Trans Database Syst (TODS) 31(3):770–813
29. SRH Noori (2011) A Large Scale Distributed Knowledge Organization System, University of Trento
30. SF Pileggi, R Amor (2015) Semantic Geographic Space: From Big Data to Ecosystems of Data, in Big Data
in Complex Systems, ed: Springer, pp. 351–374
31. D Riemer, L Stojanovic, N Stojanovic (2014) SEPP: Semantics-Based Management of Fast Data Streams, in
Service-Oriented Computing and Applications (SOCA), 2014 I.E. 7th International Conf, pp. 113–118
32. OR Rocha (2014) Context-Aware Service Creation on the Semantic Web, Politecnico di Torino
33. MA Sakka, B Defude (2012) Towards a Scalable Semantic Provenance Management System, in
Transactions on Large-Scale Data-and Knowledge-Centered Systems VII, ed: Springer, pp. 96–127
34. P Serrano-Alvarado, E Desmontils (2013) Personal linked data: a solution to manage user’s privacy on the
web, in Atelier sur la Protection de la Vie Privée (APVP)
35. S Sicari, C Cappiello, F De Pellegrini, D Miorandi, A Coen-Porisini (2014) A security-and quality-aware
system architecture for Internet of Things, Information Systems Frontiers, pp. 1–13
36. R Soussi (2012) Querying and extracting heterogeneous graphs from structured data and unstrutured content,
Ecole Centrale Paris
37. M Spaniol (2014) A Framework for Temporal Web Analytics, Université de Caen
38. M Strohbach, H Ziekow, V Gazis, N Akiva (2015) Towards a Big Data Analytics Framework for IoT and
Smart City Applications, in Modeling and Processing for Next-Generation Big-Data Technologies, ed:
Springer, pp. 257–282
39. PTT Thuy, Y-K Lee, S Lee, B-S Jeong (2007) Transforming valid XML documents into RDF via RDF
schema. pp. 35–40
40. Wu X, Zhu X, Wu G-Q, Ding W (2014) Data mining with big data. Knowledge Data Eng, IEEE Trans 26:
97–107
41. J Zhao, O Corcho, P Missier, K Belhajjame, D Newmann, D De Roure et al. (2011) eScience, Handbook of
Semantic Web Technologies, pp. 701–736
Kaleem Razzaq Malik is a PhD student at the University of Engineering and Technology (UET), Lahore, Pakistan, and also works as an instructor of computer science at the Virtual University of Pakistan. He has been working as a Lecturer in the Department of Software Engineering, Government College University Faisalabad, Pakistan since June 2013, performing duties such as teaching. He has been pursuing his Doctor of Philosophy (Ph.D.) in Computer Science at UET, Lahore since 2011. His interests include computer programming, the Semantic Web and databases.
Tauqir Ahmad has been working as an Associate Professor in the Department of Computer Science & Engineering, University of Engineering and Technology (UET), Lahore since January 1999, performing duties of teaching and research. He completed his Doctor of Philosophy (Ph.D.) in Computer Science at UET, Lahore in 2012.
Muhammad Farhan is an Assistant Professor at COMSATS Institute of Information Technology, Sahiwal Campus, Pakistan, a PhD Scholar at the Department of Computer Science and Engineering, University of Engineering and Technology (UET), Pakistan, and an instructor of computer science at the Virtual University of Pakistan (VU). He obtained his MSCS from the University of Management and Technology (UMT), Pakistan, and his BSCS from the Virtual University of Pakistan (VU). He currently has 11+ years of teaching experience. His interests include e-Learning, computer programming, the Semantic Web and databases.
Muhammad Aslam is an Associate Professor at the University of Engineering and Technology, Lahore, Pakistan. He has six years' experience of software architecture design, team leading, team building and project management; five years' experience of research and development; and nine years' experience of research and development as well as teaching at the postgraduate level (supervising Ph.D. and M.Sc. theses). Ph.D. Computer Sciences (CGPA 8.9/10, 2001–2005), CINVESTAV-IPN, Mexico (Cultural Exchange Scholarship between Pakistan and Mexico). Research/teaching interests: Knowledge-Based Systems, Expert Systems, Intelligent Agents, Human Computer Interaction, Computer Supported Cooperative Work, Cooperative Writing and Authoring, Communication, Coordination, Awareness, Cooperative Learning, and Modern Operating Systems. Distinctions: merit scholarship from the Board of Intermediate and Secondary Education, Sargodha Division, Pakistan (1984–1986); merit scholarship during B.Sc. Agricultural Engineering at the Faculty of Agricultural Engineering, University of Agriculture, Faisalabad, Pakistan (1987–1991); Silver Medal for winning second position in the Faculty of Agricultural Engineering, University of Agriculture, Faisalabad, Pakistan (1991); Cultural Exchange Scholarship between Pakistan and Mexico for higher studies (2000–2004).
Sohail Jabbar completed his MS (Telecom & Networking) with the honor of magna cum laude from Bahria University, Islamabad in 2009, and his BS (Computer Science) from Allama Iqbal Open University in 2006. He has almost 7 years of teaching and research experience. He has earned many distinctions among his 25 research publications in various renowned journals and conferences, and has also been a reviewer for a number of impact-factor journals. He is an active member of the Bahria University Wireless Research Center (BUWRC). Currently, he is serving as a Lecturer at the Department of Computer Science, COMSATS Institute of Information Technology, Sahiwal, Pakistan. His research interests include Wireless Sensor Networks, Machine-to-Machine communication and the Internet of Things.
Dr. Shehzad Khalid, an associate professor in the department of C&SE, has been declared 'Bahria University's Pride', says a report published in 'Bugle', the newsletter of Bahria University, Islamabad. Dr. Shehzad Khalid has recently published a research paper titled 'Frameworks for multivariate m-Mediods based modelling and classification in Euclidean and general feature spaces' in the ISI-indexed journal Pattern Recognition, with an impact factor of approximately 3. He has also published a conference paper titled 'Image matching using feature set transformers.'
Mucheol Kim is an Assistant Professor in the group of Industry-University Cooperation at Sungkyul University in Korea. His research interests include Information Retrieval, Web Technology, Social Networks and Wireless Sensor Networks. He was a Senior Researcher at the Korea Institute of Science and Technology Information (KISTI), Daejeon, Korea. He received his BS, MS and PhD degrees from the School of Computer Science and Engineering at Chung-Ang University, Seoul, Korea in 2005, 2007 and 2012, respectively.