MAGNT Research Report (ISSN. 1444-8939) Vol.2 (4). PP: 470-475
(DOI: dx.doi.org/14.9831/1444-8939.2014/2-4/MAGNT.58)
A Novel Model for Global Schema Integration and Optimization
Muhammad M. Kwafha1, Mohammed Azmi Al-Betar1, Ammar Almomani1, Mahmud
Alkoffash1, Ayman Jaradat2
1Dept. of Information Technology, Al-huson University College,
Al-Balqa Applied University, P.O. box 50, Irbid, Jordan.
2College of Science and Human Studies at Hotat Sudair,
Majmaah University, Hotat Sudair, Saudi Arabia
Abstract:
This study is based on the understanding that the main objective of a database is to guarantee consistency, which is why concurrency, security, reliability, and integrity control tools are used. Integrity controls are used to prevent any form of semantic error caused by users through recklessness or lack of adequate knowledge. Additionally, many organizations use heterogeneous data sources that must be semantically integrated.
This paper explains how to deal with integrity constraints over the global schema and the source schema. Queries can be reformulated and sent to the data sources to derive the most relevant answers, which is the main objective of the paper. The paper seeks to determine how integrity constraints can be used to derive more information from incomplete sources and databases with incomplete information. Additionally, it seeks to determine how inconsistency caused by contradicting data at the sources affects the whole system and how it can be resolved. For that purpose, a novel model for solving problems caused by data integration constraints was developed. The researchers argue that data mapping can solve both source and global schema problems by developing a retrieved global database that satisfies all the foreign key constraints of the global schema.
The paper concludes that a data integration system I = ⟨G, S, M⟩ can easily provide the specific tuples to extract from the database. It also concludes that data integration under constraints should be directed at search algorithms that can produce more relevant and optimized query answers.
Keywords: integrity constraints, global schema, Global-As-View, mapping.
1. Introduction
One of the key objectives of data integration under integrity constraints is efficiency. However, many organizations have not been able to realize this because they lack the knowledge of how to optimize the global schema. Data integration systems are also known to provide users with uniform access to autonomous data sources. Their success, however, depends on the approach an organization pursues in designing the data integration system; for example, one may decide to use the global-centric approach or the local-centric approach. The local-centric approach is not preferred because it is not efficient and does not provide reliable query answers, as it works with incomplete information. It is important to note that the main constraints in data integration are key constraints and foreign key constraints, in addition to general integrity constraints [10].
1.1. Why data integration
As the world economy has become increasingly data driven, most people rely on data to make decisions, which requires that these data be adequate to optimize decision making and realize the best decision. Over time it has become evident that information systems rely on data too, and these data must be massive enough, while software companies are developing ever more complex data management tools. This paper posits that data integration is becoming a challenge because the demand for data keeps increasing [11].
Data sources and types are increasing over time, which means that organizations have to contend with a growing variety of sources, types, and volumes. The main idea behind data integration is to create a robust system that facilitates data access at the highest level of integrity for both the organization and the individual. Most organizations have adopted data federation tools, but these tools have inherent limitations. Additionally, most organizations consider themselves big data users yet have not invested in adequate data integration tools that improve integrity. While the data federation tools mesh up different databases, drift computing tools help to manage the localized data [9].
1.2. Integrity constraints
There are various commercial data integration tools; the most common include Oracle 10g Information Integration, Microsoft SQL Server 2005, and IBM DB2 Information Integrator. These can be combined to improve efficiency and data integrity. There are also many forms of integrity constraints in any data schema, and these constraints limit the ability of an organization to realize the maximum value from its data unless a number of modules are combined to handle each form of integrity constraint [12].
While data integration is geared at combining different data types from different sources, it is important to be able to combine the real data stored at the sources with the global data that can be mapped. However, such systems face another major constraint: data integration requires that the mapping between the global schema and the data sources be specified, and that the queries expressed over the global schema be processed.
2. Global schema mappings over the source data
The most common model is the GAV (Global-As-View) model, which is also the most preferred mapping approach. Designing a data integration system can be quite challenging, as it requires choosing a method suitable for computing the desired answers to queries. [1] argues that all queries must be posed against the global schema, and in such a way that they can be reformulated and sent to the data sources to derive the most relevant answers.
2.1. Assumptions in data integration mapping
The way in which mapping assertions are interpreted assumes particular importance in defining the semantics of the data integration system. According to the literature, three different assumptions can be made on the mapping assertions: under the sound assumption, the data provided by the sources are interpreted as a subset of the global data; in contrast, the mapping is assumed complete when the source data provide a superset of the data of the global schema; and the mapping is assumed exact when it is both sound and complete [7].
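The three assumptions can be illustrated with a minimal Python sketch that compares the tuples a source actually provides with the tuples the global schema holds for the corresponding relation; the two tuple sets used here are hypothetical extensions, not part of any concrete system.

```python
# Minimal sketch of the sound / complete / exact mapping assumptions.
# The two tuple sets are hypothetical extensions of one relation.

def classify_mapping(source_tuples: set, global_tuples: set) -> str:
    """Compare the source extension with the global extension."""
    sound = source_tuples <= global_tuples     # sources give a subset
    complete = source_tuples >= global_tuples  # sources give a superset
    if sound and complete:
        return "exact"
    if sound:
        return "sound"
    if complete:
        return "complete"
    return "neither"

# The source misses one global tuple, so the mapping is sound but not
# complete.
print(classify_mapping({("s1",)}, {("s1",), ("s2",)}))  # -> sound
```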
Another important aspect in a data
integration system is whether the
system is able to materialize data
retrieved from the sources (through the
mappings). In the materialized
approach, the system computes the
extension of the structures in the global
schema by replicating the data at the
sources. Obviously, maintenance of
replicated data against updates at the
sources is a central aspect in this
context. A possible way to deal with
this problem is to re-compute
materialized data when the sources
change, but it could be extremely
expensive and impractical for dynamic
scenarios.
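A naive version of this recomputation strategy can be sketched as follows; representing the mapping as plain Python functions is an assumption made purely for illustration. The sketch makes the cost problem visible: every source change triggers a full recomputation of every global relation.

```python
# Sketch of the materialized approach with naive full refresh.
# mappings: global relation name -> function(sources) -> tuples.

class MaterializedIntegrator:
    def __init__(self, mappings):
        self.mappings = mappings
        self.materialized = {}  # replicated global extensions

    def refresh(self, sources):
        """Recompute every global relation from scratch. Simple, but as
        noted above, too expensive for frequently changing sources."""
        for relation, query in self.mappings.items():
            self.materialized[relation] = set(query(sources))

# Usage with a single hypothetical mapping:
integ = MaterializedIntegrator({"staff": lambda s: s["src_employees"]})
integ.refresh({"src_employees": {(1, "Ada"), (2, "Lin")}})
print(integ.materialized["staff"])
```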
Integrity constraints can be useful for extracting more information, especially when there are incomplete sources or databases with incomplete information. However, integrity constraints can also raise problems of inconsistency in the entire system when the data are contradictory at the sources. To manage these issues, it is important to develop a data integration system [6].
2.2. Data integration model
As noted at the outset, data integration faces integrity constraints in the systems of the global schema. However, data integration is seen in terms of three main components, namely the global schema, the source schema, and the mapping. This can be expressed in the form
I = ⟨G, S, M⟩
The global schema is similar to a relational schema and may include integrity constraints; key constraints, general inclusion dependencies, and foreign key constraints are the main constraints associated with the global schema. The global schema (G) is the reconciled, integrated, virtual view of all the sources, irrespective of the users who query the integration system. It is composed of the relational schema and the constraints, which in this case include both the key constraints and the foreign key constraints. The source schema (S) represents all the sources that the integration system can access; they are wrapped in such a way that they can be viewed as relations. Finally, M refers to the mapping between the global schema and the source schema, and it is the only connection linking the elements of the global schema with those of the source schema.
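The triple can be written down directly as a data structure; the following Python sketch is only a representation of the three components, with illustrative field contents.

```python
# Sketch of the data integration system I = <G, S, M> as a structure.

from dataclasses import dataclass

@dataclass
class DataIntegrationSystem:
    global_schema: dict  # relation name -> attributes (plus constraints)
    source_schema: dict  # wrapped source relations
    mapping: dict        # GAV: global relation -> query over the sources
```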
In this study, the mapping under consideration is the GAV mapping, which associates to each relation of the global schema a query over the sources. The data are accessed only during query processing, which is the most common approach because information is requested on demand.
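A GAV mapping of this on-demand kind can be sketched with an in-memory SQLite database; the source and global relation names (src_employees, src_payroll, staff) are hypothetical. The point is only that the global relation is defined as a query over the sources and evaluated when a user query arrives.

```python
# Sketch of GAV mapping evaluated on demand over hypothetical sources.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_employees(id INTEGER, name TEXT);
    CREATE TABLE src_payroll(emp_id INTEGER, salary REAL);
    INSERT INTO src_employees VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO src_payroll VALUES (1, 50000);
""")

# GAV: the global relation staff(id, name, salary) is *defined* by a
# query over the source relations.
GAV_MAPPING = {
    "staff": """SELECT e.id, e.name, p.salary
                FROM src_employees e
                JOIN src_payroll p ON e.id = p.emp_id""",
}

def query_global(relation: str):
    """Data are accessed only now, during query processing."""
    return conn.execute(GAV_MAPPING[relation]).fetchall()

print(query_global("staff"))  # -> [(1, 'Ada', 50000.0)]
```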
2.3. Managing integrity constraints
To handle integrity constraints, it is important to realize that most transaction companies face federation constraints. Currently, integrity constraints are expressed over the global schema, which models the users' domain of interest. It is also important to note that data from the various sources cannot be guaranteed to satisfy integrity constraints that are not under the control of the data integration system. Besides, there are chances that the local data are consistent on their own but become inconsistent when they are integrated; in such a case, if the sources are changed, data get lost [5].
One of the main advantages of integrating data under integrity constraints is that it helps in extracting additional information from incomplete sources, as is the case with databases with incomplete information. Additionally, when integrating data under integrity constraints, one is faced with inconsistency in the whole system; hence, one has to tweak the system in order to accommodate the integrity constraints. The contradicting data sources therefore make it important to ensure that the integration system can still return consistent answers, as the sketch below illustrates.
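The sketch below shows the effect in miniature: two sources that each satisfy a key constraint locally violate it once their tuples are merged. The relation contents are hypothetical.

```python
# Sketch: locally consistent sources become inconsistent when merged.

def key_violations(tuples, key_index=0):
    """Return each key value that maps to more than one distinct tuple."""
    seen = {}
    for t in tuples:
        seen.setdefault(t[key_index], set()).add(t)
    return {k: v for k, v in seen.items() if len(v) > 1}

source_a = [(1, "Ada"), (2, "Lin")]  # key 'id' holds locally
source_b = [(1, "Alan")]             # key 'id' holds locally
merged = source_a + source_b         # ...but not after integration
print(key_violations(merged))        # -> {1: {(1, 'Ada'), (1, 'Alan')}}
```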
3. Solving the problems of data integration under integrity constraints
3.1. Accommodating the integrity
constraints
To accommodate the integrity constraints, especially when there are inconsistencies, it is important to use a system that can accommodate both the key and the foreign key constraints of the global schema. The GAV approach can be effective in defining all the mappings between the global schema and the source schema across heterogeneous data sources, especially relational databases and legacy databases. In the past few years, web sources have also become another source of inconsistencies, which might explain why companies and individual query users prefer integrating data. In addition, new systems have been developed to deal with non-relational data sources by wrapping them before they are presented to the query processing subsystems. Wrapping the non-relational data sources eases the incorporation of data queries and data cleaning, thereby resolving conflicts [3].
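As an illustration of such a wrapper, the following Python sketch presents a JSON source as plain relational tuples and performs light data cleaning on the way; the field names are assumptions.

```python
# Sketch of a wrapper: a non-relational (JSON) source exposed as tuples.
import json

def wrap_json_source(raw: str, columns=("id", "name")):
    """Return fixed-arity tuples, trimming strings and dropping records
    that lack a key value (a simple data cleaning step)."""
    rows = []
    for record in json.loads(raw):
        if record.get("id") is None:
            continue  # cleaning: skip records without a key
        rows.append(tuple(str(record.get(c, "")).strip() for c in columns))
    return rows

raw = '[{"id": 1, "name": " Ada "}, {"name": "orphan"}]'
print(wrap_json_source(raw))  # -> [('1', 'Ada')]
```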
To fully realize the GAV mapping between the relational sources and the global schema, it is important to first determine the arbitrary queries that will be used, as this allows data cleaning to be incorporated into the queries. It is also important to note that, by resolving conflicts during the extraction process from the data sources, the key and foreign key constraints will not cause any further violations, even through generated tuples. The first step is to delegate the responsibility of dealing with the key and foreign key constraints to the system, allowing it to provide answers to queries that depend on joins over attribute values not stored in the sources; the availability of these answers is guaranteed by the foreign key constraints.
Additionally, the system deals with the foreign key constraints automatically. For example, the query-processing algorithm computes the same set of answers to queries, and these answers can be considered complete with respect to the data integration system. In most cases, it is important to distinguish the conceptually different phases, considering that the system works mostly in optimized sub-phases leading to complete computations [2].
3.2. How the system works
The three sub-phases are iterated as follows (a sketch follows the list):
First, the query is expanded in order to accommodate the key and foreign key constraints of the global schema.
Secondly, the expanded query is unfolded, based on the definitions in the mapping, to obtain a query expressed over the relational sources.
Thirdly, the expanded and unfolded query is executed over the relational sources, thereby producing the right answers to the initial query.
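These three sub-phases can be laid out as a small pipeline; in the sketch below the rewriting steps are deliberately simplistic placeholders (rule functions and textual substitution) standing in for the real constraint-driven expansion and GAV unfolding.

```python
# Sketch of the expand -> unfold -> execute pipeline.

def expand(query: str, constraint_rules) -> str:
    """Phase 1: rewrite the query to account for the key and foreign
    key constraints of the global schema."""
    for rule in constraint_rules:
        query = rule(query)
    return query

def unfold(query: str, gav_mapping: dict) -> str:
    """Phase 2: replace each global relation by its mapping query."""
    for relation, definition in gav_mapping.items():
        query = query.replace(relation, f"({definition})")
    return query

def execute(query: str, source_db):
    """Phase 3: evaluate the rewritten query over the sources
    (source_db is assumed to expose a DB-API style execute())."""
    return source_db.execute(query).fetchall()
```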
3.3. Alternative approach
In the query answering approach stated below, with I = ⟨G, S, M⟩ as the data integration system, the mapping M associates to each relation r of the global schema a query p(r) over the source schema. The evaluation of p(r) must implement a suitable duplicate record elimination strategy, since the source databases cannot be assumed to be free of duplicate pairs of tuples; this eliminates the possibility of duplicate returns. Duplicate record elimination and data cleaning are among the fundamental objectives of a data integration system. Additionally, if a query q is posed to I over the source database D, then the answers to q with respect to I and D are computed by the following algorithm:
For each relation r of the global schema, the relation r^D is computed by simply evaluating the query p(r) over the source database D; the relations obtained in this way form the retrieved global database ret(I, D). If p(r) does not violate any of the key constraints, it is safe to assume that the retrieved global database satisfies all the key constraints of the global schema. If the retrieved global database also satisfies all the foreign key constraints of the global schema, the query can be answered by simply evaluating q over ret(I, D). Otherwise, the retrieved global database can be used to build a database that does satisfy the constraints, by adding suitable tuples to the relations of the global schema and thereby satisfying the foreign key constraints [8].
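A sketch of the ret(I, D) construction follows, with the mapping queries represented as functions over the source database and duplicate elimination performed by collecting tuples into sets; all names are illustrative.

```python
# Sketch: build the retrieved global database ret(I, D).

def retrieved_global_database(mapping: dict, source_db) -> dict:
    """mapping: global relation r -> function p(r) taking the source
    database D and returning an iterable of tuples."""
    ret = {}
    for r, rho in mapping.items():
        ret[r] = set(rho(source_db))  # set() eliminates duplicate records
    return ret
```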
Queries to a data integration system I = ⟨G, S, M⟩ are posed in terms of the relations in G, and are intended to specify which data to extract from the virtual database represented by I. The task of specifying which tuples are in the answer to a query is complicated by the existence of several legal global databases, and this requires introducing the notion of certain answers. A tuple t is a certain answer to a query q w.r.t. a source database D if t ∈ q^B for all global databases B that are legal for I w.r.t. D.
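The definition of certain answers translates directly into an intersection over the legal global databases. In the sketch below the legal databases are a small, explicitly enumerated set, which is an assumption made for illustration (in general they cannot be enumerated).

```python
# Sketch: certain answers as the intersection over all legal databases.

def certain_answers(q, legal_databases):
    """q: function mapping a global database B to a set of answer tuples.
    A tuple is certain iff it is an answer in every legal database."""
    results = [set(q(B)) for B in legal_databases]
    return set.intersection(*results) if results else set()

# Two legal databases that disagree on one tuple: only the shared tuple
# is certain.
B1 = {("p", 1), ("p", 2)}
B2 = {("p", 1)}
print(certain_answers(lambda B: B, [B1, B2]))  # -> {('p', 1)}
```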
4. Conclusion
The database is composed of three main components: the extensional database (herein known as the facts), the intensional database (also called the rules), and the constraint theory, referred to as the integrity constraints. In the modern communication environment, data integration under integrity constraints is very important because it affects the quality of search returns; this means that the semantics of the data integration system must be defined to deal with incomplete information and any inconsistencies in the system. Though data integration is a complex topic, it is one of the key topics that will dominate search algorithms, as it helps in refining search returns. Therefore, as economies become more data driven, the need for an optimal system that can handle data from global and relational sources becomes even more important. This paper argues that further interest in data integration under constraints should be directed to search algorithms that can produce more relevant and optimized query returns.
References
1. Ten Cate, Balder, Phokion G. Kolaitis,
and Wang-Chiew Tan. "Schema
mappings and data examples."
Proceedings of the 16th International
Conference on Extending Database
Technology. ACM, 2013.
2. Liu, X., et al. "Exploration and
Comparison of Approaches for
Integrating Heterogeneous
Information Sources to Support
Performance Analysis of HVAC
Systems." Bridges 10 (2014):
9780784412343-0004.
3. Fernandez, M.F., D. Florescu, J. Kang, A.Y. Levy, and D. Suciu. "Catching the boat with Strudel: experiences with a web-site management system." Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, WA, USA, 1998, pp. 414–425.
4. Arenas, M., L.E. Bertossi, and J. Chomicki. "Consistent query answers in inconsistent databases." Proceedings of the 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'99), Philadelphia, PA, USA, 1999, pp. 68–79.
5. Fagin, R., P.G. Kolaitis, R.J. Miller, and L. Popa. "Data exchange: semantics and query answering." Proceedings of the 9th International Conference on Database Theory (ICDT), Siena, Italy, 2003, pp. 207–224.
6. Popa, L., Y. Velegrakis, R.J. Miller, M.A. Hernandez, and R. Fagin. "Translating Web data." Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), Hong Kong, China, 2002, pp. 598–609.
7. Martinenghi, Davide. "On the difference
between checking integrity constraints
before or after updates." arXiv preprint
arXiv:1312.2353 (2013).
8. Özsu, M. Tamer, and Patrick Valduriez.
Principles of distributed database
systems. Springer, 2011.
9. Varajao, Joao, et al. "Enterprise information systems." The Learning Organization 20.6 (2013).
10. Beskales, George, Ihab F. Ilyas, and Lukasz Golab. "Sampling the repairs of functional dependency violations under hard constraints." Proceedings of the VLDB Endowment 3.1-2 (2010): 197–207.
11. LaValle, Steve, et al. "Big data,
analytics and the path from insights to
value." MIT Sloan Management Review
21 (2013).
12. Sim, Ida, et al. "Ontology-Based
Federated Data Access to Human
Studies Information." AMIA Annual
Symposium Proceedings. Vol. 2012.
American Medical Informatics
Association, 2012.