Conference Paper

Information extraction and integration from heterogeneous, distributed, autonomous information sources - a federated ontology-driven query-centric approach

Dept. of Comput. Sci., Iowa State Univ., Ames, IA, USA
DOI: 10.1109/IRI.2003.1251412
Conference: Information Reuse and Integration, 2003. IRI 2003. IEEE International Conference on
Source: IEEE Xplore

ABSTRACT This paper motivates and describes the data integration component of the INDUS (Intelligent Data Understanding System) environment for data-driven information extraction and integration from heterogeneous, distributed, autonomous information sources. The design of INDUS is motivated by the requirements of applications such as scientific discovery, in which users need to access, flexibly interpret, and analyze data from diverse sources from different perspectives and in different contexts. INDUS implements a federated, query-centric approach to data integration using user-specified ontologies.

  • ABSTRACT: Data heterogeneity in the public sector is a serious problem and remains a key issue because different naming conventions are used to represent similar data labels. The e-government effort in many countries has provided a platform for government entities and their business partners to exchange data through Information Communication Technologies (ICT) and standards such as RosettaNet (a B2B data exchange standard), EDIFACT (Electronic Data Interchange for Administration, Commerce, and Transport), XML (Extensible Markup Language), and EDI (Electronic Data Interchange). However, e-government efforts have not significantly resolved data heterogeneity problems because of the limitations of these standards. One such limitation is the lack of support for data inheritance. To solve this problem, with emphasis on Service Oriented Architectures (SOA) and Web Services, a semantically enriched web service for the public sector is needed. We therefore propose an ontology-based solution that allows data inheritance and polymorphism. The goal of this paper is to show how heterogeneous e-government documents can be semantically matched. We propose a shared hierarchical knowledge repository approach and a detailed process methodology for semantic mediation. A two-part semantic mediation approach using SRS (Semantic Relatedness Scores) and SWRL (Semantic Web Rule Language) is highlighted. The two measures are complementary and provide the semantics necessary for resolving schema heterogeneity. Our approach incorporates a rule-based engine that reads and executes SWRL rules (i.e. RacerPro). We also adopted several tools for the proof of concept, such as Protégé (an ontology editor) and JESS (Java Expert System Shell).
    Journal of Theoretical and Applied Electronic Commerce Research 01/2008; 3:52-63.
  • ABSTRACT: Modular approaches to design and use of ontologies are essential to the success of the Semantic Web enterprise. We describe P-OWL (Package-based OWL), which extends OWL, a widely used ontology language that supports modular design, adaptation, use, and reuse of ontologies. P-OWL localizes the semantics of entities and relationships in OWL to modules called packages. P-OWL and the associated tools will greatly facilitate collaborative ontology construction, use, and reuse.