Conference Paper

Information extraction and integration from heterogeneous, distributed, autonomous information sources - A federated ontology-driven query-centric approach

Department of Computer Science, Iowa State University, Ames, Iowa, United States
DOI: 10.1109/IRI.2003.1251412 Conference: Information Reuse and Integration, 2003. IRI 2003. IEEE International Conference on
Source: IEEE Xplore


This paper motivates and describes the data integration component of the INDUS (Intelligent Data Understanding System) environment for data-driven information extraction and integration from heterogeneous, distributed, autonomous information sources. The design of INDUS is motivated by the requirements of applications such as scientific discovery, in which it is desirable for users to be able to access, flexibly interpret, and analyze data from diverse sources from different perspectives and in different contexts. INDUS implements a federated, query-centric approach to data integration using user-specified ontologies.

Available from: Vasant Honavar, Oct 24, 2015
    • "This community will evaluate all of the submissions regarding the subject manner. Ontology has become a buzzword in the Semantic Web and semantic data processing fields (Berners-Lee et al. 2001), and its importance is being recognized in a multiplicity of research fields and application areas, such as knowledge engineering (Gruber 1993), database design and integration, and information retrieval and extraction (Noy and McGuiness 2001; Castillo et al. 2003). Ontology is the science of what is, of the kinds and structures of objects, properties, events, processes, and relations in every area of reality (Smith and Welty 2001). "
    ABSTRACT: Ontologies are widely considered to be the backbone of the Semantic Web. Their importance is being recognized in a multiplicity of research fields and application areas. Ontology building is crucial for the aforementioned issues. The main goal of this research is to investigate an effective methodology for collaborative ontology building. A trust-based consensus is proposed to support an efficient solution for conflicts among different viewpoints of participants in the collaborative ontology (CoO) building process. In every cycle of the iterative collaborative process, the ontology is refined and evolved by reaching a trust-based consensus among the participants’ viewpoints. The proposed method is applied for collaborative Vietnamese WordNet building. The result is significant in comparison with previous approaches.
    Cybernetics and Systems 03/2014; 45(2). DOI:10.1080/01969722.2014.874815 · 0.84 Impact Factor
    • "Currently, there are two commonly used approaches for such information integration: 'data warehousing' (e.g., [20,22–24,29]), and 'virtual information integration' (e.g., [6,18,31]). Data-warehouse techniques require importing data from various data sources and maintaining them at a centralized data warehouse; such methods thus provide for centralized control over the data stored within and necessitate that queries are executed at the data warehouse itself."
    ABSTRACT: The issues of data integration and interoperability pose significant challenges in scientific hydrological and environmental studies, due largely to the inherent semantic and structural heterogeneities of massive datasets and non-uniform autonomous data sources. To address these data integration challenges, we propose a unified data integration framework, called Hydrological Integrated Data Environment (HIDE). HIDE is based on a labeled-tree data integration model referred to as DataNode tree. Using this framework, characteristics of datasets gathered from diverse data sources – with different logical and access organizations – can be extracted and classified as Time–Space–Attribute (TSA) labels and are subsequently arranged in a DataNode tree. The uniqueness of our approach is that it effectively combines the semantic aspects of the scientific domain with diverse datasets having different logical organizations to form a unified view. Further, we also adopt a metadata-based approach for specifying the TSA-DataNode tree in order to achieve flexibility and extensibility. The search engine of our HIDE prototype system evaluates a simple user query systematically on the TSA-DataNode tree, presenting integrated results in a standardized format that facilitates both effective and efficient data integration.
    Information Sciences 12/2010; 180(24):5008-5028. DOI:10.1016/j.ins.2010.06.015 · 4.04 Impact Factor
    • "temporally); to display a data schema or to inform navigation through the data [17]. In particular cases, ontology-based visualization has been used to support queries based on temporal abstractions [18]; to enrich maps with additional geographic information [19]; to reveal multiple levels of abstraction in decision-tree generation [20] and to assist in information mining [21]; very popularly, to map social networks and communities of common interest [22], [23], [24]. Ontologies have also been used for knowledge discovery without visualization, especially in the integration of heterogeneous scientific repositories ([25]). "
    ABSTRACT: Clinical practice and research rely increasingly on analytic approaches to patient data. Visualization enables the comparative exploration of similar patients, a key requirement in certain clinical decision support systems. Patient data is complex and heterogeneous, may have different formats, reside in various structures and carry different semantics. This makes the comparison and analysis of clinical data a challenging task. Most medical applications visualize patient data without integrating additional semantic information to structure the analysis. Our objective is to map patient data onto relevant fragments of ontologies and inferred ontological structures as a basis for improved patient data visualization, comparison, and analysis. Two visualization scenarios that we have implemented using the patient data acquired in the Health-e-Child project will be presented and their clinical evaluation will be provided.
    Proceedings of the Twenty-First IEEE International Symposium on Computer-Based Medical Systems, June 17-19, 2008, Jyväskylä, Finland; 06/2008