Conference Paper

Information extraction and integration from heterogeneous, distributed, autonomous information sources - A federated ontology-driven query-centric approach

Department of Computer Science, Iowa State University, Ames, Iowa, United States
DOI: 10.1109/IRI.2003.1251412
Conference: IEEE International Conference on Information Reuse and Integration (IRI 2003)
Source: IEEE Xplore

ABSTRACT This paper motivates and describes the data integration component of the INDUS (Intelligent Data Understanding System) environment for data-driven information extraction and integration from heterogeneous, distributed, autonomous information sources. The design of INDUS is motivated by the requirements of applications such as scientific discovery, in which users need to access, flexibly interpret, and analyze data from diverse sources, from different perspectives and in different contexts. INDUS implements a federated, query-centric approach to data integration using user-specified ontologies.

  • Source
    ABSTRACT: The issues of data integration and interoperability pose significant challenges in scientific hydrological and environmental studies, due largely to the inherent semantic and structural heterogeneities of massive datasets and non-uniform autonomous data sources. To address these data integration challenges, we propose a unified data integration framework, called Hydrological Integrated Data Environment (HIDE). HIDE is based on a labeled-tree data integration model referred to as DataNode tree. Using this framework, characteristics of datasets gathered from diverse data sources – with different logical and access organizations – can be extracted and classified as Time–Space–Attribute (TSA) labels and are subsequently arranged in a DataNode tree. The uniqueness of our approach is that it effectively combines the semantic aspects of the scientific domain with diverse datasets having different logical organizations to form a unified view. Further, we also adopt a metadata-based approach for specifying the TSA-DataNode tree in order to achieve flexibility and extensibility. The search engine of our HIDE prototype system evaluates a simple user query systematically on the TSA-DataNode tree, presenting integrated results in a standardized format that facilitates both effective and efficient data integration.
    Information Sciences 12/2010; DOI:10.1016/j.ins.2010.06.015 · 3.89 Impact Factor
  • Source
    ABSTRACT: Clinical practice and research rely increasingly on analytic approaches to patient data. Visualization enables the comparative exploration of similar patients, a key requirement in certain clinical decision support systems. Patient data is complex and heterogeneous, may have different formats, reside in various structures and carry different semantics. This makes the comparison and analysis of clinical data a challenging task. Most medical applications visualize patient data without integrating additional semantic information to structure the analysis. Our objective is to map patient data onto relevant fragments of ontologies and inferred ontological structures as a basis for improved patient data visualization, comparison, and analysis. Two visualization scenarios that we have implemented using the patient data acquired in the Health-e-Child project will be presented and their clinical evaluation will be provided.
    Proceedings of the Twenty-First IEEE International Symposium on Computer-Based Medical Systems, June 17-19, 2008, Jyväskylä, Finland; 01/2008
  • Source
    ABSTRACT: Cost-effective equipment maintenance for electric power transmission systems requires ongoing integration of information from multiple, highly distributed, and heterogeneous data sources storing various information about equipment. This paper describes a federated, query-centric data integration and knowledge acquisition framework for condition monitoring and failure rate prediction of power transformers. Specifically, the system uses substation equipment condition data collected from distributed data resources, some of which may be local to the substation, to develop Hidden Markov Models (HMMs) which transform the condition data into failure probabilities. These probabilities provide the most current knowledge of equipment deterioration, which can be used in system-level simulation and decision tools. The system is illustrated using dissolved gas-in-oil field data for assessing the deterioration level of power transformer insulating oil.
    Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06); 02/2006