
Modèle d'entrepôt de ressources hétérogènes pour le traitement sémantique des documents

Authors:
  • Yellow Pages Group, Montreal

Abstract and Figures

Multiple sources of information can improve knowledge management if they are properly combined and processed. Knowledge engineering usually relies on knowledge resources, typically ontologies. We propose a domain-independent framework that models, combines and represents heterogeneous sources of information. Our aim is to build a resource repository that provides operations for loading, storing, indexing, translating, generating and matching different resources. We propose an ontology as a model of these resources, and we explain how we can represent, annotate and load new resources into our repository. These resources are then processed to fit a specific need in a knowledge management process.
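As a rough illustration of the loading, indexing and matching operations named in the abstract, the following Python sketch keeps resources in memory and matches entities across two resources by label. The class names, the flat "entity id to label" shape, and the case-insensitive label comparison are assumptions made for this example; the abstract does not specify how the authors' repository or its ontology-based model is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A knowledge resource reduced to a name, a kind and labelled entities.

    This flat shape is a hypothetical simplification, not the authors' model.
    """
    name: str
    kind: str  # e.g. "ontology", "thesaurus", "lexicon"
    entities: dict[str, str] = field(default_factory=dict)  # entity id -> label


class ResourceRepository:
    """Hypothetical in-memory repository supporting load, index and match."""

    def __init__(self) -> None:
        self._resources: dict[str, Resource] = {}
        # Normalized label -> set of (resource name, entity id) pairs.
        self._index: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def load(self, resource: Resource) -> None:
        """Store a resource and index its entity labels."""
        if resource.name in self._resources:
            raise ValueError(f"resource already loaded: {resource.name}")
        self._resources[resource.name] = resource
        for entity_id, label in resource.entities.items():
            self._index[label.casefold()].add((resource.name, entity_id))

    def match(self, left: str, right: str) -> list[tuple[str, str]]:
        """Pair entities of two resources whose labels agree, ignoring case."""
        pairs = []
        for hits in self._index.values():
            left_ids = sorted(e for r, e in hits if r == left)
            right_ids = sorted(e for r, e in hits if r == right)
            pairs.extend((l, r) for l in left_ids for r in right_ids)
        return sorted(pairs)


# Illustrative data only.
repo = ResourceRepository()
repo.load(Resource("onto", "ontology", {"o:Car": "Car", "o:Tree": "Tree"}))
repo.load(Resource("thes", "thesaurus", {"t:12": "car", "t:40": "Boat"}))
matches = repo.match("onto", "thes")
print(matches)
```

Label equality is the simplest possible matcher; a real alignment step would likely also use structure, synonyms or translations.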
... These models were integrated within the proposed resources model (thesauri entities, ontology entities, etc.). We implemented some knowledge engineering operators within a use case of merging multiple ontological and terminological resources in order to create an enriched version of WordNet [Ghoula et al. 2010a, Ghoula 2012]. ...
... Consequently, after trying different design alternatives and adding several improvements and more expressiveness to the meta-model, we propose to create a generic model that supports multiple representations for the content of a resource [Ghoula et al. 2010a]. The challenge is to create a model representing heterogeneous resources (multiple representation models) and to perform operations that involve several resources (single representation model). ...
... We have built a prototype of a lightweight repository [Ghoula et al. 2010a] using the meta-model that we described in the previous chapter. We implemented the model as a relational database because the aim of this application was to build a terminological knowledge base containing multiple terminological resources that is stored in a database. ...
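To make the idea of a relational terminological knowledge base concrete, here is a minimal Python sketch using the standard `sqlite3` module. The table names, columns and sample rows are hypothetical and chosen only for illustration; the excerpt does not describe the actual schema of the prototype.

```python
import sqlite3

# Hypothetical two-table schema: one row per resource, one row per term.
SCHEMA = """
CREATE TABLE resource (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    kind TEXT NOT NULL
);
CREATE TABLE term (
    id          INTEGER PRIMARY KEY,
    resource_id INTEGER NOT NULL REFERENCES resource(id),
    label       TEXT NOT NULL,
    language    TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative rows only; they are not taken from the prototype.
conn.executemany(
    "INSERT INTO resource (id, name, kind) VALUES (?, ?, ?)",
    [(1, "WordNet", "lexicon"), (2, "ExampleThesaurus", "thesaurus")],
)
conn.executemany(
    "INSERT INTO term (resource_id, label, language) VALUES (?, ?, ?)",
    [(1, "bank", "en"), (2, "bank", "en"), (2, "banque", "fr")],
)

# Labels that occur in more than one resource are candidates for merging.
shared = conn.execute(
    """
    SELECT label, COUNT(DISTINCT resource_id)
    FROM term
    GROUP BY label
    HAVING COUNT(DISTINCT resource_id) > 1
    ORDER BY label
    """
).fetchall()
print(shared)
```

Keeping every resource in the same pair of tables is what lets a single query span several terminologies at once.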
Research
Multiple tasks related to documents, such as indexing, retrieval, annotation, or translation, are based on linguistic, terminological and ontological knowledge that exists in resources of different types represented using various formalisms. Building bridges between these resources and using them together is a complex task. Solving this problem relies on finding the right resources before extracting the required data. Ontology repositories have been created to help in this task by collecting ontologies and offering effective indexing of these resources. However, these repositories treat a single category of resources and do not provide operations for generating new resources. To meet these needs in terms of knowledge engineering, our contributions are (1) an ontology for representing heterogeneous resources and knowledge combination operators; (2) an approach based on the principles of the Semantic Web to ensure the representation, storage and alignment of heterogeneous resources; and (3) the development of an ontology-based repository for combining alignment resources.
Conference Paper
In this work we introduce a novel retrieval language, named OntoPath, for specifying and retrieving relevant ontology fragments. This language is intended to extract customized, self-standing ontologies from very large, general-purpose ones. Through OntoPath, users can specify the desired level of detail in the concept taxonomies as well as the properties between concepts that are required by the target applications. The syntax and aims of OntoPath resemble those of XPath in that they are simple enough to be handled by non-expert users and are designed to be included in other XML-based applications (e.g. transformation sheets, semantic annotation of web services, etc.). OntoPath has been implemented on top of the graph-based database G, for which a Protégé OWL plug-in has been designed to access and retrieve ontology fragments.
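The notion of extracting a fragment at a chosen level of detail can be sketched in Python as a depth-limited traversal of a concept taxonomy. This is not OntoPath syntax, which the abstract does not show; the function name, the dictionary encoding of the taxonomy and the sample concepts are assumptions for this example, and the selection of properties between concepts is left out.

```python
from collections import deque


def extract_fragment(taxonomy, root, max_depth):
    """Return the sub-taxonomy reachable from `root` within `max_depth` levels.

    `taxonomy` maps each concept to the list of its direct sub-concepts.
    Concepts at the depth limit are kept but listed without children.
    """
    fragment = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        concept, depth = queue.popleft()
        # Stop expanding once the requested level of detail is reached.
        children = taxonomy.get(concept, []) if depth < max_depth else []
        fragment[concept] = list(children)
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return fragment


# Illustrative taxonomy only.
taxonomy = {
    "Thing": ["Agent", "Place"],
    "Agent": ["Person", "Organization"],
    "Person": ["Author"],
    "Place": ["City"],
}
shallow = extract_fragment(taxonomy, "Thing", 1)
print(shallow)
```

Raising `max_depth` yields a more detailed fragment, which mirrors the idea of letting the user choose how much of a large ontology to keep.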
Conference Paper
Several studies have tried to improve retrieval performance using automatic Word Sense Disambiguation techniques. So far, most attempts have failed. In this paper, we provide an in-depth analysis of the reasons behind these failures. During our participation in the Robust WSD task at CLEF 2008, we performed experiments on monolingual (English) and bilingual (Spanish to English) collections. Our official results and a detailed analysis are described below, along with our conclusions and perspectives.
Article
The development of a thesaurus structure for implementation in an integrated system of information resources (ISIR) is discussed. This universal thesaurus scheme can represent any standard thesaurus whose structure corresponds to the GOST or ISO standards. Two modes of thesaurus operation were implemented in the system: a thesaurus built into the information system and a thesaurus in the form of stored resources. The thesaurus data were serialized in RDF/XML format in accordance with the proposed thesaurus schema, and the import/export tools allowed the system to interact with other systems.
Article
Although controlled biomedical terminologies have been with us for centuries, it is only in the last couple of decades that close attention has been paid to the quality of these terminologies. The result of this attention has been the development of auditing methods that apply formal methods to assessing whether terminologies are complete and accurate. We have performed an extensive literature review to identify published descriptions of these methods and have created a framework for characterizing them. The framework considers manual, systematic and heuristic methods that use knowledge (within or external to the terminology) to measure quality factors of different aspects of the terminology content (terms, semantic classification, and semantic relationships). The quality factors examined included concept orientation, consistency, non-redundancy, soundness and comprehensive coverage. We reviewed 130 studies that were retrieved based on keyword search on publications in PubMed, and present our assessment of how they fit into our framework. We also identify which terminologies have been audited with the methods and provide examples to illustrate each part of the framework.