Sylvie Ranwez

Human-computer Interaction, Artificial Intelligence, Algorithms

PhD

Publications

  • Source
    ABSTRACT: The need to index biomedical papers with MeSH is constantly growing, and automated approaches are continually evolving. Since 2013, the BioASQ challenge has promoted these developments by providing datasets and evaluation metrics. In this paper, we present our system, USI, and how we adapted it to participate in this year's challenge. USI is a generic approach, meaning it does not directly take the content of the document to annotate into account. The results lead us to conclude that methods relying solely on the semantic annotations available in the corpus can already perform well compared to NLP-based approaches, as our results consistently rank among the best.
    CLEF 2015; 08/2015
  • Source
    ABSTRACT: Artificial Intelligence federates numerous scientific fields with the aim of developing machines able to assist human operators performing complex treatments---most of which demand high cognitive skills (e.g. learning or decision processes). Central to this quest is giving machines the ability to estimate the likeness or similarity between things in the way human beings estimate the similarity between stimuli. In this context, this book focuses on semantic measures: approaches designed for comparing semantic entities such as units of language (e.g. words, sentences) or concepts and instances defined in knowledge bases. The aim of these measures is to assess the similarity or relatedness of such semantic entities by taking into account their semantics, i.e. their meaning---intuitively, the words tea and coffee, which both refer to stimulating beverages, will be estimated as more semantically similar than the words toffee (confection) and coffee, even though the latter pair has higher syntactic similarity. The two state-of-the-art approaches for estimating and quantifying the semantic similarity/relatedness of semantic entities are presented in detail: the first relies on corpus analysis and is based on Natural Language Processing techniques and semantic models, while the second is based on more or less formal, computer-readable and workable forms of knowledge such as semantic networks, thesauri or ontologies. Semantic measures are widely used today to compare units of language, concepts, instances or even resources indexed by them (e.g., documents, genes). They are central elements of a large variety of Natural Language Processing applications and knowledge-based treatments, and have therefore naturally been the subject of intensive and interdisciplinary research efforts during the last decades.
Beyond a simple inventory and categorization of existing measures, the aim of this monograph is to guide novices as well as researchers in these domains toward a better understanding of semantic similarity estimation and, more generally, semantic measures. To this end, we propose an in-depth characterization of existing proposals by discussing their features, the assumptions on which they are based, and empirical results regarding their performance in particular applications. By addressing these questions and by providing a detailed discussion of the foundations of semantic measures, our aim is to give the reader the key knowledge required to: (i) select the most relevant methods for a particular usage context, (ii) understand the challenges facing this field of study, (iii) identify room for improvement in state-of-the-art approaches, and (iv) stimulate creativity toward the development of new approaches. To that end, several definitions, theoretical and practical details, as well as concrete applications are presented.
    Edited by Graeme Hirst, 05/2015; Morgan & Claypool.
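The tea/coffee versus toffee/coffee contrast described above can be made concrete with a short sketch. The toy taxonomy, the Levenshtein-based string similarity, and the Wu-Palmer-style formula below are illustrative stand-ins for the two families of measures, not the book's specific proposals:

```python
# Contrast a purely syntactic (edit-distance) similarity with a
# knowledge-based similarity computed over a tiny hand-made taxonomy.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def syntactic_sim(a, b):
    return 1 - levenshtein(a, b) / max(len(a), len(b))

# Tiny hand-made taxonomy: child -> parent.
PARENT = {
    "tea": "beverage", "coffee": "beverage",
    "toffee": "confection",
    "beverage": "food", "confection": "food",
    "food": "entity",
}

def depth(c):
    d = 1
    while c in PARENT:
        c = PARENT[c]
        d += 1
    return d

def ancestors(c):
    out = {c}
    while c in PARENT:
        c = PARENT[c]
        out.add(c)
    return out

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2*depth(LCS) / (depth(a) + depth(b))."""
    common = ancestors(a) & ancestors(b)
    lcs = max(common, key=depth)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(syntactic_sim("toffee", "coffee"))  # high string overlap
print(syntactic_sim("tea", "coffee"))     # low string overlap
print(wu_palmer("tea", "coffee"))         # 0.75: semantically close
print(wu_palmer("toffee", "coffee"))      # 0.5: semantically farther
```

On this toy data the string measure rates toffee/coffee as the closer pair, while the knowledge-based measure correctly prefers tea/coffee.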
  • Source
    ABSTRACT: Background: Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. To keep such applications up-to-date, new entities need to be annotated frequently to enrich the corpus. However, this task is time-consuming and requires a high level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each taking advantage of different features of the document. Results: In this paper we present USI (User-oriented Semantic Indexer), a fast and intuitive method for indexing tasks. We introduce a solution that suggests a conceptual annotation for new entities based on related, already indexed documents. Our results, compared to those obtained by previous authors using the MeSH thesaurus and a dataset of biomedical papers, show that the method surpasses text-specific methods in terms of both quality and speed. Evaluations are carried out using both standard metrics and semantic similarity. Conclusions: By relying only on neighboring documents, the User-oriented Semantic Indexer does not need a representative learning set. Yet it provides better results than the other approaches by giving a consistent annotation scored with a global criterion instead of one score per concept.
    BMC Bioinformatics 03/2015; 16(1):83. DOI:10.1186/s12859-015-0513-4 · 2.67 Impact Factor
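A minimal sketch of the neighbor-based idea described in the abstract: annotate a new document using only the annotations of its already indexed neighboring documents, scoring candidates against all neighbors at once. The toy similarity function, the greedy top-k selection, and all concept names are illustrative assumptions, not USI's actual objective function:

```python
# Suggest an annotation for a new document from its neighbors' annotations.

def propose_annotation(neighbor_annotations, sim, k=2):
    """Pick the k concepts that best summarize the neighbors' annotations.

    neighbor_annotations: list of concept sets (one per neighbor document)
    sim: pairwise concept similarity in [0, 1]
    """
    candidates = set().union(*neighbor_annotations)

    def score(c):
        # Average best-match similarity of c against each neighbor's set.
        return sum(max(sim(c, n) for n in ann)
                   for ann in neighbor_annotations) / len(neighbor_annotations)

    # Sort by descending score, break ties alphabetically for determinism.
    return sorted(candidates, key=lambda c: (-score(c), c))[:k]

# Toy similarity: identical concepts score 1, shared top-level prefix 0.5.
def toy_sim(a, b):
    if a == b:
        return 1.0
    return 0.5 if a.split("/")[0] == b.split("/")[0] else 0.0

neighbors = [{"cancer/breast", "therapy"},
             {"cancer/lung", "therapy"},
             {"therapy", "genetics"}]
print(propose_annotation(neighbors, toy_sim))  # ['therapy', 'cancer/breast']
```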
  • Source
    ROADEF 2015; 02/2015
  • Source
    ABSTRACT: Semantic similarity has become, in recent years, the backbone of numerous knowledge-based applications dealing with textual data. From the different methods and paradigms proposed to assess semantic similarity, ontology-based measures and, more specifically, those based on quantifying the Information Content (IC) of concepts are the most widespread solutions due to their high accuracy. However, these measures were designed to exploit a single ontology. They thus cannot be leveraged in many contexts in which multiple knowledge bases are considered. In this paper, we propose a new approach to achieve accurate IC-based similarity assessments for concept pairs spread throughout several ontologies. Based on Information Theory, our method defines a strategy to accurately measure the degree of commonality between concepts belonging to different ontologies—this is the cornerstone for estimating their semantic similarity. Our approach therefore enables classic IC-based measures to be directly applied in a multiple ontology setting. An empirical evaluation, based on well-established benchmarks and ontologies related to the biomedical domain, illustrates the accuracy of our approach, and demonstrates that similarity estimations provided by our approach are significantly more correlated with human ratings of similarity than those obtained via related works.
    Information Sciences 11/2014; 283:197–210. DOI:10.1016/j.ins.2014.06.039 · 3.89 Impact Factor
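The single-ontology IC-based machinery that the paper above extends can be sketched in a few lines: IC(c) = -log p(c), Resnik similarity as the IC of the most informative common ancestor (MICA), and Lin similarity as a normalized variant. The toy taxonomy and frequencies below are invented for illustration:

```python
import math

# Classic corpus-based Information Content and two IC-based measures.

PARENT = {"dog": "mammal", "cat": "mammal", "mammal": "animal",
          "bird": "animal", "animal": "entity"}

# Corpus frequency of each concept (counts propagate up to ancestors).
FREQ = {"dog": 10, "cat": 10, "bird": 20,
        "mammal": 0, "animal": 0, "entity": 0}

def ancestors(c):
    out = {c}
    while c in PARENT:
        c = PARENT[c]
        out.add(c)
    return out

def ic(c):
    # p(c) = probability of meeting c or any of its descendants.
    total = sum(FREQ.values())
    count = sum(f for x, f in FREQ.items() if c in ancestors(x))
    return -math.log(count / total)

def resnik(a, b):
    # IC of the most informative (highest-IC) common ancestor.
    return max(ic(c) for c in ancestors(a) & ancestors(b))

def lin(a, b):
    return 2 * resnik(a, b) / (ic(a) + ic(b))

print(resnik("dog", "cat"), lin("dog", "cat"))    # share 'mammal'
print(resnik("dog", "bird"), lin("dog", "bird"))  # only share 'animal'
```

Because dog and cat share the informative ancestor mammal while dog and bird only share the uninformative root-level animal, the first pair scores strictly higher under both measures.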
  • Source
    IPMU 2014; 07/2014
  • Source
    15th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2014; 07/2014
  • Source
    ABSTRACT: Concept-based information retrieval is known to be a powerful and reliable process. However, the need for a semantically annotated corpus and its associated data structure – e.g. a domain ontology – can be problematic. Building and extending a semantic index is a tedious task that needs to be addressed. We previously suggested an annotation propagation approach in a vector-space representation of the corpus to help users enrich a corpus. In this paper, we propose an extension of this process to semantic indexing. Starting from a map showing the documents of the corpus, a user just has to place a new resource on this map to obtain a first annotation of that resource. This annotation is obtained by optimizing an objective function that assesses the semantic similarity between the annotation suggested for the new resource and those of the documents found in its vicinity. Here, we illustrate this strategy on tumor-related scientific papers.
    CORIA 2014; 03/2014
  • Source
    19th International Conference on Applications of Natural Language to Information Systems, NLDB 2014; 01/2014
  • Source
    23ème Rencontres francophones sur la Logique Floue et ses Applications, Cargèse (Corse - France); 01/2014
  • Source
    ABSTRACT: The Semantic Measures Library and Toolkit are robust, open-source, easy-to-use software solutions dedicated to semantic measures. They can be used for large-scale computation and analysis of semantic similarities between terms/concepts defined in terminologies and ontologies. The comparison of entities (e.g. genes) annotated by concepts is also supported, and a large collection of measures is available. Not limited to a specific application context, the library and the toolkit can be used with various controlled vocabularies and ontology specifications (e.g. OBO, RDF). The project targets both designers and practitioners of semantic measures, providing a Java library as well as a command-line tool that can be used on personal computers or computer clusters. Downloads, documentation, tutorials, evaluation and support are available at http://www.semantic-measures-library.org. Contact: harispe.sebastien@gmail.com.
    Bioinformatics 10/2013; 30(5). DOI:10.1093/bioinformatics/btt581 · 4.62 Impact Factor
  • Source
    ABSTRACT: Semantic measures are today widely used to estimate the strength of the semantic relationship between elements of various types: units of language (e.g., words, sentences), concepts or even entities (e.g., documents, genes, geographical locations). They play an important role in comparing these elements through semantic proxies: texts and knowledge models (e.g., ontologies) that implicitly or formally support their meaning or describe their nature. Semantic measures are therefore essential for designing intelligent agents that use semantic analysis to mimic the human ability to compare things. This paper proposes a survey of the broad notion of semantic measure, which generalizes the well-known notions of semantic similarity, semantic relatedness and semantic distance, extensively studied by various communities over the last decades (e.g., Cognitive Science, Linguistics, and Artificial Intelligence, to mention a few). Definitions, practical applications, and the various approaches used to define semantic measures are presented. In addition, the evaluation of semantic measures, as well as software solutions dedicated to their computation and analysis, is introduced. The general presentation of the large diversity of existing semantic measures is further completed by a detailed survey of measures based on knowledge base analysis. In this study, we mainly focus on measures that rely on graph analysis, i.e. those framed in the relational setting. They are of particular interest to numerous communities and have recently gained a lot of attention in research and applications by taking advantage of several types of knowledge bases (e.g., ontologies, semantic graphs) to compare words, concepts, or entities.
  • Source
    On the Move to Meaningful Internet Systems: OTM 2013 Conferences, 09/2013: pages 606-615; Springer Berlin Heidelberg., ISBN: 9783642410291
  • Source
    24es journées francophones d’Ingénierie des Connaissances – IC 2013, Lille; 07/2013
  • Source
    ABSTRACT: The exponential growth of available electronic data is almost useless without efficient tools to retrieve the right information at the right time. It is now widely acknowledged that information retrieval systems need to take semantics into account to enhance the use of available information. However, there is still a gap between the amount of relevant information that can be accessed through optimized IRSs on the one hand, and users' ability to grasp and process more than a handful of relevant documents at once on the other. This chapter shows how conceptual and lexical approaches may be used jointly to enrich document description. After a survey of semantics-based methodologies designed to efficiently retrieve and exploit information, hybrid approaches are discussed. The original approach presented here benefits from both lexical and ontological document description, and combines them in a software architecture dedicated to information retrieval and rendering in specific domains.
    New Trends of Research in Ontologies and Lexical Resources, Edited by Oltramari, Alessandro; Vossen, Piek; Qin, Lu; Hovy, Eduard, 02/2013: pages 209-230; Springer., ISBN: 9783642317811
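A minimal sketch of the hybrid idea above, in which a lexical score and a concept-based score are combined into one ranking. The linear combination, the weight alpha, and the toy document are illustrative assumptions, not the chapter's actual architecture:

```python
# Combine lexical (term-overlap) and conceptual (ontology-annotation)
# relevance into a single hybrid ranking score.

def lexical_score(query_terms, doc_terms):
    q = set(query_terms)
    return len(q & set(doc_terms)) / max(len(q), 1)

def concept_score(query_concepts, doc_concepts):
    q = set(query_concepts)
    return len(q & set(doc_concepts)) / max(len(q), 1)

def hybrid_score(q_terms, q_concepts, doc, alpha=0.5):
    # alpha weights the lexical view against the conceptual view.
    return (alpha * lexical_score(q_terms, doc["terms"])
            + (1 - alpha) * concept_score(q_concepts, doc["concepts"]))

doc = {"terms": ["tumor", "growth", "rate"],
       "concepts": ["Neoplasm", "CellProliferation"]}
print(hybrid_score(["tumour", "growth"], ["Neoplasm"], doc))  # 0.75
```

Note how the spelling mismatch (tumour vs. tumor) hurts the lexical score while the shared concept compensates for it, which is precisely the motivation for combining the two kinds of description.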
  • Source
    ABSTRACT: Ontologies are widely adopted in the biomedical domain to characterize various resources (e.g. diseases, drugs, scientific publications) with non-ambiguous meanings. By exploiting the structured knowledge that ontologies provide, a plethora of ad hoc and domain-specific semantic similarity measures have been defined over recent years. Nevertheless, some critical questions remain: which measure should be defined or chosen for a concrete application? Are some of the a priori different measures actually equivalent? To bring some light to these questions, we perform an in-depth analysis of existing ontology-based measures to identify the core elements of semantic similarity assessment. As a result, this paper presents a unifying framework that aims to improve the understanding of semantic measures, to highlight their equivalences and to propose bridges between their theoretical bases. By demonstrating that groups of measures are simply particular instantiations of parameterized functions, we unify a large number of state-of-the-art semantic similarity measures through common expressions. The applicability of the proposed framework and its practical usefulness are underlined by an empirical analysis of hundreds of semantic measures in a biomedical context.
    Journal of Biomedical Informatics 01/2013; 48. DOI:10.1016/j.jbi.2013.11.006 · 2.48 Impact Factor
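The unifying idea above can be illustrated by expressing two classic IC-based measures as instantiations of one parameterized template over shared components: the ICs of the two concepts and of their most informative common ancestor (MICA). The template shape and the rescaling of Jiang-Conrath into a similarity are simplifications made for illustration, not the paper's exact framework:

```python
# One template, several measures: each measure is just a different
# aggregation of the same three components (ic_a, ic_b, ic_mica).

def similarity(ic_a, ic_b, ic_mica, aggregate):
    return aggregate(ic_a, ic_b, ic_mica)

# Lin (1998): ratio of shared information to total information.
lin = lambda a, b, m: 2 * m / (a + b)

# Jiang-Conrath distance turned into a similarity by a simple
# rescaling (assumes ICs normalized to [0, 1]).
jc_sim = lambda a, b, m: 1 - (a + b - 2 * m) / 2

print(similarity(0.8, 0.6, 0.5, lin))     # ~0.714
print(similarity(0.8, 0.6, 0.5, jc_sim))  # 0.8
```

Swapping the aggregation function while keeping the components fixed is exactly the kind of equivalence-revealing parameterization the abstract describes.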
  • Source
    On the Move to Meaningful Internet Systems: OTM 2013 Workshops; 01/2013
  • Vincent Ranwez · Sylvie Ranwez · Stefan Janaqi
    ABSTRACT: Ontologies are successfully used as semantic guides when navigating through the huge and ever-increasing quantity of digital documents. Nevertheless, the size of numerous domain ontologies tends to grow beyond the human capacity to grasp information. This growth is problematic for many key applications that require user interaction, such as document annotation or ontology modification/evolution. The problem can be partially overcome by providing users with a sub-ontology focused on their current concepts of interest. A sub-ontology restricted to this sole set of concepts is of limited interest, since their relationships generally cannot be made explicit without adding some of their hyponyms and hypernyms. This paper proposes efficient algorithms to identify these additional key concepts based on the closure of two common graph operators: the least common ancestor and the greatest common descendant. The resulting method produces ontology excerpts focused on a set of concepts of interest and is fast enough to be used in interactive environments. As an example, we use the resulting program, called OntoFocus (http://www.ontotoolkit.mines-ales.fr/), to restrict, in a few seconds, the large Gene Ontology (~30,000 concepts) to a sub-ontology focused on the concepts annotating a gene related to breast cancer.
    IEEE Transactions on Knowledge and Data Engineering 12/2012; DOI:10.1109/TKDE.2011.173 · 2.07 Impact Factor
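The least-common-ancestor operator central to the extraction method above can be sketched on a small DAG: the LCAs of two concepts are their common ancestors that have no strict descendant which is also a common ancestor. The tiny multi-inheritance taxonomy below is invented for illustration:

```python
# Least common ancestors in a DAG with multiple inheritance.

PARENTS = {  # concept -> set of direct parents
    "a": {"root"}, "b": {"root"},
    "c": {"a", "b"}, "d": {"a"},
    "e": {"c"}, "f": {"c", "d"},
}

def ancestors(x):
    """All strict ancestors of x, by upward graph traversal."""
    out, stack = set(), [x]
    while stack:
        n = stack.pop()
        for p in PARENTS.get(n, ()):
            if p not in out:
                out.add(p)
                stack.append(p)
    return out

def lca(x, y):
    """Common ancestors of x and y minimal w.r.t. the ancestor order."""
    common = (ancestors(x) | {x}) & (ancestors(y) | {y})
    # Keep only the "deepest" elements: those that are not a strict
    # ancestor of another element of the common set.
    return {c for c in common
            if not any(c in ancestors(d) for d in common if d != c)}

print(lca("e", "f"))  # {'c'}
print(lca("c", "d"))  # {'a'}
```

Unlike in a tree, a DAG pair can have several LCAs, which is why the operator returns a set; closing a concept set under this operator (and its dual, the greatest common descendant) yields the focused excerpt described in the abstract.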
