Publications

  • ABSTRACT: Semantic similarity has become, in recent years, the backbone of numerous knowledge-based applications dealing with textual data. Among the methods and paradigms proposed to assess semantic similarity, ontology-based measures and, more specifically, those based on quantifying the Information Content (IC) of concepts are the most widespread solutions due to their high accuracy. However, these measures were designed to exploit a single ontology and thus cannot be leveraged in the many contexts in which multiple knowledge bases are considered. In this paper, we propose a new approach to achieve accurate IC-based similarity assessments for concept pairs spread across several ontologies. Based on Information Theory, our method defines a strategy to accurately measure the degree of commonality between concepts belonging to different ontologies, which is the cornerstone for estimating their semantic similarity. Our approach therefore enables classic IC-based measures to be applied directly in a multiple-ontology setting. An empirical evaluation, based on well-established benchmarks and ontologies from the biomedical domain, illustrates the accuracy of our approach and shows that its similarity estimations are significantly more strongly correlated with human similarity ratings than those obtained with related works.
    Information Sciences 11/2014; 283:197–210. · 3.64 Impact Factor
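The IC-based family of measures this paper extends can be sketched in the classic single-ontology case: Resnik-style similarity returns the Information Content of the most informative common ancestor of two concepts. The Python toy below is only illustrative (the taxonomy, concept names, and the simple descendant-count IC formula are invented for the example; the paper's contribution, the multi-ontology case, is not shown):

```python
import math

# Toy taxonomy: child -> parents. "root" subsumes everything. Invented data.
parents = {
    "disease": ["root"],
    "cancer": ["disease"],
    "leukemia": ["cancer"],
    "melanoma": ["cancer"],
    "flu": ["disease"],
}

def ancestors(c):
    """All ancestors of c, including c itself."""
    seen, stack = {c}, [c]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def ic(c):
    """A simple intrinsic IC: -log(|descendants(c)| / |all concepts|)."""
    concepts = set(parents) | {"root"}
    desc = {d for d in concepts if c in ancestors(d)}
    return -math.log(len(desc) / len(concepts))

def resnik(c1, c2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(c1) & ancestors(c2)
    return max(ic(a) for a in common)

# Sibling leaves share the informative ancestor "cancer", so they score
# higher than a pair whose only shared ancestor is the generic "disease".
assert resnik("leukemia", "melanoma") > resnik("leukemia", "flu")
```

The degree of commonality lives entirely in the shared-ancestor term, which is exactly the quantity the paper generalizes across ontology boundaries.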
  • IPMU 2014; 07/2014
  • ABSTRACT: Concept-based information retrieval is known to be a powerful and reliable process. However, the need for a semantically annotated corpus and its associated data structure, e.g. a domain ontology, can be problematic. The conception and enlargement of a semantic index is a tedious task that needs to be addressed. We previously suggested an annotation-propagation approach in a vector-space representation of the corpus to help users enrich it. In this paper, we extend this process to semantic indexing. Starting from a map showing the documents of the corpus, a user simply places a new resource on the map to obtain a first annotation of it. This annotation is obtained by optimizing an objective function that assesses the semantic similarity between the annotation suggested for the new resource and those of the documents found in its vicinity. We illustrate this strategy on tumor-related scientific papers.
    CORIA 2014; 03/2014
  • 15th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2014; 01/2014
  • 23ème Rencontres francophones sur la Logique Floue et ses Applications, Cargèse (Corsica, France); 01/2014
  • 19th International Conference on Applications of Natural Language to Information Systems, NLDB 2014; 01/2014
  • ABSTRACT: The Semantic Measures Library and Toolkit are robust, open-source and easy-to-use software solutions dedicated to semantic measures. They can be used for large-scale computation and analysis of semantic similarities between terms/concepts defined in terminologies and ontologies. The comparison of entities (e.g. genes) annotated with concepts is also supported. A large collection of measures is available. Not limited to a specific application context, the library and the toolkit can be used with various controlled vocabularies and ontology specifications (e.g. OBO, RDF). The project targets both designers and practitioners of semantic measures, providing a Java library as well as a command-line tool that can be used on personal computers or computer clusters. Downloads, documentation, tutorials, evaluation and support are available at http://www.semantic-measures-library.org. Contact: harispe.sebastien@gmail.com.
    Bioinformatics 10/2013; · 5.47 Impact Factor
  • ABSTRACT: Semantic measures are today widely used to estimate the strength of the semantic relationship between elements of various types: units of language (e.g., words, sentences), concepts, or even entities (e.g., documents, genes, geographical locations). They play an important role in comparing these elements according to semantic proxies: texts and knowledge models (e.g., ontologies) that implicitly or formally support their meaning or describe their nature. Semantic measures are therefore essential for designing intelligent agents that use semantic analysis to mimic the human ability to compare things. This paper proposes a survey of the broad notion of semantic measure, which generalizes the well-known notions of semantic similarity, semantic relatedness and semantic distance extensively studied by various communities over the last decades (e.g., Cognitive Science, Linguistics and Artificial Intelligence, to mention a few). Definitions, practical applications and the various approaches used to define these measures are presented. In addition, the evaluation of semantic measures, as well as software solutions dedicated to their computation and analysis, is introduced. The general presentation of the large diversity of existing semantic measures is completed by a detailed survey of measures based on knowledge-base analysis. In this study, we mainly focus on measures that rely on graph analysis, i.e. framed in the relational setting. They are of particular interest to numerous communities and have recently gained a lot of attention in research and applications by taking advantage of several types of knowledge bases (e.g., ontologies, semantic graphs) to compare words, concepts or entities.
    10/2013;
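The simplest family of graph-framed measures the survey covers is edge counting: the semantic distance between two concepts is the length of the shortest path linking them in the taxonomy (the approach usually attributed to Rada et al.). A minimal Python sketch, with an invented graph and names (not code from the survey):

```python
from collections import deque

# Toy is-a taxonomy, traversed undirected for path counting. Invented data.
edges = {
    "root": ["disease", "symptom"],
    "disease": ["cancer", "flu"],
    "cancer": ["leukemia"],
}

# Build an undirected adjacency list.
adj = {}
for u, vs in edges.items():
    for v in vs:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

def path_distance(a, b):
    """Shortest-path (edge-counting) distance between two concepts."""
    dist = {a: 0}
    q = deque([a])
    while q:
        u = q.popleft()
        if u == b:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return None  # disconnected
```

More refined measures replace raw edge counts with depth, edge-type, or IC weighting, which is where the bulk of the surveyed literature sits.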
  • ABSTRACT: The exponentially growing volume of available electronic data is of little use without efficient tools to retrieve the right information at the right time. It is now widely acknowledged that information retrieval systems (IRSs) need to take semantics into account to enhance the use of available information. However, there is still a gap between the amount of relevant information that can be accessed through optimized IRSs on the one hand, and users' ability to grasp and process a handful of relevant data at once on the other. This chapter shows how conceptual and lexical approaches may be jointly used to enrich document description. After a survey of semantics-based methodologies designed to efficiently retrieve and exploit information, hybrid approaches are discussed. The original approach presented here benefits from both lexical and ontological document descriptions, and combines them in a software architecture dedicated to information retrieval and rendering in specific domains.
    New Trends of Research in Ontologies and Lexical Resources, edited by Oltramari, Alessandro; Vossen, Piek; Qin, Lu; Hovy, Eduard, 02/2013: pages 209-230; Springer, ISBN: 9783642317811
  • ABSTRACT: Ontologies are widely adopted in the biomedical domain to characterize various resources (e.g. diseases, drugs, scientific publications) with unambiguous meanings. By exploiting the structured knowledge that ontologies provide, a plethora of ad hoc and domain-specific semantic similarity measures have been defined in recent years. Nevertheless, some critical questions remain: which measure should be defined or chosen for a concrete application? Are some of these a priori different measures in fact equivalent? To shed some light on these questions, we perform an in-depth analysis of existing ontology-based measures to identify the core elements of semantic similarity assessment. As a result, this paper presents a unifying framework that aims to improve the understanding of semantic measures, to highlight their equivalences and to propose bridges between their theoretical bases. By demonstrating that groups of measures are just particular instantiations of parameterized functions, we unify a large number of state-of-the-art semantic similarity measures through common expressions. The applicability of the proposed framework and its practical usefulness are underlined by an empirical analysis of hundreds of semantic measures in a biomedical context.
    Journal of Biomedical Informatics 01/2013; · 2.13 Impact Factor
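The "particular instantiations of parameterized functions" claim can be illustrated with a small sketch (hypothetical Python, not the paper's actual framework): Lin's similarity and the Jiang-Conrath distance both combine the same three IC terms, IC(c1), IC(c2) and IC of the most informative common ancestor, so both fall out of one abstract expression by swapping the merging function:

```python
def lin(ic1, ic2, ic_mica):
    """Lin (1998): ratio of shared to total information."""
    return 2 * ic_mica / (ic1 + ic2)

def jiang_conrath_dist(ic1, ic2, ic_mica):
    """Jiang & Conrath (1997) distance: total minus shared information."""
    return ic1 + ic2 - 2 * ic_mica

# Both are instantiations of one parameterized form combining the
# shared information (IC of the MICA) and the total information.
def abstract_measure(ic1, ic2, ic_mica, merge):
    return merge(ic_mica, ic1 + ic2)

lin2 = lambda i1, i2, im: abstract_measure(i1, i2, im, lambda s, t: 2 * s / t)
jc2 = lambda i1, i2, im: abstract_measure(i1, i2, im, lambda s, t: t - 2 * s)

assert lin(1.0, 2.0, 0.9) == lin2(1.0, 2.0, 0.9)
assert jiang_conrath_dist(1.0, 2.0, 0.9) == jc2(1.0, 2.0, 0.9)
```

The paper's framework does this systematically for hundreds of measures; the point of the sketch is only the shape of the unification.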
  • On the Move to Meaningful Internet Systems: OTM 2013 Conferences, 01/2013: pages 606-615; Springer Berlin Heidelberg, ISBN: 9783642410291
  • On the Move to Meaningful Internet Systems: OTM 2013 Workshops; 01/2013
  • 24es journées francophones d’Ingénierie des Connaissances – IC 2013, Lille; 01/2013
  • Vincent Ranwez, Sylvie Ranwez, Stefan Janaqi
    Techniques et sciences informatiques 01/2012; 31(1):11-38.
  • ABSTRACT: BACKGROUND: Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by Semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications, and the Medical Subject Headings thesaurus is the basis of the biomedical publication indexing and information retrieval process offered by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources, and no explanation of their adequacy to the query is provided. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations. RESULTS: This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents retrieved, and that uses a graphical rendering of query results to favor user interaction. Semantic proximities between ontology concepts and aggregating models are used to assess document adequacy with respect to a query. The selected documents are displayed on a semantic map that provides graphical indications making explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of the data corpus by facilitating the weighting of query concepts and visual explanation. We illustrate the benefit of this information retrieval system on two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway. CONCLUSIONS: The ontology-based information retrieval system described in this paper (OBIRS) is freely available at http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user-centred application in which the system highlights relevant information to provide decision support.
    BMC Bioinformatics 01/2012; 13 Suppl 1:S4. · 3.02 Impact Factor
  • V. Ranwez, S. Ranwez, S. Janaqi
    ABSTRACT: Ontologies are successfully used as semantic guides when navigating the huge and ever-increasing quantity of digital documents. Nevertheless, the size of many domain ontologies tends to grow beyond the human capacity to grasp information. This growth is problematic for many key applications that require user interaction, such as document annotation or ontology modification/evolution. The problem can be partially overcome by providing users with a sub-ontology focused on their current concepts of interest. A sub-ontology restricted to this sole set of concepts is of limited interest, since their relationships generally cannot be made explicit without adding some of their hyponyms and hypernyms. This paper proposes efficient algorithms to identify these additional key concepts based on the closure of two common graph operators: least common ancestor and greatest common descendant. The resulting method produces ontology excerpts focused on a set of concepts of interest, and is fast enough to be used in interactive environments. As an example, we use the resulting program, called OntoFocus (http://www.ontotoolkit.mines-ales.fr/), to restrict, in a few seconds, the large Gene Ontology (~30,000 concepts) to a sub-ontology focused on the concepts annotating a gene related to breast cancer.
    IEEE Transactions on Knowledge and Data Engineering 01/2012; · 1.89 Impact Factor
  • Vincent Ranwez, Stefan Janaqi, Sylvie Ranwez
    ABSTRACT: The least common ancestor of two vertices, denoted lca(x,y), is a well-defined operation in a directed acyclic graph (DAG) G. We introduce U_lca(S), a natural extension of lca(x,y) to any set S of vertices. Given an initial set S_0, one can iterate S_{k+1} = U_lca(S_k) to obtain an increasing sequence of sets. Since G is finite, this sequence always has a limit, which defines a closure operator. Two equivalent definitions of this operator are given and their relationship with abstract convexity is shown. The good properties of this operator make it possible to design an O(n·m)-time algorithm to compute its closure. This performance is crucial in applications where DAGs with thousands of vertices are employed. Two examples are given from the life sciences: the first concerns understanding gene annotations by restricting the Gene Ontology; the second deals with identifying the taxonomic group of environmental DNA sequences.
    Ars Combinatoria 01/2012; 104. · 0.28 Impact Factor
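The iterated closure S_{k+1} = U_lca(S_k) can be sketched naively in Python (an invented toy DAG; brute-force fixed-point iteration, not the paper's O(n·m) algorithm): repeatedly add the least common ancestors of every pair of vertices already in the set until nothing changes.

```python
# Toy DAG: child -> parents. Invented for the example.
parents = {
    "a": [], "b": ["a"], "c": ["a"],
    "d": ["b", "c"], "e": ["b"], "f": ["c"],
}

def ancestors(x):
    """All ancestors of x, including x itself."""
    seen, stack = {x}, [x]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def lca_set(x, y):
    """Most specific common ancestors of x and y (may be several in a DAG)."""
    common = ancestors(x) & ancestors(y)
    # keep c only if c is not a strict ancestor of another common ancestor
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}

def ulca_closure(s):
    """Iterate S <- S ∪ lca's of all pairs until a fixed point is reached."""
    s = set(s)
    while True:
        new = set(s)
        for x in s:
            for y in s:
                new |= lca_set(x, y)
        if new == s:
            return s
        s = new
```

For instance, closing {"e", "f"} pulls in their common ancestor "a", and the result is stable under further iteration, which is the closure property the paper studies.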
  • ABSTRACT: Because of the increasing amount of electronic data, designing efficient tools to retrieve and exploit documents is a major challenge. Current search engines suffer from two main drawbacks: there is limited interaction with the list of retrieved documents, and no explanation of their adequacy to the query. Users may thus be confused by the selection and have no idea how to adapt their query so that the results match their expectations. This paper describes a querying method and an environment based on aggregating models to assess the relevance of documents annotated with the concepts of an ontology. The selected documents are then displayed on a semantic map that provides graphical indications making explicit to what extent they match the user's query; this man/machine interface favors a more interactive exploration of the data corpus.
    Proceedings of the Workshop on Semantic Web Applications and Tools for Life Sciences, Berlin, Germany, December 10, 2010; 12/2010
  • ABSTRACT: We call 'social photos' the photos taken during family events or parties with friends, which depict individuals or groups of individuals. Indexing them consists in identifying the event and the people present in the photos. In this article we present a method and tools to facilitate this task. New photos are indexed from already-indexed photos through a 'propagation' process, which consists of a drag-and-drop followed by a merge and an assignment of contents. The already-indexed photos must first be organized on screen in a layout that facilitates the identification of the people shown. To this end, we use Formal Concept Analysis techniques and propose an algorithm for the incremental construction of a Hasse diagram that simultaneously facilitates identification, integrates newly indexed photos into the indexing process, and maintains the user's mental map.
    01/2009;
  • ABSTRACT: Social photos, which are taken during family events or parties, represent individuals or groups of people. We show in this paper that a Hasse diagram is an efficient visualization strategy for eliciting different groups and navigating through them. However, we do not limit this strategy to these traditional uses; we show how it can also be used to assist in indexing new photos. Indexing consists of identifying the event and people in photos; it is an integral phase that takes place before searching and sharing. In our method we use existing indexed photos to index new ones. This is performed through a manual drag-and-drop procedure followed by a content-fusion process that we call 'propagation'. At the core of this process is the need to organize and visualize the photos used for indexing in a manner that is easily recognizable and accessible to the user. In this respect we make use of an Object Galois Sub-Hierarchy and display it using a Hasse diagram. The need for an incremental display that maintains the user's mental map also leads us to propose a novel way of building the Hasse diagram. To validate the approach, we present tests conducted with a sample of users that confirm the value of this organization, visualization and indexing approach. Finally, we conclude by considering scalability, the possibility of extracting social networks, and the automatic creation of personalised albums.
    IEEE Transactions on Visualization and Computer Graphics 01/2009; 15(6):985-92. · 1.90 Impact Factor
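The Hasse diagram at the core of this visualization draws only the covering relation of an order, i.e. the transitive reduction. The Python sketch below is a simplified illustration (invented photo annotations and a plain strict-subset order, not the paper's Object Galois Sub-Hierarchy or its incremental construction): given photos annotated with the people they show, it keeps exactly the edges a Hasse diagram would display.

```python
# Photos annotated with the people they show. Invented data.
photos = {
    "p1": {"ann"},
    "p2": {"ann", "bob"},
    "p3": {"ann", "bob", "eve"},
    "p4": {"ann", "eve"},
}

def hasse_edges(sets):
    """Covering pairs (a, b) of the strict-subset order: a < b with no
    element strictly between them. These are the edges of a Hasse diagram."""
    names = list(sets)
    lt = {(a, b) for a in names for b in names if sets[a] < sets[b]}
    # transitive reduction: drop (a, b) whenever some c sits between them
    return {(a, b) for (a, b) in lt
            if not any((a, c) in lt and (c, b) in lt for c in names)}
```

Here {"ann"} is below {"ann", "bob", "eve"} in the order, but that edge is not drawn because {"ann", "bob"} sits between them; pruning such transitive edges is what keeps the diagram readable as a navigation aid.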
