Fleur Mougin
University of Bordeaux · UMR Inserm U1219

PhD

About

94
Publications
9,615
Reads
712
Citations
Education
November 2002 - November 2006
Université de Rennes 1
Field of study
  • Biomedical informatics

Publications (94)
Chapter
Secondary use of health data is made difficult in part because of large semantic heterogeneity. Many efforts are being made to align local terminologies with international standards. With increasing concerns about data privacy, we focused here on the use of machine learning methods to align biological data elements using aggregated features that co...
Chapter
Information about drugs is numerous and varied, and many drugs can share the same information. Grouping drugs that have common characteristics can be useful to avoid redundancy and facilitate interoperability. Our work focused on the evaluation of the relevance of classes allowing this type of grouping: the “Virtual Drug”. Thus, in this paper, we d...
Article
Full-text available
Objectives: To introduce the 2021 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: The editorial provides an introduction and overview to the 2021 IMIA Yearbook whose special topic is “Managing Pandemics with Health Informatics - Successes and Challenges”. The Special Topic, the keynote paper, and survey papers...
Article
Full-text available
As the capacity for generating large-scale molecular profiling data continues to grow, the ability to extract meaningful biological knowledge from it remains a limitation. Here, we describe the development of a new fixed repertoire of transcriptional modules, BloodGen3, that is designed to serve as a stable reusable framework for the analysis and i...
Chapter
Full-text available
This paper presents a prototype for the visualization of food-drug interactions implemented in the MIAM project, whose objective is to develop methods for the extraction and representation of these interactions and to make them available in the Thériaque database. The prototype provides users with a graphical visualization showing the hierarchies o...
Article
Full-text available
Objective: Our study consists of aligning the interface terminology of the Bordeaux university hospital (TLAB) to the Logical Observation Identifiers Names and Codes (LOINC). The objective was to facilitate the shared and integrated use of biological results with other health information systems. Materials and Methods: We used an innovative approach...
Article
Full-text available
The aim of our study was to create a graph model for the description of LOINC® concepts. The main objective of the constructed structure is to facilitate the alignment of French local terminologies to LOINC. The process consisted of automatically incorporating the naming rules of LOINC labels, based on punctuation. We implemented these rules and ap...
Preprint
Full-text available
As the capacity for generating large scale data continues to grow the ability to extract meaningful biological knowledge from it remains a limitation. Here we describe the development of a new fixed repertoire of transcriptional modules. It is meant to serve as a stable reusable framework for the analysis and interpretation of blood transcriptome p...
Article
Full-text available
The revolution in new sequencing technologies is leading to a new understanding of the relations between genotype and phenotype. To interpret and analyze data that are grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in biology. However, these methods synthesize the biological informatio...
Conference Paper
In this work, we address the task of extracting application-specific taxonomies from the category hierarchy of Wikipedia. Previous work on pruning the Wikipedia knowledge graph relied on silver standard taxonomies which can only be automatically extracted for a small subset of domains rooted in relatively focused nodes, placed at an intermediate le...
Article
Full-text available
Clinical information in electronic health records (EHRs) is mostly unstructured. With the ever-increasing amount of information in patients' EHRs, manual extraction of clinical information for data reuse can be tedious and time-consuming without dedicated tools. In this paper, we present SmartCRF, a prototype to visualize, search and ease the extra...
Article
Full-text available
The W3C project, "Linking Open Drug Data" (LODD), linked several publicly available sources of drug data together. So far, French data, like marketed drugs and their summary of product characteristics, were not integrated and remained difficult to query. In this paper, we present Romedi (Référentiel Ouvert du Médicament), an open dataset that links...
Preprint
Full-text available
The revolution in new sequencing technologies, by strongly improving the production of omics data, is leading to a new understanding of the relations between genotype and phenotype. To interpret and analyze these massive data that are grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in b...
Article
Full-text available
Motivation: The recent revolution in new sequencing technologies, as part of the continuous process of adopting innovative protocols, has strongly impacted the interpretation of relations between phenotype and genotype. Thus, understanding the resulting gene sets has become a bottleneck that needs to be addressed. Automatic methods have been...
Preprint
Full-text available
In this paper, we describe the approach and results for our participation in Task 1 (multilingual information extraction) of the CLEF eHealth 2018 challenge. We addressed the task of automatically assigning ICD-10 codes to French death certificates. We applied a dictionary-based approach using materials provided by the task organizers. The terms o...
Conference Paper
Full-text available
Many unstructured data sources, such as electronic patient records, scientific articles, good practice guidelines, and forums, mention drugs. Detecting drugs in free text is an important step to facilitate searching for them and extracting information about them...
Article
Life sciences are currently going through a great number of transformations driven by the ongoing revolution in high-throughput technologies for the acquisition of data. The integration of their high dimensionality, ranging from omics to clinical data, is becoming one of the most challenging stages. It involves inter-disciplinary developments with...
Article
In oncology, the reuse of data is confronted with the heterogeneity of terminologies. It is necessary to semantically integrate these distinct terminologies. The semantic integration by using a third terminology as a support is a conventional approach for the integration of two terminologies that are not very structured. The aim of our study was to...
Conference Paper
Full-text available
Nowadays, one of the main challenges in biology is to make use of several sources of data to improve our understanding of life. When analyzing experimental data, researchers aim at clustering genes that show a similar behavior through specific external conditions. Thus, the functional interpretation of genes is crucial and involves making use of th...
Article
Full-text available
Background Identifying incident cancer cases within a population remains essential for scientific research in oncology. Data produced within electronic health records can be useful for this purpose. Due to the multiplicity of providers, heterogeneous terminologies such as ICD-10 and ICD-O-3 are used for oncology diagnosis recording purpose. To enab...
Article
Full-text available
With the large and increasing volume of textual data, automated methods for identifying significant topics to classify textual documents have received a growing interest. While many efforts have been made in this direction, it still remains a real challenge. Moreover, the issue is even more complex as full texts are not always freely available. The...
Article
Full-text available
With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, SPARQL language, and interfaces in Natural Language question-answering provide interesting solutions for querying such kno...
Article
Full-text available
Clinical data recorded in modern EHRs are very rich, although their secondary use for research and medical decision-making may be complicated (e.g., missing and incorrect data, data spread over several clinical databases, information available only within unstructured narrative documents). We propose to address the issue related to the processing of narrative d...
Article
Full-text available
Significant efforts have been undertaken for providing the Gene Ontology (GO) in a computable format as well as for enriching it with logical definitions. Automated approaches can thus be applied to GO for assisting its maintenance and for checking its internal coherence. However, inconsistencies may still remain within GO. In this frame, the objec...
Article
Recent and intensive research in the biomedical area has made it possible to accumulate and disseminate biomedical knowledge through various knowledge bases increasingly available on the Web. Exploiting this knowledge requires creating links between these bases and using them jointly. Linked Data, the SPARQL language and interfaces in natural languag...
Conference Paper
Full-text available
With the rapid growth of biomedical literature, automated methods for assigning indexing terms to textual documents have received a growing interest. While many efforts have been made in this direction, it remains a real challenge. Moreover, the issue is even more complicated since full text is not always freely available. In this paper, we...
Article
Introduction: Identifying and characterizing incident cancer cases in the population are important issues for facilitating cancer research. The computerization of medical data produces data that could meet these needs. In France, several terminologies are used to code diagnoses of canc...
Article
Full-text available
This work focuses on multiply-related Unified Medical Language System (UMLS) concepts, that is, concepts associated through multiple relations. The relations involved in such situations are audited to determine whether they are provided by source vocabularies or result from the integration of these vocabularies within the UMLS. We study the compati...
Article
Full-text available
Ontologies are useful tools for sharing and exchanging knowledge. However ontology construction is complex and often time consuming. In this paper, we present a method for building a bilingual domain ontology from textual and termino-ontological resources intended for semantic annotation and information retrieval of textual documents. This method c...
Conference Paper
Full-text available
The exploitation of clinical reports for generating alerts especially relies on the alignment of the dedicated terminologies, i.e., MedDRA (exploited in the pharmacovigilance area) and SNOMED International (exploited recently in France for encoding clinical documents). In this frame, we propose a cross-language approach for acquiring automatically...
Conference Paper
In this paper, we present a method for building (bilingual) domain ontologies from existing resources. This method combines two approaches: knowledge extraction from texts and the reuse of existing terminological resources. The approach consists of four steps: the extraction of terms from French and English corpus using textual analysis tools, term...
Article
Full-text available
Objectives: The aim of this research was to automate the search of publications concerning adverse drug reactions (ADR) by defining the queries used to search MEDLINE and by determining the required threshold for the number of extracted publications to confirm the drug/event association in the literature. Methods: We defined an approach based on the...
Article
Full-text available
Objective Data from electronic healthcare records (EHR) can be used to monitor drug safety, but in order to compare and pool data from different EHR databases, the extraction of potential adverse events must be harmonized. In this paper, we describe the procedure used for harmonizing the extraction from eight European EHR databases of five events o...
Article
We present in this paper a method for acquiring a bilingual terminology concerning Alzheimer's disease using a parallel corpus. NLP techniques are used for parsing English and French texts in order to extract candidate terms. These terms are then matched automatically using an approach that combines two alignment techniques: one based on the ca...
Article
Full-text available
Because of the ever-increasing amount of information in patients' EHRs, healthcare professionals may face difficulties in making diagnoses and/or therapeutic decisions. Moreover, patients may misunderstand their health status. These medical practitioners need effective tools to locate in real time relevant elements within the patients' EHR and vis...
Conference Paper
Full-text available
Background: The SOS Project aims to assess the risk of cardiovascular and gastrointestinal events of non-steroidal anti-inflammatory drugs. Seven European databases (DB), which contain health records of more than 35 million citizens, are involved in the project. These DB use four different terminologies to code events (ICD-9-CM, ICD-10-GM, READ and...
Conference Paper
Background: The SOS Project aims to assess the risk of cardiovascular and gastrointestinal events of non-steroidal anti-inflammatory drugs. Seven European databases (DB), which contain health records of more than 35 million citizens, are involved in the project. These DB use four different terminologies to code events (ICD-9-CM, ICD-10-GM, READ and...
Conference Paper
Full-text available
In this article, we present a method for building a bilingual (French/English) ontology from abstracts and critical analyses of scientific articles on Alzheimer's disease. This method combines two approaches: ontology acquisition from texts and the reuse of existing terminological resources. The...
Article
Full-text available
The Anatomical Therapeutic Chemical (ATC) classification system is widely used in Europe for the classification and coding of drugs. However, ATC is not well integrated with other medication terminologies (e.g., NDF-RT – the National Drug File-Reference Terminology), which hinders the integration of data coded to these two systems. In this work, w...
Conference Paper
Full-text available
MedDRA is exploited for the indexing of pharmacovigilance spontaneous reports. But since spontaneous reports cover only a small proportion of the existing adverse drug reactions, the exploration of clinical reports is seriously considered. Through the UMLS, the current mapping between MedDRA and SNOMED CT, this last being used for indexing clinical...
Article
Full-text available
Health professionals are faced with challenges when they have to exploit the semantics of concepts present in clinical terminologies in support of research activities. The difficulty lies in the fact that this semantics is represented not only through the labels of concepts, but also their position in the hierarchy, and, when available, their logic...
Article
Full-text available
Objectives: To determine the anti-coagulation status of patients, based on the list of medications they have been prescribed, using the publicly available resource NDF-RT (National Drug File Reference Terminology). Methods: We explored the legacy VA classes and we refined the definition of external pharmacologic classes (EXT) in NDF-RT in order to...
Article
Full-text available
The overall objective of the EU-ADR project is the design, development, and validation of a computerised system that exploits data from electronic health records and biomedical databases for the early detection of adverse drug reactions. Eight different databases, containing health records of more than 30 million European citizens, are involved in...
Article
Full-text available
Linkages between animal models of diseases and human data enable the development of translational research hypotheses. The objective of this study is to investigate two approaches to integrating phenotype and clinical information. On the one hand, we develop a terminology mapping between phenotypes from the Mammalian Phenotype Ontology (MPO) and On...
Article
Polysemy is a frequent issue in biomedical terminologies. In the Unified Medical Language System (UMLS), polysemous terms are either represented as several independent concepts, or clustered into a single, multiply-categorized concept. The objective of this study is to analyze polysemous concepts in the UMLS through their categorization and hierarc...
Article
The overall objective of the eu-ADR project is the design, development, and validation of a computerised system that exploits data from electronic health records and biomedical databases for the early detection of adverse drug reactions. Eight different databases, containing health records of more than 30 million European citizens, are involved in...
Article
Full-text available
Unlike recent biomedical terminologies, the International Classification of Diseases (ICD) does not state any explicit associations between a given disease and the corresponding anatomical structure(s). As a consequence, clinical repositories coded with ICD cannot be searched by anatomical structure. The objective of this work is to find associatio...
Article
Full-text available
Purpose: Collecting and analyzing findings constitute the basis of medical activity. Computer assisted medical activity raises the problem of modelling findings. We propose a unified representation of findings integrating the representations of findings in the GAMUTS in Radiology [M.M. Reeder, B. Felson, GAMUTS in radiology Comprehensive lists of...
Conference Paper
Full-text available
The information needed by biologists and physicians for research purposes is distributed over many heterogeneous sources. Integration systems provide a single, centralized and homogeneous interface for users to query multiple information sources simultaneously. The major limitation of integration systems, including mediator-based systems, is that...
Article
Full-text available
Auditing biomedical terminologies often results in the identification of inconsistencies and thus helps to improve their quality. In this paper, we present a method based on Semantic Web technologies for auditing biomedical terminologies and apply it to the NCI thesaurus. We stored the NCI thesaurus concepts and their properties in an RDF triple st...