
Fleur Mougin, PhD
University of Bordeaux · UMR Inserm U1219
94
Publications
9,615
Reads
712
Citations
Education
November 2002 - November 2006
Publications (94)
Secondary use of health data is made difficult in part because of large semantic heterogeneity. Many efforts are being made to align local terminologies with international standards. With increasing concerns about data privacy, we focused here on the use of machine learning methods to align biological data elements using aggregated features that co...
Information about drugs is numerous and varied, and many drugs can share the same information. Grouping drugs that have common characteristics can be useful to avoid redundancy and facilitate interoperability. Our work focused on the evaluation of the relevance of classes allowing this type of grouping: the “Virtual Drug”. Thus, in this paper, we d...
Objectives: To introduce the 2021 International Medical Informatics Association (IMIA) Yearbook by the editors.
Methods: The editorial provides an introduction and overview to the 2021 IMIA Yearbook whose special topic is “Managing Pandemics with Health Informatics - Successes and Challenges”. The Special Topic, the keynote paper, and survey papers...
As the capacity for generating large-scale molecular profiling data continues to grow, the ability to extract meaningful biological knowledge from it remains a limitation. Here, we describe the development of a new fixed repertoire of transcriptional modules, BloodGen3, that is designed to serve as a stable reusable framework for the analysis and i...
This paper presents a prototype for the visualization of food-drug interactions implemented in the MIAM project, whose objective is to develop methods for the extraction and representation of these interactions and to make them available in the Thériaque database. The prototype provides users with a graphical visualization showing the hierarchies o...
Objective
Our study aligned the interface terminology of the Bordeaux university hospital (TLAB) with the Logical Observation Identifiers Names and Codes (LOINC). The objective was to facilitate the shared and integrated use of biological results with other health information systems.
Materials and Methods
We used an innovative approach...
The aim of our study was to create a graph model for the description of LOINC® concepts. The main objective of the constructed structure is to facilitate the alignment of French local terminologies to LOINC. The process consisted of automatically incorporating the naming rules of LOINC labels, based on punctuation. We implemented these rules and ap...
As the capacity for generating large scale data continues to grow the ability to extract meaningful biological knowledge from it remains a limitation. Here we describe the development of a new fixed repertoire of transcriptional modules. It is meant to serve as a stable reusable framework for the analysis and interpretation of blood transcriptome p...
The revolution in new sequencing technologies is greatly leading to new understandings of the relations between genotype and phenotype. To interpret and analyze data that are grouped according to a phenotype of interest, methods based on statistical enrichment became a standard in biology. However, these methods synthesize the biological informatio...
In this work, we address the task of extracting application-specific taxonomies from the category hierarchy of Wikipedia. Previous work on pruning the Wikipedia knowledge graph relied on silver standard taxonomies which can only be automatically extracted for a small subset of domains rooted in relatively focused nodes, placed at an intermediate le...
Clinical information in electronic health records (EHRs) is mostly unstructured. With the ever-increasing amount of information in patients' EHRs, manual extraction of clinical information for data reuse can be tedious and time-consuming without dedicated tools. In this paper, we present SmartCRF, a prototype to visualize, search and ease the extra...
The W3C project, "Linking Open Drug Data" (LODD), linked several publicly available sources of drug data together. So far, French data, like marketed drugs and their summary of product characteristics, were not integrated and remained difficult to query. In this paper, we present Romedi (Référentiel Ouvert du Médicament), an open dataset that links...
The revolution in new sequencing technologies, by strongly improving the production of omics data, is greatly leading to new understandings of the relations between genotype and phenotype. To interpret and analyze these massive data that are grouped according to a phenotype of interest, methods based on statistical enrichment became a standard in b...
Motivation:
The recent revolution in new sequencing technologies, as a part of the continuous process of adopting new innovative protocols, has strongly impacted the interpretation of relations between phenotype and genotype. Thus, understanding the resulting gene sets has become a bottleneck that needs to be addressed. Automatic methods have been...
In this paper, we describe the approach and results for our participation in the task 1 (multilingual information extraction) of the CLEF eHealth 2018 challenge. We addressed the task of automatically assigning ICD-10 codes to French death certificates. We used a dictionary-based approach using materials provided by the task organizers. The terms o...
Many unstructured data sources, such as electronic patient records, scientific articles, clinical practice guidelines, and forums, mention drugs. Detecting drugs in free text is an important step toward making them easier to search for and toward extracting information about them...
Life sciences are currently going through a great number of transformations driven by the ongoing revolution in high-throughput technologies for the acquisition of data. The integration of their high dimensionality, ranging from omics to clinical data, is becoming one of the most challenging stages. It involves inter-disciplinary developments with...
In oncology, the reuse of data is confronted with the heterogeneity of terminologies. It is necessary to semantically integrate these distinct terminologies. The semantic integration by using a third terminology as a support is a conventional approach for the integration of two terminologies that are not very structured. The aim of our study was to...
Nowadays, one of the main challenges in biology is to make use of several sources of data to improve our understanding of life. When analyzing experimental data, researchers aim at clustering genes that show a similar behavior through specific external conditions. Thus, the functional interpretation of genes is crucial and involves making use of th...
Background
Identifying incident cancer cases within a population remains essential for scientific research in oncology. Data produced within electronic health records can be useful for this purpose. Due to the multiplicity of providers, heterogeneous terminologies such as ICD-10 and ICD-O-3 are used for oncology diagnosis recording purpose. To enab...
With the large and increasing volume of textual data, automated methods for identifying significant topics to classify textual documents have received a growing interest. While many efforts have been made in this direction, it still remains a real challenge. Moreover, the issue is even more complex as full texts are not always freely available. The...
With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, SPARQL language, and interfaces in Natural Language question-answering provide interesting solutions for querying such kno...
Clinical data recorded in modern EHRs are very rich, although their secondary use for research and medical decision-making may be complicated (e.g., missing and incorrect data, data spread over several clinical databases, information available only within unstructured narrative documents). We propose to address the issue related to the processing of narrative d...
Significant efforts have been undertaken for providing the Gene Ontology (GO) in a computable format as well as for enriching it with logical definitions. Automated approaches can thus be applied to GO for assisting its maintenance and for checking its internal coherence. However, inconsistencies may still remain within GO. In this frame, the objec...
Recent intensive research in the biomedical area has made it possible to accumulate and disseminate biomedical knowledge through various knowledge bases increasingly available on the Web. Exploiting this knowledge requires creating links between these bases and using them jointly. Linked Data, the SPARQL language and interfaces in natural languag...
With the rapid growth of biomedical literature, automated methods for assigning indexing terms to textual documents have received a growing interest. While many efforts have been made in this direction, it remains a real challenge. Moreover, the issue is even more complicated since full text is not always freely available. In this paper, we...
Introduction
Identifying and characterizing incident cancer cases in the population are important issues for facilitating cancer research. The computerization of medical records produces data that could meet these needs. In France, several terminologies are used to code cancer diagnoses...
This work focuses on multiply-related Unified Medical Language System (UMLS) concepts, that is, concepts associated through multiple relations. The relations involved in such situations are audited to determine whether they are provided by source vocabularies or result from the integration of these vocabularies within the UMLS.
We study the compati...
Ontologies are useful tools for sharing and exchanging knowledge. However ontology construction is complex and often time consuming. In this paper, we present a method for building a bilingual domain ontology from textual and termino-ontological resources intended for semantic annotation and information retrieval of textual documents. This method c...
The exploitation of clinical reports for generating alerts especially relies on the alignment of the dedicated terminologies, i.e., MedDRA (exploited in the pharmacovigilance area) and SNOMED International (exploited recently in France for encoding clinical documents). In this frame, we propose a cross-language approach for acquiring automatically...
In this paper, we present a method for building (bilingual) domain ontologies from existing resources. This method combines two approaches: knowledge extraction from texts and the reuse of existing terminological resources. The approach consists of four steps: the extraction of terms from French and English corpora using textual analysis tools, term...
Objectives: The aim of this research was to automate the search of publications concerning adverse drug reactions (ADR) by defining the queries used to search MEDLINE and by determining the required threshold for the number of extracted publications to confirm the drug/event association in the literature.
Methods: We defined an approach based on the...
Objective: Data from electronic healthcare records (EHR) can be used to monitor drug safety, but in order to compare and pool data from different EHR databases, the extraction of potential adverse events must be harmonized. In this paper, we describe the procedure used for harmonizing the extraction from eight European EHR databases of five events o...
We present in this paper a method for acquiring a bilingual terminology concerning the Alzheimer's disease using a parallel corpus. NLP techniques are used for parsing English and French texts in order to extract candidate terms. These terms are then matched automatically using an approach that combines two alignment techniques: one based on the ca...
Because of the ever-increasing amount of information in patients' EHRs, healthcare professionals may face difficulties in making diagnoses and/or therapeutic decisions. Moreover, patients may misunderstand their health status. These medical practitioners need effective tools to locate in real time relevant elements within the patients' EHR and vis...
Background: The SOS Project aims to assess the risk of cardiovascular and gastrointestinal events of non-steroidal anti-inflammatory drugs. Seven European databases (DB), which contain health records of more than 35 million citizens, are involved in the project. These DB use four different terminologies to code events (ICD-9-CM, ICD-10-GM, READ and...
In this article, we present a method for building a bilingual (French/English) ontology from summaries and critical analyses of scientific articles on Alzheimer's disease. This method combines two approaches: ontology acquisition from texts and the reuse of existing terminological resources. The...
The Anatomical Therapeutic Chemical (ATC) classification system is widely used in Europe for the classification and coding of drugs. However, ATC is not well integrated with other medication terminologies (e.g., NDF-RT – the National Drug File-Reference Terminology), which hinders the integration of data coded to these two systems. In this work, w...
MedDRA is exploited for the indexing of pharmacovigilance spontaneous reports. But since spontaneous reports cover only a small proportion of the existing adverse drug reactions, the exploration of clinical reports is seriously considered. Through the UMLS, the current mapping between MedDRA and SNOMED CT, this last being used for indexing clinical...
Health professionals are faced with challenges when they have to exploit the semantics of concepts present in clinical terminologies in support of research activities. The difficulty lies in the fact that this semantics is represented not only through the labels of concepts, but also their position in the hierarchy, and, when available, their logic...
Objectives: To determine the anti-coagulation status of patients, based on the list of medications they have been prescribed, using the publicly available resource NDF-RT (National Drug File Reference Terminology). Methods: We explored the legacy VA classes and we refined the definition of external pharmacologic classes (EXT) in NDF-RT in order to...
The overall objective of the EU-ADR project is the design, development, and validation of a computerised system that exploits data from electronic health records and biomedical databases for the early detection of adverse drug reactions. Eight different databases, containing health records of more than 30 million European citizens, are involved in...
Linkages between animal models of diseases and human data enable the development of translational research hypotheses. The objective of this study is to investigate two approaches to integrating phenotype and clinical information. On the one hand, we develop a terminology mapping between phenotypes from the Mammalian Phenotype Ontology (MPO) and On...
Polysemy is a frequent issue in biomedical terminologies. In the Unified Medical Language System (UMLS), polysemous terms are either represented as several independent concepts, or clustered into a single, multiply-categorized concept. The objective of this study is to analyze polysemous concepts in the UMLS through their categorization and hierarc...
The overall objective of the eu-ADR project is the design, development, and validation of a computerised system that exploits data from electronic health records and biomedical databases for the early detection of adverse drug reactions. Eight different databases, containing health records of more than 30 million European citizens, are involved in...
Unlike recent biomedical terminologies, the International Classification of Diseases (ICD) does not state any explicit associations between a given disease and the corresponding anatomical structure(s). As a consequence, clinical repositories coded with ICD cannot be searched by anatomical structure. The objective of this work is to find associatio...
Purpose:
Collecting and analyzing findings constitute the basis of medical activity. Computer assisted medical activity raises the problem of modelling findings. We propose a unified representation of findings integrating the representations of findings in the GAMUTS in Radiology [M.M. Reeder, B. Felson, GAMUTS in radiology Comprehensive lists of...
The information needed by biologists and physicians for research purposes is distributed over many heterogeneous sources. Integration systems provide a single, centralized and homogeneous interface for users to query multiple information sources simultaneously. The major limitation of integration systems, including mediator-based systems, is that...
Auditing biomedical terminologies often results in the identification of inconsistencies and thus helps to improve their quality. In this paper, we present a method based on Semantic Web technologies for auditing biomedical terminologies and apply it to the NCI thesaurus. We stored the NCI thesaurus concepts and their properties in an RDF triple st...