About
315 Publications
44,422 Reads
3,307 Citations
Introduction
Additional affiliations
October 2008 - present
January 2001 - October 2008
Publications (315)
The analysis of large sets of spatio-temporal data is a fundamental challenge in epidemiological research. As the quantity and complexity of such data increase, automatic analysis approaches such as statistics, data mining, and machine learning can be used to extract useful information. While these approaches have proven effective,...
Purpose:
Mapping clinical observations and medical test results into the standardized vocabulary LOINC is a prerequisite for exchanging clinical data between health information systems and ensuring efficient interoperability.
Methods:
We present a comparison of three approaches for LOINC transcoding applied to French data collected from real-wor...
NDM-1 (New Delhi metallo-β-lactamase-1) is an enzyme produced by bacteria that is implicated in bacterial resistance to almost all known antibiotics. In this study, we deliver a new, curated NDM-1 bioactivity database, along with a set of unifying rules for managing different activity properties and inconsistencies. We define the activity classif...
Context:
We present a post-hoc approach to improve the recall of ICD classification.
Method:
The proposed method can use any classifier as a backbone and aims to calibrate the number of codes returned per document. We test our approach on a new stratified split of the MIMIC-III dataset.
Results:
When returning 18 codes on average per document...
Machine learning methods are becoming increasingly popular for anticipating critical risks in patients under surveillance, thereby reducing the burden on caregivers. In this paper, we propose an original model that benefits from recent developments in Graph Convolutional Networks: a patient's journey is seen as a graph, where each node is an event and tempora...
Cartographers have long been interested in the representation of various movements such as migration, commercial exchanges and transportation. There are several techniques for visualizing this information; this paper focuses on flow mapping. A flow map shows a set of movements through line symbols connecting an origin to a destination. Each link is...
In this paper, we first present a new dataset of NDM-1 biological activities compiled from a cleaned version of the NMDI database. A literature review enriched the former database with 741 new compounds, comprising activities against NDM-1 classified into three classes (inactive, weakly active, and strongly active compounds) by specifying a unifying proc...
Acute coronary syndrome (ACS) in women is a growing public health issue and a leading cause of death. We explored whether the hospital healthcare trajectory could be characterized using a longitudinal clustering approach in women with ACS. From the 2009–2014 French nationwide hospital database, we extracted spatio-temporal patterns in ACS patient traject...
The study of care trajectories is attractive for predicting medical outcomes. Models based on machine learning (ML) techniques have proven more efficient for sequence prediction than other models. Introducing pattern mining techniques has helped reduce model complexity. In this respect, we explored methods for medical events’ pred...
Many real-world datasets can be modeled by a graph with a set of nodes interconnected by multiple relationships. Such a rich graph is called a multilayer graph or network. Providing useful visualization tools to support the query process for such graphs is challenging. Although many approaches have addressed visual query construction, f...
In the context of graph layout, many algorithms have been designed to remove node overlaps, and many quality criteria and associated metrics have been proposed to evaluate those algorithms. Unfortunately, a complete comparison of the algorithms based on metrics that evaluate their quality has never been provided, and it is thus difficult for...
Many algorithms have been designed to remove node overlaps, and many quality criteria and associated metrics have been proposed to evaluate those algorithms. Unfortunately, a complete comparison of the algorithms based on metrics that evaluate their quality has never been provided, and it is thus difficult for a visualization designer to selec...
In recent years, there has been a massive increase in the amount of data published on the web about human and animal health events. Epidemiologists use this spatio-temporal information on a daily basis to detect and monitor disease outbreaks over time. While official sources such as the World Organization for Animal Health release formal outbr...
The use of electronic media for the detection and monitoring of animal disease outbreaks is crucial for disease surveillance and early warning systems. Animal health specialists regularly query web pages using various formulations to obtain up-to-date news on disease outbreaks. This task, however, is often manual and time-consuming. Visualization t...
Unilateral Spatial Neglect is a cognitive impairment commonly observed in patients after right hemispheric lesions. A patient with this condition shows a lack of attention or response to visual stimuli presented on the left side. To assess visual neglect, several “paper and pencil” tests are traditionally used, such as Albert’s Test...
More and more health websites hire medical experts (physicians, medical students, experienced volunteers, etc.) and explicitly indicate their medical role to signal that they provide high-quality answers. However, medical experts may participate in forum discussions even when their role is not officially indicated. Detecting posts written...
Better knowledge of patients' hospital trajectories is essential for care planning. From administrative data, we extract spatio-temporal patterns and use them to identify profiles of inter-stay delays and of tariff changes along care pathways. This approach could foster the improvement of...
Background
Collecting data on the localization of users is a key issue for the MASK (Mobile Airways Sentinel networK: the Allergy Diary) App. Data anonymization is a method of sanitization for privacy. The European Commission’s Article 29 Working Party stated that geolocation information is personal data.
To assess geolocation using the MASK method...
In recent years, there has been a massive increase in the amount of data being produced about human and animal health-related events. Epidemiologists have to analyze this epidemiological data on a regular basis. They use this spatio-temporal information, most of which is shared online, to detect, observe, and track geographic locations of disea...
Resource Description Framework (RDF) data are valued and exploited by various domains such as life sciences, the Semantic Web, and social networks. This chapter provides basic definitions on the interplay between RDF and its multigraph representation. The multigraph representation enables the construction of lightweight indexing structures that ameliorate t...
Many real-world datasets can be represented by a network with a set of nodes linked to each other by multiple relations. Such a rich graph is called a multilayer graph. In this demo, we present a tool for Visual Querying of Large Multilayer Graphs that allows users to visually draw the query, retrieve result patterns, and finally navigate and browse the results co...
Recent advances in data mining and machine learning techniques are focused on exploiting location data. These advances, combined with the increased availability of location-acquisition technology, have encouraged social networking services to offer their users different ways to share their location information. These social networks, called loca...
Readitopics provides a new tool for browsing a textual corpus that showcases several recent works on topic labeling and topic coherence. We demonstrate the potential of these techniques to gain a deeper understanding of the topics that structure different datasets. This tool is provided as a Web demo, but it can be installed to experiment with your ow...
New and emerging infectious diseases are an increasing threat to countries due to globalisation, passenger movement, and international trade. To discover articles of potential importance to infectious disease emergence, it is important to mine the Web with an accurate vocabulary. In this paper, we present a new methodology that combine...
For more than a decade, extracting frequent patterns from single large graphs has been a major research focus. However, in this era of data eruption, rich and complex data are being generated at an unprecedented rate. These complex data can be represented as a multigraph structure, a generic and rich graph representation. In this paper, we prop...
Multiple time series are a set of multiple quantitative variables occurring at the same interval. They are present in many domains such as medicine, finance, and manufacturing for analytical purposes. In recent years, streamgraph visualization (evolved from ThemeRiver) has been widely used for representing temporal evolution patterns in multiple ti...
Better knowledge of patient flows would improve decision making in health planning. In this article, we propose a method to characterise patient flows and also to highlight profiles of care pathways considering times and costs. From medico-administrative data, we extracted spatio-temporal patterns. Then, we clustered time between hospitalisation...
Knowing patient flows can be decisive for health planning. In this article, we propose a method to characterize patient flows and also to highlight profiles of delays and tariffs along care pathways. From PMSI data (Programme Médicalisé des Systèmes d'Information), we...
Multiple time series are present in many domains such as medicine, finance, and manufacturing for analytical purposes. When dealing with many time series, scalability problems arise. To solve this problem, multiple time series can be organized into a hierarchical structure. In this work, we introduce a Streamgraph-based approach to convey this...
Sentiment analysis allows the semantic evaluation of pieces of text according to the expressed sentiments and opinions. While considerable attention has been given to the polarity (positive, negative) of English words, only a few studies have addressed the conveyed emotions (joy, anger, surprise, sadness, etc.), especially in other languages. In t...
The problem of local community detection in graphs refers to the identification of a community that is specific to a query node and relies on limited information about the network structure. Existing approaches for this problem are defined to work in dynamic network scenarios; however, they are not designed to deal with complex real-world networks,...
Enhancing the frequency of satellite acquisitions represents a key issue for the Earth Observation community today. Repeated observations are crucial for monitoring purposes, particularly when intra-annual processes must be taken into account. Time series of images constitute a valuable source of information in these cases. The goal of this paper is...
This paper describes the system we used on the tasks of the text mining challenge DEFT 2017. The thirteenth edition of this challenge concerned the analysis of opinions and figurative language in French tweets. Three tasks were proposed: (i) the first one concerns the classification of non-figurative tweets according to their polarity; (i...
The problem of node-centric, or local, community detection in information networks refers to the identification of a community for a given input node, having limited information about the network topology. Existing methods for solving this problem, however, are not designed to work on complex networks. In this paper, we propose a novel framework f...
The web and social media have been growing exponentially in recent years. We now have access to documents bearing opinions expressed on a broad range of topics. This constitutes a rich resource for natural language processing tasks, particularly for sentiment analysis. Nevertheless, sentiment analysis is usually difficult because expressed sentimen...
Link prediction is a “hot topic” in network analysis and has been widely used for friendship recommendation in social networks. With the increased use of location-based services, it is possible to improve the accuracy of link prediction methods by using the mobility of users. The majority of the link prediction methods focus on the importance of...
Many real-world datasets can be represented by graphs with a set of nodes interconnected with each other by multiple relations (e.g., social networks, RDF graphs, biological data). Such a rich graph, called a multigraph, is well suited to represent real-world scenarios with complex interactions. However, performing subgraph queries on multigraphs is stil...
Many real-world datasets can be represented by a network with a set of nodes interconnected with each other by multiple relations. Such a rich graph is called a multigraph. Unfortunately, none of the existing algorithms for subgraph query matching can adequately leverage the multiple relationships that exist between the nodes. In this paper we...
Recent improvements in positioning technology have led to a much wider availability of massive moving object data. A crucial task is to find the moving objects that travel together. Such object sets are commonly called object movement patterns. Due to the emergence of many different kinds of object movement patterns in recent years, different ap...
Much real-world information can be represented by a graph with a set of nodes interconnected with each other by multiple types of relations, called edge layers (e.g., social networks, biological data). Edge bundling techniques have been proposed to solve the clutter issue for standard graphs, while little effort has been devoted to the similar issue fo...
In this paper, we propose to handle the problem of overload in social interactions by grouping messages according to three important dimensions: (i) content (text and hashtags), (ii) users, and (iii) time difference. We evaluated our approach on a Twitter dataset and compared it to other existing approaches; the results are promising and e...
Social media is strongly present in people's everyday life and Twitter is one example that stands out. The data within these types of services can be analyzed in order to discover useful knowledge. One interesting approach is to use data mining techniques to perceive hidden behaviours and patterns. The primary focus of this paper is the identificat...
RDF is a standard for the conceptual description of knowledge, and SPARQL is the query language conceived to query RDF data. RDF data are valued and exploited by various domains such as life sciences, the Semantic Web, social networks, etc. Further, its integration at Web scale compels RDF management engines to deal with complex queries in terms...
Social media is strongly present in people’s everyday life and Twitter is one example that stands out. The data within these types of services can be analyzed in order to discover useful knowledge. One interesting approach is to use data mining techniques to perceive hidden behaviours and patterns. The primary focus of this paper is the identificat...
In this letter, we propose a new active transductive learning (ATL) framework for object-based classification of satellite images. The framework couples graph-based label propagation with active learning (AL) to exploit positive aspects of the two learning settings. The transductive approach considers both labelled and unlabelled image objects to p...
Collaborative ratings of forum posts have been successfully applied to infer the reputations of forum users. Popular websites such as Slashdot or Stack Exchange allow their users to score messages in order to evaluate their content. These scores can be aggregated for each user to compute a reputation value in the forum. However, e...
Online health forums are increasingly used by patients to get information and help related to their health. However, information reliability in these forums is unfortunately not always guaranteed. The consequences of self-diagnosis may be severe for the patient's health if measures are taken without consulting a doctor. Many works on trust is...
Recent advances in network science allow the modeling and analysis of complex interrelated entities. These entities often interact with each other in a number of different ways. Simple graphs fail to capture these multiple types of relationships, requiring more sophisticated mathematical structures. One such structure is the multigraph, where entities...
This paper describes the systems we submitted to the DEFT 2015 challenge (Défi Fouille de Texte). This eleventh edition focused on the analysis of opinion, sentiment, and emotion in tweets written in French. The challenge proposes three tasks; we participated in task 1, which concerns the classification of tweets according to their polarity, in the...
“Ask the doctor” services are personalized forums allowing patients to ask questions directly to doctors. Usually, patients must choose the most appropriate category for their question among many categories to be redirected to the most relevant physician. However, manual selection is a tedious and error-prone activity. In this work we propose to ass...
Gradual patterns highlight covariations of attributes of the form “The more/less X, the more/less Y”. Their usefulness in several applications has recently stimulated the design of several algorithms for their automated discovery from large datasets. However, existing techniques require all the interesting data to be in a single database relat...
Log files generated by computational systems contain relevant and essential information. In some application areas like the design of integrated circuits, log files generated by design tools contain information which can be used in management information systems to evaluate the final products. However, the complexity of such textual data raises som...
Economic development based on industrialization, intensive agriculture expansion, and population growth places greater pressure on water resources through increased water abstraction and water quality degradation [40]. River pollution is now a visible issue, with emblematic ecological disasters following industrial accidents such as the pollution of...
The main use of satellite imagery concerns the processing of the spectral and spatial dimensions of the data. However, to extract useful information, the temporal dimension also has to be accounted for, which increases the complexity of the problem. For this reason, there is a need for suitable data mining techniques for this source of data. In this wo...
High-spatial-resolution satellites usually have the constraint of a low temporal frequency, which leads to long periods without information in cloudy areas. Conversely, low-spatial-resolution satellites have higher revisit frequencies. Combining information from high- and low-spatial-resolution satellites is thought to be a key factor for studies that require...
In this contribution we present a local evaluation procedure of Landsat-MODIS fusion methods for crop monitoring purposes. Two fusion methods are applied to obtain a two-year time series of Landsat-resolution images. The validation is applied at pixel level in order to analyze if the simulated images are capable of unmixing coarse-resolution pixels...
More and more data come with contextual information describing the circumstances of their acquisition. While the frequent pattern mining literature offers many approaches to handle and extract interesting patterns in data, little effort has been dedicated to properly handling such contextual information during the mining process. In this pape...
Online health fora are increasingly visited by patients to get help and information related to their health. However, these fora are not limited to patients: a significant number of health professionals actively participate in many discussions. As experts, the information they post is very important, since they are able to explain the problems well,...