Michael Aupetit

Qatar Computing Research Institute (QCRI) at Hamad Bin Khalifa University (HBKU) · Qatar Center for Artificial Intelligence (QCAI)

Ph.D., Hab. (HDR, Accreditation to Supervise Research)

About

135
Publications
25,675
Reads
1,464
Citations
Introduction
Michael Aupetit works at the Qatar Center for Artificial Intelligence of the Qatar Computing Research Institute. Michael does research in Data Mining, Visual Analytics, and Artificial Intelligence.
Additional affiliations
August 2014 - May 2020
Qatar Computing Research Institute
Position
  • Senior Researcher
Description
  • Analysis and visualization of genomics data and cyber security data. Research and design of new visualization and machine learning techniques.
January 2010 - July 2014
Atomic Energy and Alternative Energies Commission
Position
  • Big Data analytics: spatio-temporal telecom events data processing and analysis
Description
  • Global Earth data indexing and distributed processing for anomaly detection, pattern mining and localization prediction.
January 2009 - December 2009
Atomic Energy and Alternative Energies Commission
Position
  • Human Machine Interface and Machine Learning
Description
  • Models for decision support of customs officers in checking container contents for illicit or dangerous materials
Education
July 2012
Université Paris-Sud 11
Field of study
  • Computer Science
October 1998 - October 2001
Grenoble Institute of Technology
Field of study
  • Computer Science
September 1997 - September 1998
French National Centre for Scientific Research
Field of study
  • Robotics and Micro-Electronics (SyAM)

Publications

Preprint
Full-text available
Visual quality measures (VQMs) are designed to support analysts by automatically detecting and quantifying patterns in visualizations. We propose a new VQM for visual grouping patterns in scatterplots, called ClustML, which is trained on previously collected human subject judgments. Our model encodes scatterplots in the parametric space of a Gaussi...
Article
Full-text available
A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach is based on the assumption that the classes stay as clear clusters in the original high-dimensional space. However, in reality, this assumption can be...
Preprint
Full-text available
A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach is based on the assumption that the classes stay as clear clusters in the original high-dimensional space. However, in reality, this assumption can be...
Conference Paper
Full-text available
Data scientists often deal with multiclass multidimensional data. There is a need to support the exploration of such data with topological methods. We propose a new visualization metaphor for multiclass data and illustrate it with two complementary analytic approaches. We design ClassMat, a visualization matrix similar in spirit to the scatterplot...
Preprint
Full-text available
We address the lack of reliability in benchmarking clustering techniques based on labeled datasets. A standard scheme in external clustering validation is to use class labels as ground truth clusters, based on the assumption that each class forms a single, clearly separated cluster. However, as such cluster-label matching (CLM) assumption often bre...
Article
Full-text available
Background: Several tools have been developed for health care professionals to monitor the physical activity of their patients, but most of these tools consider only the needs of users in North American and European countries and are applicable to only specific analytic tasks. To our knowledge, no research study has utilized the participat...
Preprint
Full-text available
In multiclass classification of multidimensional data, the user wants to build a model of the classes to predict the label of unseen data. The model is trained on the data and tested on unseen data with known labels to evaluate its quality. The results are visualized as a confusion matrix which shows how many data labels have been predicted correct...
Preprint
Full-text available
Brushing is an everyday interaction in 2D scatterplots, which allows users to select and filter data points within a continuous, enclosed region and conduct further analysis on the points. However, such conventional brushing cannot be directly applied to Multidimensional Projections (MDP), as they hardly escape from False and Missing Neighbors dist...
Conference Paper
Full-text available
Exploratory visual analysis of multidimensional labeled data is challenging. Multidimensional Projections for labeled data attempt to separate classes while preserving neighborhoods. In this work, we consider the case where instances are assigned multiple labels with probabilities or weights: for example, the output of a probabilistic classifier, f...
Preprint
Full-text available
Visual quality measures (VQMs) are designed to support analysts by automatically detecting and quantifying patterns in visualizations. We propose a new data-driven technique called ClustRank that ranks scatterplots according to visible grouping patterns. Our model first encodes scatterplots in the parametric space of a Gaussian Mixture Mod...
Article
Full-text available
Background: Robust and useful tools for exploratory analysis in biosciences are still lagging behind the size and complexity of the biological datasets produced since the completion of the human genome in 2000. A possible reason is that developers are unlikely to understand domain and case-specific requirements of existing research questions. Metho...
Preprint
Full-text available
We propose "aquanims" as new design metaphors for animated transitions that preserve displayed areas during the transformation. Animated transitions are used to facilitate understanding of graphical transformations between different visualizations. Area is key information to preserve during filtering or ordering transitions of area-based charts lik...
Conference Paper
Full-text available
Nonlinear dimensionality reduction of high-dimensional data is challenging as the low-dimensional embedding will necessarily contain distortions, and it can be hard to determine which distortions are the most important to avoid. When annotation of data into known relevant classes is available, it can be used to guide the embedding to avoid distorti...
Preprint
Full-text available
MA plots are used to analyze the genome-wide differences in gene expression between two distinct biological conditions. An MA plot is usually rendered as a static scatter plot. Our interview with 3 experts in genomics showed that we could improve the usability of this plot by adding interactive analytic features. In this work we present the design...
Article
Full-text available
In recent years, there has been a significant expansion in the development and use of multi-modal sensors and technologies to monitor physical activity, sleep and circadian rhythms. These developments make accurate sleep monitoring at scale a possibility for the first time. Vast amounts of multi-sensor data are being generated with potential applic...
Preprint
Background: Several tools have been developed for health care professionals to monitor the physical activity of their patients, but most of these tools consider only the needs of users in North American and European countries and are applicable to only specific analytic tasks. To our knowledge, no research study has utilized the participat...
Preprint
Full-text available
We propose "Aquanims" as new design metaphors for animated transitions that preserve displayed areas during the transformation. As liquids are incompressible fluids, we use a hydraulic metaphor to convey the sense of area preservation during animated transitions. We study the design space of Aquanims for rectangle-based charts.
Data
This file contains Rdata files of 1000 scatterplots with their (x,y) coordinates and 34 human judgments for each of them, indicating whether they display one (1) or more than one (2) cluster. These data have been used to benchmark standard clustering techniques (Short paper IEEE VIS 2019) and to develop the ClustMe visual quality measure (Full paper Eurovis 2...
Article
We present a highly effective unsupervised framework for detecting the stance of prolific Twitter users with respect to controversial topics. In particular, we use dimensionality reduction to project users onto a low-dimensional space, followed by clustering, which allows us to find core users that are representative of the different stances. Our f...
Article
Editor’s notes: This article studies a layered architecture for robotics design, proposes a contract-based interface between the layers, and shows how cross-layer adaptation between these layers can be done in response to different scenarios. – Samarji...
Conference Paper
Full-text available
Automatic clustering techniques play a central role in Visual Analytics by helping analysts to discover interesting patterns in high-dimensional data. Evaluating these clustering techniques, however, is difficult due to the lack of universal ground truth. Instead, clustering approaches are usually evaluated based on a subjective visual judgment of...
Article
Motivation: It is important to characterize individual relatedness in terms of familial relationships and underlying population structure in genome-wide association studies for correct downstream analysis. The characterization of individual relatedness becomes vital if the cohort is to be used as reference panel in other studies for association te...
Conference Paper
Full-text available
The amount of generated and analyzed data is ever increasing, and processing such large data sets can take too long in situations where time-to-decision or fluid data exploration are critical. Progressive visual analytics (PVA) has recently emerged as a potential solution that allows users to analyze intermediary results during the computation with...
Article
Full-text available
Accurately measuring sleep and its quality with polysomnography (PSG) is an expensive task. Actigraphy, an alternative, has proven cheap and relatively accurate. However, the largest experiments conducted to date have had only hundreds of participants. In this work, we processed the data of the recently published Multi-Ethnic Study of Atheros...
Article
Full-text available
We propose ClustMe, a new visual quality measure to rank monochrome scatterplots based on cluster patterns. ClustMe is based on data collected from a human‐subjects study, in which 34 participants judged synthetically generated cluster patterns in 1000 scatterplots. We generated these patterns by carefully varying the free parameters of a simple Ga...
Preprint
Full-text available
We present a highly effective unsupervised method for detecting the stance of Twitter users with respect to controversial topics. In particular, we use dimensionality reduction to project users onto a low-dimensional space, followed by clustering, which allows us to find core users that are representative of the different stances. Our method has th...
Chapter
In this chapter, we overview the current and future impact of pervasive computing in the health domain. In this context, we focus on some of the crucial aspects of data-driven applications. We present examples of recently proposed lifestyle applications and highlight the ethical issues with such applications. We discuss challenges and opportunities...
Conference Paper
Full-text available
We present two interactive data visualizations of fine-grained demographic information for New York City, US, and Doha, Qatar, obtained using Facebook's Marketing API. The visualizations make innovative use of treemaps to support a bi-modal data selection and visualization of both "where are people of type X" and "what type of people are in locatio...
Article
Full-text available
Visual analysis of multidimensional data requires effective ways to reduce data dimensionality to encode them visually. Multidimensional projections (MDP) figure among the most important visualization techniques in this context, transforming multidimensional data into scatter plots where patterns reflect some notion of similarity in the data. Howev...
Conference Paper
Obesity is one of the major health risk factors behind the rise of non-communicable conditions. Understanding the factors influencing obesity is very complex since there are many variables that can affect the health behaviors leading to it. Nowadays, multiple data sources can be used to study health behaviors, such as wearable sensors for physical...
Conference Paper
Accurate and up-to-date census data is vital for informed policy decisions ranging from healthcare to infrastructure planning. However, collecting such data takes considerable effort and cost, with, for example, the United States performing its census every 10 years. Though different approaches exist, see e.g. https://unstats.un.org/UNSD/demographic/so...
Conference Paper
Full-text available
People increasingly use social media such as Facebook and Twitter during disasters and emergencies. Research studies have demonstrated the usefulness of social media information for a number of humanitarian relief operations ranging from situational awareness to actionable information extraction. Moreover, the use of social media platforms during s...
Conference Paper
As urban data keeps getting bigger, deep learning is coming to play a key role in providing big data predictive analytics solutions. We are interested in developing a new generation of deep learning based computational technologies that predict traffic congestion and crowd management. In this work, we are mainly interested in efficiently predicting...
Technical Report
Full-text available
We present the design study of an interactive tool that allows exploration of data on awareness of several health conditions in the Middle East. The underlying data is obtained via Facebook’s Marketing API and includes rich demographic details. We propose a health awareness score, a scale to visualize it and a treemap-based interactive slicing and...
Conference Paper
Full-text available
According to many existing studies, the data available on social media platforms such as Twitter at the onset of a crisis situation could be useful for disaster response and management. However, making sense of this huge volume of data arriving at a high rate is still a challenging task for crisis managers. In this work, we present an interactive social media mo...
Article
Full-text available
Multidimensional scaling allows visualizing high-dimensional data as 2D maps with the premise that insights in 2D reveal valid information in high dimensions. However, the resulting projections suffer from artifacts such as poor local neighborhood preservation and cluster tearing. Interactively coloring the projection according to the discrepancy b...
Article
Full-text available
Obesity is one of the major health risk factors behind the rise of non-communicable conditions. Understanding the factors influencing obesity is very complex since there are many variables that can affect the health behaviors leading to it. Nowadays, multiple data sources can be used to study health behaviors, such as wearable sensors for physica...
Article
We present an interactive tool that visualizes data on awareness of several health conditions in the Middle East. The underlying data is obtained via Facebook's Marketing API and includes rich demographic details. We discuss how this tool may be useful for planning more targeted public health campaigns and for monitoring campaign effectiveness.
Article
Full-text available
Background: The explosion of consumer electronics and social media is facilitating the rise of the Quantified Self (QS) movement, where millions of users are tracking various aspects of their daily life using social media, mobile technology, and wearable devices. Data from mobile phones, wearables and social media can facilitate a better understandi...
Article
Full-text available
Background: The post-genomic era, with its wealth of sequences, gave rise to a broad range of methods for detecting protein residue-residue contacts. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, they do not completely overlap. Hence, new approaches and improvements of existing methods are needed t...
Presentation
Full-text available
Invited talk at IEEE PacificVAST 2016 conference.
Conference Paper
Full-text available
Our goal is to accurately model human class separation judgements in color-coded scatterplots. Towards this goal, we propose a set of 2002 visual separation measures, by systematically combining 17 neighborhood graphs and 14 class purity functions, with different parameterizations. Using a Machine Learning framework, we evaluate these measures base...
Conference Paper
Genome-Wide Association Studies (GWAS) examine genetic variants in different individuals to detect variants associated with specific diseases. The 1000 Genomes project is one such collaborative research effort to sequence the genomes of at least 1000 participants of 26 different ethnicities, to establish a detailed summary of human genetic variation. T...
Code
Full-text available
Proximity graphs have edges which depend on the position of the vertices in some metric space. This toolbox contains many of the known and less known proximity graphs: - K Nearest Neighbors; - K Nearest Center of Gravity; - Delaunay; - Gabriel; - Infinite Strip Band; - Relative Neighborhood; - Sphere of Influence; - Alpha Shape; - Epsilon-Ball; - L...
Research
Full-text available
Habilitation for Research Supervision (HDR, highest French academic diploma)
Article
Visual quality measures seek to algorithmically imitate human judgments of patterns such as class separability, correlation, or outliers. In this paper, we propose a novel data-driven framework for evaluating such measures. The basic idea is to take a large set of visually encoded data, such as scatterplots, with reliable human "ground truth" jud...
Article
Full-text available
Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques which use the pairwise similarities of multidimensional data to represent the data in a plane, enabling their visual exploratory analysis. Considering labeled data, the DR techniques face two objectives with potentially different priorities: one is to account for the dat...
Patent
A method and a system for evaluating the class of a test datum in an original metric space, each datum belonging to at least one class grouping a plurality of data, includes a step of graphical representation of the spatial organization of a set of learning data of the original space in a representation metric space, a conjoint membership level ind...
Article
Full-text available
Brushing is a fundamental interaction for visual analytics. A brush is usually defined as a closed region of the screen used to select data items and to highlight them in the current view and other linked views. Scatterplots are also standard ways to visualize values for two variables of a set of multidimensional data. We propose a technique to bru...
Conference Paper
Full-text available
Dimension Reduction techniques used to visualize multidimensional data provide a scatterplot spatialization of data similarities. A widespread way to evaluate the quality of such DR techniques is to use labeled data as a ground truth and to call the reader as a witness to qualify the visualization by looking at class-cluster correlations within the...
Conference Paper
Full-text available
When dealing with complex problems, it is often the case that fuzzy systems must undergo an optimization process. During this process, the preservation of interpretability is a major concern. Here we present a new mathematical framework to analyze the notion of interpretability of a fuzzy partition, and a generic algorithm to preserve it. This app...
Conference Paper
Full-text available
We propose a mathematical framework to analyze the interpretability of a fuzzy partition, and a generic algorithm to preserve it during the optimization of a fuzzy system. This approach is rather flexible and it allows the weakening of the usual constraints while helping to highly automatize the optimization process. The underlying tools come fr...
Conference Paper
Full-text available
Photometric stereo is a technique of surface reconstruction using several object images made with a fixed camera position and varying illumination directions. Reconstructed surfaces can have complex reflecting properties which are unknown a priori and often simplified by the Lambertian model (reflecting light uniformly in all directions). Such simplifi...
Conference Paper
Full-text available
In this article, we propose a generative model that extracts the Betti numbers of a set of manifolds in R^D from a sample. This model is based on the Generative Simplicial Complex, a mixture model whose components are geometric simplices convolved with a multivariate Gaussian distribution...
Conference Paper
Full-text available
As dimensionality increases, analysts face difficult problems in making sense of their data. In exploratory data analysis, multidimensional scaling projections can help analysts discover patterns by identifying outliers and enabling visual clustering. However, to exploit these projections, artifacts and interpretation issues must be overco...
Conference Paper
Full-text available
Dimensionality reduction algorithms may be of great help as decision support, representing the information as a map which summarizes the data similarities. When data come with an assigned class label, such a map can be used to check the quality of the labeling by detecting class outliers or data near the decision boundary, or to evaluate the relevance of...
Conference Paper
Full-text available
Abstract. Exploratory analysis of multidimensional data is a complex problem. We propose to extract certain topological invariants, called Betti numbers, to summarize the topology of the structure underlying the data. We define a generative model based on the Delaunay simplicial complex, of which we estimate the...
Conference Paper
Full-text available
The purpose is to extract Betti numbers, a topological invariant, from a point-cloud dataset. The points are sampled from a manifold, and may be corrupted with an unknown noise. We are interested in the Betti numbers of the original manifold, but the sampling and corruption make them difficult to get. The Generative Simplicial Complex optimizes a st...
Conference Paper
Full-text available
New design of the ProxiViz visualization technique to overcome the artefacts of Dimension Reduction techniques.
Article
Full-text available
Since at least the first knapped stones of the Paleolithic era, humans have never stopped creating artifacts, as means of acting on their environment and of observing it beyond their own capabilities. They developed these tools to assist them in their visceral quest to understand (science) and master (technology) this...
Conference Paper
Full-text available