
Michael Aupetit
Qatar Computing Research Institute (QCRI) at Hamad Bin Khalifa University (HBKU) · Qatar Center for Artificial Intelligence (QCAI)
Ph.D, Hab. (HDR Accreditation to Supervise Research)
About
135
Publications
25,675
Reads
1,464
Citations
Introduction
Michael Aupetit works at the Qatar Center for Artificial Intelligence of the Qatar Computing Research Institute. Michael does research in Data Mining, Visual Analytics, and Artificial Intelligence.
Additional affiliations
Education
July 2012
October 1998 - October 2001
September 1997 - September 1998
Publications (135)
Visual quality measures (VQMs) are designed to support analysts by automatically detecting and quantifying patterns in visualizations. We propose a new VQM for visual grouping patterns in scatterplots, called ClustML, which is trained on previously collected human subject judgments. Our model encodes scatterplots in the parametric space of a Gaussi...
A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach is based on the assumption that the classes stay as clear clusters in the original high-dimensional space. However, in reality, this assumption can be...
Data scientists often deal with multiclass multidimensional data. There is a need to support the exploration of such data with topological methods. We propose a new visualization metaphor for multiclass data and illustrate it with two complementary analytic approaches. We design ClassMat, a visualization matrix similar in spirit to the scatterplot...
We address the lack of reliability in benchmarking clustering techniques based on labeled datasets. A standard scheme in external clustering validation is to use class labels as ground truth clusters, based on the assumption that each class forms a single, clearly separated cluster. However, as such cluster-label matching (CLM) assumption often bre...
Background
Several tools have been developed for health care professionals to monitor the physical activity of their patients, but most of these tools consider only the needs of users in North American and European countries and are applicable only to specific analytic tasks. To our knowledge, no research study has utilized the participat...
In multiclass classification of multidimensional data, the user wants to build a model of the classes to predict the label of unseen data. The model is trained on the data and tested on unseen data with known labels to evaluate its quality. The results are visualized as a confusion matrix which shows how many data labels have been predicted correct...
Brushing is an everyday interaction in 2D scatterplots, which allows users to select and filter data points within a continuous, enclosed region and conduct further analysis on the points. However, such conventional brushing cannot be directly applied to Multidimensional Projections (MDP), as they hardly escape from False and Missing Neighbors dist...
Exploratory visual analysis of multidimensional labeled data is challenging. Multidimensional Projections for labeled data attempt to separate classes while preserving neighborhoods. In this work, we consider the case where instances are assigned multiple labels with probabilities or weights: for example, the output of a probabilistic classifier, f...
Visual quality measures (VQMs) are designed to support analysts by automatically detecting and quantifying patterns in visualizations. We propose a new data-driven technique called ClustRank that ranks scatterplots according to visible grouping patterns. Our model first encodes scatterplots in the parametric space of a Gaussian Mixture Mod...
Supplemental material of our survey paper.
Background: Robust and useful tools for exploratory analysis in biosciences are still lagging behind the size and complexity of the biological datasets produced since the completion of the human genome in 2000. A possible reason is that developers are unlikely to understand domain and case-specific requirements of existing research questions. Metho...
We propose "aquanims" as new design metaphors for animated transitions that preserve displayed areas during the transformation. Animated transitions are used to facilitate understanding of graphical transformations between different visualizations. Area is key information to preserve during filtering or ordering transitions of area-based charts lik...
Nonlinear dimensionality reduction of high-dimensional data is challenging as the low-dimensional embedding will necessarily contain distortions, and it can be hard to determine which distortions are the most important to avoid. When annotation of data into known relevant classes is available, it can be used to guide the embedding to avoid distorti...
MA plots are used to analyze the genome-wide differences in gene expression between two distinct biological conditions. An MA plot is usually rendered as a static scatter plot. Our interview with 3 experts in genomics showed that we could improve the usability of this plot by adding interactive analytic features. In this work we present the design...
In recent years, there has been a significant expansion in the development and use of multi-modal sensors and technologies to monitor physical activity, sleep and circadian rhythms. These developments make accurate sleep monitoring at scale a possibility for the first time. Vast amounts of multi-sensor data are being generated with potential applic...
We propose "Aquanims" as new design metaphors for animated transitions that preserve displayed areas during the transformation. As liquids are incompressible fluids, we use a hydraulic metaphor to convey the sense of area preservation during animated transitions. We study the design space of Aquanims for rectangle-based charts.
This file contains Rdata files of 1000 scatterplots with their (x,y) coordinates and 34 human judgments for each of them, telling if they display 1 (1) or more-than-1 (2) clusters. These data have been used to benchmark standard clustering techniques (Short paper IEEE VIS 2019) and to develop the ClustMe visual Quality measure (Full paper Eurovis 2...
We present a highly effective unsupervised framework for detecting the stance of prolific Twitter users with respect to controversial topics. In particular, we use dimensionality reduction to project users onto a low-dimensional space, followed by clustering, which allows us to find core users that are representative of the different stances. Our f...
Editor’s notes:
This article studies a layered architecture for robotics design, proposes a contract-based interface between the layers, and shows how cross-layer adaptation between these layers can be done in response to different scenarios. –
Samarji...
Automatic clustering techniques play a central role in Visual Analytics by helping analysts to discover interesting patterns in high-dimensional data. Evaluating these clustering techniques, however, is difficult due to the lack of universal ground truth. Instead, clustering approaches are usually evaluated based on a subjective visual judgment of...
Motivation:
It is important to characterize individual relatedness in terms of familial relationships and underlying population structure in genome-wide association studies for correct downstream analysis. The characterization of individual relatedness becomes vital if the cohort is to be used as reference panel in other studies for association te...
The amount of generated and analyzed data is ever increasing, and processing such large data sets can take too long in situations where time-to-decision or fluid data exploration are critical. Progressive visual analytics (PVA) has recently emerged as a potential solution that allows users to analyze intermediary results during the computation with...
Accurately measuring sleep and its quality with polysomnography (PSG) is an expensive task. Actigraphy, an alternative, has proven to be cheap and relatively accurate. However, the largest experiments conducted to date have had only hundreds of participants. In this work, we processed the data of the recently published Multi-Ethnic Study of Atheros...
We propose ClustMe, a new visual quality measure to rank monochrome scatterplots based on cluster patterns. ClustMe is based on data collected from a human‐subjects study, in which 34 participants judged synthetically generated cluster patterns in 1000 scatterplots. We generated these patterns by carefully varying the free parameters of a simple Ga...
We present a highly effective unsupervised method for detecting the stance of Twitter users with respect to controversial topics. In particular, we use dimensionality reduction to project users onto a low-dimensional space, followed by clustering, which allows us to find core users that are representative of the different stances. Our method has th...
In this chapter, we overview the current and future impact of pervasive computing in the health domain. In this context, we focus on some of the crucial aspects of data-driven applications. We present examples of recently proposed lifestyle applications and highlight the ethical issues with such applications. We discuss challenges and opportunities...
We present two interactive data visualizations of fine-grained demographic information for New York City, US, and Doha, Qatar, obtained using Facebook's Marketing API. The visualizations make innovative use of treemaps to support a bi-modal data selection and visualization of both "where are people of type X" and "what type of people are in locatio...
Visual analysis of multidimensional data requires effective ways to reduce data dimensionality to encode them visually. Multidimensional projections (MDP) figure among the most important visualization techniques in this context, transforming multidimensional data into scatter plots where patterns reflect some notion of similarity in the data. Howev...
Obesity is one of the major health risk factors behind the rise of non-communicable conditions. Understanding the factors influencing obesity is very complex since there are many variables that can affect the health behaviors leading to it. Nowadays, multiple data sources can be used to study health behaviors, such as wearable sensors for physical...
Accurate and up-to-date census data is vital for informed policy decisions ranging from healthcare to infrastructure planning. However, collecting such data takes considerable effort and cost, with, for example, the United States performing its census every 10 years. Though different approaches exist, see e.g. https://unstats.un.org/UNSD/demographic/so...
People increasingly use social media such as Facebook and Twitter during disasters and emergencies. Research studies have demonstrated the usefulness of social media information for a number of humanitarian relief operations ranging from situational awareness to actionable information extraction. Moreover, the use of social media platforms during s...
As urban data keeps getting bigger, deep learning is coming to play a key role in providing big data predictive analytics solutions. We are interested in developing a new generation of deep learning based computational technologies that predict traffic congestion and crowd management. In this work, we are mainly interested in efficiently predicting...
We present the design study of an interactive tool that allows exploration of data on awareness of several health conditions in the Middle East. The underlying data is obtained via Facebook’s Marketing API and includes rich demographic details. We propose a health awareness score, a scale to visualize it and a treemap-based interactive slicing and...
According to many existing studies, the data available on social media platforms such as Twitter at the onset of a crisis situation could be useful for disaster response and management. However, making sense of this large volume of data arriving at a high rate is still a challenging task for crisis managers. In this work, we present an interactive social media mo...
Multidimensional scaling allows visualizing high-dimensional data as 2D maps with the premise that insights in 2D reveal valid information in high-dimensions. However, the resulting projections suffer from artifacts such as bad local neighborhood preservation and clusters tearing. Interactively coloring the projection according to the discrepancy b...
Obesity is one of the major health risk factors behind the rise of non-communicable conditions. Understanding the factors influencing obesity is very complex since there are many variables that can affect the health behaviors leading to it. Nowadays, multiple data sources can be used to study health behaviors, such as wearable sensors for physica...
We present an interactive tool that visualizes data on awareness of several health conditions in the Middle East. The underlying data is obtained via Facebook's Marketing API and includes rich demographic details. We discuss how this tool may be useful for planning more targeted public health campaigns and for monitoring campaign effectiveness.
Background
The explosion of consumer electronics and social media is facilitating the rise of the Quantified Self (QS) movement, where millions of users are tracking various aspects of their daily life using social media, mobile technology, and wearable devices. Data from mobile phones, wearables and social media can facilitate a better understandi...
Background
The post-genomic era with its wealth of sequences gave rise to a broad range of protein residue-residue contact detecting methods. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, they do not completely overlap. Hence, new approaches and improvements of existing methods are needed t...
Invited talk at IEEE PacificVAST 2016 conference.
Our goal is to accurately model human class separation judgements in color-coded scatterplots. Towards this goal, we propose a set of 2002 visual separation measures, by systematically combining 17 neighborhood graphs and 14 class purity functions, with different parameterizations. Using a Machine Learning framework, we evaluate these measures base...
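As a rough illustration of what pairing a neighborhood graph with a class purity function can look like, here is a minimal sketch in Python. It assumes a k-nearest-neighbor graph and a "fraction of same-class neighbors" purity score; both choices, and the names `knn_indices` and `knn_class_purity`, are hypothetical and are not taken from the publication, which may define its graphs and purity functions differently.

```python
def knn_indices(points, i, k):
    """Indices of the k points closest to points[i] (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(points[i], p)), j)
        for j, p in enumerate(points)
        if j != i
    )
    return [j for _, j in dists[:k]]


def knn_class_purity(points, labels, k):
    """Average share of each point's k nearest neighbors that carry its own label."""
    total = 0.0
    for i in range(len(points)):
        neighbors = knn_indices(points, i, k)
        total += sum(labels[j] == labels[i] for j in neighbors) / k
    return total / len(points)


# Two compact groups of three points each (illustrative data only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Labels that follow the groups versus labels that cut across them.
separated = knn_class_purity(points, ["a", "a", "a", "b", "b", "b"], k=2)
mixed = knn_class_purity(points, ["a", "b", "a", "b", "a", "b"], k=2)
```

With labels that follow the two groups the score reaches its maximum of 1.0, while labels that cut across the groups score lower, which is the kind of contrast a separation measure is meant to capture.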
Genome Wide Association Studies (GWAS) examine genetic variants in different individuals to detect variants associated with specific diseases. The 1000 Genomes project is such a collaborative research effort to sequence the genomes of at least 1000 participants of 26 different ethnicities, to establish a detailed summary of human genetic variation. T...
Proximity graphs have edges which depend on the position of the vertices in some metric space. This toolbox contains many of the known and less known proximity graphs: - K Nearest Neighbors; - K Nearest Center of Gravity; - Delaunay; - Gabriel; - Infinite Strip Band; - Relative Neighborhood; - Sphere of Influence; - Alpha Shape; - Epsilon-Ball; - L...
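To make the idea of a proximity graph concrete, the following is a minimal Python sketch of one graph from the list, the Gabriel graph. It is not the toolbox's code: the function names are hypothetical, the brute-force cubic-time loop is chosen for readability, and the convention that a point lying exactly on the circle does not block an edge is an assumption.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_graph(points):
    """Return Gabriel edges as index pairs (i, j) with i < j.

    Points i and j are connected when no third point lies strictly inside
    the ball whose diameter is the segment between them. A point k is
    strictly inside that ball exactly when
    d(i, k)^2 + d(j, k)^2 < d(i, j)^2.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        blocked = any(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Illustrative data: point 2 sits between points 0 and 1, and below point 3.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1), (1.0, 3.0)]
edges = gabriel_graph(points)
```

In this small example the central point blocks every long edge, so the graph reduces to a star around it. The edge set depends only on the vertex positions, which is the defining property of proximity graphs.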
Habilitation for Research Supervision (HDR, highest French academic diploma)
Visual quality measures seek to algorithmically imitate human judgments of patterns such as class separability, correlation, or outliers. In this paper, we propose a novel data-driven framework for evaluating such measures. The basic idea is to take a large set of visually encoded data, such as scatterplots, with reliable human "ground truth" jud...
Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques which use multidimensional data pairwise similarities to represent data into a plane enabling their visual exploratory analysis. Considering labeled data, the DR techniques face two objectives with potentially different priorities: one is to account for the dat...
A method and a system for evaluating the class of a test datum in an original metric space, each datum belonging to at least one class grouping a plurality of data, includes a step of graphical representation of the spatial organization of a set of learning data of the original space in a representation metric space, a conjoint membership level ind...
Brushing is a fundamental interaction for visual analytics. A brush is usually defined as a closed region of the screen used to select data items and to highlight them in the current view and other linked views. Scatterplots are also standard ways to visualize values for two variables of a set of multidimensional data. We propose a technique to bru...
Dimension Reduction techniques used to visualize multidimensional data provide a scatterplot spatialization of data similarities. A widespread way to evaluate the quality of such DR techniques is to use labeled data as a ground truth and to call the reader as a witness to qualify the visualization by looking at class-cluster correlations within the...
When dealing with complex problems, it is often the case that fuzzy systems must undergo an optimization process. During this process, the preservation of interpretability is a major concern. Here we present a new mathematical framework to analyze the notion of interpretability of a fuzzy partition, and a generic algorithm to preserve it. This app...
We propose a mathematical framework to analyze the interpretability of a fuzzy partition, and a generic algorithm to preserve it during the optimization of a fuzzy system. This approach is rather flexible and it allows the weakening of the usual constraints while helping to highly automatize the optimization process. The underlying tools come fr...
Photometric stereo is a technique of surface reconstruction using several object images made with a fixed camera position and varying illumination directions. Reconstructed surfaces can have complex reflecting properties which are unknown a priori and often simplified by Lambertian model (reflecting light uniformly in all directions). Such simplifi...
In this article, we propose a generative model that extracts the Betti numbers of a set of manifolds of R^D from a sample. This model is based on the Generative Simplicial Complex, a mixture model whose components are geometric simplices convolved with a multivariate Gaussian distribution...
As dimensionality increases, analysts are faced with difficult problems to make sense of their data. In exploratory data analysis, multidimensional scaling projections can help analysts discover patterns by identifying outliers and enabling visual clustering. However, to exploit these projections, artifacts and interpretation issues must be overco...
Dimensionality reduction algorithms may be of great help as decision support, representing the information as a map which summarizes the data similarities. When data come with an assigned class label, such a map can be used to check the quality of the labeling detecting class outliers or data near decision boundary, or to evaluate the relevance of...
Abstract. Exploratory analysis of multidimensional data is a complex problem. We propose to extract certain topological invariants, called Betti numbers, to summarize the topology of the structure underlying the data. We define a generative model based on the Delaunay simplicial complex, for which we estimate the...
The purpose is to extract Betti numbers, a topological invariant, from a point-cloud dataset. The points are sampled from a manifold, and may be corrupted with an unknown noise. We are interested in the Betti numbers of the original manifold, but the sampling and corruption make them difficult to get. The Generative Simplicial Complex optimizes a st...
New design of the ProxiViz visualization technique to overcome Dimension Reduction techniques artefacts.
Since at least the first knapped stones of the Paleolithic era, humans have never stopped creating artifacts, both as means of acting on their environment and as means of observing it beyond their own capabilities. They developed these tools to assist them in their visceral quest for understanding (science) and mastery (technology) of this...