Dominique Vaufreydaz

Laboratoire d'Informatique de Grenoble | LIG · Pervasive Interaction

Ph.D. in Computer Science, Associate Professor (Maître de Conférences - HDR) in Computer Science

About

89
Publications
15,425
Reads
746
Citations
Citations since 2017
36 Research Items
359 Citations
[Chart: citations per year, 2017 to 2023; vertical axis 0 to 80]
Introduction
I am an Associate Professor (Maître de Conférences - HDR) in Computer Science at Univ. Grenoble Alpes, leading the M-PSI team at the LIG laboratory. My current research focuses on multimodal perception and behavior analysis, mainly of humans, in the context of smart spaces/ubiquitous computing, healthcare and assistive technologies, and/or affective computing. This research can be applied to sociable robot companions, autonomous cars, smart homes, or any human/agent interaction.
Additional affiliations
January 2016 - present
Université Grenoble Alpes
Position
  • Maître de Conférences/Associate Professor
January 2007 - present
Laboratoire d'Informatique de Grenoble
Position
  • Professor (Associate)

Publications (89)
Preprint
The transition angles are defined to describe the vowel-to-vowel transitions in the acoustic space of the Spectral Subband Centroids, and the findings show that they are similar among speakers and speaking rates. In this paper, we propose to investigate the usage of polar coordinates in favor of angles to describe a speech signal by characterizing...
Preprint
We address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space. Deviating from talking head generation conditioned on audio that seldom puts emphasis on realistic head motions, we devise a GAN-based architecture that allows obtaining rich head motion sequences while avoiding known caveats...
Preprint
Full-text available
In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is labeled with a...
Article
Full-text available
Prediction of human actions in social interactions has important applications in the design of social robots or artificial avatars. In this paper, we focus on a unimodal representation of interactions and propose to tackle interaction generation in a data-driven fashion. In particular, we model human interaction generation as a discrete multi-seque...
Conference Paper
Full-text available
Transformers trained with self-supervised learning using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in...
Preprint
Full-text available
Transformers trained with self-supervised learning using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in...
Article
Full-text available
Recent advances in signal processing and analysis have made it possible to create new ways of instrumenting the observation and analysis of classroom events, and thus to collect new types of evidence about teaching or learning practices. This article surveys some of them, drawing on the analytical framework of learning...
Preprint
Urban autonomous driving in the presence of pedestrians as vulnerable road users is still a challenging and less examined research problem. This work formulates navigation in urban environments as a multi objective reinforcement learning problem. A deep learning variant of thresholded lexicographic Q-learning is presented for autonomous navigation...
Article
Full-text available
Machine learning has now proven its effectiveness: from a large mass of information, one can produce artificial intelligences capable of meeting many needs, as shown by the progress in computer vision and machine translation in recent years. However, this technique has limit...
Conference Paper
Full-text available
Urban autonomous driving in the presence of pedestrians as vulnerable road users is still a challenging and less examined research problem. This work formulates navigation in urban environments as a multi objective reinforcement learning problem. A deep learning variant of thresholded lexicographic Q-learning is presented for autonomous navigation...
Preprint
Full-text available
Prediction of human actions in social interactions has important applications in the design of social robots or artificial avatars. In this paper, we model human interaction generation as a discrete multi-sequence generation problem and present SocialInteractionGAN, a novel adversarial architecture for conditional interaction generation. Our model...
Preprint
Decision making for autonomous driving in urban environments is challenging due to the complexity of the road structure and the uncertainty in the behavior of diverse road users. Traditional methods consist of manually designed rules as the driving policy, which require expert domain knowledge, are difficult to generalize and might give sub-optimal...
Conference Paper
Full-text available
This article presents our unimodal privacy-safe and non-individual proposal for the audio-video group emotion recognition subtask at the Emotion Recognition in the Wild (EmotiW) Challenge 2020. This sub-challenge aims to classify in-the-wild videos into three categories: Positive, Neutral and Negative. Recent deep learning models have shown tremend...
Preprint
This article presents our unimodal privacy-safe and non-individual proposal for the audio-video group emotion recognition subtask at the Emotion Recognition in the Wild (EmotiW) Challenge 2020. This sub-challenge aims to classify in-the-wild videos into three categories: Positive, Neutral and Negative. Recent deep learning models have shown treme...
Article
Full-text available
Research in instructional design has so far been rich in theories and applications with great prescriptive power, centered mainly on the teacher. However, it still seems to lack work that accounts for the activity of the teacher and the learner in context, and hence with a dimension...
Article
Full-text available
For the localization of multiple users, Bluetooth data from the smartphone is able to complement Wi-Fi-based methods with additional information, by providing an approximation of the relative distances between users. In practice, both positions provided by Wi-Fi data and relative distance provided by Bluetooth data are subject to a certain degree o...
Article
Full-text available
Should Big Teacher be watching you? The Teaching Lab project at Grenoble Alpes University proposes recommendations for designing smart classrooms with ethical considerations taken into account.
Conference Paper
Full-text available
Progress on autonomous vehicles suggests a future where they will share urban environments with fragile road users such as pedestrians, cyclists and two-wheelers. In this article, we focus on the visual external Human-Machine Interface (eHMI) of autonomous vehicles used while interacting with pedestrians, and more particularly on its placement to...
Conference Paper
Full-text available
This article reports on an investigation of the use of convolutional neural networks to predict the visual attention of chess players. The visual attention model described in this article has been created to generate saliency maps that capture hierarchical and spatial features of chessboard, in order to predict the probability fixation for individu...
Preprint
Full-text available
This article reports on an investigation of the use of convolutional neural networks to predict the visual attention of chess players. The visual attention model described in this article has been created to generate saliency maps that capture hierarchical and spatial features of chessboard, in order to predict the probability fixation for individu...
Conference Paper
Full-text available
Autonomous Vehicles navigating in urban areas have a need to understand and predict future pedestrian behavior for safer navigation. This high level of situational awareness requires observing pedestrian behavior and extrapolating their positions to know future positions. While some work has been done in this field using Hidden Markov Models (HMMs)...
Preprint
Full-text available
In this paper we present results from recent experiments that suggest that chess players associate emotions with game situations and reactively use these associations to guide search for planning and problem solving. We describe the design of an instrument for capturing and interpreting multimodal signals of humans engaged in solving challenging prob...
Conference Paper
Full-text available
In this paper we present results from recent experiments that suggest that chess players associate emotions with game situations and reactively use these associations to guide search for planning and problem solving. We describe the design of an instrument for capturing and interpreting multimodal signals of humans engaged in solving challenging prob...
Conference Paper
Full-text available
In a multiuser context, the Bluetooth data from the smartphone could give an approximation of the distance between users. Meanwhile, the Wi-Fi data can be used to calculate the user's position directly. However, both the Wi-Fi-based position outputs and Bluetooth-based distances are affected by some degree of noise. In our work, we propose several...
Preprint
Full-text available
Autonomous Vehicles navigating in urban areas have a need to understand and predict future pedestrian behavior for safer navigation. This high level of situational awareness requires observing pedestrian behavior and extrapolating their positions to know future positions. While some work has been done in this field using Hidden Markov Models (HMMs)...
Preprint
In a multiuser context, the Bluetooth data from the smartphone could give an approximation of the distance between users. Meanwhile, the Wi-Fi data can be used to calculate the user's position directly. However, both the Wi-Fi-based position outputs and Bluetooth-based distances are affected by some degree of noise. In our work, we propose several...
Conference Paper
Full-text available
This article deals with the specific context of an autonomous car navigating in an urban center within a shared space between pedestrians and cars. The driver delegates the control to the autonomous system while remaining seated in the driver's seat. The proposed study aims at giving a first insight into the definition of human perception of space...
Preprint
Full-text available
This article deals with the specific context of an autonomous car navigating in an urban center within a shared space between pedestrians and cars. The driver delegates the control to the autonomous system while remaining seated in the driver's seat. The proposed study aims at giving a first insight into the definition of human perception of space...
Article
Full-text available
In this paper we present the first results of a pilot experiment in the interpretation of multimodal observations of human experts engaged in solving challenging chess problems. Our goal is to investigate the extent to which observations of eye-gaze, posture, emotion and other physiological signals can be used to model the cognitive state of subjec...
Conference Paper
Full-text available
Social robots are transitioning from lab experiments to commercial products, creating new needs for prototyping and design tools. In this paper, we present a framework to facilitate the prototyping of expressive animated robots. For this, we start by reviewing the design of existing social robots in order to define a set of basic components of soc...
Conference Paper
Full-text available
This paper presents Figurines, an offline framework for narrative creation with tangible objects, designed to record storytelling sessions with children, teenagers or adults. This framework uses tangible diegetic objects to record a free narrative from up to two storytellers and construct a fully annotated representation of the story. This represen...
Article
Full-text available
In this paper we present the first results of a pilot experiment in the capture and interpretation of multimodal signals of human experts engaged in solving challenging chess problems. Our goal is to investigate the extent to which observations of eye-gaze, posture, emotion and other physiological signals can be used to model the cognitive state of...
Conference Paper
Full-text available
This paper proposes to model pedestrian behaviour in urban scenes by combining the principles of urban planning and the sociological concept of Natural Vision. This model assumes that the environment perceived by pedestrians is composed of multiple potential fields that influence their behaviour. These fields are derived from static scene elements...
Article
Full-text available
This paper presents the analysis and discussion of the off-site localization competition track, which took place during the Seventh International Conference on Indoor Positioning and Indoor Navigation (IPIN 2016). Five international teams proposed different strategies for smartphone-based indoor positioning using the same reference data. The compet...
Conference Paper
Full-text available
Pretend play is a storytelling technique, naturally used from very young ages, which relies on object substitution to represent the characters of the imagined story. We propose a system which assists the storyteller by generating a virtualized story from a recorded dialogue performed with 3D printed figurines. We capture the gestures and facial expres...
Conference Paper
Full-text available
This paper introduces our work in the framework of Track 3 of the IPIN 2016 Indoor Localization Competition, which addresses the smartphone-based tracking problem in an offline manner. Our approach splits the path-reconstruction into several smaller tasks, including building identification, floor identification, user direction and speed inference....
Conference Paper
Full-text available
In the near future, robots will support humans in performing tasks in many domains (industrial, domestic, educational and health tasks). Such robot behaviors need to take into account the social interaction between robot and human. In this context, we focus on the expressiveness of a moving head for an assistive robot for the elderly. We designed a new...
Conference Paper
Full-text available
In the present paper we describe a bio-inspired non von Neumann controller for a simple sensorimotor robotic system. This controller uses a bitwise version of the Gibbs sampling algorithm to select commands so the robot can adapt its course of action and avoid perceived obstacles in the environment. The VHDL specification of the circuit implementat...
Conference Paper
Full-text available
Movements are an important part of robot design and we need dedicated tools to design them. As previous research has shown, 3D animation techniques are of great use to animate a robot. However, most robots don't benefit from an animation tool and therefore from animation artists' knowledge. We present an open-source robot animation software to addre...
Article
Full-text available
New technologies, and especially robotics, are moving towards more natural user interfaces. Work has been done on different interaction modalities such as sight (visual computing) and audio (speech and audio recognition), but some other modalities are still less researched. The touch modality is one of the least studied in HRI but could be valuable f...
Conference Paper
Full-text available
New technologies, and especially robotics, are moving towards more natural user interfaces. Work has been done on different interaction modalities such as sight (visual computing) and audio (speech and audio recognition), but some other modalities are still less researched. The touch modality is one of the least studied in HRI but could be valuable f...
Article
Full-text available
Recognition of intentions is a subconscious cognitive process vital to human communication. This skill enables anticipation and increases the quality of interactions between humans. Within the context of engagement, non-verbal signals are used to communicate the intention of starting the interaction with a partner. In this paper, we investigated me...
Article
Full-text available
Since the commercialization of low cost RGB-D sensors, like the Kinect, more and more indoor robots have been equipped with this kind of sensor to perform tasks such as people tracking or gesture recognition. Nevertheless, as far as we know from the literature, studies do not consider the limits of the sensors in terms of motion speed, position of the s...
Article
Full-text available
Med-e-Tel: The International Ehealth, Telemedicine And Health Ict Forum For Education, Networking And Business
Conference Paper
Full-text available
Recognition of intentions is an unconscious cognitive process vital to human communication. This skill enables anticipation and increases the quality of interactive exchanges between humans. Within the context of engagement, i.e. intention for interaction, non-verbal signals are used to communicate this intention to the partner. In this paper, we investig...
Article
Full-text available
OMiSCID 2.0 is a lightweight middleware for ubiquitous computing and ambient intelligence. Its main objective is to bring Service Oriented Architectures to all developers. After reviewing related works, we demonstrate how OMiSCID 2.0, compared to other available solutions, integrates easily in classical workflows without adding any constraints on t...
Conference Paper
Full-text available
This paper presents a case study of the usage of OMiSCID 2.0, the new version of a lightweight middleware for ubiquitous computing and ambient intelligence. The objective of this middleware is to bring Service Oriented Architectures to all developers. After comparing to available solutions, we show how it integrates in classical workflow without ad...
Article
Full-text available
This article presents OMiSCID and its latest developments toward version 2.0. OMiSCID is a middleware that facilitates the development and deployment of distributed applications, in particular ubiquitous applications in smart environments. OMiSCID is entirely free and open source, under a permissive MIT-style license...
Conference Paper
Full-text available
This paper presents a usable developer-oriented functionality composition language (UFCL) designed for ubiquitous systems developers. Easy to write, this language is used to semantically describe functionalities implemented by services in a service oriented architecture where each service exposes its own description. Service factories can also be d...
Article
This article deals with the problem of implementing a context model for a smart environment. This problem has already been addressed several times using many different data or problem driven methods. In order to separate the modeling phase from implementation, we first represent the context model by a network of situations. Then, different implementation...
Article
Full-text available
Building applications composing perceptive services in a pervasive environment can lead to an inextricable problem: they were built by several people, using different programming languages and multiple conventions and protocols. Moreover, services can be volatile, appearing or disappearing while the application is running. This paper proposes th...
Conference Paper
Full-text available
This article presents a cross-platform (Windows, Linux, MacOSX) and multi-language (C++, Java) middleware for ubiquitous computing. This middleware abstracts away network communication. It allows the inspection of remote services and relies on DNS-SD to enable their discovery. The Java version is available under...
Article
Full-text available
This paper introduces a new lightweight middleware for pervasive environments. This middleware abstracts network communications and provides service introspection and discovery using DNS-SD (DNS-based Service Discovery [1]). Services can declare simplex or duplex communication channels and variables. The middleware supports the low-latency, highban...
Conference Paper
Full-text available
This paper addresses the problem of segmenting small group meetings in order to detect different group configurations and activities in an intelligent environment. Our approach takes speech activity detection of individuals attending a meeting as input. The goal is to separate distinct distributions of speech activity observation corresponding to d...
Conference Paper
Full-text available
In this paper, we address the problem of speech activity detection in multimodal perceptive environments. Such a space may contain many different microphones (lapel, distant or table top). Thus, we need a generic speech activity detector in order to cope with different speech conditions (from close-talking to noisy distant speech). Moreover, as the nu...
Conference Paper
Full-text available
This paper addresses the problem of implementing an abstract context model. First, the abstract context model is represented by a network of situations. Two different implementations for the situation model are then proposed: a deterministic one based on Petri nets and a probabilistic one based on hidden Markov models. Both implementations are illu...
Conference Paper
Full-text available
This paper describes the “FAME” multi-modal demonstrator, which integrates multiple communication modes – vision, speech and object manipulation – by combining the physical and virtual worlds to provide support for multi-cultural or multi-lingual communication and problem solving. The major challenges are automatic perception of human actions and u...
Conference Paper
Full-text available
The construction of a speech recognition system requires a recorded set of phrases to compute the pertinent acoustic models. This set of phrases must be phonetically rich and balanced in order to obtain a robust recognizer. Traditionally, this set is defined manually, which requires great human effort. In this paper we propose an automated method for...
Article
Full-text available
The paper document has always been our preferred medium whenever we need to keep a record of an agreement between several parties. Traditionally, since its integrity cannot be protected, the use of seals or signatures makes it possible to guarantee the authenticity of such documents. With the growing use...
Article
Full-text available
In this article we present a methodology for building a phonetically balanced corpus for Mexican Spanish. This corpus will be used for training and evaluating the acoustic models that are essential to the speech recognition process. The first part of this article explains the motivation for this work....
Conference Paper
Full-text available
The language model is an important component of any speech recognition system. In this paper, we present a lexical enrichment methodology of corpora focused on the construction of statistical language models. This methodology considers, on one hand, the identification of the set of poorly represented words of a given training corpus, and on the...
Conference Paper
The language model is an important component of any speech recognition system. In this paper, we present a lexical enrichment methodology of corpora focused on the construction of statistical language models. This methodology considers, on one hand, the identification of the set of poorly represented words of a given training corpus, and on the...
Article
Full-text available
This article presents a study to evaluate the lexical richness of a corpus specifically collected for training statistical language models. To this end, it presents a comparative study between a spoken corpus, the DIME corpus, and a corpus collected from the Web for building language models, the WebDIME corpus...
Article
Full-text available
The Web is a rich and diversified source of information. In this article, we propose to benefit from this richness to collect and analyze documents, with the aim of a relational indexation based on noun phrases. The proposed data processing chain includes a spider collecting data to build textual corpora, and a linguistic module analyzing text to extract i...
Article
Full-text available
The Web is a rich and diversified source of information. In this article, we propose to benefit from this richness to collect and analyze documents, with the aim of a relational indexation based on noun phrases. The proposed data processing chain includes a spider collecting data to build textual corpora, and a linguistic module analyzing text to extract i...
Article
Full-text available
The Web is a rich and diversified source of information. In this article, we propose to benefit from this richness to collect and analyze documents, with the aim of a relational indexation based on noun phrases. The proposed data processing chain includes a spider collecting data to build textual corpora, and a linguistic module analyzing text to extract i...
Article
Full-text available
Performance and usability of real-world speech-to-speech translation systems, like the one developed within the NESPOLE! project, are affected by several aspects that go beyond the pure translation quality provided by the underlying components of the system. In this paper we describe these aspects as perspectives along which we have evaluated the NES...
Article
Full-text available
Over the past year, we have developed a fully functional showcase of the NESPOLE! system within the domain of travel and tourism, and have significantly improved system performance and usability based on a series of studies and evaluations with real users. Our experience has shown that improving translation quality is only one of the several importa...
Article
Full-text available
The main goal of NESPOLE! is to advance the state-of-the-art of speech-to-speech translation in realistic scenarios and involving naive users. The first showcase presented in this demonstration involves an English, French, or German speaking client enquiring about winter-sports possibilities in the Trentino region of the Italian Alps via a NetMeeti...
Article
Full-text available
In statistical language modelling research, there is a lack of very large text corpora, especially for spoken language modelling. This thesis deals with using Internet documents in order to train such statistical models. After gathering corpora, we highlighted several interesting properties like the huge quantity of text, the number of different French...
Article
Full-text available
Textual resources are the ones most lacking in research on statistical language modelling, especially for training models suited to dialogue. This thesis proposes using documents from the Internet to train such models. The collection of several corpora made it possible to highlight...