Pablo Cesar
Centrum Wiskunde & Informatica (CWI)
About
362 Publications
50,057 Reads
4,834 Citations
Publications (362)
Emotions are a powerful phenomenon that influences the way we think and behave. Affective Computing is the field dedicated to the development of automatic emotion recognition systems. These systems can be used to improve the quality of life through the development of, for example, mental health applications, providing personalized services, or impr...
In the 21st century, stress has emerged as a major epidemic, impacting various sectors. The current methods to assess stress and related mental health issues are still mostly based on self-reporting questionnaires, which are time-consuming, prone to bias, do not allow for continuous monitoring, and are not scalable. This results in mental health is...
Point clouds denote a prominent solution for the representation of 3D photo-realistic content in immersive applications. Similarly to other imaging modalities, quality predictions for point cloud contents are vital for a wide range of applications, enabling trade-off optimizations between data quality and data size in every processing step from acq...
Cloud Virtual Reality (VR) gaming offloads computationally-intensive VR games to resourceful data centers. However, ensuring good Quality of Experience (QoE) in cloud VR gaming is inherently challenging as VR gamers demand high visual quality, short response time, and negligible cybersickness. In this article, we study the QoE of cloud VR gaming an...
Emotion Recognition systems are typically trained to classify a given psychophysiological state into emotion categories. Current platforms for emotion ground-truth collection show limitations for real-world scenarios of long-duration content (e.g., > 10m), namely: 1) Real-time annotation tools are distracting and become exhausting in a longer video...
Autonomous artificial intelligence (AI) agents have emerged as promising protocols for automatically understanding the language-based environment, particularly with the exponential development of large language models (LLMs). However, a fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an...
Advances in Generative Artificial Intelligence (AI) are resulting in AI-generated media output that is (nearly) indistinguishable from human-created content. This can drastically impact users and the media sector, especially given global risks of misinformation. While the currently discussed European AI Act aims at addressing these risks through Ar...
Immersive technologies like eXtended Reality (XR) are the next step in videoconferencing. In this context, understanding the effect of delay on communication is crucial. This paper presents the first study on the impact of delay on collaborative tasks using a realistic Social XR system. Specifically, we design an experiment and evaluate the impact...
The latest social VR technologies have enabled users to attend traditional media and arts performances together while being geographically removed, making such experiences accessible despite budget, distance, and other restrictions. In this work, we aim at improving the way remote performances are shared by designing and evaluating a VR theatre lob...
The Internet of Multisensory, Multimedia and Musical Things (Io3MT) is a new concept that arises from the confluence of several areas of computer science, arts, and humanities, with the objective of grouping in a single place devices and data that explore the five human senses, besides multimedia aspects and music content. In the context of this br...
Affective computing has experienced substantial advancements in recognizing emotions through image and facial expression analysis. However, the incorporation of physiological data remains constrained. Emotion recognition with physiological data shows promising results in controlled experiments but lacks generalization to real-world settings. To add...
The Internet of Multisensory, Multimedia and Musical Things (Io3MT) is a new concept that arises from the confluence of several areas of computer science, arts, and humanities, with the objective of grouping in a single place devices and data that explore the five human senses, besides multimedia aspects and music content. In this paper, we present...
The remix technique has been widely used in musical practice, mainly due to the figure of Disc Jockeys (DJs), who combine several pre-existing sounds to produce completely new content. However, this creation method also appears in other forms of artistic expression, such as architecture, photography, fashion design, video games, etc. Recent tec...
Measuring interoception (‘perceiving internal bodily states’) has diagnostic and wellbeing implications. Since heartbeats are distinct and frequent, various methods aim at measuring cardiac interoceptive accuracy (CIAcc). However, the role of exteroceptive modalities for representing heart rate (HR) across screen-based and Virtual Reality (VR) envi...
During group interactions, we react and modulate our emotions and behaviour to the group through phenomena including emotion contagion and physiological synchrony. Previous work on emotion recognition through video/image has shown that group context information improves the classification performance. However, when using physiological data, literat...
Virtual Reality telecommunication systems promise to overcome the limitations of current real-time teleconferencing solutions, by enabling a better sense of immersion and fostering more natural interpersonal interactions. Many solutions that currently enable immersive teleconferencing employ synthetic avatars to represent their users. However, phot...
In this position paper, we outline our research challenges in Affective Interactive Systems, and present recent work on visualizing avatar biosignals for social VR entertainment. We highlight considerations for how biosignals animations in social VR spaces can (falsely) indicate users' availability status.
The rise of capturing systems for objects and scenes in 3D with increased fidelity and immersion has led to the popularity of volumetric video contents that can be seen from any position and angle in 6 degrees of freedom navigation. Such contents need large volumes of data to accurately represent the real world. Thus, novel optimization solutions a...
Remote communication has rapidly become a part of everyday life in both professional and personal contexts. However, popular video conferencing applications present limitations in terms of quality of communication, immersion and social meaning. VR remote communication applications offer a greater sense of co-presence and mutual sensing of emotions...
Social VR enables people to interact over distance with others in real-time. It allows remote people, typically represented as avatars, to communicate and perform activities together in a shared virtual environment, extending the capabilities of traditional social platforms like Facebook and Netflix. This paper explores the benefits and drawbacks p...
Fuelled by the increase in popularity of virtual and augmented reality applications, point clouds have emerged as a popular 3D format for acquisition and rendering of digital humans, thanks to their versatility and real-time capabilities. Due to technological constraints and real-time rendering limitations, however, the visual quality of dynamic po...
Visualizing biosignals can be important for social Virtual Reality (VR), where avatar non-verbal cues are missing. While several biosignal representations exist, designing effective visualizations and understanding user perceptions within social VR entertainment remains unclear. We adopt a mixed-methods approach to design biosignals for social VR e...
Instead of predicting just one emotion for one activity (e.g., video watching), fine-grained emotion recognition enables more temporally precise recognition. Previous works on fine-grained emotion recognition require segment-by-segment, fine-grained emotion labels to train the recognition algorithm. However, experiments to collect these labels are...
Fine-grained emotion recognition can model the temporal dynamics of emotions, which is more precise than predicting one emotion retrospectively for an activity (e.g., video clip watching). Previous works require large amounts of continuously annotated data to train an accurate recognition model; however, experiments to collect such large amounts o...
Narrative videos usually convey their main content through multiple narrative channels, such as audio, video frames, and subtitles. Existing video summarization approaches rarely consider these multi-dimensional narrative inputs, or ignore the impact of the artistic assembly of shots when directly applied to narrative videos. This paper introduces a...
This work focuses on enabling user-centric immersive systems, in which every aspect of the coding-delivery-rendering chain is tailored to the interactive users. Understanding the actual interactivity and behaviour of those users is still an open challenge and a key step to enable such a user-centric system. Our main goal is to enable user behaviour...
With the increasing popularity of extended reality technology and the adoption of depth-enhanced visual data in information exchange and telecommunication systems, point clouds have emerged as a promising 3D imaging modality. Similarly to other types of content representations, visual quality predictors for point cloud data are vital for a wide ran...
Watching 360 videos using Virtual Reality (VR) head-mounted displays (HMDs) provides interactive and immersive experiences, where videos can evoke different emotions. Existing emotion self-report techniques within VR however are either retrospective or interrupt the immersive experience. To address this, we introduce the Continuous Physiological an...
We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a divers...
Social Virtual Reality (VR) applications are becoming the next big revolution in the field of remote communication. Social VR gives participants the possibility to explore and interact with virtual environments and objects, to experience a full sense of immersion, and to feel together with others. Understanding how user behaviour is influenced by the shar...
Thanks to recent advances in computer graphics, wearable technology, and connectivity, Virtual Reality (VR) has landed in our daily life. A key novelty in VR is the role of the user, which has turned from merely passive to entirely active. Thus, improving any aspect of the coding-delivery-rendering chain starts with the need for understanding user...
Recently, an impressive development in immersive technologies, such as Augmented Reality (AR), Virtual Reality (VR) and 360 video, has been witnessed. However, methods for quality assessment have not kept pace. This paper studies quality assessment of 360 video from cross-lab tests (involving ten laboratories and more than 300 participants...
Inferring emotions from Head Movement (HM) and Eye Movement (EM) data in 360° Virtual Reality (VR) can enable a low-cost means of improving users’ Quality of Experience. Correlations have been shown between retrospective emotions and HM, as well as EM when tested with static 360° images. In this early work, we investigate the relationship between m...