Fig. 5: The HP Halo Video Collaboration Studio (uploaded by John Apostolopoulos)
Source publication
Communication has seen enormous advances over the past 100 years including radio, television, mobile phones, video conferencing, and Internet-based voice and video calling. Still, remote communication remains less natural and more fatiguing than face-to-face. The vision of immersive communication is to enable natural experiences and interactions wi...
Context in source publication
Context 1
... has recently been standardized [37], and ongoing efforts are incorporating depth maps into the compression. Distributed coding principles have also been applied, both for independent coding of the video from multiple cameras with joint decoding and for creating low-complexity encoders [40]. In the future, the visual information may be expressed as a collection of 2-D layers or 3-D objects, lighting information, and a scene description of how to compose them to render the scene [37], [42]. Similarly, the auditory information may be expressed as a collection of audio sources (e.g., individual people) and environmental effects such as reverberation, together with a scene description of how to compose them to render the auditory scene [43]. Object-based video and audio coding was developed within the MPEG-4 standard in the late 1990s; however, it was not successful because of the difficulty of decomposing a scene into objects and the high computational requirements. For example, segmenting video into meaningful objects is a very challenging problem. Fortunately, the advent of 3-D depth cameras provides a major step forward. Similarly, algorithms for determining the number of speakers in a room and separating their voices are improving. Since immersive communication involves both video and audio, multimodal processing can help: for example, face detection and tracking can be applied to estimate the number of speakers and their bearing relative to a microphone array, so that the visual information guides the audio processing. These object-based systems would analyze the captured signals, decompose them into meaningful objects, and appropriately compress and deliver each of them. The separate coding and transport of the different objects greatly facilitates object-based processing, such as the addition or removal of objects, or the placement of objects within a virtual visual or auditory environment. Another trend is toward model-based image/video synthesis, where textures such as hair or grass are created that are conceptually faithful, though not pixel-wise accurate. Future collaborative meetings may also have some participants photo-realistically rendered and others represented as avatars because of limited bandwidth, the lack of an available camera, or privacy preferences. The compression technique selected for a specific immersive communication session would depend on the available bandwidth, computational capability, available sensors, and application-level constraints such as privacy concerns.

Immersion in a remote environment requires the ability to physically interact with objects in that environment. This relies on the haptics modality, comprising the tactile sense (touch, pressure, temperature, pain) and kinaesthetics (the perception of muscle movements and joint positions). Providing timely haptic feedback requires a control loop with roughly 1-ms latency, which corresponds to a high packet transmission rate of nominally 1000 packets/s. To overcome this problem, recent work has leveraged perceptual models that account for human haptic sensitivity. For example, Weber's law states that the just noticeable difference (JND) is linearly related to the intensity of the stimulus; changes smaller than the JND would not be perceived and hence do not need to be transmitted. This leads to the notion of perceptual deadbands, within which changes to the haptic signal would in principle be imperceptible to the user [44].
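To make the deadband idea concrete, the following is a minimal sketch (not from the source paper) of deadband-based sample selection for a nominal 1-kHz haptic stream. It assumes a Weber-law threshold in which a sample is transmitted only when it differs from the last transmitted value by more than a fraction k of that value; the value of k and the test signal are purely illustrative.

```python
import math

def deadband_filter(samples, k=0.1):
    """Yield (index, value) for samples whose change since the last
    transmitted value exceeds the Weber-law deadband k * |last sent|."""
    last_sent = None
    for i, x in enumerate(samples):
        if last_sent is None or abs(x - last_sent) > k * abs(last_sent):
            last_sent = x
            yield i, x

# Illustrative 1-kHz force signal: one second of a 5-Hz sinusoid around 1 N.
signal = [1.0 + 0.2 * math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
sent = list(deadband_filter(signal, k=0.05))
print(f"transmitted {len(sent)} of {len(signal)} samples")
```

On this test signal only a small fraction of the 1000 samples/s crosses the deadband, which is the kind of packet-rate reduction the perceptual approach aims for.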
Accounting for haptic signal perceptibility, coupled with conventional compression techniques, can lead to significant reductions in the transmitted packet rate and the total bit rate [45]. Haptics, and the compression and low-delay two-way transmission of haptic information, is an emerging area that promises to dramatically increase the realism and range of uses of immersive communication systems.

Ultimately, the success or failure of any system for immersive communication lies in the quality of the human experience that it provides, not in the technology that it uses. As mentioned in the Introduction, immersive communication is about exchanging natural social signals between remote people and/or experiencing remote locations in ways that suspend one's disbelief in being there. Fundamentally, the quality of such an immersive experience relates to human perception and mental state. For immersive communication, while it may be sufficient to reproduce the sensory field at a distant location, it may not be necessary, or even feasible, to do so. More important to achieving a high QoE is inducing in each participant an intended illusion or mental state. Hence, it may come as no surprise that the latest generation of immersive communication products, which set a new standard for QoE, was conceived by veteran storytellers in Hollywood. Moviemaker DreamWorks Animation SKG, in partnership with technologist Hewlett-Packard, developed the HP Halo Video Collaboration Studio, shown in Fig. 5, bringing a moviemaker's perspective to video conferencing. (Various companies, including Cisco, Tandberg, Teliris, and Polycom, have since offered similar systems.) Unlike their predecessor video conferencing systems, these high-end telepresence systems are carefully designed to induce participants to suspend their disbelief in being with each other in the same room. In fact, these high-end telepresence systems preserve many of the important immersive cues listed in Table 2, auditory and visual, beginning with peripheral awareness and consistency. Peripheral awareness is the awareness of everything that is going on in one's immediate vicinity, providing a sense of transparency, of being there. Halo achieves that, in part, through a large field of view. Consistency between local and remote sites is important for maintaining the illusion of a single room, so all elements of room design, such as tables, chairs, colors, wall coverings, acoustics, and lighting, match across all locations by design. In addition, a number of measures were taken to maintain consistent 3-D cues. For example:

• remote people appear life size and occupy the same field of view as if they were present;
• multichannel audio is spatially consistent with the video;
• video streams from remote sites are placed in an order that is consistent with a virtual meeting layout, improving eye contact and maintaining consistent gaze direction.

In addition:

• low latency and full-duplex echo-controlled audio allow natural conversation;
• people's facial expressions are accurately conveyed by a combination of sufficient video resolution and constrained seating that keeps participants in front of the cameras.

Despite the attention paid to these details, other conflicting cues can serve to break the illusion. One example is eye contact: in a real physical space, each person is able to make eye contact with at most one other person.
But in today's telepresence systems, eye contact is compromised because all local participants see the same view of each remote participant. Future multiview 3-D displays promise to improve on this aspect.

Evaluating the QoE of an immersive communication system can be done on several levels, which one could call the performance, psychometric, and psychophysical levels. At the performance level, QoE is evaluated in terms of the overall effectiveness of the experience with respect to performing some task. For example, Bailenson et al. [46] studied whether gaze awareness could reduce the number of questions, time per question, or completion time in a game of "20 questions" with multiple players. Such evaluations are often performed as user studies in a lab, since well-defined, measurable tasks must be carried out. Clearly, measures of performance may be completely different for one task (e.g., playing poker) compared to another (e.g., negotiating a business deal). Hence, it is possible for different immersive communication systems to excel in different settings.

At the psychometric level, QoE is evaluated in terms of how the participants feel about the experience. Such evaluations can be carried out either as a study in a lab (often in the context of performing a task) or as a field study (in the context of a regular workload). As an example of a field study, Venolia et al. [47] studied the deployment of embodied social proxies, or telepresence robots, in four real-world software development teams in which one member of each team was remote. Within each team, an embodied social proxy for the remote team member was deployed for six weeks. A baseline survey was completed before the deployment, and the same survey was completed after the deployment, by all team members. The survey listed a set of assertions to be scored on an eight-point range from 0 (low agreement) to 7 (high), known as a Likert scale in the psychometric literature [48]: one set of assertions related to meeting effectiveness (e.g., "I think X has a good sense of my reactions"); another set related to awareness (e.g., "I am aware of what X is currently working on and what is important to X"); and a final set related to social aspects (e.g., "I have a sense of closeness to X"). Comparing the surveys before and after the deployment yielded quantitative psychometric results on the overall experience with respect to meeting effectiveness, awareness, and social aspects. Qualitative results were also obtained through freeform comments on the surveys as well as direct observations of the teams in action by ethnographers. Other examples of psychometric evaluations include those of [49]–[51], which showed the importance of awareness of remote participants' object of attention in collaborative tasks, and [52], which showed that trust can be improved using a multiview display to support proper eye gaze. These were assessed in lab studies using surveys. The third level on which to evaluate QoE is the psychophysical level. ...
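As a purely hypothetical illustration of the psychometric analysis described above, the sketch below compares mean Likert scores per assertion category before and after a deployment. The categories mirror those used in [47], but all scores and the number of respondents are invented.

```python
from statistics import mean

# Invented 0-7 Likert scores from four team members, before and after
# the deployment, grouped by assertion category as in the survey design.
before = {"effectiveness": [3, 2, 4, 3], "awareness": [2, 1, 2, 3], "social": [1, 2, 2, 1]}
after  = {"effectiveness": [5, 4, 5, 4], "awareness": [4, 3, 5, 4], "social": [3, 4, 3, 3]}

for category in before:
    shift = mean(after[category]) - mean(before[category])
    print(f"{category}: mean shift {shift:+.2f} on the 0-7 scale")
```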
Citations
... Organizers must find solutions for virtual conferences that provide the same value and level of social exchange to their participants. While current solutions fail to do so, technology that enables immersion in virtual spaces by replicating a physical space virtually enhances the value of participation [1,27,31,17]. However, the development of these solutions is expensive, requires technical expertise, and an adequate IT infrastructure [5]. ...
In a human-centered society 5.0, participation in events such as trade fairs and academic conferences should fit the consumer's profile and enable restriction-free participation by addressing potential economic, spatial, temporal, and individual constraints. Global challenges that have emerged or intensified in the past few years have caused a transformation in the event industry. Solutions that make the industry more resilient to future global challenges are now being explored. Within this transformation, the shift from physical to digital spaces gained importance. However, creating an environment that can satisfy the participants' needs for interaction and immersion remains a challenge for both academia and the event industry. This paper introduces an integration framework for game engine technologies and event management systems to enable restriction-free participation while strengthening the collaboration between participants in a virtual event. The use of the integration framework is demonstrated in an industry setting where a state-of-the-art game engine accessed through browser technologies is used to recreate a physical conference venue and populate it with content from an open-source event management information system.
... Emerging CPS technologies, such as Virtual Reality (VR) and Unmanned Aerial Vehicles (UAV), have a broad societal impact. VR suspends our disbelief of being at a remote location, similar to virtual human teleportation [1]. The flurry of related devices, services, and platforms led to the integration of online real-time sensor measurements and device control in diverse industrial, commercial, and societal application domains. ...
Virtual Reality (VR) has the potential to revolutionize the way we operate Unmanned Aerial Vehicles (UAVs) by providing a more immersive and intuitive way to control and interact with UAVs. This paper proposes a framework for VR in UAVs that includes hardware, software, communication, and application layers in Beyond fifth Generation (B5G) networks. The hardware layer includes the system's physical components, such as the UAV, sensors, and VR headset. The software layer includes the algorithms and software applications that enable the UAV to be controlled using VR. The communication layer includes the protocols and infrastructure that enable real-time communication between the UAV and the VR headset. The application layer includes designing and implementing the VR interface, which is critical for ensuring ease of use and efficiency. The proposed framework is validated through testing, user feedback, and performance metrics, and can be applied to various applications, including surveillance and mapping, inspection, and training. The framework offers a powerful and flexible approach to operating UAVs using VR and has the potential to transform how VR is used with UAVs for various purposes in academia and industry.
... Except for computer-generated holography, a capture system is required to record 3D images of a physical object. An ideal capture system for holographic communication would capture the light field, i.e., all the information of each light ray, in the target scene (Apostolopoulos et al., 2012). In practice, capture is conducted with visual sensors such as a camera array (Nakamura et al., 2019) or light detection and ranging (LIDAR) sensors (Fratz et al., 2021). ...
... The depth information of the object of interest is either directly captured (e.g., in the case of a capture system with LIDAR sensors) or computed in the subsequent data processing step (e.g., in the case of a capture system with a camera array). The performance of the visual capture system depends on factors such as the number of sensors and the camera sampling rate (Apostolopoulos et al., 2012). ...
... It is worth noting that holographic communication may also involve audio data capture, processing, and rendering. In such a case, capturing the sound field in the target scene and ensuring audio and video synchronization are important for users to enjoy an immersive holographic communication experience (Apostolopoulos et al., 2012). ...
The sixth generation (6G) networks are expected to enable immersive communications and bridge the physical and the virtual worlds. Integrating extended reality, holography, and haptics, immersive communications will revolutionize how people work, entertain, and communicate by enabling lifelike interactions. However, the unprecedented demand for data transmission rate and the stringent requirements on latency and reliability create challenges for 6G networks to support immersive communications. In this survey article, we present the prospect of immersive communications and investigate emerging solutions to the corresponding challenges for 6G. First, we introduce use cases of immersive communications, in the fields of entertainment, education, and healthcare. Second, we present the concepts of immersive communications, including extended reality, haptic communication, and holographic communication, their basic implementation procedures, and their requirements on networks in terms of transmission rate, latency, and reliability. Third, we summarize the potential solutions to addressing the challenges from the aspects of communication, computing, and networking. Finally, we discuss future research directions and conclude this study.
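As a side note on the depth computation mentioned in the excerpts above: with a camera array, per-point depth is commonly recovered from stereo disparity via the standard pinhole relation Z = f·B/d. The sketch below is illustrative only; the parameter values are invented, and the formula, while standard, is not taken from the cited works.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Standard pinhole/stereo relation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Invented numbers: a 1000-pixel focal length, a 10-cm baseline, and a
# matched feature with 25 pixels of disparity -> a depth of 4 m.
print(depth_from_disparity(1000.0, 0.1, 25.0))
```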
... Apostolopoulos and colleagues pitched "immersive communication" as a vision for remote communication, involving an "exchange of natural social signals with remote people... that suspend disbelief in being there" that could even surpass face-to-face communication [2]. The researchers suggest that advancements in augmented reality (AR) offer a promising direction for bringing this vision closer to reality. ...
... Apostolopoulos and colleagues proposed that immersiveness is key to revolutionizing remote communication because it can enable more natural experiences and interactions among people communicating and with the environment. By perceiving more natural social signals, such as voice or life-size body movements, people can "suspend disbelief in being there" with each other and heighten a "sense of presence" even when physically apart [2]. Indeed, social presence, or the feeling of being there with another person [56], is a crucial component of immersive communication, and can improve feelings of satisfaction and connection in communication [8,22]. ...
... 2.2 AR for Immersive Communication AR refers to enhancing or overlaying physical-world environments or objects with computer-generated perceptual information such as visual, auditory, and haptic data [18]. AR can create feelings of immersiveness by enabling natural interaction experiences between people when communicating [2]. Previous research has applied AR to social contexts, including collaboration [24,46,60], games [13,49], museum visiting [35], online learning [68], storytelling [4,12,82], and AR annotation [55]. ...
A central challenge of social computing research is to enable people to communicate expressively with each other remotely. Augmented reality has great promise for expressive communication since it enables communication beyond texts and photos and towards immersive experiences rendered in recipients' physical environments. Little research, however, has explored AR's potential for everyday interpersonal communication. In this work, we prototype an AR messaging system, ARwand, to understand people's behaviors and perceptions around communicating with friends via AR messaging. We present our findings under four themes observed from a user study with 24 participants, including the types of immersive messages people choose to send to each other, which factors contribute to a sense of immersiveness, and what concerns arise over this new form of messaging. We discuss important implications of our findings on the design of future immersive communication systems.
... Apart from the large range of terms used in this space, such as computer-mediated communication (CMC), video-mediated communication (VMC), and immersive communication (Apostolopoulos et al., 2012; Chou, 2013), the term telepresence (IJsselsteijn, 2005; Minsky, 1980) is essential in the context of this research and is typically defined as the 'sense of being there', i.e., communicating with remote people with the feeling of being in the same room (cf. Rae, Venolia, Tang, & Molnar, 2015). ...
For recorded video content, researchers have proposed advanced concepts and approaches that enable the automatic composition and personalised presentation of coherent videos. This is typically achieved by selecting from a repository of individual video clips and concatenating a new sequence of clips based on some kind of model. However, there is a lack of generic concepts dedicated to enabling such video mixing functionality for scenarios based on live video streams. This thesis aims to address this gap and explores how a live vision mixing process could be automated in the context of live television production and, consequently, also extended to other application scenarios. This approach is coined the 'Virtual Director' concept. The name of the concept is inspired by the decision-making processes that human broadcast TV directors conduct when vision mixing live video streams stemming from multiple cameras. Understanding what is currently happening in the scene, they decide which camera view to show, at what point in time to switch to a different perspective, and how to adhere to cinematographic and cinematic paradigms while doing so. While the automation of vision mixing is the focus of this thesis, it is not the ultimate goal of the underlying vision. Automating for many viewers in parallel, in a scalable manner, allows decisions to be taken for each viewer or group of viewers individually. Doing so successfully allows moving away from a broadcast model where every viewer gets to see the same output. Particular content adaptation and personalisation features may provide added value for users. Preferences can be expressed dynamically, enabling interactive media experiences. In the course of this thesis, Virtual Director research prototypes are developed for three distinct application domains. Firstly, for distributed theatre performance, a script-based approach and a set of software tools are designed. A basic approach for the decision-making process and a pattern for decoupling it into two core components are proposed. A trial validates the technology, which does not implement full automation, yet successfully enables a theatre play. The second application scenario is live event 'narrowcast', a term used to denote the personalised equivalent of a 'broadcast'. In the context of this scenario, several computational approaches are considered for the implementation of an automatic Virtual Director, with the conclusion to use and recommend a combination of (complex) event processing engines and event-condition-action (ECA) rules to model the decision-making behaviour. Several content genres are subject to experimentation. Evaluation interviews provide detailed feedback on the specific research prototypes as well as the Virtual Director concept in general. In the third application scenario, group video communication, the most mature decision-making behaviour is achieved. This behaviour needs to be defined in what can be a challenging process and is formalised in a model referred to as the 'production grammar'. The aforementioned pattern is realised such that a 'Semantic Lifting' process processes low-level cue information to derive, in more abstract, higher-level terms, what is currently happening in the scene. The output of the Semantic Lifting process informs and triggers the second process, the 'Director' decision making, which eventually takes decisions on how to present the available content on screens.
Overall, the exploratory research on the Virtual Director concept resulted in its successful application in the three domains, validated by stakeholder feedback and a range of informal and formal evaluation efforts. As a synthesis of the research in the three application scenarios, the thesis includes a detailed description of the Virtual Director concept. This description is contextualised by many detailed learnings that are considered relevant for both scholars and practitioners regarding the development of such technology.
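To give a feel for the event-condition-action (ECA) pattern the thesis recommends, here is a deliberately minimal, hypothetical sketch. The event types, conditions, and shot actions are invented; a real Virtual Director would use a complex event processing engine and a far richer production grammar.

```python
# Hypothetical ECA rules: each rule pairs a condition on a semantically
# lifted event with a vision-mixing action.
rules = [
    (lambda e: e["type"] == "person_speaking",   lambda e: f"cut to close-up of {e['who']}"),
    (lambda e: e["type"] == "audience_reaction", lambda e: "cut to wide audience shot"),
]

def direct(event):
    """Return the action of the first rule whose condition matches."""
    for condition, action in rules:
        if condition(event):
            return action(event)
    return "hold current shot"  # default when no rule fires

print(direct({"type": "person_speaking", "who": "presenter"}))
```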
... Measuring user behaviours (or so-called human influential factors) is extremely important [5], as immersive experiences are known to vary between users, with some experiencing motion sickness (termed cybersickness) while others report no side effects. Ultimately, it is this IMEx that will dictate the success or failure of any new immersive technology [6]; thus, quantifying user behaviour is of crucial importance. ...
With the advent of standalone virtual reality (VR) headsets, multisensory VR experiences are emerging with the hopes of better simulating real-world experiences. For example, innovations in haptic suits and scent diffusion devices are burgeoning. While stimulating multiple senses re-orientates the perceived quality of experience (QoE) by increasing factors such as realism, presence and immersion, the impact it has on user behaviour has yet to be fully quantified. Advances in wearables and instrumented VR headsets, however, have allowed for such user behaviours to be easily measured in real-time. In this paper, we describe a pilot experiment in which participants play a custom-developed VR game under two conditions: (1) conventional audio-visual and (2) multisensory, where haptic feedback is provided via a haptic sleeve and olfaction is enabled via a scent diffusion add-on for the VR headset. We describe a developed instrumented VR headset capable of measuring electroencephalography (EEG), electrocardiography (ECG), and electro-oculography (EOG) signals in real-time. From these signals, metrics of mental workload, engagement, and attention are extracted. We report changes seen in these behavioural measures between the two conditions, thus providing insights on factors driving the improved QoE seen with multisensory experiences.
... Applications in rehabilitation, education, training, and exercise are also emerging (Arndt et al., 2018; Radianti et al., 2020). Ultimately, the success of immersive applications is known to rely on the user experience they provide and not necessarily on the technology they use (Apostolopoulos et al., 2012). As virtual reality and the metaverse are projected to burgeon in the coming years, being able to objectively quantify user experience in immersive settings is crucial. ...
Measuring a gamer’s behaviour and perceived gaming experience in real-time can be crucial not only to assess game usability, but to also adjust the game play and content in real-time to maximize the experience per user. For this purpose, affective and physiological monitoring tools (e.g., wearables) have been used to monitor human influential factors (HIFs) related to quality of experience (QoE). Representative factors may include the gamer’s level of engagement, stress, as well as sense of presence and immersion, to name a few. However, one of the major challenges the community faces today is being able to accurately transfer the results obtained in controlled laboratory settings to uncontrolled everyday settings, such as the gamer’s home. In this paper, we describe an instrumented virtual reality (VR) headset, which directly embeds a number of dry ExG sensors (electroencephalography, EEG; electrocardiography, ECG; and electrooculography, EOG) to allow for gamer behaviour assessment in real-time. A protocol was developed to deliver kits (including the instrumented headset and controllers, laptop with the VR game Half-life Alyx, and a second laptop for data acquisition) to participants’ homes during the COVID-19 lockdown. A brief videoconference session was made to provide the participants with instructions, but otherwise the experiment proceeded with minimal experimenter intervention. Eight participants consented to participate and each played the game for roughly 1.5 h. After each gaming session, participants reported their overall experience with an online questionnaire covering aspects of emotions, engagement, immersion, sense of presence, motion sickness, flow, skill, technology adoption, judgement and usability. Here, we describe our obtained findings, as well as report correlations between the subjective ratings and several QoE-related HIFs measured directly from the instrumented headset. Promising results are reported.
... Humans are social animals; we have an intrinsic need to feel connected to other people, and a key element of that connection is communication. This is why communication technology has evolved so much in the last 100 years; but while this evolution has significantly increased the number of connections we can maintain, it has not been as successful in preserving their quality compared to face-to-face interactions [4]. Especially during a crisis such as the current COVID-19 pandemic, which forces people into physical distancing and isolation, a sense of connection with others is most needed. ...
... In this context, the concept of immersive and interactive communication is spreading, identifying a completely novel way of communicating with other people and displaying multimedia content. Traditional remote communications (e.g., television, radio, video calling) are no longer sufficient tools for our society: humans are inherently social, in need of realistic experiences, and traditional remote communications do not offer such a full sense of immersion and natural experience/interactions [1]. The impact of realis... (Figure: proposed taxonomy of research related to ODV streaming systems.) ...
Omnidirectional videos (ODVs) have gone beyond the passive paradigm of traditional video, offering higher degrees of immersion and interaction. The revolutionary novelty of this technology is the possibility for users to interact with the surrounding environment and to feel a sense of engagement and presence in a virtual space. Users are clearly the main driving force of immersive applications, and consequently the services need to be properly tailored to them. In this context, this chapter highlights the importance of the new role of users in ODV streaming applications, and thus the need for understanding their behaviour while navigating within ODVs. A comprehensive overview of the research efforts aimed at advancing ODV streaming systems is also presented. In particular, the state-of-the-art solutions under examination in this chapter are distinguished in terms of system-centric and user-centric streaming approaches: the former is a quite straightforward extension of well-established solutions for the 2D video pipeline, while the latter benefits from understanding users' behaviour and enables more personalised ODV streaming.
... Realistic virtual environments can allow for the training of, e.g., police officers in different scenarios and medical personnel on rare medical conditions, thus improving their training and potential to handle unknowns in the future. As highlighted by [1], however, the success of new immersive applications will rely on the experience that they provide to the user and not on the technology they use. As such, measurement of the quality of immersive media experiences (IMEx) has become crucial. ...
Recent technological advances have allowed for virtual reality applications to burgeon. With virtual reality (VR), so-called human influential factors play a crucial role in the final perceived immersive media experience (IMEx). While two individuals can use the same VR headset, play the same game in the same location, and have the same goals, the two individuals can have very different experiences, with varying perceptions of immersion, presence, realism, engagement, and cybersickness. This can be particularly true in multisensory immersive experiences where, in addition to audio-visual stimuli, olfactory and haptic feedback can be used. In this paper, we describe a pilot study in which a VR game was developed to combine audio-visual, olfactory, and haptic feedback to the user in real-time. After game play, participants were asked about their IMEx using five scales: realism, immersion, presence, engagement, and overall quality of experience (QoE). Moreover, using an instrumented VR headset, we measure electroencephalography (EEG), electrocardiography (ECG), and electrooculography (EOG) signals and compute several instrumental measures of human influential factors, including an engagement index, arousal and valence indices, frontal alpha asymmetry, heart rate, several EEG sub-band powers, and eye blink rate. Using the subjective ratings, we measure the contribution that each IMEx subscale makes to overall QoE, as a function of the type of sensory stimuli used. Results on 11 participants suggest very different contributions once smells and haptics are incorporated, relative to traditional audio-visual experiences. We also report on several instrumental measures that showed significant correlations with the IMEx subscales, suggesting that, in the future, real-time instrumental QoE measurement of multisensory experiences could be possible.
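For readers unfamiliar with the instrumental measures named above, the sketch below computes two of them using definitions common in the EEG literature: a Pope-style engagement index and frontal alpha asymmetry. These may differ from the exact indices used in the cited study, and the band-power values are invented.

```python
import math

def engagement_index(beta: float, alpha: float, theta: float) -> float:
    """Pope et al.-style engagement index: beta / (alpha + theta)."""
    return beta / (alpha + theta)

def frontal_alpha_asymmetry(alpha_right: float, alpha_left: float) -> float:
    """FAA: ln(right frontal alpha power) - ln(left frontal alpha power)."""
    return math.log(alpha_right) - math.log(alpha_left)

# Invented band powers (arbitrary units), e.g., from Welch periodograms.
print(engagement_index(beta=4.2, alpha=3.1, theta=2.4))          # ~0.76
print(frontal_alpha_asymmetry(alpha_right=3.0, alpha_left=2.5))  # ~+0.18
```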