Figure 4. Number of correct predictions for the two conditions. The maximum was 15; random choice would lead to 5 correct predictions on average.

Source publication
Conference Paper
Full-text available
Video communication using head-mounted cameras could be useful for mediating shared activities and supporting collaboration. The growing popularity of wearable gaze trackers presents an opportunity to add gaze information to the egocentric video. We hypothesized three potential benefits of gaze-augmented egocentric video to support collaborative scenarios: s...

Context in source publication

Context 1
... completing both conditions, a final questionnaire was used to collect the participants' subjective opinions of the two experimental conditions. Figure 4 shows the number of correct predictions of the driver's intention in the two experimental conditions. The median value indicates that participants predicted the direction the driver would take at a four-way intersection 26% more accurately when the video was augmented with the driver's gaze information. ...
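The random-choice baseline in the caption is consistent with a simple calculation, assuming the driver could take any of the three remaining directions at the four-way intersection with equal probability (an assumption not spelled out in the snippet above):

E[correct | random] = 15 trials × 1/3 = 5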

Similar publications

Article
Full-text available
Egocentric vision data captures the first-person perspective of a visual stimulus and helps study gaze behavior in more natural contexts. In this work, we propose a new dataset collected in a free-viewing style with an end-to-end data processing pipeline. A group of 25 participants provided their gaze information wearing Tobii Pro Glasses 2 set...

Citations

... In our study, the local worker and the expert are standing next to each other in the collocated setting, oriented towards a shared 2D screen displaying the work field. The shared screen, and their placement next to each other while looking at it, may limit the conveyance of some natural gestures or non-verbal instructions that the instructor could provide, such as using head direction or eye gaze to aid understanding of the instructions [1,22]. ...
... Visual search strategy and attention are known to influence the interpretation of instructions and the mentoring outcome, specifically for distributed instruction [1,26]. Our results show an increase in local workers' visual search efficiency and attention to the instructions in telementoring. ...
Article
Full-text available
There is a long-standing interest in CSCW in distributed instruction, both in how it differs from collocated instruction and in the design of tools to reduce any deficiencies. In this study, we leveraged the unique environment of laparoscopic surgery to compare the efficacy and mechanism of instruction in a collocated and a distributed condition. By implementing the same instructional technology in both conditions, we were able to evaluate the effect of distance on instruction without the confounding variable of the medium of instruction. Surprisingly, our findings revealed that trainees reported a higher perceived quality of instruction in the distributed condition. Further investigation suggests that in a distributed learning environment, trainees change their behavior to attend more to the provided instructions, resulting in this higher perceived quality of instruction. Finally, we discuss our findings with regard to media compensation theory, and we provide both social and technical insights on how to better support a distributed instructional process.
CCS CONCEPTS: Human-centered computing; Collaborative and social computing; Empirical studies in collaborative and social computing
... Predicting the target of a visual search with computational models, and the overt gaze signal as input, is commonly referred to as search target inference [Borji et al. 2015; Sattar et al. 2017a, 2015]. This provides implicit insight into user intentions and allows an external observer or intelligent user interface to make predictions about the ongoing activities [Akkil and Isokoski 2016; Bader and Beyerer 2013; Flanagan and Johansson 2003; Haji-Abolhassani and Clark 2014; Rothkopf et al. 2016; Rotman et al. 2006]. ...
... Akkil et al. [42] presented a novel gaze-based interface called GazeTorch and compared the use of gaze with that of mouse pointers in remote collaborative physical tasks [43]. A study [44] showed the benefits of gaze augmentation in egocentric videos in a driving task. Yoo et al. [45] proposed the teleoperation of a robotic arm from a remote location by using an experimental eye-tracking algorithm. ...
Article
Full-text available
Over the years, gaze input modality has been an easy and demanding human–computer interaction (HCI) method for various applications. Research on gaze-based interactive applications has advanced considerably, as HCIs are no longer constrained to traditional input devices. In this paper, we propose a novel immersive eye-gaze-guided camera (called GazeGuide) that can seamlessly control the movements of a camera mounted on an unmanned aerial vehicle (UAV) from the eye-gaze of a remote user. The video stream captured by the camera is fed into a head-mounted display (HMD) with a binocular eye tracker. The user’s eye-gaze is the sole input modality to maneuver the camera. A user study was conducted considering static and moving targets of interest in a three-dimensional (3D) space to evaluate the proposed framework. GazeGuide was compared with a state-of-the-art input modality, a remote controller. The qualitative and quantitative results showed that the proposed GazeGuide performed significantly better than the remote controller.
... Remote collaboration is a key usage scenario for AR: several studies have provided insights on different aspects of collaboration, such as increasing situational awareness using the gaze information of one's partner [2,13], the effectiveness of various pointing techniques in remote collaborative settings [1] or the importance of a shared view of the workspace in conversational grounding [5]. All these works introduce strategies to compensate for the lack of a shared real world environment (as in colocated tasks), so that collaboration does not suffer. ...
Conference Paper
Our research explores the impact of network impairments on remote augmented reality (AR) collaborative tasks, and possible strategies to improve user experience in these scenarios. Using a simple AR task, under a controlled network environment, our preliminary user study highlights the impact of network outages on user workload and experience, and how user roles and learning styles play a role in this regard.
... There are some previous studies that have investigated the value of augmenting gaze information in video from a head-mounted camera for collaborative purposes. Akkil and Isokoski found that overlaying gaze data on egocentric video improves a viewer's ability to predict the intention of the partner (Akkil and Isokoski, 2016b) and helps interpret pointing targets more accurately than when pointing with the hand or head (Akkil and Isokoski, 2016a). Gupta et al. (2016) found that conveying gaze information in the egocentric video from the worker to the expert improves collaboration performance in a stationary LEGO building task. ...
Preprint
Full-text available
Remote collaboration on physical tasks is an emerging use of video telephony. Recent work suggests that conveying gaze information measured using an eye tracker between collaboration partners could be beneficial in this context. However, studies that compare gaze to other pointing mechanisms, such as a mouse-controlled pointer, in video-based collaboration have not been available. We conducted a controlled user study to compare the two remote gesturing mechanisms (mouse, gaze) to video only (none) in a situation where a remote expert saw video of the desktop of a worker where his/her mouse or gaze pointer was projected. We also investigated the effect of distraction of the remote expert on the collaborative process and whether the effect depends on the pointing device. Our results suggest that mouse and gaze pointers lead to faster task performance and improved perception of the collaboration, in comparison to having no pointer at all. The mouse outperformed the gaze when the task required conveying procedural instructions. In addition, using gaze for remote gesturing required increased verbal effort for communicating both referential and procedural messages.
... The experiment below covers scenarios (1) and (2): Preparing open questions, and selecting from different variants. Scenario (3) was not included in this experiment since it follows the same pattern on the next refinement level. We decided to show all alternatives (A-B-C and 1-2-3) in one video, one after the other (see Fig. 2). ...
... The authors suspect that view switching may support taking different perspectives and lead to a better understanding of the perspectives of the different characters, e.g. if the video is filmed in first-person view. Akkil and Isokoski [3] visualized the actor's gaze point in an egocentric video and showed that this improves the viewers' awareness of the actor's emotions. Kallinen et al. [16] compared first- and third-person perspectives in computer games and found higher presence for the first-person perspective. ...
Preprint
[Context and motivation] Complex software-based systems involve several stakeholders, their activities and interactions with the system. Vision videos are used during the early phases of a project to complement textual representations. They visualize previously abstract visions of the product and its use. By creating, elaborating, and discussing vision videos, stakeholders and developers gain an improved shared understanding of how those abstract visions could translate into concrete scenarios and requirements to which individuals can relate. [Question/problem] In this paper, we investigate two aspects of refining vision videos: (1) Refining the vision by providing alternative answers to previously open issues about the system to be built. (2) A refined understanding of the camera perspective in vision videos. The impact of using a subjective (or "ego") perspective is compared to the usual third-person perspective. [Methodology] We use shopping in rural areas as a real-world application domain for refining vision videos. Both aspects of refining vision videos were investigated in an experiment with 20 participants. [Contribution] Subjects made a significant number of additional contributions when they had received not only video or text but also both - even with very short text and short video clips. Subjective video elements were rated as positive. However, there was no significant preference for either subjective or non-subjective videos in general.
... HCI researchers have particularly focused on studies illustrating technical aspects of using virtual reality. These include challenges in augmented reality environments (such as depth perception [14]), tools (such as HMDs [1,16]), novel techniques for optical configuration [25] to ease visual discomfort [31], and explorations of human behaviour addressing the impact of AR and VR experiences on chronic pain patients [24]. ...
Conference Paper
Full-text available
This paper presents a case study of a Mixed-Reality Performance employing 360-degree video for a virtual reality experience. We repurpose the notion of friction to illustrate the different threads at which priming is enacted during the performance to create an immersive audience experience. We look at aspects of friction between the different layers of the Mixed-Reality Performance, namely: temporal friction, friction between the physical and virtual presence of the audience, and friction between realities. We argue that Mixed-Reality Performances that employ immersive technology do not need to rely on its presumed immersive nature to make the performance an engaging or coherent experience. Immersion, in such performances, emerges from the audience's transition towards a more active role, and the creation of various fictional realities through frictions.
... Human gaze behavior depends on the task in which a user is currently engaged [22,4]; this provides implicit insight into the user's intentions and allows an external observer or intelligent user interface to make predictions about the ongoing activity [6,13,2,8,1]. Predicting the target of a visual search with computational models and the overt gaze signal as input is commonly referred to as search target inference [3,15,16]. Inferring visual search targets helps to construct and improve intelligent user interfaces in many fields, e.g., robotics [9] or settings similar to the examples in [18]. ...
Chapter
Full-text available
Visual search target inference subsumes methods for predicting the target object through eye tracking. A person intends to find an object in a visual scene, and we predict that target based on the fixation behavior. Knowing the search target can improve intelligent user interaction. In this work, we implement a new feature encoding, the Bag of Deep Visual Words, for search target inference using a pre-trained convolutional neural network (CNN). Our work is based on a recent approach from the literature that uses Bag of Visual Words, common in computer vision applications. We evaluate our method using a gold standard dataset. The results show that our new feature encoding outperforms the baseline from the literature, in particular when excluding fixations on the target.
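As an illustration of the kind of pipeline the chapter describes, the sketch below shows one way a Bag of Deep Visual Words encoding for search target inference could be assembled. It is not the chapter's actual implementation: the use of torchvision's pre-trained ResNet-18 as the CNN, k-means for the visual vocabulary, and all function names and parameters are assumptions made for illustration.

# Hedged sketch of a Bag of Deep Visual Words (BoDVW) encoding for
# search target inference. Assumptions (not taken from the chapter):
# torchvision ResNet-18 as the pre-trained CNN, k-means for the visual
# vocabulary, and fixation patches already cropped from scene frames.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.cluster import KMeans

# Pre-trained CNN with its classification head removed -> deep patch descriptors
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn = torch.nn.Sequential(*list(cnn.children())[:-1]).eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def patch_descriptor(patch_rgb: np.ndarray) -> np.ndarray:
    """Deep descriptor for one fixation patch (H x W x 3, uint8)."""
    with torch.no_grad():
        x = preprocess(patch_rgb).unsqueeze(0)
        return cnn(x).flatten().numpy()

def build_vocabulary(descriptors: np.ndarray, n_words: int = 64) -> KMeans:
    """Cluster training-set descriptors into a vocabulary of deep visual words."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(descriptors)

def encode_bodvw(patch_descriptors: np.ndarray, vocab: KMeans) -> np.ndarray:
    """L1-normalised histogram of visual-word assignments for one trial."""
    words = vocab.predict(patch_descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

In a pipeline of this shape, the per-trial histograms would then feed a standard classifier (for example an SVM) to predict the search target; as the abstract notes, excluding fixations that land on the target itself was particularly beneficial in the reported evaluation.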
... Thus, it is challenging for designers to select the appropriate design technique based on certain conditions. The literature presents us with an increasing number of studies focusing on the introduction to and explanation of specific design techniques to solve design problems (e.g., Akkil and Isokoski 2016; Boy et al. 2015; Cartwright and Pardo 2015). Despite many design techniques being available, little research has been conducted to classify design techniques and therefore help designers to select the right design techniques for a particular design problem (e.g., Martin and Hanington 2012; Rajeshkumar et al. 2013; Vermeeren et al. 2010). ...
Conference Paper
Full-text available
In a digital world, service designers need to apply design techniques to meet increasing user expectations. While there are many design techniques out there, the current taxonomies of design techniques provide little guidance for designers when it comes to selecting appropriate design techniques during the design process. Hence, a well-structured taxonomy is needed. This research-in-process seeks to apply a taxonomy development method to classify design techniques and to identify important dimensions in order to provide an overview of digital service design techniques. Our preliminary results present a taxonomy with five dimensions, each of which includes mutually exclusive and collectively exhaustive characteristics. In future research, we plan to evaluate the usefulness of our taxonomy and compare our taxonomy with those that are currently available. Furthermore, we expect to look into the potential interrelations among the dimensions and build a model that explains and predicts the appropriate techniques for a given situation.
... Unlike cameras worn at other body sites, head-mounted cameras (e.g. cameras integrated in smartglasses) provide an egocentric perspective, offering a consistent view of the current activity and a coarse indication of the wearer's visual attention [2]. There are many commercial head-mounted devices equipped with a world-facing camera, e.g. ...
... A second option is pointing with the head: bringing the object of interest to the center of the camera view by turning the head. A third option becomes possible when the point of gaze of the user is superimposed on the video [2]. In such cases, the user can convey the object of interest by just looking at it. ...
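To make the third option concrete, the following sketch shows one way a wearer's point of gaze could be superimposed on egocentric video. It is an illustrative assumption, not code from the cited papers: OpenCV is used for drawing and video I/O, and gaze samples are assumed to arrive as normalised (x, y) coordinates, one per frame.

# Hedged sketch: overlay a gaze point on each frame of an egocentric video.
# Assumes one normalised (x, y) gaze sample per frame; all names are illustrative.
import cv2

def overlay_gaze(frame, gaze_xy, colour=(0, 0, 255)):
    """Draw the gaze point (normalised coordinates in [0, 1]) onto a BGR frame."""
    h, w = frame.shape[:2]
    x, y = int(gaze_xy[0] * w), int(gaze_xy[1] * h)
    cv2.circle(frame, (x, y), 15, colour, 3)
    return frame

def render_gaze_video(video_path, gaze_samples, out_path):
    """Write a copy of the video with one gaze sample overlaid per frame."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for gaze_xy in gaze_samples:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(overlay_gaze(frame, gaze_xy))
    cap.release()
    writer.release()

In practice, wearable trackers typically report gaze at a higher rate than the scene camera's frame rate, so samples would first be resampled or matched to frame timestamps before overlaying.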
Conference Paper
Full-text available
Communicating spatial information by pointing is ubiquitous in human interactions. With the growing use of head-mounted cameras for collaborative purposes, it is important to assess how accurately viewers of the resulting egocentric videos can interpret pointing acts. We conducted an experiment to compare the accuracy of interpreting four different pointing techniques: hand pointing, head pointing, gaze pointing and hand+gaze pointing. Our results suggest that superimposing the gaze information on the egocentric video can enable viewers to determine pointing targets more accurately and more confidently. Hand pointing performed best when the pointing target was straight ahead and head pointing was the least preferred in terms of ease of interpretation. Our results can inform the design of collaborative applications that make use of the egocentric view.