Article · PDF available

Comparison of Gaze and Mouse Pointers for Video-based Collaborative Physical Task


Abstract

Remote collaboration on physical tasks is an emerging use of video telephony. Recent work suggests that conveying gaze information, measured using an eye tracker, between collaboration partners could be beneficial in this context. However, studies comparing gaze to other pointing mechanisms, such as a mouse-controlled pointer, in video-based collaboration have not been available. We conducted a controlled user study comparing two remote gesturing mechanisms (mouse, gaze) to video only (none) in a setting where a remote expert saw video of a worker's desktop onto which the expert's mouse or gaze pointer was projected. We also investigated the effect of distracting the remote expert on the collaborative process and whether this effect depends on the pointing mechanism. Our results suggest that both mouse and gaze pointers lead to faster task performance and improved perception of the collaboration compared to having no pointer at all. The mouse outperformed gaze when the task required conveying procedural instructions. In addition, using gaze for remote gesturing required increased verbal effort for communicating both referential and procedural messages.
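The three study conditions (none, mouse, gaze) can be pictured as a simple overlay loop: the worker's desktop is captured on video and the remote expert's pointer, taken either from the mouse or from an eye tracker, is drawn onto the shared view. The sketch below is an illustrative reconstruction under assumptions, not the authors' implementation; the frame source and the get_mouse_position()/get_gaze_sample() helpers are hypothetical placeholders.

```python
# Illustrative sketch of overlaying the remote expert's pointer (mouse or gaze)
# on the shared video of the worker's desktop. Pointer sources are placeholders.
import cv2

def get_mouse_position():
    # Placeholder: expert's mouse position mapped into video coordinates.
    return (320, 240)

def get_gaze_sample():
    # Placeholder: latest eye-tracker sample mapped into video coordinates.
    return (300, 250)

def annotate_frame(frame, condition):
    """Draw the expert's pointer for the given condition: 'none', 'mouse', or 'gaze'."""
    if condition == "mouse":
        point = get_mouse_position()
    elif condition == "gaze":
        point = get_gaze_sample()
    else:
        point = None
    if point is not None:
        cv2.circle(frame, point, 12, (0, 0, 255), 2)  # red ring at the pointer position
    return frame

cap = cv2.VideoCapture(0)  # camera viewing the worker's desktop (assumed device index)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("shared view", annotate_frame(frame, condition="gaze"))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```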
... [Flattened table of display/system types and study counts from the citing review: desktop VRE (1) [69]; FOVE VR HMD (1) [70]; 3-screen pseudo-CAVE (1) [71]; non-immersive screen-based VRE (7) [27], [61], [69], [72]-[75]; projection-based VRE (2) [76], [77]; Gloma 350 (1) [51]; camera-based VR (3) [54], [73], [78]; custom systems.] ...
... A variety of other input methods were less used, such as facial expression, full-body tracking, and heart rate. [Flattened table of input modalities with study counts and percentages: head position (n=14, 28%); gesture (n=13, 26%); voice (n=9, 18%); movement (n=7, 14%); facial expression (n=5, 10%); each with its associated references.] The selected studies utilized 15 different types of system outputs. The five most common system outputs were: gaze visualization (64%, n=32), avatar representation (24%, n=12), use as a controller (16%, n=8), placement of annotations (10%, n=5), and display of facial expression (10%, n=5). ...
... [Flattened table of system outputs with study counts and percentages: gaze visualization (n=32, 64%); avatar representation (n=12, 24%); controller (n=8, 16%); placing annotations (n=5, 10%); facial expression (n=5, 10%); each with its associated references.] ...
Article
Full-text available
We present a state-of-the-art and scoping review of the literature to examine embodied information behaviors, as reflected in shared gaze interactions, within co-present extended reality experiences. The recent proliferation of consumer-grade head-mounted XR displays, situated at multiple points along the Reality-Virtuality Continuum, has increased their application in social, collaborative, and analytical scenarios that utilize data and information at multiple scales. Shared gaze represents a modality for synchronous interaction in these scenarios, yet there is a lack of understanding of the implementation of shared eye gaze within co-present extended reality contexts. We use gaze behaviors as a proxy to examine embodied information behaviors. This review examines the application of eye tracking technology to facilitate interaction in multiuser XR by sharing a user's gaze, identifies salient themes within existing research since 2013 in this context, and identifies patterns within these themes relevant to embodied information behavior in XR. We review a corpus of 50 research papers, generated using the SALSA framework and searches in multiple databases, that investigate the application of shared gaze and gaze tracking in XR. The publications were reviewed for study characteristics, technology types, use scenarios, and task types. We construct a state of the field and highlight opportunities for innovation and challenges for future research directions.
... Therefore, we take full advantage of combining gestures and gaze in order to develop a novel multimodal interaction for an AR remote collaborative platform. This research is motivated by earlier research [13][14][15][16] and builds upon it by combining 2.5D gestures, gaze, and SAR. Hence, the contributions of the present study fall into three areas: ...
... Akkil et al. [14] developed a SAR remote collaborative system called GazeTorch, which provides the remote user's gaze awareness to the local user. Recently, continuing in this exploratory direction, they investigated the influence of gaze in video-based remote collaboration for a physical task, compared with a POINTER interface (i.e., a mouse-controlled pointer) [16]. Similar work was carried out by Higuchi et al. [19] and Wang et al. [20][21][22]. ...
Article
Full-text available
Although sharing gestures and gaze can improve AR remote collaboration, most current systems only enable collaborators to share 2D or 3D gestures, and the unimodal HCI interface remains dominant. To address this problem, we describe a novel remote collaborative platform based on 2.5D gestures and gaze (2.5DGG), which supports an expert collaborating with a worker (e.g., during assembly or training tasks). We investigate the impact of sharing the remote site's 2.5DGG in spatial AR (SAR) remote collaboration in manufacturing. Compared to other systems, a key advantage is that the platform provides more natural and intuitive multimodal interaction based on 2.5DGG. We track the remote expert's gestures and eye gaze using Leap Motion and aGlass, respectively, in a VR space displaying the live video stream of the local physical workspace, and visualize them in the local work scenario using a projector. The results of an exploratory user study demonstrate a clear difference in performance time and collaborative experience, with 2.5DGG outperforming the traditional interface.
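Projecting a remote expert's cues into a local physical workspace typically requires mapping them from the expert's view into projector coordinates. A minimal sketch of that mapping step is shown below, assuming a pre-computed homography and made-up cue positions; it is not the 2.5DGG implementation and does not use the Leap Motion or aGlass SDKs.

```python
# Illustrative sketch: map a remote expert's gesture/gaze cues (normalized video
# coordinates) into local projector coordinates via a calibration homography.
# The homography values and cue samples are hypothetical.
import numpy as np

# 3x3 homography from normalized video coordinates to projector pixels,
# e.g. obtained once from a calibration with known point correspondences.
H = np.array([[1200.0,    0.0, 100.0],
              [   0.0, 1150.0,  80.0],
              [   0.0,    0.0,   1.0]])

def to_projector(x, y):
    """Apply the homography to one normalized (x, y) cue position."""
    px, py, w = H @ np.array([x, y, 1.0])
    return px / w, py / w

cues = {"gesture": (0.42, 0.61), "gaze": (0.45, 0.58)}  # placeholder samples
for kind, (x, y) in cues.items():
    print(kind, to_projector(x, y))
```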
... However, the mouse cursor facilitated communication more than the gaze cursor (cf. Akkil & Isokoski, 2018; Müller et al., 2013). Finally, Gupta et al. (2016) showed comparable benefits of both tools for improving remote collaboration tasks (e.g. ...
Article
Full-text available
Eye movement modelling examples (EMME) are instructional videos that display a teacher's eye movements as a "gaze cursor" (e.g. a moving dot) superimposed on the learning task. This study investigated whether previous findings on the beneficial effects of EMME would extend to online lecture videos and compared the effects of displaying the teacher's gaze cursor with displaying the more traditional mouse cursor as a tool to guide learners' attention. Novices (N = 124) studied a pre-recorded video lecture on how to model business processes in a 2 (mouse cursor absent/present) × 2 (gaze cursor absent/present) between-subjects design. Unexpectedly, we did not find significant effects of the presence of gaze or mouse cursors on mental effort and learning. However, participants who watched videos with the gaze cursor found it easier to follow the teacher. Overall, participants responded positively to the gaze cursor, especially when the mouse cursor was not displayed in the video.
... Various media have been used for remote conferencing systems. Isokoski and Akkil created a video conferencing environment in which the helper could point using a mouse or their eye gaze [Isokoski, 2019]. Mouse pointing brightened the area on which the helper's mouse pointer dwelled; eye gaze pointing brightened the area followed by the helper's gaze. ...
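The brightening described above amounts to a spotlight applied around the pointer position. The following sketch brightens a circular region of a video frame; the frame, pointer position, radius, and gain are assumptions for illustration, not details of the cited system.

```python
# Illustrative "spotlight" highlight: brighten a circular region of a video
# frame around the helper's pointer (mouse or gaze). All values are hypothetical.
import cv2
import numpy as np

def spotlight(frame, center, radius=60, gain=1.5):
    """Return a copy of frame with the region around center brightened."""
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.circle(mask, center, radius, 255, -1)            # filled circle as a mask
    brightened = cv2.convertScaleAbs(frame, alpha=gain, beta=0)
    out = frame.copy()
    out[mask > 0] = brightened[mask > 0]                 # brighten only inside the mask
    return out

frame = np.full((480, 640, 3), 80, dtype=np.uint8)       # placeholder grey frame
highlighted = spotlight(frame, center=(320, 240))
```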
Article
Full-text available
Video/audio conferencing systems have been used extensively for remote collaboration over many years. Recently, virtual and mixed reality (VR/MR) systems have started to show great potential as communication media for remote collaboration. Prior studies revealed that the creation of common ground between discourse participants is crucial for collaboration and that grounding techniques change with the communication medium. However, it is difficult to find previous research that compares VR and MR communication system performances with video conferencing systems regarding the creation of common ground for collaborative problem solving. On the other hand, prior studies have found that display fidelity and interaction fidelity had significant effects on performance-intensive individual tasks in virtual reality. Fidelity in VR can be defined as the degree of objective accuracy with which the real world is represented by the virtual world. However, to date, fidelity for collaborative tasks in VR/MR has not been defined or studied much. In this paper, we compare five different communication media for the establishment of common ground in collaborative problem-solving tasks: Webcam, headband camera, VR, MR, and audio-only conferencing systems. We analyzed these communication media with respect to collaborative fidelity components which we defined. For the experiments, we utilized two different types of collaborative tasks: a 2D Tangram puzzle and a 3D Soma cube puzzle. The experimental results show that the traditional Webcam performed better than the other media in the 2D task, while the headband camera performed better in the 3D task. In terms of collaboration fidelity, these results were somewhat predictable, although there were slight differences between our expectations and the results.
... This has an important influence on collaborators' ability to cooperate with one another. On the basis of the aforementioned studies, and given that human interaction relies on many perceptual cues, three commonly used methods have been put forward: sharing ARA [3,4,9], gestures [2,6,15,21], and gaze [5,7,15,22]. ...
Article
Full-text available
A novel platform for remote collaboration is proposed in this paper. The platform makes it possible for an expert to assist a worker with an industrial assembly task from a different place. Our goal is to compare the effect of sharing a head pointer in SAR remote collaboration with that of AR annotations in manufacturing. First, we develop an AR remote collaborative platform that shares a remote expert's head pointer instead of eye gaze. Then, we evaluate the prototype system in a user study comparing two conditions, AR annotations and gaze cues (GC), with respect to assembly efficiency, number of incorrect operations, workload, and collaborative experience. The results show that sharing the head pointer can improve performance, co-presence awareness, and collaborative experience, and decreases the number of incorrect operations. More importantly, we implement GC visualization with low-cost head tracking and find that it acts as a good referential pointer. Therefore, the head pointer could be a competent representation of GC in AR/MR remote collaboration for assembly assistance. Our research has great practical significance for the industrial application of GC-based AR/MR remote collaboration.
... The development of hands-free communication devices between operators in such an environment has been attempted in the past [10][11][12][13][14][15][16][17][18]. Trejos et al. [14] developed a communication support device called WHaSP (Wireless hands-free surgical pointer), which integrates a 6-axis acceleration sensor and a Bluetooth device in a headset. ...
Article
Full-text available
The purpose of this study is to construct a hands-free endoscopic surgical communication support system that can draw lines in space corresponding to head movements using AR technology, and to evaluate whether the head-movement drawing motion fits the steering law, one of the HCI models, for potential use during endoscopic surgery. In the experiment, the participants steered a cursor through a pathway using head movements; movement time (MT), the number of errors, and a subjective evaluation of task difficulty were obtained. The results showed that the head-movement-based line drawing manipulation was significantly affected by the tracking direction and by the task difficulty, expressed as the Index of Difficulty (ID). There was high linearity between ID and MT, with a coefficient of determination R² of 0.9991. The Index of Performance was higher in the horizontal and vertical directions compared to diagonal directions. Although the weight and biocompatibility of the AR glasses must be overcome to make the current prototype a viable tool for supporting communication in the operating room environment, the prototype has the potential to promote the development of a computer-supported collaborative work environment for endoscopic surgery.
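The reported linearity follows the steering-law form MT = a + b·ID. As a minimal sketch with made-up (ID, MT) data (not the study's measurements), the fit and the coefficient of determination could be computed as follows.

```python
# Minimal sketch of a steering-law fit, MT = a + b * ID, using hypothetical data.
import numpy as np

ids = np.array([2.0, 3.0, 4.0, 5.0, 6.0])    # Index of Difficulty (bits), made up
mts = np.array([0.9, 1.3, 1.6, 2.0, 2.35])   # movement time (s), made up

b, a = np.polyfit(ids, mts, 1)               # slope (b) and intercept (a)
pred = a + b * ids
ss_res = np.sum((mts - pred) ** 2)
ss_tot = np.sum((mts - mts.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"MT = {a:.3f} + {b:.3f} * ID, R^2 = {r_squared:.4f}")
# The Index of Performance is commonly taken as 1 / b.
```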
Article
Perspective-taking and attentional switching are some of the ergonomic challenges that existing teleoperation human-machine interface designs need to address. This study developed two gaze interaction methods, the Eye Stick and the Eye Click, which were based on the joystick metaphor and the navigation metaphor, respectively, to be used in exocentric perspective teleoperation scenarios. We conducted two user studies to test the task performance and the subjective experience of the gaze interaction methods in a virtual ground vehicle teleoperation task. The results showed that compared with a traditional joystick design, the Eye Stick led to a shorter driving distance and the Eye Click led to less task time, and the gaze interaction methods had performance advantages in more difficult mazes. After multiple task sessions, the participants reported that the gaze interaction methods and the traditional joystick were similar in terms of task workload, perceived learnability, and satisfaction; however, the perceived usability of the Eye Stick was not as good as that of the Eye Click and the traditional joystick. In conclusion, both the Eye Stick and the Eye Click are feasible and promising gaze interaction methods for teleoperation applications with task performance advantages; however, more research is needed to optimize their user experience design.
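To make the two metaphors concrete, one plausible reading (not the authors' published implementation) is that the Eye Stick converts the gaze offset from a neutral point into a continuous velocity command, while the Eye Click converts a confirmed gaze point into a drive-to target. The dead-zone size, gain, and confirmation mechanism below are assumptions.

```python
# Hypothetical sketch of the two gaze interaction metaphors for teleoperation.
import math

def eye_stick_command(gaze_x, gaze_y, center=(0.5, 0.5), dead_zone=0.05, gain=2.0):
    """Joystick metaphor: gaze offset from a neutral point -> velocity command."""
    dx, dy = gaze_x - center[0], gaze_y - center[1]
    if math.hypot(dx, dy) < dead_zone:
        return 0.0, 0.0                       # ignore small offsets around the center
    return gain * dx, gain * dy               # lateral and forward velocity components

def eye_click_target(gaze_x, gaze_y, confirmed):
    """Navigation metaphor: a confirmed gaze point becomes a drive-to waypoint."""
    return (gaze_x, gaze_y) if confirmed else None

print(eye_stick_command(0.7, 0.5))            # gaze to the right -> steer right
print(eye_click_target(0.3, 0.8, confirmed=True))
```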
Thesis
Remote Collaboration using Augmented Reality (AR) shows great potential to establish common ground in physically distributed scenarios where team members need to achieve a shared goal. However, most research efforts in this field have been devoted to experimenting with the enabling technology and proposing methods to support its development. As the field evolves, evaluation and characterization of the collaborative process become an essential, but difficult, endeavor to better understand the contributions of AR. In this thesis, we conducted a critical analysis to identify the main limitations and opportunities of the field, while situating its maturity and proposing a roadmap of important research actions. Next, a human-centered design methodology was adopted, involving industrial partners to probe how AR could support their needs during remote maintenance. These outcomes were combined with methods from the literature into an AR prototype, which was evaluated through a user study. From this, the need became clear for deeper reflection to better understand the dimensions that influence, and must be considered in, Collaborative AR. Hence, a conceptual model and a human-centered taxonomy were proposed to foster systematization of perspectives. Based on the proposed model, an evaluation framework for contextualized data gathering and analysis was developed, supporting the design and performance of distributed evaluations in a more informed and complete manner. To instantiate this vision, the CAPTURE toolkit was created, providing an additional perspective based on selected dimensions of collaboration and pre-defined measurements to obtain "in situ" data about them, which can be analyzed using an integrated visualization dashboard. The toolkit successfully supported evaluations of several team members during tasks of remote maintenance mediated by AR, showing its versatility and potential in eliciting a comprehensive characterization of the added value of AR in real-life situations and establishing itself as a general-purpose solution, potentially applicable to a wider range of collaborative scenarios.
Chapter
Remote collaboration is becoming increasingly crucial, especially now that travel is restricted because of the Covid-19 pandemic. People are looking for real-time, no-travel solutions that enable remote collaboration with colleagues and experts. A lot of research has been conducted on how to support remote guidance on physical tasks. However, these studies have mainly focused on developing technical components to support collaboration, while less attention has been paid to exploring and evaluating human factors that could influence remote collaboration. The aim of this paper is to identify human factors, including culture, language, trust, and social status, that may affect remote collaboration by reviewing their effects on computer-supported collaboration. This review adds a more critical human perspective to current research, which is mostly focused on the technical side of remote guidance.
Article
Remote Collaboration mediated by Mixed and Augmented Reality (MR/AR) shows great potential in scenarios where physically distributed collaborators need to establish a common ground to achieve a shared goal. So far, most research efforts have been devoted to creating the enabling technology, overcoming engineering hurdles, and proposing methods to support its design and development. To contribute to more in-depth knowledge on how remote collaboration occurs through these technologies, it is paramount to understand where the field stands and how characterization and evaluation have been conducted. In this vein, this work reports the results of a literature review which shows that evaluation is frequently performed in an ad-hoc manner, i.e., without adapting the evaluation methods to collaborative AR. Most studies rely on single-user methods, which are not suitable for collaborative solutions and fall short of retrieving the amount of contextualized data needed for more comprehensive evaluations. This suggests minimal support from existing frameworks and a lack of theories and guidelines to guide the characterization of the collaborative process using AR. Finally, a critical analysis is presented in which we discuss the maturity of the field, and a roadmap of important research actions is proposed that may help improve the characterization and evaluation of the collaborative process moving forward and, in consequence, MR/AR-based remote collaboration.
Article
Full-text available
This paper studies how eye tracking can be used to measure and facilitate joint attention in parent-child interaction. Joint attention is critical for social learning activities such as parent-child shared storybook reading. There is a dissociation of attention when the adult reads the text while the child looks at the pictures. We hypothesize that this lack of joint attention limits children's opportunity to learn print-related skills. Traditional research paradigms do not measure joint attention in real time during shared storybook reading. In the current study, we simultaneously tracked the eye movements of a parent and his/her child with two eye trackers. We also provided real-time feedback to the parent on where the child was looking, and vice versa. Changes in dyads' reading behaviors before and after the joint attention intervention were measured from both eye movements and video records. Baseline data show little joint attention in parent-child shared book reading. The real-time eye-gaze feedback significantly changes parent-child interaction and improves learning.
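A simple way to operationalize joint attention with two synchronized eye trackers is to flag samples where the parent's and child's gaze points fall within some distance of each other. The threshold and sample data in the sketch below are assumptions, not values from the study.

```python
# Hypothetical sketch: flag joint attention when two synchronized gaze streams
# land within a spatial threshold of each other.
import numpy as np

parent_gaze = np.array([[100, 200], [420, 310], [425, 305]], dtype=float)  # (x, y) px
child_gaze  = np.array([[600, 150], [430, 300], [410, 320]], dtype=float)
threshold_px = 50.0                                  # assumed joint-attention radius

distances = np.linalg.norm(parent_gaze - child_gaze, axis=1)
joint_attention = distances < threshold_px
print(joint_attention)                               # [False  True  True]
print(f"proportion of joint attention: {joint_attention.mean():.2f}")
```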
Article
Full-text available
Giant strides in information technology at the turn of the century may have unleashed unreachable goals. With the invention of groupware, people expect to communicate easily with each other and accomplish difficult work even though they are remotely located or rarely overlap in time. Major corporations launch global teams, expecting that technology will make "virtual collocation" possible. Federal research money encourages global science through the establishment of "collaboratories." We review over 10 years of field and laboratory investigations of collocated and noncollocated synchronous group collaborations. In particular, we compare collocated work with remote work as it is possible today and comment on the promise of remote work tomorrow. We focus on the sociotechnical conditions required for effective distance work and bring together the results with four key concepts: common ground, coupling of work, collaboration readiness, and collaboration technology readiness. Groups with high common ground and loosely coupled work, with readiness both for collaboration and collaboration technology, have a chance at succeeding with remote work. Deviations from each of these create strain on the relationships among teammates and require changes in the work or processes of collaboration to succeed. Often they do not succeed because distance still matters.
Article
Full-text available
In this paper we present the results of an eye-tracking study on collaborative problem-solving dyads. Dyads remotely collaborated to learn from contrasting cases involving basic concepts about how the human brain processes visual information. In one condition, dyads saw the eye gazes of their partner on the screen; in a control group, they did not have access to this information. Results indicated that this real-time mutual gaze perception intervention helped students achieve a higher quality of collaboration and a higher learning gain. Implications for supporting group collaboration are discussed.
Article
Full-text available
Gaze cues are important in communication. In social interactions gaze cues usually occur with spoken language, yet most previous research has used artificial paradigms without dialogue. The present study investigates the interaction between gaze and language using a real-world paradigm. Each participant followed instructions to build a series of abstract structures out of building blocks, while their eye movements were recorded. The instructor varied the specificity of the instructions (unambiguous or ambiguous) and the presence of gaze cues (present or absent) between participants. Fixations to the blocks were recorded and task performance was measured. The presence of gaze cues led to more accurate performance, more accurate visual selection of the target block and more fixations towards the instructor when ambiguous instructions were given, but not when unambiguous instructions were given. We conclude that people only utilize the gaze cues of others when the cues provide useful information.
Article
Full-text available
Establishing common ground in remote cooperation is challenging because nonverbal means of ambiguity resolution are limited. In such settings, information about a partner's gaze can support cooperative performance, but it is not yet clear whether and to what extent the abundance of information reflected in gaze comes at a cost. Specifically, in tasks that mainly rely on spatial referencing, gaze transfer might be distracting and leave the partner uncertain about the meaning of the gaze cursor. To examine this question, we let pairs of participants perform a joint puzzle task. One partner knew the solution and instructed the other partner's actions by (1) gaze, (2) speech, (3) gaze and speech, or (4) mouse and speech. Based on these instructions, the acting partner moved the pieces under conditions of high or low autonomy. Performance was better when using either gaze or mouse transfer compared to speech alone. However, in contrast to the mouse, gaze transfer induced uncertainty, evidenced in delayed responses to the cursor. Also, participants tried to resolve ambiguities by engaging in more verbal effort, formulating more explicit object descriptions and fewer deictic references. Thus, gaze transfer seems to increase uncertainty and ambiguity, thereby complicating grounding in this spatial referencing task. The results highlight the importance of closely examining task characteristics when considering gaze transfer as a means of support.
Article
We present the results of an empirical study that measured the contribution of humans' conspicuous eye gaze (a consequence of scleral de-pigmentation) to conveying multimodal referentiality by combining visual and auditory cues in a naturalistic setting. Participants interacted in a cooperative task in which they had to convey referential meaning about co-presential entities. In one of the conditions, participants had no access to their interactants' eye gaze. We interpret the results as supporting the idea that our eye morphology contributes to instantiating multimodal referentiality in cooperative tasks in peripersonal space.
Article
Remote cooperation can be improved by transferring the gaze of one participant to the other. However, based on a partner's gaze, an interpretation of his communicative intention can be difficult. Thus, gaze transfer has been inferior to mouse transfer in remote spatial referencing tasks where locations had to be pointed out explicitly. Given that eye movements serve as an indicator of visual attention, it remains to be investigated whether gaze and mouse transfer differentially affect the coordination of joint action when the situation demands an understanding of the partner's search strategies. In the present study, a gaze or mouse cursor was transferred from a searcher to an assistant in a hierarchical decision task. The assistant could use this cursor to guide his movement of a window which continuously opened up the display parts the searcher needed to find the right solution. In this context, we investigated how the ease of using gaze transfer depended on whether a link could be established between the partner's eye movements and the objects he was looking at. Therefore, in addition to the searcher's cursor, the assistant either saw the positions of these objects or only a grey background. When the objects were visible, performance and the number of spoken words were similar for gaze and mouse transfer. However, without them, gaze transfer resulted in longer solution times and more verbal effort as participants relied more strongly on speech to coordinate the window movement. Moreover, an analysis of the spatio-temporal coupling of the transmitted cursor and the window indicated that when no visual object information was available, assistants confidently followed the searcher's mouse but not his gaze cursor. Once again, the results highlight the importance of carefully considering task characteristics when applying gaze transfer in remote cooperation.
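The spatio-temporal coupling mentioned above can be quantified, for example, as a lagged correlation between the transmitted cursor trajectory and the window trajectory: the lag with the strongest correlation indicates how closely, and with what delay, the assistant followed the cursor. The sketch below uses made-up one-dimensional trajectories and is not the authors' analysis code.

```python
# Hypothetical sketch: lagged correlation between a transmitted cursor
# trajectory and the assistant's window trajectory (1-D, made-up data).
import numpy as np

rng = np.random.default_rng(0)
cursor = np.cumsum(rng.normal(size=500))                         # searcher's cursor position
window = np.roll(cursor, 12) + rng.normal(scale=0.5, size=500)   # follows ~12 samples later

def lagged_correlation(a, b, max_lag=30):
    """Return (lag, Pearson r) pairs for b lagging a by 0..max_lag samples."""
    out = []
    for lag in range(max_lag + 1):
        r = np.corrcoef(a[: len(a) - lag] if lag else a, b[lag:])[0, 1]
        out.append((lag, r))
    return out

best_lag, best_r = max(lagged_correlation(cursor, window), key=lambda t: t[1])
print(f"strongest coupling at lag {best_lag} samples (r = {best_r:.2f})")
```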
Article
Accessibility theory associates more complex referring expressions with less accessible referents. Felicitous referring expressions should reflect accessibility from the addressee's perspective, which may be difficult for speakers to assess incrementally. If mechanisms shared by perception and production help interlocutors align internal representations, then dyads with different roles and different things to say should profit less from alignment. We examined introductory mentions of on-screen shapes within a joint task for effects of access to the addressee's attention, of players' actions and of speakers' roles. Only speakers' actions affected the form of referring expression and only different role dyads made egocentric use of actions hidden from listeners. Analysis of players' gaze around referring expressions confirmed this pattern; only same role dyads coordinated attention as the accessibility theory predicts. The results are discussed within a model distributing collaborative effort under the constraints of joint tasks.
Article
The results of two experiments, in which participants solved constructive tasks of the puzzle type, are reported. The tasks were solved by two partners who shared the same visual environment but whose knowledge of the situation and ability to change it to reach a solution were different. One of the partners — the "expert" — knew the solution in detail but had no means of acting on this information. The second partner — the "novice" — could act to achieve the goal, but knew very little about the solution. The partners were free to communicate verbally. In one third of the trials of the first experiment, in addition to verbal communication, the eye fixations of the expert were projected onto the working space of the novice. In another condition the expert could use a mouse to show the novice relevant parts of the task configuration. Both methods of facilitating the 'joint attention' state of the partners improved their performance. The nature of the dialogues as well as the parameters of the eye movements changed. In the second experiment the direction of the gaze-position data transfer was reversed, from the novice to the expert. This also led to a significant increase in the efficiency of the distributed problem solving.
Article
When pairs work together on a physical task, seeing a common workspace facilitates communication and benefits performance. When mediating such activities, however, the choice of technology can transform the visual information in ways that impact critical coordination processes. In this paper we examine two coordination processes that are impacted by visual information—situation awareness and conversational grounding—which are theoretically distinct but often confounded in empirical research. We present three empirical studies that demonstrate how shared visual information supports collaboration through these two distinct routes. We also address how particular features of visual information interact with features of the task to influence situation awareness and conversational grounding, and further demonstrate how these features affect conversation and coordination. Experiment 1 manipulates the immediacy of the visual information and shows that immediate visual feedback facilitates collaboration by improving both situation awareness and conversational grounding. In Experiment 2, by misaligning the perspective through which the Worker and Helper see the work area we disrupt the ability of visual feedback to support conversational grounding, but not situation awareness. The findings demonstrate that visual information supports the central mechanism of conversational grounding. Experiment 3 disrupts the ability of visual feedback to support situation awareness by reducing the size of the common viewing area. The findings suggest that visual information independently supports both situation awareness and conversational grounding. We conclude with a general discussion of the results and their implications for theory development and the future design of collaborative technologies.