Figure 5
Subjective confidence of interpretation for the four pointing conditions (1-7 Likert scale). 

Source publication
Conference Paper
Full-text available
Communicating spatial information by pointing is ubiquitous in human interactions. With the growing use of head-mounted cameras for collaborative purposes, it is important to assess how accurately viewers of the resulting egocentric videos can interpret pointing acts. We conducted an experiment to compare the accuracy of interpreting four different...

Similar publications

Article
Full-text available
In the case of languages which make a two-way distinction between demonstrative terms, the choice between spatial demonstratives has traditionally been assumed to depend on the referent's physical proximity to the speaker. However, this egocentric, speaker-anchored view has recently been challenged, and the addressee's role in demonstrative refer...

Citations

... Hofemann et al. [5], for instance, use pointing gestures to instruct a robot which part to pick from a table, while pointing gestures were studied in a student-tutor relationship by Sathayanarayana et al. [14]. Beyond possible applications of detecting pointing gestures, pointing accuracy itself has also been studied intensively, e.g., in [1,4]. ...
Chapter
Full-text available
During a team discussion, participants frequently perform pointing, pairing, or grouping gestures on artifacts on a whiteboard. While the content of the whiteboard is accessible to blind and visually impaired people, the referring deictic gestures are not. This paper therefore introduces an improved algorithm to detect such gestures and to classify them. Since deictic gestures such as pointing, pairing, and grouping are performed by sighted users only, we used a VR environment for the development of the gesture recognition algorithm and for the subsequent user studies.
... For selection of Points-of-Interest (POI) while driving a vehicle, Suras-Perez et al. [34] and Fujimura et al. [10] used finger pointing with speech and hand-constrained finger pointing, respectively. However, using finger pointing can be difficult, especially when trying to identify objects that do not lie straight ahead [4]. In fact, while studying driver behaviors in a driving simulator, Gomaa et al. found gaze accuracy to be higher than pointing accuracy [11]. ...
... However, these multimodal approaches do not take the opportunity to enhance the precision of gaze tracking, although some use semantics from speech to narrow down the target. Our work achieves this enhancement in precision through multimodal fusion of relevant deictic information from gaze, finger pointing, and head pose as input modalities [4]. Instead of using finger pointing as a trigger for selection, we use it as an equal input modality, while utilizing the speech modality as a trigger. ...
Preprint
Full-text available
Advanced in-cabin sensing technologies, especially vision-based approaches, have greatly advanced user interaction inside the vehicle, paving the way for new applications of natural user interaction. Just as humans use multiple modes to communicate with each other, we follow an approach characterized by simultaneously using multiple modalities to achieve natural human-machine interaction for a specific task: pointing to or glancing towards objects inside as well as outside the vehicle for deictic references. By tracking the movements of eye gaze, head, and finger, we design a multimodal fusion architecture using a deep neural network to precisely identify the driver's referencing intent. Additionally, we use a speech command as a trigger to separate each referencing event. We observe differences in driver behavior in the two pointing use cases (i.e., for inside and outside objects), especially when analyzing the precision of the three modalities eye, head, and finger. We conclude that there is no single modality that is solely optimal for all cases, as each modality reveals certain limitations. Fusion of multiple modalities exploits the relevant characteristics of each modality, hence overcoming the case-dependent limitations of each individual modality. Ultimately, we propose a method to identify whether the driver's referenced object lies inside or outside the vehicle, based on the predicted pointing direction.
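As a rough illustration of the kind of feature-level fusion the abstract describes, the following minimal sketch combines per-modality feature vectors (gaze, head pose, finger pointing) and scores a fixed set of candidate objects. All names, dimensions, layer sizes, and the candidate-scoring framing are assumptions for illustration; this is not the authors' published architecture.

```python
# Minimal sketch of early (feature-level) multimodal fusion, assuming each
# modality has already been reduced to a fixed-length feature vector.
import torch
import torch.nn as nn

class DeicticFusionNet(nn.Module):
    def __init__(self, gaze_dim=3, head_dim=3, finger_dim=3, n_candidates=8):
        super().__init__()
        # One small encoder per modality (gaze direction, head pose, finger ray).
        self.gaze_enc = nn.Sequential(nn.Linear(gaze_dim, 16), nn.ReLU())
        self.head_enc = nn.Sequential(nn.Linear(head_dim, 16), nn.ReLU())
        self.finger_enc = nn.Sequential(nn.Linear(finger_dim, 16), nn.ReLU())
        # Fusion head: concatenated features -> scores over candidate objects.
        self.fusion = nn.Sequential(
            nn.Linear(3 * 16, 32), nn.ReLU(), nn.Linear(32, n_candidates)
        )

    def forward(self, gaze, head, finger):
        fused = torch.cat(
            [self.gaze_enc(gaze), self.head_enc(head), self.finger_enc(finger)],
            dim=-1,
        )
        return self.fusion(fused)  # unnormalized scores per candidate object

# Example: one referencing event, e.g. triggered by a speech command.
net = DeicticFusionNet()
scores = net(torch.randn(1, 3), torch.randn(1, 3), torch.randn(1, 3))
referenced = scores.argmax(dim=-1)  # index of the most likely referenced object
```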
... [Table excerpt listing the XR systems used in the reviewed studies, with study counts and references: Desktop VRE (1) [69]; FOVE VR HMD (1) [70]; 3-screen pseudo-CAVE display (1) [71]; non-immersive screen-based VRE (7) [27], [61], [69], [72]-[75]; projection-based VRE (2) [76], [77]; Gloma 350 (1) [51]; camera-based VR (3) [54], [73], [78]; custom systems.] ...
... A variety of other input methods were less used, such as facial expression, full-body tracking, and heart rate. [Table excerpt listing input modalities with study counts and percentages; the leading reference list belongs to a row truncated in the excerpt: head position (14, 28%); gesture (13, 26%); voice (9, 18%); movement (7, 14%); facial expression (5, 10%).] The selected studies utilized 15 different types of system outputs. The five most common system outputs were: gaze visualization (64%, n=32), avatar representation (24%, n=12), use of or as a controller (16%, n=8), placement of annotations (10%, n=5), and display of facial expression (10%, n=5). ...
Article
Full-text available
We present a state-of-the-art and scoping review of the literature to examine embodied information behaviors, as reflected in shared gaze interactions, within co-present extended reality experiences. Recent proliferation of consumer-grade head-mounted XR displays, situated at multiple points along the Reality-Virtuality Continuum, has increased their application in social, collaborative, and analytical scenarios that utilize data and information at multiple scales. Shared gaze represents a modality for synchronous interaction in these scenarios, yet there is a lack of understanding of the implementation of shared eye gaze within co-present extended reality contexts. We use gaze behaviors as a proxy to examine embodied information behaviors. This review examines the application of eye tracking technology to facilitate interaction in multiuser XR by sharing a user’s gaze, identifies salient themes within existing research since 2013 in this context, and identifies patterns within these themes relevant to embodied information behavior in XR. We review a corpus of 50 research papers that investigate the application of shared gaze and gaze tracking in XR, generated using the SALSA framework and searches in multiple databases. The publications were reviewed for study characteristics, technology types, use scenarios, and task types. We construct a state of the field and highlight opportunities for innovation and challenges for future research directions.
... When the same gestures were observed from viewpoints next to the pointer, comparable to those used by Bangerter and Oppenheimer (2006), the biases also resembled those reported by these authors. As vertical biases shrank the more the observer approached the pointer's viewpoint, it can be speculated that they might practically vanish once the observer assumes the pointer's perspective, as has been reported by Akkil and Isokoski (2016). Furthermore, the data descriptively resemble those of Mayer et al. (2020), who also found a perspective dependency of pointing interpretation, e.g., a rather leftward bias especially from rightward viewpoints, as well as an overall upward bias. ...
Article
Full-text available
Though ubiquitous in human communication, pointing gestures are often misunderstood. This study addressed how the observer's perspective affects pointing perception. More specifically, we tested the hypothesis that two different visual cues, namely (a) the vector defined by the pointer's arm or finger and (b) the position of the pointer's index finger in the observer's visual field, determine pointing perception, and that their relative influence depends on the observer's perspective. In three experiments, participants judged the location at which a virtual or real pointer was pointing from different viewpoints. The experiments show that the observer's perspective has a considerable effect on pointing perception. The more the observer's gaze direction is aligned with the pointing arm, the more observers rely on the position of the pointing finger in their visual field and the less they rely on its direction.
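A minimal geometric sketch of the two-cue account just described, assuming a vertical target wall and simple 3D positions for the pointer's shoulder, fingertip, and the observer's eye; the way the two cues are blended by viewpoint alignment is an illustrative assumption, not the paper's fitted model.

```python
# Sketch of the two cues: (a) extrapolating the pointer's arm/finger vector,
# and (b) reading off the fingertip's position in the observer's visual field.
import numpy as np

def ray_hit_wall(origin, direction, wall_y):
    """Intersect a ray with the vertical plane y = wall_y."""
    t = (wall_y - origin[1]) / direction[1]
    return origin + t * direction

shoulder  = np.array([0.0, 0.0, 1.4])   # pointer's shoulder (x, y, z in metres)
fingertip = np.array([0.1, 0.6, 1.5])   # pointer's extended fingertip
observer  = np.array([1.0, 0.2, 1.6])   # observer's eye position
wall_y = 3.0                            # distance of the target wall

# Cue (a): extrapolate the pointer's arm/finger vector.
arm_estimate = ray_hit_wall(shoulder, fingertip - shoulder, wall_y)

# Cue (b): the observer-eye-through-fingertip line.
visual_field_estimate = ray_hit_wall(observer, fingertip - observer, wall_y)

# The more the observer's gaze is aligned with the pointing arm, the more
# weight falls on cue (b); here alignment is the cosine between the two rays.
arm_dir = (fingertip - shoulder) / np.linalg.norm(fingertip - shoulder)
view_dir = (fingertip - observer) / np.linalg.norm(fingertip - observer)
alignment = np.clip(arm_dir @ view_dir, 0.0, 1.0)

perceived = alignment * visual_field_estimate + (1 - alignment) * arm_estimate
print(arm_estimate, visual_field_estimate, perceived)
```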
... However, they lack an enhancement in the precision of gaze tracking. Unlike these, we use multimodal fusion for better precision of the driver's referenced direction, as head, gaze, and finger pointing all provide relevant deictic information [25]. ...
... Using finger pointing to recognize objects that do not lie straight ahead is a challenging task [25]. To improve pointing gesture accuracy for object selection inside the car, Roider et al. ...
Preprint
Full-text available
There is a growing interest in more intelligent, natural user interaction with the car. Hand gestures and speech are already being applied for driver-car interaction, and multimodal approaches are also showing promise in the automotive industry. In this paper, we utilize deep learning for a multimodal fusion network for referencing objects outside the vehicle. We use features from gaze, head pose, and finger pointing simultaneously to precisely predict the referenced objects in different car poses. We demonstrate the practical limitations of each modality when used for a natural form of referencing, specifically inside the car. As evident from our results, we overcome the modality-specific limitations, to a large extent, by the addition of other modalities. This work highlights the importance of multimodal sensing, especially when moving towards natural user interaction. Furthermore, our user-based analysis shows noteworthy differences in recognition of user behavior depending upon the vehicle pose.
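To make the claim about overcoming single-modality limitations concrete, here is a deliberately simplified illustration (not the authors' deep network): several noisy deictic direction estimates are combined by inverse-variance-weighted averaging. The noise levels and weights are made-up values for demonstration only.

```python
# Simplified demonstration: fusing several noisy direction estimates
# (gaze, head, finger) by inverse-variance weighting tends to reduce the
# angular error relative to each individual modality.
import numpy as np

rng = np.random.default_rng(0)
true_dir = np.array([0.0, 1.0, 0.0])          # direction to the referenced object

noise = {"gaze": 0.05, "head": 0.20, "finger": 0.12}   # assumed noise levels
weights = {m: 1.0 / s**2 for m, s in noise.items()}    # inverse-variance weights

def noisy_estimate(direction, sigma):
    v = direction + rng.normal(0.0, sigma, 3)
    return v / np.linalg.norm(v)

estimates = {m: noisy_estimate(true_dir, s) for m, s in noise.items()}

fused = sum(weights[m] * estimates[m] for m in estimates)
fused /= np.linalg.norm(fused)

def angular_error_deg(v):
    return np.degrees(np.arccos(np.clip(v @ true_dir, -1.0, 1.0)))

for m, v in estimates.items():
    print(f"{m:6s} error: {angular_error_deg(v):5.2f} deg")
print(f"fused  error: {angular_error_deg(fused):5.2f} deg")
```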
... The collection and preprocessing of driving behavior data were performed with D-Lab software (Ergoneers GmbH, Geretsried, Germany) [45][46][47][48]. Eye movement was recorded using an Asus laptop with an Intel Core i5, 16 GB of memory, and a 512 GB SSD. ...
Article
Full-text available
This study reports the results of a pilot study on spatiotemporal characteristics of drivers’ visual behavior while driving at three different luminance levels in a tunnel. The study was carried out in a relatively long tunnel during the daytime. Six experienced drivers were recruited to participate in the driving experiment. Experimental data on pupil area and fixation point position (in the tunnel’s interior zone: 1566 m long) were collected by non-intrusive eye-tracking equipment at three luminance levels (2 cd/m², 2.5 cd/m², and 3 cd/m²). Fixation maps (color-coded maps presenting distributed data) were created based on fixation point position data to quantify changes in visual behavior. The results demonstrated that luminance levels had a significant effect on pupil areas and fixation zones. Fixation area and average pupil area had a significant negative correlation with luminance levels during the daytime. In addition, drivers concentrated more on the front road pavement, the top wall surface, and the cars’ control wheels. The results revealed that the pupil area had a linear relationship with the luminance level. The limitations of this research are discussed, and future research directions are outlined.
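As a small how-to sketch of the fixation maps mentioned in the abstract: accumulate fixation point positions on a grid and smooth them into a color-codable density map. The grid size, smoothing width, and synthetic fixation points are assumptions for illustration, not the study's data.

```python
# Minimal sketch of building a fixation map from fixation point positions.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
# Hypothetical fixation points in normalized screen coordinates (x, y in [0, 1]);
# clustered around the centre, where drivers reportedly look most.
fixations = rng.normal(loc=[0.5, 0.55], scale=[0.12, 0.08], size=(500, 2))

# Accumulate fixations on a grid, then smooth to obtain a continuous density map.
grid, _, _ = np.histogram2d(
    fixations[:, 0], fixations[:, 1], bins=64, range=[[0, 1], [0, 1]]
)
fixation_map = gaussian_filter(grid, sigma=2.0)
fixation_map /= fixation_map.max()   # normalize for color-coded display
```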
... Therefore, guidelines [26] and gesture studies [70] have been prepared for measuring users' experience across devices and for building consistent gesture sets [47], such that users can transfer interaction models and gestures [7]. It is quite pertinent to our discussion that gestures (macro and micro) such as pointing play a role in non-verbal communication in interactions [2,31,32]. Often with smartphones and smartwatches, macro-scale gestures are used, among them pointing and turning the wrist [10]. Pointing gestures include eye pointing [9,29], gaze pointing, and hand pointing [2]. ...
... Often with smartphones and smartwatches, macro-scale gestures are used, among them pointing and turning the wrist [10]. Pointing gestures include eye pointing [9,29], gaze pointing, and hand pointing [2]. The gesture of sharing one's device screen to thereby ground the interaction [22] plays a part in the users' experiences of the interaction through communicating spatial information [2]. ...
... Pointing gestures include eye pointing [9,29], gaze pointing, and hand pointing [2]. The gesture of sharing one's device screen to thereby ground the interaction [22] plays a part in the users' experiences of the interaction through communicating spatial information [2]. Sociocentric referential gestures of this type are practised and designed in such a way that these actions themselves provide additional conversation dynamics to allow further references within the conversation [18]. ...
Preprint
Full-text available
Technologies that augment face-to-face interactions with a digital sense of self have been used to support conversations. That work has employed one homogeneous technology, either 'off-the-shelf' or with a bespoke prototype, across all participants. Beyond speculative instances, it is unclear what technology individuals themselves would choose, if any, to augment their social interactions; what influence it may exert; or how use of heterogeneous devices may affect the value of this augmentation. This is important, as the devices that we use directly affect our behaviour, influencing affordances and how we engage in social interactions. Through a study of 28 participants, we compared head-mounted displays, smartphones, and smartwatches to support digital augmentation of self during face-to-face interactions within a group. We identified a preference among participants for head-mounted displays to support privacy, while smartwatches and smartphones better supported conversational events (such as grounding and repair), along with group use through screen-sharing. Accordingly, we present software and hardware design recommendations and user interface guidelines for integrating a digital form of self into face-to-face conversations.
... Cooney, Brady, & McKinney, 2018). Moreover, whereas observers typically interpret points as indicating a higher position than intended by the pointer (Bangerter & Oppenheimer, 2006; Herbort & Kunde, 2016; Wnuczko & Kennedy, 2011), such errors were rare in an experiment in which participants interpreted videos of pointing gestures recorded from the pointer's viewpoint (Akkil & Isokoski, 2016). In sum, these findings suggest that both pointing production and interpretation depend on the perspective. ...
Article
Full-text available
Pointing is a ubiquitous means of communication. Nevertheless, observers systematically misinterpret the location indicated by pointers. We examined whether these misunderstandings result from the typically different viewpoints of pointers and observers. Participants either pointed themselves or interpreted points while assuming the pointer’s or a typical observer perspective in a virtual reality environment. The perspective had a strong effect on the relationship between pointing gestures and referents, whereas the task had only a minor influence. This suggests that misunderstandings between pointers and observers primarily result from their typically different viewpoints.
... In addition, there is an increasing demand for customization, while companies face global competition with competitors all over the world. This trend, which drives the development from macro to micro markets, results in smaller lot sizes due to growing product variety (from high-volume to low-volume production) [1]. To cope with this growing variety, and to identify possible optimization potential in the existing production system, precise knowledge of the product range and of the characteristics manufactured and/or assembled in this system is important. ...
... Their study also shows that head-hand pointing performs best when pointing at targets straight ahead [1]. In the paper by Herbort et al., two experiments are carried out that also support the hypothesis that a pointer uses the head-finger line while an observer interprets the shoulder-finger line; misunderstandings can be reduced by bringing the head closer to the shoulder. ...
Article
Full-text available
The main contribution of this work is a method to generate datasets of pointing gestures. A person points to tracked objects within a motion capture environment, and a neural network further processes the tracking data. We found that different input data combinations have no significant effect on the performance of the networks. Therefore, it is possible to train a network and obtain correct results regardless of the complete availability of data for head, shoulder, elbow, or hand. With the presented method, we could achieve an overall accuracy of 35 mm within a 2D plane over a distance of 2 m.
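The following sketch illustrates the head-finger versus shoulder-finger extrapolation mentioned in the citing excerpt above: both lines are extended from illustrative 3D joint positions onto a target plane assumed to stand 2 m in front of the pointer, making the vertical disagreement between the two readings visible. All coordinates are made-up example values.

```python
# Sketch of the two extrapolation rules: the pointer tends to place the target
# on the head(eye)-fingertip line, an observer tends to read off the
# shoulder-fingertip line.
import numpy as np

def hit_plane(origin, through, plane_y):
    """Extend the line origin->through until it crosses the plane y = plane_y."""
    d = through - origin
    t = (plane_y - origin[1]) / d[1]
    return origin + t * d

eye       = np.array([0.00, 0.00, 1.65])   # pointer's eye (x, y, z in metres)
shoulder  = np.array([0.20, 0.00, 1.45])   # pointer's shoulder
fingertip = np.array([0.25, 0.55, 1.55])   # extended index fingertip
plane_y = 2.0                              # assumed target plane distance

head_finger_target     = hit_plane(eye, fingertip, plane_y)       # pointer's reading
shoulder_finger_target = hit_plane(shoulder, fingertip, plane_y)  # observer's reading

# Vertical disagreement between the two interpretations on the target plane.
print(head_finger_target, shoulder_finger_target)
print("vertical offset [m]:", shoulder_finger_target[2] - head_finger_target[2])
```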