Screenshot of Experiment 1. The participant was expected to look at the number in the middle box and either increase or decrease it by turning the head until it reached the target value shown above the box. The participant confirmed the selection by looking at the box below.

Source publication
Conference Paper
Full-text available
Smart glasses equipped with eye tracking technology could be utilized to develop natural interaction techniques. They could be used to conveniently interact with an electronic appliance in the environment from a distance. We describe a technique, HeadTurn, that allows a user to look at a device and then control it by turning the head to the left or...

Context in source publication

Context 1
... created an application where the user would see a number that s/he could increase by gazing at it and turning his/her head to the right, and decrease by gazing at the number and turning his/her head to the left. A screenshot of the application is shown in Figure 2. The target number was shown above the controlled number. ...
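The mapping described in this context can be pictured as a small control loop: while the gaze rests on the number, a sufficiently large head yaw drives the value up or down. The sketch below is a minimal illustration only; the GazeSample structure, the 10° yaw threshold, and the per-tick step size are assumptions, not details taken from the paper.

```python
# Minimal sketch of a HeadTurn-style number control loop (illustrative assumptions,
# not the paper's implementation).
from dataclasses import dataclass

YAW_THRESHOLD_DEG = 10.0   # assumed: head must turn past this angle to start adjusting
STEP_PER_TICK = 1          # assumed: change applied per update while the head is turned

@dataclass
class GazeSample:
    on_number: bool       # True if gaze currently falls on the number widget
    head_yaw_deg: float   # signed head yaw relative to the neutral pose (right = positive)

def update_value(value: int, sample: GazeSample) -> int:
    """Increase the value while fixating it and turning right; decrease when turning left."""
    if not sample.on_number:
        return value  # head turns are ignored unless the number is being looked at
    if sample.head_yaw_deg > YAW_THRESHOLD_DEG:
        return value + STEP_PER_TICK
    if sample.head_yaw_deg < -YAW_THRESHOLD_DEG:
        return value - STEP_PER_TICK
    return value

# Toy usage: the user fixates the number and keeps the head turned right for three updates.
value = 5
for yaw in (12.0, 14.0, 13.0):
    value = update_value(value, GazeSample(on_number=True, head_yaw_deg=yaw))
print(value)  # 8
```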

Similar publications

Conference Paper
Full-text available
Augmented Reality (AR) is becoming more and more popular and many applications across multiple domains are developed on AR hardware such as the Microsoft HoloLens or similar Head-Mounted Displays (HMD). Most of the AR applications are visualizing information that was not visible before and enable interaction with this information using voice input,...

Citations

... Since gaze can also reflect the user's intention [29], gaze can be used for a hands-free gesture input method [9,30]. There are also methods that use the movement of the head, such as a method that adjusts the input value by rotating or tilting the user's head (HeadTurn [31]), a method to turn pages when browsing (HeadPager [32]), a method to operate the cursor (e.g., on desktop devices [33,34] or mobile devices [35]), and a method to select a target [36]. There are methods for gesture input that combine multiple factors, such as head movement and gaze [37,38]. ...
Article
Full-text available
Simple hands-free input methods using ear accessories have been proposed to broaden the range of scenarios in which information devices can be operated without hands. Although many previous studies use canal-type earphones, few studies focused on the following two points: (1) A method applicable to ear accessories other than canal-type earphones. (2) A method enabling various ear accessories with different styles to have the same hands-free input function. To realize these two points, this study proposes a method to recognize the user’s facial gesture using an infrared distance sensor attached to the ear accessory. The proposed method detects skin movement around the ear and face, which differs for each facial expression gesture. We created a prototype system for three ear accessories for the root of the ear, earlobe, and tragus. The evaluation results for nine gestures and 10 subjects showed that the F-value of each device was 0.95 or more, and the F-value of the pattern combining multiple devices was 0.99 or more, which showed the feasibility of the proposed method. Although many ear accessories could not interact with information devices, our findings enable various ear accessories with different styles to have eye-free and hands-free input ability based on facial gestures.
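As a rough, hedged illustration of this kind of pipeline (not the authors' implementation), the sketch below classifies short windows of infrared distance-sensor readings with simple statistical features and an off-the-shelf classifier, and reports the macro F-measure used in the evaluation; the feature set, the synthetic data, and the choice of RandomForestClassifier are all assumptions.

```python
# Illustrative sketch: gesture classification from infrared distance-sensor windows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def window_features(window: np.ndarray) -> np.ndarray:
    """Summarize one sensor window (samples,) as [mean, std, min, max, range]."""
    return np.array([window.mean(), window.std(), window.min(),
                     window.max(), window.max() - window.min()])

rng = np.random.default_rng(0)
# Toy data: three gestures, each producing a different skin-distance offset (assumed).
X = np.vstack([window_features(rng.normal(loc=offset, scale=0.1, size=50))
               for offset in (0.0, 0.5, 1.0) for _ in range(30)])
y = np.repeat([0, 1, 2], 30)

clf = RandomForestClassifier(random_state=0).fit(X[::2], y[::2])  # train on half the windows
pred = clf.predict(X[1::2])                                       # test on the other half
print(f1_score(y[1::2], pred, average="macro"))                   # F-measure, as in the abstract
```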
... ¹³ Short (less than 0.5 s), medium (between 0.5 s and 1.5 s), and long (more than 1.5 s) [6]. ¹⁴ Three categories: small (the gesture can be performed in less than 439 cm³ of physical space), medium (between 439 cm³ and 1467 cm³ … and HeadTurn [73] interaction techniques, and prior work on head gesture input for controlling the power wheelchair for people with motor impairments [47]. • We included letters and common shapes and symbols in our set due to their ubiquity in stroke-gesture UIs, such as in the "augmented letters" [80] or "gesture search" [58] techniques. ...
... ¹³ Short (less than 0.5 s), medium (between 0.5 s and 1.5 s), and long (more than 1.5 s) [6]. ¹⁴ Three categories: small (the gesture can be performed in less than 439 cm³ of physical space), medium (between 439 cm³ and 1467 cm³ … and HeadTurn [73] interaction techniques, and prior work on head gesture input for controlling the power wheelchair for people with motor impairments [47]. Figure 3, bottom-right shows these gestures characterized along the complexity,¹⁵ trajectory,¹⁶ ...
... Motivated by this need and prior success of designing user-defined gestures in other contexts, we sought to engage people with motor impairments to design eyelid gestures they prefer. What's more, as recent research demonstrated the promise of gaze and head pose for hands-free interaction in addition to eyelid gestures [17,24,29,37], we extended the design space of user-defined gestures by inviting people with motor impairments to design above-the-neck gestures that include eyelids, gaze, mouth, and head. ...
... Similarly, gaze and head turn were also combined to facilitate the control of onscreen targets [24]. Inspired by this line of work that shows the advantage of combining head motion with eye motions, we extend our exploration to include above-the-neck body parts, including both eyes, head and mouth, to allow people with motor impairments to better design a richer set of user-defined gestures. ...
Preprint
Full-text available
Recent research proposed eyelid gestures for people with upper-body motor impairments (UMI) to interact with smartphones without finger touch. However, such eyelid gestures were designed by researchers. It remains unknown what eyelid gestures people with UMI would want and be able to perform. Moreover, other above-the-neck body parts (e.g., mouth, head) could be used to form more gestures. We conducted a user study in which 17 people with UMI designed above-the-neck gestures for 26 common commands on smartphones. We collected a total of 442 user-defined gestures involving the eyes, the mouth, and the head. Participants were more likely to make gestures with their eyes and preferred gestures that were simple, easy-to-remember, and less likely to draw attention from others. We further conducted a survey (N=24) to validate the usability and acceptance of these user-defined gestures. Results show that user-defined gestures were acceptable to both people with and without motor impairments.
... However, given the importance of non-verbal cues in communication, a successful interaction requires the proper analysis of human action [11] as well. While head and eye-based gestures have been investigated in the broader context of human-computer interaction [19,5,27,26,8], less effort has arguably ...
Preprint
Non-verbal communication plays a particularly important role in a wide range of scenarios in Human-Robot Interaction (HRI). Accordingly, this work addresses the problem of human gesture recognition. In particular, we focus on head and eye gestures, and adopt an egocentric (first-person) perspective using eyewear cameras. We argue that this egocentric view offers a number of conceptual and technical benefits over scene- or robot-centric perspectives. A motion-based recognition approach is proposed, which operates at two temporal granularities. Locally, frame-to-frame homographies are estimated with a convolutional neural network (CNN). The output of this CNN is input to a long short-term memory (LSTM) to capture longer-term temporal visual relationships, which are relevant to characterize gestures. Regarding the configuration of the network architecture, one particularly interesting finding is that using the output of an internal layer of the homography CNN increases the recognition rate with respect to using the homography matrix itself. While this work focuses on action recognition, and no robot or user study has been conducted yet, the system has been designed to meet real-time constraints. The encouraging results suggest that the proposed egocentric perspective is viable, and this proof-of-concept work provides novel and useful contributions to the exciting area of HRI.
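To make the two-granularity idea concrete, here is a hedged PyTorch sketch in the same spirit: a small CNN encodes each frame pair (standing in for the homography network), and an LSTM aggregates the per-pair features before a gesture classifier. The layer sizes, pooling, and seven-class output are assumptions for illustration, not the authors' configuration.

```python
# Illustrative two-stage model: per-frame-pair CNN features, LSTM over the sequence.
import torch
import torch.nn as nn

class FramePairEncoder(nn.Module):
    # CNN mapping a pair of grayscale frames (stacked as 2 channels) to a feature vector,
    # standing in for the homography-estimation CNN described in the abstract (assumed sizes).
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, x):                # x: (batch, 2, H, W)
        return self.fc(self.conv(x).flatten(1))

class GestureRecognizer(nn.Module):
    # LSTM over per-pair features, followed by a gesture classifier.
    def __init__(self, feat_dim: int = 64, hidden: int = 128, n_gestures: int = 7):
        super().__init__()
        self.encoder = FramePairEncoder(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_gestures)

    def forward(self, clips):            # clips: (batch, T, 2, H, W)
        b, t = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])          # logits over gesture classes

logits = GestureRecognizer()(torch.randn(4, 10, 2, 64, 64))
print(logits.shape)  # torch.Size([4, 7])
```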
... They conducted a user study that showed the possibility of resolving the ambiguity caused by the occlusion problem when target selection was made by gaze and head gestures. Nukarinen et al. [35] proposed a technique, HeadTurn, that allowed a user to look at a device and to then control it by turning his or her head to the left or right. They evaluated HeadTurn using an interface that linked head-turning to increasing or decreasing a number shown on the display. ...
... The coarse-to-fine main interactions consist of 1) eye gazing-based coarse interactions and 2) head gesture-based fine interactions. It is important to note that eye movement is faster and requires less energy, while head movement is less jittery and more controlled [23], [24], [30], [34], [35]. Therefore, the eye gazing-based coarse interaction is used for the search and preview of intended target objects or UIs by tracking the user's pupils and calculating the eye pointer's location in the MR display which the user is looking at. ...
Article
Full-text available
This study proposes a novel hands-free interaction method using multimodal gestures such as eye gazing and head gestures and deep learning for human-robot interaction (HRI) in mixed reality (MR) environments. Since human operators hold some objects for conducting tasks, there are many constrained situations where they cannot use their hands for HRI interactions. To provide more effective and intuitive task assistance, the proposed hands-free method supports coarse-to-fine interactions. Eye gazing-based interaction is used for coarse interactions such as searching and previewing of target objects, and head gesture interactions are used for fine interactions such as selection and 3D manipulation. In addition, deep learning-based object detection is applied to estimate the initial positioning of physical objects to be manipulated by the robot. The result of object detection is then combined with 3D spatial mapping in the MR environment for supporting accurate initial object positioning. Furthermore, virtual object-based indirect manipulation is proposed to support more intuitive and efficient control of the robot, compared with traditional direct manipulation (e.g., joint-based and end effector-based manipulations). In particular, a digital twin, the synchronized virtual robot of the real robot, is used to provide a preview and simulation of the real robot to manipulate it more effectively and accurately. Two case studies were conducted to confirm the originality and advantages of the proposed hands-free HRI: (1) performance evaluation of initial object positioning and (2) comparative analysis with traditional direct robot manipulations. The deep learning-based initial positioning reduces much effort for robot manipulation using eye gazing and head gestures. The object-based indirect manipulation also supports more effective HRI than previous direct interaction methods.
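The coarse-to-fine split can be pictured as a small state machine: gaze dwell selects and previews a candidate object, and a head gesture commits the selection. The sketch below is a simplification under stated assumptions (the dwell length, the nod threshold, and the CoarseToFineSelector API are invented for illustration), not the paper's implementation.

```python
# Hedged sketch of a coarse-to-fine selector: gaze dwell previews, a head nod confirms.
DWELL_FRAMES = 30      # assumed: frames of stable gaze needed to preview an object
NOD_PITCH_DEG = 15.0   # assumed: downward head pitch treated as a "select" gesture

class CoarseToFineSelector:
    def __init__(self):
        self.candidate = None
        self.dwell = 0

    def update(self, gazed_object, head_pitch_deg):
        """Return the selected object once confirmed, or None while searching/previewing."""
        if gazed_object != self.candidate:        # coarse phase: gaze-based search
            self.candidate, self.dwell = gazed_object, 0
            return None
        self.dwell += 1
        if self.candidate is None or self.dwell < DWELL_FRAMES:
            return None                           # nothing fixated long enough yet
        if head_pitch_deg > NOD_PITCH_DEG:        # fine phase: head gesture confirms
            return self.candidate
        return None
```

In a full system the nod check would be replaced by a proper head-gesture recognizer, and the fine phase would also cover the 3D manipulation described in the abstract.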
... While this is not necessarily optimal, it depicts a fixed extreme point in the possible design space from which other prototypes may be derived. The design itself was derived from earlier prototypes mapping audio from or to sight [46,47,115] and interaction techniques relying on head roll or yaw [30,53,108,91]. Sources are targeted with head rotation and a sphere-cast along the users' head orientation. Alteration is mapped to the head's tilt or roll, with a knob-like metaphor. ...
Conference Paper
Many people utilize audio equipment to escape from noises around them, leading to the desired isolation but also dangerously reduced awareness. Mediation of sounds through smarter headphones (e.g., hearables) could address this by providing nonuniform interaction with sounds while retaining a comfortable, yet informative soundscape. In a week-long event sampling study (n = 12), we found that users mostly desire muting or a distinct "quiet-but-audible" volume for sound sources. A follow-up study (n = 12) compared a reduced interaction granularity with a continuous one in VR. Usability and workload did not differ significantly for the two granularities, but a set of four states can be considered sufficient for most scenarios, namely: "muted", "quieter", "louder" and "unchanged", allowing for smoother interaction flows. We provide implications for the design of interactive auditory mediated reality systems enabling users to be safe, comfortable and less isolated from their surroundings, while re-gaining agency over their sense of hearing.
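A hedged sketch of how the knob-like head-roll mapping mentioned in the citing context above could feed the four states found sufficient here; the angle bands are invented for illustration and not taken from the paper.

```python
# Illustrative mapping from signed head roll (degrees) to a per-source volume state.
VOLUME_STATES = ("muted", "quieter", "unchanged", "louder")

def roll_to_state(roll_deg: float) -> str:
    """Map head roll to one of the four discrete states (assumed angle bands)."""
    if roll_deg < -30:
        return "muted"
    if roll_deg < -10:
        return "quieter"
    if roll_deg <= 10:
        return "unchanged"
    return "louder"

print([roll_to_state(a) for a in (-45, -20, 0, 25)])
# ['muted', 'quieter', 'unchanged', 'louder']
```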
... These methods exploit eye-head coordination implicitly as they track the compensatory eye movement during a head gesture, without need for separate head tracking. In extension, head turning has been proposed for scalar input to controls fixated by gaze [36] and 3D target disambiguation [28]. In EyeSeeThrough, head movement controls a toolglass that can be moved over gaze-fixated targets [29]. ...
Conference Paper
Eye gaze involves the coordination of eye and head movement to acquire gaze targets, but existing approaches to gaze pointing are based on eye-tracking in abstraction from head motion. We propose to leverage the synergetic movement of eye and head, and identify design principles for Eye&Head gaze interaction. We introduce three novel techniques that build on the distinction of head-supported versus eyes-only gaze, to enable dynamic coupling of gaze and pointer, hover interaction, visual exploration around pre-selections, and iterative and fast confirmation of targets. We demonstrate Eye&Head interaction on applications in virtual reality, and evaluate our techniques against baselines in pointing and confirmation studies. Our results show that Eye&Head techniques enable novel gaze behaviours that provide users with more control and flexibility in fast gaze pointing and selection.
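The head-supported versus eyes-only distinction these techniques build on could, in a much simplified form, be decided from the head's share of a gaze shift; the 20% share threshold in the sketch below is an assumption for illustration, not the authors' criterion.

```python
# Illustrative classifier for head-supported vs. eyes-only gaze shifts (assumed threshold).
def classify_gaze_shift(eye_rotation_deg: float, head_rotation_deg: float) -> str:
    total = abs(eye_rotation_deg) + abs(head_rotation_deg)
    if total == 0:
        return "fixation"                     # no shift at all
    head_share = abs(head_rotation_deg) / total
    return "head-supported" if head_share > 0.2 else "eyes-only"

print(classify_gaze_shift(eye_rotation_deg=8.0, head_rotation_deg=0.5))    # eyes-only
print(classify_gaze_shift(eye_rotation_deg=15.0, head_rotation_deg=25.0))  # head-supported
```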
... The gaze depth estimation error using VOR gain increases proportionally to the fixation depth, suggesting that this technique may not be appropriate for accurate gaze depth estimation. However, as shown in previous work this is a compelling mechanism for target disambiguation in 3D environments, where objects may be partially occluded at different distances, and when combined with head gestures for selection [Mardanbegi et al. 2012, 2019; Nukarinen et al. 2016]. Unlike vergence-based methods, the VOR method using pupil centre is not reliant on gaze calibration and therefore does not suffer gaze calibration drift which is a common issue in many commercial eye trackers. ...
Conference Paper
Gaze depth estimation presents a challenge for eye tracking in 3D. This work investigates a novel approach to the problem based on eye movement mediated by the vestibulo-ocular reflex (VOR). VOR stabilises gaze on a target during head movement, with eye movement in the opposite direction, and the VOR gain increases the closer the fixated target is to the viewer. We present a theoretical analysis of the relationship between VOR gain and depth which we investigate with empirical data collected in a user study (N=10). We show that VOR gain can be captured using pupil centres, and propose and evaluate a practical method for gaze depth estimation based on a generic function of VOR gain and two-point depth calibration. The results show that VOR gain is comparable with vergence in capturing depth while only requiring one eye, and provide insight into open challenges in harnessing VOR gain as a robust measure.
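A simplified geometric model helps to see why VOR gain carries depth information. The derivation below is an illustration under small-angle assumptions, not the paper's calibrated generic function: with the eye offset a distance r in front of the head's rotation axis and the fixated target at depth D, keeping fixation during a head rotation θ_h requires a counter-rotation θ_e that grows as the target gets closer.

```latex
% Small-angle model (illustrative): eye offset r from the head's rotation axis,
% fixated target at depth D.
\[
  g \;=\; \frac{\theta_e}{\theta_h} \;\approx\; 1 + \frac{r}{D}
  \qquad\Longrightarrow\qquad
  D \;\approx\; \frac{r}{g - 1}.
\]
% Example with assumed numbers: for r = 10\,\mathrm{cm}, a measured gain of g = 1.2
% gives D \approx 50\,\mathrm{cm}, while g = 1.05 gives D \approx 2\,\mathrm{m}.
% The gain tends to 1 for distant targets, which is consistent with the estimation
% error growing with fixation depth, as noted in the citing context above.
```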
... By combining eye-tracking with a 3D gesture detection system, researchers have managed to select and manipulate objects remotely, by gaze and 3D hand gestures, at greater speed [11,35]. The fixed-gaze head movement, which combines head movements with gaze, has also been studied to interact with a screen [19,23,30]. ...
... Similarly, Spakov et al. have studied the same technology using a remote eye-tracker [30]. The work of Nukarinen et al. is not limited to triggering a simple action, as they investigated performance optimization of head movements in complex and continuous interactions [23]. This research makes it possible to couple gaze with head movements without the need for an extra device (apart from the eye-tracker). ...
... To the best of our knowledge, all the researchers have investigated detection of fixed-gaze head movements with the help of eye images captured by an eye-tracker: some of them use head-mounted eye-trackers [19], while others use remote eye-trackers [23,30]. In contrast, in our project we tried to detect fixed-gaze head movements from the scene images captured by the scene camera of the head-mounted eye-tracker, for the purpose of saving computation time. ...
Article
Full-text available
Eye-tracking has a very strong potential in human computer interaction (HCI) as an input modality, particularly in mobile situations. However, it lacks convenient action triggering methods. In our research, we investigate the combination of eye-tracking and fixed-gaze head movement, which allows us to trigger various commands without using our hands or changing gaze direction. In this instance, we have proposed a new algorithm for fixed-gaze head movement detection using only scene images captured by the scene camera equipped in front of the head-mounted eye-tracker, for the purpose of saving computation time. To test the performance of our fixed-gaze head movement detection algorithm and the acceptance of triggering commands by these movements when the user's hands are occupied by another task, we have designed and developed an experimental application known as EyeMusic. The EyeMusic system is a music reading system, which can play the notes of a measure in a music score that the user does not understand. By making a voluntary head movement when fixing his/her gaze on the same point of a music score, the user can obtain the desired audio feedback. The design, development and usability testing of the first prototype for this application are presented in this paper. The usability of our application is confirmed by the experimental results, as 85% of participants were able to use all the head movements we implemented in the prototype. The average success rate of this application is 70%, which is partly influenced by the performance of the eye-tracker we use. The performance of our fixed-gaze head movement detection algorithm is 85%, and there were no significant differences between the performance of each head movement.
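A rough sketch of the general idea, not the paper's algorithm: if the gaze point is stable but the scene-camera image shifts globally between frames, the head must have moved, and the direction of the shift indicates the head movement. The shift threshold, the use of OpenCV phase correlation, and the sign conventions are all assumptions here.

```python
# Hedged sketch: detect a fixed-gaze head movement from consecutive scene-camera frames.
import cv2
import numpy as np

SHIFT_PX = 8.0   # assumed: minimum per-frame global scene shift treated as head motion

def scene_shift(prev_gray: np.ndarray, curr_gray: np.ndarray) -> tuple[float, float]:
    """Global (dx, dy) translation between two grayscale frames via phase correlation."""
    (dx, dy), _response = cv2.phaseCorrelate(np.float32(prev_gray), np.float32(curr_gray))
    return dx, dy

def head_movement(prev_gray, curr_gray, gaze_stable: bool) -> str:
    if not gaze_stable:
        return "none"            # only fixed-gaze movements should trigger commands
    dx, dy = scene_shift(prev_gray, curr_gray)
    if abs(dx) < SHIFT_PX and abs(dy) < SHIFT_PX:
        return "none"
    # Sign convention is an assumption: the scene appears to move opposite to the head,
    # and may need flipping depending on the camera and phaseCorrelate conventions.
    if abs(dx) >= abs(dy):
        return "head-right" if dx < 0 else "head-left"
    return "head-up" if dy > 0 else "head-down"
```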
... Our results show that the VOR depth estimation even outperformed the vergence method in the wide-range scene condition by 18%. Since our method requires head movements for estimating gaze depth, we see in particular that it can be used in combination with other methods that combine gaze and head movements for interaction (e.g., [28,31]) when used in 3D. One application example could be to resolve target ambiguity when continuous head movements are used to adjust continuous parameters of different objects in 3D (for example adjusting the volume of a TV that is partially occluded by another device). ...
Conference Paper
Full-text available
Target disambiguation is a common problem in gaze interfaces, as eye tracking has accuracy and precision limitations. In 3D environments this is compounded by objects overlapping in the field of view, as a result of their positioning at different depth with partial occlusion. We introduce VOR depth estimation, a method based on the Vestibulo-ocular reflex of the eyes in compensation of head movement, and explore its application to resolve target ambiguity. The method estimates gaze depth by comparing the rotations of the eye and the head when the users look at a target and deliberately rotate their head. We show that VOR eye movement presents an alternative to vergence for gaze depth estimation, that is feasible also with monocular tracking. In an evaluation of its use for target disambiguation, our method outperforms vergence for targets presented at greater depth.