Hand Tracking for Immersive Virtual
Reality: Opportunities and Challenges
Gavin Buckingham*
Department of Sport and Health Sciences, University of Exeter, Exeter, United Kingdom
Hand tracking has become an integral feature of recent generations of immersive virtual
reality head-mounted displays. With the widespread adoption of this feature, hardware
engineers and software developers are faced with an exciting array of opportunities and a
number of challenges, mostly in relation to the human user. In this article, I outline what I
see as the main possibilities for hand tracking to add value to immersive virtual reality as
well as some of the potential challenges in the context of the psychology and neuroscience
of the human user. It is hoped that this paper serves as a roadmap for the development of
best practices in the field for the development of subsequent generations of hand tracking
and virtual reality technologies.
Keywords: VR, embodiment, psychology, communication, inclusivity
Immersive virtual reality (iVR) systems have recently seen a huge growth due to reductions in
hardware costs and a wealth of software use cases. In early consumer models of the Oculus Rift Head-
Mounted Display (HMD), interactions with the environment (a key hallmark of iVR) were usually
performed with hand-held controllers. Hands were visualized in games and applications
(infrequently) in a limited array of poses based on finger position, assumed from contact with
triggers and buttons on these controllers. Although the ability to visualize the positions of individual
digits was possible with external motion tracking and/or "dataglove" peripherals which measured
finger joint angles and rotations, these technologies were prohibitively expensive and were unreliable
without careful calibration. A step change in hand tracking occurred with the Leap Motion Tracker, a
small encapsulated infra-red emitter and optical camera developed with the goal of having people
interacting with desktop machines by gesturing at the screen. This device was very small, required no
external power source, and was able to track the movements of individual digits in three dimensions
using a stereo camera system with reasonable precision (Guna et al., 2014). Significant improvements
in software, presumably through a clever use of inverse kinematics, along with a free software-
development kit and a strong user base in the Unity and Unreal Game Engine communities led to a
proliferation of accessible hand tracking addons and experiences tailor-made for iVR. Since then,
hand tracking has become embedded into the hardware of recent generations of iVR HMDs (e.g., the
first and second iterations of the Oculus Quest) through so-called "inside-out" tracking, and looks set
to continue to evolve with emerging technologies such as wrist-worn electromyography (Inside
Facebook Reality Labs, 2021). This paper will briefly outline the main use-cases of hand tracking in
VR, and then discuss in some detail the outstanding issues and challenges which developers need to
keep in mind when developing such experiences.
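The inverse-kinematics step alluded to above can be made concrete with a toy sketch. The two-segment planar "finger", the function names, and the numbers are all invented for illustration; this is not Leap Motion's actual algorithm, merely the textbook two-link solution via the law of cosines:

```python
import math

def two_link_ik(x, y, l1, l2):
    """Analytic inverse kinematics for a planar two-segment 'finger'.

    Given a fingertip target (x, y) and segment lengths l1, l2, return
    the base and middle joint angles (radians) that place the tip on
    the target, or None if the target is out of reach.
    """
    d2 = x * x + y * y
    d = math.sqrt(d2)
    if d > l1 + l2 or d < abs(l1 - l2):
        return None  # target unreachable with these segment lengths
    # Law of cosines gives the middle-joint (knuckle) bend...
    cos_mid = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    theta_mid = math.acos(max(-1.0, min(1.0, cos_mid)))
    # ...and the base angle is the bearing to the target minus the
    # deflection introduced by the bent second segment.
    theta_base = math.atan2(y, x) - math.atan2(
        l2 * math.sin(theta_mid), l1 + l2 * math.cos(theta_mid)
    )
    return theta_base, theta_mid

def forward(theta_base, theta_mid, l1, l2):
    """Forward kinematics: joint angles back to fingertip position."""
    x = l1 * math.cos(theta_base) + l2 * math.cos(theta_base + theta_mid)
    y = l1 * math.sin(theta_base) + l2 * math.sin(theta_base + theta_mid)
    return x, y
```

Solvers of this kind recover plausible joint angles from a tracked fingertip position; commercial trackers fit a full articulated hand model under many more constraints.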
Edited by: Nadia Magnenat Thalmann, Université de Genève, Switzerland
Reviewed by: Antonella Maselli, Italian National Research Council, Italy; Richard Skarbez, La Trobe University, Australia
Specialty section: This article was submitted to Virtual Reality and Human Behaviour, a section of the journal Frontiers in Virtual Reality
Received: 21 June 2021; Accepted: 24 September 2021; Published: 20 October 2021
Citation: Buckingham G (2021) Hand Tracking for Immersive Virtual Reality: Opportunities and Challenges. Front. Virtual Real. 2:728461. doi: 10.3389/frvir.2021.728461

Opportunities: Why Hand Tracking?
Our hands, with the dexterity afforded by our opposable thumbs, are one of the canonical features
which separate us from non-human primates. We use our hands to gesture, feel, and interact with
our environment almost every minute of our waking lives. When
we are prevented from, or limited in, using our hands, we are
profoundly impaired, with a range of once-mundane tasks
becoming frustratingly awkward. Below, I briefly outline three
significant potential benefits of having tracked hands in a virtual
environment.
Opportunity 1: Increased Immersion and Presence
The degree to which a user can perceive a virtual environment
through the sensorimotor contingencies they would encounter in
the physical environment is termed "immersion" (Slater and
Sanchez-Vives, 2016). The subjective experience of being in a
highly-immersive virtual environment is known as "presence",
and recent empirical evidence suggests that being able to see one's
tracked hands animated in real time in a virtual environment is an
extremely compelling method of engagement (Voigt-Antons
et al., 2020). Research has shown that we have an almost
preternatural sense of our hands' positions and shape when
they are obscured (Dieter et al., 2014), and when our hands
are removed from our visual worlds it is a stark reminder of our
disembodiment. Indeed, we spend the majority of our time
during various mundane tasks foveating our hands (Land,
2009), so removing them from the visual scene presumably
has a range of consequences for our visuomotor behaviour.
Opportunity 2: More Effective Interaction
The next point to raise is that of interaction. A key goal of virtual
reality is to allow the user to interact with the computer-generated
environment in a natural fashion. This interaction can be
achieved in its simplest form by the user by moving their head
to experience the wide visual world. More modern VR
experiences, however, usually involve some form of manual
interaction, from opening doors to wielding weapons.
Accurate tracking of the hands potentially allows for far more
precise interactions than would be possible with controllers,
adding not only to the user's immersion (Argelaguet et al.,
2016; Pyasik et al., 2020), but even to the accuracy of their
movements (Vosinakis and Koutsabasis, 2018), which seems
particularly key in the context of training (Harris et al., 2020).
Opportunity 3: More Effective Communication
The final point to discuss is that of communication, and in
particular manual gesticulation: the use of one's hands to
emphasize words and punctuate sentences through a series of
gestures. "Gestures" in the context of HCI has come to mean the
swipes and pinching motions used to perform commands.
However, the involuntary movements of hands during natural
communication appear to play a significant role not just for the
listener, but also the communicator, to such an extent that
conversations between two congenitally blind individuals
contain as many gestures as conversations between sighted
individuals (Iverson and Goldin-Meadow, 1998; Özçalışkan
et al., 2016). Indeed, recent research has shown that
individuals are impaired in recognizing a number of key
emotions in the images of bodies which have the hands
removed (Ross and Flack, 2020), highlighting how important
hand form information is in communicative experiences. The
value of manual gestures for communication in virtual
environments is compounded given that veridical real-time
face tracking and visualization is technically very difficult due
to the extremely high temporal and spatial resolution required to
detect and track microexpressions. Furthermore, computer-
generated faces are particularly prone to large uncanny-valley-
like effects, whereby faces which fall just short of being realistic
elicit a strong sense of unease (MacDorman et al., 2009;
McDonnell and Breidt, 2010). Significant recent strides have
been made in tracking and rendering photorealistic faces
(Schwartz et al., 2020), but the hardware costs are likely to be
prohibitive for the current generation of consumer-based VR
technologies. Tracking and rendering of the hands, with their
large and expressive kinematics, should thus be a strong focus for
communicative avatars in the short term.
Challenge 1: Object Interaction
Our hands are one of our main ways to effect change in the
environment around us. Thus, one of the main reasons to
visualise hands in VR is to facilitate and encourage
interactions with the virtual environment. From opening doors
to wielding weapons, computer-generated hands are an integral
part of many game experiences across many platforms. As
outlined above, these manual interactions are typically
generated by reverse-engineering interactions with a held
controller. For example, on the Oculus Quest 2 controller, if
the buttons underneath the index and middle fingers are lightly
depressed, the hand appears to close slightly; if the buttons are
fully depressed, the hand closes into a fist. Not only does this
method of interacting with the world feel quite engaging, it elicits
a greater sense of ownership over the seen hand than a
visualization of the held controller itself (Lavoie and
Chapman, 2021). But despite the compelling nature of this
experience, hand tracking offers the promise of a real-time
veridical representation of the hand's true actions, requiring
no mapping of physical to seen actions and untethered from
any extraneous hardware. Anecdotally, however, interacting with
virtual objects using hand tracking feels imprecise and difficult to
use, which is supported by recent findings showing that during a
block-moving task hands tracked with a Leap Motion tracker
score lower on the System Usability Scale than hands tracked with
a hand-held controller (Masurovsky et al., 2020). Furthermore,
subjective Likert ratings on a number of descriptive metrics
suggested that the controller-free interaction felt significantly
less comfortable and less precise than the controller-based
interactions. Even more worryingly, this same article noted
that participants performed worse on a number of
performance metrics when their hands were tracked with the
Leap than with the controller.
It is likely that the main reason that controller-free hand
tracking is problematic during object interaction is the lack of
tactile and haptic cues in this context. Tactile cues are a key part of
successful manual actions, and their removal impairs the
accuracy of manual localization (Rao and Gordon, 2001), alters
grasping kinematics (Whitwell et al., 2015; Furmanek et al., 2019;
Ozana et al., 2020; Mangalam et al., 2021), and affects the normal
application of fingertip forces (Buckingham et al., 2016). While
controller-based interactions with virtual objects do not deliver
the same tactile and haptic sensations experienced when
interacting with objects in the physical environment, the
vibro-tactile pulses and the mass of the controllers do seem to
aid in scaffolding a compelling percept of touching something. A
range of solutions to replace tactile feedback in the context of VR
have been developed in recent years. From a hardware
perspective, solutions range from glove-like devices which
provide tactile feedback and force feedback to the digits
(Carlton, 2021) to stimuli which precisely deform the
fingertips to create a sensation of the mechanics of interaction
(Schorr and Okamura, 2017) to devices which deliver contactless
ultrasonic pulses aimed at the hands to simulate tactile cues
(Rakkolainen et al., 2019). Researchers have also used a lower-
cost mixed reality solution known as "haptic retargeting", where
an individual interacts with a single physical peripheral and the
apparent position and orientation of the hands are subtly
manipulated to create the illusion of interacting with a range
of different objects (Azmandian et al., 2016; Clarence et al., 2021).
It is currently unclear which of these solutions (or one hitherto
unforeseen) will solve this issue, but it is clearly a major challenge
for the broad uptake of immersive virtual reality.
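The "body warping" idea behind haptic retargeting can be sketched in a few lines. This is an illustrative reduction, assuming a simple linear blend of the offset with reach progress; the published techniques (e.g., Azmandian et al., 2016) are considerably more sophisticated, and every name here is hypothetical:

```python
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(v, s): return tuple(x * s for x in v)
def dist(a, b): return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def retargeted_hand(real_hand, reach_start, physical_prop, virtual_target):
    """Blend a virtual-hand offset in gradually over the course of a reach,
    so the real hand lands on the one physical prop while the rendered
    hand appears to land on the virtual target."""
    total = dist(reach_start, physical_prop)
    # Fraction of the reach completed (0 at the start, 1 at the prop).
    progress = min(1.0, dist(reach_start, real_hand) / total) if total else 1.0
    # Offset that must have fully accrued by the end of the reach.
    full_offset = sub(virtual_target, physical_prop)
    return add(real_hand, scale(full_offset, progress))
```

Because the offset accrues gradually over the reach rather than appearing all at once, the visual-proprioceptive mismatch at any instant stays small enough to go largely unnoticed.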
Challenge 2: Tracking Location
With "inside-out" cameras in current consumer models (e.g., the
Oculus Quest 2), hand tracking is at its most reliable when the
hands are roughly in front of the face, presumably to maximise the
overlap of the fields of view of the individual cameras which track
the hands. In these headsets, the orientation of these cameras is
fixed, presumably due to the assumption that participants will be
looking at what they are doing in VR. This assumption is probably
appropriate for discrete game-style events; it is well-established
that individuals foveate the hands and the action endpoint during
goal-directed tasks (Desmurget et al., 1998; Johansson et al., 2001;
Lavoie et al., 2018). In more natural sequences of tasks (e.g.,
preparing food), however, the hands are likely to spend a
significant proportion of time in the lower visual field due to
their physical location below the head. This asymmetry in the
common locations of the hand during many tasks was discussed in
the context of a lower visual field specialization for manual action
by Previc (1990) and has received support from a range of
studies showing that humans are more efficient at utilizing visual
feedback to guide effective reaching toward targets in their lower
visual field than their upper visual field (Danckert and Goodale,
2001; Khan and Lawrence, 2005; Krigolson and Heath, 2006). This
behavioural work is supported by evidence from the visual system
for a lower visual field speciality for factors related to action
(Schmidtmann et al., 2015; Zhou et al., 2017), as well as
neuroimaging evidence that grasping objects in the lower visual
field preferentially activates a network of dorsal brain regions
specialised for planning and controlling visually-guided actions
(Rossit et al., 2013). As the range of tasks undertaken in VR widens
to include more natural everyday experiences where the hands
might be engaged in tasks in the lower visual field, limitations of
tracking and visualization in this region of space will likely become
more apparent. Indeed, this issue is not only one of tracking, but
also of hardware field of view. Currently, the main focus on field of
view is concerned with increasing the lateral extent, with little
consideration given to the fact that the "letterbox" shape of
most VR HMDs reduces the vertical field of view in the lower
visual field by more than 10% compared to that which the eye
affords in the physical environment (Kreylos, 2016; Kreylos, 2019).
Together, these issues of tracking limitations and physical
occlusion are likely to result in unnatural head movements in
manual tasks to ensure the hands are kept in view, which could limit
the transfer of training from virtual to physical environments, or
significantly impact immersion as the hands disappear from
peripheral view at an unexpected or inconsistent point.
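The tracking-volume problem can be illustrated with a simple angular test of whether a head-relative hand direction falls inside the cameras' combined frustum. The coordinate convention and the half-FOV values are placeholders invented for the example, not any headset's specification:

```python
import math

def hand_in_tracking_fov(hand_dir, half_fov_h_deg=50.0, half_fov_v_deg=50.0):
    """Return True if a head-relative hand direction lies inside a
    rectangular tracking frustum.

    hand_dir: (x, y, z) in head coordinates, +z forward, +x right, +y up.
    The half-FOV limits are illustrative placeholders.
    """
    x, y, z = hand_dir
    if z <= 0:
        return False  # hand is beside or behind the head-mounted cameras
    yaw = math.degrees(math.atan2(x, z))    # lateral angle off straight ahead
    pitch = math.degrees(math.atan2(y, z))  # vertical angle off straight ahead
    return abs(yaw) <= half_fov_h_deg and abs(pitch) <= half_fov_v_deg
```

A hand held straight ahead, (0, 0, 1), passes; the same hand dropped toward a worktop, e.g., (0, -0.9, 0.5), sits roughly 61° below the forward axis and falls outside a ±50° vertical frustum, which is exactly the situation that forces the compensatory head movements described above.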
Challenge 3: Uncanny Phenomenon and Embodiment
The uncanny phenomenon (sometimes referred to as the
"uncanny valley") refers to the lack of affinity, yielding feelings
of unease or disgust, when looking at, or interacting with,
something artificial which falls just short of appearing natural
(Mori, 1970; Wang et al., 2015). The cause of this effect is still
undetermined, but recent studies have suggested that this effect
might be driven by mismatches between the apparently-biological
appearance of the offending stimuli and non-biological
kinematics and/or inappropriate features such as temperature
and surface textures (Saygin et al., 2012; Kätsyri et al., 2015). The
main triggers for uncanny valley seem to be in the realms of
computer-generated avatars (MacDorman et al., 2009;
McDonnell and Breidt, 2010) and interactive humanoid robots
(Destephe et al., 2015; Strait et al., 2017) and, as such, much of
the research into this topic has focussed on faces. Recent studies have
suggested that this effect is amplified when experienced through
an HMD (Hepperle et al., 2020), highlighting the importance of
this factor in the context of tracked VR experiences.
Little work has, by contrast, examined such responses toward
hands. In the context of prosthetic hands, Poliakoff et al. (2013,
2018) demonstrated that images of life-like prosthetic hands were
rated as more eerie than anatomical or robotic hands in
equivalent poses. This effect appears to be eliminated in some
groups with extensive experience (e.g., in observers who
themselves have a limb absence), but is still strongly
experienced by prosthetists and non-amputees trained to use a
prosthetic hand simulator (Buckingham et al., 2019). Given the
strong possibility of inducing a presence-hindering effect if
virtual hands are sufficiently disconcerting (Brenton et al.,
2005), it seems prudent to recommend outline or cartoon
hands as the norm for even strongly-embodied VR
experiences. This suggestion is particularly important for
"untethered" HMDs, due to the fact that rendering
photorealistic images of hands tracked at the high frequencies
required to visualize the full range of dextrous actions will require
significant computing power. A final point in this regard which
experience, but a multisensory one. For example, it has been
shown that users' experience of their presence in VR rapidly
declines when the visual cues in a VR scenario do not match with
the degree of haptic feedback (Berger et al., 2018). Furthermore, it
has recently been shown that when the artificiality of tactile cues
and visual cues are mismatched, this can also generate a reduction
in feelings of ownership (D'Alonzo et al., 2019). Thus, if tactile
cues are to become a feature of hand tracking and visualization,
care must be taken to avoid features of this so-called "haptic
uncanny valley" (Berger et al., 2018).
A more general issue than hedonic perception which developers
must grapple with is so-called "embodiment": the feeling of
ownership that one feels toward an effector that they are
controlling. This term is usually discussed in the context of a
body part or a tool, so it has clear implications in the context of
hand tracking in VR (Kilteni et al., 2012), and is usually measured
either through subjective questionnaires or ostensibly objective
measures of felt body position and physiological responses to
threat. Anecdotally, the dynamic and precise experience of viewing
computer-generated hands which are being tracked yields an
extremely strong sense of embodiment which does not require a
lengthy period of training or induction. In the context of virtual hands
presented through an HMD, the literature suggests that embodiment
happens naturally with realistic and veridical stimuli. Pyasik et al.
(2020) have shown that participants feel stronger levels of ownership
toward 3-D scans of their own hand than they did toward an
artificially-smoothed and whitened hand. Furthermore, it has been
shown that feelings of embodiment are enhanced when the virtual
hands appear to be connected to the body rather than disembodied
(Seinfeld and Müller, 2020). At the time of writing, however, much
work remains to be done to build up a comprehensive picture of what
visual factors are required to balance embodiment, enjoyment, and
effective interaction with virtual environments.
Challenge 4: Inclusivity
Inclusivity is an increasingly important ethical issue in technology
(Birhane, 2021), and the development of hand tracking and
visualization in iVR throws up a series of unique challenges in
this regard. A fundamental part of marker-free hand tracking is to
segment the skin from the surrounding background to build, and
ultimately visualize, the dynamics of the hand. One potential
issue which has not received explicit consideration is that of skin
pigmentation. There are a number of recent anecdotal examples
(Fussell, 2017), framed around hardware limitations, where items
from automatic soap dispensers to heart-rate monitors fail to
function as effectively for individuals with darker skin tones
(which are less reflective) than lighter skin tones (which are more
reflective). It is critical that, as iVR is more
widely adopted, the cameras which track the hands are able to
adequately image all levels of skin pigmentation.
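To illustrate the failure mode (and emphatically not any real tracker's pipeline), consider a deliberately naive brightness-threshold segmenter applied to synthetic "hand" patches of different mean reflectance; all names and numbers here are arbitrary:

```python
import numpy as np

def naive_skin_mask(gray_frame, threshold=90):
    """Deliberately naive segmenter: pixels brighter than a fixed
    threshold count as 'hand'. Real pipelines are far more
    sophisticated, but any stage tuned on a narrow range of skin
    reflectances degrades in the same way."""
    return gray_frame > threshold

def hand_recall(skin_mean, threshold=90, seed=0):
    """Fraction of a synthetic 'hand' patch that the naive segmenter
    recovers, for a given mean skin brightness (0-255 grayscale)."""
    rng = np.random.default_rng(seed)
    hand_patch = rng.normal(skin_mean, 15, size=(32, 32))
    return float(naive_skin_mask(hand_patch, threshold).mean())

# A threshold tuned for bright patches recovers nearly all of the
# light patch but misses most of the dark one.
print(f"lighter tone recall: {hand_recall(170):.0%}")
print(f"darker tone recall:  {hand_recall(80):.0%}")
```

The same asymmetry applies to learned models: a pipeline whose training data over-represents one band of reflectances will degrade in the same direction, which is the algorithmic-bias concern raised below.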
A related issue comes from the software which is used to turn the
images captured by the cameras into dynamic models of the hands,
using models of possible hand configurations (inverse kinematics).
These models, assuming they are built from training sets, are likely to
suffer from the same algorithmic bias which has been problematic in
face classification research (Buolamwini and Gebru, 2018), with
datasets largely derived from Caucasian males yielding startling
disparities in levels of misclassification across skin type and
gender. This issue becomes one not just of skin pigmentation, but
of gender, age, disability, and skin texture and presumably will be
exacerbated at these intersections. Any hardware and software which
aims to cater for the "average user" risks leaving hand tracking
functionally unavailable to large portions of society. One possible
solution to this could be to have users generate their own
personalised training sets, akin to the personalized "voice profiles"
used in some speech recognition software and home assistant devices.
The final issue on this topic relates to the visualization of the
hands, related to the discussion of embodiment in the section
above. Although the current norm for hand visualization is for
outline or cartoon-style hands which lack distinguishing features,
presumably there will be a drive for the visualization of more
realistic-looking hands. As is becoming standard for facial avatars
in CG environments, it is important for individuals to be able to
develop a model in the virtual environment that steps away from the
"default" of an able-bodied Caucasian male or female toward one
which accurately represents their bodily characteristics (or,
indeed, that of another). This can be jarring: for example, it
has been shown that the appearance of opposite-gender hands
reduces women's experience of presence in virtual environments
(Schwind et al., 2017). With hands, this is also likely to be
particularly important from an embodiment perspective, with
an emerging body of literature suggesting that individuals are less
able to embody hands which appear to be from a visibly different
skin tone than their own (Farmer et al., 2012;Lira et al., 2017).
In summary, hand tracking is probably here to stay as a cardinal
(but probably still optional) feature of immersive virtual reality.
The opportunities for facilitating effective and engaging
interpersonal communication and more formal presentations
in a remote context are particularly exciting for many aspects of
our social, teaching, and learning worlds. Being cognisant of the
challenges which come with these opportunities is a first step
toward developing a clear series of best practices to aid in the
development of the next generation of VR hardware and
immersive experiences.
Author Contributions
GB conceived and wrote the manuscript.
Acknowledgments
The author would like to thank João Mineiro for his comments on
an earlier draft of this manuscript.
Argelaguet, F., Hoyet, L., Trico, M., and Lecuyer, A. (2016). "The Role of
Interaction in Virtual Embodiment: Effects of the Virtual Hand
Representation," in 2016 IEEE Virtual Reality (VR). Presented at the 2016
IEEE Virtual Reality (VR), 3–10. doi:10.1109/VR.2016.7504682
Azmandian, M., Hancock, M., Benko, H., Ofek, E., and Wilson, A. D. (2016).
"Haptic Retargeting: Dynamic Repurposing of Passive Haptics for Enhanced
Virtual Reality Experiences," in Proceedings of the 2016 CHI Conference on
Human Factors in Computing Systems, New York, NY, USA (Association for
Computing Machinery), 1968–1979.
Berger, C. C., Gonzalez-Franco, M., Ofek, E., and Hinckley, K. (2018). The
Uncanny Valley of Haptics. Sci. Robot. 3, eaar7010. doi:10.1126/
Birhane, A. (2021). Algorithmic Injustice: A Relational Ethics Approach. Patterns
2, 100205. doi:10.1016/j.patter.2021.100205
Brenton, H., Gillies, M., Ballin, D., and Chatting, D. (2005). "The Uncanny
Valley: Does it Exist," in 19th British HCI Group Annual Conference:
Workshop on Human-Animated Character Interaction.
Buckingham, G., Michelakakis, E. E., and Cole, J. (2016). Perceiving and Acting
upon Weight Illusions in the Absence of Somatosensory Information.
J. Neurophysiol. 115, 1946–1953. doi:10.1152/jn.00587.2015
Buckingham, G., Parr, J., Wood, G., Day, S., Chadwell, A., Head, J., et al. (2019).
Upper- and Lower-Limb Amputees Show Reduced Levels of Eeriness for
Images of Prosthetic Hands. Psychon. Bull. Rev. 26, 1295–1302. doi:10.3758/
Buolamwini, J., and Gebru, T. (2018). "Gender Shades: Intersectional Accuracy
Disparities in Commercial Gender Classification," in Conference on Fairness,
Accountability and Transparency. Presented at the Conference on Fairness,
Accountability and Transparency (PMLR), 77–91.
Carlton, B. (2021). HaptX Launches True-Contact Haptic Gloves for VR and
Robotics. VRScout. Available at:
haptic-gloves-vr/(accessed 10 3, 21).
Clarence, A., Knibbe, J., Cordeil, M., and Wybrow, M. (2021). "Unscripted
Retargeting: Reach Prediction for Haptic Retargeting in Virtual Reality," in
2021 IEEE Virtual Reality and 3D User Interfaces (VR). Presented at the 2021
IEEE Virtual Reality and 3D User Interfaces (VR), 150–159. doi:10.1109/
D'Alonzo, M., Mioli, A., Formica, D., Vollero, L., and Di Pino, G. (2019). Different
Level of Virtualization of Sight and Touch Produces the Uncanny Valley of
Avatar's Hand Embodiment. Sci. Rep. 9, 19030. doi:10.1038/s41598-019-
Desmurget, M., Pélisson, D., Rossetti, Y., and Prablanc, C. (1998). From Eye to
Hand: Planning Goal-Directed Movements. Neurosci. Biobehav. Rev. 22,
761–788. doi:10.1016/s0149-7634(98)00004-9
Destephe, M., Brandao, M., Kishi, T., Zecca, M., Hashimoto, K., and Takanishi, A.
(2015). Walking in the Uncanny Valley: Importance of the Attractiveness on
the Acceptance of a Robot as a Working Partner. Front. Psychol. 6, 204.
Dieter, K. C., Hu, B., Knill, D. C., Blake, R., and Tadin, D. (2014). Kinesthesis Can
Make an Invisible Hand Visible. Psychol. Sci. 25, 66–75. doi:10.1177/
Farmer, H., Tajadura-Jiménez, A., and Tsakiris, M. (2012). Beyond the Colour of
My Skin: How Skin Colour Affects the Sense of Body-Ownership. Conscious.
Cogn. 21, 1242–1256. doi:10.1016/j.concog.2012.04.011
Furmanek, M. P., Schettino, L. F., Yarossi, M., Kirkman, S., Adamovich, S. V., and
Tunik, E. (2019). Coordination of Reach-To-Grasp in Physical and Haptic-Free
Virtual Environments. J. Neuroengineering Rehabil. 16, 78. doi:10.1186/s12984-
Fussell, S. (2017). Why Can't This Soap Dispenser Identify Dark Skin? [WWW
Document]. Gizmodo. Available at:
dark-skin-1797931773 (accessed 9 3, 21).
Goodale, M. A., and Danckert, J. (2001). Superior Performance for Visually Guided
Pointing in the Lower Visual Field. Exp. Brain Res. 137, 303–308. doi:10.1007/
Guna, J., Jakus, G., Pogačnik, M., Tomažič, S., and Sodnik, J. (2014). An Analysis of
the Precision and Reliability of the Leap Motion Sensor and its Suitability for
Static and Dynamic Tracking. Sensors 14, 3702–3720. doi:10.3390/s140203702
Harris, D. J., Bird, J. M., Smart, P. A., Wilson, M. R., and Vine, S. J. (2020). A
Framework for the Testing and Validation of Simulated Environments in
Experimentation and Training. Front. Psychol. 11, 605. doi:10.3389/
Hepperle, D., Ödell, H., and Wölfel, M. (2020). "Differences in the Uncanny Valley
between Head-Mounted Displays and Monitors," in 2020 International
Conference on Cyberworlds (CW). Presented at the 2020 International
Conference on Cyberworlds (CW), 41–48. doi:10.1109/CW49994.2020.00014
Inside Facebook Reality Labs (2021). Wrist-based Interaction for the Next
Computing Platform [WWW Document]. Facebook Technol. Available at:
the-next-computing-platform/(accessed 3 18, 21).
Iverson, J. M., and Goldin-Meadow, S. (1998). Why People Gesture when They
Speak. Nature 396, 228. doi:10.1038/24300
Johansson, R. S., Westling, G., Bäckström, A., and Flanagan, J. R. (2001). Eye-Hand
Coordination in Object Manipulation. J. Neurosci. 21, 6917–6932. doi:10.1523/
Kätsyri, J., Förger, K., Mäkäräinen, M., and Takala, T. (2015). A Review of
Empirical Evidence on Different Uncanny Valley Hypotheses: Support for
Perceptual Mismatch as One Road to the valley of Eeriness. Front. Psychol. 6,
390. doi:10.3389/fpsyg.2015.00390
Khan, M. A., and Lawrence, G. P. (2005). Differences in Visuomotor Control
between the Upper and Lower Visual Fields. Exp. Brain Res. 164, 395–398.
Kilteni, K., Groten, R., and Slater, M. (2012). The Sense of Embodiment in Virtual
Reality. Presence 21, 373–387. doi:10.1162/PRES_a_00124
Kreylos, O. (2016). Optical Properties of Current VR HMDs [WWW Document].
Doc-Okorg. Available at: (accessed 9 3, 21).
Kreylos, O. (2019). Quantitative Comparison of VR Headset Fields of View
[WWW Document]. Doc-Okorg. Available at:
20200328103226/ (accessed 9 3, 21).
Krigolson, O., and Heath, M. (2006). A Lower Visual Field Advantage for Endpoint
Stability but No Advantage for Online Movement Precision. Exp. Brain Res.
170, 127–135. doi:10.1007/s00221-006-0386-x
Land, M. F. (2009). Vision, Eye Movements, and Natural Behavior. Vis. Neurosci.
26, 51–62. doi:10.1017/S0952523808080899
Lavoie, E. B., Valevicius, A. M., Boser, Q. A., Kovic, O., Vette, A. H., Pilarski, P. M.,
et al. (2018). Using Synchronized Eye and Motion Tracking to Determine High-
Precision Eye-Movement Patterns during Object-Interaction Tasks. J. Vis. 18,
18. doi:10.1167/18.6.18
Lavoie, E., and Chapman, C. S. (2021). What's Limbs Got to Do with it? Real-
World Movement Correlates with Feelings of Ownership over Virtual Arms
during Object Interactions in Virtual Reality. Neurosci. Conscious. 7 (1),
niaa027. doi:10.1093/nc/niaa027
Lira, M., Egito, J. H., Dall'Agnol, P. A., Amodio, D. M., Gonçalves, Ó. F., and Boggio,
P. S. (2017). The Influence of Skin Colour on the Experience of Ownership in the
Rubber Hand Illusion. Sci. Rep. 7, 15745. doi:10.1038/s41598-017-16137-3
MacDorman, K. F., Green, R. D., Ho, C.-C., and Koch, C. T. (2009). Too Real for
comfort? Uncanny Responses to Computer Generated Faces. Comput. Hum.
Behav. 25, 695–710. doi:10.1016/j.chb.2008.12.026
Mangalam, M., Yarossi, M., Furmanek, M. P., and Tunik, E. (2021). Control of
Aperture Closure during Reach-To-Grasp Movements in Immersive Haptic-
Free Virtual Reality. Exp. Brain Res. 239 (5), 1651–1665. doi:10.1007/s00221-
Masurovsky, A., Chojecki, P., Runde, D., Lafci, M., Przewozny, D., and Gaebler, M.
(2020). Controller-Free Hand Tracking for Grab-And-Place Tasks in
Immersive Virtual Reality: Design Elements and Their Empirical Study.
Multimodal Technol. Interact. 4, 91. doi:10.3390/mti4040091
McDonnell, R., and Breidt, M. (2010). "Face Reality: Investigating the Uncanny
Valley for Virtual Faces," in ACM SIGGRAPH ASIA 2010 Sketches, SA '10,
New York, NY, USA (Association for Computing Machinery), 1–2.
Mori, M. (1970). Bukimi No Tani [The Uncanny Valley]. Energy 7, 33–35.
Ozana, A., Berman, S., and Ganel, T. (2020). Grasping Weber's Law in a Virtual
Environment: The Effect of Haptic Feedback. Front. Psychol. 11, 573352.
Özçalışkan, Ş., Lucero, C., and Goldin-Meadow, S. (2016). Is Seeing Gesture
Necessary to Gesture Like a Native Speaker. Psychol. Sci. 27, 737747.
Poliakoff, E., Beach, N., Best, R., Howard, T., and Gowen, E. (2013). Can Looking at
a Hand Make Your Skin Crawl? Peering into the Uncanny Valley for Hands.
Perception 42, 998–1000. doi:10.1068/p7569
Poliakoff, E., O'Kane, S., Carefoot, O., Kyberd, P., and Gowen, E. (2018).
Investigating the Uncanny Valley for Prosthetic Hands. Prosthet. Orthot. Int.
42, 21–27. doi:10.1177/0309364617744083
Previc, F. H. (1990). Functional Specialization in the Lower and Upper Visual Fields
in Humans: Its Ecological Origins and Neurophysiological Implications. Behav.
Brain Sci. 13, 519–542. doi:10.1017/S0140525X00080018
Pyasik, M., Tieri, G., and Pia, L. (2020). Visual Appearance of the Virtual Hand
Affects Embodiment in the Virtual Hand Illusion. Sci. Rep. 10, 5412.
Rakkolainen, I., Sand, A., and Raisamo, R. (2019). "A Survey of Mid-air Ultrasonic
Tactile Feedback," in 2019 IEEE International Symposium on Multimedia
(ISM). Presented at the 2019 IEEE International Symposium on Multimedia
(ISM), 94–944. doi:10.1109/ISM46123.2019.00022
Rao, A., and Gordon, A. (2001). Contribution of Tactile Information to Accuracy
in Pointing Movements. Exp. Brain Res. 138, 438–445. doi:10.1007/
Ross, P., and Flack, T. (2020). Removing Hand Form Information Specifically
Impairs Emotion Recognition for Fearful and Angry Body Stimuli. Perception
49, 98–112. doi:10.1177/0301006619893229
Rossit, S., McAdam, T., Mclean, D. A., Goodale, M. A., and Culham, J. C. (2013).
fMRI Reveals a Lower Visual Field Preference for Hand Actions in Human
Superior Parieto-Occipital Cortex (SPOC) and Precuneus. Cortex 49,
2525–2541. doi:10.1016/j.cortex.2012.12.014
Saygin, A. P., Chaminade, T., Ishiguro, H., Driver, J., and Frith, C. (2012). The
Thing that Should Not Be: Predictive Coding and the Uncanny Valley in
Perceiving Human and Humanoid Robot Actions. Soc. Cogn. Affect.
Neurosci. 7, 413–422. doi:10.1093/scan/nsr025
Schmidtmann, G., Logan, A. J., Kennedy, G. J., Gordon, G. E., and Loffler, G.
(2015). Distinct Lower Visual Field Preference for Object Shape. J. Vis. 15, 18.
Schorr, S. B., and Okamura, A. M. (2017). "Fingertip Tactile Devices for Virtual
Object Manipulation and Exploration," in Proceedings of the 2017 CHI
Conference on Human Factors in Computing Systems, CHI '17, New York,
NY, USA (Association for Computing Machinery), 3115–3119. doi:10.1145/
Schwartz, G., Wei, S.-E., Wang, T.-L., Lombardi, S., Simon, T., Saragih, J., et al.
(2020). The Eyes Have It. ACM Trans. Graph. 39, Article 91, 91:1–91:15.
Schwind, V., Knierim, P., Tasci, C., Franczak, P., Haas, N., and Henze, N. (2017).
"These Are Not My Hands!", in Proceedings of the 2017 CHI Conference on
Human Factors in Computing Systems. Presented at the CHI '17: CHI
Conference on Human Factors in Computing Systems, Denver, Colorado,
USA (ACM), 1577–1582. doi:10.1145/3025453.3025602
Seinfeld, S., and Müller, J. (2020). Impact of Visuomotor Feedback on the
Embodiment of Virtual Hands Detached from the Body. Sci. Rep. 10, 22427.
Slater, M., and Sanchez-Vives, M. V. (2016). Enhancing Our Lives with Immersive
Virtual Reality. Front. Robot. AI. 3, 74. doi:10.3389/frobt.2016.00074
Strait, M. K., Floerke, V. A., Ju, W., Maddox, K., Remedios, J. D., Jung, M. F., et al.
(2017). Understanding the Uncanny: Both Atypical Features and Category
Ambiguity Provoke Aversion toward Humanlike Robots. Front. Psychol. 8,
1366. doi:10.3389/fpsyg.2017.01366
Voigt-Antons, J.-N., Kojić, T., Ali, D., and Möller, S. (2020). Influence of Hand
Tracking as a Way of Interaction in Virtual Reality on User Experience.
arXiv:2004.12642 [cs].
Vosinakis, S., and Koutsabasis, P. (2018). Evaluation of Visual Feedback
Techniques for Virtual Grasping with Bare Hands Using Leap Motion and
Oculus Rift. Virtual Reality 22, 47–62. doi:10.1007/s10055-017-0313-4
Wang, S., Lilienfeld, S. O., and Rochat, P. (2015). The Uncanny Valley:
Existence and Explanations. Rev. Gen. Psychol. 19, 393–407.
Whitwell, R. L., Ganel, T., Byrne, C. M., and Goodale, M. A. (2015). Real-Time
Vision, Tactile Cues, and Visual Form Agnosia: Removing Haptic Feedback
from a "Natural" Grasping Task Induces Pantomime-Like Grasps. Front. Hum.
Neurosci. 9, 216. doi:10.3389/fnhum.2015.00216
Zhou, Y., Yu, G., Yu, X., Wu, S., and Zhang, M. (2017). Asymmetric
Representations of Upper and Lower Visual Fields in Egocentric and
Allocentric References. J. Vis. 17, 9. doi:10.1167/17.1.9
Conflict of Interest: The author declares that the research was conducted in the
absence of any commercial or financial relationships that could be construed as a
potential conflict of interest.
Publisher's Note: All claims expressed in this article are solely those of the authors
and do not necessarily represent those of their affiliated organizations, or those of
the publisher, the editors and the reviewers. Any product that may be evaluated in
this article, or claim that may be made by its manufacturer, is not guaranteed or
endorsed by the publisher.
Copyright © 2021 Buckingham. This is an open-access article distributed under the
terms of the Creative Commons Attribution License (CC BY). The use, distribution
or reproduction in other forums is permitted, provided the original author(s) and the
copyright owner(s) are credited and that the original publication in this journal is
cited, in accordance with accepted academic practice. No use, distribution or
reproduction is permitted which does not comply with these terms.