Gesture Recognition Using Leap Motion:
A Machine Learning-based Controller Interface
Ivo A. Stinghen Filho, Bernardo B. Gatto, José Luiz de S. Pio,
Federal University of Amazonas
Manaus, Amazonas
{iasf,bernardo,josepio}@icomp.ufam.edu.br
Estevam N. Chen, Jucimar M. Junior, Ricardo Barboza
Amazonas State University
Manaus, Amazonas
enc.eng@uea.edu.br, {jucimar.jr,rsbarboza}@gmail.com
Abstract—There is a growing tendency to make use of real-time interactions and convert gestures into the virtual game scenario. In this paper, we present a gesture interfacing controller for real-time communication between the Leap Motion sensor and games. We also compare the effectiveness of several real-time machine learning algorithms to find the most effective way to identify static hand gestures, as well as the optimal sample size for the training step. Moreover, we introduce a novel static hand gesture dataset containing 1200 samples for 10 static gesture classes. This dataset may encourage the development of innovative gesture recognition methods.
Keywords-Virtual Reality, Leap Motion, Motion Capture,
Machine Learning
I. INTRODUCTION
Humans interact with machines in a variety of ways. As such, many forms of HCI (Human-Computer Interaction) have been developed [6], [7]. Although the use of the mouse and keyboard is widespread, new methods of HCI continue to be developed. An example is gesture recognition, a topic that has received significant attention in the field of HCI due to the development of Virtual Reality (VR) technology, as well as a method to control robots [17].
A gesture is a form of nonverbal expression. It involves the hands, face, and other parts of the body [4]. Hand gestures can be divided into two categories: static and dynamic gestures [3]. Static gestures usually do not change their shape over time. This paper focuses on such hand gestures.
Depending on the type of the input data, hand gesture recognition can also be divided into two categories: appearance-based and 3D model-based algorithms. Appearance-based algorithms use data acquired from the silhouette or contour of the input images. Meanwhile, 3D model-based algorithms use volumetric or skeletal data, or even a combination of the two [3].
In this work, we present a comparison of three different machine learning algorithms used in the gesture recognition literature, together with a game interface control technology providing improved gameplay for games that use gesture recognition. Moreover, we provide a dataset containing 1200 samples for 10 static gesture classes (https://drive.google.com/open?id=1lXKnAlNdJ0I1tbPNnvXbi0VOCyUEPaMF).
II. LITERATURE REVIEW
Leap Motion [2] is a computer hardware sensor device
that supports hand and finger motions as input, similar
to a keyboard or a mouse, requiring no hand contact or
touching [12].
In this work, we used a variety of classifiers, including
KNN (K-Nearest Neighbors algorithm), Decision Trees [16]
and SVM (Support Vector Machines) [9].
A. Review on Gesture Recognition Methods
In the work of Ameur et al. [1], the authors introduced an application with gestural hand control using Leap Motion for medical visualization. Their experimental results demonstrated a high accuracy rate of about 81%. In the paper of Mapari and Kharat [11], a novel method for the recognition of American Sign Language (ASL) is proposed using Leap Motion. The proposed feature scheme, combined with an MLP (Multilayer Perceptron), achieved 90% accuracy.
In the work of D. Yao et al. [16], the authors proposed a decision-tree-based algorithm to recognize 3D gestures. The provided experimental results show that the recognition rate reached 95.8%, with a response time of 5.4 seconds. Similarly, in the paper of C.-H. Chuan et al. [4], the authors present an American Sign Language recognition system with Leap Motion, employing KNN and SVM, with average classification rates of 72.78% and 79.83%, respectively.
III. METHODOLOGY
In order to train the machine learning algorithms, it is necessary to acquire data regarding the fingers. To do so, we recorded a series of hand signs using Leap Motion [12], and the Leap Motion API (Application Programming Interface) in Unity was used [13]. With this, it was possible to obtain many different combinations of variables regarding the fingers. In this work, the main variables chosen were the combination of the normalized spatial positions of the tips of the 5 fingers and the 4 angles between adjacent fingers [3], as represented in Figure 1.
Figure 1. Methodology flowchart
A. Feature Extraction in a 3D Space
The data is acquired through the Leap Motion camera, which has infrared sensors. The camera is attached to the user's forehead, as shown in Figure 2.
Figure 2. Leap Motion usage
B. Data Normalization
Data normalization is a critical step in computer vision-related works [8]. The main variables chosen were the combination of the normalized spatial positions of the 5 fingertips, each one containing the 3D position of a detected finger.

Γ_i = P − ρ_i    (1)

where Γ_i is the normalized position based on the center of the hand palm (P), and ρ_i is the tip position of finger i.
Combined with this data, we also collect the 4 angles between adjacent fingers, each one calculated according to the law of cosines:

cos θ_i = (|ρ_{i+1}, P|² + |ρ_i, P|² − |ρ_i, ρ_{i+1}|²) / (2 · |ρ_{i+1}, P| · |ρ_i, P|)    (2)

We use three world points to calculate the angle θ_i between adjacent fingers: ρ_i is the tip position of a finger, P is the center of the hand palm, and i varies from 1 to n − 1, where n is the number of fingers. The operator

|ρ_A, ρ_B|    (3)

denotes the distance between the two points ρ_A and ρ_B.
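As an illustration of the feature extraction above, the following sketch computes the 19-value feature vector (15 normalized fingertip coordinates plus 4 adjacent-finger angles) from the palm center and the 5 fingertip positions. The function name and input layout are illustrative and not taken from the original implementation.

```python
import numpy as np

def extract_features(palm, fingertips):
    """Sketch of the feature vector described above: 5 normalized 3D
    fingertip positions (Eq. 1) and 4 angles between adjacent fingers
    computed with the law of cosines (Eq. 2)."""
    palm = np.asarray(palm, dtype=float)                       # P, palm center
    tips = [np.asarray(t, dtype=float) for t in fingertips]    # rho_1 ... rho_5

    # Fingertip positions relative to the palm center (Eq. 1).
    normalized = [palm - tip for tip in tips]

    # Angles between adjacent fingers (Eq. 2).
    angles = []
    for i in range(len(tips) - 1):
        a = np.linalg.norm(tips[i] - palm)          # |rho_i, P|
        b = np.linalg.norm(tips[i + 1] - palm)      # |rho_{i+1}, P|
        c = np.linalg.norm(tips[i] - tips[i + 1])   # |rho_i, rho_{i+1}|
        cos_theta = (a ** 2 + b ** 2 - c ** 2) / (2 * a * b)
        angles.append(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

    return np.concatenate([np.ravel(normalized), angles])
```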
C. Classification Models
In this work, we evaluate the use of different types of classifiers in our gesture recognition control system. We used three classifiers: K-Nearest Neighbors (KNN), Support Vector Machines (SVM) and Decision Trees. To do so, the Python library NumPy was used, similar to its use in Solem's work [14].
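For concreteness, the three classifier families can be instantiated with scikit-learn as in the sketch below. The hyperparameters shown are library defaults and illustrative rather than the exact settings of this work; the two KNN variants mirror the 3- and 5-neighbor comparison of Figure 6.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# The three classifier families compared in this work.
classifiers = {
    "knn-3": KNeighborsClassifier(n_neighbors=3),
    "knn-5": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(),
    "decision-tree": DecisionTreeClassifier(),
}
```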
IV. DATASET
Six different volunteers used their hands to feed the database. Each one of them placed the Leap Motion on their head and performed the gestures, including variations of each gesture.
In this work, ten classes were used, each representing a hand gesture: “OPEN”, “CLOSE”, “THUMB”, “TWO”, “THREE”, “FOUR”, “LOVE”, “COOL”, “FIRE” and “SHAKA”. The hand gestures can be seen in Figure 3.
Figure 3. 10 classes used in this work.
V. EXPERIMENTAL RESULTS
To generate the necessary data, at least one recording session is required for each machine learning class, each representing a hand sign. In each recording session, while the hand is positioned, a script in Unity records the data to a .csv file, along with the class name. A line is recorded every 0.05 s, with the total recording time varying as needed. 30% of the lines were used for training, while the remaining 70% were used to evaluate accuracy. The algorithm then returns the percentage of correct predictions.
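A minimal sketch of this evaluation step is shown below, assuming each .csv row contains the feature values followed by the class name; the file name and column layout are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical recording file: one feature vector plus class label per row.
data = np.genfromtxt("gestures.csv", delimiter=",", dtype=str)
X = data[:, :-1].astype(float)   # normalized positions and angles
y = data[:, -1]                  # class name, e.g. "OPEN"

# 30% of the lines train the model, the remaining 70% evaluate it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.3, random_state=0)

clf = DecisionTreeClassifier().fit(X_train, y_train)
print("correct predictions: {:.2%}".format(clf.score(X_test, y_test)))
```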
The .csv files are then analyzed. Overfitting started occurring when more than 12,000 samples were used (1,200 per hand gesture); as such, this sample size was adopted. Figure 6 shows the accuracy of the gesture predictions for the different classifier algorithms, trained with 1,200 learning samples per gesture, and compares the best results of each algorithm. The worst result was that of the “COOL” class, with a 96.57% hit rate. Table 1 shows the confusion matrix of the hand gesture recognition of the selected classes for the decision tree algorithm, in order to check for false positives and assess the complexity of each gesture. The values are percentages based on the number of gestures of each class. Noticeably, the “SHAKA” class has the smallest false positive rate.
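The per-class percentages of Table 1 can be reproduced with a row-normalized, cross-validated confusion matrix. The sketch below assumes X and y hold the full set of feature vectors and class labels, and the number of folds is illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# X: feature vectors of all 12,000 samples; y: their class labels.
labels = sorted(set(y))
y_pred = cross_val_predict(DecisionTreeClassifier(), X, y, cv=5)
cm = confusion_matrix(y, y_pred, labels=labels)

# Convert counts to percentages per true class (each row sums to 100).
cm_percent = 100.0 * cm / cm.sum(axis=1, keepdims=True)
```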
A. Application
Given these results, the decision tree classifier was used for offline [5] base classification and training, with real-time sample input. By training offline, it is possible to make real-time predictions.
Our system has the following specifications: Windows 10 version 10.0.17, 64-bit, Intel® Core™ i5-5200U CPU @ 2.20 GHz, 2201 MHz, 2 cores, 8 GB DDR3 RAM, USB 3.0 port, NVIDIA GTX 920M. We also make use of the Leap Motion, which uses two monochromatic IR (infrared) cameras and three infrared LEDs (Light Emitting Diodes); the sensor device observes a roughly hemispherical area extending up to approximately 1 meter. The LEDs are also able to generate pattern-less IR light [15]. The software tools used were Python 3.6.2 and Unity Engine 2018.1.0f2.
Figure 4. Data loop between Python and Unity.
A loop was created between Unity and a socket created by Python, running on localhost, as shown in Figure 4. The current state is updated every 0.02 seconds in Unity's interface; Unity then sends a vector as input through the socket to the decision tree classifier in scikit-learn.
Figure 5. The sequences used in the tests.
When the data is received through the socket, the predict() function is run (using the decision tree classifier) and returns the result, e.g., “Open”. Finally, Unity receives the result from the socket and creates effects as needed.
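The following sketch outlines the Python side of this loop: a localhost socket that receives one comma-separated feature vector per message from Unity and replies with the predicted class name. The port number and message format are assumptions for illustration, and X_train and y_train denote the feature vectors and labels recorded offline, as in the training sketch of Section V.

```python
import socket

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Decision tree trained offline on the recorded dataset (see Section V).
clf = DecisionTreeClassifier().fit(X_train, y_train)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 5005))   # localhost; the port is illustrative
server.listen(1)
conn, _ = server.accept()          # Unity connects once at startup

while True:
    msg = conn.recv(4096).decode()
    if not msg:
        break                      # Unity closed the connection
    # One comma-separated feature vector (positions and angles) per message.
    features = np.array([float(v) for v in msg.split(",")])
    label = clf.predict(features.reshape(1, -1))[0]
    conn.sendall(label.encode())   # e.g. "OPEN"

conn.close()
server.close()
```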
B. Qualitative Analysis
In our analysis, the algorithms were labeled A and B so that users could not know the name of the classifier they were testing or which had better results in our comparison. Algorithm A represents the SVM and algorithm B the Decision Tree.
For this, we used the Case Study method [10] to obtain a detailed examination of this comparison with respect to the ease of gesture recognition. In order to obtain a meaningful analysis, the case study involved 12 volunteers, who tested 6 of the 10 gestures of this work.
The volunteers were instructed on how Leap Motion works, including to use only the right hand and to always remain in the field of view of the device, and were shown images of each gesture, as in Figure 3.
They were then instructed to follow the sequence of gestures shown in Figure 5. After that, a Unity scene is initialized and the user is instructed to attempt to destroy all targets on the screen. A test ends when all targets are destroyed or when the sequence is finished. In the analysis of the tests using algorithm A, the following results were verified:
In gesture sequence 1 (tested with all users), the system often output the “Thumb” gesture when the input was “Fire”. Everyone managed to perform the “Plasma Fire” magic. In gesture sequence 2 (tested with 9 users), the system often output the “Open“ gesture when the input was “Four”. Everyone managed to perform the “Fire Flame” magic. In gesture sequence 3 (tested with all users), the system often output the “Fire” gesture when the input was “Two”, meaning summoning the Thunderbolt magic was impossible in all cases. This was because the thumb blocked the camera's view of the index finger.
Meanwhile, using algorithm B, in all 3 gesture sequences (tested with all users) the system output mostly correct gestures, with a small number of false positives, although the same problem with the “Fire” gesture remained.
Overall, through observation and the verbal reports of each user, we concluded that the Decision Tree approach was superior to the SVM.
VI. CONCLUSION AND FUTURE WORK
In this paper, we used 1200 samples per class and normalized data regarding the fingers and the angles between them. With this, we achieved a hit rate of over 99.7% using the decision tree classifier, while showing that the Unity engine and the Leap Motion API can be used in real-time applications with arbitrary gestures.
Everything considered, further study of how dynamic gestures behave under this normalization might be interesting, as both dynamic and static gestures can be used in various applications, such as VR games.
Figure 6. Comparison between KNN (with 3 and 5 neighbors), SVM and Decision Tree classifiers, respectively, for each of the 10 classes.
Table 1: Cross-validated confusion matrix for Decision Tree classifier.
OPEN CLOSE THUMB TWO THREE FOUR LOVE COOL FIRE SHAKA
OPEN 97.69 0.48 1.05 0.11 0.57
CLOSE 0.12 99.52 0.34
THUMB 98.56 0.47 0.8 0.12
TWO 0.24 98.57 0.49 0.47 0.23
THREE 1.07 98.65 0.12 0.11
FOUR 1.58 0.12 97.66 0.69
LOVE 0.24 0.35 99.06 0.23 0.12
COOL 0.85 1.43 0.35 0.12 96.57 0.8
FIRE 0.85 0.59 0.36 0.48 0.94 0.34 96.58
SHAKA 100
ACKNOWLEDGMENT
The authors would like to thank IComp UFAM, UEA
Ludus Lab, and FAPEAM for supporting the development
of this work.
REFERENCES
[1] S. Ameur, A. B. Khalifa, and M. S. Bouhlel. A compre-
hensive leap motion database for hand gesture recognition.
In Sciences of Electronics, Technologies of Information and
Telecommunications (SETIT), 2016 7th International Confer-
ence on, pages 514–519. IEEE, 2016.
[2] M. Buckwald. Leap motion. https://www.leapmotion.com
(accessed: 2018.08.04).
[3] F. Chen, J. Deng, Z. Pang, M. Baghaei Nejad, H. Yang,
and G. Yang. Finger angle-based hand gesture recognition
for smart infrastructure using wearable wrist-worn camera.
Applied Sciences, 8(3):369, 2018.
[4] C.-H. Chuan, E. Regina, and C. Guardino. American sign
language recognition using leap motion sensor. In Machine
Learning and Applications (ICMLA), 2014 13th International
Conference on, pages 541–544. IEEE, 2014.
[5] S. Ben-David, E. Kushilevitz, and Y. Mansour. Online learning versus offline learning. 1997.
[6] B. B. Gatto, A. Bogdanova, L. S. Souza, and E. M. dos
Santos. Hankel subspace method for efficient gesture rep-
resentation. In Machine Learning for Signal Processing
(MLSP), 2017 IEEE 27th International Workshop on, pages
1–6. IEEE, 2017.
[7] B. B. Gatto, E. M. dos Santos, and W. S. Da Silva. Orthogonal
hankel subspaces for applications in gesture recognition.
In Graphics, Patterns and Images (SIBGRAPI), 2017 30th
SIBGRAPI Conference on, pages 429–435. IEEE, 2017.
[8] B. B. Gatto, W. S. da Silva, and E. M. dos Santos. Kernel two dimensional subspace for image set classification. In Tools with Artificial Intelligence (ICTAI), 2016 IEEE 28th International Conference on, pages 1004–1011. IEEE, 2016.
[9] S. R. Gunn et al. Support vector machines for classification
and regression. ISIS technical report, 14(1):5–16, 1998.
[10] S. Krug. Não me faça pensar. Tradução de Roger Maioli dos Santos, São Paulo: Market Books, pages 123–137, 2001.
[11] R. B. Mapari and G. Kharat. American static signs recogni-
tion using leap motion sensor. In Proceedings of the Second
International Conference on Information and Communication
Technology for Competitive Strategies, page 67. ACM, 2016.
[12] R. Ribeiro et al. Framework for registration and recognition
of free-hand gestures in digital games. SBGames, 2016.
[13] L. Shao. Hand movement and gesture recognition using Leap
Motion Controller. Stanford University, Stanford, CA, 2016.
[14] J. E. Solem. Programming computer vision with python.
2012.
[15] F. Weichert, D. Bachmann, B. Rudak, and D. Fisseler. Analysis of the accuracy and robustness of the leap motion controller. 2013.
[16] D. Yao, M. Jiang, A. Abulizi, and X. You. Decision-tree-based
algorithm for 3d sign classification. In Signal Processing
(ICSP), 2014 12th International Conference on, pages 1200–
1204. IEEE, 2014.
[17] Y. Jang, S.-T. Noh, H. J. Chang, and T.-K. Kim. 3D finger CAPE: Clicking action and position estimation under self-occlusions in egocentric viewpoint. 2015.