Chapter

A Novel Approach to Gesture Recognition in Sign Language Applications Using AVL Tree and SVM

Authors:
  • Sriparna Saha et al. (Maulana Abul Kalam Azad University of Technology, West Bengal)

Abstract

Body gesture is the most important way of non-verbal communication for deaf and dumb people. Thus, a novel sign language recognition procedure is presented here, in which the movements of the hands play a pivotal role. Microsoft's Kinect sensor is used as a medium to interpret such communication by tracking the movement of the human body using 20 joints. A procedural approach has been developed to deal with unknown gesture recognition by generating the in-order expression of an AVL tree as a feature. Here, 12 gestures are taken into consideration, and for classification, a kernel function-based support vector machine is employed, yielding a gesture recognition accuracy of 88.3%. The foremost goal is to develop an algorithm that acts as a medium for human–computer interaction for deaf and dumb people. The novelty lies in the fact that, for gesture recognition in sign language interpretation, the whole body of the subject is represented using a hierarchical balanced tree (here, an AVL tree).
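As a rough editorial illustration of the abstract's central device, a balanced AVL tree over the tracked joints whose in-order expression serves as the classification feature, the following Python sketch builds such a tree and reads off the in-order label sequence. The keying rule (each joint keyed by its distance to a reference joint) and all names are illustrative assumptions; the paper's exact construction is not reproduced here.

```python
import math

class Node:
    def __init__(self, key, label):
        self.key, self.label = key, label
        self.left = self.right = None
        self.height = 1

def _h(n):
    return n.height if n else 0

def _rot_right(y):
    x = y.left
    y.left, x.right = x.right, y
    y.height = 1 + max(_h(y.left), _h(y.right))
    x.height = 1 + max(_h(x.left), _h(x.right))
    return x

def _rot_left(x):
    y = x.right
    x.right, y.left = y.left, x
    x.height = 1 + max(_h(x.left), _h(x.right))
    y.height = 1 + max(_h(y.left), _h(y.right))
    return y

def insert(root, key, label):
    # Standard AVL insertion; duplicate keys go to the right subtree.
    if root is None:
        return Node(key, label)
    if key < root.key:
        root.left = insert(root.left, key, label)
    else:
        root.right = insert(root.right, key, label)
    root.height = 1 + max(_h(root.left), _h(root.right))
    bal = _h(root.left) - _h(root.right)
    if bal > 1 and key < root.left.key:          # left-left
        return _rot_right(root)
    if bal < -1 and key >= root.right.key:       # right-right
        return _rot_left(root)
    if bal > 1:                                  # left-right
        root.left = _rot_left(root.left)
        return _rot_right(root)
    if bal < -1:                                 # right-left
        root.right = _rot_right(root.right)
        return _rot_left(root)
    return root

def inorder(root, out=None):
    if out is None:
        out = []
    if root:
        inorder(root.left, out)
        out.append(root.label)
        inorder(root.right, out)
    return out

def gesture_feature(joints, reference="spine"):
    # joints: {joint_name: (x, y, z)} from one Kinect skeleton frame.
    # Key each joint by its distance to a reference joint; the in-order
    # label sequence of the balanced tree is the gesture feature.
    ref = joints[reference]
    root = None
    for name, pos in joints.items():
        root = insert(root, math.dist(pos, ref), name)
    return inorder(root)

frame = {"spine": (0, 0, 0), "head": (0, 0.6, 0),
         "hand_right": (0.4, 0.1, 0.1), "hand_left": (-0.4, 0.1, 0.1)}
print(gesture_feature(frame))  # ['spine', 'hand_right', 'hand_left', 'head']
```

The resulting label sequence can then be encoded numerically and passed to a kernel SVM, mirroring the classification stage the abstract describes.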


... The current way of producing signed content for a broadcaster is via a dedicated studio created at the broadcaster's premises, which includes professional lighting, a professional camera, a special background for the signer (to later create overlay effects via Chroma Key) and screens that reproduce the original content to be signed along with the subtitles or scene description. In the literature there have been several reports on the topic of processing sign language [1,2,3,4] spanning more than five years. The common approach for 3D reconstruction depended on the use of a Time of Flight (ToF) sensor. ...
... RGB-D sensors are used to identify colour and depth simultaneously in real time. With the development of low-cost commercial RGB-D sensors such as Kinect, the availability of point clouds and powerful computing devices has inspired researchers in several areas to develop many vision applications such as pose estimation, gesture recognition [6,4] or scene reconstruction [7]. However, depth sensors suffer from missing or inaccurate depth information. ...
Experiment Findings
Full-text available
The exponential increase in the volume of digital content produced and distributed by media organisations has necessitated addressing content accessibility for the deaf community. As broadcasters increasingly aim to facilitate the supplementary delivery of sign language interpretation, there is a critical need for novel approaches to generate photo-realistic 3D avatars using a low-cost studio setup. This process is not trivial, and the acquisition of high-quality point cloud representations of dynamic 3D objects is still an open problem. The proposed approach exploits an efficient low-cost hardware setting based on several Kinect v2 scanners connected to a single Intel i7 processor with a GTX 770 graphics card, and includes autocalibration, depth map enhancement and point cloud refinement. The performance of the proposed method is demonstrated through efficient acquisition of a dense 3D point cloud of a sign language interpreter, evaluated through subjective analysis of the 3D reconstruction.
... Various works in the literature propose multiple techniques to perform gesture recognition and help disabled people reduce communication gaps. Saha et al. [1] present an approach for gesture recognition in sign language applications using an AVL tree and SVM; their approach handles unknown gesture recognition using AVL tree expressions as features. ...
Article
Full-text available
This work concentrates on a device that serves as a translation system for converting sign gestures into text. Disabled people, in particular hearing- and speech-impaired people, face difficulties in society. Their communication is made worse because the majority of ordinary people do not understand sign language; this causes a communication gap in which impaired people cannot share their views and skills with others. To solve this problem and facilitate communication for disabled people, we developed a "Tamil sign language translator". Here, gestures are translated into the Tamil language to provide a localized solution. It processes 31 Tamil letters: 12 vowels, 18 consonants, and 1 Aayudha Ezhuthu. There are 32 combinations, with each of five fingers pointing either up or down, mapped to decimal numbers. Edge detection, which must be done accurately, is performed with Canny edge detection. In addition, we use two gesture recognition methods and train the input system through our main algorithm, the scale-invariant feature transform. The developed system is useful for deaf and dumb people for essential communication.
... However, the prediction process works appropriately only for certain users and is unsuitable in some cases. Using certain independent features of the user's hand (location, orientation, position, shape and direction) is extremely desirable [26]. Subsequently, in some research works, fingertips are considered as extracted features. ...
Article
Full-text available
Sign Language Recognition (SLR) helps to bridge the gap between ordinary and hearing-impaired people. But various difficulties and challenges are faced by an SLR system during real-time implementation. The major complexity associated with SLR is the inability to provide a consistent recognition process, which results in lower recognition accuracy. To handle this issue, this research concentrates on adopting the finest classification approach to provide a feasible end-to-end system using deep learning approaches. This process transforms sign language into voice, assisting people in hearing the sign language. The input is taken from the ROBITA Indian Sign Language Gesture Database, and some essential pre-processing steps are performed to avoid unnecessary artefacts. The proposed model incorporates an encoder based on Multi-Layer Convolutional Neural Networks (ML-CNN) for evaluating the scalability and accuracy of the end-to-end SLR. The encoder analyses linear and non-linear features (higher level and lower level) to improve the quality of recognition. The simulation is carried out in a MATLAB environment, where the ML-CNN model outperforms existing approaches and establishes the trade-off. Performance metrics like accuracy, precision, F-measure, recall, Matthews Correlation Coefficient (MCC) and Mean Absolute Error (MAE) are evaluated to show the significance of the model. The prediction accuracy of the proposed ML-CNN with encoder is 87.5% on the ROBITA sign gesture dataset, an improvement of 1% and 3.5% over BLSTM and HMM, respectively.
... In maintaining the accuracy and execution time trade-off, we can trust hierarchical SVM classifiers [52]. SVM in the existing literature is also used for context-based music emotion recognition and classification [53], speech emotion recognition [54], privacy preservation [55,56], vision-based paste detection [57], classification of agricultural crops [58], hyperspectral remote sensing images [59], pattern classification [60,61], identification of brain structures [62], hybrid image denoising [63], gesture identification in sign language applications [64], hand gesture recognition, gesture recognition during dance [65], tweet act classification [66,67], etc. SVM creates a decision boundary between classes in multidimensional space so that test data points can easily be classified into the correct category [68]. The best decision boundary is called the hyperplane; to construct it, some extreme data points are selected in the form of vectors and called support vectors. ...
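As a minimal, generic scikit-learn sketch of the hyperplane-and-support-vectors idea described above (not code from any of the cited works):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy two-class data that is not linearly separable in the input space.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly lifts the data so a separating hyperplane exists;
# the extreme points that define it are exposed as support vectors.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("support vectors:", len(clf.support_vectors_))
print("test accuracy:  ", clf.score(X_te, y_te))
```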
Chapter
Due to the current demand for emerging technologies such as the Internet of Things integrated with machine learning in industry and academia, brain-computer interface tools like the electroencephalogram have drawn worldwide attention in healthcare. Since people's exposure to mobile phones has increased at least two-fold in recent times, games have been used as stimuli for detecting how the brain becomes overburdened with increased exposure. After data acquisition from the 14 channels of an electroencephalogram, the activated regions were identified. Features were extracted from the ten most activated electrode channels using the discrete wavelet transform. To reduce the dimensionality of the feature space and enhance performance, principal component analysis was used. Mental state classification based on the detected stress was performed using a support vector machine. The proposed system outperforms existing ones in effectiveness and efficiency across a broad application area of cognitive rehabilitation. A classification accuracy of 92.79% was obtained, and other metrics confirmed that the combination of channel selection, feature extraction, and classification methods in the proposed approach outperforms the others. Privacy is maintained, and the system is flexible for users to operate at their convenience.
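A hedged sketch of the pipeline described here (DWT features, PCA reduction, SVM classification). The wavelet family, decomposition level, and component count are assumptions, and the data is a synthetic stand-in:

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def dwt_band_energies(signal, wavelet="db4", level=4):
    # Energy of each sub-band from a multilevel discrete wavelet transform.
    return np.array([np.sum(c ** 2)
                     for c in pywt.wavedec(signal, wavelet, level=level)])

def extract_features(epochs):
    # epochs: (n_trials, n_channels, n_samples); one feature row per trial.
    return np.array([np.concatenate([dwt_band_energies(ch) for ch in trial])
                     for trial in epochs])

rng = np.random.default_rng(0)                 # synthetic stand-in EEG
epochs = rng.standard_normal((40, 10, 256))    # 10 channels, as in the chapter
labels = rng.integers(0, 2, size=40)           # stress / no-stress

model = make_pipeline(StandardScaler(), PCA(n_components=8), SVC(kernel="rbf"))
model.fit(extract_features(epochs), labels)
```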
... In the same way as the nearest-neighbour method, SVMs are often used to recognize gestures, for example by Saha et al. and by Nagarajan et al. in an application related to sign language [Saha et al., 2018; Nagarajan and Subashini, 2013]. SVRs have been used to evaluate the quality of a gesture in a surgical context [Zia and Essa, 2017], the goal being to predict a score for each performance. ...
Thesis
Full-text available
Learning a new sport or a manual trade is complex: many gestures must be mastered to reach a good level of skill. Learning these gestures cannot be done alone; an expert eye is needed to watch the gesture being performed and indicate the corrections required to improve. Yet experts, whether in sport or in manual trades, have little availability to analyse and evaluate a novice's gestures. To assist experts in this analysis task, virtual coaches can be developed. Depending on the domain, the coach will have more or fewer skills, but an evaluation against precise criteria is always preferable, and providing feedback on the errors made is essential for a novice's learning. In this thesis, several solutions for developing the most effective virtual coaches possible are proposed. First, as noted above, the gestures must be evaluated. To this end, an initial study examined the challenges of automatic gesture analysis in order to develop the best-performing automatic evaluation algorithm possible. Two automatic gesture-quality evaluation algorithms, both based on deep learning, are then proposed and tested on two different gesture databases to assess their generality. Once the evaluation is done, relevant feedback on errors must be provided to the learner. For continuity with the preceding work, this feedback is also based on neural networks and deep learning. Drawing on neural network explainability methods, a method was developed that traces back to the moments in the gesture where, according to the evaluation model, errors were made. Finally, coupling this method with semantic segmentation makes it possible to tell learners which part of the gesture was performed poorly, and also to provide them with statistics and a learning curve.
... Sriparna Saha et al. developed an approach for identifying unknown gestures by creating features from the in-order expression of an AVL tree [3]. However, the gesture recognition accuracy is 88.3%. ...
Conference Paper
Hand gestures provide a natural and intuitive communication medium for human–machine interaction, with uses in virtual reality, language detection, computer games, and other human–computer instruction applications. Currently, sensor- and camera-based applications are a field of interest for many researchers. This paper proposes a new hand gesture recognition system using the Kinect sensor's skeleton data, which works in environments where people do not touch devices or communicate verbally. The proposed model focuses on two modules: hand area and fingertip detection, and hand gesture recognition. The hand area and fingertips are detected by locating the palm point and finding the extreme points of the contour, and hand gestures are recognized by measuring the distances between different body joints in the skeleton information. Six gesture instructions are considered: move right to left, move left to right, move up to down, move down to up, open, and closed; numeric digits are also recognized using the fingertips. The system is able to detect the presence of the hand area and fingers and to recognize different hand gestures. The average recognition accuracies for the hand gestures and for counting stretched fingers are 95.91% and 96%, respectively.
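The distance- and displacement-based style of rule hinted at here can be sketched as follows. Joint names, the 0.3 m threshold, and the use of net hand displacement are illustrative simplifications, not the paper's actual measurements:

```python
def classify_swipe(frames, hand="HandRight", threshold=0.3):
    # frames: chronological list of {joint_name: (x, y, z)} skeletons.
    # Read the gesture off the hand's net displacement in metres
    # (Kinect convention: x to the subject's left, y upwards).
    dx = frames[-1][hand][0] - frames[0][hand][0]
    dy = frames[-1][hand][1] - frames[0][hand][1]
    if dx > threshold:
        return "move left to right"
    if dx < -threshold:
        return "move right to left"
    if dy < -threshold:
        return "move up to down"
    if dy > threshold:
        return "move down to up"
    return "unknown"

swipe = [{"HandRight": (0.0, 0.2, 1.5)}, {"HandRight": (0.5, 0.2, 1.5)}]
print(classify_swipe(swipe))   # "move left to right"
```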
... By making use of various kernel functions, different degrees of flexibility and nonlinearity can be built into the model. Since these features can be derived from high-level statistical ideas, and their generalization error bounds can be measured, a great deal of research has been conducted on SVMs in recent years [13,14]. ...
Article
Full-text available
Background The increasing prevalence of type 2 diabetes has given rise to a global health burden and a concern among health service providers and health administrators. The current study aimed at developing and comparing several statistical models to identify the risk factors associated with type 2 diabetes. To this end, artificial neural network (ANN), support vector machine (SVM), and multiple logistic regression (MLR) models were applied, using demographic, anthropometric, and biochemical characteristics, to a sample of 9528 individuals from Mashhad City in Iran. Methods This study randomly selected 6654 (70%) cases for training and reserved the remaining 2874 (30%) cases for testing. The three methods were compared using the ROC curve. Results The prevalence rate of type 2 diabetes was 14% in our population. The ANN model had 78.7% accuracy, 63.1% sensitivity, and 81.2% specificity. The values of these three parameters were 76.8%, 64.5%, and 78.9% for SVM, and 77.7%, 60.1%, and 80.5% for MLR. The area under the ROC curve was 0.71 for ANN, 0.73 for SVM, and 0.70 for MLR. Conclusion Our findings showed that ANN performs better than the other two models (SVM and MLR) and can be used effectively to identify the associated risk factors of type 2 diabetes.
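A minimal scikit-learn sketch of this kind of three-model ROC comparison, using synthetic data with roughly the reported 14% prevalence and a 70/30 split (not the study's cohort):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in cohort: ~14% positive class, 70/30 train/test split.
X, y = make_classification(n_samples=2000, n_features=12,
                           weights=[0.86], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7,
                                          stratify=y, random_state=0)

models = {"ANN": MLPClassifier(max_iter=500, random_state=0),
          "SVM": SVC(probability=True, random_state=0),
          "MLR": LogisticRegression(max_iter=1000)}
for name, m in models.items():
    auc = roc_auc_score(y_te, m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.2f}")
```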
... Hidden Markov models [13,14], Conditional Random Fields [17,18] and Support Vector Machines [15,16] have all been widely used in dynamic gesture recognition. To improve recognition accuracy, many hand-crafted features are used. ...
Article
Dynamic hand gesture recognition, as an essential part of Human–Computer Interaction and an important way to realize Augmented Reality, has been attracting attention from many scholars while still presenting many challenges. Recently, aware of the excellent performance of deep convolutional neural networks, many scholars have applied them to gesture recognition and obtained promising results. However, not enough attention has been paid to the number of parameters in the network and the amount of computation required. In this paper, a 3D separable convolutional neural network is proposed for dynamic gesture recognition. The study aims to make the model less complex without compromising its high recognition accuracy, so that it can more easily be deployed to augmented reality glasses in the future. Through the application of skip connections and layer-wise learning rates, the undesired gradient dispersion due to the separation operation is resolved and the performance of the network is improved. The fusion of feature information is further promoted by a shuffle operation. In addition, a dynamic hand gesture library is built with HoloLens, demonstrating the feasibility of the proposed method.
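A depthwise-separable 3D convolution with a skip connection, two of the ingredients named above, can be sketched in PyTorch as follows (a generic module under assumed shapes, not the paper's network):

```python
import torch
import torch.nn as nn

class SeparableConv3d(nn.Module):
    # Depthwise 3D convolution followed by a pointwise (1x1x1) convolution,
    # with a residual skip connection to counter the gradient dispersion
    # that the separation can introduce.
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.depthwise = nn.Conv3d(channels, channels, kernel_size,
                                   padding=pad, groups=channels)
        self.pointwise = nn.Conv3d(channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.pointwise(self.depthwise(x)))

clip = torch.randn(1, 16, 8, 56, 56)   # (batch, channels, frames, H, W)
out = SeparableConv3d(16)(clip)        # same shape as the input
```

Splitting a dense 3D convolution this way cuts the parameter count roughly from C²k³ to Ck³ + C², which is the complexity reduction the abstract alludes to.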
Article
Full-text available
Intelligent prosthetic hands are an important branch of intelligent robotics. They can remotely replace humans in completing various complex tasks and also help humans complete rehabilitation training. In human-computer interaction technology, the prosthetic hand can be accurately controlled by surface electromyography (sEMG). This paper proposes a new multichannel fusion scheme (MSFS) to extend the virtual channels of sEMG and improve the accuracy of gesture recognition. In addition, the Temporal Convolutional Network (TCN) from deep learning is improved to enhance the performance of the network. Finally, sEMG is collected by the Myo armband and the prosthetic hand is controlled in real time to validate the new method. The experimental results show that the proposed method improves the control accuracy of the intelligent prosthetic hand, reaching an accuracy rate of 93.69%.
Chapter
This chapter focuses on hand activity recognition topics, including hand pose estimation and static and dynamic hand gesture recognition. The research background is reviewed, and the various approaches are surveyed and categorized. Deep learning-based methods are the mainstream solution, capable of achieving state-of-the-art performance. The chapter first presents the commonly used depth sensor-based approaches for hand pose estimation, as well as several typical deep learning-based models leveraging multi-scale spatial information and multi-frame temporal information. It also introduces a reasonable solution for dynamic hand gesture recognition that aims to reduce the computational cost to meet practical needs. Finally, we discuss unsettled issues and provide some recommendations.
Article
Hand gesture recognition using surface electromyography (sEMG) has been one of the most efficient motion analysis techniques in human–computer interaction over the last few decades. In particular, multichannel sEMG techniques have achieved stable performance in hand gesture recognition. However, the usual solution of collecting and labeling large datasets manually is time-consuming. A novel learning method is therefore needed to facilitate efficient data collection and preprocessing. In this paper, a novel autonomous learning framework is proposed to integrate the benefits of both depth vision and EMG signals; it automatically labels the class of collected EMG data using depth information. It then utilizes a multiple layer neural network (MNN) classifier to achieve real-time recognition of hand gestures using only the sEMG. The overall framework is demonstrated in an augmented reality application through the recognition of 10 hand gestures using the Myo armband and an HTC VIVE PRO. The results show prominent performance by introducing depth information for real-time data labeling.
Article
According to the World Health Organization (WHO), 466 million people suffer from hearing loss, i.e., 5% of the world population, of whom 432 million (93%) are adults and 34 million (7%) children. The main problem is how deaf and hearing-impaired people communicate with others and with each other, how they receive education, and how they carry out their daily activities. Sign language is their main communication method. Building an automatic hand gesture recognition system poses many challenges, especially for Arabic; solving the recognition problem and developing a practical real-time recognition system is a further challenge. Several types of research have been conducted on sign language recognition systems, but those for Arabic Sign Language are very limited. In this paper, an Arabic Sign Language (ArSL) recognition system that uses a Leap Motion Controller and a Latte Panda is introduced. The recognition phase depends on two machine learning algorithms: (a) KNN (k-Nearest Neighbor) and (b) SVM (Support Vector Machine). Afterwards, an AdaBoosting technique is applied to enhance the accuracy of both algorithms. A direct matching technique, DTW (Dynamic Time Warping), is applied and compared with AdaBoost. The proposed system is applied to 30 hand gestures, composed of 20 single-hand gestures and 10 double-hand gestures. The experimental results show that DTW achieved an accuracy of 88% for single-hand gestures and 86% for double-hand gestures. Overall, the proposed model's recognition rate reached 92.3% for single-hand gestures and 93% for double-hand gestures after applying the AdaBoosting. Finally, a prototype of our model was implemented on a single board (Latte Panda) to increase the system's reliability and mobility.
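For reference, the DTW matcher named here can be sketched generically as follows (a textbook implementation with assumed NumPy feature sequences, not the paper's code):

```python
import numpy as np

def dtw_distance(a, b):
    # Classic O(len(a) * len(b)) dynamic time warping between two
    # sequences of per-frame feature vectors (e.g. Leap Motion readings).
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Direct matching: assign a query to the template with the smallest
# warped distance, tolerating differences in gesture speed and length.
templates = {"hello": np.random.rand(30, 6), "thanks": np.random.rand(25, 6)}
query = np.random.rand(28, 6)
print(min(templates, key=lambda k: dtw_distance(query, templates[k])))
```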
Conference Paper
Full-text available
In this paper, we propose a comparison of human gesture recognition using data mining classification methods in video streaming. In particular, we are interested in a specific stream of vectors of twenty body-joint positions, representative of the human body, captured by the Kinect camera. The recognized gesture patterns in this study are stand, sit down, and lie down. The classification methods chosen for the comparison are backpropagation neural network, support vector machine, decision tree, and naive Bayes. Experimental results show that the backpropagation neural network outperforms the other classification methods and can achieve recognition with 100% accuracy. Moreover, the average accuracy of all classification methods used in this study is 93.72%, which confirms the high potential of using the Kinect camera in human body recognition applications. Our future work will use the knowledge obtained from these classifiers in time series analysis of gesture sequences for detecting fall motion in a smart home system.
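A minimal scikit-learn analogue of this four-classifier comparison, with synthetic stand-in data for the 20 joint positions (60 coordinates per frame, three posture classes):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for 20 body-joint (x, y, z) vectors -> 60 features per frame.
X, y = make_classification(n_samples=600, n_features=60, n_classes=3,
                           n_informative=10, random_state=0)

for name, clf in [("backprop NN", MLPClassifier(max_iter=500, random_state=0)),
                  ("SVM", SVC()),
                  ("decision tree", DecisionTreeClassifier(random_state=0)),
                  ("naive Bayes", GaussianNB())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```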
Conference Paper
Full-text available
Hand gesture recognition (HGR) is an important research topic because some situations require silent communication with sign languages. Computational HGR systems assist silent communication, and help people learn a sign language. In this article, a novel method for contact-less HGR using Microsoft Kinect for Xbox is described, and a real-time HGR system is implemented. The system is able to detect the presence of gestures, to identify fingers, and to recognize the meanings of nine gestures in a pre-defined Popular Gesture scenario. The accuracy of the HGR system is from 84% to 99% with single-hand gestures, and from 90% to 100% if both hands perform the same gesture at the same time. Because the depth sensor of Kinect is an infrared camera, the lighting conditions, signers' skin colors and clothing, and background have little impact on the performance of this system. The accuracy and the robustness make this system a versatile component that can be integrated in a variety of applications in daily life.
Article
Full-text available
The recently developed depth sensors, e.g., the Kinect sensor, have provided new opportunities for human-computer interaction (HCI). Although great progress has been made by leveraging the Kinect sensor, e.g., in human body tracking, face recognition and human action recognition, robust hand gesture recognition remains an open problem. Compared to the entire human body, the hand is a smaller object with more complex articulations and more easily affected by segmentation errors. It is thus a very challenging problem to recognize hand gestures. This paper focuses on building a robust part-based hand gesture recognition system using Kinect sensor. To handle the noisy hand shapes obtained from the Kinect sensor, we propose a novel distance metric, Finger-Earth Mover's Distance (FEMD), to measure the dissimilarity between hand shapes. As it only matches the finger parts while not the whole hand, it can better distinguish the hand gestures of slight differences. The extensive experiments demonstrate that our hand gesture recognition system is accurate (a 93.2% mean accuracy on a challenging 10-gesture dataset), efficient (average 0.0750 s per frame), robust to hand articulations, distortions and orientation or scale changes, and can work in uncontrolled environments (cluttered backgrounds and lighting conditions). The superiority of our system is further demonstrated in two real-life HCI applications.
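FEMD itself decomposes the hand contour into finger parts and applies an earth mover's distance between them; as a loose one-dimensional stand-in for that intuition (not the actual FEMD algorithm), SciPy's Wasserstein distance can compare simple finger-part signatures:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def hand_signature(part_lengths):
    # Simplified hand-shape signature: normalised lengths of finger parts.
    v = np.asarray(part_lengths, dtype=float)
    return v / v.sum()

def femd_like(sig_a, sig_b):
    # 1-D earth mover's distance between two finger-part signatures.
    # The real FEMD matches decomposed finger regions of the hand contour;
    # this stand-in only conveys the "move mass between parts" idea.
    bins = np.arange(len(sig_a), dtype=float)
    return wasserstein_distance(bins, bins, sig_a, sig_b)

one_up = hand_signature([1.0, 0.1, 0.1, 0.1, 0.1])   # index finger extended
two_up = hand_signature([1.0, 1.0, 0.1, 0.1, 0.1])   # index and middle
print(femd_like(one_up, two_up))
```

Because only the finger parts are matched, small differences between gestures remain distinguishable even when the palm regions are nearly identical, which is the property the abstract highlights.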
Article
Full-text available
Consumer-grade range cameras such as the Kinect sensor have the potential to be used in mapping applications where accuracy requirements are less strict. To realize this potential, insight into the geometric quality of the data acquired by the sensor is essential. In this paper we discuss the calibration of the Kinect sensor, and provide an analysis of the accuracy and resolution of its depth data. Based on a mathematical model of depth measurement from disparity, a theoretical error analysis is presented, which provides insight into the factors influencing the accuracy of the data. Experimental results show that the random error of depth measurement increases with increasing distance to the sensor, and ranges from a few millimetres up to about 4 cm at the maximum range of the sensor. The quality of the data is also found to be influenced by the low resolution of the depth measurements.
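The standard disparity-to-depth model underlying such an analysis (symbols are the usual stereo quantities, not notation quoted from the paper: f focal length, b baseline, d disparity, σ_d disparity noise) is:

```latex
Z = \frac{f\,b}{d},
\qquad
\sigma_Z = \left|\frac{\partial Z}{\partial d}\right|\sigma_d = \frac{Z^{2}}{f\,b}\,\sigma_d .
```

The quadratic growth of σ_Z with depth Z is consistent with the reported rise from a few millimetres near the sensor to about 4 cm at maximum range.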
Conference Paper
Full-text available
Gesture recognition is essential for human–machine interaction. In this paper we propose a method to recognize human gestures using a Kinect® depth camera. The camera views the subject in the frontal plane and generates a depth image of the subject in the plane towards the camera. This depth image is then used for background removal, followed by generation of the depth profile of the subject. In addition, the difference between subsequent frames gives the motion profile of the subject and is used for recognition of gestures. Together these allow the efficient use of the depth camera to successfully recognize multiple human gestures. The result of a case study involving 8 gestures is shown. The system was trained using a multi-class Support Vector Machine.
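A hedged NumPy sketch of the two steps described (depth-threshold background removal, then frame differencing for the motion profile); the 2 m threshold and the array shapes are assumptions:

```python
import numpy as np

def motion_profile(depth_frames, max_subject_depth=2.0):
    # Background removal: keep only pixels nearer than a depth threshold,
    # then accumulate absolute frame-to-frame differences as a motion map.
    masked = [np.where(f < max_subject_depth, f, 0.0) for f in depth_frames]
    return sum(np.abs(b - a) for a, b in zip(masked, masked[1:]))

# Synthetic stand-in for a short depth clip (metres, 480x640 per frame).
frames = [np.random.default_rng(i).uniform(0.5, 4.0, (480, 640))
          for i in range(5)]
profile = motion_profile(frames)   # per-pixel motion energy over the clip
```

Such a per-pixel motion map, flattened or summarised, is the kind of feature that could then feed the multi-class SVM mentioned in the abstract.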
Article
Full-text available
In this letter we discuss a least squares version of support vector machine (SVM) classifiers. Due to equality type constraints in the formulation, the solution follows from solving a set of linear equations, instead of quadratic programming as for classical SVMs. The approach is illustrated on a two-spiral benchmark classification problem.
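The linear-system formulation can be sketched directly in NumPy. This follows the standard LS-SVM dual system (with an assumed RBF kernel and labels in {-1, +1}); it illustrates the idea rather than reproducing the authors' code:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=1.0, sigma=1.0):
    # LS-SVM dual: one linear system in (b, alpha) replaces the SVM QP.
    #   [ 0    y^T             ] [b]       [0]
    #   [ y    Omega + I/gamma ] [alpha] = [1]
    # with Omega_ij = y_i * y_j * K(x_i, x_j).
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                      # b, alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2))
y = np.sign(X[:, 0] + X[:, 1])
b, alpha = lssvm_fit(X, y)
print((lssvm_predict(X, y, b, alpha, X) == y).mean())   # training accuracy
```

The equality constraints trade the sparsity of classical SVMs for a single solve of an (n+1)-dimensional linear system, which is the simplification the letter emphasises.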
Article
Full-text available
Suffix trees and suffix arrays are classical data structures that are used to represent the set of suffixes of a given string, and thereby facilitate the efficient solution of various string processing problems --- in particular online string searching. Here we investigate the potential of suitably adapted binary search trees as competitors in this context. The suffix binary search tree (SBST) and its balanced counterpart, the suffix AVL-tree, are conceptually simple, relatively easy to implement, and offer time and space efficiency to rival suffix trees and suffix arrays, with distinct advantages in some circumstances --- for instance in cases where only a subset of the suffixes need be represented. Construction of a suffix BST can be achieved in O(L) time, where L is the path length of the tree, and in the case of a suffix AVL-tree this is O(n log n), where n is the length of the input string. Searching for an m-long substring requires O(m + l) time, where l is the length of the search path. In the suffix AVL-tree this is O(m + log n) in the worst case. The space requirements are linear in n, generally intermediate between those for a suffix tree and a suffix array. Empirical evidence, illustrating the competitiveness of suffix BSTs, is presented.
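As a simplified illustration of the suffix-searching principle these structures share (a plain sorted suffix list with binary search, not an SBST or suffix AVL-tree):

```python
import bisect

def build_sorted_suffixes(s):
    # Naive construction, fine for short strings; suffix arrays and the
    # suffix AVL-tree avoid this quadratic space and time cost.
    return sorted(s[i:] for i in range(len(s)))

def occurs(suffixes, pattern):
    # A pattern occurs in s iff it is a prefix of some suffix of s;
    # binary search over the sorted suffixes costs O(m log n) here,
    # versus the O(m + log n) worst case of the suffix AVL-tree.
    i = bisect.bisect_left(suffixes, pattern)
    return i < len(suffixes) and suffixes[i].startswith(pattern)

sfx = build_sorted_suffixes("mississippi")
print(occurs(sfx, "issi"), occurs(sfx, "issa"))   # True False
```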
Article
Suffix trees and suffix arrays are classical data structures that are used to represent the set of suffixes of a given string, and thereby facilitate the efficient solution of various string processing problems --- in particular on-line string searching. Here we investigate the potential of suitably adapted binary search trees as competitors in this context. The suffix binary search tree (SBST) and its balanced counterpart, the suffix AVL-tree, are conceptually simple, relatively easy to implement, and offer time and space efficiency to rival suffix trees and suffix arrays, with some distinct advantages --- for instance in cases where only a subset of the suffixes need be represented. Construction of a suffix BST for an n-long string can be achieved in O(nh) time, where h is the height of the tree. In the case of a suffix AVL-tree this will be O(n log n) in the worst case. Searching for an m-long substring requires O(m + l) time, where l is the length of the search path. In the suffix AVL-tree this is O(m + log n) in the worst case. The space requirements are linear in n, intermediate between those for a suffix tree and a suffix array. Preliminary empirical evidence, illustrating the competitiveness of suffix BSTs, is presented.
Conference Paper
In this paper we present an approach to recognition of signed expressions based on visual sequences obtained with the Kinect sensor. Two variants of time series representing the expressions are considered: the first based on skeletal images of the body, and the second describing the shape and position of hands extracted as skin-coloured regions. Time series characterising isolated Polish sign language words are examined using three clustering algorithms and popular clustering quality indices, which reveal natural divisions in the gesture data and indicate gesture samples difficult for further recognition. Ten-fold cross-validation recognition tests for the k-nearest neighbour classifier with the dynamic time warping technique are shown. The recognition rate obtained with the skeletal image based features was improved from 89% to 95% by changing the gesture representation from time series to a vector containing pairwise distances between gesture samples. The approach with skin colour based features, utilising the depth information of each pixel obtained by Kinect, yielded a 98% recognition rate.
Conference Paper
Human posture recognition is an attractive and challenging topic in computer vision because of its wide range of applications. The arrival of the low-cost Kinect device with its SDK gives us the possibility to resolve with ease some difficult problems encountered when working with conventional cameras. In this paper, we explore the capacity of the skeleton information provided by Kinect for human posture recognition in the context of a health monitoring framework. We conduct 7 different experiments with 4 types of features extracted from the human skeleton. The obtained results show that this device can detect four postures of interest (lying, sitting, standing, bending) with high accuracy.
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure the high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
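The extension to non-separable data described here is what is now called the soft-margin SVM; as a standard reference formulation (not quoted from the abstract), the primal problem reads:

```latex
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_{i}
\quad\text{subject to}\quad
y_{i}\bigl(w\cdot\phi(x_{i}) + b\bigr) \ \ge\ 1 - \xi_{i},
\qquad \xi_{i} \ge 0 .
```

The slack variables ξ_i absorb margin violations on non-separable data, and the nonlinear map φ is only ever accessed through kernel evaluations K(x_i, x_j) = φ(x_i)·φ(x_j).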
Article
In this article, we focus primarily on the role of machine learning algorithms in the data mining process.