Table 2 - uploaded by Hendrik De Villiers
Average fingertip position error per finger compared with ground truth for nearest neighbour.


Source publication
Article
Full-text available
Vision-based hand pose estimation presents unique challenges, particularly if high-fidelity reconstruction is desired. Searching large databases of synthetic pose candidates for items similar to the input offers an attractive means of attaining this goal. The earth mover's distance is a perceptually meaningful measure of dissimilarity that has show...

Contexts in source publication

Context 1
... single hypothesis tracking, only the closest match would be used; the resulting errors are shown in Table 2. When multiple hypotheses are allowed, an idealised value for obtainable accuracy may be found by selecting the candidate with the smallest error amongst the k nearest neighbours. ...
Context 2
... average over all the fingers was 3.12 cm. Note that the individual values are roughly double the corresponding best possible individual average fingertip errors in Table 2. ...
Context 3
... that taking the best possible average fingertip error as a baseline instead of zero may be more appropriate, in which case the performance increase is 32%. Table 2 reveals that the gains in accuracy between d* and d are consistent across every individual finger, indicating a level of uniformity with respect to the modelling accuracy. If one assumes that the correct match is returned within the set of closest neighbours, simple refinement of that set using the EMD should yield similar results to its global application. ...
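The two-tier idea referenced in these excerpts — retrieve nearest neighbours with a cheap metric, then refine the shortlist with the earth mover's distance — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: descriptors are treated as flat vectors, Euclidean distance plays the role of the simple first-tier metric, and scipy's 1-D Wasserstein distance stands in for the paper's EMD over image signatures.

```python
# Hedged sketch of a two-tier search: cheap kNN shortlist, then EMD re-ranking.
import numpy as np
from scipy.stats import wasserstein_distance

def two_tier_search(query, database, k=10):
    """Return the index of the database entry that best matches `query`."""
    db = np.asarray(database, dtype=float)
    # Tier 1: cheap metric -- Euclidean distance to every candidate.
    coarse = np.linalg.norm(db - query, axis=1)
    shortlist = np.argsort(coarse)[:k]          # k nearest neighbours
    # Tier 2: re-rank only the shortlist with the more expensive EMD.
    emd = [wasserstein_distance(query, db[i]) for i in shortlist]
    return shortlist[int(np.argmin(emd))]

rng = np.random.default_rng(0)
db = rng.random((100, 8))
best = two_tier_search(db[42].copy(), db)       # exact match should win
```

The payoff claimed in the thesis abstract is exactly this trade: the EMD is evaluated only k times instead of once per database entry, while accuracy approaches that of a global EMD search.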

Citations

... They use the CRF model for gesture recognition with a combination of Fourier and Zernike moment features. Several feature extraction techniques are used to improve the recognition accuracy [20]. Discriminative 2D Zernike moments are used as features for the color dataset [18]. ...
Article
Full-text available
We propose a new technique for the recognition of mute persons' hand gestures in real-world environments. In this technique, the hand image containing the gesture is preprocessed, and the hand region is segmented by converting the RGB color image to the L*a*b* color space. Only a few statistical features are used to classify the segmented image into different classes. An artificial neural network is trained sequentially using a one-against-all scheme. Once trained, the system can recognize each class in parallel. The results of the proposed technique are much better than those of existing techniques.
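The segmentation step this abstract describes — converting RGB to L*a*b* and isolating the hand region — can be sketched in numpy. The conversion below follows the standard sRGB → XYZ (D65) → L*a*b* formulas; the chromaticity thresholds are placeholders, since the paper does not state its actual segmentation rule.

```python
# Illustrative L*a*b*-based segmentation; thresholds are assumptions.
import numpy as np

def rgb_to_lab(rgb):
    """Convert an (H, W, 3) sRGB image in [0, 1] to CIE L*a*b* (D65)."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma to get linear RGB.
    lin = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ m.T
    xyz /= np.array([0.95047, 1.0, 1.08883])      # D65 white point
    f = np.where(xyz > (6 / 29) ** 3,
                 np.cbrt(xyz), xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def segment_hand(rgb, a_min=10.0, b_min=10.0):
    """Binary mask of pixels whose a* and b* exceed illustrative thresholds."""
    lab = rgb_to_lab(rgb)
    return (lab[..., 1] > a_min) & (lab[..., 2] > b_min)
```

Working in L*a*b* is attractive here because lightness (L*) is decoupled from chromaticity (a*, b*), so a colour threshold is less sensitive to illumination than one applied directly in RGB.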
... The Fourier and Zernike moment features are used to recognize the gesture with a CRF model. Several feature extraction methods for hand gesture recognition have been reported in the literature [11][12][13][14][15][16][17][18]. Triesch et al. [12] propose person-independent classification of hand gestures on 10 ASL alphabets against two uniform backgrounds (dark and light) and one complex-background dataset. ...
Article
Full-text available
This paper demonstrates the development of a vision-based static hand gesture recognition system using a web camera in real-time applications. The system is developed in the following steps: preprocessing, feature extraction and classification. The preprocessing stage consists of illumination compensation, segmentation, filtering, hand region detection and image resizing. This work proposes a discrete wavelet transform (DWT) and Fisher ratio (F-ratio) based feature extraction technique to classify hand gestures in an uncontrolled environment. The method is not only robust to distortion and gesture vocabulary, but also invariant to translation and rotation of hand gestures. A linear support vector machine (SVM) is used as the classifier. The performance of the proposed method is evaluated on two standard public datasets and one indigenously developed complex-background dataset, all based on American Sign Language (ASL) hand alphabets. The experimental results are reported in terms of mean accuracy. Two real-time applications are demonstrated: interpretation of ASL alphabets and image browsing.
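The DWT + F-ratio pipeline this abstract outlines can be sketched under two assumptions: a single-level Haar transform (averaging variant) stands in for the paper's unspecified wavelet, and the Fisher ratio is taken per coefficient as between-class over within-class variance, so that high-scoring coefficients are kept as the classifier's input.

```python
# Hedged sketch of DWT features ranked by per-feature Fisher ratio.
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar transform (averaging variant): LL, LH, HL, HH."""
    img = np.asarray(img, dtype=float)
    lo_r = (img[:, 0::2] + img[:, 1::2]) / 2    # row lowpass
    hi_r = (img[:, 0::2] - img[:, 1::2]) / 2    # row highpass
    ll = (lo_r[0::2] + lo_r[1::2]) / 2          # then column lowpass/highpass
    lh = (lo_r[0::2] - lo_r[1::2]) / 2
    hl = (hi_r[0::2] + hi_r[1::2]) / 2
    hh = (hi_r[0::2] - hi_r[1::2]) / 2
    return ll, lh, hl, hh

def fisher_ratio(features, labels):
    """Per-feature F-ratio: between-class variance / within-class variance."""
    features, labels = np.asarray(features, float), np.asarray(labels)
    mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        cls = features[labels == c]
        between += len(cls) * (cls.mean(axis=0) - mean) ** 2
        within += ((cls - cls.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)           # guard against zero variance
```

In use, each training image would be transformed, the sub-band coefficients flattened into a feature vector, and only the coefficients with the highest F-ratio passed to the linear SVM.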
... Publications stemming from this work include De Villiers et al. [26] (under review), which presents the tutor system, and De Villiers et al. [24] (published), which discusses the hand pose estimation system. ...
... Because the pose estimation system is to be used as a frontend for the tutor system, the use of a coloured glove is justified, as e-learning environments are more amenable to control than general settings. Note that the work detailed here was published in De Villiers et al. [24]. The presentation will follow this article, with extensions introduced after its publication being described in Chapters 7 and 8. ...
Thesis
Full-text available
A sign language tutoring system capable of generating detailed context-sensitive feedback to the user is presented in this dissertation. This stands in contrast with existing sign language tutor systems, which lack the capability of providing such feedback. A domain specific language is used to describe the constraints placed on the user’s movements during the course of a sign, allowing complex constraints to be built through the combination of simpler constraints. This same linguistic description is then used to evaluate the user’s movements, and to generate corrective natural language feedback. The feedback is dynamically tailored to the user’s attempt, and automatically targets that correction which would require the least effort on the part of the user. Furthermore, a procedure is introduced which allows feedback to take the form of a simple to-do list, despite the potential complexity of the logical constraints describing the sign. The system is demonstrated using real video sequences of South African Sign Language signs, exploring the different kinds of advice the system can produce, as well as the accuracy of the comments produced. To provide input for the tutor system, the user wears a pair of coloured gloves, and a video of their attempt is recorded. A vision-based hand pose estimation system is proposed which uses the Earth Mover’s Distance to obtain hand pose estimates from images of the user’s hands. A two-tier search strategy is employed, first obtaining nearest neighbours using a simple, but related, metric. It is demonstrated that the two-tier system’s accuracy approaches that of a global search using only the Earth Mover’s Distance, yet requires only a fraction of the time. The system is shown to outperform a closely related system on a set of 500 real images of gloved hands.
Chapter
Maintaining natural image statistics is a crucial factor in restoration and generation of realistic looking images. When training CNNs, photorealism is usually attempted by adversarial training (GAN), that pushes the output images to lie on the manifold of natural images. GANs are very powerful, but not perfect. They are hard to train and the results still often suffer from artifacts. In this paper we propose a complementary approach, that could be applied with or without GAN, whose goal is to train a feed-forward CNN to maintain natural internal statistics. We look explicitly at the distribution of features in an image and train the network to generate images with natural feature distributions. Our approach reduces by orders of magnitude the number of images required for training and achieves state-of-the-art results on both single-image super-resolution, and high-resolution surface normal estimation. Project page: https://www.github.com/roimehrez/contextualLoss.
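The core idea here — compare the *distributions* of features extracted from two images rather than features at corresponding positions — can be illustrated with a toy numpy loss. This is a simplified stand-in for the paper's contextual loss, not the authors' implementation: each generated feature is matched to its nearest target feature, making the measure insensitive to spatial ordering.

```python
# Toy order-free feature-set loss, illustrating distribution matching.
import numpy as np

def feature_set_loss(gen_feats, tgt_feats):
    """Mean distance from each generated feature to its nearest target feature."""
    gen = np.asarray(gen_feats, float)   # (N, d) features from generated image
    tgt = np.asarray(tgt_feats, float)   # (M, d) features from target image
    # Pairwise squared Euclidean distances between the two feature sets.
    d2 = ((gen[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean()         # ignores where each feature occurred
```

Because the loss depends only on the feature sets, a permutation of the target features leaves it unchanged — the property that lets such losses enforce natural internal statistics without pixel-aligned training pairs.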
Article
3D hand pose estimation has received increasing attention, especially since consumer depth cameras came onto the market in 2010. Although substantial progress has occurred recently, no overview has kept up with the latest developments. To bridge the gap, we provide a comprehensive survey, covering depth cameras, hand pose estimation methods, and public benchmark datasets. First, a markerless approach is proposed to evaluate the tracking accuracy of depth cameras with the aid of a numerical-control linear motion guide; traditional approaches focus only on static characteristics, and the evaluation of dynamic tracking capability has long been neglected. Second, we summarize the state-of-the-art methods and analyze the lines of research. Third, existing benchmark datasets and evaluation criteria are identified to provide further insight into the field of hand pose estimation. In addition, realistic challenges, recent trends, dataset creation and annotation, and open problems for future research directions are also discussed.
Article
First-world support systems are not always available to, or affordable for, South Africans with assistive requirements. Our group focuses on persons with communication needs, such as persons facing sight, hearing or autism barriers.