Conference Paper

Gestural teleoperation of a mobile robot based on visual recognition of sign language static handshapes


Abstract

This paper presents results achieved within the framework of a national research project (titled "DIANOEMA"), in which visual analysis and sign recognition techniques have been explored on Greek Sign Language (GSL) data. Besides GSL modelling, the aim was to develop a pilot application for teleoperating a mobile robot using natural hand signs. A small vocabulary of hand signs has been designed to enable desktop-based teleoperation at a high level of supervisory telerobotic control. Real-time visual recognition of the hand images is performed by training a multi-layer perceptron (MLP) neural network. Various shape descriptors of the segmented hand posture images have been explored as inputs to the MLP network. These include Fourier shape descriptors on the contour of the segmented hand sign images, moments, compactness, eccentricity, and the histogram of the curvature. We have examined which of these shape descriptors are best suited for real-time recognition of hand signs, in relation to the number and choice of hand postures, in order to achieve maximum recognition performance. The hand-sign recognizer has been integrated in a graphical user interface and has been successfully implemented in a pilot application for real-time desktop-based gestural teleoperation of a mobile robot vehicle.
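The pipeline the abstract outlines (segment the hand, extract contour and region shape descriptors, classify with an MLP) can be sketched as below. This is a minimal illustration under assumptions of mine, not the authors' implementation: OpenCV, NumPy and scikit-learn are my library choices, the 16-coefficient Fourier budget is illustrative, and all helper names are hypothetical.

```python
import cv2
import numpy as np
from sklearn.neural_network import MLPClassifier

def shape_descriptors(mask, n_fourier=16):
    """Contour and region shape descriptors of a binary hand mask."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(contours, key=cv2.contourArea).squeeze(1).astype(np.float32)  # (N, 2)
    # Fourier descriptors of the complex boundary signal: dropping the DC term
    # gives translation invariance, dividing by |Z_1| gives scale invariance,
    # keeping magnitudes discards the start-point/rotation phase.
    z = c[:, 0] + 1j * c[:, 1]
    Z = np.fft.fft(z - z.mean())
    fd = np.abs(Z[1:n_fourier + 1]) / (np.abs(Z[1]) + 1e-9)
    area = cv2.contourArea(c)
    perimeter = cv2.arcLength(c, True)
    compactness = 4.0 * np.pi * area / (perimeter ** 2 + 1e-9)
    m = cv2.moments(c)
    # moment-based eccentricity from the central second moments
    eccentricity = ((m["mu20"] - m["mu02"]) ** 2 + 4.0 * m["mu11"] ** 2) / \
                   ((m["mu20"] + m["mu02"]) ** 2 + 1e-9)
    return np.concatenate([fd, [compactness, eccentricity]])

# Hypothetical usage: X = descriptor vectors, y = hand-sign labels.
# clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
# command = clf.predict([shape_descriptors(new_mask)])[0]
```

The abstract's experiments amount to comparing exactly these descriptor families to find the combination that maximizes recognition rate for a given sign vocabulary.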


... The system uses a camera to track a person and recognize arm motion, allowing the robot to follow a person reliably under changing lighting conditions. In [30], an accuracy of 98.5% was obtained in recognizing gestures to control a mobile robot. ...
Conference Paper
Analysis and recognition of objects in complex scenes is a demanding task for a computer. There is a selection mechanism, called visual attention, that optimizes the visual system so that only the important parts of the scene are considered at a time. In this work, an object-based visual attention model with both bottom-up and top-down modulation is applied to the humanoid robot NAO to provide the robot with a new attention procedure. This means that the robot, using its cameras, can recognize geometric figures in real time even while all the objects in the image compete for attention. The proposed method is validated through tests with 13- to 14-year-old children interacting with the NAO robot, which provides some tips (such as the formulas for perimeter and area) and recognizes the figure shown by the children. The results are very promising and show that the proposed approach can contribute to introducing robotics into the educational context.
Article
One of the challenges of teleoperation is the recognition of a user’s intended commands, particularly in the manning of highly dynamic systems such as drones. In this paper, we present a solution to this problem by developing a generalized scheme relying on a Convolutional Neural Network (CNN) that is trained to recognize a user’s intended commands, directed through a haptic device. Our proposed method allows the interface to be personalized for each user, by pre-training the CNN differently according to the input data that is specific to the intended end user. Experiments were conducted using two haptic devices and classification results demonstrate that the proposed system outperforms geometric-based approaches by nearly 12%. Furthermore, our system also lends itself to other human–machine interfaces where intention recognition is required.
Article
Communication by avatars provides an attractive and natural human-computer interaction (HCI). While this special field requires multidisciplinary collaboration ranging from computer science to social science, a relatively comprehensive survey allowing researchers in a wide variety of areas to share their contributions has not been found in the literature. This motivates us to provide an overview of recent achievements in this field. This paper focuses on language, expression and posture, and surveys a broad range of issues, from fundamental research such as gesture tracking, posture analysis and estimation, and multimodal interfaces, to human figure animation approaches, as well as psychologists' proposals and conceptual understandings of the behaviours of avatars. We also discuss and compare many potential applications and systems built around avatars. Finally, we present the key challenges at present and indicate future trends. This survey endeavours to bring together the various research areas involved in avatar-based communication to facilitate and promote the sharing of recent accomplishments.
Conference Paper
Full-text available
We propose a fast algorithm for automatically recognizing a limited set of gestures from hand images for a robot control application. Hand gesture recognition is a challenging problem in its general form. We consider a fixed set of manual commands and a reasonably structured environment, and develop a simple, yet effective, procedure for gesture recognition. Our approach contains steps for segmenting the hand region, locating the fingers, and finally classifying the gesture. The algorithm is invariant to translation, rotation, and scale of the hand. We demonstrate the effectiveness of the technique on real imagery.
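The three steps named here (segment, locate fingers, classify) are commonly realized with contour convexity analysis; a hedged sketch of that variant follows. The paper's actual procedure and thresholds may differ, and OpenCV is my library choice.

```python
import cv2
import numpy as np

def count_fingers(mask):
    """Estimate the number of extended fingers from a binary hand mask."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    c = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(c, returnPoints=False)
    defects = cv2.convexityDefects(c, hull)
    if defects is None:
        return 0
    fingers = 0
    for s, e, f, depth in defects[:, 0]:
        start, end, far = c[s][0], c[e][0], c[f][0]
        a = np.linalg.norm(end - start)     # gap between neighbouring fingertips
        b = np.linalg.norm(far - start)
        d = np.linalg.norm(end - far)
        cos_angle = np.clip((b ** 2 + d ** 2 - a ** 2) / (2 * b * d + 1e-9), -1, 1)
        # a valley between two fingers is deep and has a sharp angle
        if np.arccos(cos_angle) < np.pi / 2 and depth > 10000:  # depth in 1/256 px
            fingers += 1
    return fingers + 1 if fingers else 0
```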
Chapter
Full-text available
This chapter has reviewed fundamental concepts and technologies of the general interdisciplinary field usually described by a combination of the terms Virtual, Augmented or Mixed Reality systems, with the emphasis on their applications in robot teleoperation. We first analysed the basics of VR and AR systems, which have shown great progress in research and development during the last decade, demonstrating a constantly increasing repertoire of useful practical applications in diverse domains of human activity. We then described application scenarios of such VR technologies in the general field of robotics, with a particular focus on telerobotic applications. We started by presenting a brief historical survey of the field of telerobotics, and identified the major benefits related to the integration of VR and AR techniques. Virtual environments can be seen as a means to achieve natural, intuitive, multimodal human/computer (and generally human/machine) interaction; in this sense, a VE can function as an efficient mediator between a human operator and a telerobot, with the main objectives being: (a) to enhance human perception of the remote task environment and therefore improve transparency of the telerobotic system, by enriching the visual information (complemented by other forms of sensory and sensori-motor stimuli) provided to the user, thus conveying complex data in a more natural and easier way; and (b) to contribute to the solution of the time-delay problem in bilateral teleoperation and improve stability of the telerobotic system, by extending the concept of predictive displays and offering a range of control metaphors for both operator assistance and robot autonomy sharing. We presented a number of successful case studies where VR techniques have been effectively applied in telerobotics, for the two main robotic system categories, namely (i) robot manipulators and (ii) mobile robotic vehicles. A long-distance parallel telemanipulation experiment was described, where an intermediate virtual task representation was used involving direct hand actions by means of a VR glove device. The use of telerobotic technologies in a distance training (virtual and remote laboratory) application was also demonstrated, with very promising results in this important domain. As regards the field of mobile service robotics, two application scenarios were described to highlight the benefits that can result from the integration of VR-based interfaces for the teleoperation of robotic vehicles for a variety of tasks, including service/intervention tasks and remote exploration. The link with the field of haptics is also discussed.
Conference Paper
Full-text available
This paper presents a vision system to be embedded in a mobile robot, both implemented using reconfigurable computing technology. The vision system captures gestures by means of a digital color camera, and then performs some pre-processing steps in order to use the image as input to a RAM-based neural network. The set of recognized gestures can be defined using the system's on-chip training capabilities. All of the above functionality has been implemented in a single FPGA chip. Experimental results have shown the system to be robust, with enough performance to meet real-time constraints (30 fps), and also highly efficient in the recognition process (true recognition rate of 99.57%).
Conference Paper
Full-text available
The ability to detect a person's unconstrained hand in a natural video sequence has applications in sign language, gesture recognition and HCI. This paper presents a novel, unsupervised approach to training an efficient and robust detector which is capable of not only detecting the presence of human hands within an image but also classifying the hand shape. A database of images is first clustered using a k-method clustering algorithm with a distance metric based upon shape context. From this, a tree structure of boosted cascades is constructed. The head of the tree provides a general hand detector, while the individual branches of the tree classify a valid shape as belonging to one of the predetermined clusters, each exemplified by an indicative hand shape. Preliminary experiments showed that the approach boasts a promising 99.8% success rate on hand detection and 97.4% success at classification. Although we demonstrate the approach within the domain of hand shapes, it is equally applicable to other problems where both detection and classification are required for objects that display high variability in appearance.
Conference Paper
Full-text available
This paper describes a teleoperation system in which an articulated robot performs a block pushing task based on hand gesture commands sent through the Internet. A fuzzy c-means clustering method is used to classify hand postures as "gesture commands". The fuzzy recognition system was tested using 20 trials each of a 12-gesture vocabulary. Results revealed an acceptance rate of 99.6% (percent of gestures with a sufficiently large membership value to belong to at least one of the designated classifications), and a recognition accuracy of 100% (the percent of accepted gestures classified correctly). Performance times to carry out the pushing task showed rapid learning, reaching standard times within 4 to 6 trials by an inexperienced operator.
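A minimal NumPy sketch of the fuzzy c-means membership test implied above, using the standard FCM membership formula with an illustrative acceptance threshold; the names and values are mine, not the authors':

```python
import numpy as np

def fcm_memberships(x, centers, m=2.0):
    """Membership of feature vector x in each cluster (standard FCM formula)."""
    d = np.linalg.norm(centers - x, axis=1) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def classify_gesture(x, centers, accept=0.5):
    """Return the winning class, or None if no membership is large enough."""
    u = fcm_memberships(x, centers)
    k = int(np.argmax(u))
    return k if u[k] >= accept else None   # None = gesture rejected

# `centers` would come from fuzzy c-means training on labelled posture features.
```

The paper's "acceptance rate" then corresponds to the fraction of gestures passing the membership threshold, and "recognition accuracy" to how many accepted gestures get the right class.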
Conference Paper
Full-text available
We present a feature extraction approach based on curvature scale space (CSS) for translation, scale, and rotation invariant recognition of hand poses. First, the CSS images are used to represent the shapes of boundary contours of hand poses. Then, we extract the multiple sets of CSS features to overcome the problem of deep concavities in contours of hand poses. Finally, nearest neighbour techniques are used to perform CSS matching between the multiple sets of input CSS features and the stored CSS features for hand pose identification. Results show the proposed approach can extract the multiple sets of CSS features from the input images and perform well for recognition of hand poses.
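As a sketch of the CSS construction (Gaussian smoothing of the closed contour at increasing scales, recording curvature zero crossings), under my own library choices (NumPy/SciPy), not the authors' code:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def css_features(contour, sigmas=np.arange(1.0, 20.0, 0.5)):
    """contour: (N, 2) closed boundary points. Returns (sigma, zero-crossings)."""
    x = contour[:, 0].astype(float)
    y = contour[:, 1].astype(float)
    css = []
    for s in sigmas:
        xs = gaussian_filter1d(x, s, mode="wrap")   # "wrap" = closed curve
        ys = gaussian_filter1d(y, s, mode="wrap")
        dx, dy = np.gradient(xs), np.gradient(ys)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        kappa = (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5
        zc = np.where(np.diff(np.sign(kappa)) != 0)[0]   # curvature zero crossings
        css.append((s, zc))
    return css   # the maxima of this (sigma, position) map are the CSS features
```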
Conference Paper
Full-text available
Several systems for automatic gesture recognition have been developed using different strategies and approaches. In these systems the recognition engine is mainly based on three algorithms: dynamic pattern matching, statistical classification, and neural networks (NN). In this paper we present four architectures for gesture-based interaction between a human being and an autonomous mobile robot using the above-mentioned techniques or a hybrid combination of them. Each of our gesture recognition architectures consists of a preprocessor and a decoder. Three different hybrid stochastic/connectionist architectures are considered. A template-matching problem is addressed using dynamic programming techniques; the strategy is to find the minimal distance between a continuous input feature sequence and the classes. Preliminary experiments with our baseline system achieved a recognition accuracy of up to 92%. All systems use input from a monocular color video camera and are user-independent, but they do not yet run in real time.
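The dynamic-programming template matching mentioned here is essentially dynamic time warping; a compact sketch follows, where the feature sequences and local distance are placeholders of mine:

```python
import numpy as np

def dtw(seq, template):
    """Minimal alignment cost between two sequences of feature vectors."""
    n, m = len(seq), len(template)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(seq[i - 1]) - np.asarray(template[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# classification: the class whose template yields the smallest alignment cost
# label = min(templates, key=lambda k: dtw(observed, templates[k]))
```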
Article
Full-text available
We present two real-time hidden Markov model-based systems for recognizing sentence-level continuous American sign language (ASL) using a single camera to track the user's unadorned hands. The first system observes the user from a desk mounted camera and achieves 92 percent word accuracy. The second system mounts the camera in a cap worn by the user and achieves 98 percent accuracy (97 percent with an unrestricted grammar). Both experiments use a 40-word lexicon.
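The paper trains word-level HMMs over hand-tracking features; a hedged isolated-word sketch with hmmlearn (my library choice, not the authors') illustrates the likelihood-scoring idea, leaving out the grammar and continuous decoding:

```python
import numpy as np
from hmmlearn import hmm

def train_word_model(sequences, n_states=5):
    """Fit one Gaussian HMM per word from that word's feature sequences."""
    X = np.concatenate(sequences)                 # stacked (T_i, D) sequences
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def recognize(seq, word_models):
    """Pick the word whose HMM assigns the observation the highest likelihood."""
    return max(word_models, key=lambda w: word_models[w].score(seq))
```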
Article
Full-text available
A shape representation technique suitable for tasks that call for recognition of a noisy curve of arbitrary shape at an arbitrary scale or orientation is presented. The method rests on describing a curve at varying levels of detail using features that are invariant with respect to transformations that do not change the shape of the curve. Three different ways of computing the representation are described. They result in three different representations: the curvature scale space image, the renormalized curvature scale space image, and the resampled curvature scale space image. The process of describing a curve at increasing levels of abstraction is referred to as the evolution or arc length evolution of that curve. Several evolution and arc length evolution properties of planar curves are discussed.
Article
This paper presents a novel technique for hand gesture recognition in human-computer interaction based on shape analysis. The main objective of this effort is to explore the utility of a neural network-based approach to the recognition of hand gestures. A multi-layer perceptron neural network is built for classification using the back-propagation learning algorithm. The goal of static hand gesture recognition is to classify given hand gesture data, represented by some features, into a predefined finite number of gesture classes. The proposed system presents a recognition algorithm for a set of six specific static hand gestures, namely: Open, Close, Cut, Paste, Maximize, and Minimize. The hand gesture image passes through three stages: preprocessing, feature extraction, and classification. In the preprocessing stage, operations are applied to extract the hand gesture from its background and prepare the image for the feature extraction stage. In the first method, the hand contour is used as a feature, which handles the scaling and translation problems (in some cases). The complex moments algorithm is, however, used to describe the hand gesture and handle the rotation problem in addition to scaling and translation. Both feature sets feed a multi-layer neural network classifier trained with the back-propagation learning algorithm. The results show that the first method achieves 70.83% recognition, while the second method, proposed in this article, achieves a better recognition rate of 86.38%.
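For the second feature set, moment invariants are the standard route; a sketch using Hu's invariants as a stand-in for the paper's complex-moment formulation (OpenCV is my library choice):

```python
import cv2
import numpy as np

def moment_features(mask):
    """Rotation/scale/translation-invariant moment features of a binary mask."""
    m = cv2.moments(mask, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()
    # log-scale the invariants so their magnitudes suit MLP training
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```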
Article
In this paper, we introduce a new approach to gesture recognition based on incorporating the idea of fuzzy ARTMAP [1] into the feature-recognition neural network of Hussain and Kabuka [3]. It has already been shown that neural networks are relatively effective for partially corrupted images. However, a distinct subnet is created for every training pattern, so a large network is obtained when the number of training patterns is large. Furthermore, the recognition rate can suffer because features from similar training patterns are not combined. We improve this technique by considering input images as fuzzy resources for hand gesture recognition. Training patterns are allowed to merge, based on the measure of similarity among features, so that a subnet is shared by similar patterns; as a result, network size is reduced and the recognition rate is increased. The high gesture recognition rates and quick network retraining times found in the present study suggest that a neural network approach to gesture recognition should be further evaluated.
Article
A method for the analysis and synthesis of closed curves in the plane is developed using the Fourier descriptors (FDs) of Cosgriff [1]. A curve is represented parametrically as a function of arc length by the accumulated change in direction of the curve since the starting point. This function is expanded in a Fourier series and the coefficients are arranged in amplitude/phase-angle form. It is shown that the amplitudes are pure form invariants, as are certain simple functions of the phase angles. Rotational and axial symmetry are related directly to simple properties of the Fourier descriptors. An analysis of shape similarity or symmetry can be based on these relationships; also, closed symmetric curves can be synthesized from almost arbitrary Fourier descriptors. It is established that the Fourier series expansion is optimal and unique with respect to obtaining coefficients insensitive to the starting point. Several examples are provided to indicate the usefulness of Fourier descriptors as features for shape discrimination, and a number of interesting symmetric curves are generated by computer and plotted.
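In symbols, the representation described here is the following (reconstructed from the abstract and the standard Zahn-Roskies formulation; normalization details may differ from the paper):

```latex
% Cumulative tangent-angle function of a closed curve of length L,
% normalized over [0, 2*pi] and expanded in a Fourier series.
\varphi^{*}(t) = \varphi\!\left(\frac{Lt}{2\pi}\right) + t
  = \mu_{0} + \sum_{k=1}^{\infty}\left(a_{k}\cos kt + b_{k}\sin kt\right)
  = \mu_{0} + \sum_{k=1}^{\infty} A_{k}\cos\left(kt - \alpha_{k}\right),
\qquad A_{k} = \sqrt{a_{k}^{2} + b_{k}^{2}},
\quad \alpha_{k} = \arctan\frac{b_{k}}{a_{k}}
```

The amplitudes A_k are the pure form invariants the abstract refers to, and simple functions of the phase angles α_k supply the remaining invariants.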
Article
In this paper, we consider a vision-based system that can interpret a user's gestures in real time to manipulate windows and objects within a graphical user interface. A hand segmentation procedure first extracts binary hand blob(s) from each frame of the acquired image sequence. Fourier descriptors are used to represent the shape of the hand blobs, and are input to radial-basis function (RBF) network(s) for pose classification. The pose likelihood vector from the RBF network output is used as input to the gesture recognizer, along with motion information. Gesture recognition performances using hidden Markov models (HMM) and recurrent neural networks (RNN) were investigated. Test results showed that the continuous HMM yielded the best performance with gesture recognition rates of 90.2%. Experiments with combining the continuous HMMs and RNNs revealed that a linear combination of the two classifiers improved the classification results to 91.9%. The gesture recognition system was deployed in a prototype user interface application, and users who tested it found the gestures intuitive and the application easy to use. Real time processing rates of up to 22 frames per second were obtained.
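The reported improvement comes from linearly combining the two classifiers' outputs; a minimal sketch (the weight would be tuned on validation data; the names are mine):

```python
import numpy as np

def combine(p_hmm, p_rnn, lam=0.5):
    """Convex mix of per-class likelihood vectors from the two recognizers."""
    p = lam * np.asarray(p_hmm) + (1.0 - lam) * np.asarray(p_rnn)
    return int(np.argmax(p))
```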
Article
This paper presents a novel concept for teleoperation using direct human hand(s) actions which we called 'the hidden robot' concept. The proposed teleoperation scheme is composed of three main components: the operator/computer loop, the execution loop and, between them, the bilateral transformation modules linked by the communication channel. Within the operator/computer master loop, the operator performs what we call a 'virtual task', without being constrained by the slave robot. At this stage, the bilateral transformation layer is in charge of extracting, at the low level, pertinent parameters from the virtual task and transforming them into robot control signals. The execution loop performs control of the slave robot(s) to achieve the desired task, described by the virtual one. At this stage, the transformation layer extracts pertinent data to provide feedback when possible and needed. It also makes sure that the task is being performed correctly at the real site; otherwise, it takes the necessary recovery procedures or informs the operator to proceed in a different way. We describe each component in detail, highlighting the originalities of our approach. We also present the experiment performed by applying this concept to long-distance, simultaneous teleoperation of four slave robots with different kinematics and situated at different locations in France and in Japan. The experimental task consisted of assembling a four-piece puzzle. All the robots had to perform the same task in parallel. We discuss the experimental results presented in this paper, concerning long-distance teleoperated robot control and round-trip communication time delay. The experiment demonstrated the feasibility of the proposed scheme and gave guidelines related to the direct use of the operator's hand, within an intermediate representation, as a guide for task execution.
Article
This paper presents a sign language recognition system which consists of three modules: model-based hand tracking, feature extraction, and gesture recognition using a 3D Hopfield neural network (HNN). The first module uses the Hausdorff distance measure to track shape-variant hand motion, the second applies the scale- and rotation-invariant Fourier descriptor to characterize hand figures, and the last performs graph matching between the input gesture model and the stored models using a 3D modified HNN to recognize the gesture. Our system tests 15 different hand gestures. The experimental results show that our system can achieve above a 91% recognition rate, and the recognition process time is about 10 s. The major contribution of this paper is that we propose a 3D modified HNN for gesture recognition which is more reliable than conventional methods.
Conference Paper
Temporal hand gesture recognition by a fuzzified Takagi-Sugeno-Kang (TSK)-type recurrent fuzzy network (FTRFN) is proposed in this paper. The temporal hand gesture is captured by a CCD camera and represented by a two-dimensional fuzzy trajectory. To handle fuzzy trajectories, the FTRFN is employed. The inputs and outputs of the FTRFN are fuzzy patterns represented by Gaussian membership functions, and the recurrent property of the FTRFN enables it to deal with fuzzy patterns with temporal context. In the recognition scheme, the FTRFN performs trajectory recognition by prediction instead of classification. Experiments on ten categories of gestures are performed to verify the proposed approach.
Conference Paper
The paper presents a gesture recognition approach for sign language using curvature scale space (CSS) and hidden Markov models (HMM). First, we use the translation-, scale- and rotation-invariant CSS descriptor to characterize the hand shapes of gestures. Then, we propose a feature-preserving algorithm to allocate CSS features into a one-dimensional, fixed-size feature vector for the HMM, since the CSS features are two-dimensional and the number of extracted CSS features per hand shape is not fixed. Finally, we apply the HMM to determine hand shapes and trajectory transitions among the different hand shapes and trajectories of the gestures for sign language identification. Results show the proposed approach performs well for sign language recognition.
Conference Paper
In the foreseeable future, gestured inputs will be widely used in human-computer interfaces. This paper describes our initial attempt at recognizing 2D hand poses for application in video-based human-computer interfaces. Specifically, this research focuses on 2D image recognition utilizing an evolved wavelet-based feature vector. We have developed a two layer feed-forward neural network that recognizes the 24 static letters in the American sign language (ASL) alphabet using still input images. Thus far, two wavelet-based decomposition methods have been used. The first produces an 8-element real-valued feature vector and the second an 18-element feature vector. Each set of feature vectors is used to train a feed-forward neural network using Levenberg-Marquardt training. The system is capable of recognizing instances of static ASL fingerspelling with 99.9% accuracy with an SNR as low as 2. We conclude by describing issues to be resolved before expanding the corpus of ASL signs to be recognized.
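A hedged sketch of a wavelet-based feature vector for a static hand image (PyWavelets and the energy-per-subband choice are my assumptions; the paper's 8- and 18-element vectors may be built differently):

```python
import numpy as np
import pywt

def wavelet_features(gray_img, wavelet="haar", level=2):
    """Mean absolute coefficient per subband of a 2-D wavelet decomposition."""
    coeffs = pywt.wavedec2(gray_img.astype(float), wavelet, level=level)
    feats = [np.mean(np.abs(coeffs[0]))]                 # approximation subband
    for (cH, cV, cD) in coeffs[1:]:
        feats += [np.mean(np.abs(c)) for c in (cH, cV, cD)]  # detail subbands
    return np.array(feats)   # level=2 gives 7 elements; deeper levels give more
```

Such a vector would then train the feed-forward network exactly as any other fixed-length descriptor.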
Conference Paper
This paper presents a new visual gesture recognition method for the human-machine interface of mobile robot teleoperation. The interface uses seven static hand gestures, each of which represents an individual control command for the motion control of the remote robot. All the important aspects of developing such an interface are explored, including image acquisition, adaptive object segmentation with color images in RGB and HLS representations, morphological filtering, hand finding and labeling, and recognition with edge codes, template matching, and skeletonizing. By choosing processing methods and procedures properly, a higher rate of correct recognition and a faster speed are achieved in the experiments.
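A sketch of the adaptive-segmentation and morphological-filtering steps (the threshold values are illustrative only; the paper adapts segmentation to the scene, and OpenCV is my library choice):

```python
import cv2
import numpy as np

def segment_hand(frame_bgr):
    """Skin-colour thresholding in HLS followed by morphological clean-up."""
    hls = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HLS)
    lo = np.array([0, 40, 50], dtype=np.uint8)      # hue, lightness, saturation
    hi = np.array([25, 220, 255], dtype=np.uint8)
    mask = cv2.inRange(hls, lo, hi)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove speckle
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes
    return mask
```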
Tracking Skin-colored Objects in Real-time, invited contribution to the Cutting Edge Robotics Book
  • A A Argyros
  • M I A Lourakis
A. A. Argyros, M. I. A. Lourakis, Tracking Skin-colored Objects in Real-time, invited contribution to the Cutting Edge Robotics Book, ISBN 3-86611-038-3, Advanced Robotic Systems International, 2005.
Virtual and Mixed Reality in Telerobotics: A Survey, In: Industrial Robotics: Programming, Simulation and Applications, Advanced Robotic Systems International
  • C S Tzafestas
C. S. Tzafestas, Virtual and Mixed Reality in Telerobotics: A Survey, In: Industrial Robotics: Programming, Simulation and Applications, Advanced Robotic Systems International, ISBN 3-86611-286-6, pp. 437-470, Feb. 2007.