Conference Paper

Combination of manual and non-manual features for sign language recognition based on conditional random field and active appearance model

Sch. of Comput. Eng., Chosun Univ., Gwangju, South Korea
DOI: 10.1109/ICMLC.2011.6016973
Conference: 2011 International Conference on Machine Learning and Cybernetics (ICMLC), Volume 4
Source: IEEE Xplore


Sign language recognition is the task of detecting and recognizing manual signals (MSs) and non-manual signals (NMSs) in a signed utterance. In this paper, a novel method is proposed for recognizing MSs and, as NMSs, facial expressions. It is built on a framework consisting of three components: (1) Candidate segments of MSs are discriminated using a hierarchical conditional random field (CRF) and BoostMap embedding; this component can distinguish signs, fingerspellings and non-sign patterns, and is robust to variations in the size, scale and rotation of the signer's hand. (2) Facial expressions, as NMSs, are recognized with a support vector machine (SVM) and an active appearance model (AAM). The AAM extracts facial feature points, from which several measurements are computed; the SVM then uses these measurements to classify the facial components into the defined facial expressions. (3) Finally, the recognition results for MSs and NMSs are fused to recognize signed sentences. Experiments demonstrate that the proposed method can successfully combine MS and NMS features to recognize signed sentences from utterance data.
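As a rough illustration of components (2) and (3), the sketch below computes a few scale-normalized geometric measurements from facial feature points and then attaches a facial-expression label to each recognized sign segment by majority vote. The landmark names, the particular measurements, the `fuse` rule, and all function names are hypothetical choices made for illustration; they are not the measurements or the fusion scheme used in the paper. The SVM step that would map measurements to expression labels is omitted, and its per-frame outputs are simply taken as given.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D feature points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def facial_measurements(pts: Dict[str, Point]) -> List[float]:
    """Scale-normalized measurements from hypothetical named AAM points.

    Each distance is divided by the inter-ocular distance so the values
    do not depend on how large the face appears in the frame.
    """
    iod = dist(pts["left_eye"], pts["right_eye"])
    brow_raise = (dist(pts["left_brow"], pts["left_eye"])
                  + dist(pts["right_brow"], pts["right_eye"])) / 2
    mouth_open = dist(pts["upper_lip"], pts["lower_lip"])
    mouth_width = dist(pts["mouth_left"], pts["mouth_right"])
    return [brow_raise / iod, mouth_open / iod, mouth_width / iod]


def fuse(sign_segments: Sequence[Tuple[str, int, int]],
         face_labels: Sequence[str]) -> List[Tuple[str, str]]:
    """Attach the most frequent per-frame expression label to each sign.

    sign_segments holds (gloss, start_frame, end_frame), end exclusive;
    face_labels holds one expression label per frame (e.g. SVM outputs).
    Hypothetical fusion rule: majority vote, "neutral" for empty windows.
    """
    sentence = []
    for gloss, start, end in sign_segments:
        window = face_labels[start:end]
        expr = Counter(window).most_common(1)[0][0] if window else "neutral"
        sentence.append((gloss, expr))
    return sentence


if __name__ == "__main__":
    points = {
        "left_eye": (0.0, 0.0), "right_eye": (2.0, 0.0),
        "left_brow": (0.0, 1.0), "right_brow": (2.0, 1.0),
        "upper_lip": (1.0, -2.0), "lower_lip": (1.0, -3.0),
        "mouth_left": (0.0, -2.5), "mouth_right": (2.0, -2.5),
    }
    print(facial_measurements(points))  # [0.5, 0.5, 1.0]
    frames = ["neutral", "neutral", "raised", "raised", "raised", "neutral"]
    print(fuse([("YOU", 0, 3), ("GO", 3, 6)], frames))
    # [('YOU', 'neutral'), ('GO', 'raised')]
```

Normalizing by the inter-ocular distance is one simple way to make the measurements independent of face size; a real system would feed such vectors to the trained SVM rather than using hand-picked labels.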

  • "Biswas [1] proposed gesture recognition using SVM. Hee-Deok Yang et al. proposed sign language recognition using a hierarchical CRF and BoostMap embedding to detect manual and non-manual signals, and SVM and an active appearance model for recognition [17]. Jiang and Zhong [18] proposed hierarchical models for complex action recognition consisting of three stages, namely group-labeling, frame-labeling, and action-labeling."
    ABSTRACT: Many real-world gesture datasets by nature contain an unbalanced number of poses across classes. Such imbalance severely reduces the performance of bag-of-poses based classification. On the other hand, collecting a dataset of human gestures or actions is an expensive and time-consuming procedure, so it is often impractical to reacquire the data or to modify the existing dataset using oversampling or undersampling. The best way to handle such imbalance is to make the classifier directly aware of, and adaptive to, the actual distribution of the data. Balancing the class distribution, i.e., the number of pose samples per class, is one of the difficult tasks in machine learning, and standard statistical learning models (e.g., SVM, HMM, CRF) are insensitive to unbalanced datasets. This paper proposes a distribution-sensitive prior for a standard statistical learning model, the Relevance Vector Machine (RVM), to deal with the imbalanced data problem. This prior analyzes the training dataset before a model is learned. Thus, the RVM can put more weight on samples from under-represented classes while allowing all samples in the dataset to have a balanced impact on the learning process. Our experiments use a publicly available gesture dataset, the Microsoft Research Cambridge-12 (MSRC-12). Experimental results show the importance of adapting to unbalanced data and the improvement in recognition performance obtained with the distribution-sensitive prior.
    Conference Paper · Nov 2015
  • ABSTRACT: We present an original method for automatic hand gesture recognition that makes use of fuzzified latent-dynamic conditional random fields (LDCRFs). In this method, fuzzy linguistic variables are used to model the features of hand gestures and then to modify the potential function in LDCRFs. By combining LDCRFs and fuzzy sets, these fuzzy-based LDCRFs (FLDCRFs) keep the advantages of LDCRFs in sequence labeling while also retaining the imprecise character of gestures. The efficiency of the proposed method was tested with unsegmented gesture sequences from three different hand gesture data sets. The experimental results demonstrate that FLDCRFs compare favorably with support vector machines, hidden conditional random fields, and LDCRFs on hand gesture recognition tasks.
    Article · Jun 2012 · Optical Engineering
  • ABSTRACT: Gesture recognition is useful for human-computer interaction. The difficulty of gesture recognition is that instances of gestures vary in both motion and shape in three-dimensional (3-D) space. We use depth information generated by Microsoft's Kinect to detect 3-D human body components, and apply a threshold model with a conditional random field to recognize meaningful gestures using continuous motion information. Body gesture recognition is achieved through a framework consisting of two steps. First, a human subject is described by a set of features encoding the angular relationships between body components in 3-D space. Second, the feature vector is recognized using a threshold model with a conditional random field. To show the performance of the proposed method, we use a public data set, the Microsoft Research Cambridge-12 Kinect gesture database. The experimental results demonstrate that the proposed method can recognize body gestures automatically, efficiently and effectively.
    Article · Jan 2013 · Optical Engineering
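The Nov 2015 abstract on unbalanced gesture data describes a prior that lets the classifier put more weight on samples from under-represented classes. The exact form of that prior is not given there, so the following is only a sketch of a common stand-in, inverse-frequency sample weighting, in which every class receives the same total weight regardless of its size. This is the same formula scikit-learn uses for `class_weight="balanced"`, namely n_samples / (n_classes * count(class)). The helper names are hypothetical, and this is not the paper's RVM prior.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def inverse_frequency_weights(labels: Sequence[Hashable]) -> List[float]:
    """Per-sample weights that give every class the same total weight.

    Hypothetical helper: a generic inverse-frequency heuristic standing in
    for the distribution-sensitive prior described in the abstract.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    # Each class receives n_samples / n_classes in total, split evenly
    # among its samples, so rare classes get larger per-sample weights.
    return [n_samples / (n_classes * counts[y]) for y in labels]


def class_totals(labels: Sequence[Hashable],
                 weights: Sequence[float]) -> Dict[Hashable, float]:
    """Sum of weights per class, useful for checking the balance."""
    totals: Dict[Hashable, float] = {}
    for y, w in zip(labels, weights):
        totals[y] = totals.get(y, 0.0) + w
    return totals


if __name__ == "__main__":
    labels = ["wave"] * 6 + ["clap"] * 2  # toy imbalanced pose labels
    weights = inverse_frequency_weights(labels)
    print(class_totals(labels, weights))  # both classes total about 4.0
```

In training, each sample's term in the objective would be multiplied by its weight, so the two "clap" samples together count as much as the six "wave" samples.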
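For the Jun 2012 fuzzified LDCRF abstract, the idea of modeling gesture features with fuzzy linguistic variables can be pictured as follows. A continuous measurement such as hand speed is mapped to membership degrees in terms like "slow", "medium" and "fast", and those degrees then enter a log-linear potential in place of the crisp value. The triangular membership functions, the speed terms and their breakpoints, and the single-node potential are all assumptions for illustration; the abstract does not say how the LDCRF potential is modified, and a full LDCRF with hidden states and transition potentials is not shown.

```python
import math
from typing import Dict, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 for x <= a or x >= c, peaking at 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical linguistic terms for a hand-speed feature normalized to [0, 1].
SPEED_TERMS: Dict[str, Tuple[float, float, float]] = {
    "slow": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "fast": (0.5, 1.0, 1.5),
}


def fuzzify(speed: float) -> Dict[str, float]:
    """Map a crisp speed value to a membership degree for each term."""
    return {term: triangular(speed, *abc) for term, abc in SPEED_TERMS.items()}


def node_potential(memberships: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Log-linear potential exp(sum_t w_t * mu_t) over fuzzy features."""
    return math.exp(sum(weights[t] * mu for t, mu in memberships.items()))


if __name__ == "__main__":
    mu = fuzzify(0.25)
    print(mu)  # {'slow': 0.5, 'medium': 0.5, 'fast': 0.0}
    print(node_potential(mu, {"slow": 1.0, "medium": 2.0, "fast": 0.0}))
```

A speed of 0.25 is partly "slow" and partly "medium", so both weights contribute to the potential; this graded response is what the abstract refers to as retaining the imprecise character of gestures.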
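For the Jan 2013 Kinect abstract, the two steps (angular features between body components, then a threshold model) can be sketched in simplified form. The joint-angle feature is computed from three 3-D joint positions, and the threshold test rejects the best-scoring gesture when its score does not exceed that of a non-gesture (threshold) model. The joints, scores, and function names are hypothetical, and the plain score comparison is a simplification, not the CRF-based threshold model described in the article.

```python
import math
from typing import Dict, Optional, Sequence


def joint_angle(a: Sequence[float], b: Sequence[float],
                c: Sequence[float]) -> float:
    """Angle in degrees at joint b between segments b->a and b->c.

    Assumes b differs from both a and c (otherwise the angle is undefined).
    """
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def threshold_decision(gesture_scores: Dict[str, float],
                       threshold_score: float) -> Optional[str]:
    """Return the best gesture label, or None if the threshold model wins."""
    best = max(gesture_scores, key=gesture_scores.get)
    return best if gesture_scores[best] > threshold_score else None


if __name__ == "__main__":
    shoulder, elbow, wrist = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)
    print(joint_angle(shoulder, elbow, wrist))  # right angle at the elbow
    print(threshold_decision({"wave": -3.2, "clap": -5.0}, threshold_score=-4.0))
    # wave
```

Angles between limb segments do not change when the subject moves around the scene or stands closer to the sensor, which is one reason angular relationships make convenient body-pose features.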