Conference Paper

Combination of manual and non-manual features for sign language recognition based on conditional random field and active appearance model

Sch. of Comput. Eng., Chosun Univ., Gwangju, South Korea
DOI: 10.1109/ICMLC.2011.6016973 Conference: 2011 International Conference on Machine Learning and Cybernetics (ICMLC), Volume 4
Source: IEEE Xplore


Sign language recognition is the task of detecting and recognizing manual signals (MSs) and non-manual signals (NMSs) in a signed utterance. In this paper, a novel method for recognizing MSs and facial expressions as NMSs is proposed. This is achieved through a framework consisting of three components: (1) Candidate segments of MSs are discriminated using a hierarchical conditional random field (CRF) and BoostMap embedding; this component can distinguish signs, fingerspellings, and non-sign patterns, and is robust to variations in the size, scale, and rotation of the signer's hand. (2) Facial expressions as NMSs are recognized with a support vector machine (SVM) and an active appearance model (AAM); the AAM is used to extract facial feature points, from which several measurements are computed to classify each facial component into predefined facial expressions with the SVM. (3) Finally, the recognition results for MSs and NMSs are fused in order to recognize signed sentences. Experiments demonstrate that the proposed method can successfully combine MS and NMS features to recognize signed sentences from utterance data.
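
As a rough illustration of component (2), the sketch below (Python with scikit-learn, not the authors' implementation) reduces AAM-style facial landmark points to a few geometric measurements and classifies them with an SVM. The landmark indices, measurements, and expression labels are placeholder assumptions; the training data here is random and stands in for labeled utterance data.

    # Illustrative sketch (not the authors' code): classify facial expressions
    # from AAM-style landmark points using simple geometric measurements and an SVM.
    # Landmark indices, measurements, and expression labels are assumptions.
    import numpy as np
    from sklearn.svm import SVC

    def facial_measurements(landmarks):
        """Compute a small feature vector from (x, y) facial landmarks.

        `landmarks` is an (N, 2) array; the indices below are hypothetical
        (0-1 inner eyebrows, 2-3 upper/lower eyelid, 4-5 mouth corners,
        6-7 upper/lower lip).
        """
        brow_gap   = np.linalg.norm(landmarks[0] - landmarks[1])   # eyebrow spacing
        eye_open   = np.linalg.norm(landmarks[2] - landmarks[3])   # eyelid distance
        mouth_wide = np.linalg.norm(landmarks[4] - landmarks[5])   # mouth width
        mouth_open = np.linalg.norm(landmarks[6] - landmarks[7])   # lip distance
        return np.array([brow_gap, eye_open, mouth_wide, mouth_open])

    # Placeholder training data: random landmark sets with made-up expression labels.
    rng = np.random.default_rng(0)
    X_train = np.array([facial_measurements(rng.random((8, 2))) for _ in range(100)])
    y_train = rng.integers(0, 3, size=100)   # e.g. 0 = neutral, 1 = raised brows, 2 = open mouth

    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X_train, y_train)

    # At run time, each detected face would be reduced to the same measurements and
    # mapped to an expression label, which then serves as the non-manual signal.
    print(clf.predict([facial_measurements(rng.random((8, 2)))]))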

  • ABSTRACT: We present an original method for automatic hand gesture recognition that makes use of fuzzified latent-dynamic conditional random fields (LDCRFs). In this method, fuzzy linguistic variables are used to model the features of hand gestures and to modify the potential function of the LDCRF. By combining LDCRFs with fuzzy sets, the resulting fuzzy-based LDCRFs (FLDCRFs) keep the sequence-labeling strengths of LDCRFs while retaining the imprecise character of gestures. The efficiency of the proposed method was tested on unsegmented gesture sequences from three different hand gesture data sets. The experimental results demonstrate that FLDCRFs compare favorably with support vector machines, hidden conditional random fields, and LDCRFs on hand gesture recognition tasks.
    Optical Engineering 06/2012; 51(6):067202. DOI:10.1117/1.OE.51.6.067202 · 0.95 Impact Factor
  • ABSTRACT: Gesture recognition is useful for human-computer interaction. The difficulty of gesture recognition is that instances of gestures vary in both motion and shape in three-dimensional (3-D) space. We use depth information generated by Microsoft's Kinect to detect 3-D human body components and apply a threshold model with a conditional random field to recognize meaningful gestures from continuous motion information. Body gesture recognition is achieved through a framework consisting of two steps. First, a human subject is described by a set of features encoding the angular relationships between body components in 3-D space (a feature sketch is given after this list). Second, the feature vector is recognized using a threshold model with a conditional random field. To show the performance of the proposed method, we use a public data set, the Microsoft Research Cambridge-12 Kinect gesture database. The experimental results demonstrate that the proposed method can efficiently and effectively recognize body gestures automatically.
    Optical Engineering 01/2013; 52(1):017201. DOI:10.1117/1.OE.52.1.017201 · 0.95 Impact Factor
  • ABSTRACT: Sign language is composed of two categories of signals: manual signals, such as signs and fingerspellings, and non-manual signals, such as body gestures and facial expressions. This paper proposes a new method for recognizing manual signals and facial expressions as non-manual signals. The proposed method involves three steps: First, a hierarchical conditional random field is used to detect candidate segments of manual signals. Second, the BoostMap embedding method is used to verify the hand shapes of segmented signs and to recognize fingerspellings. Finally, a support vector machine is used to recognize facial expressions as non-manual signals; this final step is taken when there is ambiguity in the previous two steps. The experimental results indicate that the proposed method can accurately recognize sign language at a rate of 84% on utterance data.
    Pattern Recognition Letters 12/2013; 34(16):2051-2056. DOI:10.1016/j.patrec.2013.06.022 · 1.55 Impact Factor
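
For the Kinect-based gesture paper in the list above, the sketch below (Python, not the paper's code) illustrates the kind of angular feature it describes: angles between body components in 3-D space. The joint names and joint triples are assumptions, and the threshold-model CRF that labels the resulting feature sequences is not shown.

    # Minimal sketch (assumptions noted) of angular features between body
    # components in 3-D space, as described in the Kinect gesture abstract above.
    import numpy as np

    def joint_angle(a, b, c):
        """Angle at joint b (in radians) formed by 3-D points a-b-c."""
        v1 = np.asarray(a) - np.asarray(b)
        v2 = np.asarray(c) - np.asarray(b)
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
        return np.arccos(np.clip(cos, -1.0, 1.0))

    def frame_features(skeleton):
        """Encode one skeleton frame as angles over a few hypothetical joint triples.

        `skeleton` maps joint names to (x, y, z) positions, e.g. from a Kinect SDK.
        """
        triples = [
            ("shoulder_r", "elbow_r", "wrist_r"),   # right elbow flexion
            ("shoulder_l", "elbow_l", "wrist_l"),   # left elbow flexion
            ("spine", "shoulder_r", "elbow_r"),     # right arm elevation
            ("spine", "shoulder_l", "elbow_l"),     # left arm elevation
        ]
        return np.array([joint_angle(*(skeleton[j] for j in t)) for t in triples])

    # Example with made-up coordinates; a real sequence of such vectors would be
    # labeled by a threshold-model CRF to spot meaningful gestures in continuous motion.
    demo = {name: np.random.rand(3) for name in
            ["spine", "shoulder_r", "elbow_r", "wrist_r",
             "shoulder_l", "elbow_l", "wrist_l"]}
    print(frame_features(demo))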