Conference Paper

A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition.

DOI: 10.2197/ipsjtcva.2.25
Conference: Advances in Image and Video Technology, Third Pacific Rim Symposium, PSIVT 2009, Tokyo, Japan, January 13-16, 2009. Proceedings
Source: DBLP


  • ABSTRACT: In recent years, automatic lip reading based on ‘visemes’ has been studied by researchers to realize human-machine interactive communication systems in many applications. However, many problems remain, such as defining the number of viseme classes, the method for discriminating visemes, and the method for recognizing speech from visemes. In this paper, a novel classification of Japanese visemes and a hierarchical weighted discrimination method for speech recognition are proposed to address these problems. We increased the number of viseme classes from 6 (conventional) to 9 so that words can be represented in more detail by visemes. In addition, because discrimination becomes more difficult as the number of visemes increases, a hierarchical weighted discrimination method is proposed. For comparison with the conventional method, the ATR phonetically balanced word group, which has a large vocabulary and includes various visemes, was used in word recognition experiments. The results confirmed that the proposed method worked well.
    Intelligent Signal Processing and Communications Systems (ISPACS), 2013 International Symposium on; 01/2013
  • ABSTRACT: Visual speech recognition, or lip reading, is an approach to noise-robust speech recognition that adds the speaker's visual cues to audio information. Visual-only speech recognition is also applicable to speaker verification and to multimedia interfaces that support speech-impaired persons. The sequential mouth-shape code method is an effective approach to lip reading, particularly for uttered Japanese words; it uses two kinds of distinctive mouth shapes, known as first and last mouth shapes, which appear intermittently. One advantage of this method is its low computational burden for the learning and word registration processes. This paper proposes a novel word lip recognition system that detects and determines initial mouth-shape codes in order to recognize uttered consonants. As a result, the proposed method can discriminate between words that share the same sequence of vowel codes but contain different consonant codes. The experiments demonstrate that the proposed system achieves a higher recognition rate than conventional methods.
    Intelligent Signal Processing and Communications Systems (ISPACS), 2012 International Symposium on; 01/2012
  • ABSTRACT: The 'viseme' has been studied for purposes such as automatic speech recognition, speech animation synthesis, and speaker verification. A viseme is a visually identifiable unit of utterance, or the visual-domain equivalent of the phoneme in the audio domain. The classification and discrimination of visemes are still important topics. This paper focuses on the number of classification units and a discrimination procedure for Japanese visemes: we extend the number of visemes from 6 to 9 to expand the representation of words as viseme series, and then propose hierarchical weighted discrimination using multiple discriminant analysis (MDA) to enhance discriminative ability. To verify and discuss the effectiveness of our proposals, viseme discrimination and word recognition experiments were conducted. The results confirmed the validity of the proposed methods.
    2013 IEEE Conference on Systems, Process & Control (ICSPC); 12/2013
