Conference Paper

A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition.

DOI: 10.2197/ipsjtcva.2.25 Conference: Advances in Image and Video Technology, Third Pacific Rim Symposium, PSIVT 2009, Tokyo, Japan, January 13-16, 2009. Proceedings
Source: DBLP

ABSTRACT: (full text available at http://www.cipa.dcu.ie/pubs_full.html)

Available from: Dahai Yu, Mar 09, 2014
  • ABSTRACT: This paper addresses the problem of Persian vowel viseme clustering. Clustering audio-visual data has been studied for about a decade, yet it remains an open problem owing to the scarcity of suitable data and the dependence of visemes on the target language. Here we propose a speaker-independent and robust method for Persian viseme class identification as our main contribution. The proposed method consists of three main steps: (I) mouth-region segmentation, (II) feature extraction, and (III) hierarchical clustering. After segmenting the mouth region in all frames, feature vectors are extracted based on a new view of the Hidden Markov Model. This is a further contribution of this work, which uses the HMM as a probabilistic model-based feature detector. Finally, a hierarchical clustering approach is used to cluster the Persian vowel visemes. The main advantage of this work over others is that it produces a single clustering output for all subjects, which can simplify the research process in other applications. To demonstrate the efficiency of the proposed method, a set of experiments is conducted on AVAII.
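The pipeline described above (features in, hierarchical clustering out) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature matrix here is synthetic random data standing in for the HMM-derived log-likelihood features, and the cluster count is an assumed placeholder.

```python
# Hedged sketch: average-linkage hierarchical clustering of viseme feature
# vectors, as in step (III) of the pipeline. The features below are synthetic;
# in the paper they come from HMMs used as probabilistic feature detectors.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Rows = 6 hypothetical Persian vowel visemes, columns = 10 feature dimensions.
features = rng.normal(size=(6, 10))

# Build the dendrogram, then cut it into at most 3 clusters (assumed count).
Z = linkage(features, method="average", metric="euclidean")
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # one cluster id per viseme
```

Producing a single set of labels for all subjects, rather than one clustering per speaker, is the property the abstract highlights as its main practical advantage.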
  • ABSTRACT: Visual speech recognition, or lip reading, is an approach to noise-robust speech recognition that adds the speaker's visual cues to the audio information. Visual-only speech recognition is applicable to speaker verification and to multimedia interfaces that support speech-impaired people. The sequential mouth-shape code method is an effective lip-reading approach, particularly for uttered Japanese words, that exploits two kinds of distinctive mouth shapes, known as first and last mouth shapes, which appear intermittently. One advantage of this method is its low computational cost for the learning and word-registration processes. This paper proposes a novel word lip-recognition system that detects and determines initial mouth-shape codes in order to recognize uttered consonants. The proposed method is thus able to discriminate between words that consist of the same sequence of vowel codes but contain different consonant codes. The conducted experiments demonstrate that the proposed system achieves a higher recognition rate than conventional ones.
    2012 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS); 01/2012
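The disambiguation idea in the abstract above can be illustrated with a toy lexicon: two words that share the same vowel-code sequence become distinguishable once consonant-derived codes are added. The words and codes below are invented for illustration and are not the paper's actual code inventory.

```python
# Hedged illustration of sequential mouth-shape codes: a word is registered
# as a sequence of codes, and matching an observed sequence against the
# lexicon recognizes the word. Vowel codes alone can collide; consonant
# codes resolve the collision. All entries here are hypothetical.
vowel_only = {
    "kaki": ["A", "I"],
    "sashi": ["A", "I"],  # same vowel sequence -> ambiguous
}
with_consonants = {
    "kaki": ["k", "A", "k", "I"],
    "sashi": ["s", "A", "sh", "I"],  # consonant codes differ
}

def recognize(observed, lexicon):
    """Return every registered word whose code sequence matches exactly."""
    return [word for word, codes in lexicon.items() if codes == observed]

print(recognize(["A", "I"], vowel_only))                  # two matches
print(recognize(["k", "A", "k", "I"], with_consonants))   # unique match
```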
  • ABSTRACT: In recent years, automatic lip reading based on ‘visemes’ has been studied with the aim of realizing human-machine interactive communication systems in many applications. However, several problems remain open, such as defining the number of viseme classes, discriminating between visemes, and recognizing speech from viseme sequences. In this paper, a novel classification of Japanese visemes and a hierarchical weighted discrimination method for speech recognition are proposed to address these problems. We expand the number of viseme classes from the conventional 6 to 9 so that words are represented by visemes in more detail. In addition, since discrimination becomes more difficult as the number of visemes increases, a hierarchical weighted discrimination method is proposed. For comparison with the conventional method, the ATR phonetically balanced word set, a large vocabulary that includes a wide variety of visemes, was used in word-recognition experiments. The results confirm that the proposed method works well.
    2013 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS); 01/2013
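The hierarchical weighted discrimination described above can be sketched in two stages: pick a coarse viseme group first, then discriminate within it using weighted scores. The groups, weights, and scores below are invented placeholders, not the paper's actual 9-class inventory.

```python
# Hedged two-stage sketch: (1) choose the coarse group whose best member
# scores highest, (2) choose the viseme within that group by weighted score.
# All groups, weights, and similarity scores are hypothetical.
groups = {"open": ["a", "e"], "closed": ["m", "p"], "round": ["o", "u"]}
weights = {"a": 1.0, "e": 0.8, "m": 1.0, "p": 0.9, "o": 1.0, "u": 0.7}

def classify(scores):
    """scores: dict mapping viseme -> raw similarity in [0, 1]."""
    # Stage 1: coarse group by its best raw score.
    best_group = max(groups, key=lambda g: max(scores[v] for v in groups[g]))
    # Stage 2: fine decision within the group, weighting each viseme.
    return max(groups[best_group], key=lambda v: scores[v] * weights[v])

scores = {"a": 0.2, "e": 0.3, "m": 0.9, "p": 0.85, "o": 0.1, "u": 0.4}
print(classify(scores))  # -> "m"
```

Splitting the decision this way keeps each comparison small even as the total number of viseme classes grows, which is the motivation the abstract gives for the hierarchical approach.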