Research experience

  • Jul 2010–present
    Research: Robotic Speech Recognition
    University of Auckland · Department of Electrical & Computer Engineering · HealthBots Group
    New Zealand · Auckland

Education

  • Jul 2006–Mar 2009
    Ain Shams University
    Computer Science · Master
    Egypt · Cairo

Awards & achievements

  • Oct 2012
    Award: Best Student Paper Award from the International Conference on Social Robotics, China
  • Jul 2010
    Grant: R&D program of the Korea Ministry of Knowledge and Economy (MKE) and Korea Evaluation Institute of Industrial Technology (KEIT)

Other

  • Languages
    English, Arabic
  • Scientific Memberships
    --
  • Other Interests
    --

Questions and Answers (4)

  • Answer added in Signal Processing
    7 How is information captured from a signal?
    By Gupta Ashutosh · Amity University
    Abdelaziz Abdelhamid · University of Auckland
    From the speech recognition perspective, the process goes like this. The speaker's brain thinks of a certain message to speak, then the voice art...
  • Answer added in Digital Signal Processing
    14 Difference between convolution and correlation
    By Aasma Garg · Indian Institute of Information Technology Allahabad
    Abdelaziz Abdelhamid · University of Auckland
    Simply put, correlation is a measure of the similarity between two signals, and convolution is a measure of the effect of one signal on the other.
  • 11 What is the latest/best runtime known for computing an all-pairs shortest paths-problem on a weighted network?
    By Katharina Zweig · Technische Universität Kaiserslautern
    Abdelaziz Abdelhamid · University of Auckland
    I think the Viterbi algorithm can be used in this case. It is very efficient in finding the shortest path (best decoding hypothesis) in a huge weighted fin...
  • Question asked in Speech Processing
    1 Can I use the minimum phone error (MPE) criterion for training HMM acoustic models using a 1-best hypothesis instead of numerator and denominator lattices?
    Lattices are usually used in HMM parameter optimization with the MPE criterion. What about using only one decoding hypothesis instead of lattices?
    By Abdelaziz Abdelhamid · University of Auckland
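
The answer above contrasting correlation and convolution can be illustrated with a small sketch. It is only an illustration for real-valued, finite sequences, and the function names `convolve` and `cross_correlate` are hypothetical, not taken from the answer. Convolution flips one sequence before sliding it over the other (the effect of a filter on a signal), while cross-correlation slides it without flipping (a similarity score at each lag).

```python
def convolve(x, h):
    """Full discrete convolution: y[m] = sum over i of x[i] * h[m - i]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def cross_correlate(x, h):
    """Full cross-correlation of real sequences.

    For real inputs this equals convolving x with the time-reversed h.
    Output index m corresponds to lag m - (len(h) - 1).
    """
    return convolve(x, h[::-1])


# Effect of a filter h on a signal x (convolution).
filtered = convolve([1, 2, 3], [0, 1, 0.5])

# Similarity between a signal and a template (correlation):
# the template appears in the signal delayed by two samples.
signal = [0, 0, 1, 2, 1, 0]
template = [1, 2, 1]
scores = cross_correlate(signal, template)
best_lag = scores.index(max(scores)) - (len(template) - 1)
```

Here the lag with the highest correlation score recovers the delay of the template inside the signal, which is the "similarity" reading of correlation; the convolution result is the filtered signal, which is the "effect" reading.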

Publications (7)

  • Conference Proceeding: On the Robustness of Joint Optimization on Transducer-based Decoding Graphs
    Abdelaziz A Abdelhamid, Waleed H Abdulla
    ABSTRACT: It is our belief that joint optimization of acoustic and language models accounts for the inherent correlation between them, and is thus expected to achieve better recognition performance. This approach should be effective in achieving robust speech recognition where the testing conditions differ from those of training. The acoustic and language models are integrated into a unified decoding graph using weighted finite state transducers. In this paper, we report experimental results of the joint optimization of acoustic and language models on the Resource Management (RM1) continuous speech recognition task. The results show that the proposed joint optimization approach is effective under noisy conditions for unseen testing utterances and achieves relative word error rate reductions of 7% to 17% for different noise levels. These results support our expectation about the robustness of the proposed joint optimization approach.
    IEEE TENCON, Australia; 04/2013
  • Conference Proceeding: Speech Decoding Using Lattice Rescoring
    Abdelaziz A Abdelhamid, Waleed H Abdulla
    ABSTRACT: The goal of automatic speech recognition (ASR) is to decode spoken utterances, by machine, to uncover their information content. Therefore, ASR is also called speech decoding. Over the last seven decades, tremendous research efforts have competed to achieve this goal accurately and in real time. One of the promising approaches to achieving that goal is lattice rescoring. Several knowledge sources are usually incorporated in the speech decoding process, such as acoustic, language and lexical models, which are integrated into a recognition network (also called a decoding graph). Currently, the common approach to building this recognition network is the weighted finite-state transducer (WFST) [1]. The speech decoding process can be performed using either a single-pass or a multi-pass approach. Lattice rescoring is a form of multi-pass decoding in which the lattice is generated in the first pass using simple, low-order knowledge sources and the rescoring is performed in the second pass using higher-order knowledge sources.
    WFST and speech decoding
    Currently, the WFST is adopted as the best unified approach for integrating speech knowledge sources into one decoding graph [1]. This is due to the efficiency of the WFST operations in building compact graphs convenient for existing decoding algorithms, such as the Viterbi algorithm. The WFST is defined as a set of states and transitions, where each transition carries an input symbol (e.g., a context-dependent phoneme), an output symbol (e.g., a word) and a weight (e.g., the natural log of a language model probability). When applying the Viterbi algorithm to a WFST, the result can be either a single-best decoding hypothesis (a single sequence of words) or N-best hypotheses. To output these N-best hypotheses in a rich format, we usually group them together in a graph called a lattice.
    The algorithm commonly used to generate these hypotheses is Viterbi beam pruning, which is usually implemented in the form of a token passing mechanism, as shown in Fig. 1. In this figure, the decoding network is navigated by propagating a set of tokens, and at each point of the navigation process the acoustic and language model scores are calculated and accumulated in the corresponding token. During the decoding process, each active transition has an associated token that stores the decoding score up to this transition. At the end of the decoding process, the last set of tokens is compared with respect to the accumulated score of each token, and the N-best tokens (carrying the highest scores) are selected. To generate a lattice during decoding, a separate data structure can be used to store the set of transitions the tokens passed over during the decoding process.
    APSIPA Letters; 03/2013
  • Conference Proceeding: Discriminative Training of Context-Dependent Phones on WFST-based Decoding Graphs
    Abdelaziz A Abdelhamid, Waleed H Abdulla
    ABSTRACT: This research proposes a sub-graph extraction method to boost the discriminability of generative acoustic models. The proposed method explicitly formalizes the notion that each word on a reference decoding path may have multiple pronunciations rather than a single one. The discriminative training framework is based on a minimum classification error (MCE) criterion for optimizing the parameters of hidden Markov models (HMMs) for context-dependent phones. The primary task examined consists of 12 different large decoding graphs composed from four HMMs with different topologies and three n-gram language models containing 5k, 20k, and 64k words. The experimental results show that the proposed approach outperforms the baseline system based on maximum likelihood estimation (MLE) and achieves reductions of 18.7% and 17.4% in word error rate (WER) and triphone error rate (TER), respectively, when tested on the Resource Management (RM1) speech database.
    International Conference on Communications, Signal Processing, and their Applications (ICCSPA), 2013; 02/2013
  • Conference Proceeding: Optimizing the parameters of decoding graphs using new log-based MCE
    A A Abdelhamid, W H Abdulla
    Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012; 01/2012
  • Chapter: RoboASR: A Dynamic Speech Recognition System for Service Robots
    Abdelaziz A. Abdelhamid, Waleed H. Abdulla, Bruce A. MacDonald
    01/2012: pages 485-495; ISBN: 9783642341021
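
The token passing mechanism described in the "Speech Decoding Using Lattice Rescoring" abstract above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' decoder: the graph is a toy, epsilon-free graph in which every transition consumes exactly one frame, the `Arc` type, the `token_passing` function, the two-word vocabulary and all probabilities are hypothetical, and only the best token per state is kept, so the N-best list is limited to the best surviving tokens in final states. No lattice is built; each token simply carries its own word history.

```python
import math
from collections import defaultdict, namedtuple

# Hypothetical transition type for a toy decoding graph.
# weight is a log-probability, so higher scores are better.
Arc = namedtuple("Arc", "src dst ilabel olabel weight")


def token_passing(arcs, start, finals, frames, beam=10.0, nbest=1):
    """Viterbi beam search by token passing.

    frames: one dict per time step, mapping an input label to its
            acoustic log-likelihood at that step (made-up values here).
    Returns up to nbest (score, words) tokens that end in a final state.
    """
    arcs_from = defaultdict(list)
    for arc in arcs:
        arcs_from[arc.src].append(arc)

    tokens = {start: (0.0, ())}  # state -> (accumulated score, word history)
    for frame in frames:
        new_tokens = {}
        for state, (score, words) in tokens.items():
            for arc in arcs_from.get(state, ()):
                if arc.ilabel not in frame:
                    continue
                # Accumulate the graph weight and the acoustic score.
                new_score = score + arc.weight + frame[arc.ilabel]
                new_words = words + ((arc.olabel,) if arc.olabel else ())
                best = new_tokens.get(arc.dst)
                # Viterbi recombination: keep the best token per state.
                if best is None or new_score > best[0]:
                    new_tokens[arc.dst] = (new_score, new_words)
        if not new_tokens:
            return []
        # Beam pruning: drop tokens too far below the current best.
        top = max(score for score, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t[0] >= top - beam}

    survivors = [t for s, t in tokens.items() if s in finals]
    survivors.sort(key=lambda t: t[0], reverse=True)
    return survivors[:nbest]


# Toy graph: "hi" = h ay, "no" = n ow; an empty olabel means no word output.
ARCS = [
    Arc(0, 1, "h", "hi", math.log(0.6)),
    Arc(1, 2, "ay", "", 0.0),
    Arc(0, 3, "n", "no", math.log(0.4)),
    Arc(3, 4, "ow", "", 0.0),
]
FRAMES = [
    {"h": math.log(0.7), "n": math.log(0.3)},
    {"ay": math.log(0.8), "ow": math.log(0.2)},
]

wide = token_passing(ARCS, 0, {2, 4}, FRAMES, beam=10.0, nbest=2)
narrow = token_passing(ARCS, 0, {2, 4}, FRAMES, beam=1.0, nbest=2)
```

With the wide beam both hypotheses reach a final state and are ranked by score; with the narrow beam the weaker token is pruned after the first frame, so only one hypothesis survives. A lattice-generating decoder would additionally record the transitions each token traversed instead of only the word history.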