M. Banik

Ahsanullah University of Science & Tech, Mujib City, Dhaka, Bangladesh

Publications (16)
0 Total impact

  • ABSTRACT: Context information influences the performance of Automatic Speech Recognition (ASR). Current Hidden Markov Model (HMM) based ASR systems address this problem by using context-sensitive tri-phone models. However, these models need a large number of speech parameters and a large speech corpus. In this paper, we propose a technique to model the dynamic process of co-articulation and embed it in ASR systems. A Recurrent Neural Network (RNN) is expected to realize this dynamic process, but the main problem is that an RNN is slow to train when the network is large. We introduce Distinctive Phonetic Feature (DPF) based feature extraction using a two-stage system consisting of a Multi-Layer Neural Network (MLN) in the first stage and another MLN in the second stage, where the first MLN is expected to reduce the dynamics of the acoustic feature pattern and the second MLN to suppress the fluctuation caused by DPF context. The experiments are carried out using Japanese triphthong data. The proposed DPF-based feature extractor provides better segmentation performance with a reduced mixture set of HMMs. A better context effect is achieved with less computation by using an MLN instead of an RNN.
    Information Technology: New Generations (ITNG), 2011 Eighth International Conference on; 05/2011
  • ABSTRACT: The Universal Networking Language (UNL) is a generalized form of human interaction on a machine-independent digital platform for defining, recapitulating, amending, storing and disseminating knowledge or information among people of different affiliations. The theoretical and practical research associated with this interdisciplinary endeavor facilitates a number of practical applications in most domains of human activity, such as creating globalization trends in markets or geopolitical independence among nations. In our research work, we have tried to develop analysis rules for Bangla parts of speech, which will help to create a doorway for converting the Bangla language to UNL and vice versa and to overcome the barrier between Bangla and other languages.
    Information Technology: New Generations (ITNG), 2011 Eighth International Conference on; 05/2011
  • ABSTRACT: In this paper, we compare the performance of different acoustic features for Bangla Automatic Speech Recognition (ASR). Most Bangla ASR systems use a small number of speakers, but 40 speakers selected from a wide area of Bangladesh, where Bangla is used as a native language, are involved here. In the experiments, mel-frequency cepstral coefficients (MFCCs) and local features (LFs) are input to hidden Markov model (HMM) based classifiers to obtain phoneme recognition performance. The experimental results show that the MFCC-based method of 39 dimensions provides a higher phoneme correct rate and accuracy than the other methods investigated.
    Computer Applications and Industrial Electronics (ICCAIE), 2010 International Conference on; 01/2011
  • Journal of Multimedia. 01/2010; 5:543-550.
  • Circuits and Systems (APCCAS), 2010 IEEE Asia Pacific Conference on; 01/2010
  • ABSTRACT: This paper presents a method for extracting distinctive phonetic features (DPFs) for automatic speech recognition (ASR). The method comprises three stages: i) an acoustic feature extractor, ii) a multilayer neural network (MLN) and iii) a hidden Markov model (HMM) based classifier. In the first stage, acoustic features, namely local features (LFs), are extracted from the input speech. In the second stage, the MLN generates a 45-dimensional DPF vector from the 75-dimensional LFs. Finally, this 45-dimensional DPF vector is fed into an HMM-based classifier to obtain phoneme strings. From the experiments on Japanese Newspaper Article Sentences (JNAS), it is observed that the proposed DPF extractor provides a higher phoneme correct rate and accuracy with fewer mixture components in the HMMs compared to the method based on mel frequency cepstral coefficients (MFCCs). Moreover, a higher correct rate for each phonetic feature is obtained using the proposed method.
    Signal and Image Processing (ICSIP), 2010 International Conference on; 01/2010
  • ABSTRACT: This paper presents a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of two stages: i) a multilayer neural network (MLN), which converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities and ii) a hidden Markov model (HMM) based classifier, which takes the phoneme probabilities obtained from the first stage and the corresponding Δ and ΔΔ parameters calculated by linear regression (LR) to obtain more accurate phoneme strings. From the experiments on a Bangla speech corpus prepared by us, it is observed that the proposed method provides higher phoneme recognition performance than the existing method. Moreover, it requires fewer mixture components in the HMMs.
    01/2010;
  • ABSTRACT: This paper presents a method for automatic phoneme recognition for the Japanese language using tandem MLNs. The method comprises three stages: (i) a multilayer neural network (MLN) that converts acoustic features into distinctive phonetic features (DPFs), (ii) an MLN that takes the DPFs and acoustic features as input and generates a 45-dimensional DPF vector with less context effect and (iii) a hidden Markov model (HMM) based classifier, into which the 45-dimensional feature vector generated by the second MLN is fed to obtain more accurate phoneme strings from the input speech. From the experiments on Japanese Newspaper Article Sentences (JNAS), it is observed that the proposed method provides a higher phoneme correct rate and improves phoneme accuracy tremendously over the method based on a single MLN. Moreover, it requires fewer mixture components in the HMMs.
    10th International Conference on Hybrid Intelligent Systems (HIS 2010), Atlanta, GA, USA, August 23-25, 2010; 01/2010
  • ABSTRACT: This paper describes the effect of articulatory Δ and ΔΔ parameters on automatic speech recognition (ASR). Systems based on articulatory features (AFs) or distinctive phonetic features (DPFs) show superior performance over ASR based on acoustic features. This performance can be further improved by incorporating articulatory dynamic parameters. In this paper, we propose a phoneme recognition system that comprises two stages: (i) extraction of DPFs from acoustic features, namely local features (LFs), using a multilayer neural network (MLN) and (ii) incorporation of dynamic parameters (Δ and ΔΔ) into a hidden Markov model (HMM) based classifier for more accurate performance. From the experiments on Japanese Newspaper Article Sentences (JNAS), it is observed that the proposed method provides a higher phoneme correct rate and phoneme accuracy than the method that does not incorporate dynamic articulatory parameters. Moreover, it achieves this higher performance with fewer mixture components in the HMMs.
    01/2010;
  • ABSTRACT: This paper describes the effect of articulatory dynamic parameters (Δ and ΔΔ) on neural network based automatic speech recognition (ASR). Systems based on articulatory features (AFs) or distinctive phonetic features (DPFs) show superior performance over ASR based on acoustic features. This performance can be further improved by incorporating articulatory dynamic parameters. In this paper, we propose a phoneme recognition system that comprises three stages: (i) extraction of DPFs from acoustic features using a multilayer neural network (MLN), (ii) incorporation of dynamic parameters into another MLN for reducing DPF context, and (iii) addition of an Inhibition/Enhancement (In/En) network for categorizing the DPF movement more accurately, together with a Gram-Schmidt (GS) orthogonalization procedure for decorrelating the inhibited/enhanced data vector before it is passed to a hidden Markov model (HMM) based classifier. From the experiments on Japanese Newspaper Article Sentences (JNAS), it is observed that the proposed method provides a higher phoneme correct rate than the method that does not incorporate dynamic articulatory parameters. Moreover, it achieves this higher recognition performance with fewer mixture components in the HMMs.
    01/2010;
  • ABSTRACT: This paper describes an isolated word recognition method based on distinctive phonetic features (DPFs). The method comprises two multilayer neural networks (MLNs). The first MLN, MLNLF-DPF, maps local features (LFs) of an input speech signal into discrete DPFs, and the second MLN, MLNDyn, restricts the dynamics of the DPFs output by MLNLF-DPF. In experiments on the Tohokudai Isolated Spoken-Word Database in a clean acoustic environment, the proposed recognizer was found to provide a higher word correct rate (WCR) as well as word accuracy (WA) with fewer mixture components in hidden Markov models (HMMs) in comparison with the method proposed by T. Fukuda, et al. [6].
    Integrated Intelligent Computing. 01/2010;
  • Computer Applications and Industrial Electronics (ICCAIE), 2010 International Conference on; 01/2010
  • ABSTRACT: This paper describes a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of two stages: i) a multilayer neural network (MLN), which converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities and ii) another MLN, which takes the phoneme probabilities obtained from the first stage and the corresponding Δ and ΔΔ parameters and improves the phoneme probabilities for the hidden Markov models (HMMs) by reducing the context effect. From the experiments on a Bangla speech corpus prepared by us, it is observed that the proposed method provides higher phoneme recognition performance than the existing method. Moreover, it requires fewer mixture components in the HMMs.
    Signal and Image Processing (ICSIP), 2010 International Conference on; 01/2010
  • ABSTRACT: In this paper, we prepare a medium-sized Bangla speech corpus and compare the performance of different acoustic features for Bangla word recognition. Most Bangla automatic speech recognition (ASR) systems use a small number of speakers, but 40 speakers selected from a wide area of Bangladesh, where Bangla is used as a native language, are involved here. In the experiments, mel-frequency cepstral coefficients (MFCCs) are input to triphone hidden Markov model (HMM) based classifiers to obtain word recognition performance. The experiments show that the MFCC-based method of 39 dimensions provides a higher word correct rate (WCR) and word accuracy (WA) than the other methods investigated. Moreover, the MFCC39-based method obtains the higher WCR and WA with fewer mixture components in the HMMs.
    Circuits and Systems (APCCAS), 2010 IEEE Asia Pacific Conference on; 01/2010
  • ABSTRACT: This paper presents a phoneme recognition method based on distinctive phonetic features (DPFs). The method comprises three stages. The first stage extracts 3 DPF vectors of 15 dimensions each from local features (LFs) of an input speech signal using three multilayer neural networks (MLNs). The second stage incorporates an Inhibition/Enhancement (In/En) network to obtain more categorical DPF movement and decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure. Then, the third stage embeds acoustic models (AMs) and language models (LMs) of syllable-based subwords to output more precise phoneme strings. The proposed method provides a higher phoneme correct rate as well as phoneme accuracy with fewer mixture components in hidden Markov models (HMMs).
    Computers and Information Technology, 2009. ICCIT '09. 12th International Conference on; 01/2009
  • ABSTRACT: The use of native languages on the Internet is in high demand nowadays due to the rapid increase in Internet-based applications for daily needs. It is important to be able to read all information on the Internet in Bangla. The Universal Networking Language (UNL) addresses this issue for most languages. It helps to overcome the language barrier among people of different nations and to solve problems emerging from current globalization trends and geopolitical interdependence. In this paper, we present a morphological analysis of Bangla words from which roots and primary suffixes are obtained, and we develop grammatical attributes for these roots and primary suffixes that can be used to prepare a Bangla word dictionary and Enconversion/Deconversion rules for Natural Language Processing (NLP).
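
Several of the abstracts listed above refer to Δ and ΔΔ parameters calculated by linear regression over neighbouring frames. The abstracts do not give the formula or the window size used, so the following is only a minimal sketch of the commonly used regression-based delta computation. The function names `delta` and `add_dynamic_features`, the window of 2 frames, and the edge-padding choice are assumptions made for illustration, not details taken from the papers.

```python
import numpy as np


def delta(features, window=2):
    """Regression-based delta of a (T, D) feature matrix.

    d[t] = sum_{k=1..K} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..K} k^2)
    The window K=2 and edge padding are assumptions for this sketch.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has K neighbours.
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(features)
    for k in range(1, window + 1):
        ahead = padded[window + k : window + k + num_frames]
        behind = padded[window - k : window - k + num_frames]
        out += k * (ahead - behind)
    return out / denom


def add_dynamic_features(features, window=2):
    """Stack static, delta and delta-delta features into a (T, 3*D) matrix."""
    features = np.asarray(features, dtype=float)
    d1 = delta(features, window)
    d2 = delta(d1, window)  # delta-delta is the delta of the delta
    return np.hstack([features, d1, d2])
```

Applied to a 13-dimensional static vector per frame, `add_dynamic_features` would yield 39 dimensions per frame, which is one common way a 39-dimensional MFCC vector is built; whether the papers above construct theirs this way is not stated in the abstracts.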