Manoj Banik

University of Bristol, Bristol, England, United Kingdom


Publications (19) · 2.65 Total impact

  • ABSTRACT: The Universal Networking Language (UNL) is a machine-independent digital platform for defining, recapitulating, amending, storing and disseminating knowledge or information among people of different affiliations. The theoretical and practical research associated with this interdisciplinary endeavour supports practical applications in most domains of human activity, such as addressing globalization trends in markets or geopolitical interdependence among nations. In our research work we have developed analysis rules for Bangla parts of speech, which create a doorway for converting the Bangla language to UNL and vice versa and help overcome the barrier between Bangla and other languages.
    No preview · Conference Paper · May 2011
  • ABSTRACT: Context information influences the performance of Automatic Speech Recognition (ASR). Current Hidden Markov Model (HMM) based ASR systems address this problem by using context-sensitive triphone models. However, these models need a large number of speech parameters and a large speech corpus. In this paper, we propose a technique to model the dynamic process of co-articulation and embed it in ASR systems. A Recurrent Neural Network (RNN) is expected to realize this dynamic process, but the main problem is the slowness of training an RNN of large size. We introduce Distinctive Phonetic Feature (DPF) based feature extraction using a two-stage system consisting of a Multi-Layer Neural Network (MLN) in the first stage and another MLN in the second stage, where the first MLN is expected to reduce the dynamics of the acoustic feature pattern and the second MLN to suppress the fluctuation caused by DPF context. The experiments are carried out using Japanese triphthong data. The proposed DPF-based feature extractor provides better segmentation performance with a reduced mixture set of HMMs, and a better context effect is achieved with less computation by using MLNs instead of an RNN.
    No preview · Conference Paper · May 2011
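The two-stage MLN cascade described above can be sketched in pure Python. This is a minimal illustration with toy layer sizes and random, untrained weights; the paper's MLNs are trained on Japanese triphthong data, so the dimensions and weights here are assumptions.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mln(x, w_hidden, w_out):
    """Single-hidden-layer MLN forward pass; each weight row carries its bias last."""
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row[:-1], x)) + row[-1]) for row in w_hidden]
    return [sigmoid(sum(wi * hi for wi, hi in zip(row[:-1], h)) + row[-1]) for row in w_out]

def random_mln(n_in, n_hidden, n_out, rng):
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return w_hidden, w_out

rng = random.Random(0)
# Stage 1: acoustic features -> DPF vector (reduces acoustic-feature dynamics)
stage1 = random_mln(25, 16, 15, rng)
# Stage 2: DPF vector -> DPF vector with context fluctuation suppressed
stage2 = random_mln(15, 16, 15, rng)

acoustic_frame = [rng.gauss(0.0, 1.0) for _ in range(25)]
dpf = mln(acoustic_frame, *stage1)
dpf_smoothed = mln(dpf, *stage2)
print(len(dpf_smoothed))  # 15 DPF dimensions
```

In the trained system the second stage's smoothed DPF vectors, rather than raw acoustic features, feed the HMM classifier, which is why a reduced mixture set suffices.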
  • ABSTRACT: Tongue movement ear pressure (TMEP) signals have been used to generate control commands in assistive human-machine interfaces aimed at people with disabilities. The objective of this study is to distinguish the controlled-movement signals of an intended action from internally occurring physiological signals that can interfere with inter-movement classification. TMEP signals were collected for six types of controlled movements and for activity related to the potentially interfering environment, including when a subject spoke, coughed or drank. The signal processing algorithm involved TMEP signal detection, segmentation, feature extraction and selection, and classification. Features of the segmented TMEP signals were extracted using the wavelet packet transform (WPT). A multi-layer neural network was then designed and tested based on statistical properties of the WPT coefficients. The average classification performance for discriminating interference from controlled-movement TMEP signals reached 97.05%. The WPT-based classification of TMEP signals is robust, and interference with the control commands of TMEP signals in an assistive human-machine interface can be significantly reduced using the multi-layer neural network in this challenging environment.
    Full-text · Conference Paper · Jan 2011
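The WPT feature-extraction step can be illustrated with a depth-2 Haar wavelet packet tree followed by per-subband statistics. The wavelet family, tree depth, and statistical features below are assumptions for illustration, not the paper's settings.

```python
def haar_step(signal):
    """One orthonormal Haar analysis step: (approximation, detail) at half length."""
    a = [(signal[2 * i] + signal[2 * i + 1]) / 2 ** 0.5 for i in range(len(signal) // 2)]
    d = [(signal[2 * i] - signal[2 * i + 1]) / 2 ** 0.5 for i in range(len(signal) // 2)]
    return a, d

def wpt(signal, depth):
    """Full wavelet packet tree: split every node, return the leaf subbands."""
    nodes = [signal]
    for _ in range(depth):
        nxt = []
        for node in nodes:
            a, d = haar_step(node)
            nxt.extend([a, d])
        nodes = nxt
    return nodes

def stats(coeffs):
    """Statistical properties of one subband: mean, variance, energy."""
    m = sum(coeffs) / len(coeffs)
    var = sum((c - m) ** 2 for c in coeffs) / len(coeffs)
    energy = sum(c * c for c in coeffs)
    return [m, var, energy]

# Toy "TMEP segment": 16 samples -> depth-2 tree -> 4 subbands -> 12 features
segment = [0.0, 1.0, 0.0, -1.0] * 4
features = [f for band in wpt(segment, 2) for f in stats(band)]
print(len(features))  # 12 features would feed the multi-layer neural network
```

Because the Haar steps are orthonormal, the subband energies sum to the energy of the input segment, which makes the energy features directly comparable across movement classes.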
  • ABSTRACT: In this paper, we compare the performance of different acoustic features for Bangla Automatic Speech Recognition (ASR). Most Bangla ASR systems use a small number of speakers, but 40 speakers selected from a wide area of Bangladesh, where Bangla is used as a native language, are involved here. In the experiments, mel-frequency cepstral coefficients (MFCCs) and local features (LFs) are input to hidden Markov model (HMM) based classifiers to obtain phoneme recognition performance. The experimental results show that the 39-dimensional MFCC-based method provides a higher phoneme correct rate and accuracy than the other methods investigated.
    No preview · Conference Paper · Jan 2011
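As background on the 39-dimensional MFCC front end (conventionally 13 cepstra plus their Δ and ΔΔ), the mel filterbank at its core can be sketched in pure Python. The filter count, FFT size, and sampling rate below are illustrative assumptions, not necessarily the paper's configuration.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers equally spaced on the mel scale."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    fbank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for j in range(n_filters):
        l, c, r = bins[j], bins[j + 1], bins[j + 2]
        for k in range(l, c):           # rising edge of triangle j
            fbank[j][k] = (k - l) / max(c - l, 1)
        for k in range(c, r):           # falling edge of triangle j
            fbank[j][k] = (r - k) / max(r - c, 1)
    return fbank

fb = mel_filterbank(26, 512, 16000)
print(len(fb), len(fb[0]))  # 26 filters over 257 FFT bins
```

Each filter row is applied to a power spectrum, the log of the filter outputs is taken, and a DCT yields the cepstral coefficients; the mel spacing concentrates resolution at low frequencies where phonemic cues are densest.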
  • ABSTRACT: The usage of native languages on the Internet is in high demand nowadays due to the rapid increase of Internet-based applications in daily life. It is important to be able to read all information from the Internet in Bangla. The Universal Networking Language (UNL) addresses this issue for most languages. It helps to overcome the language barrier among people of different nations and to solve problems emerging from current globalization trends and geopolitical interdependence. In this paper we propose a work that aims to contribute a morphological analysis of Bangla words, from which we obtain roots and primary suffixes, and the development of grammatical attributes for roots and primary suffixes that can be used to prepare a Bangla word dictionary and EnConversion/DeConversion rules for Natural Language Processing (NLP).
    Full-text · Article · Jan 2011 · International Journal of Advanced Computer Science and Applications
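A root-and-suffix split of the kind described might be sketched as a longest-suffix-first match against a suffix inventory. The transliterated suffix list below is purely hypothetical and stands in for the Bangla primary suffixes and dictionary attributes developed in the paper.

```python
# Hypothetical transliterated suffixes, for illustration only
PRIMARY_SUFFIXES = ["gulo", "ra", "ke", "er", "e"]

def analyze(word, suffixes=PRIMARY_SUFFIXES):
    """Split a word into (root, suffix); longest matching suffix wins.

    Returns (word, None) when no suffix matches, so every word gets an answer.
    """
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf):
            return word[: -len(suf)], suf
    return word, None

print(analyze("boigulo"))  # ('boi', 'gulo')
print(analyze("ghor"))     # ('ghor', None)
```

In a real EnConversion pipeline each (root, suffix) pair would be looked up in the word dictionary to attach the grammatical attributes the rules operate on.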

  • No preview · Article · Jan 2011 · International Journal of Advanced Computer Science and Applications
  • ABSTRACT: This paper presents a distinctive phonetic feature (DPF) based phoneme recognition method that incorporates syllable language models (LMs). The method comprises three stages. The first stage extracts three DPF vectors of 15 dimensions each from local features (LFs) of an input speech signal using three multilayer neural networks (MLNs). The second stage incorporates an Inhibition/Enhancement (In/En) network to obtain more categorical DPF movement and decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure. The third stage embeds acoustic models (AMs) and LMs of syllable-based subwords to output more precise phoneme strings. From the experiments, it is observed that the proposed method provides a higher phoneme correct rate as well as a tremendous improvement in phoneme accuracy. Moreover, it shows higher phoneme recognition performance with fewer mixture components in hidden Markov models (HMMs).
    Full-text · Article · Dec 2010 · Journal of multimedia
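The Gram-Schmidt decorrelation step in the second stage can be sketched as follows. This is classical Gram-Schmidt on toy 3-dimensional vectors; the actual DPF vectors are 15-dimensional.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize vectors in order, dropping any that are linearly dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)                      # component of w along b
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:                          # skip (near-)dependent vectors
            basis.append([wi / norm for wi in w])
    return basis

dpf = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
ortho = gram_schmidt(dpf)
print(round(abs(dot(ortho[0], ortho[1])), 10))  # 0.0
```

Decorrelating the DPF dimensions this way is what lets diagonal-covariance HMM Gaussians model them well, which is consistent with the reported reduction in mixture components.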
  • ABSTRACT: In this paper, we prepare a medium-sized Bangla speech corpus and compare the performance of different acoustic features for Bangla word recognition. Most Bangla automatic speech recognition (ASR) systems use a small number of speakers, but 40 speakers selected from a wide area of Bangladesh, where Bangla is used as a native language, are involved here. In the experiments, mel-frequency cepstral coefficients (MFCCs) are input to triphone hidden Markov model (HMM) based classifiers to obtain word recognition performance. The experiments show that the 39-dimensional MFCC-based method provides a higher word correct rate (WCR) and word accuracy (WA) than the other methods investigated. Moreover, higher WCR and WA are obtained by the MFCC39-based method with fewer mixture components in the HMM.
    No preview · Conference Paper · Dec 2010

  • No preview · Conference Paper · Oct 2010
  • ABSTRACT: This paper describes an isolated word recognition method based on distinctive phonetic features (DPFs). The method comprises two multilayer neural networks (MLNs). The first MLN, MLNLF-DPF, maps local features (LFs) of an input speech signal into discrete DPFs, and the second MLN, MLNDyn, restricts the dynamics of the DPFs output by the MLNLF-DPF. In experiments on the Tohokudai Isolated Spoken-Word Database in a clean acoustic environment, the proposed recognizer was found to provide a higher word correct rate (WCR) as well as word accuracy (WA) with fewer mixture components in hidden Markov models (HMMs) in comparison with the method proposed by T. Fukuda, et al. [6].
    No preview · Article · Jan 2010

  • No preview · Conference Paper · Jan 2010
  • Q N Eity · M Banik · N J Lisa · F Hassan · M S Hossain · M N Huda
    ABSTRACT: This paper describes a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of two stages: i) a multilayer neural network (MLN) which converts acoustic features, mel-frequency cepstral coefficients (MFCCs), into phoneme probabilities, and ii) a second MLN into which the phoneme probabilities from the first stage and the corresponding Δ and ΔΔ parameters are inserted to improve the phoneme probabilities for the hidden Markov models (HMMs) by reducing the context effect. From experiments on a Bangla speech corpus prepared by us, it is observed that the proposed method provides higher phoneme recognition performance than the existing method. Moreover, it requires fewer mixture components in the HMMs.
    No preview · Conference Paper · Jan 2010
  • ABSTRACT: This paper presents a method for automatic phoneme recognition in Japanese using tandem MLNs. The method comprises three stages: (i) a multilayer neural network (MLN) that converts acoustic features into distinctive phonetic features (DPFs), (ii) a second MLN that takes the DPFs and acoustic features as input and generates a 45-dimensional DPF vector with less context effect, and (iii) a hidden Markov model (HMM) based classifier into which the 45-dimensional feature vector generated by the second MLN is inserted to obtain more accurate phoneme strings from the input speech. From experiments on Japanese Newspaper Article Sentences (JNAS), it is observed that the proposed method provides a higher phoneme correct rate and improves phoneme accuracy tremendously over a method based on a single MLN. Moreover, it requires fewer mixture components in the HMMs.
    No preview · Conference Paper · Jan 2010
  • ABSTRACT: This paper presents a Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of two stages: i) a multilayer neural network (MLN) which converts acoustic features, mel-frequency cepstral coefficients (MFCCs), into phoneme probabilities, and ii) the phoneme probabilities obtained from the first stage and the corresponding Δ and ΔΔ parameters calculated by linear regression (LR) are inserted into a hidden Markov model (HMM) based classifier to obtain more accurate phoneme strings. From experiments on a Bangla speech corpus prepared by us, it is observed that the proposed method provides higher phoneme recognition performance than the existing method. Moreover, it requires fewer mixture components in the HMMs.
    No preview · Article · Jan 2010
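Δ and ΔΔ parameters computed by linear regression conventionally follow the regression-window formula Δt = Σk k·(c(t+k) − c(t−k)) / (2·Σk k²). A sketch over a scalar probability track is below; the window width and edge padding are assumptions, and a real front end applies this per dimension of the feature vector.

```python
def deltas(frames, n=2):
    """Linear-regression dynamic parameters over a +/- n frame window.

    Edges are handled by repeating the first and last frames, so the
    output has the same length as the input.
    """
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    padded = [frames[0]] * n + list(frames) + [frames[-1]] * n
    out = []
    for t in range(n, n + len(frames)):
        out.append(sum(k * (padded[t + k] - padded[t - k]) for k in range(1, n + 1)) / denom)
    return out

probs = [0.1, 0.2, 0.4, 0.8, 0.9]  # one phoneme's probability track over frames
d = deltas(probs)       # Δ: local slope of the track
dd = deltas(d)          # ΔΔ: slope of the slope (acceleration)
print(len(probs), len(d), len(dd))  # 5 5 5
```

Appending Δ and ΔΔ to the static values is what gives the HMM classifier access to the trajectory of each probability, not just its instantaneous value.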
  • ABSTRACT: This paper describes the effect of articulatory dynamic parameters (Δ and ΔΔ) on neural-network-based automatic speech recognition (ASR). Articulatory feature (AF) or distinctive phonetic feature (DPF) based systems outperform acoustic-feature-based ASR, and this performance can be further improved by incorporating articulatory dynamic parameters. In this paper, we propose a phoneme recognition system that comprises three stages: (i) DPF extraction from acoustic features using a multilayer neural network (MLN), (ii) incorporation of dynamic parameters into another MLN to reduce DPF context, and (iii) addition of an Inhibition/Enhancement (In/En) network to categorize the DPF movement more accurately, together with a Gram-Schmidt (GS) orthogonalization procedure to decorrelate the inhibited/enhanced data vector before connecting to a hidden Markov model (HMM) based classifier. From experiments on Japanese Newspaper Article Sentences (JNAS), it is observed that the proposed method provides a higher phoneme correct rate than the method that does not incorporate dynamic articulatory parameters. Moreover, it requires fewer mixture components in the HMMs to obtain a higher recognition performance.
    No preview · Article · Jan 2010

  • No preview · Conference Paper · Jan 2010
  • ABSTRACT: This paper presents a method for extracting distinctive phonetic features (DPFs) for automatic speech recognition (ASR). The method comprises three stages: i) an acoustic feature extractor, ii) a multilayer neural network (MLN) and iii) a hidden Markov model (HMM) based classifier. In the first stage, acoustic features, local features (LFs), are extracted from input speech. In the second stage, the MLN generates a 45-dimensional DPF vector from the 75-dimensional LFs. Finally, this 45-dimensional DPF vector is inserted into an HMM-based classifier to obtain phoneme strings. From experiments on Japanese Newspaper Article Sentences (JNAS), it is observed that the proposed DPF extractor provides a higher phoneme correct rate and accuracy with fewer mixture components in the HMMs compared to the method based on mel-frequency cepstral coefficients (MFCCs). Moreover, a higher correct rate for each phonetic feature is obtained using the proposed method.
    No preview · Conference Paper · Jan 2010
  • ABSTRACT: This paper describes the effect of articulatory Δ and ΔΔ parameters on automatic speech recognition (ASR). Articulatory feature (AF) or distinctive phonetic feature (DPF) based systems outperform acoustic-feature-based ASR, and this performance can be further improved by incorporating articulatory dynamic parameters. In this paper, we propose a phoneme recognition system that comprises two stages: (i) DPF extraction from acoustic features, local features (LFs), using a multilayer neural network (MLN), and (ii) incorporation of the dynamic parameters (Δ and ΔΔ) into a hidden Markov model (HMM) based classifier for more accurate performance. From experiments on Japanese Newspaper Article Sentences (JNAS), it is observed that the proposed method provides a higher phoneme correct rate and phoneme accuracy than the method that does not incorporate dynamic articulatory parameters. Moreover, it requires fewer mixture components in the HMMs to obtain a higher performance.
    No preview · Article · Jan 2010
  • M.N. Huda · M. Banik · G. Muhammad · B.J. Kroger
    ABSTRACT: This paper presents a phoneme recognition method based on distinctive phonetic features (DPFs). The method comprises three stages. The first stage extracts three DPF vectors of 15 dimensions each from local features (LFs) of an input speech signal using three multilayer neural networks (MLNs). The second stage incorporates an Inhibition/Enhancement (In/En) network to obtain more categorical DPF movement and decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure. The third stage embeds acoustic models (AMs) and language models (LMs) of syllable-based subwords to output more precise phoneme strings. The proposed method provides a higher phoneme correct rate as well as higher phoneme accuracy with fewer mixture components in hidden Markov models (HMMs).
    No preview · Conference Paper · Jan 2009