Salient phonetic features of Indian languages in speech technology

Sadhana 36(5), 10/2011. DOI: 10.1007/s12046-011-0039-z


The speech signal is the basic material of study and analysis in speech technology as well as in phonetics. To form meaningful chunks of language, the speech signal must have dynamically varying spectral characteristics, sometimes varying within a stretch of a few milliseconds. Phonetics groups these temporally varying spectral chunks into abstract classes roughly called allophones. Grouping these allophones into higher-level classes called phonemes takes us closer to their function in a language. Phonemes and letters in the scripts of literate languages (languages which use writing) show varying degrees of correspondence. Because such a relationship exists, a major part of speech technology deals with correlating script letters with chunks of time-varying spectral stretches in a language. Indian languages are said to have a more direct correspondence between their sounds and letters. This gives a false impression that text-to-sound rule sets are similar across these languages: a given letter with parallels in several languages may diverge to different degrees in its phonetic realization in each of them. We illustrate such differences and point out the problem areas where speech scientists need to pay greater attention in building their systems, especially multilingual systems for Indian languages.
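To make the divergence concrete, consider the inherent vowel that every Indic consonant letter carries: it is "the same letter" across scripts, yet its default realisation is language-specific. The following is a minimal sketch of this point (our illustration, not from the paper; the IPA values for Hindi and Bengali are the commonly cited ones, and all names in the code are ours):

```python
# One shared grapheme, language-specific realisations: the inherent vowel.
INHERENT_VOWEL_IPA = {
    "hindi":   "ə",  # schwa; additionally deleted in many word-final contexts
    "bengali": "ɔ",  # open-mid back rounded vowel
}

def inherent_vowel(language: str) -> str:
    """IPA realisation of the Indic inherent vowel for a given language."""
    return INHERENT_VOWEL_IPA[language]

# e.g. the written sequence <kamala> 'lotus' comes out roughly as
# [kəməl] in Hindi (final inherent vowel deleted) but [kɔmol] in Bengali,
# so a single shared text-to-sound rule set would get one of them wrong.
```

A multilingual system therefore cannot reuse one rule table per letter; it needs per-language realisation rules even for graphemes the scripts share.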

    • "small markings placed around the IPA letter in order to show a certain alteration or more specific description in the letters pronunciation. In [8], the author describes speech from an acoustic phonetic point of view. Accordingly, speech can be demarcated with certain acoustic phonetic segments (APS) which can be correlated with linguistic abstractions such as allophones, phonemes, morphophonemes etc. "
    ABSTRACT: The phonetic engine is a system that performs speech signal-to-symbol transformation. This work describes some issues in the development of an Assamese Phonetic Engine (PE). The International Phonetic Alphabet (IPA) is used as the phonetic unit to transcribe the speech database, which is collected in three different modes, namely reading, lecture and conversation. Only reading-mode data is used for training, and a hidden Markov model (HMM) is used to model each phonetic unit without imposing any language or contextual constraint. The trained HMMs are used to derive a sequence of phonetic units from a test speech signal. Accuracies of 47.31%, 45.30% and 36.13% are achieved in the reading, lecture and conversation modes, respectively. Confusions among the phonetic units specific to Assamese are discussed, as are issues related to the different recording modes and to language and native-speaker dependencies. Speech data is also collected in Hindi from three different sets of speakers to study speaker, language and nativeness dependencies. Accuracies of 40.5%, 36.10% and 29.61% are achieved in the native speaker-dependent, native speaker-independent and non-native speaker-independent cases, respectively.
    2013 Annual IEEE India Conference (INDICON), Mumbai, India; 12/2013
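The per-unit HMM modelling this abstract describes can be prototyped compactly. Here is a minimal sketch under stated assumptions: features are precomputed MFCC frames, the hmmlearn library stands in for whatever toolkit the authors actually used, and all function and variable names are ours, not the paper's.

```python
# Sketch: one context-independent HMM per IPA unit, maximum-likelihood
# classification of a test segment. Assumes MFCC features are precomputed.
import numpy as np
from hmmlearn import hmm

def train_phone_models(segments_by_unit):
    """segments_by_unit: {ipa_symbol: [ndarray of shape (n_frames, n_mfcc), ...]}"""
    models = {}
    for unit, segments in segments_by_unit.items():
        X = np.vstack(segments)               # stack all training segments
        lengths = [len(s) for s in segments]  # frame count of each segment
        m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)                     # EM training, one model per unit
        models[unit] = m
    return models

def recognise_segment(models, segment):
    """Label a test segment with the unit whose HMM scores it highest."""
    return max(models, key=lambda u: models[u].score(segment))
```

A real phonetic engine decodes a continuous unit sequence (e.g. Viterbi decoding over a unit loop) rather than classifying pre-segmented chunks, but the per-unit modelling idea is the same.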
  • ABSTRACT: This paper addresses phonetic transcription issues in Gujarati and Marathi (Indian languages). Ad hoc approaches that fix the relationship between general alphabetical symbols and phonetic symbols may not always work. Hence, research issues such as the ambiguity between frication and aspirated plosives are addressed in this paper. The anusvara in both of these languages is produced based on the immediately following consonant; the implication of this finding for the problem of phonetic transcription is presented. Furthermore, the effect of dialectal variation on phonetic transcription is analyzed for Marathi. Finally, some examples of phonetic transcription for sentences in these two languages are presented.
    Asian Language Processing (IALP), 2012 International Conference on; 01/2012
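The anusvara observation above lends itself to a simple rule: the anusvara surfaces as the nasal homorganic with the following consonant. A hedged illustration follows (our toy rule table, not the paper's algorithm; the transliteration keys are simplified):

```python
# Toy homorganic-nasal rule for the anusvara: its surface form copies the
# place of articulation of the following consonant.
PLACE_OF = {"k": "velar", "g": "velar",
            "c": "palatal", "j": "palatal",
            "T": "retroflex", "D": "retroflex",   # retroflex stops
            "t": "dental", "d": "dental",
            "p": "bilabial", "b": "bilabial"}

NASAL_AT = {"velar": "ŋ", "palatal": "ɲ", "retroflex": "ɳ",
            "dental": "n", "bilabial": "m"}

def realise_anusvara(next_consonant: str) -> str:
    """Surface nasal for an anusvara, given the next consonant (simplified)."""
    place = PLACE_OF.get(next_consonant)
    return NASAL_AT[place] if place else "~"  # elsewhere: plain nasalisation

# e.g. realise_anusvara("g") -> "ŋ" (as in 'ganga'),
#      realise_anusvara("b") -> "m" (as in Marathi 'amba', mango).
```

A transcription scheme that maps the anusvara letter to a single fixed symbol misses exactly this context dependence.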
  • ABSTRACT: With the advent of social networks, there has been exponential growth in multimedia data, including speech. This speech data is typically conversational and casual, and recorded in real environments. An important characteristic of this data is the unavailability of corresponding transcripts (text) or language information. In this work, we discuss technologies for dealing with speech data that lacks transcripts and/or language information. A traditional approach is to adopt acoustic models from existing benchmark databases (of known languages) to obtain a first-level transcription and then perform bootstrapping. We show the inherent limitations of such approaches and argue that signal processing algorithms based on speech-production knowledge play an important role in dealing with such data. This paper discusses some of the ongoing work at our lab in this direction, which includes building audio search, speech summarization, speech synthesis and voice conversion using untranscribed speech.
    Signal Processing and Communications (SPCOM), 2012 International Conference on; 01/2012
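The "adopt and bootstrap" baseline that this abstract argues against has a simple shape, sketched here schematically (the decode and retrain callables are hypothetical placeholders supplied by the caller, not any specific toolkit's API):

```python
# Schematic cross-lingual bootstrapping: decode untranscribed audio with
# borrowed acoustic models, then retrain on the automatic labels and repeat.
from typing import Callable, Iterable

def bootstrap(decode: Callable, retrain: Callable,
              seed_models, audio: Iterable, n_rounds: int = 3):
    """Iteratively relabel untranscribed audio and refit the models."""
    models = seed_models                       # models from a known language
    for _ in range(n_rounds):
        labels = [decode(models, utt) for utt in audio]  # first-level transcripts
        models = retrain(audio, labels)        # refit on the automatic labels
    return models
```

The weakness of the loop is that errors in the first-level transcription feed back into training, which is what motivates the production-knowledge-based signal processing alternatives the paper pursues.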