H.A. Murthy

Indian Institute of Technology Madras, Chennai, Tamil Nadu, India


Publications (5)

  • Conference Proceeding: Using polysyllabic units for text to speech synthesis in Indian languages
    ABSTRACT: This paper describes the design and development of Indian language Text-To-Speech (TTS) synthesis systems using polysyllabic units. First, a phone-based TTS is built. Next, a monosyllable cluster unit TTS is built. It is observed that the quality of the synthesized sentences can improve if polysyllabic units are used (when the appropriate units are available), since the effects of co-articulation are then preserved. Hence, we built Hindi and Tamil TTS systems with polysyllabic units, which contain cluster units of more than one type (monosyllable, bisyllable and trisyllable). During unit selection, the system selects the best set of units so as to minimize the join and concatenation costs. Preliminary listening tests indicated that the polysyllable TTS has better quality.
    2010 National Conference on Communications (NCC); 03/2010
  • Conference Proceeding: Robust syllable segmentation and its application to syllable-centric continuous speech recognition
    R. Janakiraman, J.C. Kumar, H.A. Murthy
    ABSTRACT: The focus of this paper is two-fold: (a) to develop a knowledge-based robust syllable segmentation algorithm and (b) to establish the importance of accurate segmentation in both the training and testing phases of a speech recognition system. A robust algorithm for segmenting the speech signal into syllables is first developed. It uses a non-statistical technique based on group delay (GD) segmentation and vowel onset point (VOP) detection. The transcription corresponding to each utterance is syllabified using rules, which produces an annotation for the training data. The annotated training data is then used to train a syllable-based speech recognition system. The test signal is also segmented using the proposed algorithm, and this segmentation information is incorporated into the linguistic search space to reduce both computational complexity and word error rate (WER). WERs of 4.4% and 21.2% are reported on the TIMIT and NTIMIT databases, respectively.
    2010 National Conference on Communications (NCC); 03/2010
  • Conference Proceeding: Internet activity analysis through proxy log
    ABSTRACT: The availability of the Internet at the click of a mouse brings with it a host of new problems. Although the World Wide Web was first started by physicists at CERN to enable collation and exchange of data, today it is used for a wide range of applications, whose bandwidth requirements also vary. An Internet Service Provider must ensure satisfaction across the entire spectrum of users, so analysis of Internet usage becomes essential. Further, an administrator can keep a record of users' Internet activity and prevent unethical activities, since the Internet is also an excellent resource for providing anonymity. This analysis can also help in resource provisioning and monitoring. In this work, a web-based tool is first proposed to analyse Internet activity. Next, data is collected from a proxy server on a campus-wide network, and the traffic patterns of different types of users are studied. Finally, the paper concludes with strategies for monitoring and controlling traffic.
    2010 National Conference on Communications (NCC); 03/2010
  • Conference Proceeding: KL divergence based feature switching in the linguistic search space for automatic speech recognition
    J.C. Kumar, R. Janakiraman, H.A. Murthy
    ABSTRACT: In this paper, we propose a novel idea for using two different feature streams in a continuous speech recognition system. Conventionally, multiple feature streams are concatenated and HMMs are trained to build triphone/syllable models. Instead of concatenation, we build separate subword HMMs for each feature stream during training. Also during training, the relevance of a feature stream to a particular sound is evaluated. During testing, hypotheses are generated by the language model, and a greedy Kullback-Leibler distance measure is used to determine the best feature at a particular instant for the given hypotheses. This approach has two important aspects: (a) a feature that is relevant for recognizing a specific sound is used, and (b) the feature dimension does not increase with the number of feature streams. To enable feature switching during recognition, a syllable-based, automatically annotated recognition framework is used, in which the test speech signal is first segmented into syllables and the syllable boundaries are incorporated into the language model. Experiments are performed on three databases, (a) the Tamil DDNews database, (b) TIMIT and (c) NTIMIT, using two features: MFCC (derived from the power spectrum of the speech signal) and MODGDF (derived from the phase spectrum of the speech signal). The results show that the word error rate (WER) is lower than with joint (concatenated) features by almost 1.5% on TIMIT, almost 3.4% on NTIMIT and about 3.8% on Tamil DDNews.
    2010 National Conference on Communications (NCC); 03/2010
  • Conference Proceeding: Methods for improving the quality of syllable based speech synthesis
    ABSTRACT: Our earlier work [1] on speech synthesis has shown that syllables can produce reasonably natural quality speech. Nevertheless, audible artifacts are present due to discontinuities in pitch, energy, and formant trajectories at the joining point of the units. In this paper, we present some minimal signal modification techniques for reducing these artifacts.
    IEEE Spoken Language Technology Workshop (SLT 2008); 01/2009
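The abstract on polysyllabic units for Indian language TTS argues that larger units help because they preserve co-articulation and reduce the number of concatenation points. A small dynamic program can illustrate why unit selection then tends to favour bisyllable and trisyllable units when they exist. This is a hypothetical sketch, not the authors' system. The inventory, the per-unit costs and the constant `JOIN_COST` are invented placeholders. A real synthesizer would score acoustic target and join costs between specific candidate units rather than charging a fixed penalty per join.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed constant penalty for every concatenation point (placeholder value).
JOIN_COST = 1.0


def select_units(
    syllables: Sequence[str],
    inventory: Dict[Tuple[str, ...], float],
    max_len: int = 3,
) -> Tuple[float, List[Tuple[str, ...]]]:
    """Cover `syllables` with inventory units of 1..max_len syllables so that
    the summed unit cost plus join cost is minimal (hypothetical cost model)."""
    n = len(syllables)
    inf = float("inf")
    best = [inf] * (n + 1)           # best[i]: min cost to cover syllables[:i]
    back: List[int] = [-1] * (n + 1)  # back[i]: start index of the last unit
    best[0] = 0.0
    for i in range(1, n + 1):
        for k in range(1, min(max_len, i) + 1):
            unit = tuple(syllables[i - k:i])
            if unit not in inventory or best[i - k] == inf:
                continue
            join = JOIN_COST if i - k > 0 else 0.0  # no join before the first unit
            cost = best[i - k] + inventory[unit] + join
            if cost < best[i]:
                best[i], back[i] = cost, i - k
    if best[n] == inf:
        raise ValueError("syllable sequence cannot be covered by the inventory")
    # Walk the back-pointers to recover the chosen units in order.
    units: List[Tuple[str, ...]] = []
    i = n
    while i > 0:
        units.append(tuple(syllables[back[i]:i]))
        i = back[i]
    return best[n], units[::-1]
```

Consider a toy inventory with monosyllables at cost 0.5, the bisyllable `("na", "ma")` at 0.75 and the trisyllable `("ma", "ska", "ram")` at 1.0. The sequence `["na", "ma", "ska", "ram"]` is then covered as `[("na",), ("ma", "ska", "ram")]` at total cost 2.5. It needs one join instead of three, which is the effect the abstract attributes to polysyllabic units.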
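The KL-divergence feature-switching abstract chooses, for the hypotheses produced by the language model, the feature stream (for example MFCC or MODGDF) best suited to the current segment. A rough sketch of such a greedy choice follows, under stated assumptions that are not taken from the paper. Each hypothesized syllable is modelled by a single diagonal Gaussian per stream instead of a full HMM. The selection criterion used here is the largest minimum pairwise symmetric KL divergence between competing hypotheses. The names `pick_stream` and `models` are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# A diagonal Gaussian: (means, variances), one entry per feature dimension.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) for diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(p[0], p[1], q[0], q[1]):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """KL divergence symmetrised by summing both directions."""
    return kl_diag(p, q) + kl_diag(q, p)


def pick_stream(
    hypotheses: List[str],
    models: Dict[str, Dict[str, Gaussian]],
) -> str:
    """Greedily pick the stream whose models best separate the competing
    hypotheses (assumed criterion: largest minimum pairwise symmetric KL)."""
    best_stream, best_score = "", -math.inf
    for stream, per_unit in models.items():
        pairs = combinations(hypotheses, 2)
        score = min(
            (symmetric_kl(per_unit[a], per_unit[b]) for a, b in pairs),
            default=0.0,  # a single hypothesis gives no pairs to compare
        )
        if score > best_score:
            best_stream, best_score = stream, score
    return best_stream
```

With unit variances, the symmetric KL reduces to the squared distance between means. If the two hypotheses `"ka"` and `"ga"` differ by 0.5 in the MFCC model but by 3.0 in the MODGDF model, `pick_stream` selects `"MODGDF"` for that segment. Because only one stream is used per decision, the feature dimension stays fixed regardless of how many streams are available.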