P.V. deSouza’s research while affiliated with IBM Research - Thomas J. Watson Research Center and other places


Publications (7)


Acoustics-only based automatic phonetic baseform generation
  • Conference Paper

June 1998 · 26 Reads · 36 Citations

1988 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88)

B. Ramabhadran · L.R. Bahl · P.V. deSouza · …
Phonetic baseforms are the basic recognition units in most speech recognition systems. These baseforms are usually determined by linguists once a vocabulary is chosen and are not modified thereafter. However, several applications, such as name dialing, require that the user be able to add new words to the vocabulary. These new words are often names, or task-specific jargon, that have user-specific pronunciations. This paper describes a novel method for generating phonetic transcriptions (baseforms) of words based on acoustic evidence alone. It does not require either the spelling or any prior acoustic representation of the new word, is vocabulary independent, and does not impose any linguistic constraints (pronunciation rules). Our experiments demonstrate the high decoding accuracies obtained when baseforms deduced using this approach are incorporated into our speech recognizer. Also, the error rates on the added words were found to be comparable to or better than when the baseforms were derived by hand.



A new class of fenonic Markov word models for large vocabulary continuous speech recognition

May 1991 · 12 Reads · 12 Citations

1988 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88)

A technique for constructing hidden Markov models for the acoustic representation of words is described. The models, built from combinations of acoustically based subword units called fenones, are derived automatically from one or more sample utterances of words. They are more flexible than previously reported fenone-based word models and lead to an improved capability of modeling variations in pronunciation. In addition, their construction is simplified, because it can be done using the well-known forward-backward algorithm for the parameter estimation of hidden Markov models. Experimental results obtained on a 5000-word vocabulary continuous speech recognition task are presented to illustrate some of the benefits associated with the new models. Multonic baseforms resulted in a reduction of 16% in the average error rate obtained for ten speakers.
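The abstract mentions the forward-backward algorithm without detailing it. As a rough illustration of the machinery involved, the sketch below implements the forward pass of an HMM, which forward-backward (Baum-Welch) parameter estimation builds on. The two-state left-to-right model, its probabilities, and the observation alphabet are all invented for illustration and are not taken from the paper.

```python
# Toy left-to-right HMM scoring: the forward pass that forward-backward
# (Baum-Welch) training builds on. States stand in for subword units
# ("fenones"); all probabilities below are made up.

def forward_likelihood(obs, trans, emit, init):
    """Total probability of an observation sequence under an HMM.

    obs   : list of observation symbols
    trans : trans[i][j] = P(state j at t+1 | state i at t)
    emit  : emit[i][o]  = P(observing o | state i)
    init  : init[i]     = P(starting in state i)
    """
    n = len(init)
    # alpha[i] = P(observations so far, in state i now)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)

# Two-state left-to-right model over observations "a"/"b".
trans = [[0.6, 0.4],
         [0.0, 1.0]]
emit = [{"a": 0.9, "b": 0.1},
        {"a": 0.2, "b": 0.8}]
init = [1.0, 0.0]

print(forward_likelihood(["a", "b"], trans, emit, init))  # 0.342 (approx.)
```

The backward pass is computed analogously from the end of the sequence, and the two together give the state-occupancy statistics used to re-estimate the transition and emission probabilities.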


Decision trees for phonological rules in continuous speech

January 1991 · 31 Reads · 143 Citations

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

The authors present an automatic method for modeling phonological variation using decision trees. For each phone they construct a decision tree that specifies the acoustic realization of the phone as a function of the context in which it appears. Several thousand sentences from a natural language corpus spoken by several speakers are used to construct these decision trees. Experimental results on a 5000-word vocabulary natural language speech recognition task are presented.
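To make the idea of a per-phone decision tree concrete, here is a hand-built toy tree choosing a realization of /t/ from its immediate neighbors. In the paper such trees are grown automatically from data; the questions, contexts, and realization labels below are invented for illustration.

```python
# Toy context-dependent realization of a phone via a decision tree.
# Real systems learn the questions from data; this tree is hand-written.

VOWELS = set("aeiou")

def realization_of_t(prev, next_):
    """Pick an acoustic realization of /t/ from its immediate context.

    prev, next_ : neighboring letters (None at a word boundary).
    """
    if prev in VOWELS and next_ in VOWELS:
        return "flap"        # e.g. the /t/ in "butter" (American English)
    if next_ is None:
        return "unreleased"  # word-final /t/ is often unreleased
    return "t"               # canonical released /t/

print(realization_of_t("a", "e"))   # flap
print(realization_of_t("s", None))  # unreleased
print(realization_of_t("s", "r"))   # t
```

Each leaf of a learned tree would correspond to a context-dependent acoustic model rather than a string label, but the question-asking structure is the same.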


Automatic phonetic baseform determination

January 1991 · 21 Reads · 53 Citations

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

The authors describe a series of experiments in which the phonetic baseform is deduced automatically for new words by utilizing actual utterances of the new word in conjunction with a set of automatically derived spelling-to-sound rules. Recognition performance was evaluated on new words spoken by two different speakers when the phonetic baseforms were extracted via the above approach. The error rates on these new words were found to be comparable to or better than when the phonetic baseforms were derived by hand, thus validating the basic approach.
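The spelling-to-sound component mentioned above can be sketched as an ordered set of grapheme-to-phoneme rewrite rules applied greedily, longest match first. The rules and phone labels below are invented for illustration; the paper derives its rules automatically rather than by hand.

```python
# Minimal sketch of rule-based spelling-to-sound (grapheme-to-phoneme)
# conversion. The rule table is invented; multi-letter rules come first
# so the greedy matcher prefers the longest grapheme.

RULES = [
    ("ph", ["F"]),
    ("sh", ["SH"]),
    ("ee", ["IY"]),
    ("a",  ["AE"]),
    ("b",  ["B"]),
    ("f",  ["F"]),
    ("n",  ["N"]),
    ("o",  ["OW"]),
    ("t",  ["T"]),
]

def spell_to_sound(word):
    """Greedy first-match application of grapheme-to-phoneme rules."""
    phones, i = [], 0
    while i < len(word):
        for graph, ph in RULES:
            if word.startswith(graph, i):
                phones.extend(ph)
                i += len(graph)
                break
        else:
            i += 1  # no rule for this letter: skip (a real system backs off)
    return phones

print(spell_to_sound("phone"))  # ['F', 'OW', 'N']
```

In the paper, candidate pronunciations produced this way are combined with actual utterances of the new word, so acoustic evidence can select among or correct the rule-generated baseforms.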



Method of determining reference spectra suitable for labeling speech in automatic speech recognition systems

December 1984 · 1 Read

Using a computed average of the spectra aligned against each phone in the training data as the starting point, rather than random reference spectra, and adjusting these reference spectra by iterative use of a clustering algorithm, increases overall system effectiveness in a continuous speech recognition system. For each spectrum in the training data the closest reference spectrum is found, and the most common misclassification (expressed as a proportion of the correct phone's occurrences) is calculated. Then, for each reference spectrum, all the training spectra for which that reference spectrum was closest are averaged, and the average replaces that reference spectrum as the adjusted reference.
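The assign-then-average adjustment described in this abstract is essentially a k-means-style update: map each training spectrum to its nearest reference, then replace each reference with the mean of the spectra mapped to it. The sketch below implements that loop on invented two-dimensional feature vectors; the distance measure and data are assumptions, not taken from the patent.

```python
# Sketch of the iterative reference-spectrum adjustment: assign each
# training spectrum to its nearest reference (squared Euclidean distance),
# then replace each reference with the mean of its assigned spectra.

def nearest(refs, x):
    """Index of the reference spectrum closest to x."""
    def dist2(r):
        return sum((a - b) ** 2 for a, b in zip(r, x))
    return min(range(len(refs)), key=lambda i: dist2(refs[i]))

def adjust_references(initial_refs, training, iterations=5):
    refs = [list(r) for r in initial_refs]
    for _ in range(iterations):
        buckets = [[] for _ in refs]
        for x in training:
            buckets[nearest(refs, x)].append(x)
        for i, bucket in enumerate(buckets):
            if bucket:  # leave a reference unchanged if nothing mapped to it
                dim = len(refs[i])
                refs[i] = [sum(x[d] for x in bucket) / len(bucket)
                           for d in range(dim)]
    return refs

training = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]]
refs = adjust_references([[0.0, 0.0], [1.0, 1.0]], training)
print(refs)  # two adjusted references, near [0.1, 0.05] and [0.95, 1.05]
```

Starting from per-phone averages rather than random points, as the abstract suggests, gives this iteration a much better initialization and avoids references that attract no training spectra.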

Citations (5)


... Despite these advancements, many solutions still fall short of providing contextually relevant information quickly and accurately. The evolution of artificial intelligence and machine learning, particularly in the form of large language models, has revolutionized this process, enabling more sophisticated and efficient extraction of knowledge from vast repositories of digital documents [2]. ...

Reference:

Conversational Text Extraction with Large Language Models Using Retrieval-Augmented Systems
A tree-based statistical language model for natural language speech recognition
  • Citing Article
  • July 1989

... The speech recognition and text-to-speech fields are increasingly proceeding to open-domain and the multilingual tasks [1]. In such fields, out-of-vocabulary (OOV) words which lack pronunciations are a major bottleneck [2,3]. In order to solve the OOV problem, grapheme-to-phoneme (g2p) conversion, which predicts the pronunciation for OOV words, is an extremely important component. ...

Automatic phonetic baseform determination
  • Citing Article
  • January 1991

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

... GMM frontend The GMM frontend is based on Bundled Phonetic Features (BDPF) [16], which are an efficient method to create data-adapted context-dependent models for small data sets where classical context-dependent modeling cannot be applied. For BDPF modeling, a number of phonetic decision trees [34] are created, whose roots correspond to (binary) phonetic features, for example the place or manner of articulation. Multiple BDPF trees are used to compensate for the fact that a single tree may not fully discriminate all available phones, we use eight BDPF trees whose roots are the most common phonetic features [14]. ...

Decision trees for phonological rules in continuous speech
  • Citing Article
  • January 1991

Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

... Speech units, according to the derivation rule, are obtained either by a linguistic criterion or by an automatic clustering technique (Černocký, 2002). Examples of speech units according to the automatic clustering technique are fenones (Bahl et al., 1993), senones (Hwang and Huang, 1992), and multones (Bahl et al., 1996). Speech units according to the linguistic criterion are common to all the languages: phonemes, diphthongs, syllables. ...

A new class of fenonic Markov word models for large vocabulary continuous speech recognition
  • Citing Conference Paper
  • May 1991

1988 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88)

... To address these challenges, there have been several statistical approaches aimed at automated unit and lexicon discovery from speech audio [1,2], grapheme-to-phoneme (g2p) conversion [3] and more recently with the use of Long Short-Term Memory (LSTM) networks [4]. However, these models still need to be trained on manually curated pronunciation dictionaries. ...

Acoustics-only based automatic phonetic baseform generation
  • Citing Conference Paper
  • June 1998

1988 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88)