Speech Science - Science topic

Questions related to Speech Science
  • asked a question related to Speech Science
Question
6 answers
Could colleagues provide descriptive comparisons of English with the languages named, including consonant and vowel inventories, an examination of the tonal structure of Chinese languages, and so forth?
Relevant answer
Answer
Hi Andrey. Although my wife is Chinese, I cannot contribute to this subject. I do not speak Yoruba, either. However, considering your incredibly wide range of interests in many sciences, I suggest you take a look at my website, CorrectingWorldHistory. Also, I have noticed that many Philippine languages and some western African languages (west of the Yoruba speakers) have almost the same word for "good" (mabute, mbuti, etc.). Perhaps there are more similarities. If one could prove such a relationship between the languages of the black (Melanesian) nations of the Philippines (or the Aborigines of Australia) and some of the African languages, we might suggest that the same word existed on both continents before, say, 40,000 years B.P. Thus, the basic words of these languages are perhaps not only a few thousand years old but perhaps 50 thousand years old. I congratulate you on your research papers and results. I am originally Hungarian and have lived in Canada since 1976. Have a great weekend!!!
  • asked a question related to Speech Science
Question
4 answers
Hi all, 
I would like to ask the experts here for a better view on the use of cleaned signals, from which the echo has already been removed using several types of adaptive algorithms for AEC (acoustic echo cancellation).
How significant are MSE and PSNR for improving the classification process? Normally we evaluate using WER, accuracy, and perhaps EER too. Is there any connection between the MSE and PSNR values and improvements in those classification metrics?
I would appreciate clarification on this.
Thanks very much
Relevant answer
Answer
This is one of the old issues in the speech recognition research field: the relationship between any speech enhancement technique and classification accuracy.
As far as I know, both MSE and PSNR are frequently used as measures when improving the quality of the input. They are known to be useful in reducing WER. However, the relationship with recognition accuracy is not directly proportional.
Enhancing a noisy signal in terms of MSE or PSNR means that you may get good input quality, but there is a risk. Sometimes the speech enhancement technique produces unexpected artifacts, and in the worst case WER can increase.
So, in a phonemic classification task, a matched condition is more crucial. In the case of a mismatched condition between training and test, MSE and PSNR are somewhat related to WER, but not directly. It has to be studied case by case.
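To make the two signal-level measures concrete, here is a minimal Python sketch of how MSE and PSNR could be computed between a clean reference and an echo-cancelled signal. The toy sample values and the `peak=1.0` convention (signals normalized to the range -1..1) are assumptions for illustration, not taken from the question.

```python
import math

def mse(reference, estimate):
    """Mean squared error between two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    if not reference:
        raise ValueError("signals must be non-empty")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum amplitude."""
    err = mse(reference, estimate)
    if err == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / err)

# Toy data (hypothetical): a clean reference and two processed versions.
clean = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
enhanced_a = [s + 0.1 for s in clean]  # small constant error
enhanced_b = [s + 0.2 for s in clean]  # larger constant error

print(mse(clean, enhanced_a), psnr(clean, enhanced_a))
print(mse(clean, enhanced_b), psnr(clean, enhanced_b))
```

Both numbers only describe how close the waveform is to the reference. They say nothing about whether the remaining error falls on cues the recognizer relies on, which is one way to read the caution above that a better MSE or PSNR does not guarantee a lower WER.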
  • asked a question related to Speech Science
Question
4 answers
I have decided to work on this topic for my thesis; however, I really do not know which sources are the best and most helpful to study, since I need to gain thorough knowledge before I start the work.
Relevant answer
Answer
Hello all
I really appreciate all your support. 
Regards
Fateme
  • asked a question related to Speech Science
Question
4 answers
Can anybody help me get real-time EEG signals for processing in speech applications? I need an EEG data set for both normal and mute persons.
Relevant answer
Answer
  • asked a question related to Speech Science
Question
11 answers
Kindly send me material on "a socio-psychological study of university students' attitude towards varieties of English speech".
Relevant answer
Answer
Quantitative-based research
McKenzie (2010) explored more than 500 Japanese university students’ attitudes towards six ‘varieties’, and relevant articles are McKenzie (2008a, 2008b).  More recent attitude-related studies of his are published and available on his RG account.
McKenzie, Robert M. 2008a. The role of variety recognition in Japanese university students’ attitudes towards English speech varieties. Journal of Multilingual and Multicultural Development 29(2). 139–153.
McKenzie, Robert M. 2008b. Social factors and non-native attitudes towards varieties of spoken English: A Japanese case study. International Journal of Applied Linguistics 18(1). 63–88.
McKenzie, Robert M. 2010. The social psychology of English as a global language: Attitudes, awareness and identity in the Japanese context. Dordrecht: Springer.
Qualitative research
Jenkins (2007) explored more than 300 English teachers' attitudes towards diverse accents, and a relevant article is Jenkins (2009).  Also, Jenkins (2014: Ch.7) revealed international students’ orientations towards diversity in English.
Jenkins, Jennifer. 2007. English as a Lingua Franca: Attitude and identity. Oxford: Oxford University Press.
Jenkins, Jennifer. 2009. English as a lingua franca: Interpretations and attitudes. World Englishes 28(2). 200–207.
Jenkins, Jennifer. 2014. English as a Lingua Franca in the international university: The politics of academic English language policy. London: Routledge.
Hope this helps.
  • asked a question related to Speech Science
Question
9 answers
Acoustic-phonetic production experiments often report relative segment durations (rather than absolute durations), mostly because relative durations are less prone to influences from speaking rate.
Typical reference units for normalization in the literature are:
1) units that contain the target segment (e.g., the syllable, the word, the phrase)
2) units that are adjacent to the target segment (e.g., sounds or words to the right or left)
3) the average phone duration in the respective phrase
Depending on the structure of the utterance and/or the nature of the target segment (e.g., phonemically long vs. short), differences across experimental conditions may appear larger or smaller (depending on whether the duration of the reference unit is negatively or positively correlated with the duration of the target).
Are there theoretical considerations that speak for (or against) one of those units of reference? Or do we need perception data in order to decide which relative measure participants are sensitive to? Should we always collect recordings in different speech rates in order to identify relative durations that are not (or least) influenced by the speaking rate manipulation?
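As a concrete illustration of the three reference units listed above, here is a minimal Python sketch. The annotation layout (label, duration in ms, syllable index) and the toy phrase are hypothetical, and the right-hand neighbor is picked arbitrarily as the adjacent unit.

```python
def relative_durations(phones, target_index):
    """Normalize one target segment's duration by three reference units.

    `phones` is a list of (label, duration_ms, syllable_id) tuples for one
    phrase; this layout is a hypothetical annotation format.
    """
    _, target_dur, target_syll = phones[target_index]

    # 1) Containing unit: the syllable that includes the target.
    syllable_dur = sum(d for _, d, s in phones if s == target_syll)

    # 2) Adjacent unit: the segment immediately to the right, if any.
    has_right = target_index + 1 < len(phones)
    right_dur = phones[target_index + 1][1] if has_right else None

    # 3) Average phone duration in the phrase (target included).
    mean_phone_dur = sum(d for _, d, _ in phones) / len(phones)

    return {
        "target_over_syllable": target_dur / syllable_dur,
        "target_over_right_neighbor": target_dur / right_dur if right_dur else None,
        "target_over_mean_phone": target_dur / mean_phone_dur,
    }

# Hypothetical two-syllable phrase; the target is the first vowel.
phrase = [("m", 60.0, 0), ("a", 120.0, 0), ("m", 60.0, 1), ("a", 160.0, 1)]
result = relative_durations(phrase, target_index=1)
print(result)
```

Because the syllable contains the target, the first ratio can never exceed 1, whereas the other two are unbounded. That is one reason the same raw durations can look like a larger or smaller effect depending on the reference unit chosen.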
Relevant answer
Answer
Hi Bettina,
if you have enough data from your subjects, z-scoring, but separately for each phoneme class, may be an option. I've used phoneme-specific z-scores for recognizing prosodic boundaries and pitch-accented syllables. This takes into account Hartmut's finding, mentioned by Susanne above, that different phonemes are affected differently. Nick Campbell, by the way, came up with the term elasticity: phonemes differ in their elasticity, and he introduced the phoneme-specific z-scores to model phoneme durations in synthesis in a paper in 1992.
Antje
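A minimal Python sketch of phoneme-specific z-scoring as described in this answer might look as follows. The token layout and the toy durations are hypothetical, each phoneme label is treated as its own class, and the population standard deviation is used (the sample standard deviation would be an equally defensible choice).

```python
from collections import defaultdict
from statistics import mean, pstdev

def phoneme_z_scores(tokens):
    """Z-score each duration against the mean/SD of its own phoneme class.

    `tokens` is a list of (phoneme, duration_ms) pairs (hypothetical layout).
    """
    by_class = defaultdict(list)
    for phoneme, dur in tokens:
        by_class[phoneme].append(dur)

    stats = {p: (mean(ds), pstdev(ds)) for p, ds in by_class.items()}

    scores = []
    for phoneme, dur in tokens:
        mu, sigma = stats[phoneme]
        # A class with zero spread (e.g. a single token) has no usable z-score.
        z = (dur - mu) / sigma if sigma > 0 else None
        scores.append((phoneme, z))
    return scores

# Toy data (hypothetical): vowels are longer than stops in absolute terms.
tokens = [("a", 100.0), ("a", 120.0), ("a", 140.0),
          ("t", 40.0), ("t", 50.0), ("t", 60.0), ("s", 90.0)]
scores = phoneme_z_scores(tokens)
for phoneme, z in scores:
    print(phoneme, z)
```

In this toy data the shortest "a" (100 ms) and the shortest "t" (40 ms) receive the same z-score, even though their absolute durations differ by a factor of 2.5. This is the sense in which the measure compares each token with what is typical for its own class.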
  • asked a question related to Speech Science
Question
1 answer
Dacakis and Davies 2012
Relevant answer
Answer
Hi Barbara,
Have you read their follow-up paper? I believe it gives you the information you are looking for.
Ariel
  • asked a question related to Speech Science
Question
8 answers
In speech signal processing, I come across these two terms more and more. What do they actually mean?
Relevant answer
Answer
There are two types of features of a speech signal:
  • The temporal features (time domain features), which are simple to extract and have easy physical interpretation, like: the energy of signal, zero crossing rate, maximum amplitude, minimum energy, etc.
  • The spectral features (frequency based features), which are obtained by converting the time based signal into the frequency domain using the Fourier Transform, like: fundamental frequency, frequency components, spectral centroid, spectral flux, spectral density, spectral roll-off, etc. These features can be used to identify the notes, pitch, rhythm, and melody.
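To illustrate the difference between the two feature types, here is a minimal Python sketch that computes two temporal features (energy and zero crossing rate) directly on the samples, and one spectral feature (spectral centroid) after a Fourier transform. It uses a naive DFT from the standard library to stay self-contained; in practice an FFT routine would be used. The test frame is a hypothetical pure tone.

```python
import cmath
import math

def short_time_energy(frame):
    """Temporal feature: sum of squared samples in the frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Temporal feature: fraction of adjacent sample pairs that change sign."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_centroid(frame, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total

# Hypothetical test frame: a pure 1000 Hz sine sampled at 8000 Hz, 64 samples.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]

print("energy:", short_time_energy(frame))
print("zero crossing rate:", zero_crossing_rate(frame))
print("spectral centroid (Hz):", spectral_centroid(frame, sample_rate))
```

For a pure tone the centroid sits at the tone's frequency. For real speech frames a window function (for example Hamming) is normally applied before the transform.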
  • asked a question related to Speech Science
Question
2 answers
A speech signal has both voiced and unvoiced portions, but I am focusing on transitions that occur in the voiced portions alone. In the voiced regions the source is almost constant, and transitions occur because of the time-varying nature of the system; that is, the source is time-invariant and the vocal tract system is time-variant.
Relevant answer
Answer
I think you have to detect both the voicing level of the speech signal and the unvoiced portions, as well as the time dependency of the transition.
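As a rough illustration of the first step, estimating a voicing level per frame, here is a minimal Python sketch based on the peak of the normalized autocorrelation. This is one common approach and not necessarily what the answer has in mind. The frame length, lag range, threshold of 0.5, and the synthetic signal are all assumptions for illustration.

```python
import math
import random

def voicing_strength(frame, min_lag, max_lag):
    """Peak of the normalized autocorrelation over candidate pitch lags.

    Larger values suggest a periodic (voiced) frame; values near 0 suggest
    an aperiodic (unvoiced) one. The estimate is biased: it is not corrected
    for the shrinking overlap at larger lags.
    """
    energy = sum(x * x for x in frame)
    if energy == 0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        best = max(best, r / energy)
    return best

def label_frames(signal, frame_len, min_lag, max_lag, threshold=0.5):
    """Label consecutive non-overlapping frames as voiced (True) or not."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        labels.append(voicing_strength(frame, min_lag, max_lag) >= threshold)
    return labels

# Hypothetical signal at 8000 Hz: a 100 Hz periodic part, then white noise.
random.seed(0)
voiced = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(480)]
unvoiced = [random.uniform(-1, 1) for _ in range(480)]
labels = label_frames(voiced + unvoiced, frame_len=240, min_lag=40, max_lag=160)
print(labels)
```

Once frames are labeled, the analysis of transitions can be restricted to runs of voiced frames, and the per-frame strength gives a simple graded voicing level rather than a hard decision.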