• Venkatesan Rv asked a question:
    Can anyone suggest a solution for running this MEX file?

    Dear sir/madam,

    I am working on speech segregation methods, using a collection of MATLAB files together with MEX files. The code runs fine in MATLAB 2010, but in MATLAB 2013 and 2014 it fails, specifically on the MEX files (an "undefined function" error is raised for them). Please suggest how I can use these MEX files in the newer versions (MATLAB 2013).

    Thank you in advance,

  • Karina Cerda Oñate added an answer:
    Experimental research on speech rhythm?

    Hi everyone, I would like to do research on adult language rhythm, but I haven't been able to find many labs that are currently working on this topic. Does anyone know of current projects? I am especially interested in rhythm and perception and my aim is to conduct experimental research on the topic.



    Karina Cerda Oñate

    Thank you both. I am more interested in rhythm than in timing. I'll look into your suggestions.


  • Valery Belyanin added an answer:
    Can anyone suggest research on cognitive distortions in the paradigm of psycholinguistics?
    Valery Belyanin

    Thank you!

  • Tina Bögel added an answer:
    How does one control for speech rate in a production experiment and still allow for a 'natural' sounding intonation?

    I am in the process of designing an experiment in which speech rate itself is a crucial factor and needs to be controlled for. So far I have only encountered methods where the speaker was asked to produce a sentence three times: at a 'normal' rate, at a fast rate, and at a slow rate. However, this is not fine-grained enough, so I was wondering if somebody knows of any other approaches to controlling for speech rate, or would share their experience with the above-mentioned three-speech-rates approach.

    Tina Bögel

    Natalia, Leonardo, and Felipe, thank you very much for these suggestions! I will check for the metronome impact on the naturalness, and I will make sure to calculate the relative duration in comparison to how long it took to produce the sentence. If I do that in addition to the three-speech-rates approach, I should probably get a quite fine-grained representation of speech rate. Thank you!
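
The two measures mentioned in this reply (an overall speech rate per production and a duration relative to the whole sentence) can be sketched as follows. This is only an illustration under assumptions: the syllable count, the durations, and the function names are hypothetical and not taken from the thread, and speech rate is assumed to be expressed in syllables per second.

```python
def speech_rate(n_syllables, duration_s):
    """Speech rate of one production, in syllables per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return n_syllables / duration_s


def relative_duration(segment_s, sentence_s):
    """Duration of a target segment as a fraction of the whole sentence."""
    if sentence_s <= 0:
        raise ValueError("sentence_s must be positive")
    return segment_s / sentence_s


# Hypothetical measurements: one 12-syllable sentence produced at three rates.
productions = {"slow": 4.0, "normal": 3.0, "fast": 2.0}  # durations in seconds
rates = {label: speech_rate(12, dur) for label, dur in productions.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} syllables/s")
```

Because the rate is computed per production, it yields a continuous value for every recorded sentence rather than only three categories.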

  • Akin Adetunji added an answer:
    Could anyone share new thoughts/hypotheses on the value of deixis to presidential inaugural speeches?

    I think the rhetorical positionings of person deixis (I/you and exclusive we/inclusive we) are predictable and over-flogged.

    Akin Adetunji

    To Dafnah, thank you. But I don't know if you've observed that Wilson's (1990) categorisation of inclusive and exclusive 'we' is unique and misleading: the former as including the speaker and the latter as excluding the speaker. However, in pragmatics, simply put, inclusive 'we' includes the addressee while exclusive 'we' excludes the addressee.

    However, I'm really wondering if there are fresh/novel thoughts or interpretations concerning the deployment of deixis in presidential inaugural speeches.

  • Ayoub Bouziane added an answer:
    Can anyone help me find source code for VoIP speech coders?

    Hi everybody,
    I am a research scientist working on speaker recognition over VoIP networks. I am looking for source code for VoIP speech coders (G.729, G.711, G.723.1, ...) to transcode my database, in order to study the effect of VoIP speech coding on the performance of speaker recognition over VoIP networks. Can you help me? Thanks in advance.

    Ayoub Bouziane

    Marcin, I am actually writing a paper on the influence of some codecs on the performance of speaker recognition systems. Yes, many studies have been conducted in this direction.

  • Tahiri Noria added an answer:
    LPC method
    I have a project on LPC. The topic is linear prediction of speech: I calculate the LPC coefficients of each segment of the signal, then plot the spectrum of the signal from the coefficients, and my goal is to compare it with the FFT of each segment. I found almost the same results, except for the amplitude.
    My questions are:

    - Why do the coefficients vary from one segment to another?
    - What does the average spectrum (calculated from the LPC coefficients) indicate?
    - How does the performance of the LPC method compare with the FFT?
    Tahiri Noria

    I've resolved this problem, thank you so much M. Sabir.
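
For readers with the same question, the comparison described above can be sketched as follows. This is a minimal illustration under assumptions, not the poster's code or the solution referred to in the thread: it uses the autocorrelation method with the Levinson-Durbin recursion, a synthetic AR(2) signal in place of a speech segment, and hypothetical function names. One possible reason for an amplitude mismatch is leaving out the gain term (the square root of the prediction error energy) when the envelope is computed from the coefficients.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """LPC by the autocorrelation method (Levinson-Durbin recursion).

    Returns (a, err): a[0] == 1 and A(z) = sum_j a[j] z^-j is the
    prediction-error filter; err is the residual (prediction error) energy.
    """
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_envelope(a, err, n_fft):
    """Magnitude envelope G / |A(e^jw)| with gain G = sqrt(err)."""
    return np.sqrt(err) / np.abs(np.fft.rfft(a, n_fft))


# Synthetic stand-in for one segment: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n],
# so the true prediction-error filter is A(z) = 1 - 1.3 z^-1 + 0.6 z^-2.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
x = np.zeros_like(noise)
x[:2] = noise[:2]
for n in range(2, len(x)):
    x[n] = noise[n] + 1.3 * x[n - 1] - 0.6 * x[n - 2]

n_fft = 4096
a, err = lpc_autocorr(x, order=2)
env = lpc_envelope(a, err, n_fft)          # smooth LPC envelope
fft_mag = np.abs(np.fft.rfft(x, n_fft))    # raw FFT magnitude of the segment
print("estimated coefficients:", a)
```

With real speech, each segment is usually windowed (for example with a Hamming window) before analysis. The coefficients change from segment to segment because they describe the spectral envelope of that segment only, and the envelope changes as the articulators move.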


  • Francesc González i Planas added an answer:
    Can you recommend references on mimetic indirect speech/discourse?

    I am interested in references on the syntax and pragmatics of mimetic indirect speech as a variety of indirect speech.

    Thank you.

    Francesc González i Planas

    Thank you for all the answers. They are very useful.

  • Nicanor García added an answer:
    Are there any suggestions for free databases for speaker recognition?


    I am looking for free speech databases for speaker recognition (with more than 50 speakers). Do you have any suggestions?

    Nicanor García

    I have found the MIT Mobile Device Speaker Verification Corpus:


    There are also the Gaudi and Ahumada databases, with 25 speakers each (though the site appears to be down at the moment):



    I also made a database with 50 speakers, but I haven't uploaded it yet; I need to speak with my advisor first. Contact me in a couple of weeks.

  • Simhachalam Thamarana added an answer:
    Can you recommend readings on Bakhtin and genre theory?

    I am reading Bakhtin's "Problem of Speech Genre" and his Philosophy of the Act in the hopes of gaining a better understanding of his views on genre. Can anyone recommend additional readings, whether by Bakhtin or about his thought?

    Simhachalam Thamarana

    Dear Rebecca Gould, 

    I hope the following list of books by and on Bakhtin will be of some help to you.

    Works by Bakhtin:

    1.     Problems of Dostoevsky’s Poetics (1929). Trans. R.W. Rotsel, Ann Arbor, MI: Ardis, 1973; trans. Caryl Emerson, Minneapolis, MN and London: University of Minnesota Press, 1984.

    2.     Rabelais and His World (1965). Trans. Hélène Iswolsky, Cambridge, MA: MIT Press, 1968.

    3.     The Dialogic Imagination: Four Essays by M.M. Bakhtin (1972). Trans. Caryl Emerson and Michael Holquist, ed. Michael Holquist, Austin, TX: University of Texas Press, 1981.

    4.     Speech Genres and Other Late Essays (1979). Trans. Vern W. McGee, ed. Caryl Emerson and Michael Holquist, Austin, TX: University of Texas Press, 1986.

    5.     Art and Answerability: Early Philosophical Essays by M.M. Bakhtin (1990). Trans. Vadim Liapunov, supplement trans. Kenneth Brostrom, eds Michael Holquist and Vadim Liapunov, Austin, TX: University of Texas Press.

    For further reading:

    6.     Bauer, Dale M. and Jaret McKinstry, S., eds, Feminism, Bakhtin, and the Dialogic, Albany, NY: SUNY Press, 1991.

    7.     Bernard-Donals, Michael F., Mikhail Bakhtin: Between Phenomenology and Marxism, Cambridge: Cambridge University Press, 1994.

    8.     Hirschkop, Ken and Shepherd, David, eds, Bakhtin and Cultural Theory, Manchester and New York: Manchester University Press, 2002.

    9.     Holquist, Michael, Dialogism: Bakhtin and His World, London: Methuen, 1990.

    10.  Kristeva, Julia, Desire in Language: A Semiotic Approach to Literature and Art, ed. Leon Roudiez, trans. Thomas Gora, Alice Jardine, Leon Roudiez, New York: Columbia University Press, 1980.

    11.  Todorov, Tzvetan, Mikhail Bakhtin: The Dialogic Principle, trans. Wlad Godzich, Manchester and New York: Manchester University Press, 1984.

    12.  Vice, Sue, Introducing Bakhtin, Manchester and New York: Manchester University Press, 1997.

    Best Wishes

  • Joseph E David added an answer:
    Does anyone know, or has anyone come across, the medieval slur expression 'negro speech in Arabic' (خطاب الزنجي بالعربيّة)?

    I came across an expression, probably a slur, that likens uselessness to 'negro's speech in Arabic'.

    Having no knowledge about it, and very little knowledge about how Arabs viewed the Arabic spoken by Africans, I wonder if someone could assist me with that.

    Thanks in advance,


    Joseph E David

    Thanks Jennifer. An interesting fruit of thought. 

  • Bálint Pál Tóth added an answer:
    Is there any open-source code written in C++ for text-to-speech (TTS) with high-quality sound?

    I'm working on English text-to-speech related to native Nigerian speakers.

    Bálint Pál Tóth

    A good alternative is hts_engine, which is also written in C++. Text preprocessing is not included, but you can use Flite or your own method for that.

    hts_engine implements HMM-based TTS, so you need to train models beforehand. You can find all the necessary tools on the hts_engine website.

  • Jyotsna Subramhanyam added an answer:
    How can I improve speech and language communication in children who have English as an additional language?

    Hope you can help me

    Jyotsna Subramhanyam

    Children may be grouped, and familiar contexts or scenarios planned; guided conversation for dialogue delivery in frequently experienced situations will make communication natural and easy. A graded pattern of this kind will ensure quick acquisition of communication, both verbal and nonverbal.

  • Arif Jawaid added an answer:
    Feature fusion or decision fusion in audio-visual speech recognition: which one is better, and which one is more suitable for a DNN?
    I want to try feature fusion or decision fusion in audio-visual speech recognition. I want to know which one is better and which one can be used with a deep neural network.
    Arif Jawaid

    Hi Ju

    The content of the audio-visual material matters a lot. If it is divorced from real-life tasks and activities, it will be hard for students to achieve automaticity in learning through AV speech recognition. Moreover, how would you cater for active language practice? Without these two aspects, the DNN will raise concerns. Thanks!

  • Teva Merlin added an answer:
    How can I enable simultaneous text-to-speech?

    Hi everyone. I have been conducting a few experiments with simultaneous speech, but I have been using recorded speech (.wav, .ogg or .mp3 files) in all of them. However, I would like to play the simultaneous speech using Text-to-Speech solutions directly, instead of saving to a file first (mainly to avoid the delay, but also to be used across the OS/device).

    All my attempts to play two simultaneous TTS voices (separate threads/processes, ...) have failed, as it seems that speech synthesis / TTS uses a unique channel (resulting in sequential audio).

    Do you know of any alternatives to make this work (independent of the OS/device, although Windows/Android are preferred)? Moreover, can you provide additional information or references on why it doesn't work, so that I can try to find a workaround?

    Thanks in advance.

    Teva Merlin

    You should specify which TTS engines you have tried, on which OS. Without this information, it is hard to come up with an explanation of why it didn't work.

    For what it's worth: on Mac OS X, using the built-in TTS engine, I have no problem playing simultaneous voices. So, if you're in a hurry and can get your hands on a Mac, this may be a solution.

  • Fabian Tomaschek added an answer:
    Is fMRI reliable in overt speech tasks?

    How much of the fMRI BOLD signal from a task requiring overt speech is lost because of head movement or articulation artifacts? Is there a risk that too much correction leads to unwarranted conclusions?

    Fabian Tomaschek


    A member of my old research group investigated overt speech with fMRI. Check her publications: Brendel, Bettina.


    Brendel, B., Hertrich, I., Erb, M., Lindner, A., Riecker, A., Grodd, W., et al. (2010). The contribution of mesiofrontal cortex to the preparation and execution of repetitive syllable productions: An fMRI study. NeuroImage, 50, 1219-1230.

  • Mikel Penagarikano added an answer:
    Which kinds of speech corpora are good for training text-independent speaker verification?

    Speaker verification
    Text independent 

    Mikel Penagarikano

    If you want to try state-of-the-art technology, then you need a huge amount of recordings. I doubt you can manage to get such a database on your own. If you are doing it anyway, take care of the channel. You could easily end up building a channel verification system.

  • Pragati Rao Mandikal Vasuki added an answer:
    How do I proceed in case of normal pure tone and speech audiometric results but complaint of difficulty hearing under background noise?

    An adult complains of difficulty hearing in background noise. Pure tone audiometry and speech audiometry reveal normal findings with good speech discrimination scores. ABR and OAE results are normal. What further investigations are required, and what are the possible interventions?

    Pragati Rao Mandikal Vasuki

    I agree with what others have said above. You could also probe the person's exposure to noise. Sometimes, even though the audiogram might be "normal", there could be a "hidden hearing loss". Such patients usually present with difficulty hearing in noise and tinnitus. If the results of the APD test battery are inconclusive, you might want to try ABR at different rates. See http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4227662/ and http://www.ncbi.nlm.nih.gov/pubmed/21940438 for details.

      ABSTRACT: Ever since Pliny the Elder coined the term tinnitus, the perception of sound in the absence of an external sound source has remained enigmatic. Traditional theories assume that tinnitus is triggered by cochlear damage, but many tinnitus patients present with a normal audiogram, i.e., with no direct signs of cochlear damage. Here, we report that in human subjects with tinnitus and a normal audiogram, auditory brainstem responses show a significantly reduced amplitude of the wave I potential (generated by primary auditory nerve fibers) but normal amplitudes of the more centrally generated wave V. This provides direct physiological evidence of "hidden hearing loss" that manifests as reduced neural output from the cochlea, and consequent renormalization of neuronal response magnitude within the brainstem. Employing an established computational model, we demonstrate how tinnitus could arise from a homeostatic response of neurons in the central auditory system to reduced auditory nerve input in the absence of elevated hearing thresholds.
      Article · Sep 2011 · The Journal of Neuroscience: The Official Journal of the Society for Neuroscience


  • Ravi Foogooa added an answer:
    What is a suitable journal to publish a paper on cybersecurity, speech coding or telecommunication?


    What is the fastest and best journal with Thomson Reuters indexing for cybersecurity, speech coding and telecommunication in general?

    I prefer IEEE, but I don't know how long it will take. Please advise.

    Ravi Foogooa

    @Sarah: This seems interesting, but is it specific to the medical field?

  • B. Tomas added an answer:
    Can we measure the amount of stress required to produce speech?

    In particular, during voiced speech production? I am looking to understand the process of speech production in detail.

    B. Tomas

    Maybe you can look at this chapter:

    B. Tomas, "Determination of Spectral Parameters of Speech Signal by Goertzel Algorithm", Speech Technologies, 01/2011.

  • Pragati Rao Mandikal Vasuki added an answer:
    How do I acquire EEG data specific to articulated speech?

    I am doing research on speech articulation using EEG wave patterns, and I wish to acquire EEG data from the brain that is peculiar to speech: which lead systems, vantage points and electrodes should be taken into consideration?

    Pragati Rao Mandikal Vasuki

    The answer actually depends on what aspect of speech you want to analyze.
    If you are looking for onset responses to speech syllables, you should be able to get decent data from F and C electrodes.

    If you are looking for responses at word level (late potentials such as the N400), you would need to include CP and P electrodes.

    It is very easy to assume that a particular brain area "maps" to a particular electrode. Any basic EEG textbook will advise you about the perils of making this assumption. EEG are scalp recorded potentials and even now there is much debate about the accuracy of source localization using EEG data. You must always choose a group of electrodes in a particular region to make your research protocol stronger.

    Also, could you clarify what "peculiar to speech" means?

  • Patricia Marie Hargrove added an answer:
    Is there any literature on prosody intervention?

    I am a master's student in speech-language pathology. I need some articles about prosody intervention in children with speech and language impairment.

    Patricia Marie Hargrove

    I have a blog with over 100 evidence-based practice reviews of prosody interventions. Half are for children. The address is clinicalprosody.wordpress.com

  • Mark Lehman added an answer:
    How can I perform Cepstral analysis in CSL?

    I am trying to work out the procedure for cepstral analysis using the Computerized Speech Lab (CSL), Kay Pentax. Are there any manuals or procedure guidelines? I am only able to generate an FFT, but am unable to proceed further.


    Mark Lehman

    Kay-Pentax sells a module for CSL called ADSV (Analysis of Dysphonia in Speech and Voice) that performs cepstral analysis.
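
For orientation only, the step that follows the FFT in a basic cepstral analysis can be sketched outside CSL. This is not the CSL or ADSV procedure, which the thread does not describe: the code below is a generic real-cepstrum computation (inverse FFT of the log magnitude spectrum), with a synthetic impulse train standing in for a voiced frame, and with the function names, sampling rate, and F0 search range chosen here as assumptions.

```python
import numpy as np


def real_cepstrum(frame, eps=1e-10):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + eps)  # eps avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))


def cepstral_f0(frame, fs, f0_min=100.0, f0_max=300.0):
    """Estimate F0 from the largest cepstral peak in a quefrency range."""
    c = real_cepstrum(frame)
    q_min = int(fs / f0_max)   # shortest pitch period, in samples
    q_max = int(fs / f0_min)   # longest pitch period, in samples
    peak = q_min + int(np.argmax(c[q_min:q_max + 1]))
    return fs / peak


# Synthetic stand-in for a voiced frame: one impulse every 100 samples.
fs = 16000
frame = np.zeros(1600)
frame[::100] = 1.0

f0 = cepstral_f0(frame, fs)
print(f"estimated F0: {f0:.1f} Hz")
```

With recorded speech the frame would normally be windowed first, and the height of the cepstral peak relative to its surroundings is what dysphonia-oriented measures build on.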

  • Dennis Soku added an answer:
    Can somebody give me some examples of phonemic variations in a language and the probable reasons for such variations?

    In Ghana, I have observed that the phoneme /j/ is realized as /dz/, /y/ or /ʒ/ in the speech of individuals. I have also noticed that the difference in the realizations depends on either the absence or the presence of the target phoneme in the learners’ speech (i.e. transfer errors). Where it is present but the realizations are not the same, the learner tries to articulate the phoneme as a phoneme he or she already knows. Where the target phoneme does not exist in the learner's already known languages, he or she tries to substitute another phoneme that exists in his or her linguistic repertoire. Can someone share with me some examples of phonemic variations that they have noticed in their students’ speech? Are the reasons for the variations different from what I have stated?

    Dennis Soku

    Thank you, Prof. Ivleva and Prof. Prunescu. Prof. Ivleva, I very much like the historic insight given at the website. Prof. Prunescu, your point is well noted. It has to do with geographical locations. I am thankful to both of you. I am working on variations in Ewe (a local language) and your points are very useful to me.

  • Paolo Mairano added an answer:
    How can the energy contained in a speech signal be representative of the language in which it was spoken?

    I am doing my final year project on "Classification of Tonal and Non-Tonal languages" using neural networks. The system takes pitch contour and energy as parameters. Using only the pitch contour as a parameter yields an accuracy of 66%, whereas adding short-term energy increases it to above 80%.

    Much of the standard literature also considers energy a characteristic feature of the language, but provides no explanation.

    Paolo Mairano

    Hi Biplav,

    I know that there have been some studies claiming that languages belonging to different rhythm categories (syllable-timed, stress-timed, mora-timed, etc.) may differ in the way they use energy. I am not sure I am convinced about this, but here is the reference:

    Lee, C.S. & McAngus Todd, N. (2004) Towards an auditory account of speech rhythm: application of a model of the auditory ‘primal sketch’ to two multi-language corpora. Cognition, 93/3, 225-254.

    @Diwakar, I don't think tonal languages simply have 'more energy' in speech. If there is a difference (as suggested by Biplav's results), it is probably a difference in how energy is used in that language (rather than how much energy is used, which may depend on too many factors), right? Possibly, as you mention, there may be more constant energy peaks for vowels in tonal languages. But then again, I am not sure it is as simple and as general as that: some tonal languages have neutral tones, where vowels can be fairly reduced...
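
The short-term energy feature discussed in this thread can be sketched as below. The thread does not say how the feature was computed, so the definition (sum of squared samples per frame), the frame length, the hop size, and the function names are assumptions. The normalized contour is included because the overall level depends on recording conditions, which relates to the point above about how energy is used rather than how much.

```python
import numpy as np


def short_term_energy(signal, frame_len, hop):
    """Sum of squared samples in each (possibly overlapping) frame."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.sum(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])


def normalized_energy_contour(signal, frame_len, hop):
    """Energy contour scaled to a peak of 1, removing overall level."""
    energy = short_term_energy(signal, frame_len, hop)
    peak = energy.max()
    return energy / peak if peak > 0 else energy


# Assumed analysis settings: 25 ms frames with a 10 ms hop at 16 kHz.
fs = 16000
frame_len, hop = int(0.025 * fs), int(0.010 * fs)
t = np.arange(fs) / fs
demo = np.sin(2 * np.pi * 150 * t) * np.linspace(0.0, 1.0, fs)  # rising level
contour = normalized_energy_contour(demo, frame_len, hop)
print(contour.shape)
```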

  • Ali Ibrahim Aboloyoun added an answer:
    Is there a speech assessment for cleft palate children?

    What is the ideal age for speech assessment for the cleft children?

    What are the measures of speech assessment that can be done in day to day practice?

    How soon after cleft palate surgery should the speech assessment be done?

    Ali Ibrahim Aboloyoun

    In our opinion, patients with cleft palate need a team approach. Language and speech evaluation must be done as early as possible after surgical intervention, which is the main factor affecting speech outcome: if the surgery is done by a skilled surgeon and gives adequate palatal length and mobility, the speech outcome is expected to be very good, whereas with a short and immobile palate the patient will need hard work to improve his speech even slightly.

    Evaluation can be done subjectively, in a direct or indirect way, through listening to and examining the patient by an experienced phoniatrician or SLP.

  • Hendrik Schade added an answer:
    Is there any paper that clearly states that our diction/register is a lot looser in speech than in writing?

    Dear all, I basically just need one citation on this (though more would be better) in the context of a corpus analysis. I thought I would have an easy time finding one, but I really did not, so I would appreciate any help.

    I hope it is okay to ask this kind of question. So far I have only used RG to publish and to follow researchers, this is my first time using Q&A.

    Hendrik Schade

    Thank you all for the help! It is amazing that there are possibilities like this because sometimes you just spend way too much time on insignificant things without any result and then feel like you have not done anything at all.

  • A.G. Ramakrishnan added an answer:
    What are the spectral and temporal features of a speech signal?

    In speech signal processing, I keep coming across these two terms. What are they actually?

    A.G. Ramakrishnan

    The most successful spectral features used in speech are (i) Mel frequency cepstral coefficients (MFCC) and (ii) Perceptual Linear Prediction (PLP) features. It is well known that the basilar membrane in the inner ear analyzes the frequency content of the speech we hear. In fact, the analysis performed by the basilar membrane can be modeled by a bank of constant-Q band-pass filters. There also exist the critical bands, which give rise to the phenomenon of masking, where one strong tone or burst can mask another weaker tone within the critical band. Both MFCC and PLP capture these characteristics of our auditory system in some way; so, even though it looks strange, the same features give reasonably good performance for speech recognition, speaker recognition, language identification and even accent identification! However, these spectral features are not very robust to noise.

    On the other hand, some of the time domain (temporal) features such as plosion index and maximum correlation coefficient are relatively more robust to noise. 
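
To make the auditory frequency warping behind MFCCs concrete, here is a small sketch of the mel scale and of filter center frequencies spaced evenly on it. The answer above does not give a formula, so the 2595 * log10(1 + f/700) mapping (one common convention), the number of filters, the frequency range, and the function names are assumptions made for illustration.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (one common convention)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(f_min_hz), hz_to_mel(f_max_hz), n_filters + 2)
    return mel_to_hz(mels)[1:-1]  # drop the two band edges


# Assumed setup: 26 filters covering 0 to 8000 Hz.
centers = mel_center_frequencies(26, 0.0, 8000.0)
print(np.round(centers[:3]), np.round(centers[-3:]))
```

The spacing between neighbouring centers grows with frequency, so the filters are narrow at low frequencies and wide at high frequencies, which mirrors the bank of band-pass filters described in the answer.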

About Speech

Communication through a system of conventional vocal symbols.
