- Mikel Penagarikano added an answer:7Which kind of speech corpus are good for training text-independent speaker verification?
If you want to try state-of-the-art technology, then you need a huge amount of recordings. I doubt you can manage to get such a database on your own. If you are doing it anyway, take care of the channel. You could easily end up building a channel verification system.
- Pragati Rao Mandikal Vasuki added an answer:8How do I proceed in case of normal pure tone and speech audiometric results but complaint of difficulty hearing under background noise?
An adult complains of difficulty hearing in background noise. Pure tone audiometry and speech audiometry reveal normal findings with good speech discrimination scores. ABR and OAE results are normal. What further investigations are required, and what interventions are possible?
I agree with what others have said above. You could also probe about the person's exposure to noise. Sometimes even though the audiogram might be "normal", there could be a "hidden hearing loss". Such patients usually present with difficulty hearing in noise and tinnitus. If the results of the APD test battery are inconclusive, you might want to try out ABR at different rates. See http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4227662/ & http://www.ncbi.nlm.nih.gov/pubmed/21940438 for details.
- Ravi Foogooa added an answer:3What is a suitable journal to publish a paper on cybersecurity, speech coding or telecommunication?
What is the fastest and best journal with Thomson Reuters indexing for cybersecurity, speech coding and telecommunication in general?
I prefer IEEE, but I don't know how long it will take. Please advise.
@Sarah: This seems interesting, but is it specific to the medical field?
- Rahimi Ali added an answer:13Can you recommend readings on Bakhtin and genre theory?
I am reading Bakhtin's "Problem of Speech Genre" and his Philosophy of the Act in the hopes of gaining a better understanding of his views on genre. Can anyone recommend additional readings, whether by Bakhtin or about his thought?
These publications delineate the issue:
- B. Tomas added an answer:5Can we measure the amount of stress required to produce speech?
In particular, during voiced speech production? I am looking to understand the process of speech production in detail.
Maybe you can see the chapter:
B. Tomas, "Determination of Spectral Parameters of Speech Signal by Goertzel Algorithm", Speech Technologies, 01/2011.
- Jonathan P. Evans added an answer:8Are there any factors that affect pitch perception and pitch determination in speech?
Is there any research on factors that affect pitch perception and pitch determination in speech? I'm trying to figure out whether F0 alone is sufficient for determining pitch in speech.
You might find some interesting articles in the July issue of Journal of Phonetics, which is a special issue on High pitch. Hitting the news stands any day now!
(conflict of interest note: guest editor of the issue, and (co-)author)
- Pragati Rao Mandikal Vasuki added an answer:4How do I acquire EEG data specific to articulated speech?
I am doing research on speech articulation using EEG wave patterns, and I wish to acquire EEG data from the brain that is peculiar to speech: the lead systems, the vantage points and the electrodes to take into consideration.
The answer actually depends on what aspect of speech you want to analyze:
If you are looking for onset responses to speech syllables, you should be able to get decent data from F and C electrodes.
If you are looking for responses at word level (late potentials such as N400), you would need to include CP and P electrodes.
It is very easy to assume that a particular brain area "maps" to a particular electrode. Any basic EEG textbook will advise you about the perils of making this assumption. EEG signals are scalp-recorded potentials, and even now there is much debate about the accuracy of source localization using EEG data. You should always choose a group of electrodes in a particular region to make your research protocol stronger.
Also, could you clarify what "peculiar" to speech means?
- Patricia Marie Hargrove added an answer:4Is there any literature on prosody intervention?
I am a master's student of speech-language pathology. I need some articles about prosody intervention in children with speech and language impairment.
I have a blog with over 100 evidence-based practice reviews of prosody interventions. Half are for children. The address is clinical prosody.wordpress.com
- Mark Lehman added an answer:3How can I perform Cepstral analysis in CSL?
I am trying to explore the procedure for cepstral analysis using the Computerized Speech Lab (CSL), Kay Pentax. Are there any manuals or procedure guidelines? I am only able to generate an FFT, and I am unable to proceed further.
Kay-Pentax sells a module for CSL called ADSV (Analysis of Dysphonia in Speech and Voice) that performs cepstral analysis.
- Dennis Soku added an answer:3Can somebody give me some examples of phonemic variations in a language and the probable reasons for such variations?
In Ghana, I have observed that the phoneme /j/ is realized as /dz/, /y/ or /Ʒ/ in the speech of individuals. I have also noticed that the difference in the realizations depends on either the absence or the presence of the target phoneme in the learners’ speech (i.e. transfer errors). Where it is present but the realizations are not the same, the learner tries to articulate the phoneme as a phoneme he or she already knows. Where the target phoneme does not exist in the languages the learner already knows, he or she tries to substitute another phoneme that exists in his or her linguistic repertoire. Can someone share with me some examples of phonemic variations that they have noticed in their students’ speech? Are the reasons for the variations different from what I have stated?
Thank you Prof. Ivleva and Prof. Prunescu. Prof. Ivleva, I very much like the historic insight given at the website. Prof. Prunescu, your point is well noted. It has to do with geographical locations. I am thankful to both of you. I am working on variations in Ewe (i.e. a local language) and your points are very useful to me.
- Susanne Fuchs added an answer:2Is fMRI reliable in overt speech tasks?
How much of the fMRI BOLD signal from a task requiring overt speech is lost because of head movement or articulation artifacts? Is there a risk that too much correction leads to unsupportable conclusions?
I can't say a lot about articulation artefacts, but one thing which may frequently be forgotten is that the strength of the BOLD signal may additionally be influenced by respiratory behaviour (which varies with cognitive load and different tasks; it also varies in different noise conditions etc. If needed, I can provide references). Recording respiratory behaviour as well and including it as a potential confound in statistical models may be a good idea.
- Paolo Mairano added an answer:6How can the energy contained in a speech signal be representative of the language in which it was spoken?
I am doing my final year project on "Classification of Tonal and Non-Tonal languages" using neural networks. The system takes pitch contour and energy as parameters. Using only the pitch contour as a parameter yields an accuracy of 66%, whereas adding short-term energy increases it to above 80%.
Many standard references also consider energy a characteristic feature of the language, but provide no explanation.
I know that there have been some studies claiming that languages belonging to different rhythm categories (syllable-timed, stress-timed, mora-timed, etc.) may differ in the way they use energy. I am not sure I am convinced about this, but here is the reference:
Lee, C.S. & McAngus Todd, N. (2004) Towards an auditory account of speech rhythm: application of a model of the auditory ‘primal sketch’ to two multi-language corpora. Cognition, 93/3, 225-254.
@Diwakar, I don't think tonal languages simply have 'more energy' in speech. If there is a difference (as suggested by Biplav's results), it is probably a difference in how energy is used in that language (rather than how much energy is used, which may depend on too many factors), right? Possibly, as you mention, there may be more constant energy peaks for vowels in tonal languages. But then again, I am not sure it is as simple and as general as that: some tonal languages have neutral tones, where vowels can be fairly reduced...
- Ali Ibrahim Aboloyoun added an answer:5Is there a speech assessment for cleft palate children?
What is the ideal age for speech assessment for the cleft children?
What are the measures of speech assessment that can be done in day to day practice?
How soon after cleft palate surgery should the speech assessment be done?
Our opinion is that patients with cleft palate are teamwork cases. Language and speech evaluation must be done as early as possible after surgical intervention, which is the main factor affecting speech output: if the surgery is done by a skilled surgeon and yields adequate palatal length and mobility, the speech output is expected to be very good; if the palate is short and immobile, the patient will need hard work to improve his speech even slightly.
Evaluation can be done subjectively, directly or indirectly, by listening to and examining the patient; this should be done by an experienced phoniatrician or SLP.
- Hendrik Schade added an answer:6Is there any paper that clearly states that our diction /register is a lot more loose in speech than in writing?
Dear all, I basically just need one citation (even though more would be better) on this in the context of a corpus analysis and I thought I would have an easy time finding one but I really did not so I would appreciate any help.
I hope it is okay to ask this kind of question. So far I have only used RG to publish and to follow researchers, this is my first time using Q&A.
Thank you all for the help! It is amazing that there are possibilities like this because sometimes you just spend way too much time on insignificant things without any result and then feel like you have not done anything at all.
- A.G. Ramakrishnan added an answer:5What are the Spectral and Temporal Features in Speech signal?
In speech signal processing, I keep coming across these two terms. What do they actually mean?
The most successful spectral features used in speech are (i) Mel frequency cepstral coefficients (MFCC) and (ii) Perceptual Linear Prediction (PLP) features. It is well known that the basilar membrane in the inner ear actually analyzes the frequency content of the speech we hear. In fact, the analysis performed by the basilar membrane can be modeled by a bank of constant-Q band-pass filters. There also exist the critical bands, which give rise to the phenomenon of masking, where one strong tone or burst can mask another, weaker tone within the critical band. Both MFCC and PLP capture these characteristics of our auditory system in some way; so, even though it looks strange, the same features give reasonably good performance for speech recognition, speaker recognition, language identification and even accent identification! However, these spectral features are not very robust to noise.
On the other hand, some of the time domain (temporal) features such as plosion index and maximum correlation coefficient are relatively more robust to noise.
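To make the auditory warping behind MFCC a little more concrete, here is a minimal Python sketch of the mel scale. It assumes the widely used 2595 * log10(1 + f/700) formula, which is one common convention and is not stated in the answer above; the function names and the 0 to 8000 Hz range are illustrative choices only.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mels using the common 2595*log10(1 + f/700) convention (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_centers(f_low, f_high, n_filters):
    """Centre frequencies (Hz) of n_filters bands spaced equally on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Illustrative range: 10 filters between 0 Hz and 8000 Hz.
centers = mel_filter_centers(0.0, 8000.0, 10)
for c in centers:
    print(round(c, 1))
```

The centres come out closely spaced at low frequencies and increasingly far apart at high frequencies, which is the rough behaviour of the critical bands that the answer refers to.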
- Wijdan Alwidyani added an answer:7Is apraxia of speech (AOS) the same as language delay?
Is apraxia of speech (AOS) the same as language delay? And what are the most typical phonological patterns that characterize apraxic people?
Is there a specific battery used to diagnose apraxia of speech? And if there is not, what are its symptoms?
Dafydd Gibbon, thank you so much.
- Biswajit Satapathy added an answer:4Can anyone recommend a platform for building (or know of an existing) listening span task that can be acoustically manipulated?
I'm studying the effects of unfamiliar-accented speech on verbal working memory. I believe the task that best suits my experiment is a listening span task (LSPAN). I am interested if anyone has experience using these tasks and, in particular, if anyone has manipulated the acoustic boundaries of the stimuli. If so, can you offer advice or models for creating such a task (or point me towards one that currently exists that I may be able to use and adapt?)
Rather than rely on accented-speakers to record the LSPAN stimuli, I'd like to control for the exact acoustic features.
Hi Lauren, apart from Praat, the following tools may help you:
1. Wavesurfer : http://www.speech.kth.se/wavesurfer/
2. Audacity : http://audacity.sourceforge.net/
3. SoX, Edinburgh Speech Tools
And for simulation you can use:
Here Matlab is not free, but the other two are free simulation toolkits.
Hope these tools will help you with speech analysis.
- Amaury Lendasse added an answer:7What is the best available toolbox for implementation of Deep Neural Networks (DNN)?
There are plenty of toolboxes offering functions for this specific task, so it would be great if we could all contribute and conclude about the best available DNN toolbox to this date (mainly for speech applications).
It will be great if we can give the pros and cons of using any toolbox and at the end we will conclude from the top voted answers.
- Monika Połczyńska added an answer:3Are there any neurolinguistic and psycholinguistic proofs on part of speech and syntactic position?
I think part of speech has a close relation with syntactic position. But I don't have any proof on this issue, especially proofs from neurolinguistic and psycholinguistic study. Can anybody help me with this?
There have been a number of fMRI studies on parts of speech and syntax, including (but not limited to) canonical versus non-canonical word order. Here are a few articles that might be useful: Bornkessel et al. 2005, Ben-Shachar and Grodzinsky 2004, Mack et al. 2013, and Meltzer-Asscher et al. 2015.
Here you have PubMed links to these publications:
Best of luck!
- Dennis Soku added an answer:12How can I improve speech and language communication in children that have English as an additional language ?
Hope you can help me
This is a case of ‘sequential bilingualism’, not ‘simultaneous bilingualism’. Here the children are going to use their knowledge of and experience in their first language. The use of substitution tables (i.e. making use of sentences with identical structures) will be useful. Phonetic exercises based on identical sounds in the two languages will also go a long way towards improving their communicative skills in English.
- beh zad Ghorbani added an answer:6Is the "musical noise" generated by some Speech Enhancement algorithms uniformly distributed across the spectrum?
I am trying to assess the degree of degradation that "musical noise" causes in the low frequency bands of the spectrum of speech signals. Perceptually (playing back the treated signal) this artifact is stronger in mid and high frequencies (over 700 Hz), however I need an objective way to confirm or disprove this.
Does anyone have information on this subject or knows a way to evaluate the amount of musical noise present in a signal?
Thank you very much.
I can reduce the musical noise with a perceptual frequency masking filter.
- Kuruvachan K George added an answer:1Where can I find the methods that find the silence intervals of speech?
I ask because the result of filtering noisy speech depends strongly on how well the silence intervals are found.
Such algorithms are part of Voice Activity Detectors (VAD), which are used to detect the silence segments in speech data. Various features, such as signal energy, zero-crossing rate and spectral centroid, are used in those algorithms. One of our papers is also attached.
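As a rough illustration of the energy and zero-crossing features mentioned above, here is a minimal Python sketch of a frame-based silence detector. It is not the method from the attached paper: the frame length, the relative energy threshold and the function names are assumptions chosen for the example, and the silence decision uses energy alone while the zero-crossing rate is only computed as an extra feature.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into frames of frame_len, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_silence(samples, frame_len=160, hop=160, energy_ratio=0.1):
    """Flag a frame as silence when its energy is below energy_ratio * max energy.

    energy_ratio is an illustrative value; real detectors usually adapt it to the noise floor.
    """
    frames = frame_signal(samples, frame_len, hop)
    energies = [short_time_energy(f) for f in frames]
    threshold = energy_ratio * max(energies)
    return [e < threshold for e in energies]

# Synthetic test signal at an assumed 8 kHz rate: silence, a 200 Hz tone, silence.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
signal = [0.0] * 800 + tone + [0.0] * 800
silence_flags = detect_silence(signal)
zcr_per_frame = [zero_crossing_rate(f) for f in frame_signal(signal, 160, 160)]
print(silence_flags)
```

On real recordings the silent stretches still contain background noise, so a fixed ratio like this is only a starting point; combining energy with zero-crossing rate or spectral features is what makes practical detectors more reliable.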
- At L Hof added an answer:3What is the typical lung pressure for normal human phonation/speech?
I need the value of lung pressure to set up the boundary condition for the inflow for a 2D vocal fold simulation for a normal phonation condition.
You may try to consult the thesis of Harm Schutte at
- César Asensio added an answer:2How can one use posterior probability of Gaussian mixture model using matlab?
In my work, I want to use a Gaussian mixture model (GMM) for speaker identification. I use Mel frequency cepstral coefficients (MFCC) as features for the training and testing speech signals, and I use obj = fitgmdist(X,K) to estimate the parameters of the Gaussian mixture model for each training speech signal. I then call [p, nlogl] = posterior(obj, testdata) and choose the minimum nlogl as indicating the maximum similarity between the reference and testing models, as shown in the attached MATLAB file.
The problem in my program is that the minimum nlogl changes between runs, so it recognizes a different speaker even when I use the same testing speech signal. For example, when I run the program for the first time, it reports that the first speaker has the maximum similarity (I=1); if I run the program again on the same testing speech, it reports that the fifth speaker has the maximum similarity. I do not know what the problem is or why the program gives a different speaker when I run it three times on the same testing speech signal. Can anyone who specializes in speaker recognition systems and Gaussian mixture models answer my question?
With best regards
I would suggest trying the PRTools toolbox for MATLAB.
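The decision rule described in the question (score the test data against each speaker's GMM and pick the smallest negative log-likelihood) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the asker's MATLAB code: the features are one-dimensional, and the two speaker models use hand-set, hypothetical weights, means and variances instead of fitted ones.

```python
import math

def gmm_neg_log_likelihood(data, weights, means, variances):
    """Negative log-likelihood of 1-D data under a Gaussian mixture."""
    nll = 0.0
    for x in data:
        # Per-component log densities, combined with log-sum-exp for stability.
        logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        nll -= top + math.log(sum(math.exp(l - top) for l in logs))
    return nll

def identify_speaker(test_data, models):
    """Return the model name with the smallest negative log-likelihood, plus all scores."""
    scores = {name: gmm_neg_log_likelihood(test_data, *params)
              for name, params in models.items()}
    return min(scores, key=scores.get), scores

# Hypothetical, hand-set models: (weights, means, variances) per speaker.
models = {
    "speaker_a": ([0.5, 0.5], [-1.0, 1.0], [0.5, 0.5]),
    "speaker_b": ([0.3, 0.7], [4.0, 6.0], [1.0, 1.0]),
}
test_data = [-1.2, 0.9, 1.1, -0.8, 0.2]

best, scores = identify_speaker(test_data, models)
print(best, scores)
```

With fixed model parameters the scoring step is deterministic, so repeated runs always select the same speaker. One possible explanation for the behaviour in the question, which the thread does not confirm, is that the models themselves differ from run to run because mixture fitting starts from a random initialization; fixing the random seed before fitting (for example with rng in MATLAB) would be one way to test that.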
- Nikola Ilankovic added an answer:4What is the etiological relationship between MMS immunization and lesion of the inner ear (laesio cochleae and n. cochlearis) in children?
What are the consequences on the speech development? What is the connection with autistic development?
Thank you, Vladimir. But morbilli is a very mild illness in over 90% of small children! Complications occur most frequently in adults. Why, then, is immunisation necessary? The immune reaction after infection and/or immunisation can be delayed by 6 or more months.
- Alexander I. Rudnicky added an answer:3Is there any effect of speech signals volume on the performance of speaker recognition systems ?
Is there any effect of speech signal volume on the performance of speaker recognition systems? For example, if the audio files used in the learning stage have a larger volume than those used at the test step, will this difference in volume affect the performance of the speaker recognition system?
Well, if the data are too loud they will be distorted. So first make sure there's no clipping. Also note that the source of the loudness makes a difference. Is it because the gain was too high, or were people shouting? Other things being equal, the training data ought to be reasonably similar to the test data; in the end this is still a pattern matching problem.
Note that techniques such as CMN (cepstral mean normalization) are useful. In our own work we haven't observed much effect of normalization: the features are spectral, so as long as that information is reasonably there, things should work. If anything, we've noticed that attempts at normalization usually degrade performance.
Of course, to find out for sure in your situation, simply run different trainings and see what happens.
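The two practical points above, checking for clipping and removing a constant offset by mean normalization, can be illustrated with a small Python sketch. It is a simplified stand-in for CMN: instead of full cepstra it uses a per-frame log-energy feature, on the assumption that a pure gain change shifts such a log-domain feature by a constant. The signal, frame length and function names are made up for the example.

```python
import math

def is_clipped(samples, full_scale=1.0):
    """True if any sample reaches the assumed full-scale amplitude."""
    return any(abs(x) >= full_scale for x in samples)

def log_energies(samples, frame_len):
    """Log of the summed squared amplitude per non-overlapping frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    # Small constant avoids log(0) on all-zero frames.
    return [math.log(sum(x * x for x in f) + 1e-12) for f in frames]

def mean_normalize(features):
    """Subtract the utterance-level mean, as CMN does per cepstral coefficient."""
    mean = sum(features) / len(features)
    return [v - mean for v in features]

# Hypothetical recording: a tone whose amplitude grows frame by frame.
quiet = [(0.1 + 0.05 * (n // 80)) * math.sin(2 * math.pi * n / 20) for n in range(400)]
loud = [3.0 * x for x in quiet]  # same content recorded with 3x gain

quiet_raw, loud_raw = log_energies(quiet, 80), log_energies(loud, 80)
quiet_norm, loud_norm = mean_normalize(quiet_raw), mean_normalize(loud_raw)

print(is_clipped(loud))
print(max(abs(a - b) for a, b in zip(quiet_raw, loud_raw)))
print(max(abs(a - b) for a, b in zip(quiet_norm, loud_norm)))
```

The raw features differ by a constant (log 9 for a gain of 3), and the difference vanishes after mean subtraction. This only covers a clean gain change; shouting or clipping alters the spectrum itself, which no mean subtraction can undo.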
- Iman Esmaili added an answer:2Does using speech samples with SNR < 0 make my recognition less accurate?
I am doing isolated word recognition based on MFCCs. Some of my samples turned out to have SNR < 0. Should I use them or simply delete them?
Of course, using low-SNR data degrades your recognition accuracy. But whether to use the low-SNR data depends on your recognition plan: there is clean speech recognition and there is noisy speech recognition. If you have no restriction on the environment, just use the clean speech; but if your system must work in different conditions, you must use all of your data and find some way to deal with the noise.
For example, spectral subtraction is a simple and efficient way to deal with white noise.
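For readers who have not seen spectral subtraction, here is a minimal single-frame Python sketch of the magnitude-domain variant. It is only an illustration under several assumptions: a naive DFT instead of an FFT, no windowing or overlap-add, a noise estimate averaged from frames assumed to be noise only, and an arbitrary spectral floor of 0.01. The demo tone and the seeded Gaussian noise are synthetic.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(N^2) discrete Fourier transform; fine for a short demo frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse of dft(); returns the real part of the time-domain signal."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def estimate_noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only."""
    spectra = [[abs(v) for v in dft(f)] for f in noise_frames]
    return [sum(col) / len(col) for col in zip(*spectra)]

def spectral_subtract(noisy_frame, noise_mag, floor=0.01):
    """Subtract the noise magnitude per bin, keep the noisy phase, apply a spectral floor."""
    cleaned = []
    for value, n_mag in zip(dft(noisy_frame), noise_mag):
        mag = max(abs(value) - n_mag, floor * abs(value))
        cleaned.append(cmath.rect(mag, cmath.phase(value)))
    return idft(cleaned)

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

# Synthetic demo data: a 64-sample tone plus seeded white Gaussian noise.
rng = random.Random(0)
N = 64
clean = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noise_frames = [[rng.gauss(0.0, 0.3) for _ in range(N)] for _ in range(20)]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]

noise_mag = estimate_noise_magnitude(noise_frames)
enhanced = spectral_subtract(noisy, noise_mag)
print(energy(noisy), energy(enhanced))
```

Because the subtraction leaves isolated residual peaks in bins where the noise happened to exceed its average, this kind of processing is a typical source of the "musical noise" artifact, so the floor and the noise estimate usually need tuning on real data.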
- Chilin Shih added an answer:7How can I estimate a person's vocal tract length, using a recorded audio file?
I'm performing some experiments that require a vocal tract length change, but I need to know the original one.
I'm aware of the formula L = c / (4F), where "c" is the speed of sound (34029 cm/s) and "F" is the first formant frequency. I'm also aware that I should use vowels as close as possible to an unconstricted vocal tract.
However, I made a few experiments with the software program Praat and I got rather different and difficult to interpret results. In a single vowel, I get a large range of frequencies (1st formant ones), so I thought I should focus on the average? Is that correct? Moreover, among different vowels I get very different results. Is that normal?
Thanks in advance!
Alternatively, we got reasonable measurements using two microphones, one placed at the mouth and one at the throat outside the glottis, and estimating the distance from the time it takes for the acoustic wave to travel from the glottis to the opening of the mouth. The technique is described in "A quasi-glottogram signal", JASA 2003.
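The formula quoted in the question, L = c / (4F), can be tried out with a short Python sketch. It assumes the standard uniform-tube model (closed at the glottis, open at the lips), in which the n-th resonance is F_n = (2n - 1) c / (4L); extending the question's F1-only formula to higher formants is that model's assumption, not something stated in the thread. The formant values in the demo are idealized, not measured.

```python
SPEED_OF_SOUND_CM_S = 34029.0  # value quoted in the question

def vtl_from_formant(freq_hz, formant_number=1, c=SPEED_OF_SOUND_CM_S):
    """Vocal tract length (cm) implied by one formant of a uniform tube.

    For formant_number=1 this reduces to L = c / (4 * F1).
    """
    return (2 * formant_number - 1) * c / (4.0 * freq_hz)

def estimate_vtl(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Average the per-formant length estimates for F1, F2, F3, ... in order."""
    estimates = [vtl_from_formant(f, n, c) for n, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)

# Idealized neutral-vowel formants (hypothetical values, not measured data).
print(vtl_from_formant(500.0))
print(estimate_vtl([500.0, 1500.0, 2500.0]))
```

The uniform-tube model only approximates a neutral, schwa-like vowel. For other vowels the constrictions move F1 far from c / (4L), which is consistent with the large differences between vowels reported in the question; averaging over frames of a neutral vowel, or over several formants, is one way to get a steadier estimate.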
- Peggy Katelhoen added an answer:10How are verbs of communication used to introduce Direct Speech in different languages?
I am exploring the use of verbs of communication (verba dicenda) across languages and genres. My main aim is to see whether typological differences across languages (as described by Talmy) are maintained in the domain of communication. I am particularly concerned with how different languages use VoCs to introduce (and reconstruct) Direct Speech in written narratives and the rhetorical implications of this use, but I am also interested in their use in oral contexts. Any research dealing with this will be much appreciated.
There is an older publication (German and Spanish): Hernández, Eduardo Jorge (1993):
Verba dicendi. Kontrastive Untersuchungen Deutsch-Spanisch. Series: Hispano-Americana. Frankfurt/M., Berlin, Bern, New York, Paris, Wien,
and my own book: Katelhön, Peggy (2005): Das fremde Wort im Gespräch: Rededarstellung und Redewiedergabe in italienischen und deutschen Gesprächen, Berlin: Weidler Verlag (discourse representation in spoken Italian and German).
For German I found that there are verbs like "kommen" (come) that can introduce DS with an implicit negative evaluation. Also very interesting are forms without a verb, such as "e lui/e lei" and "ich so, er so", or with the verb "fare" (make) in Italian. You can find examples and a bibliography in the book.
With best regards, PK
- Eddy B. Brixen added an answer:10Acoustic analysis of speech - recommendations for lapel mics?
I am planning a series of 'field recordings' of speech, with one individual speaker per recording, to be conducted in 'a quiet indoor space'. Planned analyses include formant tracking (F0-3).
Researchers in the field of acoustic/phonetic analysis of speech: What lapel mics do you use? What are your experiences with different models? Do you have recommendations for particular models currently available? Would prefer an economical solution (for multi-site testing), but open to suggestions.
The distance and the axis are important. We have seen discussions on different LTASS profiles in different languages, and the discussion was really about the placement of microphones. A perfect microphone is the DPA4060. Remove the grid and you have a mic with a flat frequency response, low noise and low distortion.
But placing the microphone on the chest makes you lose approximately 10 dB at 3-4 kHz! (Check this paper: Brixen, E.B.: Spectral degradation of speech captured by miniature microphones mounted on persons' heads and chests. AES Convention no. 100, Copenhagen, Denmark. Preprint 4284.)
A headset microphone that works is the DPA d:fine 66. The level is 10 dB higher "at the edge of your smile" compared to the chest-mounted mic, and this provides you with 10 dB less background noise.
Communication through a system of conventional vocal symbols.