- Arif Jawaid added an answer: Which is better in audio-visual speech recognition, feature fusion or decision fusion, and which is more suitable for a DNN?
I want to try feature fusion or decision fusion in audio-visual speech recognition. I want to know which one is better and which one can be used by a deep neural network.
The content we have in the audio-visual material matters a lot. If it is divorced from real-life tasks and activities, AV speech recognition will make it hard for students to automate their learning. Moreover, how would you cater for active language practice? Without these two aspects, the DNN will show weaknesses. Thanks!
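For readers comparing the two strategies, here is a minimal numpy sketch (toy dimensions and untrained stand-in models, purely illustrative): feature fusion concatenates the audio and visual feature vectors before one classifier, while decision fusion combines per-modality posteriors afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-frame features (dimensions are illustrative only)
audio_feat = rng.normal(size=39)    # e.g. MFCC + deltas
visual_feat = rng.normal(size=20)   # e.g. lip-shape parameters

# --- Feature (early) fusion: concatenate, then classify jointly ---
fused = np.concatenate([audio_feat, visual_feat])   # 59-dim input to one DNN

# --- Decision (late) fusion: classify per modality, combine posteriors ---
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_classes = 10
W_a = rng.normal(size=(n_classes, 39))  # stand-ins for two trained models
W_v = rng.normal(size=(n_classes, 20))
p_audio = softmax(W_a @ audio_feat)
p_visual = softmax(W_v @ visual_feat)

alpha = 0.7  # reliability weight for the audio stream
p_fused = alpha * p_audio + (1 - alpha) * p_visual  # weighted-sum rule

print(fused.shape)                  # → (59,)
print(np.isclose(p_fused.sum(), 1.0))  # → True: still a valid distribution
```

Either variant can feed a DNN; early fusion lets the network learn cross-modal interactions, while late fusion makes it easy to down-weight a noisy modality.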
- Teva Merlin added an answer: How can I enable simultaneous text-to-speech?
Hi everyone. I have been conducting a few experiments with simultaneous speech, but I have been using recorded speech (.wav, .ogg or .mp3 files) in all of them. However, I would like to play the simultaneous speech using Text-to-Speech solutions directly, instead of saving to a file first (mainly to avoid the delay, but also to be used across the OS/device).
All my attempts to play two simultaneous TTS voices (separate threads/processes, ...) have failed, as it seems that speech synthesis / TTS uses a single channel (resulting in sequential audio).
Do you know any alternatives to make this work (independent of the OS/device - although windows / android are preferred)? Moreover, can you provide me additional information / references on why it doesn't work, so I can try to find a workaround?
Thanks in advance.
You should specify which TTS engines you have tried, on which OS. Without this information, it is hard to come up with an explanation of why it didn't work.
For what it's worth: on Mac OS X, using the built-in TTS engine, I have no problem playing simultaneous voices. So, if you're in a hurry and can get your hands on a Mac, this may be a solution.
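If the engines you try really do serialize playback, one workaround is to synthesize each voice separately (many engines can render to a buffer or file), mix the waveforms yourself, and play the mixture through one audio stream. A minimal numpy sketch of the mixing step, with synthetic tones standing in for the two synthesized voices:

```python
import numpy as np

def mix_voices(a, b):
    """Mix two mono waveforms (float arrays in [-1, 1]) into one track,
    padding the shorter with silence and normalizing to avoid clipping."""
    n = max(len(a), len(b))
    out = np.zeros(n)
    out[: len(a)] += a
    out[: len(b)] += b
    peak = np.abs(out).max()
    if peak > 1.0:
        out /= peak  # simple normalization; a limiter would sound better
    return out

# Toy example: two "utterances" of different lengths at 8 kHz
t1 = np.linspace(0, 1.0, 8000, endpoint=False)
t2 = np.linspace(0, 0.5, 4000, endpoint=False)
voice1 = 0.8 * np.sin(2 * np.pi * 220 * t1)
voice2 = 0.8 * np.sin(2 * np.pi * 330 * t2)
mixture = mix_voices(voice1, voice2)
print(len(mixture), np.abs(mixture).max() <= 1.0)  # → 8000 True
```

This sidesteps the single-channel limitation entirely, at the cost of the latency of rendering each utterance before playback.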
- Fabian Tomaschek added an answer: Is fMRI reliable in overt speech tasks?
How much of the fMRI BOLD signal from a task requiring overt speech is lost because of head movement or articulation artifacts? Is there a risk that too much correction leads to unwarranted conclusions?
A member of my old research group investigated overt speech with fMRI. Check her publications: Brendel, Bettina.
Brendel, B., Hertrich, I., Erb, M., Lindner, A., Riecker, A., Grodd, W., et al. (2010). The contribution of mesiofrontal cortex to the preparation and execution of repetitive syllable productions: An fMRI study. NeuroImage, 50, 1219-1230.
- Mikel Penagarikano added an answer: Which kinds of speech corpora are good for training text-independent speaker verification?
If you want to try state-of-the-art technology, then you need a huge amount of recordings. I doubt you can manage to get such a database on your own. If you are doing it anyway, take care of the channel: you could easily end up building a channel verification system.
- Pragati Rao Mandikal Vasuki added an answer: How do I proceed in the case of normal pure-tone and speech audiometry results but a complaint of difficulty hearing in background noise?
An adult complains of difficulty hearing in background noise. Pure-tone audiometry and speech audiometry reveal normal findings with good speech discrimination scores. ABR and OAE results are normal. What further investigations are required, and what are the possible interventions?
I agree with what others have said above. You could also probe the person's exposure to noise. Sometimes, even though the audiogram might be "normal", there could be a "hidden hearing loss". Such patients usually present with difficulty hearing in noise and tinnitus. If the results of the APD test battery are inconclusive, you might want to try ABR at different rates. See http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4227662/ & http://www.ncbi.nlm.nih.gov/pubmed/21940438 for details.
- Ravi Foogooa added an answer: What is a suitable journal for publishing a paper on cybersecurity, speech coding or telecommunication?
What is a fast, good journal with Thomson Reuters indexing for cybersecurity, speech coding and telecommunications in general?
For me, I prefer IEEE, but I don't know how long it will take. Please advise.
@Sarah: This seems interesting, but is it specific to the medical field?
- Rahimi Ali added an answer: Can you recommend readings on Bakhtin and genre theory?
I am reading Bakhtin's "The Problem of Speech Genres" and his Toward a Philosophy of the Act in the hope of gaining a better understanding of his views on genre. Can anyone recommend additional readings, whether by Bakhtin or about his thought?
These publications delineate the issue:
- B. Tomas added an answer: Can we measure the amount of stress required to produce speech?
In particular, during voiced speech production? I am trying to understand the process of speech production in detail.
Maybe you can see the chapter:
B. Tomas, "Determination of Spectral Parameters of Speech Signal by Goertzel Algorithm", Speech Technologies, 01/2011.
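For context, the Goertzel algorithm named in that chapter title computes signal power at a single target frequency more cheaply than a full FFT. A minimal sketch (pure Python, illustrative only, not the chapter's code):

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Power of `samples` at `target_freq`, via the Goertzel recursion.
    Cheaper than a full FFT when only a few frequencies are needed."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)   # nearest DFT bin
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # squared magnitude of the DFT at bin k
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

# Toy check: a pure 440 Hz tone has far more power at 440 Hz than at 1000 Hz
fs, n = 8000, 800
tone = [math.sin(2 * math.pi * 440 * i / fs) for i in range(n)]
p_440 = goertzel_power(tone, fs, 440)
p_1000 = goertzel_power(tone, fs, 1000)
print(p_440 > 100 * p_1000)  # → True
```

In speech work it is handy for probing individual harmonics or formant candidates without computing the whole spectrum.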
- Jonathan P. Evans added an answer: Are there any factors that affect pitch perception and pitch determination in speech?
Is there any research on factors that affect pitch perception and pitch determination in speech? I'm trying to figure out whether F0 alone is sufficient for determining pitch in speech.
You might find some interesting articles in the July issue of Journal of Phonetics, which is a special issue on High pitch. Hitting the news stands any day now!
(conflict of interest note: guest editor of the issue, and (co-)author)
- Pragati Rao Mandikal Vasuki added an answer: How do I acquire EEG data specific to articulated speech?
I am doing research on speech articulation using EEG wave patterns, and I wish to acquire EEG data from the brain that is specific to speech: the lead systems, the vantage points and the electrodes to take into consideration.
The answer actually depends on what aspect of speech you want to analyze:
If you are looking for onset responses to speech syllables, you should be able to get decent data from F & C electrodes.
If you are looking for responses at the word level, late potentials such as the N400, you would need to include CP and P electrodes.
It is very easy to assume that a particular brain area "maps" to a particular electrode. Any basic EEG textbook will advise you about the perils of making this assumption. EEG signals are scalp-recorded potentials, and even now there is much debate about the accuracy of source localization using EEG data. You should always choose a group of electrodes in a particular region to make your research protocol stronger.
Also, could you clarify what "peculiar to speech" means?
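As a small illustration of the regional-group advice above, here is a numpy sketch that averages an evoked response over an electrode group rather than trusting a single channel (channel names and data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Fake evoked data: channels x time samples (illustrative only)
channels = ["F3", "Fz", "F4", "C3", "Cz", "C4",
            "CP3", "CPz", "CP4", "P3", "Pz", "P4"]
evoked = rng.normal(size=(len(channels), 500))

# Regional groups, following the advice above: frontal/central for onset
# responses, centro-parietal/parietal for late potentials like the N400
groups = {
    "onset": ["F3", "Fz", "F4", "C3", "Cz", "C4"],
    "N400": ["CP3", "CPz", "CP4", "P3", "Pz", "P4"],
}

def regional_average(data, all_names, group_names):
    """Average the waveform over a named group of electrodes."""
    idx = [all_names.index(name) for name in group_names]
    return data[idx].mean(axis=0)

onset_wave = regional_average(evoked, channels, groups["onset"])
print(onset_wave.shape)  # → (500,): one averaged waveform
```

Averaging over a region also improves the signal-to-noise ratio of the ERP estimate compared with any single electrode.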
- Patricia Marie Hargrove added an answer: Is there any literature on prosody intervention?
I am a master's student in speech-language pathology. I need articles about prosody intervention in children with speech and language impairment.
I have a blog with over 100 evidence-based practice reviews of prosody interventions. Half are for children. The address is clinicalprosody.wordpress.com.
- Mark Lehman added an answer: How can I perform cepstral analysis in CSL?
I am trying to work out the procedure for doing cepstral analysis using the Computerized Speech Lab (CSL), Kay Pentax. Are there any manuals or procedure guidelines? I am only able to generate an FFT, but am unable to proceed further.
Kay-Pentax sells a module for CSL called ADSV (Analysis of Dysphonia in Speech and Voice) that performs cepstral analysis.
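For reference, the core of cepstral analysis is simple to prototype outside CSL. A minimal numpy sketch of the real cepstrum (inverse FFT of the log magnitude spectrum), using a synthetic pulse train as a stand-in for a voiced frame:

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)  # floor avoids log(0)
    return np.fft.ifft(log_mag).real

# Toy voiced-like frame: a pulse train with F0 = 100 Hz at fs = 8 kHz,
# so the pitch period is 80 samples (a quefrency of 10 ms)
fs, f0, n = 8000, 100, 800
frame = np.zeros(n)
frame[:: fs // f0] = 1.0
cep = real_cepstrum(frame * np.hamming(n))

# The dominant peak in the voiced quefrency range sits at the pitch period
search_lo, search_hi = 40, 400          # roughly 5-50 ms at 8 kHz
peak_q = search_lo + int(np.argmax(cep[search_lo:search_hi]))
print(fs / peak_q)   # estimated F0 in Hz
```

Measures such as cepstral peak prominence (what ADSV reports) are built on exactly this quefrency-domain peak.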
- Dennis Soku added an answer: Can somebody give me some examples of phonemic variations in a language and the probable reasons for such variations?
In Ghana, I have observed that the phoneme /j/ is realized as /dz/; /y/ or /Ʒ/ in speeches by individuals. I have also noticed that the difference in the realizations depends on either the absence or the presence of the target phoneme in the learners’ speech (i.e. transfer errors). Where it is present but the realizations are not the same, the learner tries to articulate the phoneme as a phoneme he/ she already knows. Where the target phoneme does not exist in the already known languages of the learner, he or she tries to make a substitution with another phoneme that exists in his or her linguistic repertoire. Can someone share with me some example of phonemic variations that he or she has noticed in their students’ speeches? Are the reasons for the variations different from what I have stated?
Thank you Prof. Ivleva and Prof. Prunescu. Prof. Ivleva, I very much like the historic insight given at the website. Prof. Prunescu, your point is well noted. It has to do with geographical locations. I am thankful to both of you. I am working on variations in Ewe (i.e. a local language) and your points are very useful to me.
- Paolo Mairano added an answer: How can the energy contained in a speech signal be representative of the language in which it was spoken?
I am doing my final year project on "Classification of Tonal and Non-Tonal Languages" using neural networks. The system takes the pitch contour and energy as parameters. Using only the pitch contour as a parameter yields an accuracy of 66%, whereas adding short-term energy increases it to above 80%.
Many standard references also consider energy a characteristic feature of the language, but provide no explanation.
I know that there have been some studies claiming that languages belonging to different rhythm categories (syllable-timed, stress-timed, mora-timed, etc.) may differ in the way they use energy. I am not sure I am convinced by this, but here is the reference:
Lee, C.S. & McAngus Todd, N. (2004) Towards an auditory account of speech rhythm: application of a model of the auditory ‘primal sketch’ to two multi-language corpora. Cognition, 93/3, 225-254.
@Diwakar, I don't think tonal languages simply have 'more energy' in speech. If there is a difference (as suggested by Biplav's results), it is probably a difference in how energy is used in that language (rather than how much energy is used, which may depend on too many factors), right? Possibly, as you mention, there may be more constant energy peaks for vowels in tonal languages. But then again, I am not sure it is as simple and as general as that: some tonal languages have neutral tones, where vowels can be fairly reduced...
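For reference, the short-term energy parameter discussed here is computed frame by frame. A minimal numpy sketch (frame and hop sizes are illustrative choices, not from the project):

```python
import numpy as np

def short_term_energy(signal, frame_len=400, hop=160):
    """Frame-wise energy: sum of squared samples per Hamming-windowed frame."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        energy[i] = np.sum(frame ** 2)
    return energy

# Toy signal: a loud segment followed by a quiet one
fs = 16000
t = np.linspace(0, 1, fs, endpoint=False)
sig = np.sin(2 * np.pi * 200 * t)
sig[fs // 2 :] *= 0.1                      # second half is 20 dB weaker
e = short_term_energy(sig)
print(e[:3].mean() > 50 * e[-3:].mean())   # → True: the contour drops sharply
```

It is the shape of this contour over time (peaks per syllable, modulation depth), not its absolute level, that plausibly carries language-discriminating information.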
- Ali Ibrahim Aboloyoun added an answer: Is there a speech assessment for children with cleft palate?
What is the ideal age for speech assessment in children with cleft palate?
What are the measures of speech assessment that can be done in day to day practice?
How soon after cleft palate surgery should the speech assessment be done?
In our opinion, patients with cleft palate are team-work cases. Language and speech evaluation must be done as early as possible after surgical intervention, which is the main factor affecting speech output: if the surgery is done by a skilled surgeon, leading to adequate palatal length and mobility, the speech output is expected to be very good, whereas with a short and immobile palate the patient will need hard work to improve his speech even slightly.
Evaluation can be done subjectively, in a direct or indirect way, by listening to and examining the patient by an experienced phoniatrician or SLP.
- Hendrik Schade added an answer: Is there any paper that clearly states that our diction/register is much looser in speech than in writing?
Dear all, I basically just need one citation on this (even though more would be better) in the context of a corpus analysis. I thought I would have an easy time finding one, but I really did not, so I would appreciate any help.
I hope it is okay to ask this kind of question. So far I have only used RG to publish and to follow researchers, this is my first time using Q&A.
Thank you all for the help! It is amazing that there are possibilities like this, because sometimes you just spend way too much time on insignificant things without any result and then feel like you have not done anything at all.
- A.G. Ramakrishnan added an answer: What are the spectral and temporal features in a speech signal?
In speech signal processing, I come across these two terms more and more. What are they, actually?
The most successful spectral features used in speech are (i) Mel frequency cepstral coefficients (MFCC) and (ii) Perceptual Linear Prediction (PLP) features. It is well known that the basilar membrane in the inner ear actually analyzes the frequency content of the speech we hear. In fact, the analysis of the basilar membrane can be modeled by a bank of constant-Q, band-pass filters. There also exist the critical bands, which give rise to the phenomenon of masking, where one strong tone or burst can mask another weaker tone within the critical band. Both MFCC and PLP capture these characteristics of our auditory system in some way; so, even though it looks strange, the same features give reasonably good performance for speech recognition, speaker recognition, language identification and even accent identification! However, these spectral features are not very robust to noise.
On the other hand, some of the time-domain (temporal) features, such as the plosion index and the maximum correlation coefficient, are relatively more robust to noise.
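To make the MFCC pipeline described above concrete, here is a minimal numpy sketch (filterbank size, FFT length and frame length are typical textbook choices, not prescribed by the answer):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_filters=26, n_ceps=13, n_fft=512):
    """MFCC for one frame: power spectrum -> triangular mel filterbank
    -> log -> DCT-II (which decorrelates the log filterbank energies)."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    log_e = np.log(fbank @ power + 1e-12)
    k = np.arange(n_ceps)[:, None]
    dct_ii = np.cos(np.pi * k * (np.arange(n_filters) + 0.5) / n_filters)
    return dct_ii @ log_e

fs = 16000
frame = np.sin(2 * np.pi * 440 * np.arange(400) / fs)  # toy 25 ms frame
coeffs = mfcc_frame(frame, fs)
print(coeffs.shape)  # → (13,)
```

The mel-spaced triangular filters are the crude stand-in for the basilar membrane's constant-Q analysis mentioned above.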
- Wijdan Alwidyani added an answer: Is apraxia of speech (AOS) the same as language delay?
Is apraxia of speech (AOS) the same as language delay? And what are the most characteristic phonological patterns in people with apraxia?
Is there a specific battery used to diagnose apraxia of speech? And if there is not, what are its symptoms?
Dafydd Gibbon, thank you so much.
- Biswajit Satapathy added an answer: Can anyone recommend a platform for building (or know of an existing) listening span task that can be acoustically manipulated?
I'm studying the effects of unfamiliar-accented speech on verbal working memory. I believe the task that best suits my experiment is a listening span task (LSPAN). I am interested if anyone has experience using these tasks and, in particular, if anyone has manipulated the acoustic boundaries of the stimuli. If so, can you offer advice or models for creating such a task (or point me towards one that currently exists that I may be able to use and adapt?)
Rather than rely on accented-speakers to record the LSPAN stimuli, I'd like to control for the exact acoustic features.
Hi Lauren, apart from Praat, the following tools may help you:
1. Wavesurfer : http://www.speech.kth.se/wavesurfer/
2. Audacity : http://audacity.sourceforge.net/
3. SoX and the Edinburgh Speech Tools
And for simulation you can use:
Matlab is not free, but the other two are free simulation toolkits.
I hope these tools will help you with speech analysis.
- Amaury Lendasse added an answer: What is the best available toolbox for implementing Deep Neural Networks (DNN)?
There are plenty of toolboxes offering functions for this specific task, so it would be great if we could all contribute and reach a conclusion about the best available DNN toolbox to date (mainly for speech applications).
It would be great if we could give the pros and cons of using each toolbox, and at the end we can conclude from the top-voted answers.
- Monika Połczyńska added an answer: Is there any neurolinguistic or psycholinguistic evidence on parts of speech and syntactic position?
I think part of speech has a close relation to syntactic position, but I don't have any evidence on this issue, especially evidence from neurolinguistic and psycholinguistic studies. Can anybody help me with this?
There have been a number of fMRI studies on parts of speech and syntax, including (but not limited to) canonical versus non-canonical word order. Here are a few articles that might be useful: Bornkessel et al. 2005, Ben-Shachar and Grodzinsky 2004, Mack et al. 2013, and Meltzer-Asscher et al. 2015.
Here you have PubMed links to these publications:
Best of luck!
- Dennis Soku added an answer: How can I improve speech and language communication in children who have English as an additional language?
Hope you can help me
In this case it is about 'sequential bilingualism' and not 'simultaneous bilingualism', so the children are going to use their knowledge of and experience in their first language. The use of substitution tables (i.e. making use of sentences with identical structures) will be useful. Phonetic exercises based on identical sounds in the two languages will also go a long way toward improving their communicative skills in English.
- beh zad Ghorbani added an answer: Is the "musical noise" generated by some speech enhancement algorithms uniformly distributed across the spectrum?
I am trying to assess the degree of degradation that "musical noise" causes in the low frequency bands of the spectrum of speech signals. Perceptually (playing back the treated signal) this artifact is stronger in mid and high frequencies (over 700 Hz), however I need an objective way to confirm or disprove this.
Does anyone have information on this subject or knows a way to evaluate the amount of musical noise present in a signal?
Thank you very much.
I was able to reduce the musical noise with a perceptual frequency-masking filter.
- Kuruvachan K George added an answer: Where can I find methods that find the silence intervals of speech?
I ask because the result of noisy speech filtering strongly depends on solving the silence-interval problem.
Such algorithms are part of Voice Activity Detectors (VADs), used to detect the silence segments in the speech data. Various techniques, such as signal energy, zero crossings and the spectral centroid, are used in those algorithms. One of our papers is also attached.
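The energy-based technique mentioned above can be sketched in a few lines (frame sizes and the threshold are illustrative; practical VADs add smoothing and hangover logic):

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-30):
    """Mark each frame as speech or silence by comparing its energy (in dB,
    relative to the loudest frame) against a fixed threshold."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
    energy_db = 10 * np.log10(energy + 1e-12)
    energy_db -= energy_db.max()          # normalize: loudest frame = 0 dB
    return energy_db > threshold_db       # True = speech, False = silence

# Toy signal: silence, then a 300 Hz tone, then silence again
fs = 16000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 300 * t)
sig[: fs // 4] = 0.0                      # leading silence
sig[3 * fs // 4 :] = 0.0                  # trailing silence
voiced = energy_vad(sig)
print(voiced[0], voiced[len(voiced) // 2], voiced[-1])  # → False True False
```

The silent frames this finds can then feed a noise estimate for the filtering stage the question refers to.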
- At L Hof added an answer: What is the typical lung pressure for normal human phonation/speech?
I need the value of lung pressure to set up the boundary condition for the inflow for a 2D vocal fold simulation for a normal phonation condition.
You may try to consult the thesis of Harm Schutte at
- César Asensio added an answer: How can one use the posterior probability of a Gaussian mixture model in Matlab?
In my work, I want to use a Gaussian mixture model for speaker identification. I use Mel-frequency cepstral coefficients (MFCC) to extract features from the training and testing speech signals, and I use obj = fitgmdist(X, K) to estimate the parameters of a Gaussian mixture model for each training speech signal. I use [p, nlogl] = posterior(obj, testdata) and choose the minimum nlogl to show the maximum similarity between the reference and testing models, as shown in the attached Matlab file.
The problem in my program is that the minimum nlogl changes, and it recognizes a different speaker even if I use the same testing speech signal. For example, when I run the program for the first time, it recognizes that the first testing speaker has the maximum similarity with the training speech signals (I = 1), and if I run the program again for the same testing speech, the fifth testing speaker has the maximum similarity with the training model. I do not know what the problem in the program is, or why the program gives a different speaker when I run it three times for the same testing speech signal. Can anyone who specializes in speaker recognition systems and Gaussian mixture models answer my question?
With best regards
I would suggest testing the prtools toolbox for Matlab.
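One possible explanation (an assumption on my part, worth checking against the fitgmdist documentation): fitgmdist initializes randomly, so each training run can converge to a different model unless you fix the random seed or use the 'Replicates'/'Start' options. Once the models are fixed, the scoring step itself is deterministic, as this numpy sketch of GMM log-likelihood scoring illustrates (toy models and data, not the asker's setup):

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Total log-likelihood of frames X (n x d) under a diagonal-covariance GMM."""
    n, d = X.shape
    log_probs = np.empty((n, len(weights)))
    for k, (w, mu, var) in enumerate(zip(weights, means, variances)):
        log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(var)))
        log_probs[:, k] = (np.log(w) + log_norm
                           - 0.5 * np.sum((X - mu) ** 2 / var, axis=1))
    # log-sum-exp over mixture components, summed over frames
    m = log_probs.max(axis=1, keepdims=True)
    return np.sum(m.squeeze(1) + np.log(np.exp(log_probs - m).sum(axis=1)))

rng = np.random.default_rng(42)

# Two toy "speaker models" with fixed parameters -> deterministic scores
spk_a = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]),
         np.array([[1.0, 1.0], [1.0, 1.0]]))
spk_b = (np.array([0.5, 0.5]), np.array([[5.0, 5.0], [6.0, 6.0]]),
         np.array([[1.0, 1.0], [1.0, 1.0]]))

test_frames = rng.normal(loc=0.5, scale=1.0, size=(200, 2))  # near speaker A
scores = [gmm_loglik(test_frames, *spk_a), gmm_loglik(test_frames, *spk_b)]
print(int(np.argmax(scores)))  # → 0 (speaker A), identical on every run
```

If the decision flips between runs, the variability must come from refitting the models, not from this scoring step.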
- Nikola Ilankovic added an answer: What is the etiological relationship between MMR immunization and lesions of the inner ear (laesio cochleae and n. cochlearis) in children?
What are the consequences on the speech development? What is the connection with autistic development?
Thank you, Vladimir. But morbilli (measles) is a very mild illness in over 90% of cases in small children! Complications occur most frequently in adults. Why, then, is immunization necessary? The immune reaction after infection and/or immunization can be delayed by 6 months or more.
- Alexander I. Rudnicky added an answer: Is there any effect of speech signal volume on the performance of speaker recognition systems?
Is there any effect of speech signal volume on the performance of speaker recognition systems? For example, if the audio files used in the training stage have a higher volume than those used in the test step, will this difference in volume affect the performance of the speaker recognition system?
Well, if the data are too loud they will be distorted, so first make sure there's no clipping. Also note that the source of the loudness makes a difference: is it because the gain was too high, or were people shouting? Other things being equal, the training data ought to be reasonably similar to the test data; in the end this is still a pattern-matching problem.
Note that techniques such as CMN (cepstral mean normalization) are useful. In our own work we haven't observed much effect of normalization: the features are spectral, so as long as that information is reasonably there, things should work. If anything, we've noticed that attempts at normalization usually degrade performance.
Of course, to find out for sure in your situation, simply do different trainings and see what happens.
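CMN, as mentioned above, is essentially a one-liner: because the log step in cepstral processing turns a fixed gain into an additive offset, subtracting the per-utterance mean of each coefficient removes it. A minimal numpy sketch:

```python
import numpy as np

def cepstral_mean_normalization(features):
    """Subtract the per-utterance mean from each cepstral coefficient,
    removing stationary channel/gain offsets."""
    return features - features.mean(axis=0, keepdims=True)

rng = np.random.default_rng(3)
mfcc = rng.normal(size=(300, 13))        # toy MFCC matrix: frames x coeffs
offset = np.full(13, 2.5)                # constant channel/gain offset
shifted = mfcc + offset                  # "louder" / channel-coloured version

a = cepstral_mean_normalization(mfcc)
b = cepstral_mean_normalization(shifted)
print(np.allclose(a, b))  # → True: the constant offset vanishes after CMN
```

This is exactly why a fixed volume difference between training and test recordings largely disappears once CMN is applied.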
- Iman Esmaili added an answer: Does using speech samples with SNR < 0 make my recognition less accurate?
I am doing isolated word recognition based on MFCCs. Some of my samples turned out to have SNR < 0; should I use them or simply delete them?
Of course, using low-SNR data degrades your recognition accuracy, but whether to use it depends on your recognition plan. There is clean-speech recognition and noisy-speech recognition. If you have no restrictions on the environment, just use the clean speech; but if your system must work in different conditions, you must use all of your data and find some way to deal with the noise.
For example, spectral subtraction is a simple and efficient way to deal with white noise.
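For the spectral subtraction mentioned above, here is a minimal numpy sketch (it assumes a noise-only lead-in for the noise estimate; real implementations add over-subtraction factors and a spectral floor to limit musical noise):

```python
import numpy as np

def spectral_subtraction(noisy, noise_frames=10, frame_len=512):
    """Basic magnitude spectral subtraction: estimate the noise magnitude
    spectrum from the first few frames, subtract it from every frame, and
    resynthesize with the noisy phase (Hann window, 50% overlap-add)."""
    hop = frame_len // 2
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    stft = np.array([
        np.fft.rfft(noisy[i * hop : i * hop + frame_len] * window)
        for i in range(n_frames)
    ])
    noise_mag = np.abs(stft[:noise_frames]).mean(axis=0)
    mag = np.maximum(np.abs(stft) - noise_mag, 0.0)       # floor at zero
    clean_stft = mag * np.exp(1j * np.angle(stft))
    out = np.zeros(len(noisy))
    for i, spec in enumerate(clean_stft):
        out[i * hop : i * hop + frame_len] += np.fft.irfft(spec, frame_len) * window
    return out

rng = np.random.default_rng(7)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
clean[: fs // 4] = 0.0                    # noise-only lead-in for estimation
noisy = clean + 0.3 * rng.normal(size=fs)
enhanced = spectral_subtraction(noisy)
# residual noise in the lead-in should drop after subtraction
print(np.std(enhanced[: fs // 8]) < np.std(noisy[: fs // 8]))
```

The hard zero floor is what produces the isolated spectral peaks heard as "musical noise", which is why practical systems keep a small noise floor instead.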
- Chilin Shih added an answer: How can I estimate a person's vocal tract length using a recorded audio file?
I'm performing some experiments that require a vocal tract length change, but I need to know the original one.
I'm aware of the formula: L = c / 4F, where the "c" is the speed of sound (34029 cm/s) and "F" is the first formant frequency. I'm also aware that I should use vowels closest as possible to an unconstricted vocal tract.
However, I ran a few experiments with the software program Praat and I got rather different and difficult-to-interpret results. Within a single vowel, I get a large range of first-formant frequencies, so I thought I should focus on the average? Is that correct? Moreover, across different vowels I get very different results. Is that normal?
Thanks in advance!
Alternatively, we got reasonable measurement using two microphones, one placed at the mouth and one at the throat outside the glottis, and estimate the distance by the time it takes for the acoustic wave to travel from the glottis to the opening of the mouth. The technique is described in "A quasi-glottogram signal", JASA 2003.
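To illustrate the quarter-wavelength formula from the question, averaging F1 over frames first as discussed (the F1 readings below are hypothetical, not measured values):

```python
# Quarter-wavelength estimate of vocal tract length from the first formant
# of a neutral (schwa-like) vowel, using the formula quoted in the question.
SPEED_OF_SOUND = 34029  # cm/s, the value used above

def vocal_tract_length(f1_hz):
    """L = c / (4 * F1): the tract modeled as a quarter-wave resonator,
    closed at the glottis and open at the lips."""
    return SPEED_OF_SOUND / (4.0 * f1_hz)

# Averaging F1 over many frames of a steady schwa, as discussed above,
# smooths out the frame-to-frame jitter before applying the formula.
f1_measurements = [480.0, 510.0, 495.0, 505.0]   # hypothetical Praat readings
f1_avg = sum(f1_measurements) / len(f1_measurements)
print(round(vocal_tract_length(f1_avg), 2))   # → 17.1 (cm)
```

Note the formula only holds for a roughly uniform, unconstricted tract; constricted vowels shift F1 away from the quarter-wave prediction, which is one reason different vowels give such different estimates.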