Science topic

Speech and Language Processing - Science topic

A group for researchers working in the area of speech and language processing.
Questions related to Speech and Language Processing
  • asked a question related to Speech and Language Processing
Question
9 answers
Hello,
We are working on a review regarding the relationship between language and the multiple-demand network. You will be responsible for addressing the reviewers' criticisms. Please leave your email address if you are interested.
Best,
W
Relevant answer
Answer
I hope you get it... Best of luck. Commenting for better reach for you.
  • asked a question related to Speech and Language Processing
Question
4 answers
Hi everyone. I have been conducting a few experiments with simultaneous speech, but I have been using recorded speech (.wav, .ogg, or .mp3 files) in all of them. However, I would like to play the simultaneous speech using text-to-speech (TTS) directly, instead of saving it to a file first (mainly to avoid the delay, but also so it can be used across OSes/devices).
All my attempts to play two simultaneous TTS voices (separate threads/processes, ...) have failed; it seems that speech synthesis / TTS uses a single channel, resulting in sequential audio.
Do you know of any alternatives to make this work (ideally independent of the OS/device, although Windows/Android are preferred)? Moreover, can you provide additional information/references on why it doesn't work, so I can try to find a workaround?
Thanks in advance.
Relevant answer
Answer
Did you try to use different engines?
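A possible workaround is to synthesize each voice offline first, mix the two waveforms, and play the mix through a single audio stream; this sidesteps the engine's single output channel at the cost of a short synthesis delay (so it does not fully remove the delay mentioned in the question). Below is a minimal sketch assuming the pyttsx3, soundfile, sounddevice, and numpy Python packages; the temporary file names are placeholders, and the exact output format depends on the TTS engine your platform provides.

import numpy as np
import pyttsx3
import soundfile as sf
import sounddevice as sd

engine = pyttsx3.init()
engine.save_to_file("First simultaneous sentence.", "voice_a.wav")   # hypothetical temp files
engine.save_to_file("Second simultaneous sentence.", "voice_b.wav")
engine.runAndWait()                      # blocks until both files are written

a, sr = sf.read("voice_a.wav")
b, sr_b = sf.read("voice_b.wav")
assert sr == sr_b, "resample if the engine uses different rates"

def to_mono(x):
    return x if x.ndim == 1 else x.mean(axis=1)

a, b = to_mono(a), to_mono(b)
mix = np.zeros(max(len(a), len(b)))      # pad the shorter voice with silence
mix[:len(a)] += a
mix[:len(b)] += b
mix /= max(1.0, np.abs(mix).max())       # avoid clipping after summation

sd.play(mix, sr)                         # both voices now play simultaneously
sd.wait()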
  • asked a question related to Speech and Language Processing
Question
6 answers
I have trained an isolated spoken-digit model for 0-9. My speech recognition system recognizes the isolated digits 0, 1, 2, ..., 9, but it fails to recognize continuous digit strings like 11, 123, 11111, etc. Can anyone please help me extend recognition from isolated digits to connected digits?
Relevant answer
Answer
Segmentation of naturally spoken speech into words, even when there is a relatively small dictionary of words, is a harder problem than recognizing isolated digits.
People tend to think of spoken words as somehow isolated but "close" in time. This is not the case, unless you have a cooperating speaker (who helps the detection, or at least monitors it and repeats when it misdetects).
You can easily find in the literature the standard end-point detection mechanisms people use (mostly Viterbi based), and then run the isolated word detectors, but they are computationally expensive and don't really work very well for natural speech (the possible exception was flexible endpoint DTW, but I doubt that you are using DTW as a detector).
Y(J)S
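To make the end-point detection step concrete: below is a minimal, purely energy-based sketch (much simpler than the Viterbi-based methods mentioned above) that splits a recording of connected digits into voiced segments, which could then be fed one at a time to the existing isolated-digit recognizer. It assumes the numpy and soundfile packages, a fairly quiet recording, and placeholder thresholds that would need tuning; real connected-digit recognition normally uses a digit-loop grammar inside the decoder instead.

import numpy as np
import soundfile as sf

def voiced_segments(path, frame_ms=25, hop_ms=10, threshold_db=-35):
    # Return (start_sec, end_sec) pairs where the frame energy exceeds the threshold.
    x, sr = sf.read(path)
    if x.ndim > 1:
        x = x.mean(axis=1)                          # mix to mono
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    segments, start = [], None
    for i in range(0, len(x) - frame, hop):
        rms = np.sqrt(np.mean(x[i:i + frame] ** 2)) + 1e-12
        voiced = 20 * np.log10(rms) > threshold_db
        if voiced and start is None:
            start = i
        elif not voiced and start is not None:
            segments.append((start / sr, (i + frame) / sr))
            start = None
    if start is not None:
        segments.append((start / sr, len(x) / sr))
    return segments

# Each segment can be cut out and passed to the isolated-digit recognizer.
print(voiced_segments("connected_digits.wav"))      # hypothetical file name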
  • asked a question related to Speech and Language Processing
Question
6 answers
I am going to teach a Speech and Hearing Science class.  We use Praat for acoustic analysis experiments.  Recently WaveSurfer was downloaded to the lab computers.  I have never used it.  I have read some research comparing the two software packages.  What is the consensus?  Is one better than the other?  I do need directions for WaveSurfer, however.
Relevant answer
Answer
Kesavarao,
There are lots of links to the Praat software. It's a free download.
Jeff Knox
  • asked a question related to Speech and Language Processing
Question
5 answers
I'm looking for papers on deep learning and machine learning methods used in Arabic abstractive text summarization.
Relevant answer
  • asked a question related to Speech and Language Processing
Question
11 answers
                                        UPDATE (SEPTEMBER 2017)
TITLE: The Rate of Verbal Thought:  An Hypothesis
AUTHOR:   Ronald Netsell, PhD, Emeritus Professor, Communication Sciences and Disorders, Missouri State University, Springfield, MO
The purpose of this report is to develop the hypothesis that the rate of verbal thought is no faster than the rate of inner speech or speech aloud. There is prima facie evidence that inner speech and speech aloud are direct reflections of verbal thought. Why else would you say “That’s not what I meant” after hearing what you said? Or, “I don’t realize it until I hear it.” This hypothesis was published in 1969 in an article entitled “Evidence that 'thinking aloud' constitutes an externalization of inner speech” (Benjafield, 1969). Others have discussed this hypothesis (Morin, 2009; Glass, 2013).
It’s important to distinguish two types of inner speech: expanded and condensed (Fernyhough, 2004). Expanded inner speech refers to word-for-word production, while condensed inner speech is fragmented, rapidly crossing topics with one word. Interestingly, and in the context of the present report, these two types have also been referred to as “willful voluntary thought” and “verbal mind wandering”, respectively (Perrone-Bertolotti et al., 2014). Apparently, these authors assumed that their types of inner speech represented verbal thought. The block diagram of Figure 1 distinguishes verbal from nonverbal thought. The idea that the rate of expanded inner speech (willful voluntary thought) is the same as the rate of verbal thought arose from our recent findings (Netsell et al., 2016). Participants were instructed to “say the first thing that comes to mind.” Although this instruction was not intentionally designed to elicit verbal thought (thinking with words),
_____________________________________________________________________
Insert Figure 1 about here
_____________________________________________________________________
it appears to have done so. We found that expanded inner speech was 600 ms faster than speech aloud (p = .0002). We hypothesized that speech aloud was slower because of the time it takes to move the articulators (lips, tongue, etc.). This hypothesis has been criticized (e.g., Glass, 2013; Ghitza, 2016?).
These findings suggest the rate of neural processing is the same for expanded inner speech and speech aloud. Why wouldn't the rate of neural processing of verbal thought be the same (~5.0 syllables/second)? We listen to our verbal thought on-line as we're talking aloud (speaking without 'thinking'). If what we say aloud doesn't match what we're thinking verbally, we'll say something like "That's not what I meant to say," and then revise what is said aloud. Obviously, the hypothesis that we think no faster than we talk will be very difficult to test empirically.
____________________________________________________
REFERENCES       
Benjafield, J. (1969). Evidence that 'thinking aloud' constitutes an externalization of inner speech. Psychonomic Science, 15(2), 83-84.
Morin, A. (2009). Inner speech and consciousness. In W. P. Banks (Ed.), Encyclopedia of Consciousness (pp. 389-402). Oxford: Elsevier.
Glass, J. (2013). A neurobiological model of 'inner speech' for conscious thought. Journal of Consciousness Studies, 20, 7-14.
Fernyhough, C. (2004). Alien voices and inner dialogue: towards a developmental account of auditory verbal hallucinations. New Ideas in Psychology, 22, 49-68.
Perrone-Bertolotti, M., Rapin, L., Lachaux, J.-P., Baciu, M., & Lœvenbruck, H. (2014). What is that little voice inside my head? Inner speech phenomenology, its role in cognitive performance, and its relation to self-monitoring. Behavioural Brain Research, 261, 220-239.
Netsell, R., Kleinsasser, S., & Daniel, T. (2016). The rate of expanded inner speech during spontaneous sentence productions. Perceptual & Motor Skills, 123(2), 383-393.
__________________________________________________________________________
Figure 1. A block diagram model representing the process of transforming thought into words. Thought can be verbal or nonverbal. Verbal thought can be expressed aloud without conscious thought (speaking without thinking). Alternatively, verbal thought can be expressed consciously as expanded inner speech, i.e., talking to yourself inside your head (Netsell et al., 2016).
Relevant answer
Answer
  • asked a question related to Speech and Language Processing
Question
3 answers
For my bachelor thesis, I would like to analyse the voice stream of a few meetings of 5 to 10 persons.
The goal is to validate some hypotheses linking speech-time repartition to workshop creativity. I am looking for a tool that can be implemented easily and without any extensive knowledge of signal processing.
Ideally, I would like to feed the tool an audio input and get the time segments of each speaker, either graphically or in matrix/array form.
- Diarization does not need to be real-time.
- The source can be single- or multi-stream (we could install microphones on each participant).
- The process can be (semi-)supervised if need be; we know the number of participants beforehand.
- The tool can be a MATLAB, .exe, Java, or similar file. I am open to suggestions.
Again, I am looking for the simplest, easiest-to-install solution.
Thank you in advance
Basile Verhulst
Relevant answer
Answer
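Since the question allows one microphone per participant, the simplest possible approach is to assign each analysis frame to the loudest channel above a silence threshold and read the speaking-time segments off that. Below is a minimal sketch, assuming a single multi-channel WAV file (one channel per participant) and the numpy and soundfile packages; the file name and thresholds are placeholders. It is not a substitute for a real diarization toolkit, but it returns per-speaker time segments in array form, as requested.

import numpy as np
import soundfile as sf

def speaking_segments(path, frame_ms=50, silence_db=-40):
    # Return a list of (speaker_index, start_sec, end_sec) from a multi-channel recording.
    x, sr = sf.read(path)                            # shape: (samples, channels)
    if x.ndim == 1:
        x = x[:, None]                               # expects one channel per participant
    frame = int(sr * frame_ms / 1000)
    n_frames = len(x) // frame
    labels = np.full(n_frames, -1)                   # -1 means silence
    for t in range(n_frames):
        chunk = x[t * frame:(t + 1) * frame]
        rms = np.sqrt(np.mean(chunk ** 2, axis=0)) + 1e-12
        loudest = int(np.argmax(rms))
        if 20 * np.log10(rms[loudest]) > silence_db:
            labels[t] = loudest
    segments, start = [], 0
    for t in range(1, n_frames + 1):
        if t == n_frames or labels[t] != labels[start]:
            if labels[start] >= 0:
                segments.append((int(labels[start]), start * frame / sr, t * frame / sr))
            start = t
    return segments

print(speaking_segments("meeting_multichannel.wav"))  # hypothetical file name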
  • asked a question related to Speech and Language Processing
Question
4 answers
for speech synthesis
Relevant answer
Answer
Hi Wy, you may find the RAVDESS helpful in your work. It's a validated multimodal database of emotional speech and song. It contains 7356 recordings in English, with 8 emotions: calm, happy, sad, angry, fearful, surprise, disgust, and neutral, each at two emotional intensities. It can be downloaded for free: https://zenodo.org/record/1188976
  • asked a question related to Speech and Language Processing
Question
2 answers
I am currently looking at validating the ACE III in the alcohol related brain damage population. I would like to collapse the immediate and delayed memory domains of the RBANS to create a superordinate "memory" domain to allow a more direct comparison. Similarly, I would like to collapse the "language" and "fluency" domains of the ACE III into a superordinate "language" domain for better comparison to the RBANS "language" domain. Is there a precedent for doing this?
Relevant answer
Answer
I am not familiar with a prior attempt to collapse the RBANS components and then compare to the Addenbrooke.  I think the "language" and "fluency" domains of the ACE-III could be collapsed to compare directly to the RBANS.  I tend to prefer keeping immediate and delayed memory performances separate as different factors can affect each and if they were to be collapsed, that information would be lost.  However, if you are wanting to have a global mnemonic performance level, it would be interesting to see what you find. 
  • asked a question related to Speech and Language Processing
Question
3 answers
Please, I need help finding papers relevant to extracting relations from conversational text, related to deep learning and machine-generated text.
Relevant answer
Answer
  • asked a question related to Speech and Language Processing
Question
8 answers
What is your suggestion to obtain reliability for Visual Analogue Scale?
Relevant answer
Answer
There was excellent work in 1987 by Jagodzinski in Sociological Methods on using quasi-simplex models to get estimates of the reliability and stability of single items. Tremendous papers, and still very relevant when trying to assess the psychometric properties of single items.
  • asked a question related to Speech and Language Processing
Question
5 answers
Dear colleagues, 
I've almost completed intonation awareness-raising activities (English intonation training for Russian and Chilean EFL learners). I've got lots of recorded material that I'll now start to analyze. I'll be using Praat for displaying tones (falling, rising, fall-rising).
Relevant answer
Answer
Praat is so useful dear.
Regards
  • asked a question related to Speech and Language Processing
Question
1 answer
Are there similarity measures specific to the Arabic language,
or can we directly apply the measures developed in the literature, such as Levenshtein distance?
Relevant answer
Answer
You might be interested in the Aljameel et al. (2016) survey article about string similarity for the Arabic language (please see link).
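On the second part of the question: plain Levenshtein distance operates on Unicode code points, so it can be applied to Arabic strings directly; whether you first normalize diacritics, hamza/alef variants, etc. is the language-specific part. A minimal sketch in Python (the two example words are just an illustration):

def levenshtein(s, t):
    # Classic dynamic-programming edit distance; works on any Unicode strings.
    d = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        prev, d[0] = d[0], i
        for j, ct in enumerate(t, 1):
            prev, d[j] = d[j], min(d[j] + 1,           # deletion
                                   d[j - 1] + 1,       # insertion
                                   prev + (cs != ct))  # substitution
    return d[len(t)]

print(levenshtein("كتاب", "كاتب"))   # 'book' vs 'writer' -> 2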
  • asked a question related to Speech and Language Processing
Question
7 answers
We talk about "boundaries of intonation units" and about language being a "code". If we follow the direction of "categorical perception" and "motor theories," could it be expected that speech pauses (hesitations rather than breath pauses) draw the listener's attention and thus promote memory performance? At what pause length, relative to the articulation rate, could one expect a discrimination threshold? Imagine Morse code, e.g. SOS: we say three times short, three times long, three times short, but nobody talks about the silence between the individual units, right? Someone in distress at sea might produce all units, including pauses, at a different rate than someone on a deserted island who has been sending this code for weeks or months. How does the receiver discriminate between the individual units (in these cases, of course, we hope that there is a receiver at all ;-) ), and how does he know that it is an SOS signal? Can this model-like idea be applied to language? Does it make any sense to think about long-term memory? Or does it only concern short-term memory, with what is actually stored in the brain being the emotions generated?
Relevant answer
Answer
Hi, we wrote a whole series of papers which investigate this issue, looking at disfluent "er", repetition, and silence (in English).  In brief, "er" and silence increase recognition memory for words which follow disfluencies (Corley, MacGregor, and Donaldson, 2007; MacGregor, Corley, and Donaldson, 2010); repetitions do not (MacGregor, Corley, and Donaldson, 2009).  In the case of "er" and silence, there is an associated attenuation of an N400 ERP effect at the target word, suggesting that people's expectations about what they will hear have been affected.  Importantly, Collard, Corley, MacGregor and Donaldson (2008) use a P300 ERP paradigm to show that this altered expectation is accompanied by an attentional modulation (as you suggest above).
A synthetic view of these studies might be that disfluency is detected when the signal becomes "non-linguistic" (hence not repetitions), and acts as a signal that the speaker is unlikely to utter a predictable word (N400 modulation).  This causes the listener to heighten attention to the signal (reliance on bottom-up information; P300 modulation), resulting in a greater recognition memory for the subsequent (target) word.
Hope that helps!
--MC
  • asked a question related to Speech and Language Processing
Question
6 answers
I am working on color categorization and terminology with bilingual speakers. The two languages follow different paths of categorization, and the system that each language uses overlaps in individual speech. I was wondering whether there was any other study concerning a similar topic. Thanks!
Relevant answer
Answer
Dear Fabio Gasparini,
Generally speaking, color naming, color semantics, color categorization, and the shape of color space across different languages have long been an area of great interest in bilingual/multilingual studies. Prototypicality norms (Rosch's model) have also been a matter of importance when examining how bilinguals tend to categorize the tokens belonging to a given language type. I hope the following links shed more light on what you are looking for.
Best of luck,
R. Biria
  • asked a question related to Speech and Language Processing
Question
5 answers
Below is a partial abstract of our recent study.  The inner speech sentences were self-timed by the subject.  We're looking for a physical (EEG) measure of sentence onset and offset to calculate rate of inner speech. 
ABSTRACT. The rate of expanded inner speech and outer speech was compared in 20 typical adults. Participants generated and timed spontaneous sentences with expanded inner speech and outer speech following the instruction to say "the first thing that comes to mind." The rate of expanded inner speech was slightly, but significantly, faster (0.6 seconds) than the rate of outer speech. The findings supported the hypothesis that expanded inner speech is faster than outer speech because of the time required to move the articulators in the latter. Physical measures of speaking rate are needed to validate self-timed measures.
Thanks for any input. 
Relevant answer
Answer
There are two things that come to my mind. First, one should be able to see a spike at the onset and a decay at the termination of speech if you are tapping the signals at C3, C4 or F3, F4. The planning of opening the mouth and the corresponding signal should be evident at F3, F4. Secondly, one may even get an EMG signal due to the muscle movement. Strictly speaking, it is an artifact. However, it may even do the job for your experiments. Let me know your thoughts.
  • asked a question related to Speech and Language Processing
Question
11 answers
I want to see if there are any relations between the consumption of digital media and speech development, or rather vocabulary acquisition.
Relevant answer
Answer
My simple advice to all non-native English speakers (including me!) is to switch their computers, mobiles, social media accounts, TVs, etc. into English mode. I think this is the simplest way to gain more vocabulary and to communicate in English by default.
Emad
  • asked a question related to Speech and Language Processing
Question
3 answers
The Arabic speech corpus developed by @Nawar Halabi @MicroLinkPc is machine-generated voice from automatically diacritized texts, perhaps with some human correction involved.
What diacritizer and TTS tools and algorithms were used in the generation?
Relevant answer
Answer
This is a speech corpus; where is the text?
I'm asking specifically about the tools and algorithms used in producing the completed corpus.
You seem to have added your answer without reading the question, and wasted my time.
  • asked a question related to Speech and Language Processing
Question
3 answers
Furthermore, I would like to know if there are any papers about the hours of audio and the sample size needed to obtain valid and reliable data using automatic speech recognition.
Relevant answer
Answer
I mean for Mexican Spanish; I would like one that can recognize speech in a natural environment. Anyway, I think I will use Nuance.
Thank you all!
  • asked a question related to Speech and Language Processing
Question
5 answers
We define automatic or fluent as "done without thinking". The question is asked using that definition.
Relevant answer
Answer
We might have some posters that could be interesting, on cross-linguistic priming.
  • asked a question related to Speech and Language Processing
Question
3 answers
I am trying to analyze frequency mean, range, and variability from a speaker reading a passage aloud. I am using Praat and a MATLAB script I am writing to analyze these. The common thresholds in Praat are 75 Hz to 300 Hz for a male speaking voice and 100 Hz to 500 Hz for a female speaking voice. I want to make sure I am obtaining the most accurate fundamental frequencies of the voice, not higher frequencies from breaths or ends of words. Does anyone with experience in these analyses have more accurate threshold criteria, or are these Praat thresholds suitable?
Relevant answer
Answer
I personally prefer Praat scripts
Good luck
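If you end up scripting part of this in Python rather than in Praat directly, the parselmouth package (a Python interface to Praat's algorithms) lets you apply exactly those floor/ceiling settings and then compute mean, range, and variability with numpy. A minimal sketch, assuming parselmouth is installed; the file name is a placeholder and the 75-300 Hz settings are the male-voice thresholds from the question.

import numpy as np
import parselmouth

snd = parselmouth.Sound("male_reading.wav")                   # hypothetical recording
pitch = snd.to_pitch(pitch_floor=75.0, pitch_ceiling=300.0)   # use 100-500 Hz for a female voice

f0 = pitch.selected_array['frequency']
f0 = f0[f0 > 0]                                               # drop unvoiced frames (reported as 0 Hz)

print("mean F0:", f0.mean(), "Hz")
print("range:", f0.max() - f0.min(), "Hz")
print("SD:", f0.std(), "Hz")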
  • asked a question related to Speech and Language Processing
Question
6 answers
During the process of questionnaire translation and validation, the original readability of the items should be maintained. How important is testing the readability of a translated version of a questionnaire? How should this readability be measured? The Gunning fog index? The Flesch-Kincaid readability tests? The Homan-Hewitt readability formula? Maybe other suggestions, please? Should I compare each one-sentence original item with the corresponding translated item? Which statistical measure should I use? A repeated-measures t-test?
Relevant answer
Answer
Dear Michal S. Karbownik,
The application of  a questionnaire is believed to be meritorious because of  reliability, comparability, relative ease of administration and  analysis as well as its potential benefit of obtaining  data from  large samples . However, as you have rightly observed, it may also result in certain undesirable  epistemological consequences. In point of fact, the validity of the data depends  on the extent to which the targeted respondents are able to understand the statements or the  items on the questionnaire. Notably, readability of the statements or questions  may have a direct bearing on  the expected outcomes leading to such  problems as low return rates, missing data, and careless responses. Such issues can negatively influence the representativeness of  the expected   outcome and the validity and reliability of the instrument. As such, to measure the validity of the translated questionnaire enlist what Davies calls " specialist opinion" by giving it to several colleagues to evaluate and identify the appropriacy of the items.  To measure the reliability,  you can use pilot testing. By administering the instrument to a small group of participants , you can obtain a clear picture of the efficacy of the items. However, you can also use various readability formulas such as the ones you have referred to . For more information, I refer you to the following link, which can introduce to you a number of more recent readability formulas.
Best regards,
R. Biria 
  • asked a question related to Speech and Language Processing
Question
3 answers
I am doing speech recordings for an upcoming study to measure loudness and frequency of speech in people with motor disorders. We already have a method for all of the recording, but we are having trouble playing a calibration tone at a known dB level. Would we do this by measuring with a sound level meter as close to the sound source as we can? We'll be using the same sound level meter to measure the calibration tone at a fixed distance from the microphone that is recording the participant.
Relevant answer
Answer
Set up the calibration tone source at equi-distance from the microphone and the sound level meter. This distance should be the same as the distance from the speaker's mouth to microphone during recording (usually ~15cm). The calibration tone is typically a 1000Hz pure tone that can be generated using Audacity (free software). See a more detailed guide by Anders Asplund here: http://www.clinsci.umu.se/digitalAssets/50/50362_andersa0902.pdf
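If you would rather generate the calibration tone programmatically than in Audacity, a 1000 Hz pure tone of known digital amplitude takes only a few lines of Python; a minimal sketch below, assuming the numpy and soundfile packages (amplitude, duration, and file name are placeholders). The absolute dB SPL of the played tone still has to be established with the sound level meter exactly as described above.

import numpy as np
import soundfile as sf

sr = 44100            # sample rate in Hz
dur = 10.0            # seconds; long enough to read the sound level meter
freq = 1000.0         # calibration frequency in Hz
amp = 0.5             # digital amplitude relative to full scale

t = np.arange(int(sr * dur)) / sr
tone = amp * np.sin(2 * np.pi * freq * t)
sf.write("calibration_1000hz.wav", tone, sr)   # play this while measuring with the SLM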
  • asked a question related to Speech and Language Processing
Question
3 answers
Hi, I would like to know what the specific language disabilities are; is there any pattern or classification?
How will learning mnemonics improve language abilities? Is it more a matter of memory or of speech?
Relevant answer
Answer
Thank you all
  • asked a question related to Speech and Language Processing
Question
4 answers
I am working with DMDX to record vocal responses. With four stimuli, everything looks fine, but with 5, the program just stops working.
Relevant answer
Answer
Jonathan maintains an active listserv for his DMDX program: http://www.u.arizona.edu/~kforster/dmdx/list_serv.htm
Tell him I recommended you to join = )
  • asked a question related to Speech and Language Processing
Question
4 answers
I want to compare a model of speech synthesis to other current, well-known models (WaveNet, etc.).
Relevant answer
Answer
I guess Danila means this one: https://catalog.ldc.upenn.edu/ldc93s1
  • asked a question related to Speech and Language Processing
Question
4 answers
Deep learning and generative models: what are the trends?
Anything in this area will be useful: surveys, recent articles, etc.
Relevant answer
Answer
Generative adversarial networks (GAN)
This is the current hottest concept in DL after convolutional neural networks (CNN) and deep belief networks (DBN).
  • asked a question related to Speech and Language Processing
Question
4 answers
Hi all, 
I would like to ask all the experts here, in order to get a better view on the usage of cleaned signals from which the echo has already been removed using a few types of adaptive algorithms for AEC (acoustic echo cancellation).
How can improvements in MSE and PSNR carry over to the classification process? I mean, normally we evaluate using WER, accuracy, and maybe EER too. Is there any connection between MSE and PSNR values and improvements in those classification metrics?
I wish to have some clarification on this.
Thanks much
Relevant answer
Answer
It is one of the old issues in the speech recognition research field: the relationship between any speech enhancement technique and classification accuracy.
As far as I know, both MSE and PSNR are frequently used for improving the quality of the input. They are known to be useful in reducing WER. However, the relationship with recognition accuracy is not directly proportional.
Enhancing a noisy signal in terms of MSE or PSNR means that you may have a good-quality input, but there is a risk: sometimes unexpected artifacts are produced by the speech enhancement techniques, and WER can increase in the worst case.
So, in a phonemic classification task, a matched condition is more crucial. In the case of a mismatched condition between training and test, MSE and PSNR are somewhat related to WER, but not directly. It is a case-by-case study.
  • asked a question related to Speech and Language Processing
Question
3 answers
Regards,
Relevant answer
Answer
Yes,
1. Are the phrases in passive or active voice, and what is the proportion of each?
2. What kinds of pronouns are used, in terms of level of politeness?
3. What types of phrases: simple, subjectless, complex; with or without imagery?
4. Are there any proverbial references?
5. Plain or idiomatic, and what is the proportion of each?
6. Then establish these features as a defining cluster for sub-genres in songs.
Thanks
  • asked a question related to Speech and Language Processing
Question
3 answers
The continuous sequence of images (for example, the conversation of a deaf person who would otherwise use an interpreter to converse with someone who does not understand the signs) would be converted to speech, with the system serving as the image-to-speech converter.
Relevant answer
Answer
This is a good idea; however, it needs a lot of work. Firstly, we can employ computer vision to identify objects and, in a simple case, just convert the object names into speech. In summary, the components of such a system are available, namely a module for object identification in images and a module for speaking words. Only integration is needed, if it has not been done somewhere already.
  • asked a question related to Speech and Language Processing
Question
4 answers
I want to classify audio advertisements based on user preferences. So I need to extract features from the audio files that are pertinent to users, and thus I need a way to extract features. I want to know whether there is a method for this without processing the text form of the audio.
Relevant answer
Answer
Thank you everyone for your instant response. Your solutions are highly appreciated.
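For anyone finding this question later: features can be extracted directly from the audio, with no transcript, e.g. MFCCs and other spectral descriptors. A minimal sketch, assuming the librosa package; the advertisement file name and feature choice are placeholders.

import numpy as np
import librosa

y, sr = librosa.load("advert_001.wav", sr=16000)            # hypothetical file

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)          # spectral envelope / timbre
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)    # "brightness"
zcr = librosa.feature.zero_crossing_rate(y)                 # noisiness / voicing proxy

# Summarize frame-level features into one fixed-length vector per advertisement,
# which can then be fed to any standard classifier.
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           centroid.mean(axis=1), zcr.mean(axis=1)])
print(features.shape)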
  • asked a question related to Speech and Language Processing
Question
3 answers
Hello. I would like to know if there is a readability formula, using SLM & SVM, for the Spanish language. Thank you in advance.
Relevant answer
Answer
I believe you can use the same as for English, since the two languages do not differ much in their word length (except for monosyllables). They correlate similarly across levels.
Jorge
  • asked a question related to Speech and Language Processing
Question
5 answers
I had to create a word list for a speech intelligibility assessment I am completing. A previous, relatively large-scale study analysed the phoneme distribution (in %) of the language. I need to compare the phoneme distribution (% for each sound) of my word list to the phoneme distribution of the large study. Which statistical test should I use to assess whether they are similar to each other, and hence ascertain that my list approximates the distribution? Thanks
Relevant answer
Answer
 Hi Pasquale, 
I faced a similar situation: I wanted to compare the phoneme frequency of patient (people with aphasia) productions against normal phoneme frequencies. If you want to look at the overall distributions, you could use a Kolmogorov-Smirnov two-sample test, which should tell you whether your phoneme distributions differ generally (i.e., whether the Zipfian-type distribution we typically see in English phoneme frequency is adhered to in your word list). This will not tell you whether specific phonemes are over-represented, though. You could also use a correlation analysis. If you wanted to analyse each phoneme separately, then maybe a chi-square test would be helpful.
Hope this is useful,
Emma.
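To make the chi-square option concrete, here is a minimal sketch assuming scipy: a goodness-of-fit test of the phoneme counts in the new word list against the percentages reported by the large-scale study. The counts and reference percentages below are made-up placeholders.

import numpy as np
from scipy.stats import chisquare

# Observed phoneme counts in the new word list (placeholder numbers).
observed = np.array([42, 35, 28, 15, 10], dtype=float)    # e.g. /t/, /n/, /s/, /k/, /m/

# Reference distribution (%) from the large-scale study, for the same phonemes.
reference_pct = np.array([30.0, 25.0, 20.0, 15.0, 10.0])

expected = reference_pct / reference_pct.sum() * observed.sum()
stat, p = chisquare(f_obs=observed, f_exp=expected)
print("chi-square =", round(stat, 2), "p =", round(p, 3))  # a small p suggests the distributions differ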
  • asked a question related to Speech and Language Processing
Question
6 answers
Does anyone know of any sources to check the relative frequency of various consonant places of articulation in word-initial position in English (or any other language)?
In other words, what percentage of word-initial consonants in English are coronal, labial, dorsal, etc.?
Relevant answer
Answer
If you were able to find a database of English words in IPA and import that into SIL's Phonology Assistant program (https://www.sil.org/resources/software_fonts/phonology-assistant, it's free), you could easily answer your question and also look at it from multiple angles.
If you can't find a database in IPA, you could import the CMU corpus above, but you would have to define the phonological features of each of the graphemes and digraphs. It would be a little bit of work, but not too much. You could then easily compare your results against a token frequency list such as the one found at http://www.wordfrequency.info/
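If you take the CMU route, NLTK ships the CMU Pronouncing Dictionary, and counting word-initial places of articulation is only a few lines. A minimal sketch below; the ARPAbet-to-place mapping is a rough assumption to adjust, the counts are over dictionary word types rather than tokens, and you need to run nltk.download('cmudict') once first.

from collections import Counter
from nltk.corpus import cmudict

# Rough ARPAbet consonant -> place-of-articulation mapping (assumed; adjust as needed).
PLACE = {'P': 'labial', 'B': 'labial', 'M': 'labial', 'F': 'labial', 'V': 'labial', 'W': 'labial',
         'T': 'coronal', 'D': 'coronal', 'S': 'coronal', 'Z': 'coronal', 'N': 'coronal',
         'L': 'coronal', 'R': 'coronal', 'TH': 'coronal', 'DH': 'coronal',
         'SH': 'coronal', 'ZH': 'coronal', 'CH': 'coronal', 'JH': 'coronal',
         'K': 'dorsal', 'G': 'dorsal', 'NG': 'dorsal', 'Y': 'dorsal', 'HH': 'glottal'}

counts = Counter()
for word, prons in cmudict.dict().items():     # type counts: one entry per dictionary word
    first = prons[0][0].rstrip('012')          # strip stress digits in case the word starts with a vowel
    if first in PLACE:                         # skip vowel-initial words
        counts[PLACE[first]] += 1

total = sum(counts.values())
for place, n in counts.most_common():
    print(place, round(100 * n / total, 1), '%')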
  • asked a question related to Speech and Language Processing
Question
4 answers
I want to do semi-supervised part-of-speech tagging. For this, I first want to cluster the unlabeled corpus based on word patterns. Which technique would be best for this?
Relevant answer
Answer
Just use word frequency. It is better to first get bigram and trigram keywords for your data, then apply a VSM model to label the clusters.
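As a concrete starting point for the clustering step, below is a minimal sketch that groups word types by their character n-gram patterns (a crude proxy for the affix patterns that matter for POS) using TF-IDF vectors and k-means, assuming scikit-learn. The word list, n-gram range, and number of clusters are placeholders; in practice you would add context features as well.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Placeholder unlabeled word types; in practice, take them from your corpus.
words = ["running", "walking", "talked", "talks", "quickly", "slowly",
         "house", "houses", "dog", "dogs", "beautiful", "careful"]

# Character n-grams within word boundaries capture suffix/prefix patterns.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vectorizer.fit_transform(words)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

for cluster in range(4):
    print(cluster, [w for w, label in zip(words, labels) if label == cluster])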
  • asked a question related to Speech and Language Processing
Question
2 answers
I'm collecting data for my speech therapy degree. I'm building a three-test battery to gauge speed and accuracy in adults with former developmental dyslexia.
One of the tests is a lexical decision task, which needs to be made harder by shaping it as a tachistoscopic presentation, so I need to establish a baseline amount of time for the stimuli to be recognized (and not only detected).
In the literature one can find a rather large range of intervals for the minimum amount of time needed for stimulus recognition, from about 20 ms to about 200 ms, depending on word length and some other variables, although mostly in studies about visual analysis.
What I am seeking is very specific data on the very baseline of word recognition, i.e., data on the reading abilities of normal subjects.
Relevant answer
Answer
If you are focusing on dyslexic learners, I think you should then look at two aspects: the words they can recognize and the words they cannot. These students have diverse reading problems. So, keeping to the 'can' aspect, let them attain maximum speed and recognition and calculate the time. Moreover, you have to sort out the extent and the variables involved in this process; otherwise, it will lead to over-generalization. I hope it helps!
  • asked a question related to Speech and Language Processing
Question
1 answer
I could find very few solutions, with a limited level of application, and they were not directed at speech. I am very curious to know about any methods relevant to speech.
Relevant answer
Answer
What do you mean by noise reduction?
  • asked a question related to Speech and Language Processing
Question
5 answers
As I'm new to the topic, I'm looking for information on benchmark corpora that can be obtained (not necessarily free) for audio event classification or computational auditory scene analysis.
I'm especially interested in house/street sounds.
Relevant answer
Answer
DCASE is indeed a good reference.
Depending on what you are looking for, you can also have a look at the
Sweet-Home corpora: http://sweet-home-data.imag.fr/
and
both having sounds captured in a home.
  • asked a question related to Speech and Language Processing
Question
1 answer
Our project entails the evaluation of the "best" ASR software that runs in the cloud and, preferably, on embedded devices.
While we will start with grammar-command applications, we want to quickly migrate to more applications that require NLU & NLP processing at a "state-of-the-art" level.  This is a commercial platform-- but not a "toy".
Relevant answer
Answer
Links from an unfortunately dated project (2012), but perhaps an OK starting point:
Revolutionising communication, tango! the sound of a child's voice by Acapela.
Acapela "children's voices"
Loquendo TTS Multimedia Package and Voice Creator Now Available
Loquendo emotional TTS
Junichi Yamagishi
 
English speech synthesis of child (under R&D in 2012)
Publication of ongoing research on English speech synthesis of child:
+
McNamara, Lisa. Comparison of Voice Output Types for a Child Using AAC. 2006 ASHA Convention. Missouri State University.
Conference Paper A Child's Voice
  • asked a question related to Speech and Language Processing
Question
7 answers
There are differences between the sampling frequencies used for sound, such as 8000 Hz, 16000 Hz, 44100 Hz, etc.
Why do researchers prefer the higher frequencies?
Relevant answer
Answer
The frequency approach to music, speech, and hearing research has never ceased to raise fundamental questions since Ohm's acoustical law (1843) and Helmholtz's resonance theory (1877). Hussein's question above is just one of millions of unanswered questions. Stephen's reference above to the Nyquist theorem provides an adequate answer to the question at hand. If I record musical instruments at, say, an 8 kHz sample rate, the sound quality is relatively poor in comparison to the same signal recorded at a higher sample rate. In speech, you might find that different phonemes (particularly fricatives/sibilants) are not well captured at low sampling rates, as you'll lose the high-frequency components which are critical distinctive features (acoustically speaking). Thus, if you use a single sample rate, you might limit yourself to a specific sound source. I adopt Xaver's procedure mentioned above. It is one wise way to discover what the sample rate does to the quality of your recordings.
  • asked a question related to Speech and Language Processing
Question
2 answers
Dear sir/madam,
I have segregated combined speech sources using a neural-network-based classifier in a speech segregation process. My doubt is whether we should use the outputs of the ideal binary mask for the estimation of the signal-to-noise ratio. Please guide me in doing the estimation.
Thank you in advance
Relevant answer
Answer
Thank you for your kind response.
  • asked a question related to Speech and Language Processing
Question
2 answers
I am new to this tool. I got an in-house project to build a desktop application for translation (English - Hindi). I am having problems with Hindi POS tagging and with parsing Hindi sentences.
Can anyone help me out with this?
Hindi POS tagging shows 'UNK' for Hindi text.
Relevant answer
Answer
Hi Saket,
You need to train the tagger using Devanagari tagged input data. The following Stack Overflow post explains this:
But for convenience and for others I'll repeat the code here. For example using the TNT tagger:
import nltk
from nltk.corpus import indian
from nltk.tag import tnt

word_to_be_tagged = u"ताजो स्वास आनी चकचकीत दांत तुमचें व्यक्तीमत्व परजळायतात."
train_data = indian.tagged_sents('hindi.pos')   # Devanagari-tagged sentences shipped with NLTK
tnt_pos_tagger = tnt.TnT()
tnt_pos_tagger.train(train_data)                # train the TnT part-of-speech tagger on the Hindi data
print(tnt_pos_tagger.tag(nltk.word_tokenize(word_to_be_tagged)))
  • asked a question related to Speech and Language Processing
Question
4 answers
Represent words in Devanagari as phonemes or consonant-vowel patterns.
Relevant answer
Answer
When Devanagari (and other Indian languages) were encoded with ISCII, every sign could be directly mapped to a phoneme. Now, with Unicode it may necessitate using context-sensitive rewrite rules equivalent to converting back Unicode to ISCII. I have done the opposite on databases, using the attached table...
Not sure it will help!
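To make the rewrite-rule idea concrete, here is a toy sketch that maps a Devanagari string to a rough phoneme sequence, handling only the inherent schwa, a few vowel signs (matras), and the virama. The character table is an illustrative assumption, not a complete mapping; a real converter needs the full inventory, schwa-deletion rules, conjuncts, nasalization, and so on.

CONSONANTS = {'क': 'k', 'ग': 'g', 'त': 't', 'द': 'd', 'न': 'n',
              'म': 'm', 'र': 'r', 'स': 's', 'व': 'v', 'ल': 'l'}
VOWELS = {'अ': 'a', 'आ': 'aa', 'इ': 'i', 'उ': 'u', 'ए': 'e', 'ओ': 'o'}
MATRAS = {'ा': 'aa', 'ि': 'i', 'ी': 'ii', 'ु': 'u', 'े': 'e', 'ो': 'o'}
VIRAMA = '\u094d'                                  # suppresses the inherent vowel

def to_phonemes(word):
    out, pending = [], None                        # pending = consonant awaiting its vowel
    for ch in word:
        if ch in CONSONANTS:
            if pending is not None:
                out.append(pending + 'a')          # previous consonant keeps its inherent schwa
            pending = CONSONANTS[ch]
        elif ch in MATRAS and pending is not None:
            out.append(pending + MATRAS[ch])       # matra replaces the inherent vowel
            pending = None
        elif ch == VIRAMA and pending is not None:
            out.append(pending)                    # bare consonant, no vowel
            pending = None
        elif ch in VOWELS:
            out.append(VOWELS[ch])
    if pending is not None:
        out.append(pending + 'a')
    return out

print(to_phonemes("नमस्ते"))   # -> ['na', 'ma', 's', 'te']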
  • asked a question related to Speech and Language Processing
Question
9 answers
I am looking for literature discussing if some types of phonemes are more or less likely to undergo sound changes. 
It seems intuitively the case that some sounds like /m/, /n/ and /a/ are less likely to change during the process of language change than sounds with more complex or "marked" articulation.
Relevant answer
Answer
Your intuition seems plausible, but it's flawed.
There are phonemes that are considered common, based on their frequency in languages around the world (see http://wals.info/chapter/18 for a discussion of languages that lack common consonants in the classes of fricatives, bilabials, and nasals, like your examples of /m/ and /n/). There is a hypothesis that common phonemes, or at least consonants, are easier to articulate (see the Theoretical Issues section in this discussion: http://wals.info/chapter/1).
Vowels are harder to define, and more subject to change, but there are far more types of consonants than vowels (7 is considered a large vowel inventory while the average number of consonants is 22). Languages with small vowel inventories usually include /a/, along with /i/ and /u/ (http://wals.info/chapter/2).
But, if we accept the hypothesis that easier-to-articulate phonemes are less likely to change than phonemes that require complex articulation, we still have to remember that phonemes that are easy to articulate are members of a class of phonemes. As members of a class, what impediments would there be for one easily articulated phoneme to change into another? /m/ and /n/ are both common nasal phonemes with similar articulation, so what would prevent /m/ from changing into /n/?
Then there is the question of origin. Why do phonemes with complex articulation exist at all? The number of consonants in a language ranges from 6 to 120 or so; vowels from 2 to somewhere in the 20s. There is a general inverse correlation between vowels and consonants, languages that have few of one tend to have many of the other, but there is a strong tendency toward the "average" amounts or consonants in the 20s and vowels close to 10. But what reason is there for the outliers on the high end of either type of phoneme? If phonemes that are difficult to articulate tend to change more than those easier to articulate, there should be a trend where phonemes that are difficult to articulate disappear. But after many thousands of years of language use, they persist in many languages.
The sense that they "persist" is also a problem. The implication is that, the earlier you go back in human history, the more complex the inventory of phonemes was. What would be the reason for this? The alternative, though, is that languages developed phonemes that are more complex in articulation from an inventory of easier-to-articulate phonemes.
Complexity of articulation is also a difficult to define, as it often depends on what other phonemes exist in a language. If language users are comfortable with an articulation pattern based on one phoneme, minor variations on that pattern should be relatively easy to adopt.
But this isn't necessarily true, and has an impact on the hypothesis that ease of articulation should influence the presence of a phoneme (if a phoneme is more resistant to change, it is more likely to be present at any particular time of measurement). Languages tend to "skip" similar phonemes that other languages use. There is very little difference in articulation of an aspirated stop consonant and the non-aspirated version, so little that languages may contain both sounds but not consider them phonemic, but allophones. English has an aspirated /p/ and a non-aspirated /p/. The /p/ in "pit" is aspirated, the /p/ in "spit" isn't. So why doesn't English take the easy route and use these sounds that already exist as phonemes, and maintain them as comparatively easy to produce? Hindi does. But it has its own quirks.
The issue is complex, and I haven't investigated it, but have thought about it. There are real problems with the hypothesis. There's no reason for one easy-to-articulate phoneme to not change into another easy-to-articulate phoneme. Difficult to articulate phonemes should disappear over time - but they seemingly shouldn't exist in the first place. Languages don't use "easy" sound variations that already exist in the language as phonemes, while other languages do use them.
  • asked a question related to Speech and Language Processing
Question
16 answers
Listening skill  has often been called the Cinderella skill of language teaching because it involves a number of variables that are too difficult to be operationalized within the allotted class time. A comprehensive model of L2 listening comprehension cannot be developed without a full account of the parameters dominating the process.
Relevant answer
Answer
Attached you will find a rubric I use for speaking proficiency.  If I were to create a similar rubric for listening comprehension, I might include the following sub-areas:
  1. Technical knowledge of the language -- vocabulary, grammar, etc.
  2. Ability to comprehend the accent of the speaker -- pronunciation of words may vary considerably across speakers with different native accents.
  3. Tempo -- Can the listener process the information in "real time" well enough to "keep up" with the speaker?  Some native speakers speak very fast, compared to second/foreign language speakers.
  4. Non-verbal -- Does the listener process non-verbal messages of the speaker accurately (facial expressions, body language, etc)?
  5. Construction of meaning -- Does the listener understand the meaning intended by the speaker with necessary accuracy?
  6. Nature of the speech -- Spoken thoughts tend to be more lengthy than written language and because they are constructed by the speaker "on the fly" meaning may be less clear or variable.  Comprehending someone speaking impromptu may be considerably different from comprehending someone reading a prepared text.
These are some ideas that come to mind for me about the factors influencing listening comprehension.
  • asked a question related to Speech and Language Processing
Question
5 answers
Does anyone know about published research (or other available resources) on scoring issues on sign language production tests? For example, development of scoring instruments, type of scales being used, inter-/intra-rater reliability, procedures to solve disagreement between raters, construct representation etc.
Relevant answer
Answer
  • asked a question related to Speech and Language Processing
Question
6 answers
What features are useful for estimating age from human voices?
Relevant answer
Answer
Dear Manish,
In the following related papers you can find many relevant feature sets:
(1)
Sedaaghi, M. H. (2009). A comparative study of gender and age classification in speech signals. Iranian Journal of Electrical and Electronic Engineering, 5(1), 1-12.‏
(2)
Lingenfelser, F., Wagner, J., Vogt, T., Kim, J., & André, E. (2010). Age and gender classification from speech using decision level fusion and ensemble based techniques. In INTERSPEECH (Vol. 10, pp. 2798-2801).‏
(3)
Chaudhari, S., & Kagalkar, R. (2012). A Review of Automatic Speaker Age Classification, Recognition and Identifying Speaker Emotion Using Voice Signal. International Journal of Science and Research (IJSR).‏
(4)
Metze, F., Ajmera, J., Englert, R., Bub, U., Burkhardt, F., Stegmann, J., ... & Littel, B. (2007, April). Comparison of four approaches to age and gender recognition for telephone applications. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07 (Vol. 4, pp. IV-1089). IEEE.‏
(5)
Brown, W. S., Morris, R. J., Hollien, H., & Howell, E. (1991). Speaking fundamental frequency characteristics as a function of age and professional singing. Journal of Voice, 5(4), 310-315.‏
Best!
Yaakov
  • asked a question related to Speech and Language Processing
Question
3 answers
We are looking for a French text in which speech sounds are selected so as to obtain a fixed proportion of voiced and unvoiced sounds (or more degrees of sonority). This text would be used in a contrastive multilingual experiment on vocal load.
In addition, we are interested in phonetically balanced corpora for French.
Thank you!
Relevant answer
Answer
You may find these books useful although quite old.
Lucile Charles & Annie-Claude Motron (2001). Phonetique Progressive du Francais avec 600 exercices. Paris: CLE International.
Lhote Elizabeth (1990). Le paysage sonore d'une langue, le francais. Hambourg: Buske Verlag.
Best wishes
  • asked a question related to Speech and Language Processing
Question
11 answers
Hi there,
I would like to ask you how do you compare a speech sample and a different kind of auditory sample (e.g., noise, sounds produced by animals...) when you are looking for similarities and differences between the two samples.
For instance, there are some times when people believe they are listening to words when hearing a noise, or the wind. If a participant reported having heard "mother" when he/she actually listened to a noise, how would you carry out the comparison between the two different sounds? Is there any way to do that?
Ideas and references are welcome!
Thanks!
Relevant answer
Answer
You're looking at more of a psychological phenomenon than an acoustic one. It's similar to the "phonetic restoration" effect that's been studied in the past.  
If you think of  the human auditory system as actively seeking evidence for a particular speech event and finding sufficient evidence for it in the sound then you get the observed phantom percept.  A different version of the effect can be observed in "babble", what people will hear in recordings of superimposed voices.
Actually, if you want to pursue this systematically it could get interesting. For example, can you find sounds that, across listeners, appear to be fertile sources of illusion? What are their characteristics?
  • asked a question related to Speech and Language Processing
Question
5 answers
To set up a speaker recognition system using the NIST 2004 dataset, I found the speaker indices of the test files "x???.sph" at this address: http://www.itl.nist.gov/iad/mig/tests/spk/2006/
To train total variability, I need to use the speaker indices of the training data "t???.sph". Where can I find them?
Please help me.
Thanks in advance
Relevant answer
Answer
Dear Amir,
No, I did not find the answer for NIST 2004.
I do not have NIST 2008; if you have it, we may have a beneficial exchange.
Please let me know of your decision.
  • asked a question related to Speech and Language Processing
Question
4 answers
I would like to analyze vocal responses from a working-memory n-back task with two possible responses ("yes" vs. no response). The aim of the analysis is to get an automatically generated output file with two columns: (1) the subject's study code (1...n) or file label, and (2) the vocal response (e.g., "yes" vs. no response, or 1 vs. 0).
I already tried Inquisit Lab 5's tool "Analyze recorded responses", but it did not work that well; i.e., after analyzing a few data sets which were coded correctly, Inquisit was no longer able to distinguish between responses and non-responses.
Do you have experiences with Inquisit Lab 5 or any other suggestions regarding to speech recognition?
Thanks a lot!
Relevant answer
Answer
Dear Volker! I attach some very useful articles on the subject from my electronic library; I hope they address your question.
Vladimir
  • asked a question related to Speech and Language Processing
Question
1 answer
What would be the effect of speech utterance length on speaker recognition? I.e.,
if T, UBM, LDA, and PLDA are trained on short utterances, i.e. from 3 to 15 seconds, but
the enrollment of speakers (the speaker models) is done on long utterances, such as 30 to 60 seconds, would it affect the performance of the system?
Relevant answer
Answer
These models treat observations independently across time, so there should be no problem with training and testing on utterances of different lengths.
There are a few concerns that you should take into account:
1) It is better to make sure you have enough observations (per speaker) for training. If you have several short utterances per speaker, this should be fine
2) UBM training can be severely affected by silences. When you have long utterances in test, they likely contain a lot of silences. A common practice is to use at least energy-based voice activity detector and to score using only voiced frames.
You may find useful the SRE10 Kaldi recipe at least for having some general ideas about data pre-processing.
  • asked a question related to Speech and Language Processing
Question
1 answer
Is there any influence of a mismatch between the language used for training the system hyperparameters (TVS, LDA, and PLDA hyperparameters) and the system users' language on the performance of the speaker recognition system?
Thanks in advance .
Relevant answer
Answer
Hi Ayoub, 
Language mismatch between train and test data can surely affect the performance of the SR system (the error rate can be doubled or even worse if there is a mismatch between languages used to train the GMM, TV matrix and the PLDA model), I think that this paper can be a good starting point [1], it also proposes a phoneme histogram normalization technique to match the phonetic spaces of train and test languages. A possible solution is to use many languages to train your system, i.e. GMM, TV matrix and PLDA (using NIST SRE data) or other databases. Some systems [2] use up to 11 languages. 
--- 
References : 
[1] Abhinav Misra, John H. L. Hansen , Spoken language mismatch in speaker verification: An investigation with NIST-SRE and CRSS Bi-Ling corpora
[2] Pavel Matejka et al, Analysis of DNN approaches to speaker identification. 
  • asked a question related to Speech and Language Processing
Question
8 answers
I recently started to work on speaker/language recognition using i-vectors, and after consulting with researchers on ResearchGate, I came to the following steps:
1) Database
i) Development dataset (UBM, T training); if labeled, also LDA and PLDA
ii) Training dataset (for speaker/language enrollment, modeled speakers); if the development dataset is not labeled, I trained LDA and PLDA on the training dataset (comments on this are welcome)
iii) Testing dataset (for testing the modeled speakers/languages)
About language detection:
If I have a lot of speech samples, but no labels for the speech utterances, how can I train LDA/PLDA for languages? Or can I train these on the training languages' data?
What about gender? How much will the results be affected if we have different/same UBM and T? Is it OK to have a single UBM and T for both genders?
Is there any way to apply i-vector detection without applying LDA and PLDA, such as an SVM on i-vectors without i-vector reduction?
Relevant answer
Answer
You don't have to perform i-vector reduction using PLDA/LDA; in that case, you can use cosine distance scoring for comparison with your model i-vectors.
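For reference, cosine distance scoring between a test i-vector and an enrolled model i-vector is just a normalized dot product; a minimal numpy sketch below (the random 400-dimensional vectors are placeholders, and in practice both i-vectors are usually length-normalized and often whitened or WCCN-projected first).

import numpy as np

def cosine_score(w_test, w_model):
    # Cosine similarity between two i-vectors; higher means more similar.
    return float(np.dot(w_test, w_model) /
                 (np.linalg.norm(w_test) * np.linalg.norm(w_model) + 1e-12))

rng = np.random.default_rng(0)
w_test, w_model = rng.standard_normal(400), rng.standard_normal(400)   # placeholder i-vectors
print(cosine_score(w_test, w_model))   # compare against a decision threshold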
  • asked a question related to Speech and Language Processing
Question
7 answers
I need a database for my research work.
Relevant answer
Answer
To whom should I write regarding the Nemours and TORGO databases?
Are they available to PhD scholars?
  • asked a question related to Speech and Language Processing
Question
10 answers
Loss is inevitable during translation, but which level of language is more liable to loss (morphological, syntactic, or semantic)?
At the morphological level: what type/category of words?
At the syntactic level: what sentence pattern/structure?
At the semantic level: what type of meaning/domain?
Relevant answer
Answer
Dear Muhammad,
Indubitably, translation loss is inevitable even in the translation of the simplest texts. The main culprit may be said to be related to  the message planned for the source language, original readership. Very often, the message intended by the SL author  may be radically different for the recipients in the target context. Consequently, translator's attempt to create equivalence may miss the original and something may go amiss during the process. In point of fact, translation loss may occur by different causes such as change of register, cultural protection, literal translation of puns, lexical translation leading to ambiguity, use of words lacking direct translation.
Best regards,
R. Biria
  • asked a question related to Speech and Language Processing
Question
4 answers
I have samples of the phonemes of the English language. I want the best method of concatenative synthesis, and also the best way to resolve the glitch observed when concatenating small units of speech (phonemes).
Relevant answer
Answer
The way I did it was inspired by an algorithm I found in a book titled DAFX by Zölzer (2002). Basically you calculate the cross-correlation between the end of the first phoneme and the beginning of the second and find the point where they correlate the most. This is where you want to concatenate the phonemes/diphones/units. Based on this point, you multiply the end of the first phoneme by a decreasing ramp and the beginning of the second phoneme by an increasing ramp and then just add them up. I hope you understand the idea.
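A simplified numpy sketch of that idea (a variant of the DAFX-style approach described above, not the exact algorithm): search for the overlap length at which the end of the first unit and the start of the second are most correlated, then crossfade over that region with linear ramps. The minimum/maximum overlap lengths are placeholders to tune for your sample rate.

import numpy as np

def crossfade_join(a, b, min_overlap=32, max_overlap=400):
    # Concatenate two speech units (1-D arrays) with a correlation-guided linear crossfade.
    best_n, best_score = min_overlap, -np.inf
    for n in range(min_overlap, min(max_overlap, len(a), len(b))):
        x, y = a[-n:], b[:n]
        score = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)
        if score > best_score:                  # most similar overlap region so far
            best_score, best_n = score, n
    n = best_n
    fade_out = np.linspace(1.0, 0.0, n)         # ramp the end of the first unit down
    fade_in = np.linspace(0.0, 1.0, n)          # ramp the start of the second unit up
    return np.concatenate([a[:-n], a[-n:] * fade_out + b[:n] * fade_in, b[n:]])

# joined = crossfade_join(phoneme_a, phoneme_b)  # both 1-D numpy arrays at the same sample rate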
  • asked a question related to Speech and Language Processing
Question
3 answers
I have been trying to acquire an EMG signal for subvocal feature extraction, but the signal contains so much noise that it is impossible to work on the subsequent steps.
Thanks in advance
Relevant answer
Answer
Dear Rashmi,
In addition, I send you some papers to work with on this subject. To resolve the problem of signal noise, I recommend a band-pass Butterworth filter of order 8, formed by
a high-pass filter at 30 Hz and a low-pass filter at 450 Hz; the information can be recorded at a sampling frequency of 50 kHz in vectors of 2 seconds and 50 signals per word. I think this filter can help you.
Sincerely yours.
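That filter is straightforward to apply in Python with scipy; a minimal sketch below (note that scipy doubles the prototype order for band-pass designs, so N=4 yields the 8th-order band-pass described above; the placeholder EMG vector stands in for your recording).

import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 50000                                   # sampling frequency (Hz), as suggested above
emg = np.random.randn(2 * fs)                # placeholder: one 2-second raw EMG vector

# 4th-order prototype -> 8th-order Butterworth band-pass, 30-450 Hz.
sos = butter(4, [30, 450], btype="bandpass", fs=fs, output="sos")
clean = sosfiltfilt(sos, emg)                # zero-phase filtering avoids shifting EMG bursts
print(clean.shape)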
  • asked a question related to Speech and Language Processing
Question
4 answers
I have used the P2FA forced alignment system to annotate the .wav files. However, the results of the phonemic annotation are not good. Is there any open-source forced alignment software available for American English? By the way, if the software uses the CMU dictionary, that would be good for me.
Relevant answer
Answer
Check out the prosodylab aligner by Kyle Gorman and Michael Wagner
Info and tutorial here:
code on github:
  • asked a question related to Speech and Language Processing
Question
5 answers
Especially from the point of view of textual competence
Relevant answer
Answer
Communicative Competence is the ability to communicate your intended message in socio-cultural context. The choice of grammatical structure is clearly related to the circumstances and is chosen for its appropriacy in those circumstances. Learners need to know how to express a variety of functions and which choices are appropriate in different circumstances
  • asked a question related to Speech and Language Processing
Question
9 answers
Especially theory of constructivism
Relevant answer
Answer
Dear Kanchana Prapphal,
As a new catchword in education, constructivism tends to address how people learn. Accordingly, it suggests that learners construct knowledge for themselves. As a consequence, constructing meaning materializes learning. The idea is deeply rooted in the philosophies practiced by Piaget and Vygotsky. The latter, in particular, maintains that meaningful negotiation between the expert and the novice can result in learning. Therefore, to actualize the full potential of a learner's linguistic competence, it is necessary to adopt an interactive approach in which the affective support provided by the teacher/expert can establish a meaningful rapport helping the individual's progress and learning.
Best regards,
R. Biria 
  • asked a question related to Speech and Language Processing
Question
5 answers
A lot of theories and studies seem to deal with how input is processed for meaning and form, but I am interested in looking at how the learner then takes this processed input and constructs some sort of representation of the L2 for later use. Does anyone know which, or any, SLA theories that specifically deal with this process?
Relevant answer
Answer
The keyword in your query appears to be 'specifically', and addressing this I would have to say constructivism. However, the notion of noticing proposed by Schmidt (2010) and Long's (1981) Interaction Hypothesis are key to eventually internalizing the L2 input being acquired and successfully putting the acquired knowledge to use.
  • asked a question related to Speech and Language Processing
Question
3 answers
I'm looking for a free automatic speech recognition (ASR) system for the Arabic language. Can you help me find one?
Relevant answer
Answer
Here is the link to a paper about vowel recognition in the Assamese language. The approach can be applied to Arabic as well; however, you will have to create your own Arabic database covering all the vowel phonemes. I hope it is useful.
  • asked a question related to Speech and Language Processing
Question
4 answers
I have looked at Steen (1999) and other works that cite him, but I still find the notation a little difficult to understand and apply.
Relevant answer
Answer
Thanks once again. I am already checking the sources out. I truly appreciate your interest in my question.
Best wishes,
Onwu
  • asked a question related to Speech and Language Processing
Question
4 answers
I'm interested in differences between native speakers of non-rhotic English (e.g. RP) and EFL learners in their use of /r/-liaison.
Relevant answer
Answer
Dear Jose,
You might have a look at the article on the /r/ sound by following the link below:
Hope you find it useful! 
Ali. 
  • asked a question related to Speech and Language Processing
Question
13 answers
Hi, I have some papers on speaker recognition. Does anybody have an interesting topic to suggest?
Relevant answer
Answer
Hi Khaled,
I am looking for free speech databases for speaker recognition (with more than 80 speakers). Can you help me?
  • asked a question related to Speech and Language Processing
Question
3 answers
I am having difficulty constructing indicators of whether someone has good stress, rhythm and intonation while they are speaking and reading.
Relevant answer
Answer
Dear Suciati Anandes,
First, you have to make an inventory of targeted statements that carefully elicit the suprasegmental features you want to investigate. Subsequently, ask the participants (the selected sample) to read them, recording their voices at this stage. Finally, ask three native speakers of English to rate the recordings based on the accuracy and appropriacy of the suprasegmentals produced. Good luck with your research.
Best regards,
R.Biria
  • asked a question related to Speech and Language Processing
Question
3 answers
Brain functions in early child language acquisition.
Relevant answer
Answer
The synapses / neural networks for the L1 are stronger than the neural networks formed for the L2; the L1 knowledge is more entrenched. That is why the L1 takes a toll on the L2. So the fact that the L1 transfers to the L2 (interference is not a good term if you are working with language from a cognitive perspective) is evidence that the L1 synapses are actually not changing. Extensive use of the L2, on the other hand, such as after immigration to an L2-dominant community where the L1 is not frequently used, may lead the L2 to become more robust, making the L1 change. If that is what you are interested in, I would suggest reading about language attrition. Look for the paper on it by Monika Schmid and Kees de Bot.
  • asked a question related to Speech and Language Processing
Question
6 answers
Hi, I'm a PhD student interested in brain-computer interfaces and speech imagery, in particular the imagery of vowels and syllables. I'm at the beginning of my study, and my specific field is EEG signals related to spoken and unspoken (imagined) speech. I need scientific articles about this issue and about neural language processing. Can someone help me? Thanks.
Relevant answer
Answer
Speech imagery is a rather fresh topic in brain-computer interfaces. To the best of my knowledge, there are only a few papers published on EEG-based speech imagery classification; here is a list of the ones I have read:
1. DaSalla et al., "Single-trial classification of vowel speech imagery using common spatial patterns"
2. Matsumoto & Hori, "Classification of silent speech using support vector machine and relevance vector machine"
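As a starting point for your own experiments, a common-spatial-patterns pipeline of the kind used in reference 1 can be sketched with MNE-Python and scikit-learn as below. The epoch array, labels and parameter values are placeholders of my own, not settings taken from those papers.

import numpy as np
from mne.decoding import CSP
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder data: 100 trials of 32-channel EEG, 512 samples each,
# with a binary label (e.g. two imagined vowels) per trial.
epochs = np.random.randn(100, 32, 512)
labels = np.random.randint(0, 2, size=100)

clf = Pipeline([
    ("csp", CSP(n_components=4, log=True)),  # spatial filters -> log-variance features
    ("svm", SVC(kernel="rbf", C=1.0)),       # the classifier; an RVM would be analogous
])
scores = cross_val_score(clf, epochs, labels, cv=5)
print("Mean cross-validated accuracy: %.2f" % scores.mean())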
  • asked a question related to Speech and Language Processing
Question
4 answers
I need to get in touch with someone who has already worked with the Kaldi ASR toolkit for speech recognition.
Relevant answer
Answer
In India, we are working on Indian-language (Tamil and Hindi) speech recognizers based on the Kaldi toolkit.
  • asked a question related to Speech and Language Processing
Question
4 answers
I think not always. For example, in "to turn over a new leaf" (of life), the word "leaf" is used not in its primary meaning (a leaf of a tree) but perhaps to mean a blank sheet of paper. Then it is a metaphor.
Relevant answer
Answer
I would say that the most common use of the word "leaf" is for the leaves on a tree. However, with respect to books, "leaf" is usually used as a verb, i.e. to leaf through a book. "Leaf" can also mean a "hinged flap on the side of a table" (a sense dating from the 1550s), because the pages of a book are also hinged.
So you are correct that "leaf" meaning a page of a book is a secondary meaning and a metaphor. Metaphors do not always use the most common meaning.
  • asked a question related to Speech and Language Processing
Question
13 answers
Hope you can help me
Relevant answer
Answer
Children may be grouped; familiar contexts or scenarios should be planned; and guided conversation for dialogue delivery in frequently experienced situations will make communication really natural and easy. A graded pattern of this kind will ensure quick acquisition of communication, both verbal and nonverbal.
  • asked a question related to Speech and Language Processing
Question
2 answers
I am working on language identification with i-vectors. I am wondering about speech that mixes two languages, for example when we speak Hindi but sometimes prefer an English word: that kind of data poses a problem for a model built for a single language and decreases the model's performance.
Relevant answer
Answer
To identify a language spoken by any human being, we first need to verify its specific as well as its shared linguistic features, such as pronunciation, including stress, intonation, rhythm and so on. These shared features appear in many languages with more or less variation, yet every language is still different and unique in itself. So, in order to identify a particular language using a language-identification approach such as i-vectors, one needs to incorporate the relevant features of each target language into the application (computer program). Further, to handle dual or multiple languages in identification, incorporating the most frequently used words and their features from the target languages into the software can be helpful.
Thank you 
  • asked a question related to Speech and Language Processing
Question
10 answers
I am writing chapter 3 of my proposal and I need an instrument to measure language development in low-functioning autistic children. I would appreciate it if any of you would allow me to use an instrument that you already have.
I am doing a quasi-experimental study with a small population of 5 autistic students (3 to 5 years old). My strategy includes photographs of each child's natural environment, which will allow me to initiate conversation with each one. I use each child's IEP as a pretest and will use the instrument I am looking for to verify progress in the post-test towards the end of the training.
Relevant answer
Answer
The Communication Matrix focuses on early communicative skills and has been used in studies before: https://www.communicationmatrix.org/
Have you checked, for example, Solomon-Rice & Soto, "Facilitating Vocabulary in Toddlers Using AAC: A Preliminary Study Comparing Focused Stimulation and Augmented Input" for inspiration?
Good luck
  • asked a question related to Speech and Language Processing
Question
4 answers
I'd like to find a source for the population so the Ethnologue can cite it.
Relevant answer
Answer
In fact, when we talk about "Mông" in Vietnam, it refers to the Hmong people. In other countries they have other names, such as Miáo (Chinese) or Maew (Thai). I'm not sure whether Hmong Do is a subgroup of the Hmong or simply another name for the Hmong. However, if you know that they live in Ha Giang, Lao Cai and Bac Ha provinces, or in Dong Van and Meo Vac districts (in which province?), then you can look them up in the file I gave you, which presents the Hmong people by their location. If you have trouble with the Vietnamese, I can help you sort this data.
  • asked a question related to Speech and Language Processing
Question
5 answers
I couldn't find any datasets of natural-language questions and their corresponding SQL statements, so I was thinking of creating one for myself and for other researchers to work on. I would like to know the best way to do that: collect the pairs from people, manually review how well the NL and SQL match, and automatically check that the SQL statements actually run?
Relevant answer
I don't get what you mean, I think you misunderstood the question somehow!
  • asked a question related to Speech and Language Processing
Question
7 answers
Hi everybody,
I'm currently working on handwritten Arabic word recognition. I have built a feature matrix of size 34x10 for each image, with three types of features: 8 concavity, 11 distribution and 15 gradient features, for a total of 34; the 10 is the length of the image divided by the sliding-window size of 3.
What I'm asking is: how can I input these feature vectors to HTK to start training HMM models with a given topology but unknown parameters? An example would be great. I have read through the HTK guide, but I couldn't understand most of it, since it talks about speech recognition.
Please guide me.
Thank you.
Relevant answer
Answer
Hello. Here are some references you might be able to use. The article "An Evaluation of HMM-based Techniques for the Recognition of Screen-Rendered Text" describes the basic idea of applying an HMM to a text-recognition problem: it treats the horizontal axis (the direction of writing) as the "time axis" that HMMs usually assume in speech and other time-series signal processing. You could also try (conditional) random fields, which are closely related graphical models.
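On the practical side of the question, HTK can train on arbitrary feature vectors if they are written as parameter files with the USER parameter kind. The sketch below assumes the standard HTK binary header (nSamples, sampPeriod, sampSize, parmKind); the function name, the 10 ms frame period and the file name are illustrative only, and a 34x10 matrix as described above would have to be transposed so that each row is one frame.

import struct
import numpy as np

def write_htk_user_file(features, path, samp_period=100_000):
    # features: (n_frames, n_dims) array, one frame per row.
    # samp_period is the frame period in 100 ns units (100_000 = 10 ms).
    features = np.asarray(features, dtype=">f4")    # big-endian float32, as HTK expects
    n_frames, n_dims = features.shape
    samp_size = n_dims * 4                          # bytes per frame
    USER = 9                                        # HTK parameter-kind code for USER
    with open(path, "wb") as f:
        f.write(struct.pack(">iihh", n_frames, samp_period, samp_size, USER))
        f.write(features.tobytes())

# Hypothetical example: ten 34-dimensional frames for one word image.
write_htk_user_file(np.random.rand(10, 34), "word001.usr")

HTK tools read the parameter kind from the file header, so files written this way can then be listed in the usual HInit/HERest training scripts.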