
Speech - Science topic

Communication through a system of conventional vocal symbols.
Questions related to Speech
  • asked a question related to Speech
Question
1 answer
How can I measure (or calculate) the fluency measures below?
Could anyone explain them using an example with two people?
The paper I read does not explain exactly how to calculate them.
Thanks a lot
(1) Speed fluency
A. Articulation rate:
mean number of syllables divided by mean phonation time (excluding pauses), e.g., expressed in syllables per minute
B. Speech rate: mean number of syllables divided by total time (including pauses), e.g., expressed in syllables per minute
(2) Breakdown Fluency
A. Mean length of pauses per 60 seconds
B. Mean number of pauses per 60 seconds (clause-internal versus clause-external)
(3) Repair fluency
A. Repair measures: mean number of partial or complete repetitions, hesitations, false starts, and reformulations
B. Mean number of filled pauses (e.g., em and er)
(4) Dialogue only measure
A. Number of turns
Relevant answer
Answer
Dear Mee-Jee Kim,
You can find answers to your questions about speech and articulation rates in our study (https://dergipark.org.tr/en/download/article-file/1200953). When measuring these parameters, it is very important to define what you count as pause time (e.g., how pauses shorter than 250 ms are treated when computing articulation rate).
Good luck.
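To make the measures concrete with two speakers, here is a minimal Python sketch with invented numbers. It assumes you have already counted the syllables in each sample and measured every silent interval (for example by annotating the recordings in Praat), and it uses a 250 ms minimum pause duration, following the threshold mentioned in the answer above. The threshold, the `Sample` structure, and its field names are illustrative assumptions, not the definitions of any particular paper. The clause-internal/clause-external split and the number of turns are left out because they require manual annotation.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD_S = 0.25  # assumed minimum silence that counts as a pause


@dataclass
class Sample:
    speaker: str
    syllables: int            # syllables produced in the sample
    total_time_s: float       # sample duration, pauses included
    silences_s: list[float]   # durations of all silent intervals (s)
    filled_pauses: int = 0    # "em", "er", ...
    repairs: int = 0          # repetitions, false starts, reformulations


def fluency_measures(s: Sample) -> dict:
    pauses = [d for d in s.silences_s if d >= PAUSE_THRESHOLD_S]
    pause_time = sum(pauses)
    # Silences shorter than the threshold are treated as part of speaking time.
    phonation_time = s.total_time_s - pause_time
    per_60s = 60.0 / s.total_time_s  # converts raw counts to "per 60 seconds"
    return {
        "speech_rate": s.syllables * 60.0 / s.total_time_s,       # syll/min, pauses included
        "articulation_rate": s.syllables * 60.0 / phonation_time,  # syll/min, pauses excluded
        "pauses_per_60s": len(pauses) * per_60s,
        "mean_pause_s": pause_time / len(pauses) if pauses else 0.0,
        "filled_pauses_per_60s": s.filled_pauses * per_60s,
        "repairs_per_60s": s.repairs * per_60s,
    }


# Two invented one-minute samples (hypothetical data)
a = Sample("A", syllables=150, total_time_s=60.0,
           silences_s=[2.0, 3.0, 1.0, 4.0, 0.1], filled_pauses=3, repairs=2)
b = Sample("B", syllables=120, total_time_s=60.0,
           silences_s=[2.0, 2.5, 3.0, 2.5, 4.0, 2.0, 4.0, 0.2], filled_pauses=6, repairs=4)

for s in (a, b):
    print(s.speaker, fluency_measures(s))
```

Both invented speakers articulate at 180 syllables per minute while actually speaking; B's lower speech rate (120 vs. 150 syllables per minute) comes entirely from pausing more, which is the distinction between the speed measures and the breakdown measures in the question.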
  • asked a question related to Speech
Question
4 answers
Hi everyone. I plan to investigate students' politeness based on Brown and Levinson's (1987) politeness theory. However, I haven't yet found a way to measure their level of politeness. Are there any grading criteria or coding schemes for this measurement in the related articles? Thank you.
Relevant answer
Answer
Politeness can be heartfelt or merely pro forma, sincere or insincere, well-meant or ironical — lots of factors to be distinguished. This article might be helpful: https://files.eric.ed.gov/fulltext/EJ1126942.pdf
However, I think you will need to zero in on one specific type of politeness; otherwise your study will be compromised by multiple ambiguities.
  • asked a question related to Speech
Question
2 answers
I am interested in studying features that can determine gender and age from short speech segments (1 to 9 seconds). The audio comes from a public dataset (Mozilla Common Voice), in which duration and quality vary.
Relevant answer
Answer
- F0 is recommended for gender.
- Harmonics-to-noise ratio, jitter, and shimmer tend to correlate with age.
- For age, I would add "closed phase analysis", i.e., the proportion of each pitch period during which the vocal folds are actually closed, which correlates with the harmonics-to-noise ratio. This may require an electroglottograph (EGG) to obtain reliable measurements.
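As a rough illustration of the jitter and shimmer measures mentioned above, here is a Python sketch of the "local" variants as Praat defines them: the mean absolute difference between consecutive glottal periods (or per-cycle peak amplitudes), divided by the mean period (or amplitude). It assumes the per-cycle periods and amplitudes have already been extracted (e.g., from a Praat PointProcess); the numbers below are invented, and on short, variable-quality clips such as Common Voice recordings these values may be noisy.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def local_perturbation(values):
    """Mean absolute difference between consecutive values / mean value.

    Applied to glottal periods this is local jitter; applied to
    per-cycle peak amplitudes it is local shimmer."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return _mean(diffs) / _mean(values)


def mean_f0(periods_s):
    """Mean fundamental frequency (Hz) from period durations (s)."""
    return _mean([1.0 / p for p in periods_s])


# Invented per-cycle measurements for one short voiced stretch (hypothetical data)
periods = [0.0100, 0.01005, 0.0100, 0.01005]  # seconds
amplitudes = [0.80, 0.76, 0.80, 0.76]          # arbitrary units

print(f"mean F0  {mean_f0(periods):.1f} Hz")
print(f"jitter   {100 * local_perturbation(periods):.2f} %")
print(f"shimmer  {100 * local_perturbation(amplitudes):.2f} %")
```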
  • asked a question related to Speech
Question
4 answers
Dear colleagues! We invite you to take part in one of a series of surveys related to a scientific experiment on the language of emoji. In this experiment, you are asked to write a short statement about each emoji you see in the survey. We are interested in the associations you have with a particular emoji and the contexts in which you use it (or could use it). The answers are free-form: you can express yourself using different parts of speech, sentences (including questions), exclamations, etc., in any format convenient for you. Your opinion is important to us! Thank you!
Relevant answer
Answer
Happy World Emoji Day, every🧐ne!
  • asked a question related to Speech
Question
3 answers
I am interested in how the roles of the speech pathologist and the special educator are defined with regard to learning disabilities (specific learning disabilities) in your country.
What kind of support is provided by the speech pathologist, and what support is provided by the special educator?
What are the differences between these two professions with respect to children with LD?
Please describe how it works in your country. If possible, please also send me websites for undergraduate and graduate programs that list all the courses through which special educators acquire their competences.
Relevant answer
Answer
The specialist teacher must have extensive knowledge of and ample familiarity with the cases he or she teaches, especially if they suffer from learning difficulties, in addition to problems related to pronunciation.
He or she must have a program adapted to each case, and this target group must be given a carefully planned number of hours in line with the school pace designed for them, one that matches their school problems (learning difficulties) and their pronunciation problems.
  • asked a question related to Speech
Question
4 answers
Every new researcher will at some point need to take part in conferences, workshops, training sessions, etc. While we are still searching for the ideal presentation, we know that specific tips may differ according to the theme or type of conference, the data presented, etc. The question is: are there any general tips for presenting research results well? How should they be presented, and what should go on the slides? Feel free to share your thoughts and expertise. Thanks in advance.
Image: Swiss Digital Health.
Relevant answer
Answer
Don't forget to add an opening slide on the learning objectives/outcomes for the audience.
Start with a story or scenario, a paper, or a clinical presentation, and then proceed.
Don't crowd the slides; use only bullet points (write the crux in the bullet and explain the background as you speak).
Keep an eye on your audience (if they are losing concentration, add some humor, a joke, or a question).
Present results in graphs and tables.
Don't use too many colors; a white background is always safe!
Have a hard copy of your presentation.
Don't make it too long; finish within your allotted time. If there is no time limit, finish in 30-40 min.
And lastly, know the gadgets you will use for AV aids.
And of course, dress well.
Good luck
  • asked a question related to Speech
Question
10 answers
Can you give examples of ostensible speech acts such as invitations, offers, and compliments from your native language? Ostensible speech acts are often issued to convey purposes other than those conveyed by genuine ones.
Relevant answer
Answer
Sorry, Khaled, I am responding late... The concept of 'ostensible ritual' makes sense, assuming that many rituals are realised on the level of discourse, and often rituals trigger certain speech acts.
  • asked a question related to Speech
Question
2 answers
We are building an Arabic speech emotion dataset with 508 recorded speakers; each speaker recorded the same ten phrases, divided among five emotions. The WAV files are noise-free and will be converted to MFCC and LDD features.
Validation is being carried out manually by a team of neurolinguists and psychologists. The dataset will be freely and publicly accessible.
What is the process of publishing this dataset?
What is the best journal to publish in?
Relevant answer
Answer
Ángel Carrión-Tavárez thank you for your reply.
  • asked a question related to Speech
Question
3 answers
Self-supervised learning: in which domain, NLP, computer vision, or speech, is it used the most?
Relevant answer
Answer
Examples of self-supervised learning tasks include next-word prediction, masked-word prediction, in-painting, colorization, and super-resolution. Self-supervised learning is widely used in NLP, e.g., Word2Vec, BERT, RoBERTa, ALBERT, etc. Computer vision uses contrastive learning or masked autoencoder (MAE) methods to learn general representations.
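To show what "self-supervised" means concretely, here is a minimal Python sketch of how the two NLP pretext tasks mentioned above turn unlabeled text into training pairs: masked-word prediction (in the spirit of BERT) and next-word prediction. It is deliberately simplified: BERT's actual recipe works on subword tokens and, for a selected token, sometimes substitutes a random token or keeps the original instead of inserting a mask. The `[MASK]` string and the 15% rate are used here only for illustration.

```python
import random

MASK = "[MASK]"  # placeholder token (illustrative)


def masked_word_examples(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens; the hidden tokens become the labels."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            inputs.append(tok)
            labels.append(None)  # no prediction target at this position
    return inputs, labels


def next_word_examples(tokens):
    """Pair each prefix of the sentence with the word that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


sentence = "the cat sat on the mat".split()
print(masked_word_examples(sentence, seed=0))
print(next_word_examples(sentence)[:2])
```

No human annotation is involved: the labels come from the text itself, which is why such objectives can be applied to very large unlabeled corpora.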
  • asked a question related to Speech
Question
2 answers
I am interested in measuring stress in speech and language pathologists.
Can you give me information on how to access a stress inventory for speech therapists and language pathologists?
Thanks in advance,
Emina
Relevant answer
Answer
Good evening. A useful document is attached.
  • asked a question related to Speech
Question
4 answers
There is a concept in Soviet and Russian linguistics known as "speech culture" or "language culture", which means command of the norms of the oral and written language, as well as "the ability to use expressive language means in different communication conditions". The same phrase denotes a linguistic discipline concerned with defining the boundaries of cultured (in the above sense) speech behavior, developing normative manuals, and promoting the language norm and expressive language means (Wikipedia).
What is the American equivalent of this concept, or is there a similar discipline in English-speaking or other countries?
There is also a similar but slightly different concept, "speech culture", defined as the set of knowledge, skills, and abilities in oral and written speech that are used in a given communicative situation, in compliance with the ethics of communication, to achieve the desired effect and the goals of communication. I need to find a synonym for this concept too.
Relevant answer
Answer
In Arabic, the term language culture refers to the sufficient linguistic skill and limited amount of linguistic knowledge that enable an educated person to express himself or herself in easy, straightforward, and correct language, and to know the secrets of the language in expressing subtle meanings that affect the recipient, as in artistic, aesthetic, and creative expression. It also involves keeping the language free of defects and purifying it of the impurities that offend the addressee, and respecting the language rather than belittling it, because it is the nation's heritage and history; there is no good in a nation whose intellectuals belittle its civilization, heritage, and history as represented by its language.
  • asked a question related to Speech
Question
5 answers
I need articles on ASL-to-speech conversion for my project. Kindly help me if you can.
Relevant answer
Answer
Basically, I am working on a project that converts American Sign Language to text and speech, so I would like help with some key points.
  • asked a question related to Speech
Question
3 answers
Using acoustical cues (duration/ F0/ intensity/ vowel quality)
Relevant answer
Answer
The help you can find depends on which L1 your subjects speak besides English. I believe it would help if you specified the L1 in the question.
  • asked a question related to Speech
Question
7 answers
Hello everyone,
I am looking for links to audio datasets that can be used for classification tasks in machine learning. Preferably, the datasets have been presented in scientific journals.
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
Relevant answer
Answer
In our work, we used UrbanSound8K.
Here is our work, with code provided. Do not forget to cite our work.
  • asked a question related to Speech
Question
3 answers
I am modelling breath, speech, and cough droplets in Fluent. The volume fraction of the droplets in the breath (the larger volume of exhaled fluid) is less than 1%. Is the volume of fluid (VOF) model appropriate?
Relevant answer
Answer
The Eulerian approach is usually unable to deal with particle-wall collisions of heavy buoyant particles (the usual Eulerian wall BCs do not account for particle collisions and reflections). Furthermore, the volume-averaged Eulerian method cannot deal with situations where, e.g., heavy particles from different origins cross the same finite volume in substantially different directions/paths. In such situations, the Eulerian method can only predict the mean particle velocity in that finite volume, since the disperse-phase velocity field is represented by just one value for each of the three velocity components in each mesh cell. This is not a serious limitation for a disperse bubbly flow, but it makes the method hardly applicable to a disperse solid-particle flow in a gas stream with strong particle-wall interactions. The droplet flow mentioned in this thread lies somewhere between these two extremes.
Best regards,
Dr. Th. Frank.
  • asked a question related to Speech
Question
8 answers
Hi everyone. I'm doing a study on the relationship between voice pitch and fatigue. I decided to do the analysis in Praat, since it seems the most user-friendly tool for a newcomer to voice research. However, I'm not quite sure what pitch range to use. I was thinking of a pitch range of 100-500 Hz for females and 75-300 Hz for males. Will this be sufficient, or should I set a lower pitch floor in case of creaky voice?
And is it better to take the average pitch over full phrases, or do I have to get the pitch per word? (I want to determine the average pitch of a file of about 15 phrases.)
Relevant answer
Answer
Yes, indeed. I cite your paper in my own research! Thanks for this.
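On the pitch-floor question above, the following bare-bones autocorrelation sketch (Python/NumPy) may help show why the floor matters. It is not Praat's actual algorithm, only an illustration: the search is restricted to lags between `sr / ceiling` and `sr / floor`, so an F0 below the floor, such as creak, simply cannot be found. The frame length of three periods of the pitch floor is a common rule of thumb (as far as I know it mirrors Praat's default window), which also means a lower floor requires a longer analysis window. The synthetic tones are stand-ins for real voiced frames.

```python
import numpy as np


def estimate_f0(frame, sr, floor_hz, ceiling_hz):
    """Crude autocorrelation F0 estimate for one voiced frame.

    Only lags between sr/ceiling_hz and sr/floor_hz are searched,
    so any F0 below floor_hz cannot be detected."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    min_lag = int(sr / ceiling_hz)
    max_lag = int(sr / floor_hz)
    if max_lag >= len(frame):
        raise ValueError("frame is too short for this pitch floor")
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return sr / lag


sr = 16000


def tone(f0, floor_hz):
    """Synthetic voiced frame lasting three periods of the pitch floor."""
    n = int(3 * sr / floor_hz)
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * f0 * t)


print(estimate_f0(tone(200, 75), sr, 75, 500))  # modal voice, found
print(estimate_f0(tone(60, 75), sr, 75, 500))   # 60 Hz "creak" missed with a 75 Hz floor
print(estimate_f0(tone(60, 50), sr, 50, 300))   # found once the floor is lowered
```

With a 75 Hz floor the 60 Hz frame is reported far too high, while lowering the floor to 50 Hz recovers roughly 60 Hz. Whether that trade-off (longer windows, more chance of octave errors) is worth it depends on how much creak your recordings contain.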
  • asked a question related to Speech
Question
15 answers
Traditional methods could be defined as writing and even speaking. Do modern means of communication such as texting, tweeting, memes, email, social media, virtual reality, and so on create a technological stepping stone for society to become post-human or even trans-human, where technology augments biology to the point that even the speech organs are no longer required?
Relevant answer
Answer
In some current discourses on the normativity of digitalization and artificial intelligence, biotechnology, and human engineering, positions of classical humanism have been depotentiated and/or deconstructed. People are drawn into fundamental existential conflicts over goals. At times the discourse oscillates between posthumanist and transhumanist dispositions.
Whatever differences and nuances must be conceded, posthumanism represents the basic view that the human species has already reached the climax of its evolutionary development. The next stage of development lies in the power to dispose of an artificial, neurocomputational intelligence that would be superior to Homo sapiens in numerous respects. Transhumanism, which seeks to extend the limits of human possibilities (biological and cognitive, physical and psychological) through the use of (super-)intelligent technological procedures, sketches such future scenarios, among others.
From a historical-systematic point of view, it becomes clear that as soon as the socio-technical framework of society, communication, culture, and community changes, this causes shifts in the normative premises, maxims, and imperatives of social coexistence. What is significant for the ethics of a society that is networked or interconnected via media, data, and neurons is the integration of real, virtual, and/or artificial forms of community and norms of participation that can no longer be fixed 'media-externally' or 'technology-externally', but only 'media-internally' and 'technology-internally'.
Ultimately, we, or others (sic!) whose nature as 'entities' we cannot know today, may inevitably be confronted with the question: will the 'law of these new forms of society', that is, their participation norms, be respected as fundamentally ethically legitimizing? One can only speculate about which analysis might prove effective: depending on the theoretical and problem-specific implications, the spectrum of answers ranges from a reformulated Aristotelian virtue ethics to a neopragmatic disposition to an ethics of the undecidable.
  • asked a question related to Speech
Question
7 answers
I am trying to build a model that can produce speech for any given text.
I could not find any speech-cloning algorithm that can clone a voice from speech alone, so I turned to TTS (text-to-speech) models. I have the following doubts regarding data preparation:
Based on the LJSpeech dataset, which consists of many 3-10 s recordings, we would need around 20 hours of data. It would be very hard for me to record that many 10 s clips. What would be the impact if I instead made many 5 min recordings? One impact could be higher resource requirements (but how much higher?); are there any others?
Also, is there some way to convert these 5 min recordings into the LJSpeech format?
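On the last point, one common approach is to cut the long recordings at pauses into short clips and write an LJSpeech-style `metadata.csv`, in which each line has the form `clip_id|transcription|normalized transcription`. The Python sketch below covers only the bookkeeping. It assumes you already have speech intervals (start and end in seconds) from some voice activity detector and a transcript for each resulting clip (e.g., from manual transcription or forced alignment); the function names and the clip ID scheme are hypothetical.

```python
def pack_clips(intervals, max_len_s=10.0, min_len_s=1.0):
    """Greedily merge consecutive speech intervals (start_s, end_s) into
    clips of at most max_len_s, cutting only in the gaps between intervals.
    A single interval longer than max_len_s is kept as-is and would need
    further splitting."""
    clips = []
    cur = None
    for start, end in intervals:
        if cur is None:
            cur = [start, end]
        elif end - cur[0] <= max_len_s:
            cur[1] = end              # still fits: extend the current clip
        else:
            clips.append(tuple(cur))
            cur = [start, end]        # start a new clip at this interval
    if cur is not None:
        clips.append(tuple(cur))
    return [c for c in clips if c[1] - c[0] >= min_len_s]


def metadata_line(clip_id, text, normalized=None):
    """One LJSpeech-style metadata.csv line: id|transcription|normalized."""
    norm = normalized if normalized is not None else text
    return f"{clip_id}|{text}|{norm}"


# Hypothetical speech intervals (seconds) found in one long recording
speech = [(0.0, 3.0), (3.5, 7.0), (7.4, 12.0), (12.5, 14.0), (20.0, 20.5)]
clips = pack_clips(speech)
print(clips)
for i, (start, end) in enumerate(clips, 1):
    print(metadata_line(f"rec01-{i:04d}", "<transcript of clip>"))
```

The audio itself would then be sliced at the clip boundaries and saved as one WAV file per clip ID.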
  • asked a question related to Speech
Question
3 answers
I'm studying anti-feminist motives in MPs' speeches over a one-year period. To this end, I will analyze all the speech transcripts (for the sake of data validity) and focus on the anti-feminist discourses. Still, I have concerns about validity and reliability.
Relevant answer
Answer
I think it is reliable. To what extent the results of a discourse analysis really meet the criteria of reliability and validity is difficult to say. From my point of view, they do not have to meet these criteria, because a discourse analysis aims at something else: it reveals power structures, repetitions, and legitimations in processes.
There are no standardised quality criteria for qualitative research. However, three quality criteria seem to make sense: transparency, intersubjectivity and range.
Your discourse analysis should be transparent, which means that you document the work steps and present them in a comprehensible way. In a further step, you could reflect on and discuss the subjectively obtained data, thus fulfilling the criterion of intersubjectivity. The presentation of the limitations and scope of your discourse analysis would also be important.
  • asked a question related to Speech
Question
13 answers
Hi everybody,
I would like to do part-of-speech tagging in an unsupervised manner. What are the potential solutions?
  • asked a question related to Speech
Question
3 answers
Where can I get an EEG dataset of imagined speech?
Relevant answer
Answer
Maybe imagined speech can be measured with EMG (electromyograms).
Regards,
Joachim
  • asked a question related to Speech
Question
1 answer
I am interested in taking part in research aimed at helping with certain speech deficiencies. I have a passion for community work, and as a linguist, I am willing to participate in work or projects that might require the assistance of an applied phonologist who can help speech-impaired children. I am currently working on tongue-tie among children in Southern Nigeria. Please keep me informed if there is any such research I can help with.
Relevant answer
Answer
How wonderful to have you take an interest in applying your expertise to aid people with communication challenges! It would be great for you to pair with one or more speech-language pathologists. I suggest you contact the Speech Pathologists and Audiologists Association of Nigeria (SPAAN):
Of course, if there's any way you could gain your clinical credentials in speech-language pathology, that would be terrific, as there is so much need in your country (as you know, I am sure). There are great programs all over the world for you to consider, and many offer funding.
I note short courses being offered through SPAAN.
These are not courses to lead you to be an SLP but might help you see more connections for your work.
I wish you the best!
Brooke Hallowell
  • asked a question related to Speech
Question
8 answers
Hi everybody,
Given the different methods of speech feature extraction (IS09, IS10,...), which one do you suggest for EMODB and IEMOCAP datasets?
Relevant answer
Answer
The key to speech emotion recognition is the feature extraction process: the quality of the features directly influences the accuracy of the classification results. If you are interested in conventional feature extraction, Mel-frequency cepstral coefficients (MFCCs) are the most widely used representation of the spectral properties of voice signals, and you can also try energy, pitch, formant frequencies, linear prediction cepstral coefficients (LPCC), and modulation spectral features (MSFs).
As for which of your suggested IS09 and IS10 sets is better: both work well and there is no big difference, but I recommend trying high-level (deep learning) features, which will definitely be better than low-level ones.
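For readers unfamiliar with MFCCs, here is a minimal NumPy/SciPy sketch of the textbook pipeline: framing, Hamming window, power spectrum, triangular mel filterbank, log, and DCT. The parameters (25 ms frames, 10 ms hop, 512-point FFT, 26 filters, 13 coefficients) are common defaults assumed for illustration, not values taken from the answer above. In practice you would more likely use `librosa.feature.mfcc` or openSMILE, whose configuration files include the INTERSPEECH challenge sets such as IS09 and IS10.

```python
import numpy as np
from scipy.fft import dct


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / (right - center)
    return fb


def mfcc(signal, sr, n_mfcc=13, frame_s=0.025, hop_s=0.010, n_fft=512, n_filters=26):
    """MFCC matrix of shape (n_frames, n_mfcc); assumes a 1-D signal at least one frame long."""
    flen, hop = int(round(frame_s * sr)), int(round(hop_s * sr))
    n_frames = 1 + (len(signal) - flen) // hop
    frames = np.stack([signal[i * hop: i * hop + flen] for i in range(n_frames)])
    frames = frames * np.hamming(flen)                          # taper each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft   # power spectrum
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)                        # avoid log(0)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]


sr = 16000
t = np.arange(sr) / sr  # one second of a synthetic 440 Hz tone
features = mfcc(np.sin(2 * np.pi * 440 * t), sr)
print(features.shape)
```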
  • asked a question related to Speech
Question
6 answers
I want to carry out research on 'freedom of speech on social media in Bangladesh during COVID-19'.
Relevant answer
Answer
A very good topic. Be prepared to embark on your research and take up the challenges. I think qualitative research would fit the topic well. All the best!
  • asked a question related to Speech
Question
5 answers
Hello everyone,
I am looking for links to scientific journals with dataset repositories.
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
Relevant answer
Answer
Dear Cecilia-Irene Loeza-Mejía
I think you should have a look at the site «re3data: Registry of Research Data Repositories» (https://www.re3data.org).
There you will find the following search/browsing options: Browse by content type, Browse by subject, and Browse by country.
When you choose "Browse by content type", you will get "Raw data" or "Scientific and statistical data formats" (among others): https://www.re3data.org/browse/by-content-type/.
With best regards Anne-Katharina
  • asked a question related to Speech
Question
3 answers
I need a dataset of stuttering (speech disfluencies) because I am planning to do a master's thesis on developing a model that can detect stuttering. Please suggest where I can get such datasets.
  • asked a question related to Speech
Question
3 answers
Hello everyone,
I am looking for links to audio datasets of indigenous Mexican languages that can be used for classification tasks in machine learning.
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
  • asked a question related to Speech
Question
2 answers
In speech-based emotion recognition with a neural network, I get almost 90% accuracy for individual artists, but accuracy drops to about 70% on the entire speech corpus.
How can I increase accuracy?
My speech database contains 10 artists.
Relevant answer
Answer
Interesting topic.
  • asked a question related to Speech
Question
3 answers
Hi everyone,
I'm looking for free speech-recognition software for pathology.
Relevant answer
Answer
It's not free either, but in my opinion, Dragon is not too bad.
  • asked a question related to Speech
Question
5 answers
This question may help many people decide what to use, when, and why.
Also, by the nature of the data I mean something beyond the type of the dataset itself (time series, images, speech, etc.).
I'll make sure to synthesize all the answers in one place.
Relevant answer
Answer
K-means clustering is a basic and widely used unsupervised machine learning technique.
Predicting future patterns in pricing, sales, and stock trading is among the most typical applications of supervised learning. Linear Regression, Logistic Regression, Neural Networks, Decision Trees, Random Forest, Support Vector Machines (SVM), and Naive Bayes are examples of supervised algorithms.
Kind Regards
Qamar Ul Islam
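Since k-means is mentioned above as the basic unsupervised technique, here is a minimal NumPy sketch of the standard Lloyd iteration: assign each point to its nearest centre, then move each centre to the mean of its assigned points, until the centres stop moving. The random initialization and the stopping rule are simple choices for illustration; library implementations such as scikit-learn's `KMeans` add smarter initialization (k-means++) and multiple restarts.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. X: (n_samples, n_features) array-like."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k distinct data points
    for _ in range(n_iter):
        # Distance of every point to every centre, shape (n_samples, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(X, k=2)
print(labels)
```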
  • asked a question related to Speech
Question
5 answers
Dear RG Community,
I am looking for options, as I plan to analyse speeches. Are there any suggestions for tools to analyse speeches (text data)? Thank you.
Relevant answer
Answer
The art of signing (gesture)... and the art of drawing can be used to analyse some simple data, depending on the age group.
  • asked a question related to Speech
Question
6 answers
I want to collect a speech dataset of real-world interview conversations between two people. Which microphone or recording device is advisable so that it doesn't interfere with the process? Please suggest.
Relevant answer
What do you mean by "it doesn't interfere with the process"?
You could possibly use a microphone and an MP3 digital video recorder/player.
The sound will be stored in digital files that can be easily retrieved and sorted.
Best wishes
  • asked a question related to Speech
Question
3 answers
Is there any research paper or review that discusses pharmacological treatments for voice and speech disturbances that occur as a result of anxiety, for example stuttering or a weak/trembling voice? Are there any pharmacological agents that act on the vocal cords and breathing and could resolve this problem?
Relevant answer
Answer
Hi,
Here are a few of the references:
Vasenina EE, Levin OS. Narushenie rechi i trevoga: mekhanizmy vzaimodeistviya i vozmozhnosti terapii [Speech disorders and anxiety: interaction mechanisms and therapy potential]. Zh Nevrol Psikhiatr Im S S Korsakova. 2020;120(4):136-144. Russian. doi: 10.17116/jnevro2020120041136
Lowe R, Menzies R, Onslow M, Packman A, O'Brian S. Speech and Anxiety Management With Persistent Stuttering: Current Status and Essential Research. J Speech Lang Hear Res. 2021 Jan 14;64(1):59-74. doi: 10.1044/2020_JSLHR-20-00144
Bergamaschi MM, Queiroz RH, Chagas MH, de Oliveira DC, De Martinis BS, Kapczinski F, Quevedo J, Roesler R, Schröder N, Nardi AE, Martín-Santos R, Hallak JE, Zuardi AW, Crippa JA. Cannabidiol reduces the anxiety induced by simulated public speaking in treatment-naïve social phobia patients. Neuropsychopharmacology. 2011 May;36(6):1219-26. doi: 10.1038/npp.2011.6
  • asked a question related to Speech
Question
2 answers
Hi everyone!
I'm looking for some articles regarding language assessment in awake craniotomy and the role of the speech therapist during such procedures.
Let me know if anybody has something like that! Thank you!
Relevant answer
Answer
Some articles on deep brain stimulation (electrical/magnetic) may be similar to what you have requested. Many such articles on Parkinson's disease (PD) and other neurological/neurophysiological aspects of speech are available on PubMed.
  • asked a question related to Speech
Question
8 answers
I am about to conduct a contrastive study of the pragmatic competence of Yemeni EFL learners, comparing their responses in the speech act of gratitude with those of native English speakers. My question concerns whether it is acceptable to use previously published data from native speakers in my research, since these native speakers are not reachable. If so, can you suggest a reference that sets out criteria for this procedure?
Relevant answer
Answer
I would determine whether or not the participants consented to the use of their data beyond the study that they originally consented to. Sometimes this means contacting the collecting researcher.
  • asked a question related to Speech
Question
8 answers
Hi,
I am looking for easy-to-use software to create surveys that allow integrating speech data (recording and playing back).
In particular, I want to record a participant's responses first. Then, I want to play back to this participant his/her recorded responses so he/she can make judgements of his/her own speech.
Any suggestions?
Thank you!!!
Relevant answer
Answer
Dear Marta,
I found https://www.cognition.run/ very useful for these purposes. It uses jsPsych libraries and is straightforward. It is free, and you do not need to manage the database for collecting your responses.
If you decide to go with this one, you can contact me, and I'll be happy to give you more references and helpful resources.
Best,
  • asked a question related to Speech
Question
5 answers
I am working on the problem: how the sensory-motor experience of the child is reflected in his speech, specifically - in the questions that the child asks.
Are there different kinds of questions in the speech of today's young children (what? where? which? when? how much? why?)?
Relevant answer
Answer
Thanks a lot for good information
  • asked a question related to Speech
Question
8 answers
Does extra embellishment in literature, art or speech indicate the absence of something more important or maybe controversial (which is omitted)?
Relevant answer
Answer
Yes, it is possible: focusing on one thing may mean that other things belonging to the overall picture or general form are left out.
  • asked a question related to Speech
Question
12 answers
Since the Prague Linguistic Circle, studies on information structure have traditionally considered assertive speech acts. Is there any systematic approach to information structure in non-assertive speech acts, such as interrogatives?
Relevant answer
Answer
How about the following?
Knud Lambrecht & Laura A. Michaelis. 1998. Sentence accent in information questions: Default and projection. Linguistics and Philosophy 21 (5):477-544.
Happy to discuss it!
  • asked a question related to Speech
Question
3 answers
Hi everyone,
I'm looking for some ideas for my thesis and I've already had a few, but I wouldn't mind some help to come up with something better. I'm interested in neurolinguistics and dysphonia. Could anyone recommend some useful articles and materials? Thank you!
Relevant answer
Answer
If you're looking to combine dysphonia and neurolinguistics into a single study, perhaps some study of expressive use/implementation of paralinguistic aspects of language (prosody) in people with dysphonia and/or how any modifications that they make are perceived/processed by listeners?
  • asked a question related to Speech
Question
6 answers
Are there pre-trained English speech-to-text deep learning models available as open source?
Relevant answer
Answer
Check snakers4/silero-models: Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple (github.com).
  • asked a question related to Speech
Question
3 answers
I have been working in the field of the morphosyntax of child directed speech (CDS) and keep noting that relatively little work has been carried out over the last 30 years despite relatively little being known about the grammatical structure and overall reliability of CDS. Many questions were posed decades ago and remain unanswered. Does anyone have an idea for the reason that this field is not of greater interest to linguists?
Relevant answer
Answer
It is an interesting area of research in linguistics. Although Noam Chomsky posited an internal grammar in his transformational-generative (TG) grammar, CDS needs to be studied with great care. It is the way in which an adult speaks to a child, and interestingly, the child enjoys and understands it. CDS differs from culture to culture, so it is necessary to know the patterns each culture uses for CDS. I am from Kerala, India. Here, adults speak to a child with a musical pattern, and they even modulate their voice and diction.
  • asked a question related to Speech
Question
3 answers
Several reports exist on the fundamental frequency of male and female speech. Though they do not all agree, there is a clear trend that the fundamental frequency of men's voices is lower than that of women's. One example: "The voiced speech of a typical adult male will have a fundamental frequency from 85 to 155 Hz, and that of a typical adult female from 165 to 255 Hz."[1]
QUESTION: Is it meaningful to study speech below these frequencies and why?
I am studying speech directivity, and for some reason the literature repeatedly compares male and female voices at 125 Hz, near the male fundamental. This seems nonsensical to me, but maybe there is a good reason for it? I have recorded a fair amount of female speech and see very little sound energy in this frequency band.
[1] Baken, R. J. (2000). Clinical Measurement of Speech and Voice, 2nd Edition. London: Taylor and Francis Ltd. (pp. 177), ISBN 1-5659-3869-0. That in turn cites Fitch, J.L. and Holbrook, A. (1970). Modal Fundamental Frequency of Young Adults in Archives of Otolaryngology, 92, 379-382, Table 2 (p. 381).
Relevant answer
I find your research interesting. Bear in mind that one objective of research is to prove or support existing theories, while another is to yield results that differ from the available published findings and resources. I believe you will want to choose one of these two options.
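The observation in the question (little energy near 125 Hz in female speech) can be checked directly by measuring what fraction of a recording's spectral energy falls in a chosen band. The sketch below is a plain-Python illustration under stated assumptions: a naive DFT with no windowing, a single analysis frame, and a synthetic 200 Hz tone standing in for a female voice. The function name `band_energy_fraction` is hypothetical; a real analysis would use an FFT library and average over many frames of actual speech.

```python
import cmath
import math


def band_energy_fraction(samples, fs, f_lo, f_hi):
    """Fraction of one-sided spectral energy between f_lo and f_hi (Hz).

    Naive O(N^2) DFT for illustration only; no window is applied,
    so spectral leakage from non-integer frequencies is ignored.
    """
    n = len(samples)
    total = 0.0
    in_band = 0.0
    # One-sided spectrum: bins 1..n//2 (DC excluded)
    for k in range(1, n // 2 + 1):
        x_k = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energy = abs(x_k) ** 2
        total += energy
        if f_lo <= k * fs / n <= f_hi:
            in_band += energy
    return in_band / total if total > 0 else 0.0


# Synthetic stand-in: a 200 Hz tone sampled at 8 kHz (400 samples = 10 full cycles)
fs = 8000
tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]
print(band_energy_fraction(tone, fs, 100, 150))  # close to 0.0: nothing near 125 Hz
print(band_energy_fraction(tone, fs, 180, 220))  # close to 1.0: all energy at 200 Hz
```

Applied to real recordings, comparing this fraction for male and female talkers would show how meaningful a 125 Hz directivity comparison is for each.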
  • asked a question related to Speech
Question
3 answers
To split audio recordings into short utterances, I found two approaches:
  1. identifying speech regions from acoustic properties of the speech signal, such as energy;
  2. classifying regions as speech/non-speech with statistical approaches such as neural networks (NNs).
However, I am struggling to find an efficient solution because the recordings may contain noise or music. Moreover, I have neither a trained model nor an annotated corpus to build a new model.
Any advice would be greatly appreciated.
Thanks in advance.
Relevant answer
Answer
Dear Yaakov,
Maybe you could try Audacity or Praat; both are free programs you can easily download to your computer.
Hope this helps.
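Option 1 from the question (energy-based detection) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: mono samples as floats, non-overlapping 20 ms frames, and a threshold set relative to the loudest frame. The names `energy_segments`, `threshold_ratio` and `min_gap_frames` are hypothetical and the default values would need tuning. Energy alone will not separate speech from music or loud noise, which is exactly the difficulty raised in the question, so treat this only as a baseline.

```python
import math


def frame_energies(samples, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_segments(samples, fs, frame_ms=20, threshold_ratio=0.1, min_gap_frames=5):
    """Return (start_s, end_s) speech segments found with an energy threshold.

    threshold_ratio: frames above this fraction of the peak frame energy count as speech.
    min_gap_frames: silent gaps this short are bridged so one utterance is not split.
    """
    frame_len = int(fs * frame_ms / 1000)
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    active = [e > threshold for e in energies]

    segments, start, gap = [], None, 0
    for i, is_speech in enumerate(active):
        if is_speech:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > min_gap_frames:
                # Segment ends after the last speech frame (exclusive frame index)
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        segments.append((start, len(active) - gap))
    frame_s = frame_len / fs
    return [(s * frame_s, e * frame_s) for s, e in segments]


# Usage: 0.5 s silence, 0.5 s of a 200 Hz tone, 0.5 s silence at 8 kHz
fs = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * t / fs) for t in range(4000)]
signal = [0.0] * 4000 + tone + [0.0] * 4000
print(energy_segments(signal, fs))
```

Each returned (start, end) pair can then be cut out as one short utterance.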
  • asked a question related to Speech
Question
3 answers
I need suggestions for research topics that involve both speech and images, i.e. topics that require expertise in both speech/audio and image processing.
Relevant answer
Answer
Dear Rizwan,
An interesting topic is detection of truth or falsehood in face-to-face interviews.
BTW, we published two papers about detection of truth or falsehood in textual stories:
(1) HaCohen-Kerner, Y., Dilmon, R., Friedlich, S., & Cohen, D. N. (2016). Classifying true and false Hebrew stories using word N-Grams. Cybernetics and Systems, 47(8), 629-649.
(2) HaCohen-Kerner, Y., Dilmon, R., Friedlich, S., & Cohen, D. N. (2015, October). Distinguishing between True and False Stories using various Linguistic Features. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters (pp. 176-186).
Best regards,
Yaakov
  • asked a question related to Speech
Question
6 answers
My background is in speech/audio enhancement and separation. I would like suggestions on how to apply this background to natural language processing topics.
Relevant answer
Offhand, I don't have a concrete suggestion, but I am curious where you are getting the Náhuatl recordings from. I am now working with Sabina de la Cruz, an L1 Nahuatl speaker who has a 10-month-old daughter. She is currently trying to establish Nahuatl as the language used in her family so that her daughter also acquires Nahuatl as her L1.
She has been recording the babble production of her child and I am in charge of analyzing the data.
  • asked a question related to Speech
Question
4 answers
I'm working on a speaker recognition challenge.
I have already trained my model on the VoxCeleb2 dataset in a triplet setup. Now, for the challenge, I have two sets.
enrollment (1 audio/subject) [IDs given]
test (random number of audios without IDs)
I need to report EER on the test.
Is it acceptable to also train my model on the enrollment data, or would that be considered leakage/cheating when reporting EER?
Let me elaborate. In speaker verification/recognition, we have enrollment and trial data. Say, for example, that I have 5 known subjects/speakers. I first record their speech and label it. This is my enrollment data.
Enrollment:
speaker 1 -> audio_1
speaker 2 -> audio_2
speaker 3 -> audio_3
speaker 4 -> audio_4
speaker 5 -> audio_5
Next, I take some random speech data from other sources plus more audio from the 5 speakers; this is my test data.
Test:
speaker 1 -> audio_11
speaker 2 -> audio_21
speaker 3 -> audio_31
speaker 4 -> audio_41
speaker 5 -> audio_51
random -> random_1
random -> random_2
Next, I generate the trials from the test data.
speaker 1, audio_11
speaker 3, audio_11
speaker 2, audio_21
speaker 4, random_1
speaker 1, random_2
From the trials, I need to predict whether audio_11 belongs to speaker 1, whether audio_11 belongs to speaker 3, whether audio_21 belongs to speaker 2, etc. (based on audio similarity).
In my case, I segment the enrollment audio and train my model on these segments before making predictions on the trial/test data.
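Whatever the answer on training with enrollment data, the reported metric itself is computed from trial scores. A minimal sketch follows, assuming each trial has already been reduced to an enrollment embedding and a test embedding from the model. Cosine scoring, the helper names `cosine_score` and `compute_eer`, and the toy trials are illustrative assumptions; evaluation toolkits often interpolate the ROC curve rather than picking the closest threshold as done here.

```python
import math


def cosine_score(u, v):
    """Cosine similarity between an enrollment and a test embedding."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def compute_eer(scores, labels):
    """Equal error rate for verification trials.

    labels: 1 for target trials (same speaker), 0 for non-target trials.
    A trial is accepted when score >= threshold. Returns (eer, threshold),
    using the threshold where FAR and FRR are closest and averaging the two.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")
    best = None
    for thr in sorted(set(scores)) + [float("inf")]:
        false_accepts = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s >= thr)
        false_rejects = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s < thr)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2, thr)
    return best[1], best[2]


# Hypothetical trials: (enrolled-speaker embedding, test embedding, same speaker?)
trials = [
    ([1.0, 0.0], [0.9, 0.1], 1),
    ([0.0, 1.0], [0.1, 0.9], 1),
    ([1.0, 0.0], [0.2, 0.8], 0),
    ([0.0, 1.0], [0.8, 0.3], 0),
]
scores = [cosine_score(e, t) for e, t, _ in trials]
labels = [lab for _, _, lab in trials]
eer, threshold = compute_eer(scores, labels)
print(eer, threshold)
```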
Relevant answer
Answer
  • asked a question related to Speech
Question
1 answer
I found the "Age and gender speech corpus (aGender)" dataset, but I couldn't find a publicly available copy. Are there any other publicly available speech datasets for age and gender classification for research purposes?
Relevant answer
Answer
  • asked a question related to Speech
Question
5 answers
I'm designing a study that assesses L2 learners’ ability to pay attention to contextual cues when performing a communicative function (e.g., apologizing, thanking). I’d like to use a short video clip to see what they fixate on when they produce a brief speech. Do eye trackers work with video input? Also, is there any rule regarding the length of the input? I’m thinking of a brief background scene (5-10 s) to contextualize the situation, leading to the critical scene, where the picture is paused and participants need to produce the communicative function directed at the person in the video. The focus of interest is the critical scene, but I’m wondering how long the background needs to be before the critical scene.
Relevant answer
Answer
I think you could use Eye Pen 3; that software allows you to build that kind of design:
  • asked a question related to Speech
Question
4 answers
My area of interest is roughly L2, Grice's maxims, speech acts, or spoken language.
Relevant answer
Answer
Check my paper here on creative writing assisted by corpora.
  • asked a question related to Speech
Question
3 answers
I am considering tokenization, lemmatization, removing special characters, part-of-speech tagging, and finally a word cloud for big data. Can anyone suggest a tutorial or document on this? Please help me if possible.
Relevant answer
Answer
  • Step 1: Installation instructions. The text we are about to handle is “Introduction to Machine Learning”, and the string is stored in the variable doc.
  • Step 2: Filtering tokens. Calculate the frequency of each token using the “Counter” function and store it in freq_word; to view the top 5 most frequent words, the most_common method can be ...
  • Step 3: Normalization. This is the major part, where each sentence is weighed based on the frequency of the tokens present in it.
  • Step 4: Weighing sentences. {Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task.: 4.125,
  • Step 5: Summarizing the string. The nlargest function returns a list containing the top 3 sentences, which are stored as summarized_sentences.
Text summarization using spaCy. The article explains, What is spa…
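The steps listed above describe frequency-based extractive summarization. A minimal plain-Python sketch of the same idea follows. It swaps spaCy for a regular-expression tokenizer and a tiny hypothetical stop-word list, and it skips lemmatization and part-of-speech tagging, which would need a library such as spaCy or NLTK. `freq_word` and `summarized_sentences` mirror the names used in the steps; everything else is illustrative.

```python
import re
from collections import Counter
from heapq import nlargest

# A tiny, hypothetical stop-word list; spaCy or NLTK provide full ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in",
              "that", "it", "on", "for", "with", "as", "by"}


def tokenize(text):
    """Lower-case word tokens; punctuation and other special characters are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(doc, n_sentences=3):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    # Step 2: token frequencies, stop words removed
    freq_word = Counter(w for w in tokenize(doc) if w not in STOP_WORDS)
    if not freq_word:
        return []
    # Step 3: normalize by the frequency of the most common token
    max_freq = max(freq_word.values())
    weights = {w: c / max_freq for w, c in freq_word.items()}
    # Step 4: weigh each sentence by the normalized frequencies of its tokens
    sentence_scores = {}
    for sent in sentences:
        for w in tokenize(sent):
            if w in weights:
                sentence_scores[sent] = sentence_scores.get(sent, 0.0) + weights[w]
    # Step 5: keep the highest-scoring sentences
    summarized_sentences = nlargest(n_sentences, sentence_scores, key=sentence_scores.get)
    return summarized_sentences


doc = "Machine learning is fun. Machine learning uses data. Cats sleep."
print(summarize(doc, 1))
```

The same `freq_word` counter is also what a word cloud is usually drawn from.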
  • asked a question related to Speech
Question
2 answers
A speech-based silence-removal algorithm has been tested in MATLAB. To implement it in hardware, what kind of recent microcontroller could be used?
Relevant answer
Answer
Ask yourself the following questions: how much program memory does your code need, how much data memory is needed, what CPU speed is required, and do you need any special interfaces?
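Two of these questions (data memory and CPU speed) reduce to simple arithmetic once the audio format and frame size are fixed. The sketch below assumes frame-based processing with double buffering; the helper names and every figure in the usage lines (16 kHz, 16-bit, 20 ms frames, a 48 MHz core, 200,000 cycles per frame) are hypothetical placeholders to be replaced with measured values from the MATLAB prototype.

```python
def frame_buffer_bytes(sample_rate_hz, frame_ms, bits_per_sample, n_buffers=2):
    """RAM needed to hold n_buffers audio frames (2 = double buffering)."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * (bits_per_sample // 8) * n_buffers


def cpu_load(cycles_per_frame, frame_ms, cpu_hz):
    """Fraction of the real-time budget used; each frame must be processed within one frame period."""
    cycles_available = cpu_hz * frame_ms / 1000
    return cycles_per_frame / cycles_available


# Hypothetical figures: 16 kHz, 16-bit audio, 20 ms frames, 48 MHz MCU,
# and 200,000 cycles measured (or estimated) per frame for the algorithm.
print(frame_buffer_bytes(16000, 20, 16))   # 1280 (320 samples x 2 bytes x 2 buffers)
print(cpu_load(200_000, 20, 48_000_000))   # about 0.21 of the real-time budget
```

A load well below 1.0 leaves headroom for interrupts and I/O; a value near or above 1.0 means a faster core or a DSP-capable part is needed.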
  • asked a question related to Speech
Question
7 answers
Dear all,
I'm doing a study on speech acts, and I'd like your help collecting the many ways people say congratulations. Please state your country as well as the occasion for the congratulations.
Regards,
Relevant answer
Answer
Algeria: Graduation (مبارك عليكم, roughly "congratulations to you")
  • asked a question related to Speech
Question
5 answers
I wrote code in MATLAB to obtain the glottal waveform corresponding to a speech signal, and I calculated the derivative of the glottal waveform. I tried to plot both waveforms separately using subplot, but only one waveform is plotted and the other subplot is blank. What could be the error in my code?
Relevant answer
Answer
You need to define the variable "frame" that you use in the second line; that is why the blank subplot is displayed.
  • asked a question related to Speech
Question
2 answers
I am interested in how the roles of speech pathologists and special educators are defined with regard to learning disabilities (specific learning disabilities) in your country.
What kind of support is provided by speech pathologists, and what support is provided by special educators?
What are the differences between these two professions when it comes to children with LD?
Please describe the situation in your country. If possible, please also send me a website for undergraduate and graduate programs that lists all the courses through which special educators acquire their competences.
Thank you!!
Relevant answer
Answer
Academic skills training
  • asked a question related to Speech
Question
6 answers
I trained a 3-layer LSTM network to extract d-vector embeddings using Keras. I extracted MFCC features from the TIMIT dataset as input to the model and defined a custom loss function (the GE2E loss).
After fewer than 213,444 batches I got zero loss (on the train and dev sets); however, when I use the model to predict d-vectors (even for inputs from the training set), I keep getting nearly the same output (i.e. the cosine similarity between any two output vectors is 0.99999xxx).
I double checked the code and the loss function implementation, it seems to be correct.
Any idea what might cause such problem?
Relevant answer
Answer
Hey,
Maybe it doesn't matter, but I had the same issue. After some investigation I found two solutions that work very well in my case:
1: Increase the sequence length a little.
2: Add an MDN (mixture density network) layer.
Best regards
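To sanity-check a custom GE2E implementation, it can help to compute the softmax variant of the loss on tiny hand-made batches in plain Python and compare it with the Keras loss on the same inputs. The sketch below is a minimal version of the GE2E softmax formulation (similarity w·cos(e, c) + b, with each utterance excluded from its own speaker's centroid); `w=10` and `b=-5` are illustrative constants, not trained values, and the helper names are hypothetical. One useful consequence: if every embedding is identical, each utterance contributes log N (N = speakers per batch), so a correct implementation cannot reach zero loss with collapsed outputs unless a batch contains only one speaker. Zero loss together with near-identical d-vectors therefore points to a bug in the loss, the batch construction, or how predictions are taken.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """GE2E softmax loss summed over all utterances in a batch.

    emb[j][i] is the embedding of utterance i of speaker j; every speaker
    needs at least 2 utterances so the leave-one-out centroid is defined.
    """
    centroids = [centroid(spk) for spk in emb]
    total = 0.0
    for j, spk in enumerate(emb):
        for i, e in enumerate(spk):
            logits = []
            for k, c_k in enumerate(centroids):
                # For the utterance's own speaker, exclude it from the centroid
                c = centroid(spk[:i] + spk[i + 1:]) if k == j else c_k
                logits.append(w * cosine(e, c) + b)
            # -S_ji,j + log sum_k exp(S_ji,k), computed stably
            m = max(logits)
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
            total += log_sum_exp - logits[j]
    return total


# 2 speakers x 2 utterances: collapsed vs. well-separated embeddings
collapsed = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]]
separated = [[[1.0, 0.0], [1.0, 0.01]], [[0.0, 1.0], [0.01, 1.0]]]
print(ge2e_softmax_loss(collapsed))  # 4 * log(2): collapse is penalized
print(ge2e_softmax_loss(separated))  # small but positive
```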
  • asked a question related to Speech
Question
5 answers
Hello everyone,
I am working on a project that includes object recognition, currency recognition, text-to-speech, and user location. To recognize Pakistani currency, I would need a lot of data and computing resources to train a model, but unfortunately I have access to neither.
So I would like to know whether there is an open-source pre-trained model that I could use for my project.
Any help would be appreciated.
Thank you.
Relevant answer
Answer
  • asked a question related to Speech
Question
3 answers
What is the relationship between articulation and the theory of speech acts?
Relevant answer
Answer
I don't see any obvious relationship. However, I am currently reading an article in Language in Society that links use of "do you want..?" vs "wanna..?" to the speaker's perception of the listener's willingness to accept the proposition. So there could well be a relationship that has not been systematically explored.
  • asked a question related to Speech
Question
4 answers
Do diaphragmatic breathing and speech therapy actually help? How often did you need a multidisciplinary team approach?
Relevant answer
Answer
Rumination syndrome is a condition in which people repeatedly and unintentionally regurgitate undigested or partially digested food. It affects mainly infants, young children, and people with cognitive disabilities/depression.
Clinically, we need to rule out gastroparesis/bulimia in adult patients who vomit after eating. I would say it's rare in adults...
  • asked a question related to Speech
Question
2 answers
For my research, I am having great difficulty collecting speech samples from children with cleft palate.
Thank you for your help.
Rujira
Relevant answer
Answer
Thank you!
  • asked a question related to Speech
Question
7 answers
In positioning theory, "speech act" is a central concept. Speech acts are something that is performed by interlocutors in an interaction. But what then is "speech" itself, i.e. without the addition of "act"? What is speech when it is not an action/a performance? And how would one analyze "speech" if not by analyzing acts/actions/performances?
Relevant answer
Answer
It depends on the theory you use to define "speech". Commonly, speech is the process and discourse is the result, while a speech act, verbal or nonverbal, is the achievement of an effect.
  • asked a question related to Speech
Question
8 answers
For a few years now, I have dedicated part of my research and publications to the problem of hate speech on social networks. I am currently outlining a new article on it. Something worries me. Some hate speech is clearly easy to label and trace (especially in cyberspace, with the help of automatic text-processing software). Many published works today provide relevant data that make it possible to detect hate speech and identify potentially criminal users hiding behind pseudo-anonymity.
My concern is whether there is a better way to approach hate speech without departing from purely scientific objectives and without contributing to the creation of a kind of Linguistic Court willing to rule on the acceptability of expressions, perhaps in order to clean, fix and give “splendour” to digital language, incidentally improving the public image of social platforms and regulating coexistence in cyberspace so that only “good people” can participate.
The truth is that, in addition to speech that explicitly expresses hatred, there is an untraceable, ungrammaticalizable hatred that resists both logic and empiricism. My question is: does anyone else, like me, fear the risks of such practices of identifying "bad speech", which could be used to purify language and limit freedom of expression? (Sorry for my English, which is a language I rarely use.)
Relevant answer
Answer
  • asked a question related to Speech
Question
5 answers
I have just seen two media treatments of O. Tokarczuk's Nobel speech (Nobel Prize in Literature 2018):
1. Reprinted speech with underlined fragments
2. Article entitled: three words to remember from the speech: "I am", "myth" and "tenderness"
I consider such media practices an example of generating modern intellectual slavery because: (1) the media suggest learning selected content by heart instead of initiating a discussion of these fragments, (2) the media skip other fragments and the context of the whole speech, (3) the media put themselves in the position of hierarchs of ready-made, unquestionable values and the audience in the position of someone ... (4) the media do not give the audience any benchmark or tool for independent analysis.
I think the role of an intellectual or leader is to give audiences tools for independent analysis/use.
For example, if we use the metaphor 'art is a tool' (Ernst et al 2016) to analyse the speech, we will obtain our own independent result.
Consequently, the scientists' task is not only to do empirical research and suggest theories, but also to identify and suggest methodological metaphors that immunize/empower audiences against the above-mentioned practices.
What do you think about this?
Literature:
Ernst D, Esche Ch, Erbslöh U (2016) The art museum as lab to re-calibrate values towards sustainable development. Journal of Cleaner Production 135: 1446-1460
Relevant answer
Answer
I have just submitted a paper on sci-art cooperation. Here is the abstract; what do you think? From reformers to living labs: radical changes (transformations) of practices in art and science. Abstract: Living labs are a form of cooperation between scientists and artists that combines new methods of project implementation with new research methods (action research and design-based research). The results of these projects represent radical changes in current practices. I begin by presenting what a radical change in art and science involves. Next, I present interdisciplinary stakeholder workshops as a working method of living labs, and social phenomenology as an interpretative and methodological basis for researching artistic practices in the humanities. In the last part, I present what humanities scholars and artists can study in the living lab format, and how. The originality/value of the research lies in showing how artists and scientists can implement joint sci-art projects based on the above-mentioned methods and the model of transformation of artistic practices. Keywords: radical change (transformation), social practices, living lab, humanities, art, social phenomenology
  • asked a question related to Speech
Question
15 answers
It's a question about the important role of speech therapists. Many schools in my region currently lack skilled speech therapists.
- What is the situation in your area?
Relevant answer
Answer
Speech error data have been used to show the psychological reality of phonetic features. The phoneme /p/ differs from /b/ only in voicing: the vocal folds vibrate to produce /b/ but not /p/. Features such as voicing may be misassigned in rapid speech, causing errors.
  • asked a question related to Speech
Question
6 answers
I am confused about how to decide, without following a published paper, the molarity of the solution needed to make nanoparticles. When following a particular paper we can adjust the concentration up or down, but without one I don't know where to start. Is there any way to decide the molarity of the solutions used to synthesize nanoparticles?
Relevant answer
Answer
@Athchaya Sundararajan Molar concentration is too high for green synthesis. It will still give you nanoparticles, but in my opinion agglomeration will be a major issue. It will also greatly increase the cost of the process and waste precursor salt.
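Whatever concentration is chosen, turning it into an amount to weigh out, or a volume of stock to dilute, is simple arithmetic: n = C·V, m = n·M, and C1·V1 = C2·V2 for dilutions. The sketch below only does that arithmetic; it does not tell you which molarity suits a given synthesis route. The helper names are hypothetical, and the example (100 mL of 1 mM AgNO3, molar mass 169.87 g/mol, from a 0.1 M stock) is illustrative, not a recommended recipe.

```python
def solute_mass_g(molarity_mol_per_l, volume_ml, molar_mass_g_per_mol):
    """Mass of solute to weigh out: n = C * V, then m = n * M."""
    moles = molarity_mol_per_l * volume_ml / 1000.0
    return moles * molar_mass_g_per_mol


def stock_volume_ml(stock_molarity, target_molarity, target_volume_ml):
    """Dilution from a stock solution: C1 * V1 = C2 * V2, solved for V1."""
    return target_molarity * target_volume_ml / stock_molarity


# Example: 100 mL of 1 mM AgNO3 (molar mass 169.87 g/mol)
print(solute_mass_g(0.001, 100, 169.87))  # about 0.017 g (17 mg)
# Volume of a 0.1 M stock needed for 100 mL of 1 mM
print(stock_volume_ml(0.1, 0.001, 100))   # about 1 mL
```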
  • asked a question related to Speech
Question
8 answers
What are the possible applications of speech samples in detecting the coronavirus (COVID-19)?
Relevant answer
Answer
I agree with Di Michl.
  • asked a question related to Speech
Question
5 answers
Since the decline of audiolingualism, there has been a bias towards speech in language teaching for communicative ends. However, the stress on speaking rather than writing produces fluent but inaccurate learners (Hughes, 1983).
Relevant answer
Answer
Learning to speak comes first, in every evolutionary sense, for the human species.
Writing (numbers included) emerged later to consolidate our linguistic foundation and communication.
In fact, it serves the mental function of being precise, exact and clear, i.e. understandable over a distance. However, as we know from language archaeology, it is not easy to translate a written text back into spoken language if the line of tradition has been lost; e.g. we can read Plato, but we have no authentic phonetics. This is especially critical with sacred texts, where authorities claim to be orthodox.
  • asked a question related to Speech
Question
2 answers
I'm doing research in linguistics on synthetic speech rate. The 30 participants listened to speech at 4 rate levels (sentences at normal rate, then accelerated by 10%, 20%, and 30%). Each level had 5 recordings, and participants rated the speed of each on a 7-point Likert scale.
The aim of the study is to find out whether individuals perceive the normal speech rate as slow, and which rate they perceive as the "most normal".
I figured it's a repeated-measures design, since each participant listens to all 4 speed levels of the stimuli, but I couldn't figure out how to run it or how to organize the data in SPSS.
Please find the SPSS file attached.
Relevant answer
Answer
For repeated measures, hierarchical linear modelling (HLM) is preferred, with between-person variables at level 2 and within-person variables at level 1. I hope this helps.
Best wishes
Zhang Zhenduo
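If the classic repeated-measures ANOVA route is used instead (Analyze > General Linear Model > Repeated Measures in SPSS), the data need to be in wide format: one row per participant and one column per level of the within-subject factor. The sketch below reshapes long-format ratings into that layout, assuming, as a simplification, that the 5 recordings in each level are averaged per participant; keeping all 20 ratings as separate columns, or keeping long format for a multilevel model as suggested above, are alternatives. The level labels and example ratings are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical labels for the four within-subject levels
LEVELS = ["normal", "plus10", "plus20", "plus30"]


def long_to_wide(rows):
    """Average the recordings of each level, per participant.

    rows: iterable of (participant, level, rating), one tuple per rated recording.
    Returns {participant: {level: mean rating}}.
    """
    ratings = defaultdict(lambda: defaultdict(list))
    for participant, level, rating in rows:
        ratings[participant][level].append(rating)
    return {
        p: {lvl: mean(vals) for lvl, vals in by_level.items()}
        for p, by_level in ratings.items()
    }


def wide_to_csv(wide):
    """One row per participant, one column per level, ready to import into SPSS."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["participant"] + LEVELS)
    for p in sorted(wide):
        writer.writerow([p] + [wide[p][lvl] for lvl in LEVELS])
    return buf.getvalue()


# Hypothetical example: 2 participants, 2 recordings per level (the study used 5)
rows = [
    (1, "normal", 3), (1, "normal", 5), (1, "plus10", 4), (1, "plus10", 4),
    (1, "plus20", 5), (1, "plus20", 6), (1, "plus30", 6), (1, "plus30", 7),
    (2, "normal", 2), (2, "normal", 3), (2, "plus10", 3), (2, "plus10", 4),
    (2, "plus20", 4), (2, "plus20", 5), (2, "plus30", 6), (2, "plus30", 6),
]
wide = long_to_wide(rows)
print(wide_to_csv(wide))
```

In SPSS, the within-subject factor would then be defined with 4 levels mapped to the four rate columns.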
  • asked a question related to Speech
Question
5 answers
Or not.
Harry Jerison in his 1991 book Brain Size and the Evolution of Mind, at p. 89 has:
Mind is a necessary brain adaptation that organizes otherwise unmanageable amounts of neural information into a representation of the external world.
Is Jerison right?
Relevant answer
Answer
Neurons are not the best level of abstraction when speaking of mind. You don't talk about myocytes when you discuss soccer. Concepts, symbols, and the various types of interactions between them are more appropriate building blocks of mind.
Regards,
Joachim
  • asked a question related to Speech
Question
2 answers
I am planning a study in which participants' verbal responses will be recorded both while sitting and while standing. I want software that can accurately record the responses (i.e. convert speech to text) along with their reaction times, or at least give accurate timestamps for the responses. Please suggest any reliable software or platform for this purpose.
Relevant answer
Answer
If you are willing to use paid software, E-Prime used with Chronos and a microphone would be helpful.
E-Prime can record the voice after each stimulus as a separate recording, and it can measure response times from the onset of the vocal response.
You can use the AudioInRecord task event, but have a look at their webinar on this topic for more information.
I don't think it can automatically convert the recording to text, though.
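If only response latencies are needed, a simple software voice key can estimate vocal reaction time from a recording that starts at stimulus onset. The sketch below assumes mono float samples and a fixed RMS threshold that would have to be calibrated for each microphone and room; `voice_onset_ms` and its parameters are hypothetical. Its accuracy is limited by how precisely the recording start is aligned with stimulus onset, and it does no transcription.

```python
import math


def voice_onset_ms(samples, fs, threshold_rms, window_ms=5):
    """Time (ms) of the first window whose RMS exceeds threshold_rms, or None.

    Measured from the first sample, so if recording starts at stimulus
    onset the result approximates the vocal reaction time.
    """
    win = max(1, int(fs * window_ms / 1000))
    for start in range(0, len(samples) - win + 1, win):
        frame = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in frame) / win)
        if rms > threshold_rms:
            return start * 1000.0 / fs
    return None


# Usage: 350 ms of silence followed by a 200 ms "response" tone at 8 kHz
fs = 8000
response = [0.5 * math.sin(2 * math.pi * 200 * t / fs) for t in range(1600)]
recording = [0.0] * 2800 + response
print(voice_onset_ms(recording, fs, threshold_rms=0.05))
```

The window length sets the timing resolution (5 ms here); shorter windows are finer but more sensitive to noise.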
  • asked a question related to Speech
Question
8 answers
Hello! If you work in a public, private, third-sector or mixed-economy organization, you can take part in my doctoral research, so that together we can build a definition of the competencies the public sector needs to generate innovations that improve service delivery (https://forms.gle/6Q8iYHN1X2VDwLko6). In doing so, you will be able to reflect on which of these competencies are available in public organizations and which need to be developed to create services that add value to society! I also kindly ask you to share this message with your contacts, so that we can use our networks to collaboratively build the definition of competencies for innovation and give more people the opportunity to learn about these competencies and reflect on their own development needs! I'm counting on you! Lana Montezano (PhD candidate in Administration, PPGA/UnB; Researcher at the Laboratory of Innovation and Strategy in Government, UnB).
Relevant answer
Answer
Hello dear
I think successful organizations that strive to ensure their survival and continuity as strong and influential players should not stop at economic efficiency alone; innovation and renewal should become the hallmarks of their products and their performance.
Good job.
  • asked a question related to Speech
Question
14 answers
I am pursuing my master's in data science. I need good research papers on speech recognition that I can refer to in my research on building neural networks, e.g. recurrent neural networks for speech-to-text.
Relevant answer
Answer
For a more practical machine learning course: https://www.fast.ai/
  • asked a question related to Speech
Question
20 answers
The availability of natural resources and raw materials is essential for the development of society, which increasingly demands new products and solutions from companies and industries. Until recently, however, neither the realization that resources are not inexhaustible and may become scarce nor the importance of biodiversity for sustaining human life on the planet was a major social concern.
For some years now, however, socio-environmental issues have been gaining more and more space in social, academic, political and governmental areas. People have developed an awareness of the importance of environmental preservation, especially to ensure development for the next generations.
This discourse was disseminated mainly by the UN, which adopted the term sustainable development. The expression came into use after the first United Nations conference on the environment, the Conference on the Human Environment, held in Stockholm, Sweden, in 1972.
Relevant answer
This is difficult to implement. Here is the brochure; admittedly, it is in Russian.
  • asked a question related to Speech
Question
4 answers
• To draw attention to the role of vocabulary in foreign/second language (F/SL) learning.
• It is evident that vocabulary is the sine qua non for language learners, both when constructing and when producing a text. When reading in a second language, words serve SL readers as lifebuoys for constructing the message of the text. They are the building blocks.
• Multi-word units such as phrasal words, idioms, fixed phrases, and proverbs in English prove difficult for any language learner. In a text, idioms are complementary aspects of natural speech.
Relevant answer
Answer
A very interesting research topic. To me it should. Language comprehension is based on making predictions about what could be said next and using words to select among the options. Look at my project and its opening chapter "How does language work?", including the chapter "The brain for the linguist." A larger vocabulary makes predictions more specific and also provides more items to use for making them. So the larger the vocabulary, the more likely the learner is to guess set phrases and idioms correctly, and the more likely to "pick up" and learn their meanings.
  • asked a question related to Speech
Question
3 answers
I need this urgently: how can I implement Arabic speech to Urdu text using a speech recognition tool in MATLAB? If I use a neural network, is it possible to design it according to my requirements?
Speech to Text (using MATLAB)
Arabic Quranic speech to Urdu text: I will use audio datasets and recorded speech, and I want to convert them into Urdu text using MATLAB.
Please answer!
Thanks
Relevant answer
Answer
  • asked a question related to Speech
Question
10 answers
Overcoming the challenges that undertaking research poses for educational institutions through Knowledge Management, as Passaillaigue and Estrada (2016) propose, makes it possible to address and integrate the new needs of the global environment through the coordinated management of the institution's mission functions and of its specific research and innovation functions. It likewise favours the effective management of knowledge and of its innovative capacity, strengthening the quality of the institution's educational offering. This makes it advisable to develop a Knowledge Management model,
As Trejos and Ayala (2018) explain, the research function has become the bulwark of sociocultural transformation through the discovery and production of knowledge. In this sense, the research function permeates intellectual education and its “integration, appropriation and participation in the political, economic, social and cultural” (p.31), becoming a relevant component through the generation of knowledge and the linking of higher education institutions (HEIs) with society as a whole.
Relevant answer
Answer
This article presents how to establish a working system for achieving the scientific results that should characterize a university.
I hope you find it useful.
  • asked a question related to Speech
Question
1 answer
I would like to detect my participants' emotional state and categorize how engaging their speeches are according to different metrics. I am wondering whether any programs categorize the "emotion" and speech quality of speakers, so that we can focus on higher-level analyses.
Relevant answer
Answer
For emotion detection I ended up finding this link:
and this survey:
Marechal, C., Mikolajewski, D., Tyburek, K., Prokopowicz, P., Bougueroua, L., Ancourt, C., & Wegrzyn-Wolska, K. (2019). Survey on AI-Based Multimodal Methods for Emotion Detection.
However, I still haven't found anything to detect how engaging the speech is.