Audio Analysis - Science topic
Explore the latest questions and answers in Audio Analysis, and find Audio Analysis experts.
Questions related to Audio Analysis
Transana is software for qualitative research that enables researchers to work with authentic audio- or video-recorded data. If anyone has employed this tool, I would be interested in learning more about their experience with it and its effectiveness in managing and analysing complex qualitative data in their studies. Furthermore, the software comes in several versions, so I am unsure which one to use.
I would like to get them to make a vocal sound related to a texture. They would use their voice to answer the question, and I would collect the audio recordings to use as research data in my paper.
Thank you,
Colm
Hello everyone,
I am looking for links to Mexican datasets that can be used for classification tasks in machine learning. Preferably, the datasets have been described in scientific journals.
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
I have a project in which I have been given a dataset (more than enough data) of 10-20 second audio files (singing these "swar" / "ragas": "sa re ga ma pa") without any labels or other annotation, and I have to create a deep learning model that will recognise which swar is sung and for how long it is present in the audio clip (the time range of each particular swar: sa, re, ga, ma).
The answers to the questions that I am looking for are:
1. How can I achieve my goal? Should I use an RNN, CNN, LSTM, a hidden Markov model, or something else such as unsupervised learning for speech recognition?
2. How can I get the correct speech tones for an Indian language, since most acoustic speech recognition models are tuned for English?
3. How can I find the time range, i.e., over which interval a particular sound with a particular "swar" is present in the music clip? How do I add that time-range recognition to the speech recognition model?
4. Are there any existing music recognition models which resemble my research topic? If yes, please tag them.
I am looking for a full guide for this project as it is completely new to me; people who are interested in working with me or guiding me are also welcome.
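As a starting point only (not a full solution to the question above), here is a minimal sketch of frame-level feature extraction on which a CNN/RNN/LSTM or HMM could be trained. It assumes the librosa package and an illustrative file name "clip.wav"; both are my assumptions, not part of the original question.

```python
# Frame-level MFCC features: one feature vector per ~23 ms frame, plus the
# frame time stamps that a frame-wise swar classifier would need.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=22050)            # mono audio, resampled
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            hop_length=512)            # shape: (13, n_frames)
times = librosa.frames_to_time(np.arange(mfcc.shape[1]),
                               sr=sr, hop_length=512)

# A frame-level classifier (e.g., an LSTM over the MFCC sequence) would output
# one swar label per frame; contiguous runs of the same label then give the
# time range of each swar.
print(mfcc.shape, times[:5])
```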
Hello everyone,
I am looking for links to audio datasets of indigenous Mexican languages that can be used for classification tasks in machine learning.
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
Hello everyone,
I am looking for links to audio datasets that can be used for classification tasks in machine learning. Preferably, the datasets have been described in scientific journals.
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
This Project continues as an examination of ALL-PASS Band-Pass circuits.
Project Paper : updated Feb 01, 2022
"Analog Phase-Filtering
in Active-Band-Pass Circuits"
emphasizing the use of All-Pass filters.
- - - Here, we continue our earlier "AFX" Project, which was presented in RGN at :
.......
Introduction for the "AFC" project :
...
We examine the "ALL-PASS FILTER" and develop an Analog Narrow-Band-Pass Audio Filter, which has an immediate application in receiving Morse Code signals in an Amateur Radio Station.
...
Our resulting model is an experiment intended to gather these data.
A proper analysis of this design may aid in understanding the nature of All-Pass filtering. Once an adequate system equation is achieved, the resulting models may be useful in designing Band-Pass Filters for audio applications based on non-resonant, phase-filtered circuits, similar to our "AFX" design.
...
Theory:
All-Pass (phase-shifting) filters have frequency responses which must be "zero at w=0 and at w=pi". According to the literature, this means that All-Pass filters cannot be used for (1) Low-Pass, (2) High-Pass, or (3) Band-Pass designs.
This is because the resulting combinations of waveforms are homogeneous;
i.e., the combinations are always simple phase shifts,
producing no changes in frequency or amplitude.
...
*** The authors have developed working Dual-Notch Band-Pass circuits
which (1) perform a Band-Pass function peaked at f(0) = 700 Hz and
(2) generate dual notches around f(0) at approximately ±200 Hz. The current All-Pass project is titled "AFC".
...
*** First Experimental Target :
(1) Utilize All-Pass stages to replace resonance-tuned Active Band-Pass stages.
(2) Reduce the number of MFB active filter stages required to align signal phases,
(a) in order to support dual-notch generation around f(0); (b) in support of our previous project "AFX" ("AFV-3RL-v4F-D-vQ-Man").
...
The continued project now uses the schematics in the groups:
AFC_1R-1A-12A-2F-Sum-S-451 and AFC-3R-2F-8A-Dif-S-451 .
The Bode plot and Magnitude plot are in the pre-paper.
...
The problem to be resolved is why this design, (1) using one All-Pass Low-Pass stage paralleled with twelve All-Pass High-Pass filters, (2) produces the waveform output seen in the Bode plot; in other words, why do one APF Low-Pass and twelve APF High-Pass stages in parallel interact in such an unfamiliar manner?
...
This "AFC" project is derived from our previous "AFX" project
...
Our long series of projects in Analog Narrow Band-Pass Filters has been presented on our website at : http://www.geocities.ws/glene77is/
...
2021 Oct 12
...This Project continues as an examination of ALL-PASS Band-Pass circuits. ...This "AFC" project is derived from our previous "AFX" project https://www.researchgate.net/post/Are-there-any-Analog-Active-Audio-Filters-that-match-any-Digital-Signal-Processing-filters.
...
Latest upload: 2021 Oct 26
We have a paper attached : "AFC_All-Pass_Phase-Filter_Paper.pdf"
...
Latest upload: 2021 Nov 29
"AFC_All-Pass_Phase-Filter_Proj-211129-0502"
...





Hi everyone,
My teammates and I want to find out whether there is a way to do (remote) scientific collaboration in the field of Machine Learning / Deep Learning on speech recognition and audio analysis. The goal is only to learn and to become a member of our project.
Thanks in advance.
I am trying to build a voice cloning model. Is there some scripted text I should use for this purpose, or can I speak anything at random?
What should the length of the audio be, and are there any model suggestions that are fast or accurate?
The work of George and Shamir describes a method that uses the spectrogram as an image for classifying audio recordings. The method described is interesting, but the results seemed to me somewhat fitted to the chronology rather than to the properties of the spectrogram itself. The spectrogram gives limited information about the audio signal, but is it enough for a classification method?
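For reference, the spectrogram-as-image idea can be reproduced in a few lines. This is only an illustrative sketch; the librosa/matplotlib packages, the file name and the mel/dB scaling are my assumptions, not necessarily what George and Shamir used.

```python
# Turn an audio recording into a log-mel spectrogram image that an ordinary
# image classifier (e.g. a CNN) could consume.
import librosa
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("record.wav")
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)             # log-magnitude "image"

plt.imsave("record_spec.png", S_db, origin="lower", cmap="magma")
```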
Hi guys,
is there any option in the AVISOFT SASLab Pro software which enables you to eliminate unwanted noise from a digital recording without affecting your original sound? In my case, sounds are recorded in experimental tanks with a hydrophone connected to a digital audio recorder. The lab is full of low-frequency noise which, in some proportions, disrupts my sound of interest. If I high-pass filter the recording, there is still noise which is not eliminated and which overlaps with the frequency spectrum of the sound of interest.
Any advice would be helpful.
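In case it helps, here is a hedged sketch outside SASLab Pro (Python/scipy) of combining a high-pass filter with a narrow band-stop at a known noise line; the cut-off values and file names are placeholders, not recommendations. Noise that overlaps the signal band cannot be removed this way; spectral subtraction using a noise-only segment would be the next thing to try.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

fs, x = wavfile.read("tank_recording.wav")
x = x.astype(np.float64)

sos_hp = butter(4, 100, btype="highpass", fs=fs, output="sos")        # remove rumble below 100 Hz
sos_bs = butter(4, [48, 52], btype="bandstop", fs=fs, output="sos")   # e.g. mains hum
y = sosfiltfilt(sos_bs, sosfiltfilt(sos_hp, x))

wavfile.write("tank_recording_filtered.wav", fs, y.astype(np.int16))  # assumes 16-bit input
```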
[Tell us about the issues that you had while developing impedance tubes]
[The issue that I had has been solved, but I didn't manage to fully understand why, since I changed both my measurement system and my data analysis script.]
[Nevertheless, I would like you to tell us about the issues that arose while developing an impedance tube since it could provide reference information for other researchers]
I developed an impedance tube to measure the sound absorption coefficient with the ISO 10534-2 method. While computing the reflection coefficient and the absorption coefficient from transfer-function data or from audio files (obtained with ARTA, or from recordings of white noise, sine sweeps, or MLS inside the tube; the transfer function is in dB), I obtained negative reflection coefficients or values outside the usual bounds (see images). Any ideas on possible sources of error, or on necessary preprocessing of the signals or of the transfer function?
Complex reflection coefficient: R = [(H - e^(-j*k*s)) / (e^(j*k*s) - H)] * e^(2*j*k*(L+s))
Absorption coefficient: alpha = 1 - |R|^2
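A small self-checking sketch of this computation is below (numpy; the geometry values are examples). One point worth stressing: the formula needs the complex transfer function H (magnitude and phase); a dB magnitude alone is not sufficient and can easily produce reflection coefficients outside the physical bounds.

```python
import numpy as np

c, s, L = 343.0, 0.05, 0.10          # speed of sound [m/s], mic spacing [m], sample-to-mic distance [m] (example geometry)
f = np.linspace(100, 2000, 500)      # frequency axis [Hz]
k = 2 * np.pi * f / c                # wavenumber

# Synthetic complex H = p2/p1 built from a known R, so the script checks itself;
# in practice H comes from the measurement (complex-valued).
R_true = 0.6 * np.exp(1j * 0.3)
Rp = R_true * np.exp(-2j * k * (L + s))
H = (np.exp(-1j * k * s) + Rp * np.exp(1j * k * s)) / (1 + Rp)

R = (H - np.exp(-1j * k * s)) / (np.exp(1j * k * s) - H) * np.exp(2j * k * (L + s))
alpha = 1 - np.abs(R) ** 2
print(np.allclose(R, R_true), alpha.min(), alpha.max())   # True; alpha stays within [0, 1]
```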

It is well known that audio compression (e.g., MP3, AAC) usually processes the audio data frame by frame. However, I am curious about the feasibility of single-frame processing.
A commonly accepted notion is that frame-based processing preserves the time resolution of the audio data while single-frame processing does not. This is similar to comparing the DFT and the STFT.
However, why do we need time resolution of the audio signal during compression? For a given audio clip, its single-frame FFT has very fine frequency resolution (a huge number of points) and no time resolution. However, we can still calculate tonal and non-tonal components, masking curves, quantization indices, etc. In this way, a modification of any frequency bin will be reflected throughout the time domain, wherever that frequency appears along the time axis in the decoded time-domain audio samples.
I personally do not see any potential problems with performing single-frame compression as described above. The only problem I can imagine concerns the hardware implementation of huge DCT sizes. But the computational complexity of the FFT is O(n log n), which approaches a linear function of n when n is large, so I do not see this as a big problem given rapidly developing computing capabilities.
Please point out the mistakes in the statements above.
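One small numerical illustration of where the lack of time resolution can bite (this is essentially the pre-echo problem that frame-based codecs control with window switching): modifying bins of a whole-clip FFT changes the signal over its entire duration, including intervals where the affected frequency was never present. The values below are arbitrary; only NumPy is assumed.

```python
import numpy as np

fs = 8000
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * 1000 * t) * (t > 2)        # 1 kHz tone present only after t = 2 s

X = np.fft.rfft(x)
f = np.fft.rfftfreq(len(x), 1 / fs)
X[np.abs(f - 1000) < 2] = 0                        # "quantize away" the 1 kHz bins
y = np.fft.irfft(X, n=len(x))

print(np.max(np.abs((x - y)[t < 2])))              # clearly nonzero: the edit leaks into t < 2 s
```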
Hello everyone. I'm working on audio analysis for emotion classification. I'm using parselmouth (a Praat integration in Python) to extract features. I'm not well versed in audio analysis; I'm a beginner. After reading many papers and forums, I see MFCCs are used for this, and I've also discovered some other features (jitter, shimmer, HNR, F0, zero-crossing rate); are they used for this kind of work?
What do I have to do with the audio files before extracting MFCCs and these other features?
After getting these features, I have to predict emotion using machine learning.
It'll involve:
- The algorithm must be able to make predictions in real time or near real time.
- Taking into account the sex and the neutral voice of each person (for example, by reducing and centering the model variables so as to consider only their variations with respect to the mean; this mean will change value as the sequential analysis proceeds, since it will first be calculated between 0 and 1 second, then between 0 and 2 seconds, etc.).
Any help and suggestions on best practice are welcome.
Thanks
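For what it is worth, here is a minimal parselmouth sketch of the features mentioned above (F0, HNR, jitter, shimmer, MFCCs). The file name and the pitch-analysis settings (75-500 Hz) are illustrative assumptions, and the exact Praat command names and parameters should be double-checked against the Praat/parselmouth documentation.

```python
import numpy as np
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("utterance.wav")

pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
f0_mean = np.nanmean(np.where(f0 > 0, f0, np.nan))           # mean F0 over voiced frames

hnr = call(snd.to_harmonicity(), "Get mean", 0, 0)

pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter = call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, pp], "Get shimmer (local)", 0, 0, 0.0001, 0.02, 1.3, 1.6)

mfcc = snd.to_mfcc(number_of_coefficients=12).to_array()      # coefficients x frames

print(f0_mean, hnr, jitter, shimmer, mfcc.shape)
```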
I would like to outsource the transcription of interviews (around 20-30h of audio recording/IDI). Do you have any experience with Polish companies in that field? Can you recommend any?
I would be really grateful :)
I came across a few links that looked promising, but they are no longer active.
What types of models, analytics, and data science are used by call-center companies where a large number of calls are made every day?
Does it involve audio analytics, or speech-to-text conversion followed by text analysis? Which approach is better, and what are the pros and cons of each?
Any suggestion, discussion or reply is appreciated. Thanks in advance!
I have two frequency spectra as shown in the attached picture. The shapes of the peaks are similar, but they are slightly shifted in frequency. I want to match the frequencies with similar peaks.
I tried the DP matching algorithm and backtraced the least-cost path to find the frequencies that are most similar. I have attached that image too. I was intending to insert/delete/replace the amplitudes of these matching frequencies so that a score can be calculated between these two spectra that is not influenced by these peaks. But looking at the least-cost path output, there are some one-to-many mappings between the features (especially from the reference to the test pattern), and I am unable to understand how to interpret them.
Is it possible to extract features that have similar peaks, such as in the first picture? If DP matching can do it, how do I apply the method?
Thank you.
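For context, here is a bare-bones dynamic-programming (DTW-style) alignment between two spectra, which is one common reading of "DP matching". One-to-many steps are inherent to this kind of alignment: a peak in one spectrum can legitimately map to several bins of the other when the peak widths differ. The toy spectra below are placeholders.

```python
import numpy as np

def dtw_path(a, b):
    """Align sequences a and b with a squared-difference cost; return (distance, path)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m                      # backtrace the least-cost path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

# Toy example: two spectra with a slightly shifted peak.
ref = np.exp(-0.5 * ((np.arange(100) - 40) / 3.0) ** 2)
tst = np.exp(-0.5 * ((np.arange(100) - 44) / 3.0) ** 2)
dist, path = dtw_path(ref, tst)
print(dist, path[:5])
```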


I want to compare "neutral" baseline data with data recorded in a test session, in order to finally be able to evaluate the arousal/affect of the infant.
Which software would you recommend? Do you have any literature advice?
Any advice would be appreciated!
All the best
Sam
Hi,
Does anyone have experience in synchronizing audio and video recording using a single DAQ device?
In my setup I'm using two devices, one for video recording (pointgrey camera fl3 u3 13s2m cs, https://www.ptgrey.com/flea3-13-mp-color-usb3-vision-sony-imx035-camera) and one for audio recording of rat USVs (Ultrasoundgate 416H, http://www.avisoft.com/usg/usg416h.htm, 4 microphone channels), which are connected to the same computer but started by two different software programs. What I'm trying to do is find a way to start both recordings at the same time so as to synchronize the two data streams (video and sound). The goal is to know precisely when sounds occur during the video.
I'm completely new to this kind of task and to the field of data acquisition, so any help is truly appreciated.
- Do you think that connecting both recording systems to the same DAQ device will allow me to solve this issue? If so, once both systems are connected to the DAQ, can I start simultaneous recording? How?
- What type of DAQ device would be better for this task? Could you give me some suggestions?
- What method of synchronization should be performed? I read about start-trigger synchronization and sample-clock synchronization, but I'm not sure which of them I need to use.
-Once the recording has been done, will I have 2 different files as output? (one for audio and one for the video?)
Please tell me if you need more information.
Thank you very much.
I have experimentally recorded the Sound Pressure Levels of a horn. The SPLs have also been obtained through simulation in LMS, but the output of LMS is a spectrum in an Excel file. I want to convert this Excel spectrum into an audible sound in order to make a psychoacoustic characterisation. How can I do it in MATLAB or any other available resource?
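One common approach is to treat each spectral line as a sinusoid with the given SPL and a random phase, sum them into a time signal, and write it to a WAV file. The sketch below does this in Python (as an alternative to MATLAB); the column names, sample rate, duration and reference level are assumptions about the Excel export, and absolute SPL calibration is lost in the final normalisation.

```python
import numpy as np
import pandas as pd
from scipy.io import wavfile

fs, dur = 44100, 3.0
t = np.arange(int(fs * dur)) / fs

df = pd.read_excel("horn_spectrum.xlsx")       # assumed columns: "freq_Hz", "SPL_dB"
freqs = df["freq_Hz"].to_numpy()
amps = 10 ** (df["SPL_dB"].to_numpy() / 20)    # dB -> linear (relative) amplitude

rng = np.random.default_rng(0)
x = np.zeros_like(t)
for f, a in zip(freqs, amps):
    x += a * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))

x /= np.max(np.abs(x))                         # normalise to avoid clipping
wavfile.write("horn_resynth.wav", fs, (0.9 * x * 32767).astype(np.int16))
```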
I am seeking your responses for my research project. I am interested in your voice relating to the Indigenous community. Any help will be greatly appreciated.
Dear all,
I need to automatically analyse conversational features such as the amount of time each person speaks, the amount of overlapping speech, the number of interruptions, who speaks louder, and so on.
I have separate audio files for each participant (only with his/her voice). How can I analyse these features automatically? Is there any tool that eases such analysis?
Thanks in advance
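If it is useful, here is a minimal energy-based sketch that works from one clean WAV per participant sharing the same time base: a simple RMS threshold marks speech frames, and from those masks one can derive speaking time, overlap, and who is louder. The threshold, frame length and file names are placeholders; real recordings usually need a proper voice-activity detector rather than a fixed threshold.

```python
import numpy as np
from scipy.io import wavfile

def speech_mask(path, frame=0.05, thresh_db=-35):
    fs, x = wavfile.read(path)
    x = x.astype(np.float64) / 32768.0                      # assumes 16-bit PCM
    n = int(frame * fs)
    frames = x[: len(x) // n * n].reshape(-1, n)
    rms_db = 20 * np.log10(np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12)
    return rms_db > thresh_db, rms_db, frame

mask_a, db_a, frame = speech_mask("speaker_a.wav")
mask_b, db_b, _ = speech_mask("speaker_b.wav")
n = min(len(mask_a), len(mask_b))

print("A speaks (s):", mask_a[:n].sum() * frame)
print("B speaks (s):", mask_b[:n].sum() * frame)
print("overlap (s):", (mask_a[:n] & mask_b[:n]).sum() * frame)
print("A louder on average:", db_a[:n][mask_a[:n]].mean() > db_b[:n][mask_b[:n]].mean())
```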
Should the mean amplitude be the average of SPL values taken at regular intervals?
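One point worth illustrating: averaging SPL values directly in dB is not the same as averaging in the linear (power) domain, which is what the energy-equivalent level Leq does. The example values below are arbitrary.

```python
import numpy as np

spl = np.array([60.0, 62.0, 75.0, 61.0])           # SPL readings at regular intervals, dB

arithmetic_mean = spl.mean()                         # simple average of the dB values
leq = 10 * np.log10(np.mean(10 ** (spl / 10)))       # average in the power domain

print(arithmetic_mean, leq)                          # 64.5 dB vs. ~69.5 dB
```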
I am wondering if there is a comprehensive review on feature construction, selection and classification for audio classification tasks (not necessarily music classification).
I am more interested in a problem where I am recording audio on a machine and let's say there is a fault in the machine, can I pick that up automatically using audio classification?
Thanks!
Sumeet
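Regarding the machine-fault idea, here is a minimal sketch of "fault vs. normal" classification from clip-level MFCC statistics; the file list, labels, feature set and classifier are placeholders, not recommendations, and the librosa/scikit-learn packages are assumed.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def clip_features(path):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])   # clip-level summary

files = ["ok_01.wav", "ok_02.wav", "fault_01.wav", "fault_02.wav"]  # placeholder file names
labels = np.array([0, 0, 1, 1])                                     # 0 = normal, 1 = fault

X = np.vstack([clip_features(f) for f in files])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=2))
```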
Compression has been used for audio and video files for a long time. How can it be handled efficiently in a database?
The following are the details of the wave file:
sampling rate: 48 kHz
16-bit, 10-second duration, 480,000 samples, encoded in PCM format.
I am trying to reconcile some issues about the sounds bumblebees make while flying and while sonicating pollen from anthers. This link is interesting, with good recording quality: https://www.youtube.com/watch?v=yrjLZ_UYUl4 Now the quandary: It is sometimes said, sometimes with great authority, that the sound bumblebees make while sonicating anthers is Middle C (C4) at 262 Hz. It is also said, again sometimes with great authority, that the wing-beat frequency of a bumblebee is 200 Hz. That would translate to a sound of 400 Hz (one compression on each of the upstroke and downstroke of the wing), which is close to G4 (392 Hz) on a piano. The sonication vibration from the thorax of a worker of Bombus impatiens has been recorded by vibrometer at about 350 Hz, but does that translate to an F4 as a sound? It is clear, even to my ear, that the flight sound is at a much lower pitch than the sonication sound. Thus, there is something wrong with some of the conventional ideas about the sounds that bumblebees make. Perhaps one of our musically adept entomologists can listen to the sounds on the link and suggest clarifications as to sounds (notes and Hz) and wing and/or thoracic vibrations. Thank you, all. Peter
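A small helper for the note arithmetic above (equal temperament, A4 = 440 Hz) may make the comparisons easier to check: it reproduces 262 Hz -> C4 and ~350 Hz -> F4, and shows that 400 Hz falls nearest G4 (392 Hz). The frequencies in the loop are simply the values discussed above.

```python
import numpy as np

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz):
    midi = int(round(69 + 12 * np.log2(freq_hz / 440.0)))   # MIDI note number
    return f"{NAMES[midi % 12]}{midi // 12 - 1}"

for f in (200, 262, 350, 400, 440):
    print(f, "Hz ->", nearest_note(f))
```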
I am working in steganography and I want the top algorithms in the audio domain. I want to improve security; capacity is not important.
I want to know how to measure the frequency and intensity of cricket chirps efficiently. What do you think is the best way of doing so, and is a sound meter enough for the task?
I want to select an optimal window for the STFT for different audio signals. For a signal with frequency content from 10 Hz to 300 Hz, what would be an appropriate window size? Similarly, for a signal with frequency content from 2000 Hz to 20000 Hz, what would be the optimal window size?
I know that a window size of 10 ms gives a frequency resolution of about 100 Hz. But if the frequency content of the signal lies between 100 Hz and 20000 Hz, will 10 ms be an appropriate window size, or should we choose some other window size because of the 20000 Hz content in the signal?
I know the classic "uncertainty principle" of the Fourier Transform. You can either have high resolution in time or high resolution in frequency but not both at the same time. The window lengths allow you to trade off between the two.
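The trade-off can be made concrete with the rule of thumb that the bin spacing is roughly 1/window_length (the exact main-lobe width also depends on the window type, e.g. Hann). The numbers below are illustrative.

```python
for win_ms in (10, 50, 100, 400):
    T = win_ms / 1000.0
    print(f"{win_ms:4d} ms window -> ~{1 / T:6.1f} Hz resolution")

# ~100 Hz resolution (10 ms) is usually fine for separating content in the
# 2-20 kHz range, but to resolve components only a few Hz apart between
# 10 and 300 Hz a window of 100-400 ms (or longer) is needed, at the cost of
# time resolution.
```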
As monaural auditory thresholds may differ, how can we measure binaural loudness discomfort levels by presenting pure tones through earphones?
Hello everyone, I am doing my thesis, entitled "Filterless Class-D Amplifier". First I used the simple PWM scheme, and I found the output glitchy and out of phase with the input audio signal; when I measured the output frequency, it was lower than the input signal's frequency (<20 kHz). My goal is to develop a filterless Class-D amplifier that amplifies the amplitude while preserving the input frequency (which is 20 kHz).
I am able to access the transcripts but I am unable to access the audio files even on free online corpora webpages. Could anyone tell me how to access both transcripts as well as audio files together?
I am working with an audio sound profile. I want to analyse the frequency content of that sound, and I am using the wavpad sound editing software for the frequency analysis. I have generated a frequency-versus-time graph, but the sound shows multiple frequencies at a time, so I am not able to generate a clean graph or to find the frequency range of the sound. Can you tell me how I can analyse these frequencies?
For example, if we want to transmit audio or video streaming, how can we calculate the signal and channel bandwidth?
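A back-of-the-envelope sketch for an uncompressed audio stream is below; codec choice, container overhead and channel coding change the real numbers considerably, and the spectral-efficiency figure at the end is only an example.

```python
sample_rate = 48_000       # Hz
bit_depth = 16             # bits per sample
channels = 2

audio_bit_rate = sample_rate * bit_depth * channels       # bits per second
print(audio_bit_rate / 1e6, "Mbit/s")                      # 1.536 Mbit/s

# The required channel bandwidth in Hz then depends on the modulation's
# spectral efficiency (bit/s per Hz): e.g. 1.536 Mbit/s at 4 bit/s/Hz ~ 384 kHz.
```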
I want to extract the pitch of many files (<100) using Wavesurfer and the RAPT method. I know it is possible to generate a file with the pitch information by opening the audio file and choosing the Save Data File. But I want to perform that automatically. Does anyone know how to perform this?
Thank you very much.
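As a hedged alternative to scripting the WaveSurfer GUI, a batch loop in Python is sketched below, assuming the pysptk package (whose rapt() function is an implementation of the same RAPT algorithm). The parameter values and output format are illustrative, and the exact function signature should be checked against the pysptk documentation.

```python
import glob
import numpy as np
from scipy.io import wavfile
import pysptk

for path in glob.glob("*.wav"):
    fs, x = wavfile.read(path)
    f0 = pysptk.rapt(x.astype(np.float32), fs=fs, hopsize=int(0.01 * fs),
                     min=60, max=400, otype="f0")          # one F0 value per 10 ms hop
    np.savetxt(path.replace(".wav", ".f0"), f0, fmt="%.2f")
```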
I am considering doing discourse analysis of presidential speeches as part of a larger research project. I need a transcription from audio files to code the text in Discourse Network Analyzer for later Social Network Analysis.
Which software could do the transcription as accurately as possible? I am a Mac user, by the way. Has anyone experimented with oTranscribe?
I have an audio file and also the text data for that audio. I want to map the text to the audio, or in short, to highlight the text along with the audio stream.
I don't want to use text-to-speech, as my audio has some background music. (Android)
I am currently working on a project which requires me to characterise deviations from a baseline using acoustics. I have already generated the frequency spectra of both signals, but I am having trouble comparing them to see if there is any difference.
Any help?
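One simple way to quantify the difference between two magnitude spectra measured on the same frequency axis is the log-spectral distance together with a correlation of the dB spectra; a sketch follows, where the two arrays are placeholders for the baseline and test spectra.

```python
import numpy as np

f = np.linspace(0, 5000, 1024)
baseline = np.abs(np.random.default_rng(0).normal(size=f.size)) + 1.0   # placeholder spectrum
test = baseline * (1 + 0.1 * np.sin(f / 300.0))                         # placeholder "deviation"

eps = 1e-12
lsd = np.sqrt(np.mean((20 * np.log10((baseline + eps) / (test + eps))) ** 2))
corr = np.corrcoef(20 * np.log10(baseline + eps), 20 * np.log10(test + eps))[0, 1]

print("log-spectral distance (dB):", lsd, " correlation:", corr)
```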
Because until recently many scientists have not fully appreciated how widespread and important fish sounds are in the marine soundscape, I wonder if sounds produced by fishes that are being preyed upon by cetaceans could be mistaken for cetacean sounds in some, probably rare, cases. Fish often make sounds only under particular conditions, such as when attacked by a predator, so you would only hear that sound in that circumstance, hence the possibility of mistaken identification. To be sure, most fish sounds have much more limited detection ranges than cetacean sounds. But shouldn't scientists reporting new sounds at least consider the possibility?
Hi all,
I need some help to use WEKA. The study which I am conducting researches if musical features of a song (such as the tempo or the key) are able to predict if a song will end up high or low in the charts. A logistic regression and discriminant analyses were conducted. In the next part of my study, I wanted to split the file on key (major and minor) and see if the other musical features are able to predict if a song will end high or low in the charts, when the data is split on key. In SPSS split file on key was easy, but how can I also do this in WEKA? So, what I am trying to figure out with this analysis is how songs which have a major key can predict if a song will end up high or low in the chart by using the other musical features and the same for minor key. Thanks for your help!
Not the freewares like PRAAT. For sleep deprivation studies.
Are there any java wrappers available for Praat? If so, which is the best one in terms of speed and functionality? I ask this because the Praat scripts that I have written take far too long to execute over my entire dataset of several audio files, and I was therefore wondering whether there may be a java wrapper for Praat which would allow me to execute all the functions of Praat through Java in a shorter time.
I'm looking for a good tool to extract audio features like Mel-frequency, energy, etc. from a sound file. As my final aim is to extract the emotion of the speaker in the audio, it would be most preferable if I could have a tool that already does basic emotion extraction. I have come across some tools like:
and
YAAFE - http://yaafe.sourceforge.net/
which could be useful for this task, but I have found that their user base is not very large, and so the tools themselves do not seem to be very user-friendly. Also, since I have not yet started working with them, I wanted to know whether there are any better tools available that do the same task in a better or easier way.
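For comparison with such tools, a small librosa sketch of the kinds of features mentioned (MFCCs, energy) is shown below; emotion recognition itself would still require a labelled corpus and a classifier on top of these features, and the file name is only an example.

```python
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=None)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # (13, n_frames)
rms = librosa.feature.rms(y=y)                                # frame energy
zcr = librosa.feature.zero_crossing_rate(y)                   # zero-crossing rate

features = np.concatenate([mfcc.mean(axis=1), rms.mean(axis=1), zcr.mean(axis=1)])
print(features.shape)
```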
I understand that the time-domain representation of white noise looks like a sequence of impulses.
What do the autocorrelation functions look like for coloured noise?
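A quick numerical illustration: the autocorrelation of white noise is (ideally) a single spike at lag 0, while low-pass filtered ("coloured") noise has an autocorrelation that decays over several lags. The AR(1) filter below is just one convenient way to generate coloured noise.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
white = rng.normal(size=100_000)
coloured = lfilter([1.0], [1.0, -0.9], white)      # simple AR(1) "coloured" noise

def autocorr(x, max_lag=5):
    x = x - x.mean()
    return np.array([np.dot(x[:-lag or None], x[lag:])
                     for lag in range(max_lag + 1)]) / np.dot(x, x)

print("white:   ", np.round(autocorr(white), 3))     # ~[1, 0, 0, ...]
print("coloured:", np.round(autocorr(coloured), 3))  # ~[1, 0.9, 0.81, ...]
```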
Is there any method or software for cleaning signals of music, or for separating speech from music?
Do we need to address each and every byte of the file, or only the starting and ending addresses? How do we read/write an MP3 file into memory?
There are a few good spectral editing programs available for Windows, but the process of isolating the different constituent sounds is strictly manual, and can be extremely tedious and difficult, often with disappointing results. So, is anyone aware of any software or plugin that can analyse a recording and automatically identify and isolate all of the various unique waveforms (i.e., the vocals and individual instruments) with a high degree of accuracy, so that they could then be placed on separate tracks, for example, in order to enable subsequent mixing into stereo? (I would imagine that this would be analogous to the "edge detect" effect in graphics editing software. Is this analogy correct?)
Most attempts to produce pseudo-stereo from mono recordings have historically used various tricks such as time delays, EQ adjustments, reverb, comb filters, etc., invariably with unsatisfactory results such as clearly noticeable artifacts and phase errors. This kind of pseudo-stereo is totally unrealistic, and could never be mistaken for true stereo.
However, utilizing spectral editing software, with accurate isolation and rendering of all constituent sonic waveforms, one can theoretically produce a result which is indistinguishable from true stereo, because it is in actuality a true stereo mix, having been constructed from individual tracks that each contain only one isolated component sound. These tracks are equivalent to the output of a multi-track machine.
The automation of the process of sonic detection and isolation would enable and greatly facilitate the production of these virtually flawless mixes.
I have a huge audio dataset (m x n), with m instances and n features, on which I would like to perform Principal Component Analysis. Is there any method to use a separate validation dataset to choose the number of PCs, k? The training reconstruction error monotonically decreases and reaches 0 when all the PCs are taken (k = n). I know that we can set a cut-off for the percentage of variance explained, but is there any other way, using a validation dataset?
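One possible validation-based criterion is to choose k by the performance of the downstream task on a held-out set, rather than by reconstruction error (which indeed keeps decreasing as k grows). The sketch below uses synthetic data and a toy classifier purely as placeholders for the audio features and the actual model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=60, n_informative=15, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for k in (2, 5, 10, 20, 40, 60):
    pca = PCA(n_components=k).fit(X_tr)                # fit PCA on training data only
    clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_tr), y_tr)
    scores[k] = clf.score(pca.transform(X_va), y_va)   # validation accuracy for this k

best_k = max(scores, key=scores.get)
print(scores, "-> chosen k:", best_k)
```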
I want to extract the temporal envelope of a speech signal.
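A common sketch of this is the Hilbert envelope followed by a low-pass filter; the 20 Hz cut-off and the file name below are assumptions (cut-offs of a few tens of Hz are typical for the temporal envelope of speech).

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, butter, sosfiltfilt

fs, x = wavfile.read("speech.wav")
x = x.astype(np.float64)

envelope = np.abs(hilbert(x))                                 # instantaneous amplitude

sos = butter(4, 20, btype="lowpass", fs=fs, output="sos")
envelope_smooth = sosfiltfilt(sos, envelope)                  # smoothed temporal envelope
```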
I am working on a chapter about the importance of high quality sound in virtual restorative environments (for healthcare applications) and I am looking for studies that investigate the quality of audio on relaxation or any other dependent variable. Is anyone aware of any such studies? Note that the VR factor is not important at the moment, just the effect of low/high quality of audio on perception. If it is in a healthcare or relaxation field that's great but not essential. Any leads would be greatly appreciated.
Two omnidirectional microphones are used to form a gradient microphone (Faller 2010) to obtain a stereo signal. When I use two omnidirectional microphones, I get a good stereo effect, but when I use two other microphones, the stereo effect is much worse. Maybe it is because of the difference between the frequency responses of the two microphones. I compensated the spectra of the two microphones so that they have the same frequency response, but the result is not as good as expected. Are there any other reasons for this problem? And are there any methods to enhance the stereo effect?
Similar to the NU-6 auditory test.
If anyone has any information it would be greatly appreciated
A vocal tract replica has been excited with a sine sweep in an anechoic chamber, in order to compute the transfer function of the tract. Does the impulse response recorded by the mic need to be convolved with an inverse function of the sine sweep?
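In the exponential-sweep (Farina) approach, the raw microphone recording (not yet an impulse response) is convolved with an "inverse sweep" — the time-reversed excitation with a 6 dB/octave amplitude compensation — to obtain the impulse response, from which the transfer function follows by FFT. A minimal sketch follows; the sweep parameters are assumptions and the recorded signal is a placeholder.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000
T, f1, f2 = 5.0, 50.0, 16000.0                 # sweep length [s] and band [Hz] (assumed)
t = np.arange(int(T * fs)) / fs
R = np.log(f2 / f1)
sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1))   # exponential sweep

# Inverse filter: time-reversed sweep with exponential amplitude compensation.
inv = sweep[::-1] * np.exp(-t * R / T)

recorded = sweep.copy()                        # placeholder for the microphone signal
ir = fftconvolve(recorded, inv, mode="full")   # impulse response (delayed by len(sweep))
ir /= np.max(np.abs(ir))
H = np.fft.rfft(ir)                            # transfer function of the replica
```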