Questions related to Acoustic Signal Processing
I am working on an echo removal project. So far, I have successfully identified the 21 ms far-end signal, sampled at 48000 Hz, whose echo is present in my 21 ms near-end signal. I did this using echo detection and delay estimation based on a pattern recognition approach and cepstral correlation.
Now, I want to remove that echoed far-end signal from my near-end signal, which contains both the far-end echo and the near-end voice.
Things I tried:
- Time-domain subtraction of the PCM signals, i.e. output[n] = near_end[n] - far_end[n]
- Spectral subtraction to eliminate signal A from signal B, and even the Ephraim-Malah method
In both cases, I am not getting the expected result. For spectral subtraction, I have read that it works well when the noise is static or one of the signals is stationary; for non-stationary signals it does not work well.
What are the other techniques to remove the echo in my scenario? Since I have identified the far-end chunk whose echo is present in the near end chunk, I just want to remove it from near end chunk.
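For reference, the simplest version of what is described above can be written as a single-tap canceller: estimate one echo gain over the identified chunk by least squares, then subtract the scaled, delayed far-end signal. This is only a sketch under the assumption of a single pure delay (the helper name cancel_echo is hypothetical); a real acoustic echo path is a longer filter, which is why adaptive schemes such as NLMS are the usual next step.

```python
# Minimal single-tap echo subtraction sketch (illustrative only).
# Assumes the far-end chunk, its delay in samples, and the near-end
# chunk are already known; a real canceller would use an adaptive
# filter (e.g. NLMS) to track a full echo path.

def cancel_echo(near_end, far_end, delay):
    """Subtract a scaled, delayed copy of far_end from near_end."""
    # Least-squares estimate of the echo gain over the overlap.
    num = den = 0.0
    for n in range(delay, len(near_end)):
        ref = far_end[n - delay]
        num += near_end[n] * ref
        den += ref * ref
    gain = num / den if den else 0.0
    out = list(near_end)
    for n in range(delay, len(near_end)):
        out[n] -= gain * far_end[n - delay]
    return out, gain
```

If the near-end chunk is exactly a scaled, delayed copy of the far-end chunk, the residual is zero; with voice present, the voice remains while the single-tap echo component is removed.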
I am trying to build a text-to-speech converter from scratch.
Text 'A' should sound Ayyy
Text 'B' should sound Bee
Text 'Ace' should sound 'Ase'
So how many distinct sounds do I need in total to reconstruct full English words?
Is there any option in the AVISOFT SASLab Pro software that enables you to eliminate unwanted noise from a digital recording without affecting the original sound? In my case, sounds are recorded in experimental tanks with a hydrophone connected to a digital audio recorder. The lab is full of low-frequency noise, which to some extent disrupts my sound of interest. If I high-pass filter the recording, some noise remains and overlaps with the frequency spectrum of the sound of interest.
Any advice would be helpful.
Does anyone have any idea how to harvest acoustic energy from a line sound source? The line source is small, perhaps in the centimeter range, and the sound pressure is very low, around a few µPa, I guess.
I am involved in and interested in estimating the age of users from speech signals. Kindly suggest some freely available corpora for this research.
It is well known that audio compression (e.g., MP3, AAC) usually processes the audio data frame by frame. However, I am curious about the feasibility of single-frame processing.
A commonly accepted notion is that frame-based processing provides time resolution of the audio data while single-frame processing does not. This is similar to comparing the DFT and the STFT.
However, why do we need time resolution of the audio signal during compression? For a given audio clip, a single-frame FFT has very high frequency resolution (a huge number of points) and no time resolution. Nevertheless, we can still calculate tonal and non-tonal elements and masking curves, generate quantization indices, and so on. In this way, the modification of any frequency bin will be reflected throughout the time domain, wherever that frequency appears along the time axis in the compressed time-domain audio samples.
I personally do not see any potential problems with performing single-frame compression as described above. The only problem I can imagine is the hardware implementation of huge DCT sizes. But the computational complexity of the FFT is O(n log n), which grows almost linearly in n when n is large, so I do not see this as a big problem given rapidly developing computing capabilities.
Please help to point out my mistakes in the above statements.
The aim is to follow the evolution of the attenuation coefficient vs. mortar age.
The experiments will be done using the ultrasound pulse-echo method with P-waves and by immersion testing (using immersion transducers).
I have two speech signals coming from two different people. I want to find out whether or not both people are saying the same phrase. Is there anything that I can directly measure between the two signals to know how similar they are?
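As a crude baseline, the peak of the normalized cross-correlation gives a single similarity score between two recordings; the sketch below (the helper name max_normalized_xcorr is mine) illustrates the idea. Same-phrase detection across different speakers normally works better on feature sequences (e.g., MFCCs) compared with dynamic time warping, since raw waveforms from two voices differ greatly even for identical phrases.

```python
# Sketch: normalized cross-correlation as a crude similarity score
# between two utterances. A value near 1 at some lag suggests the
# signals are closely related; real same-phrase detection would
# normally compare feature sequences (e.g. MFCCs) with DTW instead.
import math

def max_normalized_xcorr(x, y, max_lag):
    ex = math.sqrt(sum(v * v for v in x))
    ey = math.sqrt(sum(v * v for v in y))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                s += x[n] * y[m]
        best = max(best, abs(s) / (ex * ey))
    return best
```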
I would appreciate it if someone could explain to me how to estimate the parameters of Stoneley waves (such as arrival time, the frequency or range of frequencies corresponding to the Stoneley waves, velocity, and so on) propagating in a rock sample (limestone).
I used an ultrasonic pulser/receiver and an oscilloscope for the measurements and obtained these results (please find attached), but I am not sure what the next steps should be.
How can I differentiate the moments of arrival of the P-wave, S-wave, and Stoneley wave, and whether they are present at all?
I have never dealt with this topic before, so I would be grateful for guidance or advice on literature that I should read first.
For my bachelor thesis, I would like to analyse the voice stream of a few meetings of 5 to 10 persons.
The goal is to validate some hypotheses linking speech-time repartition to workshop creativity. I am looking for a tool that can be deployed easily and without extensive knowledge of signal processing.
Ideally, I would like to feed the tool an audio input and get the time segments of each speaker, either graphically or in matrix/array form.
- diarization does not need to be real-time
- the source can be single- or multi-stream (we could install microphones on each participant)
- the process can be (semi-)supervised if need be; we know the number of participants beforehand
- the tool can be a MATLAB script, an .exe, a Java program, or similar. I am open to suggestions.
Again I am looking for the simplest, easy-to-install solution.
Thank you in advance
I have the following two files in xlsx format. I would like MATLAB code to read in the data of both files and perform a correlation to check how much similarity there is between them, and if they differ, by how much. Can someone help me? I have limited MATLAB knowledge.
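The computation being asked for is Pearson's correlation coefficient; in MATLAB it is available as corrcoef after loading the sheets with xlsread. As a language-agnostic sketch, assuming the two columns have already been read into numeric arrays of equal length:

```python
# Pearson correlation between two equal-length data columns.
# Sketch only: assumes the xlsx columns have been exported to plain
# lists (e.g. via CSV); in MATLAB the same number comes from
# corrcoef after reading the sheets with xlsread.
import math

def pearson_r(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A value near +1 means the two columns rise and fall together; near -1 means they move oppositely; near 0 means little linear relationship.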
I want to measure the velocity fluctuation in an air stream using two microphones. I have the data. How do I process it to get the velocity?
I want to do an auditory experiment in which the intensity of a sound changes between conditions, e.g., 60 dB in one condition and 35 dB in another. How can this be achieved? Is there any hardware or software to control the intensity of a sound at the dB level?
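Relative level changes between conditions can be done in software by scaling the stimulus; absolute dB SPL additionally requires calibrating the playback chain with a sound level meter or calibrator, which software alone cannot provide. A minimal sketch of the scaling step:

```python
# Sketch: scaling a digital stimulus by a relative level in dB.
# A change of gain_db decibels corresponds to multiplying the
# samples by 10^(gain_db / 20). Absolute dB SPL still depends on
# calibrating the full playback chain.

def apply_gain_db(samples, gain_db):
    g = 10.0 ** (gain_db / 20.0)
    return [s * g for s in samples]
```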
I want to calculate the flame transfer function of a swirl-stabilized non-premixed flame. I have a loudspeaker at my disposal. Do I need a siren instead of a loudspeaker?
I understand that a power delay profile is created at a particular measurement point in the propagation environment. If that is so, I am wondering how I can create a single power delay profile for an 8 m propagation measurement with 28 measurement points.
The channel sounder captures the complex frequency response, which is translated to the time domain using the IFFT. The transmission bandwidth is 1-4 GHz.
Thanks for your kind help in advance.
I am trying to determine the signal to noise ratio (SNR) of audio signals for speech research. I have found several ways to calculate it. Is there a standard equation generally used in this field and is there a standard value the SNR should be above?
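There is no single universal equation; one common working definition is the ratio of signal power to noise power in decibels, computed from separately available signal and noise segments (segmental and weighted variants also exist, so the chosen definition should be stated explicitly in any write-up). A minimal sketch:

```python
# Common working definition of SNR for speech: ratio of signal power
# to noise power in dB, here computed from separately available
# signal and noise segments. Conventions differ across papers
# (segmental SNR, A-weighted variants), so state the definition used.
import math

def snr_db(signal, noise):
    p_s = sum(s * s for s in signal) / len(signal)
    p_n = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_s / p_n)
```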
I want to get the Intensity readings of a sustained vowel at even time intervals (i.e. every 0.01 or 0.001 seconds). In Praat when I adjust the time step to "fixed" and change the fixed time to 0.01 or 0.001 it adjusts this for pitch and formants, but not for intensity. Intensity remains at increments of 0.010667 seconds between each time. Is it possible to change the time step for intensity or can it only be changed for the other parameters? Any help is much appreciated!
I am analyzing recorded speech of sustained vowel phonation and am trying to figure out which filters are necessary for the analysis. Does an A-weighted filter need to be applied to account for the fundamental frequency? And does any de-noising need to be done to the signal?
Could you please point me to any references or details for designing a wideband SAW receiver working at a centre frequency of 20 MHz with a bandwidth of 25%?
Is there any limitation on designing wideband SAW devices at this frequency? I see that most published work is in the GHz range.
I am working on psychoacoustic active noise control. I want to design a psychoacoustic model for sound quality measurement. I have gone through Zwicker's book on loudness, Psychoacoustics: Facts and Models.
I am not able to understand how to write a MATLAB program to calculate loudness.
I have two acoustic signals (amplitude as a function of time). I do an FFT or a Welch analysis to see for which frequencies the two spectra are similar and where they differ. I am thinking of a correlation coefficient as a function of frequency. How do I compute this (e.g., in MATLAB)?
P.S. The spectra are quite noisy.
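One plausible reading of "correlation coefficient as a function of frequency" is to segment both signals (as in Welch's method), compute a magnitude spectrum per segment, and then correlate the two sets of spectra bin by bin across segments. The sketch below is an illustrative construction under that assumption, not a standard named estimator, and uses a plain DFT for clarity (an FFT should be used in practice).

```python
# Sketch: a per-frequency similarity measure between two signals.
# Both signals are split into short segments, a magnitude spectrum
# is computed per segment (plain DFT here), and for every frequency
# bin the correlation coefficient is taken across segments. Bins
# where the two spectra co-vary get values near 1.
import cmath, math

def mag_spectra(x, seg_len):
    segs = [x[i:i + seg_len] for i in range(0, len(x) - seg_len + 1, seg_len)]
    out = []
    for s in segs:
        spec = []
        for k in range(seg_len // 2 + 1):
            c = sum(s[n] * cmath.exp(-2j * math.pi * k * n / seg_len)
                    for n in range(seg_len))
            spec.append(abs(c))
        out.append(spec)
    return out

def corr_vs_frequency(x, y, seg_len):
    sx, sy = mag_spectra(x, seg_len), mag_spectra(y, seg_len)
    m = min(len(sx), len(sy))
    coeffs = []
    for k in range(seg_len // 2 + 1):
        a = [sx[i][k] for i in range(m)]
        b = [sy[i][k] for i in range(m)]
        ma, mb = sum(a) / m, sum(b) / m
        cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
        va = math.sqrt(sum((p - ma) ** 2 for p in a))
        vb = math.sqrt(sum((q - mb) ** 2 for q in b))
        coeffs.append(cov / (va * vb) if va and vb else 0.0)
    return coeffs
```

With noisy spectra, averaging over many segments (and possibly overlapping, windowed segments as in Welch's method proper) stabilizes the per-bin coefficients.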
Synthetic data generation is an interesting area of research, but I have difficulty finding articles and textbooks on the topic. I would like an idea of the definitions and frameworks for automatic synthetic data generation in any area, particularly in sound analysis.
My name is Rouland, and I work in the Department of Biology at Pakuan University. I am currently studying frogs. As a beginner, my biggest problem is identifying frog sounds, so my idea is to analyze each type of frog sound and group them by the similarity of their waveforms, so that I can tell which waveforms belong to the same frog. The hard part is finding software to analyze the sounds. What kind of software suits my research? Thanks.
My eventual aim is to model the radiation of acoustic sources using a hybrid approach (different mesh gradients and domain sizes).
I would like to be able to create models that have changing flow velocities at a boundary. This would eventually lead to simulating audible sources (loudspeakers, etc.), so the source velocity would need to change appropriately with each step of the computation and would effectively need to act as both an inlet and an outlet.
Is this currently possible with only some minor tweaks of an arbitrary 2D case?
Has anyone come across a tool that already exists for this?
Any advice or suggestions are more than welcome.
Dear physicists, I would like to know whether there is a relation between the intensity of an acoustic wave signal and the ability of that wave to shatter a body. For instance, if a tuning fork is emitting an acoustic wave into body X (such as a glass cup) at a resonant frequency of body X (for instance the fundamental frequency), is there a threshold intensity (e.g., wave amplitude or wave power) of the wave emitted by the tuning fork that needs to be exceeded before body X will shatter? It seems well known that the shattering effect occurs at specific frequencies (the fundamental frequency or its harmonics); however, do the wave amplitude, wave power, or other material properties of the body play a part? If yes, what relation or formula governs these? Answers appreciated. Thanks.
The mean processed signal to noise ratio was calculated to be 30 dB for the Raytheon sonar, and 13 dB for the Klein sonar. Using the Receiver Operating Characteristic (ROC) curve displayed (figure with calculations from Urick,1983 is attached: file name is "ROC curves calculations.bmp"), and given the desired false alarm probability of 0.5%, the probability of detection corresponding to the mean processed signal to noise ratio for each sonar
was calculated at the false alarm level. The probability of detection was calculated to be 0.998 for the Raytheon sonar (green lines on the plot attached), and 0.82 for the Klein sonar (yellow lines on the plot attached).
I tried to reproduce the above calculations with MATLAB tools (the rocsnr function), but I cannot obtain the same results as from the paper plots. MATLAB gives essentially higher values: e.g., for the Raytheon sonar the probability of detection is always 1 (for a signal-to-noise ratio of 30 dB). The MATLAB code for the calculations is relatively simple and is given below.
[Pd,Pfa] = rocsnr(30);
idx = find(Pfa==0.005); % find index for Pfa=0.005
The result of the calculation looks as follows (I expected to get 0.998).
Empty matrix: 1-by-0
After getting this result I tried to increase the Pfa value, but then the result is 1.
[Pd,Pfa] = rocsnr(30);
idx = find(Pfa==0.01); % find index for Pfa=0.01
For the Klein sonar, the probability of detection is almost 1 instead of 0.82 (for a signal-to-noise ratio of 13 dB). I cannot obtain a result for a false alarm probability of 0.5%; for 0.1% I get 0.999967062.
[Pd,Pfa] = rocsnr(13);
idx = find(Pfa==0.005); % find index for Pfa=0.005
Empty matrix: 1-by-0
[Pd,Pfa] = rocsnr(13);
idx = find(Pfa==0.01); % find index for Pfa=0.01
What is the reason for such an inconsistency between the paper-plot calculations and the "efficient" MATLAB calculations performed automatically on the same input data?
The original figure for ROC curves (Urick,1983) without additional lines plotted is attached too (file name is "ROC curves (Urick, 1983).bmp").
The links to MATLAB documentation related to ROC curves are given below.
It is interesting that these ROC curve functions were first introduced in MATLAB R2011a.
Check this video:
It seems to prove that the string modes are finite states that are in either one state or another.
Note the waves are never standing but twist and vibrate in another mode on top of the standing-wave catenary. The standing wave has no time derivative, but the catenary has its own oscillation.
Can anyone prove that the string can have two modes at once?
I used a 356B20 accelerometer to record acceleration data during a drop test and extracted the information into an Excel file. It contains X, Y, and Z acceleration data from one sensor. I would like to remove any noise from the signal to get smooth acceleration data. Please suggest a suitable method.
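As a first, assumption-laden step before designed filters (Butterworth low-pass, Savitzky-Golay, etc.), a centered moving average often suffices for smoothing drop-test traces; the window length is a parameter to tune and must stay much shorter than the impact event itself, or the peak will be flattened.

```python
# A simple centered moving-average smoother applied to one
# acceleration channel. Window length is an assumption to tune
# against the impact duration; edges use a shortened window.

def moving_average(x, window=5):
    half = window // 2
    out = []
    for n in range(len(x)):
        lo, hi = max(0, n - half), min(len(x), n + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out
```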
Two time-series data sets (1. engine RPM vs. time and 2. seat acceleration vs. time) were analyzed using the LMS Testlab and LMS AMESim post-processing tools. It was observed that the order plots of acceleration vs. RPM obtained from the two packages, using the same window type and trend-removal option, do not match. Further investigation reveals that using the same sampling frequency in AMESim as in Testlab produces results with trends similar to the Testlab results; however, the acceleration amplitudes deviate significantly from the Testlab results.
What other parameters should be checked to obtain same results from Testlab and AMESim?
While applying the Analog-Digital-Microprocessor (ADµP™) to formant measurement, analysis, and synthesis, I have found that the frequency of the second formant corresponds to the frequency of the second harmonic. Now it must be decided whether the Analog-Digital-Microprocessor should ascertain the second harmonic. This implies more computing time and less performance.
Experiment Findings Appendix 17: Formant analysis with ADµP® – 2nd formant F2 of...
I have data where the ACC is elicited with frequency changes of 5, 10, 25, 50, and 100 Hz. Here (for example) it is possible that the threshold lies somewhere between 25 and 50 Hz. How can I find it?
Increasing the volume fraction of tungsten in an epoxy backing from 5% to 25% increases the density, decreases the speed of sound, and increases the overall impedance. Does increasing density correlate with greater hardness?
From papers, I know that the frequency range of AE in machining is on average 100 kHz-300 kHz, although different materials have different frequency ranges. But I still cannot determine the useful frequency range on which my denoising should be based. The attached pictures show my results.
I want to know whether the 25 kHz and 50 kHz components are useful machining signals, and whether the final denoising is correct.
A flow past the mouth of a deep cavity can result in the excitation of high-amplitude acoustic pulsations. Such pulsations are often encountered in gas-transport systems, heat exchangers, and other industrial processes involving transport of a fluid through a pipeline.
I really don't understand how the noise is generated.
Someone said that "swirls induced by separation interact with each other and generate noise."
When a rotating flow makes contact with another rotating flow, noise occurs? How?
I have a measured data signal in which a noise component at roughly 107 MHz appears. How can I get rid of it? Could you explain to me how to implement this in MATLAB? Thank you very much.
I would also be most grateful if anyone who works within the field of infrasound would contact me to discuss an exciting collaboration.
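For a single known interfering tone, one textbook option is a second-order IIR notch filter centered on that frequency (MATLAB's iirnotch in the Signal Processing Toolbox designs the same structure); this of course assumes the data are sampled well above the ~107 MHz component. A plain sketch of the design and of applying the biquad:

```python
# Sketch of a second-order (biquad) IIR notch filter that suppresses
# a narrow band around one known frequency f0. The pole radius r
# (close to 1) sets the notch width: larger r gives a narrower notch
# but a longer transient.
import math

def notch_coeffs(f0, fs, r=0.98):
    w0 = 2.0 * math.pi * f0 / fs
    b = [1.0, -2.0 * math.cos(w0), 1.0]   # zeros on the unit circle at +/- w0
    a = [1.0, -2.0 * r * math.cos(w0), r * r]  # poles just inside, same angle
    return b, a

def iir_filter(b, a, x):
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for v in x:
        w = b[0] * v + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(w)
        x2, x1 = x1, v
        y2, y1 = y1, w
    return y
```

A tone exactly at f0 is driven to zero in steady state, while frequencies away from the notch pass with roughly unit gain.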
The source is fixed, and waveforms are recorded at many stations. I want to determine the delay time between two stations. I know some ways to measure similarity, such as cross-correlation, semblance, and dynamic time warping. My biggest problem is that my data have a strong noise background (car noise, walking noise, etc.). If two stations are close, I can use cross-correlation to obtain the delay time; but since the ray paths are not the same and there are site effects, using cross-correlation may be risky. Can anyone help me?
Hello, I need to measure the cavitation in a water tank (caused by ultrasound) using a hydrophone; however, I am stuck with the interpretation of the data measured by the hydrophone. Is there any journal paper or online help on interpreting acoustic cavitation measured with a hydrophone? Thanks.
It would be interesting to gain experience from people who have applied a hydrophone to the measurement of sound in air. Initial experiments seem to indicate that a hydrophone performs differently in air compared to water, probably due to the greater mismatch between the impedance of the transmission medium and that of the sensor.
I am new to the speaker recognition field. I have collected speech data from 100 people; each person has 3 speech samples. Now I want to extract features from my data in order to build a model for speaker recognition, for example MFCCs and formants. Can I apply these acoustic techniques to extract features from the speech signal directly?
I would appreciate anyone's help.
How can I determine the maximum speed (ensuring no collisions) of an omnidirectional robot with a ring of 'n' sequentially firing ultrasonic sonar sensors, given the frequency 'f' of each sensor and the maximum acceleration 'a'?
The robot is in a world filled with sonar-detectable fixed (non-moving) obstacles that can only be detected at 'x' meters and closer.
Is this the maximum velocity that can be attained by the robot within the 'ring cycle time'?
The way I approached it,
Consider: f = 70 kHz, a = 0.5 m/s², x = 5 m, n = 8.
Now, assuming the obstacle is at the furthest detectable distance, 5 m,
and taking the speed of sound as 300 m/s:
Time taken for the sensor to receive the echo = 2 × 5 m / (300 m/s) = 1/30 s.
Now, as the 8 sensors fire sequentially, the total time taken = 8/30 s.
Therefore, the ring cycle time = 8/30 s.
Also, the overall update frequency of one sensor = 30/8 Hz.
Now, is the maximum velocity the velocity attained by the robot over t = 8/30 s?
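The timing arithmetic above, plus one possible safety criterion, can be sketched as follows. The criterion used here, v·t_cycle + v²/(2a) ≤ x (distance travelled during one "blind" ring cycle plus braking distance must fit inside the detection range), is an assumption about the safety model, not a standard formula.

```python
# Sketch of the ring-cycle timing and one candidate safe-speed bound.
# Assumes each sensor waits for its echo from the maximum range x
# before the next one fires, and that the robot must be able to stop
# within x even if an obstacle appears just after a full cycle.
import math

def ring_cycle_time(n_sensors, x, c=300.0):
    return n_sensors * 2.0 * x / c  # round-trip time per sensor, n sensors

def max_safe_speed(n_sensors, x, a, c=300.0):
    t = ring_cycle_time(n_sensors, x, c)
    # Solve v*t + v^2/(2a) = x for v:  v = -a*t + sqrt(a^2*t^2 + 2*a*x)
    return -a * t + math.sqrt(a * a * t * t + 2.0 * a * x)
```

With the example values (n = 8, x = 5 m, c = 300 m/s) the cycle time reproduces the 8/30 s figure above.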
I changed the frequency slightly during my experiments; how should this affect the intensity of sonication? Many publications say it should remain the same, while some report that it should be increased simultaneously.
I applied the spectral subtraction technique to an AE signal by subtracting the recorded noise spectrum from the acquired AE spectrum (NB: the AE data contain both AE and spindle-noise influence). The IFFT of the residual in the time domain shows higher amplitudes than the time-domain amplitude of the signal plus noise. Why is that so? Does it mean that lower frequency content results in higher amplitudes in the time domain? Is there any explanation to support this inverse proportionality?
I have seen research papers that propose calculating guided-wave transmission/reflection coefficients by dividing the amplitude at the center frequency of the received signal's spectrum by that of the excitation spectrum. My question is: is any windowing applied before such a division? In other words, do the scattered or reflected wave packets need to be isolated first? If so, what kinds of windows are commonly used?
I am trying to calculate the Breathiness Index (BRI), suggested by Fukazawa et al. (1988), as a measure of breathy voice. The paper indicates a range of 8.3 to 75.7 for values of BRI, but my calculations yield values on the order of 10^15.
I refer to the definition of BRI as the ratio between the energy of the second derivative of a signal and the energy of the non-derived signal.
I performed the analysis in Praat. The original sound was converted to a matrix, then I applied a formula ((self [col+1] - self [col]) / dx) twice, to obtain the second derivative, and cast the matrix to a sound. Energy was calculated using the Get energy command (which calculates the integral of the squared signal between two time points).
Any idea what I am missing here?
Alternatively, can anyone suggest another measure for spectral tilt that does not require an arbitrary cut-off frequency between low and high frequencies?
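For what it is worth, the energy ratio described above can be sketched with the bare second difference (no 1/dx factor). Note that dividing by dx in each derivative step, as in the Praat formula, multiplies the energy of the second derivative by fs^4 relative to the bare second difference; at 44.1 kHz that alone is a factor of about 3.8 × 10^18, which could plausibly explain values around 10^15. Whether Fukazawa et al. normalize by dx is an assumption that should be checked against the original paper.

```python
# Sketch of a BRI-style ratio on a sampled signal, using the plain
# second difference x[n+1] - 2x[n] + x[n-1] (no 1/dx scaling).
# For a sinusoid of normalized frequency w rad/sample this ratio
# equals 16*sin(w/2)^4, which grows steeply with frequency -- the
# intended behaviour of a breathiness / spectral-tilt index.
import math

def bri(x):
    d2 = [x[n + 1] - 2.0 * x[n] + x[n - 1] for n in range(1, len(x) - 1)]
    e_d2 = sum(v * v for v in d2)
    e_x = sum(v * v for v in x)
    return e_d2 / e_x
```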
I am trying to find something very technical, with lots of examples, in order to understand how an acoustic signal reacts to particular settings (frequency, pulse, etc.).
I use stereo microphones and could not find a good open-source SSL (sound source localization) code via Google. I would like an implementation of time difference of arrival, interaural phase difference, or another sophisticated method.
In direction-of-arrival estimation, the MUSIC algorithm differs from the EV algorithm in the eigenvalue weighting. What are the advantages and disadvantages of each over the other? I would also like MATLAB code for DOA estimation using the two methods mentioned above, MUSIC and EV.
When I try to get the output in 44 kHz music format, there is some delay caused by the buffer processing and buffer size (as an array). As a result, the output sounds distorted. How can I prevent that delay?
As the title says: in an acoustic analysis with my BEM code, the results I obtain are the complex conjugates of the reference solutions. For example, the reference solution is 1+2i and my result is 1-2i. I have not found the reason. Can anyone tell me why?
The problem I solve is 2D in the frequency domain. The normal direction is consistent with my BIE formulation. The reference solutions come from the software of Prof. Yijun Liu.
At a certain transducer power level, the maximum intensity as well as the average intensity in a volume of interest drops rapidly (instead of changing linearly, as I would have expected). I wondered whether this is due to some kind of rescaling in the 3D-US software, but could not find any information on that.
The image below shows the same effect observed on a Philips SONOS 7500 3DUS.
I am working with the following setup: an Ultrasonix SonixTablet running Porta SDK 6.07, with a 4DL14-5/38 linear 4D transducer. I have checked the manual for answers. I am imaging metallic surgical tools in soft tissue, such as ex vivo pig hearts.
I'd be happy about any information (papers, books, ...) related to that topic.
Thank you for your help
I want to know the criteria for designing a PZT-based AE sensor for a particular frequency band, such as the size and shape of the PZT pellet, etc.
I'm trying to differentiate between vowels based on formant frequencies. I see that there are lists of formant-frequency pairs for different acoustic vowels. For any particular vowel, are the formant frequencies the same even for different people (irrespective of gender and age)?
Thanks in advance
I am looking for research on the topic of training Boltzmann machines, Deep Belief Nets, or other generative models on audio samples. Ultimately I would like to train these on specific sounds and then have the network generate new sound samples using Gibbs sampling. The only research that I find is for training on and generating music scores. The closest I can find is for speech recognition, but it is very specific to speech and combined with HMMs.
I have implemented an MVDR beamformer for speech signal processing, assuming unity gain in the desired direction (steering vector), but when I check it on speech files the gain seems to be many times larger during speech regions. Because of this, the speech gets distorted and saturated. I am using a 2-mic linear array with a separation of 6 cm for capturing the audio files.
Since the MVDR beamforming formulation assumes unit gain in the desired direction, do I need to multiply the calculated weights by some small constant (fixed or adaptive), as below, in order to control the gain:
w = 0.05 .* (inv(noise_cor) * c_) / (c_' * inv(noise_cor) * c_);
or is there some implementation mistake?
For the attached sawtooth wave, it is apparent that the 0th complex-form Fourier series coefficient is equal to zero, c0 = 0, because the average of the sawtooth wave is zero.
Furthermore, for any k value, the complex-form Fourier series coefficients are obtained as
ck = j*(-1)^k / (k*pi).
My question is: shouldn't we obtain c0 as a special case of ck by substituting k = 0?
But if we do this, c0 seems to diverge to j*infinity instead of going to 0.
Am I missing something???
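The apparent divergence comes from substituting k = 0 into an expression whose derivation has already divided by k. Writing out the analysis integral (with T the period and ω0 = 2π/T) makes this explicit:

```latex
c_k = \frac{1}{T}\int_{-T/2}^{T/2} x(t)\, e^{-j k \omega_0 t}\, dt,
\qquad
c_0 = \frac{1}{T}\int_{-T/2}^{T/2} x(t)\, dt = 0 .
```

For k ≠ 0, evaluating the left integral by integration by parts introduces a factor 1/k, which yields ck = j(-1)^k/(kπ); that step is invalid at k = 0, where the exponential reduces to 1 and the integral is simply the mean of x(t), zero for the zero-mean sawtooth. So c0 is genuinely a special case, not the k → 0 limit of the general formula.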
I want to know whether there are estimates of the signal-to-noise ratio of speech signals affected by the acoustic noise found in different environments.
I want to ask whether anyone is researching DOA estimation using the eigenvector (EV) method. This method was proposed after the MUSIC method. I am facing a problem in calculating the magnitude and an accurate angle with this method.
I have been recently interested in source separation for musical signals. The paper I study (see below), uses nonnegative matrix factorization (NMF) for separation of musical audio recordings based on the magnitude spectrogram which could be a size MxN nonnegative matrix.
"Score-informed source separation for musical audio recordings: An overview", S. Ewert, B. Pardo, M. Müller, M. D. Plumbley, IEEE Signal Processing Magazine, vol. 31, no. 3, pp. 116-124, May 2014.
NMF separates the magnitude spectrogram into an MxK template matrix W and a KxN activation matrix H, both of which are also nonnegative. Dimensions M and N correspond to the numbers of frequency bins and time frames, respectively, of the input magnitude spectrogram. The additional dimension K is shared by W and H and must also be given to the NMF. In the above paper, K is set manually depending on the number of instruments and musical pitches present in the particular piece to be separated.
In that case, can we still claim that we are performing blind source separation? Or is it better to classify it as semi-blind, or something else? What is the accepted terminology? I would appreciate some expert opinions.
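Independent of the terminology question, the factorization itself is compact enough to sketch. Below is a minimal Lee-Seung multiplicative-update NMF for the Euclidean cost, with K supplied by the caller exactly as in the score-informed setting described above; plain lists are used for illustration, and real spectrogram-sized problems call for an optimized implementation.

```python
# Minimal NMF sketch with Lee-Seung multiplicative updates for the
# Euclidean cost: V (MxN, nonnegative) is factored as W (MxK) times
# H (KxN). The updates keep W and H nonnegative and monotonically
# decrease the reconstruction error ||V - WH||^2.
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, K, iters=200, seed=0, eps=1e-9):
    rnd = random.Random(seed)
    M, N = len(V), len(V[0])
    W = [[rnd.random() + eps for _ in range(K)] for _ in range(M)]
    H = [[rnd.random() + eps for _ in range(N)] for _ in range(K)]
    for _ in range(iters):
        # H <- H .* (W^T V) ./ (W^T W H)
        WtV = matmul(transpose(W), V)
        WtWH = matmul(transpose(W), matmul(W, H))
        H = [[H[k][n] * WtV[k][n] / (WtWH[k][n] + eps) for n in range(N)]
             for k in range(K)]
        # W <- W .* (V H^T) ./ (W H H^T)
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[m][k] * VHt[m][k] / (WHHt[m][k] + eps) for k in range(K)]
             for m in range(M)]
    return W, H
```

Because K, and in the score-informed case the initialization of W and H, encode prior knowledge about the sources, the "blindness" of the overall method is exactly what the terminology question above is probing.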
Hi, I need to process real-time signals from PZT crystals. Can I take these signals into MATLAB for further processing?
What is the maximum sampling rate I can have (the signals are sine waves at around 1 MHz)?
I would like to attempt to use otoacoustic signals for biometric applications. I searched for datasets but was unable to find any. Please help me in this regard.