What are some good tools for emotion extraction from audio features?
I'm looking for a good tool to extract audio features such as Mel-frequency cepstral coefficients (MFCCs), energy, etc. from a sound file. Since my final aim is to extract the emotion of the speaker in the audio, it would be most preferable to have a tool that already does basic emotion extraction. I have come across some tools like:
These could be useful for this task, but their user base seems small, and the tools themselves do not appear very user-friendly. Also, since I have not yet started working with them, I wanted to know whether there are better tools available that do the same task in an easier way.
We use PRAAT in our lab: http://www.fon.hum.uva.nl/praat/. It does not directly measure "emotion," but it is flexible. If you don't mind using proxies like pitch, frequency, duration, onset time, and other measures of variability or entropy as stand-ins for emotion, then it is quite useful. Our variable of interest is prosody, for which we use a script (which, I believe, was freely available on the internet). We splice our audio files to include only the subject's vocal output and analyze them with PRAAT. The output goes to Excel, which allows for further analysis.
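In case it helps to see what such proxy features look like in code, here is a minimal NumPy sketch (not PRAAT itself) that computes short-time energy and a crude autocorrelation-based pitch estimate per frame. The frame length and lag bounds are arbitrary choices for illustration, and the synthetic tone stands in for a real recording.

```python
import numpy as np

def frame_energy_and_pitch(signal, sr, frame_len=1024):
    """Short-time energy and autocorrelation pitch per frame --
    crude stand-ins for the proxy measures PRAAT provides."""
    energies, pitches = [], []
    for start in range(0, len(signal) - frame_len, frame_len):
        frame = signal[start:start + frame_len]
        energies.append(float(np.sum(frame ** 2)))
        # Autocorrelation: peak lag in a plausible voice range.
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        min_lag = sr // 500   # ignore candidates above 500 Hz
        max_lag = sr // 50    # ...and below 50 Hz
        lag = min_lag + int(np.argmax(ac[min_lag:max_lag]))
        pitches.append(sr / lag)
    return energies, pitches

# Synthetic 220 Hz tone as a stand-in for voiced speech.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)
energies, pitches = frame_energy_and_pitch(tone, sr)
```

The autocorrelation estimator recovers the 220 Hz fundamental to within a few hertz; real speech would of course need voicing detection and smoothing on top of this.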
They have a utility that extracts emotion time courses from raw audio. The time courses are based on empirically derived formulae for features that map to emotional expression.
Of course, the above is intended for music, but in practice it deals directly with audio files and calculates many spectral and temporal features (including MFCCs), and it's not clear to me that there's any reason not to use it on speech.
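For readers curious what the MFCC front end mentioned above involves, here is a hedged NumPy sketch of its first stage: a triangular mel filterbank applied to a frame's power spectrum. A full MFCC would add a DCT of these log energies. All parameter values are illustrative, not taken from any particular toolkit.

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_filters=13):
    """Triangular filters spaced evenly on the mel scale."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):            # rising slope
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):           # falling slope
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def log_mel_energies(frame, sr):
    """Log energy in each mel band for one analysis frame."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    fb = mel_filterbank(sr, len(frame))
    return np.log(fb @ spectrum + 1e-10)   # epsilon avoids log(0)
```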
OpenEAR is well known and efficient in this area. I don't think you would face difficulty using it under Ubuntu Linux. It depends on the programming environment you work in, because OpenEAR is coded in C++; but even if you work in a different environment, the features can be exported as ARFF files. The Praat software is also used for the same purpose: praat.en.softonic.com/
We also use PRAAT or CSL speech KAI in our lab. They don't extract emotions, but rather acoustic vocal features (of emotional speech) such as variation in pitch, energy, timing, formants, etc. They are flexible but not friendly.
For feature extraction I also recommend openSMILE. It is a framework that extracts a huge set of features in real time. See either http://www.openaudio.eu/ or http://sourceforge.net/projects/opensmile/ for further information. The tool extracts spectral as well as prosodic and voice-quality features. Additionally, signal preprocessing and the application of statistical functionals are possible.
I have no deep knowledge of emotion feature extraction, but from earlier experience I think the wavelet transform offers a good tool for preprocessing and feature extraction.
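As a concrete illustration of that idea, here is a small NumPy sketch of a Haar wavelet decomposition, with per-level detail-band energies used as a simple feature vector. The Haar basis and the three-level depth are arbitrary choices for the example, not a recommendation from the thread.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform: returns
    (approximation, detail) coefficients, each half the length of x."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def wavelet_features(signal, levels=3):
    """Energy of the detail band at each decomposition level --
    a compact feature vector one could feed to a classifier."""
    feats = []
    for _ in range(levels):
        signal, detail = haar_step(signal)
        feats.append(float(np.sum(detail ** 2)))
    return feats
```

Because the Haar transform is orthonormal, the energy of the input is preserved across the approximation and detail bands, which makes these per-band energies easy to interpret.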
Have a look at Björn Schuller's publications and you'll find some state-of-the-art tools (including openSMILE) and other powerful emerging methods for emotion recognition from audio.
Thanks, everyone, for your answers. Before this, I had worked a bit with OpenEAR and openSMILE. After reading your answers, I think I should take a closer look at PRAAT. It seems much more user-friendly and works on different platforms, which suits my needs. I also like that it is quite intuitive: if I can visualize what I'm doing in the tool, it's much easier to execute my plans. Moreover, there is existing work on extracting emotions via analyses of pitch, formant, spectral, and voice-quality features, so I'll try those ideas first and then modify them to achieve better results on my task.
@Lionel, thanks for the suggestion. I have read some of his work, but I'll follow it more closely now.
Hi, everybody. Does anyone know how to use openSMILE directly from Python? Is it possible to include it as a library, or should it be invoked outside the language?
Rupali Kawade, I can tell you that at the moment I handle openSMILE outside the language. I wrote a Linux script to extract the features, and, asynchronously, the app developed in Python consumes the results from a repository. It is all about implementing an integration strategy between the script you run on Linux and the prototype you may have in Python. Of course, in the future it would be good to find a library similar to openSMILE that runs natively in Python. In my case the project is at an advanced stage, and a change at that level would now require a huge effort.
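A minimal sketch of that kind of integration, using Python's subprocess module to invoke the SMILExtract binary with its standard -C (config), -I (input), and -O (output) flags; the config and file names here are hypothetical, and the binary is assumed to be on the PATH.

```python
import subprocess

def smile_command(config, wav_in, arff_out, binary="SMILExtract"):
    # -C: feature-extraction config, -I: input wav, -O: output file
    # (ARFF for the standard configs). Paths here are illustrative.
    return [binary, "-C", config, "-I", wav_in, "-O", arff_out]

def extract_features(config, wav_in, arff_out):
    # Blocks until SMILExtract finishes; raises CalledProcessError
    # if the binary exits non-zero.
    subprocess.run(smile_command(config, wav_in, arff_out), check=True)
```

The Python side can then parse the resulting ARFF file at its own pace, which matches the asynchronous producer/consumer arrangement described above.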