Question
Asked 11th Apr, 2014

What are some good tools for emotion extraction from audio features?

I'm looking for a good tool to extract audio features such as Mel-frequency cepstral coefficients (MFCCs), energy, etc. from a sound file. Since my final aim is to extract the emotion of the speaker in the audio, it would be preferable to have a tool that already does basic emotion extraction. I have come across some tools which could be useful for this task, but their user base seems small and the tools themselves do not seem very user-friendly. Also, since I have not yet started working with them, I wanted to know whether there are better tools available that do the same task in a better or easier way.
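For illustration only, basic feature extraction of this kind can be sketched in a few lines of Python. The librosa library and the file name below are assumptions made for the sake of the example, not tools discussed in this thread:

# Minimal sketch (assumption: librosa is installed; "speech.wav" is a placeholder).
import librosa

y, sr = librosa.load("speech.wav", sr=None)          # keep the original sample rate
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 MFCCs per frame
energy = librosa.feature.rms(y=y)                    # root-mean-square energy per frame
print(mfcc.shape, energy.shape)                      # (13, n_frames) and (1, n_frames)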

All Answers (20)

14th Apr, 2014
Frederick Streeter Barrett
Johns Hopkins Medicine
Check out the MIR Toolbox:
They have a utility that extracts emotion time courses from raw audio. The time courses are based on empirically derived formulae for features that map to emotional expression.
3 Recommendations
14th Apr, 2014
Frederick Streeter Barrett
Johns Hopkins Medicine
Of course, the above is intended for music, but in practice it deals directly with audio files and calculates many spectral and temporal functions (incl. MFCCs), and it's not clear to me that there's any reason not to use it on speech.
1 Recommendation
14th Apr, 2014
Kyle Mitchell
Louisiana State University
We use PRAAT in our lab (http://www.fon.hum.uva.nl/praat/). It does not directly measure "emotion," but it is flexible. If you don't mind using proxies such as pitch, frequency, duration, onset time, and other measures of variability or entropy as measures of emotion, then it is quite useful. Our variable of interest is prosody, for which we use a script (which, I believe, was freely available on the internet). We splice our audio files to include only the subject's vocal output and analyze them with PRAAT. This gives an output in Excel, which allows for analysis.
3 Recommendations
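As a rough illustration of the proxy approach described in the answer above, similar measures can also be obtained from Python through the praat-parselmouth bindings to PRAAT; the library, file name, and choice of statistics here are assumptions, not the script used in Kyle's lab:

# Minimal sketch (assumption: praat-parselmouth is installed; "subject.wav" is a
# placeholder for a spliced recording containing only the subject's speech).
import parselmouth

snd = parselmouth.Sound("subject.wav")
pitch = snd.to_pitch()                       # PRAAT's pitch (F0) analysis
intensity = snd.to_intensity()               # PRAAT's intensity analysis

f0 = pitch.selected_array["frequency"]
voiced = f0[f0 > 0]                          # drop unvoiced frames (F0 == 0)

# Simple prosodic proxies: central tendency and variability of F0 and intensity.
print("mean F0 (Hz):", voiced.mean())
print("F0 variability (std, Hz):", voiced.std())
print("mean intensity (dB):", intensity.values.mean())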
14th Apr, 2014
Abdulbasit Al-Talabani
Koya University
OpenEAR is well known and efficient in this area. I think you would not face difficulty if you use it under Linux (Ubuntu). It depends on the programming environment you work in, because OpenEAR is coded in C++; even if you work in a different environment, the features can be produced as ARFF files. Praat is also used for the same purpose: praat.en.softonic.com/
3 Recommendations
14th Apr, 2014
Francisco Martínez-Sánchez
University of Murcia
Praat, clearly!
2 Recommendations
14th Apr, 2014
Maria Rita Ciceri
Catholic University of the Sacred Heart
We also use PRAAT or CSL speech KAI in our lab. They don't extract emotions but acoustic vocal features (of emotional speech), such as variation in pitch, energy, timing, formants, etc. They are flexible but not user-friendly.
2 Recommendations
14th Apr, 2014
Ingo Siegert
Otto-von-Guericke-Universität Magdeburg
For feature extraction I also recommend openSMILE. It is a framework for extracting a large set of features in real time. See either http://www.openaudio.eu/ or http://sourceforge.net/projects/opensmile/ for further information. This tool extracts spectral as well as prosodic and voice-quality features. Additionally, signal preprocessing and the addition of statistical functionals are possible.
3 Recommendations
14th Apr, 2014
Mohammed Abdel-Megeed Mohammed Salem
The German University in Cairo
I don't have deep knowledge of emotion feature extraction, but from earlier experience I think the wavelet transform offers a good tool for preprocessing and feature extraction.
1 Recommendation
14th Apr, 2014
Lionel Prevost
ESIEA
Have a look at Björn Schuller's publications and you'll find some state-of-the-art tools (including openSMILE) and other powerful emerging methods for emotion recognition from audio.
3 Recommendations
17th Apr, 2014
Tahir Sousa
Technische Universität Darmstadt
Thanks everyone for your answers. Before this, I had been working with OpenEAR and OpenSMILE a bit. After reading your answers, I think I should have more of a look into PRAAT. It seems to be much more user-friendly and works on different platforms, which suits my needs. Also, I like the fact that it is quite intuitive - if I can visualize what I'm doing in the tool, it's so much easier to execute my plans. Moreover, there is some work on how people have extracted emotions using analyses of pitch, formant, spectral and voice quality features, etc. So I think I'll try to execute these ideas too, as a start before modifying them to achieve better results on my task.
@Lionel, thanks for the suggestion. I have read some of his work, but I'll follow it more closely now.
1 Recommendation
28th May, 2014
Permagnus Lindborg
City University of Hong Kong
Hi Tahir, some people whose work on speech emotion you may want to check include Eduardo Coutinho (e.g. https://www.academia.edu/1087299/Psychoacoustic_cues_to_emotion_in_speech_prosody_and_music), Petri Laukka (http://w3.psychology.su.se/staff/pela/), and Klaus Scherer (http://www.affective-sciences.org/user/scherer).
I will also mention one work of mine even though it is more towards art/design rather than generalisable analysis (https://www.academia.edu/842766/About_TreeTorika_Rhetorics_CAAC_and_Mao._book_chapter_) and perhaps of lesser interest to you. Have fun researching!
3 Recommendations
10th Jun, 2014
Don Knox
Glasgow Caledonian University
We use the MIR Toolbox, PsySound, and Marsyas.
1 Recommendation
28th Oct, 2015
Bartosz Zeliński
Jagiellonian University
What kind of features can you obtain from PsySound? I cannot find it anywhere.
What kind of software have you decided to use?
1 Recommendation
28th Oct, 2015
Francisco Martínez-Sánchez
University of Murcia
1 Recommendation
28th Oct, 2015
Bartosz Zeliński
Jagiellonian University
Is it possible to extract emotion with it?
Are there any ready-to-use classifiers, or do I have to train them myself?
1 Recommendation
28th Oct, 2015
Abdulbasit Al-Talabani
Koya University
The OpenEAR software provides C++ code that works properly under Unix; it extracts more than 6000 LLDs (low-level descriptors).
1 Recommendation
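For example, an ARFF feature file produced this way can be loaded into Python for further processing. This is a minimal sketch under the assumptions that SciPy is installed, "features.arff" is a placeholder name, and the file contains only numeric and nominal attributes (SciPy's loader does not handle string attributes such as an instance-name column):

# Minimal sketch: read an ARFF feature file (assumptions noted above).
from scipy.io import arff

data, meta = arff.loadarff("features.arff")  # structured array + attribute metadata
print(meta.names()[:10])                     # first few feature names
print(len(data), "instances")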
29th Apr, 2019
Sippee Bharadwaj
Where are the classifiers in the Praat software?
6th Apr, 2020
Yesid Ospitia Medina
Universidad Nacional de La Plata
Hi, everybody. Does anyone know how to use openSMILE directly from Python? Is it possible to include it as a library, or should it be used outside the language?
14th Sep, 2020
Rupali Kawade
PCCOER
Yesid Ospitia Medina I have the same question. If anyone has an idea, please share.
14th Sep, 2020
Yesid Ospitia Medina
Universidad Nacional de La Plata
Rupali Kawade I can tell you that, at the moment, I handle openSMILE outside the language. I have written a Linux script to extract the features, and, asynchronously, the app developed in Python consumes the results from a repository. I think it is all about implementing some integration strategy between the script you run in Linux and the prototype you have in Python. Of course, in the future it would be good to find a library similar to openSMILE that runs natively in Python. In my case the project is at an advanced stage, and a change of that level would require a huge effort now.
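A minimal sketch of that integration strategy, assuming SMILExtract is on the PATH and a config file that exposes -I (input) and -O (output) command-line options, as openSMILE configs commonly do; the config name, file names, and results directory are placeholders, and the output format (CSV or ARFF) depends on the sink defined in the config:

# Producer side: run openSMILE as an external process and drop its output into a
# shared "repository" directory. Consumer side: the Python app later lists whatever
# result files have appeared there.
import subprocess
from pathlib import Path

RESULTS = Path("results")
RESULTS.mkdir(exist_ok=True)

def extract(wav_path, config="your_config.conf"):
    out = RESULTS / (Path(wav_path).stem + ".csv")
    subprocess.run(
        ["SMILExtract", "-C", config, "-I", wav_path, "-O", str(out)],
        check=True,
    )
    return out

def pending_results():
    # Asynchronous consumer: pick up whatever the extraction script has produced so far.
    return sorted(RESULTS.glob("*.csv"))

extract("speech.wav")
print(pending_results())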

Similar questions and discussions

How to extract formants using the openSMILE toolkit?
Question
1 answer
  • Edwin Wald
Hello everyone,
For my thesis I want to extract some voice features from audio data recorded during psychotherapy sessions. For this I am using the openSMILE toolkit. For the fundamental frequency and jitter I already get good results, but the extraction of the center frequencies and bandwidths of formants 1-3 is puzzling me. For some reason there appears to be just one formant (the first one) with a frequency range up to 6 kHz, while formants 2 and 3 get values of 0. I expected the formants to be within a range of 500 to 2000 Hz.
I tried to fix the problem myself but could not find the issue. Does anybody have experience with openSMILE, especially formant extraction, and could help me out?
For testing purposes I am using various audio files recorded by myself or extracted from YouTube. My config file looks like this:
///////////////////////////////////////////////////////////////////////////
// openSMILE configuration template file generated by SMILExtract binary //
///////////////////////////////////////////////////////////////////////////
[componentInstances:cComponentManager]
instance[dataMemory].type = cDataMemory
instance[waveSource].type = cWaveSource
instance[framer].type = cFramer
instance[vectorPreemphasis].type = cVectorPreemphasis
instance[windower].type = cWindower
instance[transformFFT].type = cTransformFFT
instance[fFTmagphase].type = cFFTmagphase
instance[melspec].type = cMelspec
instance[mfcc].type = cMfcc
instance[acf].type = cAcf
instance[cepstrum].type = cAcf
instance[pitchAcf].type = cPitchACF
instance[lpc].type = cLpc
instance[formantLpc].type = cFormantLpc
instance[formantSmoother].type = cFormantSmoother
instance[pitchJitter].type = cPitchJitter
instance[lld].type = cContourSmoother
instance[deltaRegression1].type = cDeltaRegression
instance[deltaRegression2].type = cDeltaRegression
instance[functionals].type = cFunctionals
instance[arffSink].type = cArffSink
printLevelStats = 1
nThreads = 1
[waveSource:cWaveSource]
writer.dmLevel = wave
basePeriod = -1
filename = \cm[inputfile(I):name of input file]
monoMixdown = 1
[framer:cFramer]
reader.dmLevel = wave
writer.dmLevel = frames
copyInputName = 1
frameMode = fixed
frameSize = 0.0250
frameStep = 0.010
frameCenterSpecial = center
noPostEOIprocessing = 1
buffersize = 1000
[vectorPreemphasis:cVectorPreemphasis]
reader.dmLevel = frames
writer.dmLevel = framespe
k = 0.97
de = 0
[windower:cWindower]
reader.dmLevel=framespe
writer.dmLevel=winframe
copyInputName = 1
processArrayFields = 1
winFunc = ham
gain = 1.0
offset = 0
[transformFFT:cTransformFFT]
reader.dmLevel = winframe
writer.dmLevel = fftc
copyInputName = 1
processArrayFields = 1
inverse = 0
zeroPadSymmetric = 0
[fFTmagphase:cFFTmagphase]
reader.dmLevel = fftc
writer.dmLevel = fftmag
copyInputName = 1
processArrayFields = 1
inverse = 0
magnitude = 1
phase = 0
[melspec:cMelspec]
reader.dmLevel = fftmag
writer.dmLevel = mspec
nameAppend = melspec
copyInputName = 1
processArrayFields = 1
htkcompatible = 1
usePower = 0
nBands = 26
lofreq = 0
hifreq = 8000
usePower = 0
inverse = 0
specScale = mel
[mfcc:cMfcc]
reader.dmLevel=mspec
writer.dmLevel=mfcc1
copyInputName = 0
processArrayFields = 1
firstMfcc = 0
lastMfcc = 12
cepLifter = 22.0
htkcompatible = 1
[acf:cAcf]
reader.dmLevel=fftmag
writer.dmLevel=acf
nameAppend = acf
copyInputName = 1
processArrayFields = 1
usePower = 1
cepstrum = 0
acfCepsNormOutput = 0
[cepstrum:cAcf]
reader.dmLevel=fftmag
writer.dmLevel=cepstrum
nameAppend = acf
copyInputName = 1
processArrayFields = 1
usePower = 1
cepstrum = 1
acfCepsNormOutput = 0
oldCompatCepstrum = 1
absCepstrum = 1
[pitchAcf:cPitchACF]
reader.dmLevel=acf;cepstrum
writer.dmLevel=pitchACF
copyInputName = 1
processArrayFields = 0
maxPitch = 500
voiceProb = 0
voiceQual = 0
HNRdB = 0
F0 = 1
F0raw = 0
F0env = 1
voicingCutoff = 0.550000
[lpc:cLpc]
reader.dmLevel = fftc
writer.dmLevel = lpc1
method = acf
p = 8
saveLPCoeff = 1
lpGain = 0
saveRefCoeff = 0
residual = 0
forwardFilter = 0
lpSpectrum = 0
[formantLpc:cFormantLpc]
reader.dmLevel = lpc1
writer.dmLevel = formants
copyInputName = 1
nFormants = 3
saveFormants = 1
saveIntensity = 0
saveNumberOfValidFormants = 1
saveBandwidths = 1
minF = 400
maxF = 6000
[formantSmoother:cFormantSmoother]
reader.dmLevel = formants;pitchACF
writer.dmLevel = forsmoo
copyInputName = 1
medianFilter0 = 0
postSmoothing = 0
postSmoothingMethod = simple
F0field = F0
formantBandwidthField = formantBand
formantFreqField = formantFreq
formantFrameIntensField = formantFrameIntens
intensity = 0
nFormants = 3
formants = 1
bandwidths = 1
saveEnvs = 0
no0f0 = 0
[pitchJitter:cPitchJitter]
reader.dmLevel = wave
writer.dmLevel = jitter
writer.levelconf.nT = 1000
copyInputName = 1
F0reader.dmLevel = pitchACF
F0field = F0
searchRangeRel = 0.250000
jitterLocal = 1
jitterDDP = 1
jitterLocalEnv = 0
jitterDDPEnv = 0
shimmerLocal = 0
shimmerLocalEnv = 0
onlyVoiced = 0
inputMaxDelaySec = 2.0
[lld:cContourSmoother]
reader.dmLevel=mfcc1;pitchACF;forsmoo;jitter
writer.dmLevel=lld1
writer.levelconf.nT=10
writer.levelconf.isRb=0
writer.levelconf.growDyn=1
nameAppend = sma
copyInputName = 1
noPostEOIprocessing = 0
smaWin = 3
[deltaRegression1:cDeltaRegression]
reader.dmLevel=lld1
writer.dmLevel=lld_de
writer.levelconf.isRb=0
writer.levelconf.growDyn=1
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1
[deltaRegression2:cDeltaRegression]
reader.dmLevel=lld_de
writer.dmLevel=lld_dede
writer.levelconf.isRb=0
writer.levelconf.growDyn=1
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1
[functionals:cFunctionals]
reader.dmLevel = lld1;lld_de;lld_dede
writer.dmLevel = statist
copyInputName = 1
frameMode = full
// frameListFile =
// frameList =
frameSize = 0
frameStep = 0
frameCenterSpecial = left
noPostEOIprocessing = 0
functionalsEnabled=Extremes;Moments;Means
Extremes.max = 1
Extremes.min = 1
Extremes.range = 1
Extremes.maxpos = 0
Extremes.minpos = 0
Extremes.amean = 0
Extremes.maxameandist = 0
Extremes.minameandist = 0
Extremes.norm = frame
Moments.doRatioLimit = 0
Moments.variance = 1
Moments.stddev = 1
Moments.skewness = 0
Moments.kurtosis = 0
Moments.amean = 0
Means.amean = 1
Means.absmean = 1
Means.qmean = 0
Means.nzamean = 1
Means.nzabsmean = 1
Means.nzqmean = 0
Means.nzgmean = 0
Means.nnz = 0
[arffSink:cArffSink]
reader.dmLevel = statist
filename = \cm[outputfile(O):name of output file]
append = 0
relation = smile
instanceName = \cm[inputfile]
number = 0
timestamp = 0
frameIndex = 1
frameTime = 1
frameTimeAdd = 0
frameLength = 0
// class[] =
printDefaultClassDummyAttribute = 0
// target[] =
// ################### END OF openSMILE CONFIG FILE ######################
2x2 repeated measures (fully within-subjects) ANOVA power analysis in G*Power?
Question
2 answers
  • Lydia Searle
Hello,
I am trying to do a power analysis for a 2x2 repeated measures design to determine how many participants I need to achieve 80% power. I'm new to the world of power analysis and don't really have a strong stats background.
IV1 = face orientation
Level 1 = upright
Level 2 = inverted
IV2 = context
Level 1 = background present
Level 2 = background removed
This is a fully within-subjects design. I'm trying to use G*Power 3.1 to do the calculation. This is what I have entered into G*Power so far:
Test family: F tests
Statistical test: ANOVA: Repeated measures, within factors
Type of power analysis: A priori...
Effect size f = 0.25 (just assuming a medium effect)
Alpha err prob = 0.05
Power = 0.8
Number of groups = 1
Number of measurements = 4
Corr among rep measures = 0.5 (leaving it at the default)
Nonsphericity correction E = 1 (leaving it at the default)
The number of groups and the number of measurements are the parts I'm having an issue with. Will G*Power let me calculate n for a 2x2 within design, or is it assuming this is a 1x4 design? From what I've read and watched, the number of groups comes into play if you have a between-subjects factor, which I don't, so I've set this to 1. As I have a 2x2 design, each participant is measured 4 times, hence I've set the number of measurements to 4.
Sometimes I've read/heard that G*Power DOES allow you to do a 2x2 within design, and sometimes I've read/heard that it does NOT allow you to do this.
I've had a look at GLIMMPSE 3.0.0 as an alternative, but it requires many fields where I don't know the answer; mainly, there is a list of tests to choose from, none of which is a repeated-measures ANOVA. It also wants the means and SDs for each condition, but I haven't run the study yet, and since it's exploratory I can't really even guess.
Can anyone with some stats / G*Power knowledge help?
Thank you,
Lydia
