2017 Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)
Analysis of Facial Expressiveness During Experimentally Induced Heat Pain
Philipp Werner, Ayoub Al-Hamadi
University of Magdeburg, Germany
Email: {Philipp.Werner, Ayoub.Al-Hamadi}@ovgu.de
Steffen Walter
Ulm University, Germany
Email: Steffen.Walter@uni-ulm.de
Abstract—To develop automatic pain monitoring systems, we need a deep understanding of pain expression and its influencing factors, and we need datasets with high-quality labels. This work analyzes the variation of facial activity with pain stimulus intensity and among subjects. We propose two distinct methods to assess facial expressiveness and apply them to the BioVid Heat Pain Database. Experimental results show that facial response is rare during low-intensity pain stimulation and that the proposed measures can successfully identify highly expressive individuals, for whom pain stimuli can be classified reliably, and non-expressive individuals, who may have felt less pain than intended and encoded in the labels.
1. Introduction
Facial expression is a valuable cue for pain assessment
[1], [2] and can be exploited to recognize pain automatically
with computer vision and machine learning techniques [3],
[4], [5], [6]. However, pain expression does not directly
reflect the pain experience, but is known to be influenced
by personal factors and social context [2], [7]. This is a
challenge for pain recognition research, which relies on
well-labeled and representative datasets.
Several pain studies reported that some subjects, who are sometimes called stoics, displayed no facial response to pain [7], [8]. Further, some works in automatic pain recognition mention differences in expressiveness [9], [10], [11], but do not analyze them further. Werner et al. [12] include an experiment showing the difference in pain recognition performance between more and less expressive subjects, which motivated the more detailed analysis we conduct in this work.
Many papers in automatic pain recognition avoid the expressiveness problem by using an observer-based pain definition [6], [13], [14], [15], i.e. videos or single frames are labeled according to visible pain reactions. Stoics are labeled correctly with this approach (no pain), but the labeling may fail in other cases, in which subjects reported feeling pain (see discussion in [12]).
Contributions: In this work, we propose methods to estimate facial expressiveness (Sec. 3) that are independent of each other. They consider neither the features used for automatic recognition nor the classification results. We apply the methods to the BioVid Heat Pain Database [9], [16] and assess the facial activity in the pain intensity classes of the dataset (Sec. 4). Further, we show the correlation of the proposed measures and their capability to estimate the expressiveness of a subject (Sec. 5). Subjects are categorized using the measures, and the plausibility of the subject subsets is supported by the agreement of the measures and by the classification rates that we achieve on these subsets with an independent recognition system. We discuss the results in Sec. 6.
2. BioVid Heat Pain Database
The BioVid Heat Pain Database [9], [16] (online: http://www.iikt.ovgu.de/BioVid.html) was collected in a study with 90 participants. Pain was induced experimentally by a thermal stimulator (Medoc PATHWAY) on the right arm, and pain responses were recorded with video cameras and physiological sensors. The temperature applied for pain stimulation was recorded synchronously, which provides fine-grained information in both the time and value domains.
Before the data recording started, the participant's individual pain threshold and tolerance were determined based on self-report. The goal was to compensate for individual pain sensitivities, i.e. to select person-specific stimulation temperatures that elicit pain experiences of the same severity across subjects.
In the main experiment, pain was stimulated at four intensities. The highest pain intensity PA4 was stimulated by applying the person-specific temperature that the subject selected as pain tolerance (highest acceptable pain intensity); the lowest pain intensity PA1 was defined by the person-specific pain threshold (lowest temperature that the subject identified as being painful). PA2 and PA3 were defined as intermediate intensities by linear interpolation between PA1 and PA4. Each pain intensity was stimulated 20 times; each time, the temperature was held for 4 seconds, followed by a pause of 8-12 seconds. The pauses were used to extract the non-painful baseline samples (BLN).
The BioVid database consists of several parts. In this paper we use part A, which consists of 8,700 samples from 87 subjects, each sample comprising 5.5 seconds of video and several time series. The 100 samples per subject include 20 samples of each of the 4 pain intensities (PA1 to PA4) and 20 samples of the pain-free baseline condition (BLN). We further use part C, which is the continuous superset of part A, to extract more baseline samples.
Figure 1. Optical flow estimated with the median flow tracker [20] in a 5×5 grid (black). Valid flow vectors (blue) are used to calculate a movement vector for each grid cell (green, scaled for improved visibility). For each frame, the maximum vector magnitude (over all 25 regions) is saved to create a time series that estimates facial activity.
3. Measures to Estimate Facial Expressiveness
In this section we propose methods to measure facial
activity and expressiveness.
PSPI: First, we consider the Prkachin and Solomon Pain Intensity (PSPI) [6], [17], a pain expression score based on the Facial Action Coding System (FACS) [18]. It is widely used in facial expression based pain recognition [6], [13], [15]. We use the FACS-coded subset of the BioVid Heat Pain Database (1 sequence per intensity and subject = 435 videos = 60,030 frames of part A) [12], [19] to calculate the PSPI. The PSPI is a frame-based measure; since we work on the video level, we use the maximum PSPI per video, as was done to validate the score [17]. FACS and PSPI are highly objective, but require a lot of work by a trained FACS coder – about two hours per minute of video. In general, this makes them impractical for huge datasets, but the score is useful to validate the other proposed measures.
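For reference, the PSPI of a frame combines a few pain-related action units (AUs): PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43 [17]. A minimal sketch of the video-level score we use (the maximum over frames), assuming the per-frame AU intensities from the FACS coding are available as Python dictionaries, could look like this:

def pspi(au):
    # PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43
    # (AU intensities coded 0-5, AU43 is binary eye closure)
    return (au["AU4"] + max(au["AU6"], au["AU7"])
            + max(au["AU9"], au["AU10"]) + au["AU43"])

def video_pspi(frames):
    # video-level score: maximum frame PSPI, as used to validate the score [17]
    return max(pspi(au) for au in frames)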
Subjective Rating: As a cheap alternative to FACS coding, we propose a subjective rating. For each subject in the dataset, an untrained observer viewed the highest pain intensity videos (as they show the strongest response) and rated them on a scale from 1 (no observed reaction) to 5 (clear reactions in nearly all videos). This is much faster than FACS coding, i.e. we can look at many more samples in less time. Although it may be less objective and may miss some subtle details, it should be sufficient for the purpose of roughly categorizing persons by expressiveness.
Optical Flow: We further propose to assess facial activity in a fully automatic manner using low-level computer vision features. Based on optical flow, a frame-by-frame activity score is calculated as follows:
1) We localize the face using the Viola-Jones face detector available in OpenCV [21]. To exclude background, we only use the center part of the bounding box (60% of its width and 80% of its height).
2) The region of interest is subdivided into a regular 5×5 grid. In each grid cell we apply the (very robust) median flow tracker by Kalal et al. [20]: We calculate sparse optical flow for 100 points per cell through the Lucas-Kanade algorithm. The flow is calculated forward and backward, and the error between both is used to remove outliers (points with erroneous movement estimation) by keeping only the half of the points with lower error. More outliers are removed by a similar criterion based on normalized cross-correlation. The remaining flow vectors are considered valid and are combined into one movement vector per region by calculating the median in each spatial dimension.
3) We calculate the magnitudes of all 25 movement vectors and select the maximum as an estimate of the current facial activity (one point in the time series of the video).
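For illustration, a rough sketch of steps 1) and 2) for one pair of consecutive grayscale frames, assuming OpenCV (cv2) and NumPy, could look as follows; the point sampling is simplified and the normalized cross-correlation check is omitted, so this is an approximation of the procedure rather than the exact implementation used here:

import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_roi(gray):
    # Step 1: Viola-Jones detection; keep the center part of the box (60% width, 80% height).
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest detection
    return int(x + 0.2 * w), int(y + 0.1 * h), int(0.6 * w), int(0.8 * h)

def frame_activity(prev_gray, gray, roi, grid=5, pts_per_dim=10):
    # Steps 2 and 3: median flow per grid cell, return the maximum magnitude.
    x0, y0, w, h = roi
    cell_w, cell_h = w / grid, h / grid
    magnitudes = []
    for i in range(grid):
        for j in range(grid):
            xs = x0 + j * cell_w + np.linspace(1, cell_w - 1, pts_per_dim)
            ys = y0 + i * cell_h + np.linspace(1, cell_h - 1, pts_per_dim)
            p0 = np.array([[x, y] for y in ys for x in xs],
                          dtype=np.float32).reshape(-1, 1, 2)
            # forward and backward Lucas-Kanade flow
            p1, st1, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
            p0r, st2, _ = cv2.calcOpticalFlowPyrLK(gray, prev_gray, p1, None)
            ok = (st1.ravel() == 1) & (st2.ravel() == 1)
            if not ok.any():
                continue
            fb_err = np.linalg.norm((p0 - p0r).reshape(-1, 2), axis=1)
            # keep the half of the points with lower forward-backward error
            keep = ok & (fb_err <= np.median(fb_err[ok]))
            flow = (p1 - p0).reshape(-1, 2)[keep]
            # one movement vector per cell: median in each spatial dimension
            magnitudes.append(np.linalg.norm(np.median(flow, axis=0)))
    return max(magnitudes) if magnitudes else 0.0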
Fig. 1 illustrates the median flow estimation. To speed up the calculation, we scale the image down to half of its resolution before applying the above processing steps. We processed all videos of part A of the BioVid database and further extracted more baseline samples from part C (the superset of part A) to reduce the influence of occasional movements during some baseline samples. This way we obtained 80 baseline and 4×20 pain samples per subject, each with a facial activity time series.
To analyze the variation in expressiveness across pain intensities (Sec. 4), we exploit the fact that all videos have the same length and are temporally aligned with respect to the pain stimulation. We calculate the median time series per class, i.e. we keep the temporal resolution and calculate the median across all subjects' samples of that class for each instant of time.
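Assuming the activity time series of one class are stacked into a NumPy array (one row per sample, one column per frame; variable names are hypothetical), the curves shown later in Fig. 2 are simply per-frame medians:

import numpy as np

def median_curve(activity):
    # activity: array of shape (num_samples, num_frames) for one class
    return np.median(activity, axis=0)  # median across samples at each instant of time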
Further, we analyze the facial activity of individuals (Sec. 5). For this purpose we calculate the mean flow magnitude per video, yielding one distribution per class for each subject. Next, we check whether a subject shows more activity during PA4 than during BLN by applying a permutation test [22] (which makes fewer assumptions and provides greater accuracy than a t-test): We randomly generate 2,000 permutations of the class labels, i.e. the class assignments of the activity scores are mixed up randomly 2,000 times. For each permutation, we calculate the difference between the mean result of PA4 and the mean result of BLN, yielding the distribution of the activity differences under the assumption that both activities are equal. Then p is the probability of a difference being greater than or equal to the observed difference, determined from the aforementioned distribution. That is, a p-value close to zero indicates that the subject shows significantly more facial activity during PA4 than during BLN.
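A compact sketch of this one-sided permutation test for a single subject (hypothetical variable names, 2,000 label permutations as described above) could be:

import numpy as np

def permutation_p(act_pa4, act_bln, n_perm=2000, seed=0):
    # act_pa4, act_bln: mean flow magnitude per video of one subject for PA4 and BLN
    rng = np.random.default_rng(seed)
    act_pa4, act_bln = np.asarray(act_pa4, float), np.asarray(act_bln, float)
    observed = act_pa4.mean() - act_bln.mean()
    pooled = np.concatenate([act_pa4, act_bln])
    n4 = len(act_pa4)
    diffs = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(pooled)  # mix up the class assignments
        diffs[k] = perm[:n4].mean() - perm[n4:].mean()
    # p: probability of a permuted difference being >= the observed difference
    return float(np.mean(diffs >= observed))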
4. Facial Reactions Across Stimulus Intensities
Fig. 2 compares the median facial activity time series across the pain stimulation classes: baseline/no pain (BLN), pain intensity at the person-specific pain threshold (PA1), pain intensity at the person-specific pain tolerance (PA4), and two intermediate pain levels (PA2 and PA3).
[Figure 2 plot: median flow (pixel) over time (s); curves for BLN, PA1, PA2, PA3, and PA4.]
Figure 2. Median flow time series during pain stimulation (PA*) and baseline, i.e. pauses (BLN). The red background illustrates the timing of the high-temperature plateau used for stimulation (not present in BLN). In the majority of cases there is no significant activity during baseline (BLN), at the lowest (PA1), and at the second-lowest (PA2) pain intensity. At the highest pain intensity (PA4), facial activity starts about 2 s after the temperature plateau is reached. In PA3, activity is lower and starts later.
[Figure 3 plot: stacked histograms of the maximum PSPI per class, bins PSPI = 0, 1, 2, 3, 4, > 4; mean PSPI per class: 0.36, 0.43, 0.76, 1.37, 1.95.]
Figure 3. Distributions of the maximum PSPI across pain intensities (PA*) and baseline (BLN).
Each point in these curves represents the flow magnitude that is exceeded by 50% of the samples of that class (the median). We clearly observe activity in PA4 and PA3. Activity in PA2 barely rises above BLN at the end of the time window and, apart from some noise, PA1 stays constant. As is to be expected, the facial response diminishes with decreasing pain stimulus severity. However, at the beginning of the time windows, BLN shows even higher activity than the pain classes (with a falling tendency). This indicates a bias in the selection of the baseline samples: the baseline time windows follow pain stimulations, which induce a high level of activity. This activity is still fading away during many BLN samples, which we see in the decreasing activity. By the time the next pain stimulus starts, the activity has often decayed to a level below that seen during BLN. This bias should be avoided in future studies by applying longer pauses and selecting baseline samples more diversely.
Fig. 2 also shows the time delay between stimulation and response, which gets shorter with higher intensity. A higher temperature leads to faster heat conduction from the thermal stimulator to the skin, i.e. the skin heats up earlier. Further, higher-intensity nociception results in faster reactions to escape the noxious stimulation.

TABLE 1. Correlation between facial expressiveness measures (Pearson correlation coefficients).

            Subj. rating   Flow p    PSPI PA4
Flow p        -0.607
PSPI PA4       0.683       -0.476
PSPI SD        0.634       -0.465     0.876
Fig. 3 illustrates the differences between the classes with histograms of PSPI scores. It conforms with Fig. 2: facial reactions are rare at low pain intensities – far less than half of the samples have a maximum PSPI pain score greater than zero in PA1 and PA2. Further, PSPI = 1 also often corresponds to no activity, since several subjects had their eyes closed permanently (see [12] for more details). Greater scores occur more often with higher pain intensity. They correspond to the intensity of the facial pain response: the higher the score, the more pronounced the expression.
The majority of PA1 and PA2 samples do not contain an observable pain response, since for many subjects the stimulation did not exceed the critical threshold that triggers a facial response. If we only consider facial expression and use all subjects, this induces a lot of label noise, which is a heavy burden for the machine learning algorithms used in automatic assessment systems. That is why we suggest focusing on PA3 and PA4 and/or working with a subject subset that excludes the non-responding subjects.
5. Facial Reactions Across Individuals
In Sec. 3 we proposed to assess the expressiveness of individuals by (1) subjective rating and (2) optical flow statistics. This section reports experiments to validate those measures. We compare them with FACS-based PSPI measures: (3) the PSPI of the subject in the coded PA4 sample and (4) the standard deviation (SD) of the PSPI across the coded samples of all classes. Table 1 lists the correlations of the measures (1-4). The subjective rating is strongly related to all other measures. Flow p is moderately correlated to the PSPI measures. PSPI, subjective rating, and flow p have been determined with very distinct methods, so their correlations support the validity of the measures.
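With one value of each measure per subject collected in a matrix (a hypothetical array of 87 subjects × 4 measures), the entries of Table 1 are ordinary Pearson correlation coefficients, e.g.:

import numpy as np

def measure_correlations(measures):
    # measures: array of shape (num_subjects, 4) with columns
    # subjective rating, flow p, PSPI PA4, PSPI SD
    return np.corrcoef(measures, rowvar=False)  # 4x4 Pearson correlation matrix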
In another experiment we apply several thresholds to the subjective rating and flow p to split the dataset into a more and a less expressive subject subset. We compare the subsets regarding the mean values of the measures (1-4) and regarding the classification accuracy that the pain recognition method proposed in [12] achieves if it is trained and tested on the subsets. We evaluated two classification problems with leave-one-subject-out cross-validation on each subject set: BLN vs PA4 (2 classes) and BLN vs PA3 vs PA4 (3 classes). To reduce training time, we learned random forest ensembles of 100 instead of 1,000 trees.
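A schematic version of this split-and-evaluate procedure, assuming scikit-learn and precomputed per-sample feature vectors (the facial activity descriptors of [12] are not reproduced here), might look as follows:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def split_by_rating(rating_per_subject, threshold):
    # more expressive (ME): subjective rating above the threshold; the rest are less expressive (LE)
    me = {s for s, r in rating_per_subject.items() if r > threshold}
    le = set(rating_per_subject) - me
    return le, me

def loso_accuracy(X, y, subject_ids):
    # leave-one-subject-out cross-validation with a 100-tree random forest
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(clf, X, y, groups=subject_ids, cv=LeaveOneGroupOut())
    return float(np.mean(scores))  # mean accuracy over the left-out subjects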
TABLE 2. Splitting subjects into a less expressive (LE) and a more expressive (ME) subset by several facial expressiveness criteria. For each subset, the table reports the number of subjects in the set, the mean values of the expressiveness measures, and the classification accuracies.

#   Split criterion        Subjects   Subj. rating   Flow p        PSPI PA4    PSPI SD     CA 2-class    CA 3-class
                           LE / ME    LE / ME        LE / ME       LE / ME     LE / ME     LE / ME       LE / ME
1   No split               87         2.5            0.29          2.0         1.0         71.8          50.5
2   Subj. rating > 4       80 / 7     2.3 / 5.0      0.32 / 0.02   1.7 / 4.4   0.9 / 2.1   70.2 / 92.5   48.1 / 70.7
3   Subj. rating > 3       68 / 19    2.0 / 4.4      0.37 / 0.04   1.5 / 3.7   0.8 / 1.8   66.6 / 89.7   44.6 / 68.7
4   Subj. rating > 2       50 / 37    1.6 / 3.7      0.46 / 0.07   0.9 / 3.4   0.5 / 1.7   59.2 / 87.4   39.6 / 63.9
5   Subj. rating > 1       20 / 67    1.0 / 2.9      0.59 / 0.21   0.3 / 2.4   0.2 / 1.3   49.3 / 78.5   31.7 / 55.5
6   Flow p < 0.01          56 / 31    2.0 / 3.4      0.46 / 0.00   1.2 / 3.3   0.7 / 1.6   63.3 / 88.2   41.0 / 67.3
7   Flow p < 0.1           48 / 39    1.9 / 3.3      0.53 / 0.01   1.1 / 3.0   0.6 / 1.5   60.7 / 85.5   40.4 / 63.5
8   Flow p < 0.2           40 / 47    1.7 / 3.2      0.61 / 0.03   0.9 / 2.9   0.5 / 1.4   58.0 / 83.5   38.2 / 61.5
9   Flow p < 0.3           34 / 53    1.6 / 3.0      0.67 / 0.05   1.0 / 2.6   0.5 / 1.3   60.5 / 80.7   40.2 / 59.0
10  Flow p (67 lowest)     20 / 67    1.6 / 2.8      0.80 / 0.14   0.8 / 2.3   0.4 / 1.2   56.1 / 76.2   38.8 / 55.0

CA 2-class: classification accuracy BLN vs PA4 in percent (chance level is 50%)
CA 3-class: classification accuracy BLN vs PA3 vs PA4 in percent (chance level is 33%)
LE: less expressive subset; ME: more expressive subset
The results are shown in Table 2. The differences between the more and the less expressive subject groups are significant for all splits and measures. Differences in rows (3-10) are significant with p < 0.001; differences in row (2) are significant with p < 0.05 (due to the low sample size in the more expressive group). The differences illustrate the high variation of subjects' expressiveness and its large impact on the predictive performance of automatic pain assessment systems. Row (2) shows that recognition works very well with highly expressive subjects (classification rates of more than 92% in the 2-class and more than 70% in the 3-class problem). The split in row (5) is also remarkable, since the classification rates of the 20 less expressive subjects are below chance. This indicates that we have found a group of stoic subjects, who did not react visibly to the induced pain stimuli. Very low PSPI and high flow p support this conclusion. Excluding these subjects from experiments is reasonable, since they introduce noise that may confuse machine learning and lead to suboptimal recognition models. To compare subjective rating and flow p, we split the subjects based on the ranking of flow p to get the same group sizes as in (5), see (10). PSPI and classification accuracies indicate a less clear split than (5), which suggests that subjective rating is more suitable for identifying stoic subjects than the fully automatic flow-based method.
6. Discussion
We analyzed the facial expressiveness across stimulus intensities and subjects in the BioVid Heat Pain Database. Low-intensity pain does not always trigger a facial response [7], [23]. We have observed this in two independent measures for the two lowest pain intensities PA1 and PA2, since the majority of samples and subjects did not show a facial response at these intensities. From our perspective, it is reasonable to exclude these classes from facial expression based pain recognition experiments whenever we want to reduce the burden on machine learning in the interest of better models. This also moves the focus towards high pain intensities, which are more important for clinical applications.
Further, we proposed to assess the facial expressiveness of individuals with a subjective rating and an optical flow based measure. A high agreement was found between the two proposed measures and two FACS-based measures. By thresholding the measures we were able to successfully split the dataset into a more and a less expressive group, which differed significantly regarding all measures and the accuracy that was achieved by a pain recognition system on the subject groups. The subsets induced by the subjective rating were more distinct than those induced by flow p. So subjective rating seems to be superior for finding highly expressive or stoic subjects, but flow p also yields good results and can be calculated fully automatically.
Pain expression is known to be influenced by personal factors [2], [7]. Among others, there is a person-specific pain severity threshold that must be exceeded to trigger a facial response [7], [23]. That is, a reason for a lack of facial response may be a pain intensity that is too low. Experimental pain studies offer highly controlled pain stimulation (regarding both intensity and time), which is valuable as a high-quality ground truth. However, due to individual differences in pain sensitivity, some kind of self-report is needed to assess the pain experience, at least to calibrate the pain stimuli in a preceding experiment. Although self-report is the current "gold standard" in pain assessment, it has its weaknesses [1]. It is a controlled and goal-oriented response to pain [2], which might be affected by reporting bias and variances in memory and verbal ability [1]. In an experimental study, the (paid) participant may not want to feel severe pain, so he may underestimate his pain threshold and/or tolerance (intentionally or unintentionally) during stimulus calibration.
Further, some subjects have a low pain sensitivity, resulting in a high tolerance threshold that cannot be stimulated without causing tissue damage, which also leads to less painful stimulation. We observed such problems in the study with experimentally induced heat pain in which we recorded the BioVid Heat Pain Database [9], [16]. They probably caused the lack of facial pain response for some subjects. The split criterion in row (5) of Table 2 identifies such subjects, whom we call stoics. Since these non-responding subjects, who probably experienced less pain than intended and encoded in the labels, are not representative of the planned application of clinical assessment of acute pain, we propose to exclude them from future pain recognition experiments.
Acknowledgments
Funded by German Research Foundation proj. AL 638/3-2.
References
[1] K. D. Craig, “The facial expression of pain: Better than a thousand words?” APS Journal, vol. 1, no. 3, pp. 153–162, 1992.
[2] K. D. Craig, K. M. Prkachin, and R. E. Grunau, “The facial expression of pain,” in Handbook of Pain Assessment, D. C. Turk and R. Melzack, Eds. Guilford Press, 2011.
[3] P. Werner, A. Al-Hamadi, and R. Niese, “Pain Recognition and Intensity Rating based on Comparative Learning,” in IEEE International Conference on Image Processing (ICIP), 2012, pp. 2313–2316.
[4] ——, “Comparative learning applied to intensity rating of facial expressions of pain,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 28, no. 05, p. 1451008, Jun. 2014.
[5] M. Kächele, P. Thiam, M. Amirian, P. Werner, S. Walter, F. Schwenker, and G. Palm, “Multimodal Data Fusion for Person-Independent, Continuous Estimation of Pain Intensity,” in Engineering Applications of Neural Networks. Springer, 2015, pp. 275–285, DOI: 10.1007/978-3-319-23983-5_26.
[6] P. Lucey, J. F. Cohn, K. M. Prkachin, P. E. Solomon, S. Chew, and I. Matthews, “Painful monitoring: Automatic pain monitoring using the UNBC-McMaster shoulder pain expression archive database,” Image and Vision Computing, vol. 30, no. 3, pp. 197–205, 2012.
[7] K. M. Prkachin and K. D. Craig, “Expressing pain: The communication and interpretation of facial pain signals,” Journal of Nonverbal Behavior, vol. 19, no. 4, pp. 191–205, Dec. 1995.
[8] M. Kunz and S. Lautenbacher, “The faces of pain: A cluster analysis of individual differences in facial activity patterns of pain,” European Journal of Pain, vol. 18, no. 6, pp. 813–823, Jul. 2014.
[9] P. Werner, A. Al-Hamadi, R. Niese, S. Walter, S. Gruss, and H. C. Traue, “Towards Pain Monitoring: Facial Expression, Head Pose, a new Database, an Automatic System and Remaining Challenges,” in Proceedings of the British Machine Vision Conference (BMVC). BMVA Press, 2013, pp. 119.1–119.13.
[10] ——, “Automatic Pain Recognition from Video and Biomedical Signals,” in International Conference on Pattern Recognition (ICPR), 2014, pp. 4582–4587.
[11] M. Kächele, M. Amirian, P. Thiam, P. Werner, S. Walter, G. Palm, and F. Schwenker, “Adaptive confidence learning for the personalization of pain intensity estimation systems,” Evolving Systems, vol. 8, no. 1, pp. 71–83, 2017.
[12] P. Werner, A. Al-Hamadi, K. Limbrecht-Ecklundt, S. Walter, S. Gruss, and H. Traue, “Automatic Pain Assessment with Facial Activity Descriptors,” IEEE Transactions on Affective Computing, vol. PP, no. 99, pp. 1–1, 2016.
[13] S. Kaltwang, O. Rudovic, and M. Pantic, “Continuous Pain Intensity Estimation from Facial Expressions,” in Advances in Visual Computing. Springer, 2012, pp. 368–377.
[14] K. Sikka, A. Dhall, and M. S. Bartlett, “Classification and weakly supervised pain localization using multiple segment representation,” Image and Vision Computing, vol. 32, no. 10, pp. 659–670, 2014.
[15] O. Rudovic, V. Pavlovic, and M. Pantic, “Context-Sensitive Dynamic Ordinal Regression for Intensity Estimation of Facial Action Units,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 5, pp. 944–958, 2015.
[16] S. Walter, P. Werner, S. Gruss, H. Ehleiter, J. Tan, H. C. Traue, A. Al-Hamadi, A. O. Andrade, G. Moreira da Silva, and S. Crawcour, “The BioVid Heat Pain Database: Data for the Advancement and Systematic Validation of an Automated Pain Recognition System,” in IEEE International Conference on Cybernetics (CYBCONF), 2013, pp. 128–131.
[17] K. M. Prkachin and P. E. Solomon, “The structure, reliability and validity of pain expression: Evidence from patients with shoulder pain,” PAIN, vol. 139, no. 2, pp. 267–274, 2008.
[18] P. Ekman and W. V. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Palo Alto: Consulting Psychologists Press, 1978.
[19] K. Limbrecht-Ecklundt, P. Werner, H. C. Traue, A. Al-Hamadi, and S. Walter, “Mimische Aktivität differenzierter Schmerzintensitäten,” Der Schmerz, vol. 30, no. 3, pp. 248–256, Apr. 2016.
[20] Z. Kalal, K. Mikolajczyk, and J. Matas, “Forward-Backward Error: Automatic Detection of Tracking Failures,” in International Conference on Pattern Recognition (ICPR), 2010, pp. 2756–2759.
[21] R. Lienhart, A. Kuranov, and V. Pisarevsky, “Empirical Analysis of Detection Cascades of Boosted Classifiers for Rapid Object Detection,” in DAGM Pattern Recognition Symposium, 2003, pp. 297–304.
[22] D. S. Moore, G. P. McCabe, W. M. Duckworth, and S. L. Sclove, The Practice of Business Statistics: Using Data for Decisions. W. H. Freeman, 2003.
[23] M. Kunz, V. Mylius, K. Schepelmann, and S. Lautenbacher, “On the relationship between self-report and facial expression of pain,” The Journal of Pain, vol. 5, no. 7, pp. 368–376, Sep. 2004.
Background The monitoring of facial expressions to assess pain intensity provides a way to determine the need for pain medication in patients who are not able to do so verbally. Objectives In this study two methods for facial expression analysis – Facial Action Coding System (FACS) and electromyography (EMG) of the zygomaticus muscle and corrugator supercilii – were compared to verify the possibility of using EMG for pain monitoring. Material and methods Eighty-seven subjects received painful heat stimuli via a thermode on the right forearm in two identical experimental sequences – with and without EMG recording. Results With FACS, pain threshold and pain tolerance could be distinguished reliably. Multiple regression analyses indicated that some facial expressions had a predictive value. Correlations between FACS and pain intensity and EMG and pain intensity were high, indicating a closer relationship for EMG and increasing pain intensity. For EMG and FACS, a low correlation was observed, whereas EMG correlates much better with pain intensity. Conclusions Results show that the facial expression analysis based on FACS represents a credible method to detect pain. Because of the expenditure of time and personal costs, FACS cannot be used properly until automatic systems work accurately. The use of EMG seems to be helpful in the meantime to enable continuous pain monitoring for patients with acute post-operative pain.