AUTOMATIC HEART RATE ESTIMATION FROM PAINFUL FACES
Philipp Werner*, Ayoub Al-Hamadi*, Steffen Walter†, Sascha Gruss†, Harald C. Traue†
*Institute for Information Technology and Communications, University of Magdeburg, Germany
{Philipp.Werner, Ayoub.Al-Hamadi}@ovgu.de
†Department for Psychosomatic Medicine and Psychotherapy, University of Ulm, Germany
ABSTRACT
Non-contact measurement of the heart rate is more comfort-
able than classical methods and can facilitate new applica-
tions. However, current approaches are very susceptible to
motion. Aiming at overcoming this limitation, we propose
a new, more robust approach to estimate the heart rate from
a videotaped face. It features non-planar motion compen-
sation, fusion of multiple ROI signals, and a RANSAC-like
time-domain heart rate estimation algorithm. In experiments
with a comprehensive pain recognition dataset we show that
our approach outperforms previous methods in the presence
of spontaneous head movement and facial expression.
Index Terms—heart rate, imaging photoplethysmogra-
phy (PPG), motion artifacts, facial expression, pain
1. INTRODUCTION
The heart rate (HR) and its variability are important physi-
ological parameters, not only for patients in life-threatening
conditions, but also for risk assessment [1] or finding the
right balance in sporting activities [2]. The gold standard
measurement method is electrocardiography (ECG). However,
it requires medical staff to properly attach electrodes to the
patient. These can cause skin irritation and discomfort. An
easier to use alternative for getting the heart rate is a pulse
oximetry sensor. It measures the peripheral blood perfusion
optically through a method called photoplethysmography
(PPG). As blood absorbs more light than surrounding tis-
sue, the blood volume pulse is reflected in periodic changes
of the light absorption. Usually, the sensor is attached to a
finger or earlobe with a spring-loaded clip, which may be
uncomfortable or even painful when worn too long.
Next to the clinically established contact PPG method,
several approaches for remote PPG have been proposed, most
working with cheap consumer cameras. They promise very
comfortable heart rate measurement and open up prospects
for new applications, e. g. in tele-medicine or sports. Some of
the methods rely on a dedicated light source [3, 4, 5, 6, 2, 7, 8]
as contact PPG does. Others showed that it is possible to measure
several physiological parameters with ambient light only [9,
10, 11, 12, 13, 14].

This work was funded by the German Research Foundation (DFG), project AL 638/3-1 AOBJ 585843.
In general, imaging-based PPG methods extract the mean
intensity of a region of interest (ROI) on face or hand for
each video frame, obtaining a temporal signal for further pro-
cessing. In the case of color cameras, the signal is either ob-
tained from the mean of the green channel [9, 6, 7, 13], or
by extracting the mean-of-region signal for each color chan-
nel and applying independent component analysis [10, 11],
principal component analysis [12] or non-linear dimension
reduction techniques [14]. The latter methods aim at sepa-
rating the original PPG from interfering signals. The imag-
ing PPG signal, i. e. mean intensity, mean of green channel
or result of blind source separation is either directly analyzed
using frequency-domain methods (e. g. [9, 10, 2]) or band-
pass filtered and interpolated for detection of peaks or zero-
crossings in the time domain, which ideally correspond to
heart beats (e. g. [11, 14]). The resulting time series is filtered
for wrong beats followed by the calculation of the inter-beat-
intervals (IBIs, corresponding to RR intervals in ECG). The
mean heart rate is then determined from the mean of IBIs.
Imaging PPG measures the amount of reflected light
which changes with the cardiac cycle. However, this change
has a low amplitude compared to variations of other factors
like the location of measurement, its illumination or the cam-
era configuration. Consider an involuntary slight movement
of the head. For example, it might 1) alter the measurement
location to one with different tissue structure (skin texture), 2)
shadow the measurement location, or 3) trigger an automatic
gain correction of the camera. Each of these effects induces
artifacts in the measured signal, very often with higher am-
plitude than the pulsatile PPG signal component.
Most previous studies tried to avoid motion-induced arti-
facts by instructing the participants to minimize movement.
Even studies which aimed at motion-compensation only al-
lowed slow movement [10] or planar movement [2]. How-
ever, these constraints restrict the practical applicability. In
various potential applications the subject cannot be expected
to be motionless (e. g. for long term monitoring, or during
sports) and it never can be enforced. Further, several of the
above mentioned works do not describe complete algorithms,
but rely on manual intervention e. g. for selecting the blind
source separation component [2, 12] or ensuring correctness
of the extracted IBIs [8]. This is also not acceptable in practical
applications.

This is the accepted manuscript. The final, published version is available on IEEE Xplore.
P. Werner, A. Al-Hamadi, S. Walter, S. Gruss, and H. C. Traue, "Automatic Heart Rate Estimation from Painful Faces", in
IEEE International Conference on Image Processing, Paris, France, 2014, pp. 1947–1951.

Fig. 1. Regions of interest and facial points used as anchors (points p_bl, p_br, p_ell, p_elr, p_erl, p_err, p_ml, p_mr).
In this paper, we contribute a novel, fully automatic imag-
ing PPG method that is more robust to strong head motion and
facial expression than previous approaches. Sect. 2 describes
our algorithm in detail. In Sect. 3 we summarize the experi-
ments we conducted with the BioVid Heat Pain Database [15,
16]. Conclusion and outlook follow in Sect. 4.
2. ROBUST HEART RATE ESTIMATION
The new approach we propose here extracts multiple PPG
signals from motion-stabilized ROIs (Sect. 2.1), detects and
groups peaks from the signals to create a set of hypothetic
beat-IBI pairs (Sect. 2.2), and looks for the most plausible IBI
series (Sect. 2.3), which is used to calculate the HR. For the
following description and our experiments we assume the HR
to be in the range of 30 to 200 beats per minute (bpm).
2.1. Motion-Compensated PPG Signals from Video
As in several previous works, we use the face detector by
Lienhart et al. [17] as provided in the OpenCV library. Pre-
vious works determine their region of interest (ROI) directly
from the facial bounding box. In contrast, we refine the lo-
cation using state-of-the-art facial feature point detection by
Xiong et al. [18]. Among others, it provides the points de-
picted in Fig. 1. We temporally smooth the points with a five
frame median filter. From the resulting points we calculate
four ROIs as specified in the appendix. One is on the fore-
head, one on each of the cheeks, and one covering a larger
part of the face (see Fig. 1). The way of determining the ROIs
stabilizes the location of measurement for a wide range of
motions, including out-of-plane rotations. This significantly
reduces motion artifacts. Using multiple ROIs reduces the
probability of losing PPG peaks through local artifacts, which
are common during facial expressions (e. g. deepening of
the nasolabial fold or closing of the eyes). For each ROI
Fig. 2. Raw mean-of-green-channel signals (top), band-pass
filtered signals (middle), ECG signal (bottom), and some of
the corresponding video frames. Signals are shown for the
face, forehead, left cheek and right cheek ROIs.
and frame we calculate the mean of the green channel (discussed
in Sect. 3). The resulting signals are band-pass filtered
(zero-phase FIR filter, 64-point Hamming window, 0.5-3.33 Hz).
However, as Fig. 2 shows, the proposed motion compensation
and filtering are not sufficient to remove artifacts completely.
Several problems, such as lighting changes or occlusions/disocclusions,
remain. As we allow spontaneous, fast movements, the artifacts'
frequency spectrum often overlaps with the band of interest, so
artifacts introduce additional peaks or phase shifts in the
band-pass filtered signal.
2.2. Inter-Beat-Interval Candidates
The next step is to detect potential PPG peaks (corresponding
to heart beats), from which we derive IBI candidates. For this,
the band-pass filtered ROI signals are first interpolated with
a cubic spline function to a sampling rate of 100 Hz. This
refines the temporal resolution for peak detection, which is
the next step. We only consider the upper half of each signal,
i. e. we subtract the signal’s median and truncate the resulting
signal at the zero level (see Fig. 3a). We observed that this
reduces the rate of artifact peaks. Further, if there are multiple
peaks within a period shorter than the minimum IBI (0.3 s) in one
signal, only the peak with the highest amplitude is kept.
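A sketch of the interpolation and upper-half peak detection, under the assumption of Python with SciPy (function and variable names are ours; `find_peaks` with a minimum distance implements the keep-highest-peak rule by discarding smaller peaks within the window):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import find_peaks

FS_VIDEO, FS_UP = 25.0, 100.0   # original and upsampled rates (assumed / from text)
MIN_IBI = 0.3                   # minimum inter-beat interval in seconds

def detect_ppg_peaks(filtered, fs=FS_VIDEO):
    """Upsample a band-pass filtered ROI signal to 100 Hz with a cubic
    spline, keep only its upper half (subtract the median, clip at zero),
    and detect peaks at least one minimum IBI apart."""
    t = np.arange(len(filtered)) / fs
    t_up = np.arange(0, t[-1], 1 / FS_UP)
    x = CubicSpline(t, filtered)(t_up)
    x = np.clip(x - np.median(x), 0.0, None)          # upper half only
    idx, _ = find_peaks(x, distance=int(MIN_IBI * FS_UP))
    return t_up[idx]                                  # peak times in seconds

# Toy check: a clean 1.25 Hz oscillation should yield peaks ~0.8 s apart
sig = np.sin(2 * np.pi * 1.25 * np.arange(0, 8, 1 / FS_VIDEO))
peaks = detect_ppg_peaks(sig)
```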
Next, regardless of their origin (ROI) or amplitude, the
peaks are grouped in the time dimension employing agglom-
erative clustering with complete linkage to obtain the PPG
peak candidates (see Fig. 3b-c). More precisely, if there are
multiple peaks in a period of less than 0.2 s, they are replaced
by one peak at their average time.
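The grouping step might be sketched like this (Python/SciPy assumed; `fcluster` with the `distance` criterion cuts the complete-linkage dendrogram so that each cluster's diameter stays within the 0.2 s threshold, and each cluster is then replaced by its average time):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_peaks(peak_times, max_gap=0.2):
    """Fuse peak times pooled from all ROIs: agglomerative clustering
    with complete linkage, cut at 0.2 s, each cluster replaced by the
    average time of its members (the PPG peak candidates)."""
    times = np.sort(np.asarray(peak_times, dtype=float))
    if len(times) == 1:
        return times
    labels = fcluster(linkage(times.reshape(-1, 1), method="complete"),
                      t=max_gap, criterion="distance")
    return np.sort([times[labels == c].mean() for c in np.unique(labels)])

# Peaks from several ROIs agreeing on two beats, plus one stray peak
cands = cluster_peaks([0.80, 0.82, 0.84, 1.60, 1.62, 2.90])
```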
For each PPG peak candidate, we calculate the periods to
all potentially correct PPG peak successors, i. e. to all candi-
dates within the following 2 s (maximum IBI). This yields the
IBI candidates which can be plotted in time-IBI plane. The
plot implicitly defines a directed graph of all possible IBI se-
ries (see Fig. 3d).
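Building the IBI candidates and the implicit directed graph can be sketched without any library (Python assumed; the 0.3 s lower bound is taken from the minimum IBI stated above, the 2 s upper bound from the maximum IBI):

```python
def ibi_candidates(peak_times, min_ibi=0.3, max_ibi=2.0):
    """Directed graph of hypothetic IBIs: one edge (i, j, ibi) for every
    PPG peak candidate j that follows candidate i within the plausible
    IBI range of 0.3-2.0 s (i.e. 30-200 bpm)."""
    edges = []
    for i, ti in enumerate(peak_times):
        for j in range(i + 1, len(peak_times)):
            ibi = peak_times[j] - ti
            if ibi > max_ibi:
                break          # peak times are sorted; later peaks are farther
            if ibi >= min_ibi:
                edges.append((i, j, ibi))
    return edges

# Three plausible beats plus one artifact peak too close to its predecessor
edges = ibi_candidates([0.0, 0.8, 1.6, 1.7])
```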
Fig. 3. Getting the IBI candidates. (a) Upper halves of the
spline-interpolated, band-pass filtered ROI signals with detected
peaks. (b) Temporal distribution of peaks from all
ROIs. (c) Result of peak clustering (PPG peak candidates).
(d) IBI candidates with graph of all possible IBI series. Dotted
vertical lines are ECG R-wave peaks (for comparison).
2.3. Interval Series Consensus
The last step is to find the most plausible IBI series in the
time-IBI plane. Our algorithm for this task
is inspired by the RANSAC algorithm [19], which can fit a
model to a set of samples, even if it is dominated by outliers.
Basically, we apply the idea of creating a set of hypotheses,
ranking them and selecting the best. Each hypothesis is cre-
ated from a minimal sample set. The ranking is based on
the consensus set, i. e. the samples which are consistent with
the hypothesized model. As the IBI model we choose a lin-
ear function of time. Samples which deviate less than ±20%
from the model are considered to be consistent, i. e. are in the
consensus set (see Fig. 4).
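The consistency test against a hypothesized linear model is straightforward (Python assumed; sample and parameter names are ours, the ±20% tolerance is from the text):

```python
def consensus_set(samples, slope, intercept, tol=0.2):
    """Samples are (time, ibi) pairs; the hypothesized IBI model is the
    line ibi = slope * t + intercept. A sample is consistent (in the
    consensus set) if it deviates at most +/-20% from the model value."""
    keep = []
    for t, ibi in samples:
        model = slope * t + intercept
        if abs(ibi - model) <= tol * model:
            keep.append((t, ibi))
    return keep

# Three plausible IBIs around 0.8 s and one artifact-induced outlier
samples = [(0.5, 0.80), (1.3, 0.82), (2.1, 1.40), (2.9, 0.78)]
inliers = consensus_set(samples, slope=0.0, intercept=0.8)
```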
In contrast to RANSAC, we do not determine our hypotheses
from randomly chosen samples. Since the HR does not
change abruptly, most randomly chosen pairs would result in
a model line with an implausibly high slope. Thus, to reduce
computation time, we select the pairs systematically: first by only
combining samples from the left half of the plane with samples
from the right, and second by only considering the lowest-slope
hypotheses (10% of all).
A second difference to RANSAC and its variants is how
we rank a consensus set. Again, we exploit application do-
main knowledge. First, the graph of model-consistent IBI se-
ries is created (arrows in Fig. 4). It may consist of multiple
unconnected subgraphs. Next, we look for the longest path in
this graph, i. e. the longest continuous IBI series regarding the
number of intervals (blue arrows in Fig. 4). The length of this
series is used to rank a consensus set. So, the longest of all
Fig. 4. Interval series consensus. Three hypothetic linear IBI
models (solid lines), each with its consensus set range (gray
background), IBI series graph (arrows) and longest connected
IBI series (thick blue arrows). The middle series is selected
as the best solution. ECG IBIs (orange) are shown for
comparison. The HR estimation error of this sample is 1.14 bpm.
nearly linear IBI series with low slope is considered the
most plausible. Finally, we calculate the HR as
60/IBI, with IBI being the mean IBI of the longest series.
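Since the IBI candidate graph only has edges from earlier to later peaks, it is acyclic, and the longest-path ranking can be done with simple dynamic programming. A sketch under the assumption of Python (names are ours; the ranking criterion and HR = 60 / mean IBI are from the text):

```python
import numpy as np

def longest_ibi_series(peak_times, edges):
    """Longest continuous IBI series (counted in intervals) in the DAG
    of model-consistent IBI candidates; edges are (i, j, ibi) with i < j.
    Processing edges by ascending source index is a topological order."""
    n = len(peak_times)
    best_len = [0] * n        # longest path (in edges) ending at node i
    pred = [None] * n
    for i, j, _ in sorted(edges):
        if best_len[i] + 1 > best_len[j]:
            best_len[j] = best_len[i] + 1
            pred[j] = i
    end = int(np.argmax(best_len))
    path = [end]
    while pred[path[-1]] is not None:   # backtrack the longest path
        path.append(pred[path[-1]])
    path.reverse()
    ibis = np.diff([peak_times[k] for k in path])
    return path, 60.0 / ibis.mean()     # HR = 60 / mean IBI (bpm)

# Four consistent beats 0.8 s apart (edges of the consensus set)
peaks = [0.0, 0.8, 1.6, 2.4]
path, hr = longest_ibi_series(peaks, [(0, 1, 0.8), (1, 2, 0.8), (2, 3, 0.8)])
```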
3. EXPERIMENTS
We conduct experiments with the BioVid Heat Pain Database
[15, 16], a comprehensive dataset collected in a study with
90 participants. In the study, each participant was subjected
to a series of painful heat stimuli alternating with periods of
rest. Video (25 Hz) of the upper body and ECG (512 Hz)
were recorded synchronously. The participants were allowed
to move freely and behave spontaneously while remaining
seated in front of the camera. So the recorded material con-
tains a lot of head motion, including non-planar motion, and
even more facial expression, namely expression of pain. We
extract 3,600 non-overlapping time windows of 8 s length,
half during stimulation at the highest induced pain level and half
during the resting periods. For each of the time windows, we
determined the mean heart rate (HR) from ECG (as ground
truth) and from video employing several methods. ECG heart
beats (R wave peaks) were detected through the algorithm of
Hamilton and Tompkins [20]. Some detection errors that occurred
were manually corrected. The determined ground truth
HR ranges from 41.5 to 112.7 bpm (mean 72.4, SD 11.3). For
each method and sample (time window), we calculate the HR
estimation error as E = HR_iPPG − HR_ECG.
Table 1 and Fig. 5 compare the error of the proposed
imaging PPG method with some other methods. In Table 1
we list the mean and standard deviation of the error, as well
as the percentage of samples with an error in range ±3 and
±10 bpm. Fig. 5 shows the empirical distribution functions
of the absolute error. We compare our approach (IBISC)
with that of Poh et al. [10] (F-ICA-2), since they describe
their method as motion-tolerant. For this, we extracted the
Fig. 5. Empirical cumulative distribution functions of the absolute
heart rate estimation error |E| for several methods: F-ICA-2 [10],
F-ICA-2 ZP3, F-ICA-P ZP3, F-Green [9], and IBISC (ours).
                E (bpm)         P(|E| ≤ e)
Method         Mean    SD     3 bpm   10 bpm
F-ICA-2 [10]   -5.8   22.7     0.38     0.57
F-ICA-2 ZP3    -5.8   22.0     0.47     0.59
F-ICA-P ZP3    -6.9   15.2     0.64     0.72
F-Green [9]    -6.5   13.8     0.71     0.78
IBISC (ours)    0.2   10.9     0.78     0.91

Table 1. HR estimation error of several methods: mean and
standard deviation of the error E, and empirical probability of
an absolute error |E| of less than or equal to 3 and 10 bpm.
mean of the color channels from our motion-compensated
face ROI, then applied the JADE independent component
analysis (ICA) algorithm on the three color signals and fi-
nally selected the frequency with the highest power in the
band of 0.7-4 Hz from the second independent component, as
described in their paper [10]. Due to the characteristics of our
data, it was not possible to use time windows of 30 s and apply
their sliding window artifact rejection. Under these conditions,
the HR determined by Poh’s method is strongly biased and
has a high deviation from ground truth. More precisely, in
the mean the HR was estimated about 6 bpm too low and
only 57% of the samples deviated less than 10 bpm from the
correct HR. Since our shorter time window leads to fewer
frequency bins, we modified the approach by zero-padding
the signal before applying the Fourier transform, as proposed by
Verkruysse et al. [9]. That is, we subtract the mean and append a
zero signal three times the length of the original signal. The
modified method (F-ICA-2 ZP3) is more accurate (higher
initial slope in Fig. 5). We further modified the method to
choose the independent component with the highest power
peak (F-ICA-P ZP3) as proposed by Poh et al. in a follow-up
publication [11]. This significantly improves performance.
However, it is still outperformed by simply skipping the ICA
and using the green channel instead (F-Green), as previously
done by Verkruysse et al. [9] and others. We conclude that
there is no benefit from ICA in the presence of very high-power
artifacts, since it introduces the additional problem of selecting
an output component for further processing.
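The frequency-domain estimate with zero-padding, as used by the F-... baselines, can be sketched like this (Python/NumPy assumed; the 0.7-4 Hz band follows [10], the 3x zero-padding follows the ZP3 variant described above):

```python
import numpy as np

FS = 25.0  # video frame rate in Hz (assumed)

def hr_freq_domain(signal, fs=FS, band=(0.7, 4.0), pad_factor=3):
    """Frequency-domain HR estimate: subtract the mean, append zeros of
    pad_factor times the signal length (ZP3) to refine the frequency
    grid, and return 60 times the frequency of the highest power in
    the 0.7-4 Hz band."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    x = np.concatenate([x, np.zeros(pad_factor * len(x))])
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return 60.0 * freqs[in_band][np.argmax(power[in_band])]

# An 8 s window dominated by a 1.2 Hz (72 bpm) component
t = np.arange(0, 8, 1 / FS)
hr = hr_freq_domain(np.sin(2 * np.pi * 1.2 * t))
```

This also illustrates why such methods fail under motion: whenever an artifact carries more in-band power than the pulse, the argmax locks onto the artifact frequency.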
Table 1 shows that all frequency-domain methods
(F-...) suffer from a strong bias, i. e. they tend to
underestimate the HR. The reason is their implicit assumption that the
HR is the dominant frequency in the considered band. This
assumption is often violated, because we allowed the subjects
to move spontaneously. These movements introduce a broad
variety of artifacts, often including in-band spectral compo-
nents with higher power than the pulsatile PPG signal com-
ponent.
In contrast, our method (IBISC) is far less biased (only
0.2 bpm), as it does not rely on the spectral power assumption.
It further has the lowest error standard deviation and the best
absolute error distribution of the compared methods. E. g. the
deviation from ground truth is less than 1 bpm for 55% of the
tested samples and less than 3 bpm for 78%.
4. CONCLUSION
In this work, we have introduced a new algorithm for auto-
matic heart rate estimation from a videotaped face. It is based
on motion-compensated imaging PPG signals from multiple
ROIs and an inter-beat-interval series consensus (IBISC) al-
gorithm finding the most plausible IBI series. Our approach
has been shown to be more robust to strong head movement and
facial expression than previous methods.
However, the measurement accuracy is still not sufficient
for several applications. A promising direction for further re-
search is to reduce the rate of artifact peaks. This decreases
the probability of selecting a false hypothesis in the IBISC
algorithm, and consequently decreases the rate of high mea-
surement errors.
5. APPENDIX
Here we define the regions of interest (ROIs) used for the
PPG signal extraction. Given the points p_ell, p_elr, p_erl, p_err,
p_bl, p_br, p_ml, p_mr as shown in Fig. 1 and COG(X) being
the center of gravity of the point set X, we define

p_ec = COG({p_ell, p_elr, p_erl, p_err}),
d_e = COG({p_erl, p_err}) − COG({p_ell, p_elr}), and
d_v = COG({p_ml, p_mr}) − p_ec.

The corner points of the face ROI are calculated as
r_{1,i} = p_ec + α_i d_e + β_i d_v with 1 ≤ i ≤ 4 and
α_1 = α_4 = −0.8, α_2 = α_3 = 0.8, β_1 = β_2 = −0.6 and
β_3 = β_4 = −1.2. The forehead ROI corner points are
r_{2,i} = p_br + γ_{3−i} d_v for i = 1, 2 and
r_{2,i} = p_bl + γ_{i−2} d_v for i = 3, 4 with γ_1 = −0.02 and
γ_2 = −0.3. The left and right cheek ROI points (r_{cl,i} and
r_{cr,i}) are r_{cj,i} = p_mj + δ_{3−i} (p_ejr − p_mj) for i = 1, 2
and r_{cj,i} = p_mj + δ_{i−2} (p_ejl − p_mj) for i = 3, 4, both for
j ∈ {l, r} (left, right) with δ_1 = 0.25 and δ_2 = 0.75.
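A sketch of the face ROI construction in Python (an assumption; point coordinates below are hypothetical pixel positions, while the α and β coefficients are taken verbatim from the appendix):

```python
import numpy as np

def face_roi_corners(p_ell, p_elr, p_erl, p_err, p_ml, p_mr):
    """Face ROI corners per the appendix: eye-center anchor p_ec,
    horizontal eye axis d_e, vertical axis d_v toward the mouth, and
    corners r_{1,i} = p_ec + alpha_i * d_e + beta_i * d_v."""
    cog = lambda pts: np.mean(pts, axis=0)        # center of gravity
    p_ec = cog([p_ell, p_elr, p_erl, p_err])
    d_e = cog([p_erl, p_err]) - cog([p_ell, p_elr])
    d_v = cog([p_ml, p_mr]) - p_ec
    alpha = [-0.8, 0.8, 0.8, -0.8]                # alpha_1..alpha_4
    beta = [-0.6, -0.6, -1.2, -1.2]               # beta_1..beta_4
    return [p_ec + a * d_e + b * d_v for a, b in zip(alpha, beta)]

# Hypothetical pixel coordinates of the eye corners and mouth corners
corners = face_roi_corners(np.array([100., 100]), np.array([130., 100]),
                           np.array([170., 100]), np.array([200., 100]),
                           np.array([135., 160]), np.array([165., 160]))
```

Anchoring the corners to the eye and mouth points, rather than to the face detector's bounding box, is what keeps the ROI attached to the same patch of skin under head rotation.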
6. REFERENCES
[1] A. J. Camm, M. Malik, J. T. Bigger, G. Breithardt,
S. Cerutti, R. J. Cohen, P. Coumel, E. L. Fallen, H. L.
Kennedy, and R. E. Kleiger, “Heart rate variability:
standards of measurement, physiological interpretation
and clinical use.,” European Heart Journal, vol. 17, pp.
354–381, 1996.
[2] Y. Sun, S. Hu, V. Azorin-Peris, S. Greenwald, J. Cham-
bers, and Y. Zhu, “Motion-compensated noncontact
imaging photoplethysmography to monitor cardiorespi-
ratory status during exercise,” Journal of Biomedical
Optics, vol. 16, no. 7, pp. 077010, 2011.
[3] T. Wu, “PPGI: new development in noninvasive and
contactless diagnosis of dermal perfusion using near In-
fraRed light,” J. of the GCPD e.V., vol. 7, no. 1, pp. 17–24,
Oct. 2003.
[4] K. Humphreys, T. Ward, and C. Markham, “Noncontact
simultaneous dual wavelength photoplethysmography:
a further step toward noncontact pulse oximetry,” Re-
view of scientific instruments, vol. 78, no. 4, pp. 044304,
2007.
[5] G. Cennini, J. Arguel, K. Akit, and A. van Leest, “Heart
rate monitoring via remote photoplethysmography with
motion artifacts reduction,” Optics Express, vol. 18, no.
5, pp. 4867, Feb. 2010.
[6] U. Rubins, V. Upmalis, O. Rubenis, D. Jakovels, and
J. Spigulis, “Real-time photoplethysmography imag-
ing system,” in 15th Nordic-Baltic Conference on
Biomedical Engineering and Medical Physics (NBC
2011), number 34 in IFMBE Proceedings, pp. 183–186.
Springer Berlin Heidelberg, Jan. 2011.
[7] C. G. Scully, J. Lee, J. Meyer, A. M. Gor-
bach, D. Granquist-Fraser, Y. Mendelson, and K. H.
Chon, “Physiological parameter monitoring from opti-
cal recordings with a mobile phone,” IEEE Transactions
on Biomedical Engineering, vol. 59, no. 2, pp. 303–306,
Feb. 2012.
[8] Y. Sun, S. Hu, V. Azorin-Peris, R. Kalawsky, and S.
Greenwald, “Noncontact imaging photoplethysmogra-
phy to effectively access pulse rate variability,” Journal
of Biomedical Optics, vol. 18, no. 6, pp. 061205, 2013.
[9] W. Verkruysse, L. O. Svaasand, and J. Stuart Nel-
son, “Remote plethysmographic imaging using ambient
light,” Optics express, vol. 16, no. 26, pp. 21434-21445,
2008.
[10] M. Z. Poh, D. J. McDuff, and R. W. Picard, “Non-
contact, automated cardiac pulse measurements using
video imaging and blind source separation,” Optics Ex-
press, vol. 18, no. 10, pp. 10762-10774, 2010.
[11] M. Z. Poh, D. J. McDuff, and R. W. Picard, “Advance-
ments in noncontact, multiparameter physiological mea-
surements using a webcam,” Biomedical Engineering,
IEEE Transactions on, vol. 58, no. 1, pp. 7-11, 2011.
[12] M. Lewandowska, J. Ruminski, T. Kocejko, and
J. Nowak, “Measuring pulse rate with a webcam - a
non-contact method for evaluating cardiac activity,” in
2011 Federated Conference on Computer Science and
Information Systems (FedCSIS), 2011, pp. 405–410.
[13] Y. Sun, C. Papin, V. Azorin-Peris, R. Kalawsky, S.
Greenwald, and S. Hu, “Use of ambient light in re-
mote photoplethysmographic systems: comparison be-
tween a high-performance camera and a low-cost web-
cam,” Journal of Biomedical Optics, vol. 17, no. 3, pp.
037005, Mar. 2012.
[14] L. Wei, Y. Tian, Y. Wang, T. Ebrahimi, and T. Huang,
“Automatic webcam-based human heart rate measure-
ments using laplacian eigenmap,” in Computer Vision -
ACCV 2012, pp. 281–292. Springer, 2013.
[15] P. Werner, A. Al-Hamadi, R. Niese, S. Walter, S. Gruss,
and H. C. Traue, “Towards pain monitoring: Facial ex-
pression, head pose, a new database, an automatic sys-
tem and remaining challenges,” in Proceedings of the
British Machine Vision Conference. 2013, BMVA Press.
[16] S. Walter, P. Werner, S. Gruss, H. Ehleiter, J. Tan, H. C.
Traue, A. Al-Hamadi, A. O. Andrade, G. Moreira da
Silva, and S. Crawcour, “The BioVid heat pain database:
Data for the advancement and systematic validation of
an automated pain recognition system,” in Cybernetics
(CYBCONF), 2013 IEEE International Conference on,
2013, pp. 128–131.
[17] R. Lienhart, A. Kuranov, and V. Pisarevsky, “Empirical
analysis of detection cascades of boosted classifiers for
rapid object detection,” in DAGM 25th Pattern Recog-
nition Symposium, 2003, pp. 297–304.
[18] X. Xiong and F. De la Torre, “Supervised descent
method and its applications to face alignment,” in
Computer Vision and Pattern Recognition (CVPR), 2013
IEEE Conference on, 2013, pp. 532–539.
[19] M. A. Fischler and R. C. Bolles, “Random sample con-
sensus: A paradigm for model fitting with applications
to image analysis and automated cartography,” Com-
mun. ACM, vol. 24, no. 6, pp. 381–395, June 1981.
[20] P. S. Hamilton and W. J. Tompkins, “Quantitative in-
vestigation of QRS detection rules using the MIT/BIH
arrhythmia database,” Biomedical Engineering, IEEE
Transactions on, vol. 12, pp. 1157–1165, 1986.