All content in this area was uploaded by Michal Rapczynski on Sep 07, 2018
HOW THE REGION OF INTEREST IMPACTS CONTACT FREE HEART RATE ESTIMATION ALGORITHMS
Michal Rapczynski, Philipp Werner, Frerk Saxen and Ayoub Al-Hamadi
Otto-von-Guericke University Magdeburg
Neuro-Information Technology (NIT)
Universitaetsplatz 2, 39106 Magdeburg
ABSTRACT
Contact-free, camera-based estimation of human vital signs is more comfortable than classical contact-based methods. In realistic environments, however, current methods suffer from problems such as occlusions by hair or glasses. Several approaches use complex methods to extract the pulse signal from the skin, but rely on basic, geometrically defined regions of the face, which ignore possible occlusions. In this paper we compare, for the first time, the influence of several region-based methods and a newly proposed parameter-free skin segmentation approach on the heart rate estimation performance of nine different algorithms from the literature, on two datasets with strong facial and head movement.
Index Terms: Heart rate estimation, photoplethysmography, skin segmentation
1. INTRODUCTION
The heart rate and heart rate variability are important human vital parameters. Commercial devices measure them with contact-based approaches. This has several disadvantages, such as skin irritation and bacterial exposure, and significantly restricts the patient's freedom of movement. Several approaches have been proposed for camera-based photoplethysmographic (PPG) measurement, which relies on minute changes in skin color caused by the variation of the blood volume in the skin with each heartbeat.
1.1. Contactless photoplethysmographic approaches
Every heart rate estimation algorithm can usually be broken down into three parts: the definition of the Region of Interest (ROI), the calculation of the PPG signal from the color information of the video data, and the estimation of the heart rate from that PPG signal. The PPG signal is commonly extracted from every frame of an RGB camera video. First, a ROI is defined in the video image. In most cases this ROI is based on a face tracking algorithm and/or facial landmarks that define a specific region of the face, such as the forehead [1, 2] or the cheeks [3, 4]. Other approaches apply a skin detection algorithm inside the bounding box of the face to decide individually for each pixel whether it is included in the ROI [5, 6]. The second part of every algorithm is the extraction of the PPG signal from the color information of the ROI. Several methods have been proposed for this step: approaches using only a single color channel [7], ICA [8], PCA [1], and more complex approaches based on the optical and physiological properties of the skin's reflective behaviour [4, 9]. A review of different PPG extraction approaches can be found in [9].

Fig. 1. Basic approach for contactless heart rate estimation (Video → Region of Interest → PPG Extraction → Heart Rate Estimation).
The last step in every algorithm is the estimation of the heart rate from the extracted signal. Usually one of two methods is used: either a frequency analysis approach [1, 6, 9], or a band-pass filter followed by the extraction of peaks or zero-crossings and the resulting inter-beat intervals (IBIs) [8, 4, 5].
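To illustrate the two estimation strategies, the following Python sketch (NumPy/SciPy) computes the heart rate once from the highest spectral peak and once from the mean IBI of band-pass filtered peaks. The function names, the 0.7-4 Hz search band and the third-order Butterworth filter are illustrative assumptions and do not reproduce any specific algorithm from [1]-[9].

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def hr_from_spectrum(ppg, fs, f_lo=0.7, f_hi=4.0):
    """Frequency analysis: heart rate (BPM) at the highest power-spectrum peak."""
    x = np.asarray(ppg, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)  # plausible heart rate range
    return 60.0 * freqs[band][np.argmax(power[band])]


def hr_from_ibis(ppg, fs, f_lo=0.7, f_hi=4.0):
    """Time domain: band-pass filter, detect peaks, average the IBIs."""
    b, a = butter(3, [f_lo, f_hi], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, np.asarray(ppg, dtype=float))
    # Peaks closer than one period of f_hi cannot be separate beats.
    peaks, _ = find_peaks(filtered, distance=max(1, int(fs / f_hi)))
    ibis = np.diff(peaks) / fs  # inter-beat intervals in seconds
    return 60.0 / np.mean(ibis)


if __name__ == "__main__":
    fs = 25.0  # frame rate of both datasets used in this paper
    t = np.arange(500) / fs  # 20 s
    ppg = np.sin(2 * np.pi * 1.2 * t) + 0.5 * t / 20  # 72 BPM pulse plus slow drift
    print(hr_from_spectrum(ppg, fs), hr_from_ibis(ppg, fs))
```

The spectral estimate is limited to the frequency resolution of the window (0.05 Hz, i.e. 3 BPM, for 20 s), whereas the IBI-based estimate can react to single beats but depends on clean peak detection.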
1.2. Choice of the ROI
The use of an adaptive skin detection based on skin color segmentation algorithms was proposed by Rapczynski et al. [5]. Earlier, DeHaan et al. [6] used a “simple skin selection process”. State-of-the-art skin detection approaches can be separated into two categories: pixel-based and region-based approaches. While pixel-based approaches are usually very fast (because each pixel is classified independently of the others), region-based approaches need considerably more computation time [10]. The very fast (∼20 ms per image) and high-performance look-up table approach of Jones and Rehg [11] is the current state of the art among pixel-based approaches. In recent years, an increasing number of region-based approaches with better skin classification have been proposed; however, these approaches also have considerably higher computational costs (∼1000 ms per image). Because color is already a very good feature for skin detection, most state-of-the-art region-based approaches build on a look-up table approach, followed by texture [12, 13] or superpixel [14, 10] analysis. Because heart rate estimation must run in real time, we use the approach of Jones and Rehg [11] due to its fast computation.

Fig. 2. Facial landmark based ROIs: (a) middle of the face (FaceMid), (b) Forehead, (c) Feng-ROI (ROI pixels in pink).
Generally, it is expected that using more skin pixels improves the signal-to-noise ratio and leads to a cleaner PPG signal and better heart rate estimation [7]. However, simply choosing a large area can include parts of the face such as the mouth or the eyes, which introduce unwanted noise due to facial movement. Various interfering factors in a predefined ROI (such as hair, beards, blinking, glasses, or clothing) can decrease the estimation performance significantly. A common way to counteract these effects is to exclude large parts of the face (e.g. the eye or mouth region) from the ROI and to discard the color information in these regions. The choice of ROI is therefore of great importance for the quality of the calculated PPG signal, and a more flexible approach for determining the ROI is preferable to a fixed geometric ROI.
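This trade-off can be made concrete with a small simulation. It is a sketch under strongly simplified, hypothetical assumptions: every skin pixel carries the same pulse plus independent noise, and every "unreliable" pixel (e.g. from the mouth region) carries a shared movement artifact instead of the pulse. All amplitudes and frequencies are made up for illustration.

```python
import numpy as np


def roi_snr(n_skin, n_bad, pulse_amp=0.1, pixel_noise=2.0, artifact_amp=1.0,
            n_frames=500, fs=25.0, seed=0):
    """SNR of the spatially averaged ROI signal under a toy pixel model."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_frames) / fs
    pulse = pulse_amp * np.sin(2 * np.pi * 1.2 * t)        # 72 BPM pulse
    artifact = artifact_amp * np.sin(2 * np.pi * 0.9 * t)  # e.g. talking
    skin = pulse + rng.normal(0.0, pixel_noise, (n_skin, n_frames))
    bad = artifact + rng.normal(0.0, pixel_noise, (n_bad, n_frames))
    roi_mean = np.vstack([skin, bad]).mean(axis=0)
    # The pulse survives averaging only in proportion to the skin fraction.
    wanted = pulse * n_skin / (n_skin + n_bad)
    return np.std(wanted) / np.std(roi_mean - wanted)


for n_skin, n_bad in [(100, 0), (1000, 0), (1000, 1000)]:
    print(n_skin, n_bad, round(roi_snr(n_skin, n_bad), 3))
```

Independent pixel noise shrinks roughly with the square root of the number of averaged pixels, so more clean pixels help; the shared artifact of the unreliable pixels does not average out and can make a large ROI worse than a small, clean one.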
In this paper we compare the influence of facial landmark based ROIs and our new color-based skin segmentation on heart rate estimation. For the first time, this comparison analyses the impact of various ROIs on the results of nine different algorithms.

Section 2 describes the methods used, starting with the different ROI approaches (landmark-based and skin-detection-based ROIs), followed by the heart rate estimation algorithms, the datasets, and the error calculation. Section 3 reports the experimental results, which are discussed in Section 4.
2. METHODS
2.1. Regions of Interest
To examine the influence of the ROI on heart rate estimation, we chose ROIs from related works and compared them to our newly proposed parameter-free skin segmentation approach. The first step in both cases is face and facial landmark detection to find the subject's face in the image. A subimage is then extracted using a bounding box around the facial landmarks, extended at the top and bottom by half the distance between the eye corners and the bottom of the nose. These anchor points were chosen because of their relatively fixed position in the face, in contrast to less stable metrics such as the bounding box height, which can change due to movement of the eyebrows and the mouth. This extension enlarges the ROI vertically to cover the skin of the neck and forehead while adapting to the effective pixel resolution of the face in the video data.
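The subimage extraction can be sketched as follows. This is a minimal sketch under our own assumptions: landmarks are (x, y) pixel coordinates with y pointing down, the eye-corner/nose distance is measured vertically from the mean height of the eye corners, and the landmark indices are hypothetical because they depend on the landmark detector used. Clipping to the image border is left out.

```python
import numpy as np


def extended_face_box(landmarks, eye_corner_ids, nose_bottom_id):
    """Landmark bounding box, extended at top and bottom by half the
    vertical distance d between the eye corners and the bottom of the nose.

    landmarks: (K, 2) array of (x, y) image coordinates, y pointing down.
    Returns ((x0, y0, x1, y1), d).
    """
    pts = np.asarray(landmarks, dtype=float)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    eye_y = pts[list(eye_corner_ids), 1].mean()  # mean height of the eye corners
    d = abs(pts[nose_bottom_id, 1] - eye_y)
    return (x0, y0 - 0.5 * d, x1, y1 + 0.5 * d), d


if __name__ == "__main__":
    toy = [(10, 20), (50, 20), (30, 40), (30, 70)]  # eye corners, nose bottom, chin
    print(extended_face_box(toy, eye_corner_ids=(0, 1), nose_bottom_id=2))
```

Because d scales with the face size in the image, the extension automatically adapts to the distance between subject and camera.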
Three popular ROI methods were implemented. The FaceMid ROI uses the centered 60% of the width and the full height of the face bounding box as the active area (see Fig. 2a). The Forehead ROI uses the centered 50% of the width and, as its height, 30% of the distance between the eye corners and the bottom of the nose, starting at the eyebrows (see Fig. 2b). The Feng-ROI proposed by Feng et al. [4] uses two regions on the cheeks as the active area. These ROIs are tracked by following speeded-up robust features (SURF) points in the center of the face and applying an affine transformation to the ROIs to compensate for head movement (see Fig. 2c). The whole bounding box (BoundingBox), without further segmentation, was used as a baseline region to estimate the added value of each ROI.
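The two rectangular ROIs can then be derived from such a box. The sketch below assumes that the percentages refer to the extended bounding box and that the Forehead ROI extends upwards from an eyebrow line `eyebrow_y`, e.g. taken from eyebrow landmarks; both are our interpretation, and all function names are hypothetical. The SURF-based tracking of the Feng-ROI is not sketched.

```python
import numpy as np


def facemid_roi(box):
    """FaceMid: centered 60% of the box width, full box height."""
    x0, y0, x1, y1 = box
    cx, w = 0.5 * (x0 + x1), x1 - x0
    return (cx - 0.3 * w, y0, cx + 0.3 * w, y1)


def forehead_roi(box, eyebrow_y, eye_nose_dist):
    """Forehead: centered 50% of the box width; height is 30% of the
    eye-corner/nose-bottom distance, extending upwards from eyebrow_y."""
    x0, _, x1, _ = box
    cx, w = 0.5 * (x0 + x1), x1 - x0
    h = 0.3 * eye_nose_dist
    return (cx - 0.25 * w, eyebrow_y - h, cx + 0.25 * w, eyebrow_y)


def crop(image, roi):
    """Pixels of an (H, W, 3) image inside an (x0, y0, x1, y1) ROI, clipped."""
    x0, y0, x1, y1 = (int(round(v)) for v in roi)
    h, w = image.shape[:2]
    return image[max(y0, 0):min(y1, h), max(x0, 0):min(x1, w)]


if __name__ == "__main__":
    frame = np.zeros((200, 100, 3), dtype=np.uint8)
    print(crop(frame, facemid_roi((0, 0, 100, 200))).shape)
```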
For the newly proposed Skin ROI we use the look-up table approach of Jones and Rehg [11], which provides for each pixel color $c$ the relative frequency $p$:

$$p(c) = \frac{n(c, X_{\mathrm{skin}})}{n(c, X)} \qquad (1)$$

where $n(c, X_{\mathrm{skin}})$ is the number of observations of the color $c$ in the skin dataset and $n(c, X)$ the number of observations of the color $c$ in the entire dataset. Figure 3 shows an example of a skin probability map. $p(c)$ can be stored efficiently in a look-up table; it was trained with the ECU dataset [15]. The relative frequency $p(c)$ can be transformed into a segmentation by applying a threshold, which should be chosen to allow only a small number of false positives.
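A look-up table of this kind can be built from labeled pixels with a 3D color histogram. This is a sketch, not the original implementation of [11]: the quantization to 32 bins per RGB channel, the value p(c) = 0 for colors never seen in training, and the function names are assumptions, and loading the ECU dataset [15] is not shown.

```python
import numpy as np


def build_skin_lut(pixels, is_skin, bins=32):
    """Histogram LUT p(c) = n(c, X_skin) / n(c, X) over quantized RGB colors.

    pixels: (N, 3) uint8 RGB values of a labeled training set.
    is_skin: (N,) boolean skin labels.
    """
    q = (np.asarray(pixels, dtype=np.int64) * bins) // 256  # bin index per channel
    idx = np.ravel_multi_index(tuple(q.T), (bins, bins, bins))
    n_all = np.bincount(idx, minlength=bins ** 3)
    n_skin = np.bincount(idx[np.asarray(is_skin, dtype=bool)], minlength=bins ** 3)
    lut = np.zeros(bins ** 3)
    seen = n_all > 0  # colors never observed keep p(c) = 0
    lut[seen] = n_skin[seen] / n_all[seen]
    return lut.reshape(bins, bins, bins)


def skin_probability(image, lut):
    """Per-pixel lookup of p(c) for an (H, W, 3) uint8 RGB image."""
    bins = lut.shape[0]
    q = (np.asarray(image, dtype=np.int64) * bins) // 256
    return lut[q[..., 0], q[..., 1], q[..., 2]]
```

A hard segmentation would then be `skin_probability(image, lut) > threshold`; the lookup costs one memory access per pixel, which is what makes this family of methods fast.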
As a parameter-free alternative, we propose to avoid hard thresholding by defining a soft ROI for every frame. Instead of a binary skin/non-skin mask, we use the skin probability $p_i = p(c_i)$ of each pixel $i$ (see Fig. 3) as a weighting factor for its color value $c_i$ when calculating the pixel mean for the PPG signal $S_{\mathrm{PPG}}$, where $n$ is the number of pixels:

$$S_{\mathrm{PPG}} = \frac{1}{\sum_{i=1}^{n} p_i} \cdot \sum_{i=1}^{n} p_i \, c_i \qquad (2)$$
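Eq. (2) amounts to a weighted spatial mean per frame. The sketch below contrasts it with the thresholded (hard) alternative; the function names, the (H, W, 3) frame layout and returning NaN when no pixel has non-zero weight are assumptions. In a full pipeline, `prob` would come from a per-pixel lookup of p(c), and one sample of the PPG signal would be computed for every video frame.

```python
import numpy as np


def soft_roi_mean(frame, prob):
    """Eq. (2): skin-probability-weighted mean color of one frame."""
    c = np.asarray(frame, dtype=float).reshape(-1, 3)  # pixel colors c_i
    p = np.asarray(prob, dtype=float).reshape(-1)      # weights p_i
    total = p.sum()
    if total <= 0:  # no skin-like pixel at all: no valid sample
        return np.full(3, np.nan)
    return (p[:, None] * c).sum(axis=0) / total


def hard_roi_mean(frame, prob, threshold):
    """Thresholded alternative: plain mean over pixels with p_i > threshold."""
    c = np.asarray(frame, dtype=float).reshape(-1, 3)
    mask = np.asarray(prob, dtype=float).reshape(-1) > threshold
    return c[mask].mean(axis=0) if mask.any() else np.full(3, np.nan)
```

The soft version needs no threshold: pixels with uncertain skin probability still contribute, but proportionally less.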
2.2. Implemented Algorithms
To test the effect of the ROIs on the subsequent processing steps, we chose and implemented algorithms that take different approaches in the various stages of heart rate estimation.
Blöcher et al. (2017) [2] use the forehead region of the face as ROI (see Fig. 2b). After the color channels are normalized, an ICA using the JADE algorithm is applied to calculate the PPG signal, which is then bandpass filtered (0.75-4 Hz). The approach uses the Hilbert transform to calculate the phase angle of the pulse signal. Beats are identified by detecting the phase jumps, and the heart rate is determined from the resulting IBIs.

Fig. 3. Skin probability p, which is used for weighting the individual pixel's contribution to the PPG signal.
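The beat detection step of Blöcher et al. [2] can be sketched as below. The sketch starts from an already extracted PPG signal, i.e. the channel normalization and the JADE-based ICA are omitted (JADE is not part of NumPy or SciPy), and the filter order and function name are our assumptions. For a band-limited pulse, the wrapped Hilbert phase falls from about +π to about -π once per cycle, so these jumps serve as beat markers.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert


def hr_from_hilbert_phase(ppg, fs, f_lo=0.75, f_hi=4.0):
    """Beat detection from phase wraps of the analytic signal.

    Returns the heart rate in BPM and the inter-beat intervals in seconds.
    """
    x = np.asarray(ppg, dtype=float)
    b, a = butter(3, [f_lo, f_hi], btype="bandpass", fs=fs)
    x = filtfilt(b, a, x - x.mean())
    phase = np.angle(hilbert(x))  # instantaneous phase, wrapped to [-pi, pi]
    # Once per cycle the wrapped phase jumps from about +pi to about -pi.
    beats = np.flatnonzero(np.diff(phase) < -np.pi) + 1
    ibis = np.diff(beats) / fs
    return 60.0 / ibis.mean(), ibis
```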
DeHaan and Jeanne (2013) [6] use a “simple skin
selection process” to define the ROI. The method uses a
chrominance-based approach to combine the RGB channels
into the PPG signal. The maximal spectrum power peak is
used to calculate the heart rate.
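The core of the chrominance approach fits in a few lines. The sketch follows the projection as we understand it from [6] (X = 3R_n - 2G_n, Y = 1.5R_n + G_n - 1.5B_n, S = X_f - αY_f with α = σ(X_f)/σ(Y_f)), but processes a single window without the overlap-add of the original; the filter settings and the function name are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def chrom_ppg(rgb, fs, f_lo=0.7, f_hi=4.0):
    """Chrominance-based pulse signal from (T, 3) mean RGB traces.

    Simplified: one window, no overlap-add across windows.
    """
    rgb = np.asarray(rgb, dtype=float)
    n = rgb / rgb.mean(axis=0)  # temporally normalized R, G, B
    x = 3.0 * n[:, 0] - 2.0 * n[:, 1]
    y = 1.5 * n[:, 0] + n[:, 1] - 1.5 * n[:, 2]
    b, a = butter(3, [f_lo, f_hi], btype="bandpass", fs=fs)
    xf, yf = filtfilt(b, a, x), filtfilt(b, a, y)
    # alpha scales Y so that distortions common to X and Y largely cancel.
    alpha = np.std(xf) / np.std(yf)
    return xf - alpha * yf
```

For an intensity change that scales all three channels equally, X and Y vary identically, α is close to 1 and the change cancels, while the pulse, which has a different color signature, remains.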
Feng et al. (2015) [4] use the Feng-ROI. An adaptive green/red difference color model is used to calculate the PPG signal. After a spectral analysis of a long (20 s) window, an adaptive bandpass filter with a narrow passband (±10 BPM) around the frequency of the maximal spectral peak is used to filter the last 4 seconds of the signal. The heart rate is then calculated from the IBIs of the signal peaks. We also evaluated a modified version of the algorithm (Feng mod) that, unlike the original paper, does not use the harmonics as additional passbands. This greatly reduced the number of false positive peaks and improved the overall estimation accuracy.
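The two-stage filtering can be sketched as follows, without the green/red color model and without harmonic passbands, i.e. closer to the "Feng mod" variant. The filter order, the 0.7-4 Hz search band, the function name and the choice to filter the whole 20 s window and keep only its last 4 s (to reduce edge transients of the narrow filter) are our assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def two_stage_hr(ppg, fs, long_s=20.0, short_s=4.0, half_width_bpm=10.0):
    """Coarse spectral estimate on a long window, then IBIs of the last
    short_s seconds after a narrow band-pass around that estimate."""
    x = np.asarray(ppg, dtype=float)[-int(long_s * fs):]
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)
    f0 = freqs[band][np.argmax(power[band])]  # dominant frequency in Hz
    hw = half_width_bpm / 60.0                # +-10 BPM expressed in Hz
    b, a = butter(2, [f0 - hw, f0 + hw], btype="bandpass", fs=fs)
    tail = filtfilt(b, a, x)[-int(short_s * fs):]
    peaks, _ = find_peaks(tail, distance=max(1, int(fs / (f0 + hw))))
    return 60.0 / np.mean(np.diff(peaks) / fs)
```

The narrow passband suppresses the second harmonic and slow motion components, which reduces false positive peaks in the short evaluation window.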
Lewandowska et al. (2011) [1] use the FaceMid ROI (see Fig. 2a). The extracted RGB channel signals are bandpass filtered (0.5-3.7 Hz). After a PCA of the three color signals, the first principal component is used to calculate the heart rate by finding the maximal power peak in the Fourier transform of the signal.
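A sketch of this PCA-based extraction, assuming (T, 3) mean RGB traces as input; the filter order and the function name are our choices. The sign of the principal component is arbitrary, which does not affect the spectrum.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def pca_pulse(rgb, fs, f_lo=0.5, f_hi=3.7):
    """Band-pass (0.5-3.7 Hz) the RGB traces, project onto the first
    principal component and return (component, heart rate in BPM)."""
    x = np.asarray(rgb, dtype=float)
    b, a = butter(3, [f_lo, f_hi], btype="bandpass", fs=fs)
    xf = filtfilt(b, a, x, axis=0)
    xf -= xf.mean(axis=0)
    # Rows of vt are the principal directions; vt[0] has the largest variance.
    _, _, vt = np.linalg.svd(xf, full_matrices=False)
    pc1 = xf @ vt[0]
    power = np.abs(np.fft.rfft(pc1)) ** 2
    freqs = np.fft.rfftfreq(len(pc1), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return pc1, 60.0 * freqs[band][np.argmax(power[band])]
```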
Li et al. (2014) [16] proposed an illumination rectification approach that uses the distance regularized level set evolution method to segment the background region as a reference, and calculate the heart rate from the frequency with the maximal power in the power spectral density estimate. To enable better comparability, the algorithm was implemented without the retroactive discarding of the noisiest 5% of signal segments.
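Two ingredients of this approach can be sketched: a reference-based rectification and the PSD peak search. The rectification below is a deliberately simplified, non-adaptive least-squares stand-in for the method of [16] (the level-set background segmentation is not shown), and the Welch parameters (`nperseg`, `nfft`) and function names are assumptions.

```python
import numpy as np
from scipy.signal import welch


def rectify_with_background(face_g, bg_g):
    """Simplified stand-in for illumination rectification: remove the part
    of the face signal that is linearly explained by the background signal."""
    f = np.asarray(face_g, dtype=float) - np.mean(face_g)
    g = np.asarray(bg_g, dtype=float) - np.mean(bg_g)
    h = np.dot(f, g) / np.dot(g, g)  # least-squares gain of the reference
    return f - h * g


def hr_from_psd(sig, fs, f_lo=0.7, f_hi=4.0, nperseg=256):
    """Heart rate (BPM) at the maximum of a Welch power spectral density."""
    f, pxx = welch(sig, fs=fs, nperseg=min(nperseg, len(sig)), nfft=2048)
    band = (f >= f_lo) & (f <= f_hi)
    return 60.0 * f[band][np.argmax(pxx[band])]
```

Without rectification, an in-band illumination flicker that also appears in the background can dominate the PSD; subtracting the scaled reference leaves the pulse as the strongest peak.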
Poh et al. (2011) [8] published one of the first approaches for contactless heart rate estimation. Their algorithm uses the FaceMid ROI. All three RGB channels are extracted. After smoothing and an ICA, the output signal with the maximal spectral power peak is selected as the PPG signal. This signal is bandpass filtered (0.7-4 Hz), and the heart rate is calculated from the IBIs of the signal peaks.
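The component selection and the IBI-based estimation can be sketched as follows. The ICA itself is not shown; its output (for example from scikit-learn's `FastICA`) is assumed to be a (T, K) array. Normalizing each component to unit variance before comparing spectral peaks, the filter order and the function names are our assumptions, and the smoothing step is omitted.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def pick_pulse_component(sources, fs, f_lo=0.7, f_hi=4.0):
    """Index of the separated source with the strongest in-band spectral peak.

    sources: (T, K) array, e.g. the output of an ICA on the RGB traces.
    """
    best, best_peak = -1, -np.inf
    for k in range(sources.shape[1]):
        s = sources[:, k] - sources[:, k].mean()
        s = s / s.std()  # unit variance so that peaks are comparable
        power = np.abs(np.fft.rfft(s)) ** 2
        freqs = np.fft.rfftfreq(len(s), d=1.0 / fs)
        peak = power[(freqs >= f_lo) & (freqs <= f_hi)].max()
        if peak > best_peak:
            best, best_peak = k, peak
    return best


def hr_from_peaks(sig, fs, f_lo=0.7, f_hi=4.0):
    """Band-pass (0.7-4 Hz), detect peaks and convert the mean IBI to BPM."""
    b, a = butter(3, [f_lo, f_hi], btype="bandpass", fs=fs)
    y = filtfilt(b, a, sig)
    peaks, _ = find_peaks(y, distance=max(1, int(fs / f_hi)))
    return 60.0 / np.mean(np.diff(peaks) / fs)
```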
Rapczynski et al. (2016) [5] use color-based skin segmentation as ROI. The normalized green channel is used as the PPG signal. The signal is then filtered using an adaptive bandpass filter with a narrow passband (±15 BPM) centered on the last heart rate estimate. A graph-based algorithm determines an optimal peak sequence that continues the peak sequence of the last measurement with minimal error; the heart rate is then calculated from the IBIs of the valid peaks.
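The adaptive passband can be sketched as a sliding-window loop in which each window is filtered around the previous estimate. This is a simplified sketch, not the published method: the normalization (division by the temporal mean), the window and step lengths, the filter order, the function names and the use of the median IBI in place of the graph-based peak sequence selection of [5] are all assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def normalized_green(g):
    """Green trace divided by its temporal mean (hypothetical normalization)."""
    g = np.asarray(g, dtype=float)
    return g / g.mean() - 1.0


def track_hr(g, fs, init_bpm, win_s=6.0, step_s=1.0, half_width_bpm=15.0):
    """Sliding-window estimates; each window's passband is centered on the
    previous estimate (+-15 BPM). The median IBI of the detected peaks is a
    simple stand-in for the graph-based peak sequence selection."""
    x = normalized_green(g)
    win, step = int(win_s * fs), int(step_s * fs)
    bpm, estimates = float(init_bpm), []
    for end in range(win, len(x) + 1, step):
        lo = max(bpm - half_width_bpm, 30.0) / 60.0
        hi = (bpm + half_width_bpm) / 60.0
        b, a = butter(2, [lo, hi], btype="bandpass", fs=fs)
        y = filtfilt(b, a, x[end - win:end])
        peaks, _ = find_peaks(y, distance=max(1, int(fs / hi)))
        if len(peaks) >= 2:  # otherwise keep the previous estimate
            bpm = 60.0 / np.median(np.diff(peaks) / fs)
        estimates.append(bpm)
    return estimates
```

Centering the passband on the last estimate strongly attenuates out-of-band motion, at the price of depending on a reasonable initial estimate.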
Wang et al. (2017) [9] use the FaceMid ROI. The RGB color signals are first Fourier transformed. The PPG signal is then reconstructed from the frequency domain in the desired frequency range, using specific color-channel combinations and weights calculated under consideration of the developed skin model. The maximal spectral power peak of the PPG signal is then used to calculate the heart rate.
2.3. MMSE-HR and BioVid Datasets
The methods were tested on the BioVid Heat Pain Database Part C [17] and on the MMSE-HR subset of the multimodal spontaneous emotion corpus for human behavior analysis [18]. In the BioVid study, 87 participants were repeatedly subjected to painful heat stimuli. The video data was recorded at 25 fps with a resolution of 1388×1038 pixels. Around 25 minutes of continuous video footage is available for every subject. The MMSE-HR dataset contains 102 tasks performed by 40 ethnically diverse subjects, recorded as 2D videos with a resolution of 1040×1392 pixels at 25 fps and a length of 20-79 seconds. The participants performed different tasks designed to evoke a wide range of emotions. In neither dataset were restrictions on movement or behavior imposed on the participants, so both datasets include a large amount of head movement and facial expressions.
2.4. Ground Truth and Error Calculation
The ground truth heart rate was calculated by first applying the QRS heartbeat detection method described by Schmidt et al. [19], followed by a manual check for missed or falsely classified heartbeats. For every heart rate estimation, the same window was analyzed in the video (PPG) and ground truth (GT) data. The mean of the ground truth IBIs extracted from the window was calculated and converted to the ground truth heart rate in BPM. The error of each heart rate estimation step was then calculated as $E = \mathrm{HR}_{\mathrm{GT}} - \mathrm{HR}_{\mathrm{PPG}}$. We used the error calculation described in the IEC standard 60601-2-27 for medical ECG devices as the benchmark for the different heart rate estimation approaches. According to this calculation, an estimation is considered valid if the error between the estimated and the ground truth heart rate is less than 10% of the ground truth or 5 BPM, whichever is higher. The percentage of measurements that meet the IEC criterion is referred to below as the IEC accuracy (in %).
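The ground truth computation and the IEC criterion can be expressed compactly. The sketch assumes R-peak times in seconds as input (the QRS detector of [19] is not reproduced) and uses hypothetical function names.

```python
import numpy as np


def ground_truth_hr(r_peak_times, t_start, t_end):
    """Mean IBI of the ECG R-peaks inside [t_start, t_end], converted to BPM."""
    t = np.asarray(r_peak_times, dtype=float)
    t = t[(t >= t_start) & (t <= t_end)]
    return 60.0 / np.mean(np.diff(t))


def iec_accuracy(hr_gt, hr_ppg):
    """Percentage of estimates whose error E = HR_GT - HR_PPG is smaller in
    magnitude than 10% of HR_GT or 5 BPM, whichever is higher."""
    hr_gt = np.asarray(hr_gt, dtype=float)
    err = hr_gt - np.asarray(hr_ppg, dtype=float)
    valid = np.abs(err) < np.maximum(0.1 * hr_gt, 5.0)
    return 100.0 * valid.mean()
```

For example, at a ground truth of 60 BPM the tolerance is 6 BPM (10% dominates), while at 40 BPM the 5 BPM floor applies.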
3. RESULTS
The results for all algorithms were calculated with the BoundingBox, FaceMid, Forehead and Skin ROIs. Every heart rate estimation was done once per second using a sliding window over the data. The starting point (0-30 s) and the width of the window (4-30 s) depended on the algorithm used.
The Feng-ROI approach had difficulty finding and tracking stable SURF points in the subjects' faces over extended periods of time. The videos in the original paper were only 20 seconds long, compared with 20-79 seconds in MMSE-HR and around 25 minutes in BioVid. Because the repeated affine transformations accumulated small errors over time, the ROIs drifted. As a result, the ROIs on the cheeks were very unstable and unreliable, with a mean IEC accuracy over all algorithms of 14.4% (BioVid) and 14.5% (MMSE-HR). We therefore omitted the Feng-ROI from the result tables. A periodic reset of the ROIs or other corrective measures could stabilize the approach in future comparisons.
A further problem appeared with the Li algorithm: its accuracy was much lower on the MMSE-HR dataset than on the BioVid dataset. This may be caused by a disadvantageous step size, resulting in a poor iterative background rectification. Tables 1 and 2 show the IEC accuracy over all measurements for every combination of algorithm and ROI.
4. DISCUSSION
The results of testing the different algorithms with the various ROIs show an interesting outcome. The means and standard deviations in Tables 1 and 2 indicate that the choice of ROI has less impact on the accuracy of the heart rate estimation than the choice of algorithm: the mean IEC accuracies of the ROIs span a distinctly smaller range (62.9-76.6%) than the mean IEC accuracies of the algorithms (48.9-90.1%, excluding Li on MMSE-HR).
The BoundingBox ROI performs notably well compared to the more complex ROIs, especially on the MMSE-HR data. The Forehead and Skin ROIs achieve the best results on both datasets. This points to a quality-over-quantity principle when choosing the ROI pixels: fewer “better” pixels achieve higher accuracy than more “unreliable” pixels. We plan to investigate the optimal choice of ROI, depending on the number and quality of the chosen pixels, by incorporating pulsatility maps presented in the literature to identify high-SNR regions.
The Rapczynski and Wang algorithms performed best on both datasets, always achieving accuracies between 84% and 93.3%. Both were also comparably stable over the different regions of interest, with standard deviations from 1% to 3.8%. The best overall result was achieved with the Rapczynski+BioVid combination. The Rapczynski approach, however, was first presented on the BioVid data, so a positive bias can be assumed for this dataset; it nevertheless also achieves an accuracy of 91.1% on the MMSE-HR dataset. The method by Wang was the most independent of the chosen ROI.

Table 1. Results of the cross testing of ROIs (columns) and algorithms (rows) for the BioVid dataset. IEC accuracy in %. Original ROI of the algorithms in bold.

Algorithm          BoundingBox  FaceMid  Forehead  Skin   Mean   Std dev
Blöcher [2]        53.2         57.8     59.6      51.1   55.4   3.9
DeHaan [6]         67.9         70.5     71.7      67.8   69.5   1.9
Feng (mod)         67.3         80.0     87.2      88.3   80.7   9.7
Feng [4]           46.2         58.7     65.6      66.3   59.2   9.3
Li [16]            34.3         49.0     66.2      45.2   48.7   13.2
Lewandowska [1]    61.0         71.8     80.3      90.2   75.8   12.4
Poh [8]            67.5         75.3     81.8      80.8   76.4   6.6
Rapczynski [5]     84.9         89.7     92.6      93.3   90.1   3.8
Wang [9]           84.0         85.5     84.7      86.3   85.1   1.0
Mean               62.9         70.9     76.6      74.4
Std. deviation     16.5         13.6     11.3      17.6

Table 2. Results of the cross testing of ROIs (columns) and algorithms (rows) for the MMSE-HR dataset. IEC accuracy in %. Original ROI of the algorithms in bold.

Algorithm          BoundingBox  FaceMid  Forehead  Skin   Mean   Std dev
Blöcher [2]        45.8         45.1     40.0      64.4   48.9   10.7
DeHaan [6]         77.2         75.3     79.1      71.2   75.7   3.4
Feng (mod)         74.6         70.8     72.2      77.7   73.8   3.0
Feng [4]           66.1         65.2     65.9      69.8   66.7   2.1
Li [16]            8.5          8.0      21.0      10.5   12.0   6.1
Lewandowska [1]    62.5         62.4     68.0      79.6   68.1   8.1
Poh [8]            81.1         80.0     84.9      82.3   82.0   2.1
Rapczynski [5]     89.5         89.3     87.3      91.1   89.3   1.6
Wang [9]           86.2         86.5     91.3      88.4   88.1   2.4
Mean               65.7         64.7     67.7      70.6
Std. deviation     25.3         25.2     23.3      24.1
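The Mean and Std dev entries in Tables 1 and 2 are consistent with the arithmetic mean and the sample standard deviation (denominator n-1) over the four ROIs; this reading is inferred from the numbers and is not stated in the text. A quick check on the Rapczynski row of Table 1:

```python
import numpy as np

# Rapczynski [5] row of Table 1 (BioVid): BoundingBox, FaceMid, Forehead, Skin
row = np.array([84.9, 89.7, 92.6, 93.3])
print(round(row.mean(), 1))       # 90.1
print(round(row.std(ddof=1), 1))  # 3.8 (n-1 denominator, matches the table)
print(round(row.std(ddof=0), 1))  # 3.3 (population formula, does not match)
```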
No other algorithm achieved high accuracy (>80% IEC accuracy) on both datasets. This can probably be traced back to the data used in the development of the algorithms. The two top algorithms were developed using datasets with strong movement of the face: Rapczynski used the BioVid dataset and Wang used video data of people running on a treadmill. The other approaches were presented on data with minimal head movement and were not designed for the strong facial and head movements in the datasets used here.
On the other hand, some algorithms (e.g. Li and Blöcher) were developed for changing illumination and should perform better on data with lighting changes. In future work we want to evaluate the algorithms on datasets with changing illumination to test them under these conditions. The goal will be to combine the strengths of the different algorithms into one superior approach.
ACKNOWLEDGMENT This work is funded by the Fed-
eral Ministry of Education and Research (BMBF) (Vitalkam
2, no. 03ZZ0465) within Zwanzig20 - Alliance 3D Sensation.
5. REFERENCES
[1] Magdalena Lewandowska, Jacek Rumiński, Tomasz
Kocejko, and Jedrzej Nowak, “Measuring pulse rate
with a webcam - a non-contact method for evaluating
cardiac activity,” in Computer Science and Informa-
tion Systems (FedCSIS), 2011 Federated Conference on.
IEEE, 2011, pp. 405–410.
[2] Timon Blöcher, Johannes Schneider, Markus Schinle,
and Wilhelm Stork, “An online ppgi approach for cam-
era based heart rate monitoring using beat-to-beat detec-
tion,” in Sensors Applications Symposium (SAS), 2017
IEEE. IEEE, 2017, pp. 1–6.
[3] Humaira Nisar, Muhammad Burhan Khan, Wong Ting
Yi, Yap Vooi Voon, and Teoh Shen Khang, “Contactless
heart rate monitor for multiple persons in a video,” in
Consumer Electronics-Taiwan (ICCE-TW), 2016 IEEE
International Conference on. IEEE, 2016, pp. 1–2.
[4] Litong Feng, Lai-Man Po, Xuyuan Xu, Yuming Li,
and Ruiyi Ma, “Motion-resistant remote imaging pho-
toplethysmography based on the optical properties of
skin,” IEEE Transactions on Circuits and Systems for
Video Technology, vol. 25, no. 5, pp. 879–891, 2015.
[5] Michal Rapczynski, Philipp Werner, and Ayoub Al-
Hamadi, “Continuous low latency heart rate estimation
from painful faces in real time,” in 23rd International
Conference on Pattern Recognition (ICPR), 2016.
[6] Gerard de Haan and Vincent Jeanne, “Robust pulse
rate from chrominance-based rPPG,” IEEE Transactions
on Biomedical Engineering, vol. 60, no. 10, pp. 2878–
2886, 2013.
[7] Wim Verkruysse, Lars O. Svaasand, and J. Stuart Nel-
son, “Remote plethysmographic imaging using ambient
light,” Optics Express, vol. 16, no. 26, pp. 21434–21445,
2008.
[8] Ming-Zher Poh, Daniel J McDuff, and Rosalind W Pi-
card, “Advancements in noncontact, multiparameter
physiological measurements using a webcam,” IEEE
Transactions on Biomedical Engineering, vol. 58, no. 1,
pp. 7–11, 2011.
[9] Wenjin Wang, Albertus C den Brinker, Sander Stuijk,
and Gerard de Haan, “Robust heart rate from fitness
videos,” Physiological Measurement, vol. 38, no. 6, pp.
1023, 2017.
[10] Frerk Saxen and Ayoub Al-Hamadi, “Superpixels for
skin segmentation,” in 20. Workshop Farbbildverar-
beitung, 2014, pp. 153–159.
[11] Michael J Jones and James M Rehg, “Statistical color
models with application to skin detection,” Interna-
tional Journal of Computer Vision, vol. 46, no. 1, pp.
81–96, 2002.
[12] Michal Kawulok, “Fast propagation-based skin regions
segmentation in color images,” in 10th IEEE Interna-
tional Conference and Workshops on Automatic Face
and Gesture Recognition (FG). 2013, pp. 1–7, IEEE.
[13] Michal Kawulok, Jolanta Kawulok, Jakub Nalepa, and
Bogdan Smolka, “Self-adaptive skin segmentation in
color images,” in Proceedings of the 19th Iberoamerican
Congress (CIARP), Puerto Vallarta, Mexico, November
2–5 2014, Springer, vol. 8827, pp. 96–103.
[14] Lei Huang, Wen Ji, Zhiqiang Wei, Bo-Wei Chen,
Chenggang Clarence Yan, Jie Nie, Jian Yin, and
Baochen Jiang, “Robust skin detection in real-world
images,” Journal of Visual Communication and Image
Representation, vol. 29, no. 1481, pp. 147–152, May
2015.
[15] Son Lam Phung, Abdesselam Bouzerdoum, and Douglas Chai, “Skin segmentation using color pixel classification: analysis and comparison,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 148–154, 2005.
[16] Xiaobai Li, Jie Chen, Guoying Zhao, and Matti
Pietikainen, “Remote heart rate measurement from face
videos under realistic situations,” in Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, 2014, pp. 4264–4271.
[17] Steffen Walter, Sascha Gruss, Hagen Ehleiter, Jun-
wen Tan, Harald C Traue, Stephen Crawcour, Philipp
Werner, Ayoub Al-Hamadi, and Adriano O Andrade,
“The BioVid heat pain database data for the advancement
and systematic validation of an automated pain recogni-
tion system,” in Cybernetics (CYBCONF), 2013 IEEE
International Conference on. IEEE, 2013, pp. 128–131.
[18] Zheng Zhang, Jeff M Girard, Yue Wu, Xing Zhang,
Peng Liu, Umur Ciftci, Shaun Canavan, Michael Reale,
Andy Horowitz, Huiyuan Yang, et al., “Multimodal
spontaneous emotion corpus for human behavior anal-
ysis,” in Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition, 2016, pp. 3438–
3446.
[19] Marcus Schmidt, Johannes W Krug, Andreas Gierstorfer, and Georg Rose, “A real-time QRS detector based on higher-order statistics for ECG gated cardiac MRI,”
in Computing in Cardiology Conference (CinC), 2014.
IEEE, 2014, pp. 733–736.