HOW THE REGION OF INTEREST IMPACTS CONTACT FREE HEART RATE ESTIMATION
ALGORITHMS
Michal Rapczynski, Philipp Werner, Frerk Saxen and Ayoub Al-Hamadi
Otto-von-Guericke University Magdeburg
Neuro-Information Technology (NIT)
Universitaetsplatz 2, 39106 Magdeburg
ABSTRACT
The contact free camera-based estimation of human vital
signs is more comfortable than the classical contact-based
methods. Current methods suffer in realistic environments
from e.g. occlusions by hair or glasses. Several approaches
use complex methods to extract the pulse signals from the
skin, but use basic geometric defined regions of the face,
which ignore possible occlusions. In this paper we compare
for the first time the influence of several region based meth-
ods and a newly proposed parameter-free skin segmentation
approach on the performance of the heart rate estimation for
nine different algorithms from the literature on two datasets
with strong facial and head movement.
Index Terms: Heart rate estimation, photoplethysmography, skin segmentation
1. INTRODUCTION
Heart rate and heart rate variability are important human vital parameters. Commercial devices measure them with contact-based approaches, which have several disadvantages: they can cause skin irritation, expose the patient to bacteria, and significantly impede the patient's freedom of movement. Several approaches have been proposed for camera-based photoplethysmographic (PPG) measurement, which relies on minimal changes in skin color caused by the blood volume in the skin varying with each heart beat.
1.1. Contactless photoplethysmographic approaches
Heart rate estimation algorithms can usually be broken down into three parts: the definition of the region of interest (ROI), the calculation of the PPG signal from the color information of the video data, and the estimation of the heart rate from that PPG signal. The PPG signal is commonly extracted from every frame of an RGB camera video. First, a ROI is defined in the video image. In most cases this ROI is based on a face tracking algorithm and/or facial landmarks and covers a specific region of the face, such as the forehead [1, 2] or the cheeks [3, 4]. Other approaches apply a skin detection algorithm inside the bounding box of the face to decide individually for each pixel whether it is included in the ROI [5, 6]. The second part of every algorithm is the extraction of the PPG signal from the color information of the ROI. Several methods have been proposed for this step: approaches using only a single color channel [7], ICA [8], PCA [1], and more complex approaches based on the optical and physiological properties of the skin’s reflective behaviour [4, 9]. A review of different PPG extraction approaches can be found in [9].

Fig. 1. Basic approach for contactless heart rate estimation: Video, Region of Interest, PPG Extraction, Heart Rate Estimation.
The last step in every algorithm is the estimation of the heart rate from the extracted signal. Usually one of two methods is used: either a frequency analysis [1, 6, 9], or a band-pass filter followed by the extraction of peaks or zero-crossings and the resulting inter-beat intervals (IBIs) [4, 5, 8].
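As a rough illustration of the frequency-analysis variant of this last step, the following Python sketch picks the strongest spectral component of a PPG-like signal inside a plausible heart rate band and converts it to beats per minute. The function name `estimate_hr_spectral`, the band limits (0.7 to 4 Hz), the frequency step, and the synthetic test signal are assumptions made for this example; none of the compared algorithms is reproduced here.

```python
import math

def estimate_hr_spectral(signal, fps, f_min=0.7, f_max=4.0, step=0.01):
    """Return the heart rate in BPM at the strongest spectral peak in [f_min, f_max] Hz."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    best_freq, best_power = f_min, -1.0
    steps = int(round((f_max - f_min) / step))
    for k in range(steps + 1):
        freq = f_min + k * step
        # Correlate the signal with a cosine and a sine at this frequency
        # (a naive, unoptimized evaluation of the Fourier transform).
        re = sum(x * math.cos(2 * math.pi * freq * t / fps) for t, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * freq * t / fps) for t, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return 60.0 * best_freq

# Synthetic 10 s signal at 25 fps with a 1.2 Hz pulse (hypothetical test data).
fps = 25
ppg = [math.sin(2 * math.pi * 1.2 * t / fps) for t in range(10 * fps)]
hr = estimate_hr_spectral(ppg, fps)
```

A real implementation would typically use an FFT and a window function; the explicit loop is used here only to keep the sketch dependency-free.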
1.2. Choice of the ROI
The use of an adaptive skin detection based on skin color segmentation was proposed by Rapczynski et al. [5]. Earlier, DeHaan and Jeanne [6] used a “simple skin selection process”. State-of-the-art skin detection can be separated into two categories: pixel-based and region-based approaches. While pixel-based approaches are usually very fast (because each pixel is classified independently of the others), region-based approaches need considerably more computation time [10]. The very fast (20 ms per image) and high-performance look-up-table approach of Jones and Rehg [11] is the current state of the art among the pixel-based approaches. In recent years, more and more region-based approaches with better skin classification have been proposed, but they also have considerably higher computational costs (1000 ms per image). Because color is already a very good feature for skin detection, most state-of-the-art region-based approaches build on a look-up-table approach, followed by texture [12, 13] or superpixel [10, 14] analysis. Since heart rate estimation requires real-time calculation, we use the Jones and Rehg [11] approach because of its fast computation.

Fig. 2. Facial landmark based ROIs: (a) middle of the face (FaceMid), (b) Forehead, (c) Feng-ROI (ROI pixels in pink).
It is generally expected that using more skin pixels improves the signal-to-noise ratio and leads to a cleaner PPG signal and a better heart rate estimation [7]. However, simply choosing a large area can include parts of the face, such as the mouth or the eyes, that introduce unwanted noise due to facial movement. Various interfering factors (such as hair, beards, blinking, glasses, or clothing) in a predefined ROI can decrease the estimation performance significantly. A common way to counteract these effects is to exclude large parts of the face (e.g. the eye or mouth region) from the ROI and to refrain from using the color information in these regions. The choice of ROI is therefore of great importance for the quality of the calculated PPG signal, and a more flexible approach for determining the ROI is preferable to a fixed geometric ROI.
In this paper we compare the influence of facial landmark based ROIs and our new color-based skin segmentation on heart rate estimation. For the first time, this comparison analyses the results of nine different algorithms in combination with various ROIs.
Section 2 describes the methods used, covering the algorithms for heart rate estimation and the different ROI approaches, both landmark based and skin detection based. Section 3 reports the experimental results, which are discussed in Section 4.
2. METHODS
2.1. Regions of Interest
To examine the influence of the ROI on the heart rate estimation, we chose ROIs from related works and compare them to our newly proposed parameter-free skin segmentation approach. The first step in both cases is a face and facial landmark detection to find the subject’s face in the image. A subimage is then extracted using a bounding box around the facial landmarks, extended at the top and bottom by half the distance between the eye corners and the bottom of the nose. These anchor points were chosen because their position in the face is relatively fixed compared with less stable measures such as the bounding box height, which can change due to movement of the eyebrows and the mouth. This approach enlarges the ROI vertically to cover the skin at the neck and forehead while adapting to the effective pixel resolution of the face in the video data.
Three popular ROI methods were implemented. The FaceMid ROI uses the centered 60% of the width and the full height of the face bounding box as the active area (see Fig. 2a). The Forehead ROI uses the centered 50% of the width and, as its height, 30% of the distance between the eye corners and the bottom of the nose, starting at the eyebrows (see Fig. 2b). The Feng-ROI proposed by Feng et al. [4] uses two regions on the cheeks as the active area. These regions are calculated by tracking speeded-up robust features (SURF) points in the center of the face and applying an affine transformation to the ROIs to compensate for head movement (see Fig. 2c). The whole BoundingBox, without further segmentation, was used as a baseline region to estimate the added value of each ROI.
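The geometric ROIs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rect` type, the function names, and the example coordinates are hypothetical, image y coordinates are assumed to grow downward, and the Forehead ROI is assumed to extend upward from the eyebrow line. The Feng-ROI is not covered because it depends on feature tracking.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    x: float  # left edge
    y: float  # top edge (image y grows downward)
    w: float  # width
    h: float  # height

def extend_box(box: Rect, eye_nose_dist: float) -> Rect:
    """Extend the landmark bounding box at top and bottom by half the eye-to-nose distance."""
    pad = 0.5 * eye_nose_dist
    return Rect(box.x, box.y - pad, box.w, box.h + 2 * pad)

def centered_strip(box: Rect, width_fraction: float) -> tuple:
    """Return (x, w) of a horizontally centered strip covering width_fraction of the box."""
    w = width_fraction * box.w
    return box.x + 0.5 * (box.w - w), w

def face_mid_roi(box: Rect) -> Rect:
    """FaceMid: centered 60% of the width, full height of the face box."""
    x, w = centered_strip(box, 0.60)
    return Rect(x, box.y, w, box.h)

def forehead_roi(box: Rect, eyebrow_y: float, eye_nose_dist: float) -> Rect:
    """Forehead: centered 50% of the width, height 30% of the eye-to-nose distance."""
    x, w = centered_strip(box, 0.50)
    h = 0.30 * eye_nose_dist
    # Assumption: the region starts at the eyebrows and extends upward.
    return Rect(x, eyebrow_y - h, w, h)

# Hypothetical landmark measurements in pixels.
landmark_box = Rect(100.0, 200.0, 200.0, 240.0)
eye_nose_dist = 80.0
face_box = extend_box(landmark_box, eye_nose_dist)
face_mid = face_mid_roi(face_box)
forehead = forehead_roi(face_box, eyebrow_y=230.0, eye_nose_dist=eye_nose_dist)
```

Because every size is derived from the face box or the eye-to-nose distance, the regions scale with the pixel resolution of the face, which is the property the text emphasizes.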
For the newly proposed Skin ROI we use the lookup table approach of Jones and Rehg [11], which provides for each pixel color $c$ the relative frequency $p$:

\[ p(c) = \frac{n(c, X_{\mathrm{skin}})}{n(c, X)} \tag{1} \]

where $n(c, X_{\mathrm{skin}})$ represents the number of observations of the color $c$ in the skin dataset and $n(c, X)$ the number of observations of the color $c$ in the entire dataset. Figure 3 shows an example of a skin probability map. $p(c)$ can be stored efficiently in a lookup table and was trained with the ECU dataset [15]. This relative frequency $p(c)$ can be transformed into a segmentation by applying a threshold, which should be chosen to allow only a small number of false positives.
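To make the relative frequency of Eq. (1) concrete, here is a minimal Python sketch that counts labeled training pixels per quantized color and stores the ratio in a dictionary acting as the lookup table. The quantization into 32 bins per channel, the default of 0 for unseen colors, the function names, and the toy training pixels are all assumptions for illustration; the sketch does not reproduce the training on the ECU dataset.

```python
from collections import Counter

def quantize(color, bins=32):
    """Map an 8-bit (r, g, b) triple to a coarse histogram bin."""
    step = 256 // bins
    return tuple(channel // step for channel in color)

def train_skin_lut(labeled_pixels, bins=32):
    """Build {bin: p} with p = n(c, X_skin) / n(c, X) from ((r, g, b), is_skin) pairs."""
    n_skin, n_all = Counter(), Counter()
    for color, is_skin in labeled_pixels:
        key = quantize(color, bins)
        n_all[key] += 1
        if is_skin:
            n_skin[key] += 1
    return {key: n_skin[key] / n_all[key] for key in n_all}

def skin_probability(lut, color, bins=32):
    """Look up p(c); colors never seen in training default to 0 (an assumption)."""
    return lut.get(quantize(color, bins), 0.0)

# Toy training data (hypothetical): three skin-toned pixels and one blue pixel.
training = [
    ((200, 150, 130), True),
    ((201, 151, 131), True),
    ((202, 149, 129), False),
    ((30, 60, 200), False),
]
lut = train_skin_lut(training)
p_skin_tone = skin_probability(lut, (203, 148, 132))
p_blue = skin_probability(lut, (30, 60, 200))
```

Since each pixel is looked up independently, the per-frame cost is one table access per pixel, which matches the speed argument made for pixel-based approaches.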
As a parameter-free alternative we propose to avoid hard thresholding by defining a soft ROI for every frame. Instead of a binary mask of skin and non-skin pixels, we use the skin probability $p_i$ of each pixel $i$ (see Fig. 3) as a weighting factor for its color value $c_i$ when calculating the pixel mean for the PPG signal $S_{\mathrm{PPG}}$:

\[ S_{\mathrm{PPG}} = \frac{1}{\sum_{i=1}^{n} p_i} \cdot \sum_{i=1}^{n} p_i \cdot c_i \tag{2} \]
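The weighted pixel mean of Eq. (2) can be written directly as a short Python function. The name `soft_roi_mean`, the handling of an all-zero weight sum, and the three example pixels are assumptions for this sketch; in practice the computation would run over all pixels of the face subimage in every frame.

```python
def soft_roi_mean(colors, probabilities):
    """Skin-probability-weighted mean color of one frame, following Eq. (2)."""
    total_weight = sum(probabilities)
    if total_weight == 0:
        # Assumption: a frame without any skin evidence is treated as an error.
        raise ValueError("no pixel has a non-zero skin probability")
    channels = len(colors[0])
    return tuple(
        sum(p * c[k] for p, c in zip(probabilities, colors)) / total_weight
        for k in range(channels)
    )

# Hypothetical pixels: two skin-like colors and one background color.
colors = [(200, 150, 130), (190, 140, 120), (20, 30, 40)]
probs = [0.9, 0.6, 0.0]
s_ppg = soft_roi_mean(colors, probs)
```

A pixel with probability 0 contributes nothing, and a pixel with a low probability is down-weighted rather than cut off, so no threshold has to be chosen.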
2.2. Implemented Algorithms
To test the effect of the ROIs on the subsequent processing steps, we chose and implemented algorithms that take various approaches at the different stages of heart rate estimation.
Blöcher et al. (2017) [2] use the forehead region of the face as ROI (see Fig. 2b). After normalizing the color channels, an ICA using the JADE algorithm implementation is used to calculate the PPG signal, which is then bandpass filtered (0.75-4 Hz). The approach uses the Hilbert transform to calculate the phase angle of the pulse signal. The heart rate is calculated by detecting the phase jumps, which represent the beats, and determining the heart rate from the IBIs.

Fig. 3. Skin probability p, which is used for weighting the individual pixel’s contribution to the PPG signal.
DeHaan and Jeanne (2013) [6] use a “simple skin
selection process” to define the ROI. The method uses a
chrominance-based approach to combine the RGB channels
into the PPG signal. The maximal spectrum power peak is
used to calculate the heart rate.
Feng et al. (2015) [4] use the Feng-ROI. An adaptive green/red difference color model is used to calculate the PPG signal. Following a spectral analysis over a long (20 s) window, an adaptive bandpass with a narrow passband (±10 BPM) around the strongest frequency found is used to filter the last 4 seconds of the signal. The heart rate is then calculated from the IBIs of the signal peaks. We also modified the algorithm (Feng mod) by not using the harmonics as additional passbands, as was done in the original paper. This greatly reduced the number of false positive peaks and improved the overall estimation rate.
Lewandowska et al. (2011) [1] use the FaceMid ROI (see Fig. 2a). The extracted RGB channel signals are bandpass filtered (0.5-3.7 Hz). After a PCA of the three color signals, the first principal component is used to calculate the heart rate by finding the maximal spectral power peak in the Fourier transform of the signal.
Li et al. (2014) [16] proposed an illumination rectification approach that uses the distance regularized level set evolution method to segment the background region as a reference, and calculates the heart rate as the frequency with the maximal power response in the power spectral density estimate. The algorithm was implemented without the retroactive discarding of the noisiest 5% of signal segments to enable better comparability.
Poh et al. (2011) [8] published one of the first approaches for contactless heart rate estimation. Their algorithm uses the FaceMid ROI. All three RGB channels are extracted. After smoothing and an ICA, the output signal with the maximal spectral power peak is selected as the PPG signal. This signal is bandpass filtered (0.7-4 Hz) and the heart rate is calculated from the IBIs of the signal peaks.
Rapczynski et al. (2016) [5] use color-based skin segmentation as ROI. The normalized green channel is used as the PPG signal. The signal is then filtered using an adaptive bandpass with a narrow passband (±15 BPM) that depends on the last heart rate estimate. A graph-based algorithm determines an optimal peak sequence that continues the sequence of the last measurement with minimal error, which is then used to calculate the heart rate from the IBIs of the valid peaks.
Wang et al. (2017) [9] use the FaceMid ROI. The RGB
color signals are first Fourier transformed. The PPG signal
is then reconstructed from the frequency domain in the de-
sired frequency range using specific color-channel combina-
tions and calculated weights under consideration of the devel-
oped skin model. The maximal spectrum power peak of the
PPG signal is then used to calculate the heart rate.
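Several of the algorithms above derive the heart rate from the inter-beat intervals (IBIs) between signal peaks rather than from a spectral peak. The following Python sketch shows only that final conversion on an already filtered signal. The simple local-maximum detector, the function names, and the synthetic signal are assumptions for illustration; the adaptive bandpass filters and the graph-based peak selection described in the text are not reproduced.

```python
import math

def find_peaks(signal):
    """Indices of strict local maxima (a stand-in for a real peak detector)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]

def hr_from_peaks(peak_indices, fps):
    """Mean heart rate in BPM from the inter-beat intervals between consecutive peaks."""
    ibis = [(b - a) / fps for a, b in zip(peak_indices, peak_indices[1:])]
    if not ibis:
        raise ValueError("need at least two peaks")
    return 60.0 / (sum(ibis) / len(ibis))

# Synthetic, already band-limited 8 s signal at 25 fps with a 1.25 Hz pulse
# (hypothetical test data; one beat every 20 frames).
fps = 25
filtered = [math.sin(2 * math.pi * 1.25 * t / fps) for t in range(8 * fps)]
peaks = find_peaks(filtered)
hr = hr_from_peaks(peaks, fps)
```

On noisy real data a plain local-maximum search would report many false peaks, which is why the compared algorithms filter the signal first and, in some cases, validate the peak sequence.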
2.3. MMSE-HR and BioVid Datasets
The proposed methods were tested using the BioVid Heat Pain Database Part C [17] and the MMSE-HR subset of the multimodal spontaneous emotion corpus for human behavior analysis [18]. In the BioVid study, 87 participants were repeatedly subjected to painful heat stimuli. The video data was recorded at 25 fps with a resolution of 1388x1038 pixels. Around 25 minutes of continuous video footage is available for every subject. The MMSE-HR dataset contains 102 tasks of 40 ethnically diverse subjects, recorded as 2D videos with a resolution of 1040x1392 pixels at 25 fps and a length of 20-79 seconds. The participants undertook different tasks to evoke a wide range of emotions. No movement or behavior restrictions were imposed on the participants in either dataset, so both include a great amount of head movement and facial expressions.
2.4. Ground Truth and Error Calculation
The ground truth heart rate was calculated by first using the QRS heart beat detection method described by Schmidt et al. [19], followed by a manual check for missed or falsely classified heart beats. For every heart rate estimation, the same window was analyzed in the video (PPG) and ground truth (GT) data. The mean of the ground truth IBIs extracted from the window was calculated and converted to the ground truth heart rate (HR) in BPM. The error for each heart rate estimation step was then calculated as

\[ E = \mathrm{HR}_{\mathrm{GT}} - \mathrm{HR}_{\mathrm{PPG}} \]

We used the error calculation described in the IEC standard 60601-2-27 for medical ECG devices as the benchmark for the different heart rate estimation approaches. Under this calculation, an estimate is considered valid if the error between the estimated and the ground truth heart rate is less than 10% of the ground truth or 5 BPM, whichever is higher. The percentage of measurements that meet the IEC standard is referred to below as the IEC accuracy (in %).
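The validity criterion and the resulting IEC accuracy can be expressed compactly in Python. This is a sketch of the rule as worded in the text, not of the standard itself: the function names and the example values are hypothetical, the error is taken as an absolute value, and treating an error exactly equal to the tolerance as invalid (strict "less than") is an assumption.

```python
def is_valid_iec(hr_est, hr_gt):
    """True if the absolute error is below max(10% of the ground truth, 5 BPM)."""
    tolerance = max(0.10 * hr_gt, 5.0)
    return abs(hr_gt - hr_est) < tolerance

def iec_accuracy(estimates, ground_truth):
    """Percentage of estimates that satisfy the validity criterion."""
    valid = sum(is_valid_iec(e, g) for e, g in zip(estimates, ground_truth))
    return 100.0 * valid / len(ground_truth)

# Hypothetical heart rates in BPM, one pair per analysis window.
ground_truth = [60.0, 60.0, 100.0, 100.0]
estimates = [64.0, 67.0, 109.0, 111.0]
accuracy = iec_accuracy(estimates, ground_truth)
```

At 60 BPM the tolerance is 6 BPM, while at 100 BPM the 10% rule gives 10 BPM, so the same absolute error can be valid at a high heart rate and invalid at a low one.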
3. RESULTS
The results for all algorithms were calculated with the ROIs BoundingBox, FaceMid, Forehead and Skin. A heart rate estimate was computed once per second using a sliding window on the data. The starting point (0-30 s) and the width of the window (4-30 s) depended on the algorithm used.
The Feng-ROI approach had difficulty finding and tracking stable SURF points in the subjects’ faces over extended periods of time. The videos in the original paper were only 20 seconds long, compared with up to 2 minutes in the MMSE-HR and around 25 minutes in the BioVid dataset. This caused a drift of the ROIs, because the repeatedly applied affine transformation accumulated small errors over time. For this reason the ROIs on the cheeks were very unstable and unreliable, with a mean IEC accuracy of 14.4% (BioVid) and 14.5% (MMSE-HR) over all algorithms. We therefore omitted the Feng-ROI from the result tables. A periodic reset of the ROIs or other corrective measures could stabilize the approach in future comparisons.
A further problem appeared with the Li algorithm. The accuracy ratings for this approach were much lower on the MMSE-HR than on the BioVid dataset. This could be caused by a disadvantageous step size, resulting in a poor iterative background rectification. Tables 1 and 2 show the IEC accuracy over all measurements for every combination of algorithm and ROI.
4. DISCUSSION
Testing the different algorithms with the various ROIs shows an interesting outcome. The means and standard deviations in Tab. 1 and 2 indicate that the choice of ROI has less impact on the accuracy than the choice of heart rate estimation algorithm: the mean IEC accuracy values for the ROIs span a distinctly smaller range (62.9-76.6%) than the mean values for the algorithms (48.9-90.1%, excluding Li on MMSE-HR).
The BoundingBox ROI performs notably well compared to the more complex ROIs, especially on the MMSE-HR data. The Forehead and Skin ROIs achieve the best results on both datasets. This points to a quality-over-quantity approach to choosing the ROI pixels: fewer “better” pixels achieve higher accuracy than more “unreliable” pixels. We plan to investigate the optimal choice of ROI, depending on the number and quality of the chosen pixels, incorporating pulsatility maps presented in the literature to identify high-SNR regions.
The Rapczynski and Wang algorithms performed best on both datasets, always achieving accuracy ratings of at least 84% and up to 93.3%. Both were also comparably stable over different regions of interest, with standard deviations from 1% to 3.8%. The best overall result was achieved using the Rapczynski+BioVid combination. The Rapczynski approach, however, was first presented on the BioVid data, so a positive bias can be assumed for this dataset; it nevertheless also achieves an accuracy of 91.1% on the MMSE-HR dataset. The method by Wang was the most independent of the chosen ROI.

Table 1. Results of the cross testing of ROIs (columns) and algorithms (rows) for the BioVid dataset. IEC accuracy in %. Original ROI of the algorithms in bold.

Algorithm         Bounding Box  FaceMid  Forehead  Skin  Mean  Std dev
Blöcher [2]               53.2     57.8      59.6  51.1  55.4      3.9
DeHaan [6]                67.9     70.5      71.7  67.8  69.5      1.9
Feng (mod)                67.3     80.0      87.2  88.3  80.7      9.7
Feng [4]                  46.2     58.7      65.6  66.3  59.2      9.3
Li [16]                   34.3     49.0      66.2  45.2  48.7     13.2
Lewandowska [1]           61.0     71.8      80.3  90.2  75.8     12.4
Poh [8]                   67.5     75.3      81.8  80.8  76.4      6.6
Rapczynski [5]            84.9     89.7      92.6  93.3  90.1      3.8
Wang [9]                  84.0     85.5      84.7  86.3  85.1      1.0
Mean                      62.9     70.9      76.6  74.4
Std. deviation            16.5     13.6      11.3  17.6

Table 2. Results of the cross testing of ROIs (columns) and algorithms (rows) for the MMSE-HR dataset. IEC accuracy in %. Original ROI of the algorithms in bold.

Algorithm         Bounding Box  FaceMid  Forehead  Skin  Mean  Std dev
Blöcher [2]               45.8     45.1      40.0  64.4  48.9     10.7
DeHaan [6]                77.2     75.3      79.1  71.2  75.7      3.4
Feng (mod)                74.6     70.8      72.2  77.7  73.8      3.0
Feng [4]                  66.1     65.2      65.9  69.8  66.7      2.1
Li [16]                    8.5      8.0      21.0  10.5  12.0      6.1
Lewandowska [1]           62.5     62.4      68.0  79.6  68.1      8.1
Poh [8]                   81.1     80.0      84.9  82.3  82.0      2.1
Rapczynski [5]            89.5     89.3      87.3  91.1  89.3      1.6
Wang [9]                  86.2     86.5      91.3  88.4  88.1      2.4
Mean                      65.7     64.7      67.7  70.6
Std. deviation            25.3     25.2      23.3  24.1
No other algorithm performed with high accuracy (>80% IEC accuracy) on both datasets. This can probably be traced back to the data that was used in the development of the algorithms. The two top algorithms were developed using datasets with strong movement of the face: Rapczynski used the BioVid dataset and Wang used video data of people running on a treadmill. The other approaches were presented on data with minimal movement of the head and were not designed for the strong facial and head movements in the datasets used here.
On the other hand, some algorithms (e.g. Li and Blöcher) were developed with changing illumination in mind and should perform better on data with lighting changes. In future work we want to test the algorithms on datasets with changing illumination to evaluate them under other conditions. The goal will be to combine the strengths of the different algorithms into one superior approach.
ACKNOWLEDGMENT This work is funded by the Fed-
eral Ministry of Education and Research (BMBF) (Vitalkam
2, no. 03ZZ0465) within Zwanzig20 - Alliance 3D Sensation.
5. REFERENCES
[1] Magdalena Lewandowska, Jacek Rumiński, Tomasz Kocejko, and Jedrzej Nowak, “Measuring pulse rate with a webcam - a non-contact method for evaluating cardiac activity,” in Computer Science and Information Systems (FedCSIS), 2011 Federated Conference on. IEEE, 2011, pp. 405–410.
[2] Timon Blöcher, Johannes Schneider, Markus Schinle, and Wilhelm Stork, “An online ppgi approach for camera based heart rate monitoring using beat-to-beat detection,” in Sensors Applications Symposium (SAS), 2017 IEEE. IEEE, 2017, pp. 1–6.
[3] Humaira Nisar, Muhammad Burhan Khan, Wong Ting Yi, Yap Vooi Voon, and Teoh Shen Khang, “Contactless heart rate monitor for multiple persons in a video,” in Consumer Electronics-Taiwan (ICCE-TW), 2016 IEEE International Conference on. IEEE, 2016, pp. 1–2.
[4] Litong Feng, Lai-Man Po, Xuyuan Xu, Yuming Li, and Ruiyi Ma, “Motion-resistant remote imaging photoplethysmography based on the optical properties of skin,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 5, pp. 879–891, 2015.
[5] Michal Rapczynski, Philipp Werner, and Ayoub Al-Hamadi, “Continuous low latency heart rate estimation from painful faces in real time,” in 23rd International Conference on Pattern Recognition (ICPR), 2016.
[6] Gerard de Haan and Vincent Jeanne, “Robust pulse rate from chrominance-based rppg,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 10, pp. 2878–2886, 2013.
[7] Wim Verkruysse, Lars O. Svaasand, and J. Stuart Nelson, “Remote plethysmographic imaging using ambient light,” Optics Express, vol. 16, no. 26, pp. 21434–21445, 2008.
[8] Ming-Zher Poh, Daniel J McDuff, and Rosalind W Picard, “Advancements in noncontact, multiparameter physiological measurements using a webcam,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 1, pp. 7–11, 2011.
[9] Wenjin Wang, Albertus C den Brinker, Sander Stuijk, and Gerard de Haan, “Robust heart rate from fitness videos,” Physiological Measurement, vol. 38, no. 6, p. 1023, 2017.
[10] Frerk Saxen and Ayoub Al-Hamadi, “Superpixels for skin segmentation,” in 20. Workshop Farbbildverarbeitung, 2014, pp. 153–159.
[11] Michael J Jones and James M Rehg, “Statistical color models with application to skin detection,” International Journal of Computer Vision, vol. 46, no. 1, pp. 81–96, 2002.
[12] Michal Kawulok, “Fast propagation-based skin regions segmentation in color images,” in 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE, 2013, pp. 1–7.
[13] Michal Kawulok, Jolanta Kawulok, Jakub Nalepa, and Bogdan Smolka, “Self-adaptive skin segmentation in color images,” in Proceedings of the 19th Iberoamerican Congress (CIARP), Puerto Vallarta, Mexico, November 2–5 2014, Springer, vol. 8827, pp. 96–103.
[14] Lei Huang, Wen Ji, Zhiqiang Wei, Bo-Wei Chen, Chenggang Clarence Yan, Jie Nie, Jian Yin, and Baochen Jiang, “Robust skin detection in real-world images,” Journal of Visual Communication and Image Representation, vol. 29, no. 1481, pp. 147–152, May 2015.
[15] Son Lam Phung, Abdesselam Bouzerdoum, and Douglas Chai, “Skin segmentation using color pixel classification: analysis and comparison,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 148–154, 2005.
[16] Xiaobai Li, Jie Chen, Guoying Zhao, and Matti Pietikainen, “Remote heart rate measurement from face videos under realistic situations,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 4264–4271.
[17] Steffen Walter, Sascha Gruss, Hagen Ehleiter, Junwen Tan, Harald C Traue, Stephen Crawcour, Philipp Werner, Ayoub Al-Hamadi, and Adriano O Andrade, “The biovid heat pain database data for the advancement and systematic validation of an automated pain recognition system,” in Cybernetics (CYBCONF), 2013 IEEE International Conference on. IEEE, 2013, pp. 128–131.
[18] Zheng Zhang, Jeff M Girard, Yue Wu, Xing Zhang, Peng Liu, Umur Ciftci, Shaun Canavan, Michael Reale, Andy Horowitz, Huiyuan Yang, et al., “Multimodal spontaneous emotion corpus for human behavior analysis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3438–3446.
[19] Marcus Schmidt, Johannes W Krug, Andreas Gierstorfer, and Georg Rose, “A real-time qrs detector based on higher-order statistics for ecg gated cardiac mri,” in Computing in Cardiology Conference (CinC), 2014. IEEE, 2014, pp. 733–736.
... In order to define the ROI accurately, most rPPG algorithms employ a face detection module upstream and subsequently localize the ROI on the basis of the resulting face crop. Most methods in prior work use static geometric regions as ROI based on the face bounding box and/or facial landmarks, such as the forehead (Sanyal and Nundy 2018;Blöcher et al. 2017) or cheeks (Feng et al. 2015;Nisar et al. 2016), or utilize pixel-based skin segmentation with color thresholds inside the face crop (Rapczynski et al. 2018;Wang et al. 2017b;Fouad et al. 2019). However, predefined static ROIs have the disadvantage that they are unable to react to interfering pixels caused by hair, beard, glasses, headgear and others. ...
... It originates from the IEC standard 60601-2-27, which is used for benchmarking medical electrocardiogram devices. Therefore, it is well suited for the given purpose and has already been employed in other works (Rapczynski et al. 2018(Rapczynski et al. , 2019. It gives the percentage of correctly determined windows, whereby a window is classified as correctly determined as soon as the error between estimated HR and ground truth is smaller than 10% of the ground truth or smaller than 5 beats per minute (bpm), depending on which of both is higher. ...
... Furthermore, a pixel-based skin segmentation was proposed as ROI by Rapczynski et al. (2018). It uses a lookup table approach which provides for each pixel a relative probability of being skin. ...
Article
Full-text available
The selection of a suitable region of interest (ROI) is of great importance in camera-based vital signs estimation, as it represents the first step in the processing pipeline. Since all further processing relies on the quality of the signal extracted from the ROI, the tracking of this area is decisive for the performance of the overall algorithm. To overcome the limitations of classical approaches for the ROI, such as partial occlusions or illumination variations, a custom neural network for pixel-precise face segmentation called FaSeNet was developed. It achieves better segmentation results on two datasets compared to state-of-the-art architectures while maintaining high execution efficiency. Furthermore, the Matthews Correlation Coefficient was proposed as a loss function providing a better fitting of the network weights than commonly applied losses in the field of multi-class segmentation. In an extensive evaluation with a variety of algorithms for vital signs estimation, our FaSeNet was able to achieve better results in both heart and respiratory rate estimation. Thus, a ROI for vital signs estimation could be created that is superior to other approaches.
... The main error measure for evaluation in this work is the metric defined by the International Electrotechnical Commission (IEC) in the standard 60601-2-27, which is referred to as IEC Accuracy. It was originally designed for medical approval of ECG devices and has already been utilised in other studies on camera-based HR estimation [16], [17]. An estimate is considered as valid if the absolute error between estimated HR and ground truth is less than 10% of the ground truth or at most 5 bpm, whichever is higher. ...
... The recorded signals were compared qualitatively and quantitatively. For the quantitative evaluation and comparison with the reference, the so-called success rate described in [37], the coverage with respect to IEC 60601-2-27 according to [44,45] and the Pearson correlation coefficient for the covered segments were used. The success rate gives the percentage of the recording in which the absolute difference between the estimated and reference heart/respiratory rate are bounded by a limit l. l was varied between 0 BPM and 10 BPM with a step size of 0.1 BPM. ...
Article
Full-text available
With higher levels of automation in vehicles, the need for robust driver monitoring systems increases, since it must be ensured that the driver can intervene at any moment. Drowsiness, stress and alcohol are still the main sources of driver distraction. However, physiological problems such as heart attacks and strokes also exhibit a significant risk for driver safety, especially with respect to the ageing population. In this paper, a portable cushion with four sensor units with multiple measurement modalities is presented. Capacitive electrocardiography, reflective photophlethysmography, magnetic induction measurement and seismocardiography are performed with the embedded sensors. The device can monitor the heart and respiratory rates of a vehicle driver. The promising results of the first proof-of-concept study with twenty participants in a driving simulator not only demonstrate the accuracy of the heart (above 70% of medical-grade heart rate estimations according to IEC 60601-2-27) and respiratory rate measurements (around 30% with errors below 2 BPM), but also that the cushion might be useful to monitor morphological changes in the capacitive electrocardiogram in some cases. The measurements can potentially be used to detect drowsiness and stress and thus the fitness of the driver, since heart rate variability and breathing rate variability can be captured. They are also useful for the early prediction of cardiovascular diseases, one of the main reasons for premature death. The data are publicly available in the UnoVis dataset.
... Usually, the whole face, the rectangular area of the face cut with a particular proportion, and the forehead, nose, or cheek area can be selected as ROI. Rapczynski et al. proved that the whole facial skin and forehead could obtain more accurate HR information [26]. More skin pixels can improve SNR to obtain a clearer rPPG signal and better HR estimation. ...
Article
Full-text available
The remote photoplethysmography (rPPG) based on cameras, a technology for extracting pulse wave from videos, has been proved to be an effective heart rate (HR) monitoring method and has great potential in many fields; such as health monitoring. However, the change of facial color intensity caused by cardiovascular activities is weak. Environmental illumination changes and subjects’ facial movements will produce irregular noise in rPPG signals, resulting in distortion of heart rate pulse signals and affecting the accuracy of heart rate measurement. Given the irregular noises such as motion artifacts and illumination changes in rPPG signals, this paper proposed a new method named LA-SSA. It combines low-rank sparse matrix decomposition and autocorrelation function with singular spectrum analysis (SSA). The low-rank sparse matrix decomposition is employed to globally optimize the components of the rPPG signal obtained by SSA, and some irregular noise is removed. Then, the autocorrelation function is used to optimize the global optimization results locally. The periodic components related to the heartbeat signal are selected, and the denoised rPPG signal is obtained by weighted reconstruction with a singular value ratio. The experiment using UBFC-RPPG and PURE database is performed to assess the performance of the method proposed in this paper. The average absolute error was 1.37 bpm, the 95% confidence interval was −7.56 bpm to 6.45 bpm, and the Pearson correlation coefficient was 98%, superior to most existing video-based heart rate extraction methods. Experimental results show that the proposed method can estimate HR effectively.
... The number of sensors must also match the number of individuals who are required to be measured. Therefore, wearable technology is unsuitable for certain individuals and environments, including newborns, drivers, exercisers, skin-damaged patients and disaster-affected patients [10]. To solve this problem, researchers are examining non-contact methods for measuring the pulse using facial video, which minimizes user intervention and enables monitoring of the pulse rate during driving [11], sleeping [12], or stress monitoring [13]. ...
Article
Pulse wave and pulse rate are important indicators of cardiovascular health. Technologies that can check the pulse by contacting the skin with optical sensors built into smart devices have been developed. However, this may cause inconvenience, such as foreign body sensation. Accordingly, studies have been conducted on non-contact pulse rate measurement using facial videos. However, since the majority of these studies are conducted indoors, the error in pulse rate measurement in outdoor environments, such as an outdoor bench, a car, or a drone, is high. In this paper, to deal with this issue, we focus on developing a robust pulse measurement method based on facial videos taken in diverse environments. The proposed method stably detects faces by removing high-frequency components of face coordinate signals derived from fine body tremors and illumination conditions. It optimizes the extraction of skin color changes by reducing illumination-caused noise using the Cg color difference component. The robust pulse wave is extracted from the Cg signal using FFT–iFFT with zero-padding, which effectively eliminates signal-filtering distortion. We demonstrate that the proposed method relieves pulse rate measurement problems, producing 3.36, 5.81, and 6.09 bpm RMSE for an outdoor bench, driving car, and flying drone, respectively.
... Deciding separately for each pixel whether it is skin or not reduces the influence of interfering pixels, which can occur, e.g., when the subject wears glasses or hair hangs into the face. This assumption is also supported by the research results of [15] and [16], where the skin ROI outperforms all others. For our method we used the skin detection algorithm designed by Rapczynski et al. [17]. ...
Conference Paper
Video-based heart rate estimation has several advantages compared to the classical method. Current approaches use long time windows (30 s) to calculate heart rates, which results in high latency and is a big disadvantage for practical use. To overcome this constraint, we propose a low-latency approach for continuous frame-based heart rate estimation. It is based on a combination of face tracking and skin detection, using short time windows (10 s) to filter and analyze the extracted PPG signals in real time. In experiments, the presented approach achieves high accuracy (85.2% of estimates with an error <3 BPM) under stable illumination conditions, using a pain recognition data set including facial expressions and head movement for validation.
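As a rough illustration of window-based heart rate estimation as described in the abstract above, the following minimal sketch picks the dominant frequency of a short PPG window. It is not the authors' implementation: the function name `estimate_bpm`, the 40 to 240 bpm search range, and the brute-force periodogram are assumptions made here for illustration, and face tracking, skin detection, and filtering are omitted.

```python
import math


def estimate_bpm(signal, fps, lo_bpm=40, hi_bpm=240):
    """Return the integer bpm whose sinusoid best matches the window.

    signal: per-frame mean skin value (e.g. green channel) for one window.
    fps:    camera frame rate in frames per second.
    The search range lo_bpm..hi_bpm is an illustrative assumption.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC component

    best_bpm, best_power = None, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        # Angular frequency in radians per frame for this candidate rate.
        w = 2.0 * math.pi * (bpm / 60.0) / fps
        re = sum(x * math.cos(w * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(w * i) for i, x in enumerate(centered))
        power = re * re + im * im  # periodogram value at this frequency
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm
```

A 10 s window at 25 fps holds 250 samples, so the frequency resolution of the window itself is about 6 bpm; the 1 bpm search grid only interpolates between those bins. This trade-off between latency and resolution is the practical cost of shortening the window.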
Conference Paper
Heart rate is an important indicator of people's physiological state. Recently, several papers reported methods to measure heart rate remotely from face videos. Those methods work well on stationary subjects under well-controlled conditions, but their performance degrades significantly if the videos are recorded under more challenging conditions, specifically when subjects' motions and illumination variations are involved. We propose a framework which utilizes face tracking and Normalized Least Mean Square adaptive filtering methods to counter their influences. We test our framework on the large, difficult, public MAHNOB-HCI database and demonstrate that our method substantially outperforms all previous methods. We also use our method for long-term heart rate monitoring in a game evaluation scenario and achieve promising results.
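The Normalized Least Mean Square (NLMS) filter mentioned in the abstract above can be sketched generically as follows. This is the textbook NLMS update, not the framework from the cited paper: the function name `nlms_filter`, the filter order, the step size, and the choice of reference signal (for example, a brightness trace that shares the disturbance with the face signal) are assumptions for illustration.

```python
def nlms_filter(reference, observed, order=4, mu=0.5, eps=1e-8):
    """Generic NLMS noise canceller (illustrative sketch).

    reference: signal correlated with the disturbance only (assumed input).
    observed:  signal containing the wanted component plus the disturbance.
    Returns (errors, weights); `errors` is the observed signal with the
    part predictable from `reference` removed.
    """
    weights = [0.0] * order
    errors = []
    for n in range(len(observed)):
        # Most recent `order` reference samples, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        y = sum(w * xi for w, xi in zip(weights, x))   # disturbance estimate
        e = observed[n] - y                            # cleaned sample
        norm = eps + sum(xi * xi for xi in x)          # input power
        # Step size is normalized by the input power, hence "normalized" LMS.
        weights = [w + mu * e * xi / norm for w, xi in zip(weights, x)]
        errors.append(e)
    return errors, weights
```

The normalization makes the effective step size independent of the reference signal's amplitude, which is convenient when illumination levels differ between recordings.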
Conference Paper
Nowadays, Cardiovascular Magnetic Resonance (CMR) is gaining popularity in medical imaging and diagnosis. The acquisition of CMR images needs to be synchronized with the current cardiac phase to compensate for the motion of the beating heart. The Electrocardiogram (ECG) signal can be used for such applications by detecting the QRS complex. However, the magnetic fields of the MR scanner contaminate the ECG signal, which hampers QRS detection during CMR imaging. This paper presents a new real-time QRS detection algorithm for CMR gating applications based on the higher order statistics of the ECG signal. The algorithm uses the 4th order central moment to detect the R-peak. The algorithm was tested using two different databases. One database consisted of 12-lead ECGs which were acquired from 9 subjects inside a 3 T Magnetic Resonance Imaging (MRI) scanner with a total of 9241 QRS complexes. The 12-lead ECG arrhythmia database from the St. Petersburg Institute of Cardiological Technics (InCarT) served as the second database. 168341 QRS complexes were used from this database. For the ECG database acquired inside the MRI scanner, the proposed algorithm achieved a sensitivity (Se) of 99.99% and positive predictive value (+P) of 99.60%. Using the InCarT database, Se=99.43% and +P=99.91% were achieved. Hence, this algorithm enables reliable R-peak detection in real time for triggering purposes in CMR imaging.
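To illustrate why a 4th-order central moment responds strongly to R-peaks, the sketch below computes it over a sliding window and reports the window with the largest value. It shows only the statistic named in the abstract above; the window length, the function names, and the simple arg-max rule are assumptions made here, and the cited algorithm's actual thresholding and real-time logic are not reproduced.

```python
def fourth_central_moment(signal, window):
    """Sliding 4th-order central moment, one value per full window."""
    out = []
    for start in range(len(signal) - window + 1):
        seg = signal[start:start + window]
        mean = sum(seg) / window
        # Raising deviations to the 4th power emphasizes sharp, large
        # excursions (such as an R-peak) over low-amplitude noise.
        out.append(sum((s - mean) ** 4 for s in seg) / window)
    return out


def strongest_window(signal, window):
    """Start index of the window with the largest 4th central moment."""
    moments = fourth_central_moment(signal, window)
    return max(range(len(moments)), key=moments.__getitem__)
```

For a trace with a single sharp spike on a low-amplitude baseline, the window returned by `strongest_window` contains the spike; a practical detector would additionally need a threshold and a refractory period, which are left out here.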
Article
Objective: This paper aims to improve rPPG technology for continuous heart-rate measurement during fitness exercises. The fundamental limitation of the existing (multi-wavelength) rPPG methods is that they can suppress at most n - 1 independent distortions by linearly combining n wavelength color channels. Their performance is highly restricted when more than n - 1 independent distortions appear in a measurement, as typically occurs in fitness applications with vigorous body motions. Approach: To mitigate this limitation, we propose an effective yet very simple method that algorithmically extends the number of distortions that can be suppressed without using more wavelengths. Our core idea is to increase the degrees of freedom of noise reduction by decomposing the n wavelength camera signals into multiple orthogonal frequency bands and extracting the pulse signal on a per-band basis. This processing, namely Sub-band rPPG (SB), can suppress different distortion frequencies using independent combinations of color channels. Main results: A challenging fitness benchmark dataset is created, including 25 videos recorded from 7 healthy adult subjects (ages from 25 to 40 yrs; six male and one female) running on a treadmill in an indoor environment. Various practical challenges are simulated in the recordings, such as different skin tones, light sources, illumination intensities, and exercising modes. The basic form of SB is benchmarked against a state-of-the-art method (POS) on the fitness dataset. Using non-biased parameter settings, the average signal-to-noise ratio (SNR) varies in [-4.18, -2.07] dB for POS and in [-1.08, 4.77] dB for SB. The ANOVA test shows that the improvement of SB over POS is statistically significant for almost all settings (p-value < 0.05).
Significance: The results suggest that the proposed SB method considerably increases the robustness of heart-rate measurement in challenging fitness applications, and outperforms the state-of-the-art method.
Article
Remote imaging photoplethysmography (RIPPG) can achieve contactless monitoring of human vital signs. However, robustness to a subject’s motion is a challenging problem for RIPPG, especially in facial video-based RIPPG. The RIPPG signal originates from the variation in the radiant intensity of human skin with pulses of blood, and motions can also modulate the radiant intensity of the skin. Based on the optical properties of human skin, we build an optical RIPPG signal model in which the origins of the RIPPG signal and motion artifacts can be clearly described. The region of interest (ROI) of the skin is regarded as a Lambertian radiator and the effect of ROI tracking is analyzed from the perspective of radiometry. By considering a digital color camera as a simple spectrometer, we propose an adaptive color difference operation between the green and red channels to reduce motion artifacts. Based on the spectral characteristics of photoplethysmography signals, we propose an adaptive bandpass filter to remove residual motion artifacts of RIPPG. We also combine ROI selection on the subject’s cheeks with speeded-up robust features (SURF) point tracking to improve the RIPPG signal quality. Experimental results show that the proposed RIPPG can obtain greatly improved performance in assessing heart rates in moving subjects, compared with the state-of-the-art facial video-based RIPPG methods.
Article
Human skin detection in images is desirable in many practical applications, e.g., human-computer interaction and adult-content filtering. However, existing methods mainly suffer from confusing backgrounds in real-world images. In this paper, we try to address the issue by exploring and combining several human skin properties, i.e., the color, texture, and region properties. First, images are divided into superpixels, and robust skin seeds and background seeds are acquired through the color and texture properties of skin. Then we combine the color, region, and texture properties of skin by proposing a novel skin color and texture based graph cuts (SCTGC) method to acquire the final skin detection results. Comprehensive and comparative experiments show that the proposed method achieves promising performance and outperforms many state-of-the-art methods on publicly available challenging datasets that contain a large share of hard images.
Conference Paper
In this paper, we present a new method for skin detection and segmentation, relying on spatial analysis of skin-tone pixels. Our contribution lies in introducing self-adaptive seeds, from which the skin probability is propagated using the distance transform. The seeds are determined from a local skin color model that is learned online from the presented image, without requiring any additional information. This is in contrast to existing methods that need a skin sample for the adaptation, e.g., acquired using a face detector. In our experimental study, we obtained an F-score of over 0.85 on the ECU benchmark, which is highly competitive with several state-of-the-art methods.