
When is it good to use wristband devices to measure HRV?: Introducing a new method for evaluating the quality of data from photoplethysmography-based HRV devices


Objective: Recent technological advances have led to the proliferation of ambulatory devices for non-invasively assessing cardiac activity. While these devices have exciting implications for conducting research outside the laboratory, it is critical that this increased mobility does not compromise data quality. As a test case, we assess the efficacy of Empatica’s E4, a high-end wristband device designed to assess Heart Rate Variability (HRV) through the use of photoplethysmography. Approach: We compare the E4 to traditional, wired electrocardiogram measures across a variety of conditions, including seated, supine, and standing baselines, as well as typing and grip strength tasks. Most importantly, we introduce and demonstrate the efficacy of a new method for determining the amount of error in HRV estimates derived from the E4 and a technique for adjusting error tolerance. Main Results: Results indicate that the E4 is severely compromised by motion artifact, resulting in a high percentage of missing data across all conditions except seated and supine baselines. Employing error adjustment yielded more robust results, but at the cost of significantly reducing sample size where motion artifact was present. Significance: These results call into question the wristband’s efficacy as an HRV measurement tool in most in-vivo conditions. We recommend that researchers interested in using photoplethysmography-based HRV devices use caution and evaluate the data quality using methods for error detection and tolerance, such as the one presented here.
Keywords: heart rate variability, ambulatory photoplethysmography, electrocardiogram.
Running Head: A method for evaluating PPG-based HRV devices
William S. Ryan
University of Toronto, St. George
James Conigrave
Geetanjali Basarkod,
Joseph Ciarrochi,
& Baljinder K. Sahdra
Institute for Positive Psychology and Education, Australian Catholic University
Author Notes:
Correspondence concerning this article should be addressed to William S. Ryan, Department of
Psychology, University of Toronto, St. George, 100 St. George St. Sidney Smith Hall
Toronto, ON M5S 3G3. Email: Phone: 647-551-0496.
The authors declare that they have no conflicts of interest.
Recent technological advances have led to the proliferation of ambulatory devices for
non-invasively assessing cardiac activity. Wristband devices, in particular, have become popular
for tracking heart rate as well as other fitness-related parameters such as daily activity and
calories burnt. The majority of these devices are developed for the consumer market, although
some, like Empatica’s E4, are designed specifically with
physiological research in mind, including measurement of heart rate variability (HRV). These
wearable devices have important implications for physiological research as they enable data
collection outside of the laboratory, allowing for the measurement of physiological activity as it
occurs in reaction to real-life situations (e.g., Muaremi, Arnrich, & Tröster, 2013). Wristband
devices are also relatively inexpensive and significantly more comfortable than traditional, lab-
based cardiac measures, facilitating the collection of data from more participants over longer
periods of time. There is currently, however, little research examining the validity and reliability
of cardiac measures, such as heart rate variability (HRV), derived from wristband devices
(McCarthy, Pradhan, Redpath, & Adler, 2016; Ollander, Godin, Campagne, & Charbonnier,
2016). The present research begins to fill this gap by introducing a new method of
evaluating the quality of data obtained from photoplethysmography (PPG)-based measurements
of HRV. As a test case, we compared HRV measures derived from Empatica’s E4,
a wristband device designed specifically for physiological research,
to those derived from traditional ECG. We then subjected the data obtained from E4 to our new
method of error detection to more closely evaluate the quality of data obtained from E4.
Heart Rate Variability
Heart rate variability (HRV) is a non-invasive means of assessing autonomic function,
with greater variability generally interpreted as indicating greater autonomic control (Berntson et
al., 1997; Porges, 2007; Saul, 1990). The variability in HRV refers to oscillations in the interval
between consecutive heartbeats or instantaneous estimations of heart rate. These varying
distances are referred to as inter-beat intervals (IBIs) and result from the relative activation
and/or inhibition of the sympathetic and parasympathetic branches of the autonomic nervous
system and their respective influences on the heart (Berntson, Cacioppo, & Quigley, 1991).
Under conditions of stress, sympathetic nervous system activation increases the firing rate of the
sino-atrial node, the heart’s pacemaker, increasing heart rate (Appelhans & Luecken, 2006). The
parasympathetic nervous system, acting primarily via vagal innervation of the heart, has an
opposing, inhibitory effect on sino-atrial node firing, acting as a brake to slow down heart rate
(Katona & Jih, 1975; Bertsch, Hagemann, Naumann, Schächinger, & Schulz, 2012). Heart rate
variability results from the co-activation, co-inhibition, or the activation of one and inhibition of
the other of the parasympathetic and sympathetic branches (Berntson et al., 1991). These two
branches of the ANS operate in concert to adaptively and dynamically regulate cardiac function
in response to changing circumstances and external stimuli (Acharya, Joseph, Kannathal, Lim, &
Suri, 2006).
Vagal innervation is impacted by the respiration cycle; it is suppressed on inhalation,
leading heart rate to fluctuate as the vagal brake is turned on and off with exhalation and
inhalation, respectively. Variability within the respiration frequency band is termed respiratory
sinus arrhythmia (RSA) and is used to index vagal efferent activity, or vagal tone, non-invasively
(Grossman & Taylor, 2007). Greater variability in RSA equates with greater vagal strength or
tone (Bertsch, et al., 2012; Grossman & Taylor, 2007) and is associated with a host of positive
socioemotional outcomes such as connectedness, positive emotions, interpersonal style,
attention, and working memory (e.g., Kok & Fredrickson, 2010; Thayer, Hansen, Saus-Rose, &
Johnson, 2009).
Due to the non-invasive nature of RSA, or high frequency HRV, it is assessed in a wide
variety of domains. Initially, HRV was primarily utilized for its clinical applications, as a means
of indexing cardiac health and disease (e.g., Akselrod, et al., 1981; Carney, et al., 2001; Guzzetti,
et al., 1988; Kamath & Fallen, 1995; Pagani, et al., 1986; Masi, Hawkley, Rickett, & Cacioppo,
2007; Thayer & Lane, 2007). It has also been employed to monitor individuals with diabetes
(Ewing, Borsey, Bellavere, & Clark, 1981), detect renal failure (Akselrod, Eliash, Oz, & Cohen,
1987), track physical fitness (Davy, DeSouza, Jones, & Seals, 1998), diagnose sleep disorders
(Drinnan, Allen, Langley, & Murray, 2000), track sleep quality (Elsenbruch, Harnish, & Orr,
1999), and assess the impact of tobacco smoke (Pope, et al., 2001) and alcohol (Malpas,
Whiteside, & Maling, 1991) on cardiac function. For a review of physiological applications of
HRV, see Acharya, et al. (2006).
More recently, HRV has also been used to assess a growing number of
psychophysiological processes and outcomes related to cardiovagal control, including attention
(Porges, 1992), emotion regulation (Calkins & Johnson, 1998), self-regulation (Geisler, Kubiak,
Siewert, & Weber, 2013), stress reactivity (Fabes & Eisenberg, 1997), anxiety (Friedman, 2007),
depression (Chambers & Allen, 2002; Rottenberg, 2007), group affiliation (Sahdra, Ciarrochi &
Parker, 2015), and wise reasoning (Grossmann, Sahdra & Ciarrochi, 2016), among others. HRV
has been examined as both a trait-like individual difference variable and a state-like variable that
fluctuates within and between persons in relation to specific circumstances, tasks, or
interventions (e.g., Miu, Helman, & Miclea, 2009). The popularity of HRV measures is due
largely to the fact that they are relatively easy to collect, non-invasive, and reproducible (Ge,
Srinivasan, & Krishnan, 2002; Kleiger, et al., 1991).
Metrics of Heart Rate Variability
Thus far, we have discussed HRV as if it were a unified construct. However, there are
multiple methods for calculating HRV from a given cardiac signal that reflect different
components of autonomic control over the cardiovascular system. Most common in
psychophysiological research are methods that operate in the time domain and those that operate
in the frequency domain (for a comprehensive review of all HRV calculation methods see Allen,
Chambers, & Towers, 2007).
Time domain measures. These examine the intervals between each QRS complex in the
ECG wave directly. These intervals are calculated based on the time between successive R-peaks
and are referred to as RR intervals, inter-beat intervals (IBIs), or normal-to-normal (NN)
intervals (see Figure 1). Statistical parameters can then be calculated using the NN intervals as
input data. Assessed in milliseconds, these time-based measures are the simplest to calculate and
at their most basic include mean heart rate and mean NN interval (Stein & Kleiger, 1999).
Additional parameters that reflect the variation in IBIs may also be calculated from the NN time
series data itself. For example, SDNN, one of the most common time-based measures of HRV, is
simply the standard deviation of the NN intervals (Murray et al., 1975). SDNN estimates overall
HRV and reflects both short and long-term variation in the timing of cardiac cycles when
measured over a long period of time. Critically, SDNN values are heavily dependent on the
length of the recording and so must be interpreted with caution when comparing across studies
with different designs (Ewing et al., 1981; Saul, Albrecht, Berger, & Cohen, 1987). It is,
therefore, recommended that recordings be standardized to 5 minutes in length (Task Force of
the European Society of Cardiology and the North American Society of Pacing and
Electrophysiology, 1996). To estimate just the long-term components, the standard deviation of
the average NN intervals calculated over 5-minute periods of time (SDANN) may be used
(Acharya, et al., 2006).
Statistical parameters can also be calculated based on the differences between adjacent
intervals. These include the standard deviation of successive differences between NN intervals
(SDSD) and the root mean square of successive differences (RMSSD; Acharya, et al.,
2006; Allen, et al., 2007). Of these difference-based measures, RMSSD is recommended for its
statistical properties and because it reliably estimates short-term variation in the timing of
cardiac cycles (Task Force, 1996).
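The time-domain definitions above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they report using Kubios); the function name `time_domain_hrv` and the returned keys are hypothetical, the sample (n - 1) standard deviation is an assumed convention, and heart rate is approximated from the mean NN interval.

```python
import math
import statistics


def time_domain_hrv(nn_ms):
    """Basic time-domain HRV metrics from NN intervals given in milliseconds.

    Assumption: sample (n - 1) standard deviations are used; some tools
    use the population form instead.
    """
    if len(nn_ms) < 3:
        raise ValueError("need at least three NN intervals")
    # Differences between adjacent NN intervals.
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    mean_nn = statistics.fmean(nn_ms)
    return {
        "mean_nn_ms": mean_nn,
        # Approximation: heart rate implied by the mean NN interval.
        "mean_hr_bpm": 60000.0 / mean_nn,
        # SDNN: standard deviation of the NN intervals.
        "sdnn_ms": statistics.stdev(nn_ms),
        # SDSD: standard deviation of successive differences.
        "sdsd_ms": statistics.stdev(diffs),
        # RMSSD: root mean square of successive differences.
        "rmssd_ms": math.sqrt(statistics.fmean([d * d for d in diffs])),
    }


metrics = time_domain_hrv([800, 810, 790, 805, 795])
```

Because SDNN depends on recording length, a function like this would normally be applied to windows of equal duration.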
Spectral Analysis. Heart rate variability can also be examined within specific frequency
ranges, allowing the researcher to home in on specific influences on the variability, such as
respiration. Here the IBI series is subjected to a Fourier transform or autoregressive modeling to
represent it in the frequency domain. This method is often termed spectral analysis as it yields a
power spectrum, the specific frequency bands of which can then be examined. Activity in the
low frequency range (0.04-0.12 Hz) is associated with activity of the baroreflexes (Goldstein,
Bentho, Park, & Sharabi, 2011) and is posited by some researchers to index sympathetic tone
especially when measured over long periods of time (Pagani et al., 1997). Fluctuations in the
high frequency range (0.12-0.4 Hz) reflect the impact of parasympathetic nervous system activity
on heart rate via the vagus nerve (Berntson, et al., 1997; Akselrod, et al., 1981). Power in this
range is believed to be indicative of a relatively stable autonomic disposition (Bertsch et al.,
2012; Thayer et al., 2009) and has been linked with a host of positive traits and outcomes related
to socio-cognitive functioning (e.g., Appelhans & Luecken, 2006; Thayer et al., 2009).
Examining the full frequency range, or total power, provides an estimate of overall variability
and overall autonomic activity and is highly correlated with SDNN (Task Force, 1996).
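As a rough illustration of the frequency-domain approach, the sketch below resamples an IBI series onto an even time grid and sums periodogram power within a band. It is a simplified stand-in, not the method implemented in dedicated HRV software: the 4 Hz resampling rate, linear interpolation, the unwindowed periodogram, and the function names `resample_ibi` and `band_power` are all assumptions made for the example. The band limits follow the ranges given in the text.

```python
import cmath
import math


def resample_ibi(ibi_s, fs=4.0):
    """Linearly interpolate an IBI series (seconds) onto an even grid at fs Hz."""
    beat_times, t = [], 0.0
    for ibi in ibi_s:
        t += ibi
        beat_times.append(t)
    n = int((beat_times[-1] - beat_times[0]) * fs) + 1
    out, j = [], 0
    for k in range(n):
        tk = beat_times[0] + k / fs
        # Advance to the pair of beats that brackets time tk.
        while j < len(beat_times) - 2 and beat_times[j + 1] < tk:
            j += 1
        t0, t1 = beat_times[j], beat_times[j + 1]
        w = (tk - t0) / (t1 - t0)
        out.append(ibi_s[j] + w * (ibi_s[j + 1] - ibi_s[j]))
    return out


def band_power(x, fs, f_lo, f_hi):
    """Sum periodogram power (input units squared) for f_lo <= f < f_hi."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # remove the mean before transforming
    power = 0.0
    for k in range(1, n // 2):
        f = k * fs / n
        if f_lo <= f < f_hi:
            coef = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                       for i, v in enumerate(x))
            # One-sided scaling so that band powers add up to the variance.
            power += 2.0 * abs(coef) ** 2 / n ** 2
    return power


# Synthetic 5-minute series with a respiratory-like 0.25 Hz modulation.
ibi, t = [], 0.0
while t < 300:
    v = 1.0 + 0.05 * math.sin(2 * math.pi * 0.25 * t)
    ibi.append(v)
    t += v
x = resample_ibi(ibi)
lf = band_power(x, 4.0, 0.04, 0.12)  # low frequency band from the text
hf = band_power(x, 4.0, 0.12, 0.40)  # high frequency band from the text
```

For this synthetic series the modulation sits inside the high frequency band, so `hf` should dominate `lf`. Real analyses typically add detrending and windowing, which this sketch omits.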
Collecting Heart Rate Variability Data
Electrocardiogram. HRV is traditionally assessed using an electrocardiogram (ECG),
which records the electrical activity of the heart using bipolar sensors placed along Einthoven’s
triangle (Einthoven, Fahr, & De Waart, 1913). The resulting ECG waveform reflects changes in
voltage associated with the various phases of the cardiac cycle. Specifically, the QRS complex
indicates ventricular depolarization and its peaked shape makes it a robust and easily identified
index of cardiac chronotropy. The distances between the R peaks, in milliseconds, form the
inter-beat interval (IBI) series that is used to calculate the majority of HRV metrics (see Figure 1).
Traditionally, ECGs have been rather cumbersome hospital or lab-bound devices. While
these are suitable for many types of clinical and other research, being tethered restricts options
for monitoring people long-term and conducting longitudinal and ecologically valid research. For
this reason, ECGs themselves have become increasingly ambulatory (Sandercock, Shelton,
Bromley, & Brodie, 2004; Koudstaal, van Gijn, Klootwijk, van der Meche, & Kappelle, 1986;
Jaboudon, Sztajzel, Sievert, Landis, & Sztajzel, 2004), with some applications integrating ECG
sensors into textiles in the form of sensorized t-shirts (Paradiso, Faetti, & Werner, 2011). Despite
their designation as ambulatory devices, many of these wearable ECGs are highly sensitive to
motion artifact (Romero, et al., 2011), leading many to recommend data collection during sleep
(e.g., Muaremi, et al., 2013) and driving research into motion correction algorithms (e.g.,
Alqaraawi, Alwosheel, & Alasaad, 2016).
Photoplethysmography. Interest in the ambulatory measurement of cardiovascular
activity and autonomic function has driven research into alternative assessment methods. One of
the most promising of these new methods is the estimation of cardiac chronotropy using pulse
signals or photoplethysmography (PPG) (e.g., Schäfer & Vagedes, 2013; Bolanos, Nazaran, &
Haltiwanger, 2006). PPG methods detect blood volume pulse (BVP) or variation in the volume
of arterial blood in the microvasculature of peripheral tissues resulting from the cardiac cycle
(Challoner, 1979). This is accomplished by shining a small LED light on the wrist, fingertip,
earlobe, or forehead and then measuring how much of that light is either absorbed or reflected
(depending on whether transmission or reflection PPG is being done). When blood volume is
greater, more light is absorbed, meaning that there is less light available to be detected by the
PPG sensor. In this way, changes in blood volume can be tracked to reflect the number and
timing of cardiac cycles (Kamal, Harness, Irving, & Mearns, 1989), allowing for the calculation
of pulse rate and pulse rate variability (PRV). The peaks of the PPG wave can be used just like
the R-peaks of the ECG waveform to estimate the number of heartbeats as well as the intervals
between them (IBIs). The primary difference between the two types of measurement is pulse
transit time (PTT), or the time it takes for the pulse wave to travel from the heart to the periphery
(Jago & Murray, 1988). Thus, PRV and HRV measures should yield highly similar results.
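The idea of treating PPG peaks like ECG R-peaks can be sketched in a few lines of Python. This is a deliberately naive illustration, not the proprietary algorithm of any device: the mean-based threshold, the 0.3-second minimum spacing between peaks, and the function names `detect_peaks` and `peaks_to_ibis` are assumptions chosen for the example.

```python
import math


def detect_peaks(signal, fs, min_interval_s=0.3):
    """Indices of local maxima above the signal mean, spaced by a minimum gap.

    Assumption: a clean pulse wave; real PPG needs filtering and motion
    handling before a rule this simple is usable.
    """
    mean = sum(signal) / len(signal)
    min_gap = int(min_interval_s * fs)
    peaks = []
    for i in range(1, len(signal) - 1):
        is_max = signal[i - 1] < signal[i] >= signal[i + 1]
        far_enough = not peaks or i - peaks[-1] >= min_gap
        if is_max and signal[i] > mean and far_enough:
            peaks.append(i)
    return peaks


def peaks_to_ibis(peaks, fs):
    """Convert peak sample indices to inter-beat intervals in seconds."""
    return [(b - a) / fs for a, b in zip(peaks, peaks[1:])]


# Synthetic 10-second pulse wave at 60 beats per minute, sampled at 64 Hz.
fs = 64
wave = [math.sin(2 * math.pi * i / fs) for i in range(10 * fs)]
ibis = peaks_to_ibis(detect_peaks(wave, fs), fs)
```

On the synthetic wave every interval is one second. With real wrist data, motion artifact distorts or hides peaks, which is the failure mode examined in this paper.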
Indeed, research indicates that the two measures are highly correlated, especially when
assessed at rest among healthy participants (Foo & Wilson, 2006; Schäfer & Vagedes, 2013; Lu,
Yang, Taylor, & Stein, 2009; Rauh, Limley, Bauer, Radespiel-Troger, & Mueck-Weymann,
2004). For example, using a finger pulse oximeter to assess PPG alongside a traditional ECG,
Jeyhani, Mahdiani, Peltokangas, and Vehkaoja (2015) found PPG- and ECG-derived HRV
metrics to be highly correlated with one another, with the exception of the pNN50 metric.
Importantly, the majority of these studies have examined correspondence between the
two methods under relatively motion-free conditions. Indeed, Gil and colleagues (Gil, et al.,
2010) were among the first to test the efficacy of PPG-derived HRV under non-stationary
conditions. While these authors did find good correspondence between the two methods, the
non-stationary condition assessed was the tilt-table test in which the participant moves very little
volitionally, but rather, is positioned at different degrees of inclination by a moving table (Julu,
Cooper, Hansen, & Hainsworth, 2003). Lu and Yang (2009) conducted a similar comparison of
traditional ECG and PPG measured at the fingertip. They found that when high quality PPG data
were obtained, HRV values from the two methods were highly correlated. However, data were
not always usable; motion artifact prevented analysis of PPG data from a significant number of
participants. While these results may appear somewhat surprising given that participants were
instructed to sit still for the duration of each 5-minute recording epoch, this finding is consistent
with other researchers’ concerns regarding the high sensitivity of PPG to motion artifact (Allen,
In part because of problems with motion artifact and partly in an effort to maximize
comfort when worn long-term, there has been increasing research into measuring PPG using
wristband devices (Garbarino, Lai, Bender, Picard, & Tognetti, 2014; Tamura, Maeda, Sekine, &
Yoshida, 2014). Arberet and colleagues (2013) compared HRV measured by a wrist PPG sensor
(CSEM proprietary wrist monitor) to traditional ECG-derived HRV assessed overnight and found
the two measures to correlate at approximately .90 for both temporal and spectral HRV metrics.
While high, these correlations were based on data obtained during sleep when presumably
participants were moving very little, almost certainly much less than they would move during the day.
Renevey, Vetter, Krauss, Celka, and Depeursinge (2001) developed and tested a similar
wrist-based PPG device with an accelerometer. Using their motion correction algorithms, they
found acceptable rates of error during both baseline and physical activity (running). However,
data were collected only over 10-second intervals. Preejith, Annamol, Jayaraj, & Mohanasankar
(2016) tested another, similar wristband device and also found a high degree of correlation with
traditional ECG. This study, however, was conducted in a hospital setting and data were
collected for only 20 seconds at a time. Still, motion artifact was noted as an issue.
In this climate of growing interest in wrist-worn PPG devices for HRV measurement,
many devices have been introduced in the market. Consider, for example, the Empatica E4, the
latest model of Empatica wristbands. The E4, developed by Empatica, is
designed for use by researchers interested in tracking these measures over time and outside of the
laboratory. The device is somewhat similar in appearance to the Fitbit, Apple watch, and other
commercially available health tracking devices worn on the wrist. However, it is notably larger
than these commercial models and does not have a display screen. A primary difference between
the E4 and these more commonly used devices is that, with the E4, it is possible for the
researcher or user to gain access to the full time-series of data rather than only summary
statistics. Although not discussed in this paper, the E4 can also assess electrodermal activity
(EDA) by applying a very low (8 Hz) alternating current to the inside of the wrist and assessing
resistance. In addition, the E4 also contains an accelerometer that detects motion along 3 axes
(X, Y, Z).
The E4 is marketed specifically to researchers as a “wearable wireless device designed
for continuous, real-time data acquisition in daily life” for assessing HRV under various
conditions in and outside of the laboratory. However, for researchers
interested in using the E4 for its advertised purpose, there is a dearth of peer-reviewed studies
demonstrating its validity as a turnkey tool for continuous measurement of HRV in different
contexts. Empatica’s previous model, the E3, was touted as an improvement on finger-based
measures, especially during tasks employing minimal wrist motion such as typing (Garbarino, et
al., 2014), and the E4, presumably, is an improvement on the E3. Like other commercially
available wristband devices, the E3 and the E4 utilize an accelerometer paired with proprietary
algorithms for detecting motion and removing data contaminated by motion artifact (Garbarino,
et al., 2014).
As far as we know, there are only three peer-reviewed published studies in which
researchers attempted to evaluate E4’s performance using different benchmarks. In one study by
McCarthy and colleagues (2016), participants wore the E4 and a Holter ambulatory ECG
monitor for 24 to 48 hours. A subset of 15-second overlapping segments of data were sampled.
The researchers indicated that over 85% of these segments contained usable data from both
devices. However, of the 15% of cases in which one device yielded poor quality data, this was
twice as likely to be due to failure of the PPG rather than the ECG. Importantly, the majority of
good quality data were collected during the night, presumably when motion was minimal.
In another attempt to test E4’s validity in a controlled lab context, Ollander and
colleagues (2016) compared the E4 with classic ECG and finger conductivity electrodes during a
social stress task. They found that the E4 failed to detect a significant number of inter-beat
intervals, while time-domain features (i.e. heart rate) were accurately estimated. However, as
only seven participants (one male) were included in this study, more work is needed to establish
the reliability and generalizability of these results. Similarly, Corino, Matteucci, and Mainardi
(2007) examined the E4’s performance in a highly controlled context. They analyzed BVP
signals from the E4 to develop algorithms for detecting atrial fibrillation. Recordings were
obtained at rest but only 2 minutes of a 10-minute segment were analyzed. Critically, this study
did not assess the reliability of the E4 during daily activities that include motion. While
promising, these preliminary studies on the E4 do not show a systematic comparison of the E4’s
performance across different conditions, as compared to the performance of the gold-standard
classic ECG-based measurement of HRV in the same conditions.
Present Research
The purpose of this study is to devise a method for evaluating the quality of HR and HRV
data obtained using a purportedly research-grade wristband device that utilizes reflective PPG to
monitor cardiac chronotropy. We employ Empatica’s latest model, the E4, as a test case, but our
method is general enough to be applied to similar data obtained by other wristband devices if
the raw data is accessible to researchers. The E4 has strong promise as a tool for researchers
wishing to assess cardiovascular functioning with minimal, non-invasive equipment and outside
the laboratory. However, although research has been done to confirm the safety and validity of
these devices (McCarthy et al., 2016; Ollander, et al., 2016), little has been done to validate
derived measures across a variety of contexts (including those that require movement) and in
large samples.
The goal of the present research is to conduct such validation tests using traditional
methods of correlating the data obtained from the test device with simultaneous data obtained
from a gold-standard device. Additionally, we also aim to test our new method for a more fine-
grained evaluation of the quality of data obtained from these devices. Specifically, we first
compare heart rate (HR) and heart rate variability (HRV) values derived from the E4 to those
obtained from a traditional ECG (Biopac, Goleta, CA), when collecting data simultaneously with
both systems. Previous research testing the validity of the E4 has utilized data collected
continuously over the course of a day (or more) during which undifferentiated activity has
occurred, or has assessed HRV in relation to just one specific stress test. The current research
improves on these methods by systematically comparing measures derived by the E4 and a
traditional, wired ECG system across a variety of conditions. Specifically, participants’ HRV
was assessed during standing, seated, and supine rest, as well as during computer and grip
strength tasks. We selected these conditions to reflect postures and tasks that people are likely to
engage in both within and outside the laboratory. This allows us to examine the reliability and
validity of the E4 under conditions most likely to be relevant to researchers and to assess the
effects of these tasks and postures on motion artifact.
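The traditional validation step, correlating values from the test device with simultaneous values from the reference device, can be sketched as a plain Pearson correlation. The function name `pearson_r` and the example numbers are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired measurements from two devices."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-participant mean heart rates (bpm) from two devices.
ecg_hr = [62.0, 71.5, 80.2, 68.4, 75.9]
wrist_hr = [61.5, 72.0, 79.8, 69.1, 75.0]
r = pearson_r(ecg_hr, wrist_hr)
```

A high correlation alone does not establish agreement, since only participants with usable data from both devices enter the calculation; that limitation is what the missingness metric in this paper is meant to expose.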
Equally importantly, the present study contributes to the literature on ambulatory HRV
measures by offering a new method (as detailed below) for determining the amount of error in
HRV estimates and for adjusting error tolerance. This method assesses the percentage of
heartbeats the device failed to detect. The researcher can then select a cutoff value, and exclude
all cases with missingness greater than this value. Employing these methods yields more robust
results than others have obtained using the proprietary algorithms included in the E4 software.
Method
Participants
A total of 78 undergraduates from the University of California, Santa Barbara
participated in this study. Informed consent was obtained from all participants before
commencing the study procedures and all participants were compensated with course credit.
Responses to the health screening form indicated that all participants were in good
cardiovascular health. Of the 78 total participants, 56 were women, 21 were men, and one
identified as genderqueer/non-binary. Participants ranged in age from 18 to 30 (M = 19.85, SD =
1.77) and were relatively ethnically diverse: 37.2% White, 20.5% Asian/Asian American, 20.5%
Latino/Latina, 10.3% Multi-racial, 3.8% Black/African American, 3.8% Arabic/Middle Eastern,
and 3.8% selected “other”.
Procedure
When the participant arrived at the lab, a trained research assistant described the study procedures and
obtained informed consent from the participant. The research assistant then applied the
electrodes and wristband device (E4) necessary for the collection of relevant physiological
signals (see Measures & Materials section below). The participant then performed a 6-minute
seated baseline followed by two additional 3-minute baseline periods, one standing and one
supine (the order of the latter two baselines was counter-balanced). Next, participants completed
a computer-based survey assessing demographic information, well-being, and other psychosocial
variables. Given our goal of comparing physiological indices, we do not report on these
psychosocial variables in the present research. However, we do utilize the physiological data
collected while participants were completing these measures to assess the impact of the motion
of using a mouse and typing on resultant data. Participants then completed a hand-grip strength
task where they were instructed to squeeze a digital dynamometer as hard as possible three times,
each trial separated by approximately 10 seconds and at the signal of the experimenter.
Participants did this for both their dominant and non-dominant hands (order counter-balanced).
After this, the research assistant removed all sensors from the participant. Finally, the participant
was debriefed as to the purpose of the study, thanked, and dismissed.
Physiological Measures
Electrocardiograph (ECG). ECG signals were recorded continuously using a Biopac
MP150 system with ECG amplifier (Model ECG 100C) and a modified lead II electrode
configuration (Sherwood, et al., 1990). Data were integrated with a Biopac MP150 and stored
using AcqKnowledge software (version 4.3, Goleta, CA).
E4 wristband. The E4 wristband from Empatica was used simultaneously to assess PPG-
derived HRV. This device works by simply placing the watch-like wristband around either the
left or right wrist. For all participants it was placed on the dominant hand for the duration of the
study, except for when moved to the non-dominant wrist when testing grip strength of the
dominant hand (to reduce motion artifact). Data collected using the E4 were uploaded to
Empatica Connect, a secure site maintained by Empatica where the researcher can visualize and
download the data stream. Data were stored and identified using only a numeric string reflecting
the date and time when the data were collected.
Once data collection for all participants was complete, all data were downloaded from the
Empatica website as a series of zip folders. Using the statistical program R (R Core Team,
2017), we combined the .csv files containing IBIs into a single data frame. Using an experiment
log that noted the timing of each participant’s data collection, this data frame was then filtered to
create windows of analysis for each participant and each task. These resulting data frames were
then labeled and saved to .csv files for further analysis.
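The combining and windowing steps described above might look like the following in Python. This is a minimal sketch, not the authors' R code: the file layout (session start as a UNIX timestamp in the first row, then one row per beat holding an offset and an IBI, both in seconds), the function names `read_ibi_csv` and `window`, and the sample values and experiment log are all assumptions made for illustration.

```python
import csv
import io


def read_ibi_csv(text):
    """Parse one IBI export into (absolute_time_s, ibi_s) tuples.

    Assumed layout: the first row holds the session start as a UNIX
    timestamp; every later row holds 'offset_seconds, ibi_seconds'.
    """
    rows = list(csv.reader(io.StringIO(text)))
    start = float(rows[0][0])
    return [(start + float(r[0]), float(r[1])) for r in rows[1:] if r]


def window(beats, t_start, t_end):
    """Keep beats whose timestamp falls inside [t_start, t_end)."""
    return [(t, ibi) for t, ibi in beats if t_start <= t < t_end]


# Hypothetical session starting at t = 1000 s with four recorded intervals.
sample = "1000.0, IBI\n1.0,0.80\n1.8,0.80\n30.5,0.75\n61.0,0.90\n"
beats = read_ibi_csv(sample)

# Hypothetical experiment log: task name -> (start, end) in UNIX seconds.
log = {"seated": (1000.0, 1030.0), "standing": (1030.0, 1060.0)}
windows = {task: window(beats, a, b) for task, (a, b) in log.items()}
```

Each entry of `windows` then holds only the intervals recorded during that task, which is the unit on which the later quality checks operate.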
Computing HRV Measures. Each of these data streams was processed using Kubios 2.2
(Tarvainen, Niskanen, Lipponen, Ranta-aho, & Karjalainen, 2014), a software package designed
specifically for HRV analysis. Using Kubios, the following time and frequency parameters of
cardiac function were calculated (among others) for the ECG as well as the E4 data:
1. Mean heart rate (meanhr)
3. Total power (TP)
4. Power in the low frequency range (0.04-0.12 Hz) (LFP)
5. Power in the high frequency range (0.12-0.4 Hz) (HFP)
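For readers unfamiliar with the frequency-domain measures, band power is the integral of the power spectral density over the stated frequency range. The sketch below illustrates that idea only; it does not reproduce Kubios's spectral estimation. The function name `band_power`, the trapezoidal integration, and the flat toy spectrum are assumptions introduced here, while the band edges are the ones given in the list.

```python
def band_power(freqs, psd, lo, hi):
    """Integrate psd over [lo, hi] with the trapezoidal rule.

    freqs must be ascending (Hz); psd holds the matching spectral
    density values (e.g., ms^2/Hz). Only grid intervals lying fully
    inside the band are counted.
    """
    total = 0.0
    for i in range(len(freqs) - 1):
        f0, f1 = freqs[i], freqs[i + 1]
        if f0 >= lo and f1 <= hi:
            total += 0.5 * (psd[i] + psd[i + 1]) * (f1 - f0)
    return total


# Made-up flat spectrum of 100 ms^2/Hz on a 0.04 Hz grid (illustration only).
freqs = [0.00, 0.04, 0.08, 0.12, 0.16, 0.20, 0.24, 0.28, 0.32, 0.36, 0.40]
psd = [100.0] * len(freqs)

lfp = band_power(freqs, psd, 0.04, 0.12)  # low-frequency band from the text
hfp = band_power(freqs, psd, 0.12, 0.40)  # high-frequency band from the text
```

With this toy spectrum the low-frequency band spans 0.08 Hz and the high-frequency band 0.28 Hz, so the two powers come out at roughly 8 and 28 ms^2.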
Calculating Missingness. In order to assess the quality of the E4-derived HRV data, we
created a metric called ‘missingness’. As estimates of heart rate are generally robust and highly
correlated across devices and across conditions, these can be used to estimate the number of IBIs
that should have been recorded. Missingness was then calculated as one minus the ratio of the
observed IBIs to the expected number of heartbeats (meanhr * minutes). Because n heartbeats
produce n - 1 intervals, the observed IBI count was increased by 1 so that it could be compared
to the expected heartbeat count.
\[
\text{missingness} = 1 - \frac{\text{Observed IBIs} + 1}{\text{meanhr} \times \text{minutes}}
\]
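The missingness calculation can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not the R code from Appendix A: the function name `missingness`, its argument names, and the example numbers are hypothetical.

```python
def missingness(observed_ibis, meanhr, minutes):
    """Proportion of expected heartbeats that were not captured.

    observed_ibis: number of inter-beat intervals actually recorded
    meanhr:        mean heart rate in beats per minute
    minutes:       length of the analysis window in minutes
    """
    expected_beats = meanhr * minutes
    if expected_beats <= 0:
        raise ValueError("meanhr and minutes must both be positive")
    # n beats yield n - 1 intervals, hence the +1 on the observed count.
    return 1.0 - (observed_ibis + 1) / expected_beats


# Hypothetical 5-minute window at 70 bpm, i.e. 350 expected beats.
complete = missingness(349, meanhr=70, minutes=5)  # every interval present
partial = missingness(279, meanhr=70, minutes=5)   # 280 of 350 beats covered
```

In this example the complete recording gives a missingness of 0, and the partial recording gives about .20, which would pass both of the cutoffs examined in this study.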
The data were then filtered based on the degree of missingness. We tested and compared
two thresholds for missingness: cases missing more than 35% of the heartbeats were excluded
from analysis first, and then cases missing more than 25%. Sample R code used to implement
this method can be found in Appendix A.
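Applying the two cutoffs amounts to a simple filter on the per-case missingness values. The following Python sketch assumes each case is a dictionary with a precomputed `missingness` field; the helper name `retain`, the participant labels, and the values are invented for illustration, and cases exactly at the cutoff are kept because the text excludes only those missing more than the threshold.

```python
def retain(cases, cutoff):
    """Keep cases whose missingness does not exceed the cutoff."""
    return [c for c in cases if c["missingness"] <= cutoff]


# Hypothetical cases for one condition (values are illustrative only).
cases = [
    {"id": "p01", "missingness": 0.10},
    {"id": "p02", "missingness": 0.30},
    {"id": "p03", "missingness": 0.50},
    {"id": "p04", "missingness": 0.22},
]

kept_35 = retain(cases, 0.35)  # lenient cutoff
kept_25 = retain(cases, 0.25)  # strict cutoff

# Cases lost at each step show the cost of tightening the tolerance.
lost_35 = len(cases) - len(kept_35)
lost_25 = len(kept_35) - len(kept_25)
```

Tracking the number of cases lost alongside the retained set makes the trade-off between data quality and sample size explicit for each condition.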
We first matched session data to participants. After combining all data into a continuous
table, data were filtered to the timeframe in which participants completed tasks. As the E4 records
IBIs only when data quality is strong, in many cases participants had no recorded data. This was
especially problematic in tasks that had a shorter duration or those involving movement. IBIs
were recorded for 62 participants in the supine condition, 56 participants in the seated baseline
condition, 50 in the standing condition, 25 in the non-dominant grip condition, and 20 in the
dominant grip condition.
Pearson’s correlations were conducted comparing a range of HRV metrics obtained from
the E4 wristbands to those obtained from the Biopac devices (see Table 1). While mean heart
rate was closely correlated in all conditions, the metrics of heart rate variability were not as
strongly correlated as one would expect from devices assumed to be measuring the same thing.
However, many files had large sequences where IBIs were not recorded due to poor signal.
Using our novel method of error detection discussed above, we next tested to see if correlations
between E4 and Biopac devices could be improved by excluding files with poor quality data.
Note that the same pattern of results emerged when calculating Spearman’s correlations, which
use rank ordering and are, therefore, less susceptible than Pearson’s correlations to artificially
inflated estimates due to potentially influential data-points or extreme values.
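The difference between the two coefficients can be seen in a small Python sketch. It is a plain-Python illustration with made-up numbers, not the analysis code used in the study; the function names `pearson`, `ranks`, and `spearman` are introduced here, and Spearman's coefficient is computed as Pearson's correlation of the rank-transformed values with ties given their average rank.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up paired values in which one extreme case dominates the scale.
e4 = [1, 2, 3, 4, 100]
ecg = [2, 1, 4, 3, 50]

r = pearson(e4, ecg)
rho = spearman(e4, ecg)
```

Here the single extreme pair pulls Pearson's r close to 1, whereas the rank-based coefficient is noticeably lower because it reflects only the ordering of the cases.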
Filtering cases resulted in loss of data but a large increase in the correlation between
Biopac and E4 devices (see Tables 2 and 3). With a missingness cutoff of .35, 15 cases were
lost from the baseline condition, but the correlation between E4 and Biopac improved from
.72 to .97. Decreasing the threshold further from .35 to .25 removed an additional 10 cases in the
baseline condition but the correlations were not greatly improved. In some conditions (grip and
standing conditions), relatively few participants had any useable data at a cutoff of .35. This
resulted in poorer correlations in the data between the E4 wristbands and the Biopac device.
Minimizing motion artifact is a major challenge to the development of equipment to
reliably assess HRV and other indices of cardiovascular performance and health. Results of the
present study indicate that despite being designed specifically for ambulatory use, the E4, like
other devices for assessing HRV, still suffers from issues of motion artifact. Specifically, when
data were filtered to exclude cases where 35% or more of the E4 heartbeat data were missing, the
number of viable data points was reduced by almost half (or more) in all but the seated and
supine resting conditions. Setting the cutoff at 25% missingness yielded even fewer viable cases
across all conditions. In other words, the E4 performed best when participants were sitting or
lying still. However, even in these conditions, correlations between frequency-derived HRV
measures (HFP, LFP) assessed via the E4 and wired ECG were moderate to strong (.40 - .82).
Excluding cases that were missing more than 35% and more than 25% of heartbeats increased
these correlations to .86 to .97 and .85 to .98, respectively, but greatly reduced the number of
viable cases. When participants were standing or engaging in the typing task or grip task, data
quality suffered greatly such that excluding cases with missingness greater than 25% depleted
the number of viable cases to 15 or fewer, too few to produce reliable estimates of HRV.
The key contribution of this study is the technique for assessing error in HRV data
collected with the E4 or other PPG-based wristbands, and for determining and implementing
cutoff values for error tolerance. Estimating error directly can help researchers to better evaluate
the quality of their data and exclude affected data, thus minimizing the impact of error on results.
In our sample, when error levels were adjusted to less than .25, the correlation between E4 and
wired-ECG derived measures of high frequency HRV for the standing condition increased from
.20 to .90. However, in doing so, data from more than half of the participants were excluded.
When applying error tolerance cutoffs to the typing data, virtually all data were rejected. It
remains to be seen whether other wristband devices for measuring HRV would do any better
than the E4, the test case used in this study.
Wristband measures are appealing for their non-obtrusive nature and comfort. But due to
the frequent movement of hands and arms, other sites might be better than the wrist. Ambulatory
measures that strap around the chest or clip to the earlobe are being developed and may minimize
artifact. Given the problems with motion artifact, researchers are also focusing both on
improving hardware and improving algorithms for cleaning and processing data, to minimize the
influence of motion artifact. For example, efforts are being made to use accelerometers in
conjunction with motion artifact reduction algorithms to improve data quality (e.g., Han, Kim, &
Kim, 2007; Han & Kim, 2012; Kim, Ryoo, & Bae, 2007; Yousefi, Nourani, Ostadabbas, &
Panahi, 2014). Other researchers are examining linear prediction analysis (LPC) and wavelet
transformation techniques to reduce the impact of motion artifact and accurately estimate HRV
(Alqaraawi et al., 2016). As those developments occur, HRV data quality will likely improve.
However, any new techniques or turnkey tools must be validated against the tried-and-tested
ECG-based method of calculating HRV, as we did in this study. The algorithm for error
detection and tolerance that we introduce in this paper is relatively simple to implement and can
be easily adapted by researchers to evaluate the quality of HRV data obtained from a wide
variety of ambulatory devices. We hope it will facilitate future research on ambulatory devices
for measuring HRV.
References
Acharya, U. R., Joseph, K. P., Kannathal, N., Lim, C. M., & Suri, J. S. (2006). Heart rate
variability: a review. Medical and Biological Engineering and Computing, 44, 1031-
Akselrod, S., Gordon, D., Ubel, F. A., Shannon, D. C., Barger, A. C., & Cohen, R. J. (1981).
Power spectrum analysis of heart rate fluctuation: a quantitative probe of beat-to-beat
cardiovascular control. Science, 213, 220-222.
Akselrod, S., Eliash, S., Oz, O., & Cohen, S. (1987). Hemodynamic regulation in SHR:
investigation by spectral analysis. American Journal of Physiology-Heart and
Circulatory Physiology, 253(1), H176-H183.
Allen, J. (2007). Photoplethysmography and its application in clinical physiological
measurement. Physiological Measurement, 28(3), R1-R39.
Allen, J. J., Chambers, A. S., & Towers, D. N. (2007). The many metrics of cardiac chronotropy:
A pragmatic primer and a brief comparison of metrics. Biological Psychology, 74(2),
Alqaraawi, A., Alwosheel, A., & Alasaad, A. (2016, May). Towards efficient heart rate
variability estimation in artifact-induced Photoplethysmography signals. In Electrical and
Computer Engineering (CCECE), 2016 IEEE Canadian Conference on (pp. 1-6). IEEE.
Appelhans, B. M., & Luecken, L. J. (2006). Heart rate variability as an index of regulated
emotional responding. Review of General Psychology, 10(3), 229-240.
Arberet, S., Lemay, M., Renevey, P., Sola, J., Grossenbacher, O., Andries, D., ... & Bertschi, M.
(2013, September). Photoplethysmography-based ambulatory heartbeat monitoring
embedded into a dedicated bracelet. In Computing in Cardiology Conference (CinC),
2013 (pp. 935-938). IEEE.
Berntson, G. G., Cacioppo, J. T., & Quigley, K. S. (1991). Autonomic determinism: the modes
of autonomic control, the doctrine of autonomic space, and the laws of autonomic
constraint. Psychological review, 98, 459-487.
Berntson, G. G., Bigger, J. T., Eckberg, D. L., Grossman, P., Kaufmann, P. G., Malik, M., ... &
van der Mollen, M. W. (1997). Heart rate variability: origins, methods, and interpretive
caveats. Psychophysiology, 34, 623-648.
Bertsch, K., Hagemann, D., Naumann, E., Schächinger, H., & Schulz, A. (2012). Stability of
heart rate variability indices reflecting parasympathetic activity. Psychophysiology, 49,
Bolanos, M., Nazeran, H., & Haltiwanger, E. (2006, August). Comparison of heart rate
variability signal features derived from electrocardiography and photoplethysmography
in healthy individuals. In Engineering in Medicine and Biology Society, 2006. EMBS'06.
28th Annual International Conference of the IEEE (pp. 4289-4294). IEEE.
Calkins, S. D., & Johnson, M. C. (1998). Toddler regulation of distress to frustrating events:
Temperamental and maternal correlates. Infant Behavior and Development, 21, 379-395.
Carney, R. M., Blumenthal, J. A., Stein, P. K., Watkins, L., Catellier, D., Berkman, L. F., ... &
Freedland, K. E. (2001). Depression, heart rate variability, and acute myocardial
infarction. Circulation, 104(17), 2024-2028.
Challoner, A. V. J. (1979). Photoelectric plethysmography for estimating cutaneous blood
flow. Non-invasive Physiological Measurements, 1, 125-151.
Chambers, A. S., & Allen, J. J. (2002). Vagal tone as an indicator of treatment response in major
depression. Psychophysiology, 39, 861-864.
Corino, V. D. A., Matteucci, M., & Mainardi, L. T. (2007). Analysis of heart rate variability to
predict patient age in a healthy population. Methods of Information in Medicine, 46(2),
Davy, K. P., Desouza, C. A., Jones, P. P., & Seals, D. R. (1998). Elevated heart rate variability in
physically active young and older adult women. Clinical Science, 94(6), 579-584.
Drinnan, M., Allen, J., Langley, P., & Murray, A. (2000). Detection of sleep apnoea from
frequency analysis of heart rate variability. In Computers in Cardiology 2000 (pp. 259-
262). IEEE.
Einthoven, W., Fahr, G., & De Waart, A. (1913). Über die Richtung und die manifeste Grösse
der Potentialschwankungen im menschlichen Herzen und über den Einfluss der Herzlage
auf die Form des Elektrokardiogramms. Pflügers Archiv European Journal of
Physiology, 150(6), 275-315.
Elsenbruch, S., Harnish, M. J., & Orr, W. C. (1999). Heart rate variability during waking and
sleep in healthy males and females. Sleep, 22(8), 1067-1071.
Ewing, D. J., Borsey, D. Q., Bellavere, F., & Clarke, B. F. (1981). Cardiac autonomic
neuropathy in diabetes: comparison of measures of RR interval variation.
Diabetologia, 21(1), 18-24.
Fabes, R. A., & Eisenberg, N. (1997). Regulatory control and adults' stress-related responses to
daily life events. Journal of Personality and Social Psychology, 73, 1107-1117.
Foo, J. Y. A., & Wilson, S. J. (2006). A computational system to optimise noise rejection in
photoplethysmography signals during motion or poor perfusion states. Medical and
Biological Engineering and Computing, 44, 140-145.
Friedman, B. H. (2007). An autonomic flexibility–neurovisceral integration model of anxiety and
cardiac vagal tone. Biological Psychology, 74, 185-199.
Garbarino, M., Lai, M., Bender, D., Picard, R. W., & Tognetti, S. (2014, November). Empatica
E3—A wearable wireless multi-sensor device for real-time computerized biofeedback
and data acquisition. In Wireless Mobile Communication and Healthcare (Mobihealth),
2014 EAI 4th International Conference on (pp. 39-42). IEEE.
Ge, D., Srinivasan, N., & Krishnan, S. M. (2002). Cardiac arrhythmia classification using
autoregressive modeling. Biomedical Engineering Online, 1(1), 5-17.
Geisler, F. C., Kubiak, T., Siewert, K., & Weber, H. (2013). Cardiac vagal tone is associated
with social engagement and self-regulation. Biological Psychology, 93, 279-286.
Gil, E., Orini, M., Bailón, R., Vergara, J. M., Mainardi, L., & Laguna, P. (2010, August). Time-
varying spectral analysis for comparison of HRV and PPG variability during tilt table
test. In Engineering in Medicine and Biology Society (EMBC), 2010 Annual International
Conference of the IEEE (pp. 3579-3582). IEEE.
Goldstein, D. S., Bentho, O., Park, M. Y., & Sharabi, Y. (2011). Low-frequency power of heart
rate variability is not a measure of cardiac sympathetic tone but may be a measure of
modulation of cardiac autonomic outflows by baroreflexes. Experimental Physiology, 96,
Grossmann, I., Sahdra, B. K., & Ciarrochi, J. (2016). A Heart and a mind: Self-distancing
facilitates the association between heart rate variability and wise reasoning. Frontiers in
Behavioral Neuroscience, 10:68. DOI: 10.3389/fnbeh.2016.00068.
Grossman, P., & Taylor, E. W. (2007). Toward understanding respiratory sinus arrhythmia:
relations to cardiac vagal tone, evolution and biobehavioral functions. Biological
Psychology, 74, 263-285.
Guzzetti, S., Piccaluga, E., Casati, R., Cerutti, S., Lombardi, F., Pagani, M., & Malliani, A.
(1988). Sympathetic predominance in essential hypertension: a study employing spectral
analysis of heart rate variability. Journal of Hypertension, 6, 711-717.
Han, H., & Kim, J. (2012). Artifacts in wearable photoplethysmographs during daily life motions
and their reduction with least mean square based active noise cancellation
method. Computers in Biology and Medicine, 42, 387-393.
Han, H., Kim, M. J., & Kim, J. (2007, August). Development of real-time motion artifact
reduction algorithm for a wearable photoplethysmography. In Engineering in Medicine
and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the
IEEE (pp. 1538-1541). IEEE.
Jabaudon, D., Sztajzel, J., Sievert, K., Landis, T., & Sztajzel, R. (2004). Usefulness of
ambulatory 7-day ECG monitoring for the detection of atrial fibrillation and flutter after
acute stroke and transient ischemic attack. Stroke, 35, 1647-1651.
Jago, J. R., & Murray, A. (1988). Repeatability of peripheral pulse measurements on ears, fingers
and toes using photoelectric plethysmography. Clinical Physics and Physiological
Measurement, 9, 319-330.
Jeyhani, V., Mahdiani, S., Peltokangas, M., & Vehkaoja, A. (2015, August). Comparison of
HRV parameters derived from photoplethysmography and electrocardiography signals.
In Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual
International Conference of the IEEE (pp. 5952-5955). IEEE.
Julu, P. O. O., Cooper, V. L., Hansen, S., & Hainsworth, R. (2003). Cardiovascular regulation in
the period preceding vasovagal syncope in conscious humans. The Journal of
Physiology, 549(1), 299-311.
Kamal, A. A. R., Harness, J. B., Irving, G., & Mearns, A. J. (1989). Skin
photoplethysmography—a review. Computer Methods and Programs in
Biomedicine, 28(4), 257-269.
Kamath, M. V., & Fallen, E. L. (1995). Correction of the heart rate variability signal for ectopics
and missing beats, In: M. Malik & A. J. Camm (Eds.), Heart Rate Variability, (pp. 75-
85). Armonk, NY: Futura Pub. Co. Inc.
Katona, P. G., & Jih, F. (1975). Respiratory sinus arrhythmia: noninvasive measure of
parasympathetic cardiac control. Journal of Applied Physiology, 39, 801-805.
Kim, S. H., Ryoo, D. W., & Bae, C. (2007, August). Adaptive noise cancellation using
accelerometers for the PPG signal from forehead. In Engineering in Medicine and
Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the
IEEE (pp. 2564-2567). IEEE.
Kleiger, R. E., Bigger, J. T., Bosner, M. S., Chung, M. K., Cook, J. R., Rolnitzky, L. M., ... &
Fleiss, J. L. (1991). Stability over time of variables measuring heart rate variability in
normal subjects. The American journal of cardiology, 68, 626-630.
Kok, B. E., & Fredrickson, B. L. (2010). Upward spirals of the heart: Autonomic flexibility, as
indexed by vagal tone, reciprocally and prospectively predicts positive emotions and
social connectedness. Biological psychology, 85, 432-436.
Koudstaal, P. J., van Gijn, J., Klootwijk, A. P. J., Van Der Meche, F. G. A., & Kappelle, L. J.
(1986). Holter monitoring in patients with transient and focal ischemic attacks of the
brain. Stroke, 17(2), 192-195.
Lu, G., & Yang, F. (2009). Limitations of oximetry to measure heart rate variability
measures. Cardiovascular Engineering, 9(3), 119-125.
Lu, G., Yang, F., Taylor, J. A., & Stein, J. F. (2009). A comparison of photoplethysmography
and ECG recording to analyse heart rate variability in healthy subjects. Journal of
Medical Engineering & Technology, 33, 634-641.
Malpas, S. C., Whiteside, E. A., & Maling, T. J. (1991). Heart rate variability and cardiac
autonomic function in men with chronic alcohol dependence. Heart, 65(2), 84-88.
Masi, C. M., Hawkley, L. C., Rickett, E. M., & Cacioppo, J. T. (2007). Respiratory sinus
arrhythmia and diseases of aging: Obesity, diabetes mellitus, and
hypertension. Biological Psychology, 74, 212-223.
McCarthy, C., Pradhan, N., Redpath, C., & Adler, A. (2016). Validation of the Empatica E4
wristband. In Student Conference (ISC), 2016 IEEE EMBS International, pp. 1-4.
Miu, A. C., Heilman, R. M., & Miclea, M. (2009). Reduced heart rate variability and vagal tone
in anxiety: trait versus state, and the effects of autogenic training. Autonomic
Neuroscience, 145(1), 99-103.
Muaremi, A., Arnrich, B., & Tröster, G. (2013). Towards measuring stress with smartphones and
wearable devices during workday and sleep. BioNanoScience, 3(2), 172-183.
Murray, A., Ewing, D. J., Campbell, I. W., Neilson, J. M., & Clarke, B. F. (1975). RR interval
variations in young male diabetics. Heart, 37, 882-885.
Ollander, S., Godin, C., Campagne, A., & Charbonnier, S. (2016, October). A comparison of
wearable and stationary sensors for stress detection. In Systems, Man, and Cybernetics
(SMC), 2016 IEEE International Conference on (pp. 004362-004366). IEEE.
Pagani, M., Montano, N., Porta, A., Malliani, A., Abboud, F. M., Birkett, C., & Somers, V. K.
(1997). Relationship between spectral components of cardiovascular variabilities and
direct measures of muscle sympathetic nerve activity in humans. Circulation, 95, 1441-
Pagani, M., Lombardi, F., Guzzetti, S., Rimoldi, O., Furlan, R., Pizzinelli, P., ... & Malliani, A.
(1986). Power spectral analysis of heart rate and arterial pressure variabilities as a marker
of sympatho-vagal interaction in man and conscious dog. Circulation Research, 59(2),
Paradiso, R., Faetti, T., & Werner, S. (2011, August). Wearable monitoring systems for
psychological and physiological state assessment in a naturalistic environment.
In Engineering in Medicine and Biology Society, EMBC, 2011 Annual International
Conference of the IEEE (pp. 2250-2253). IEEE.
Pope 3rd, C. A., Eatough, D. J., Gold, D. R., Pang, Y., Nielsen, K. R., Nath, P., ... & Kanner, R.
E. (2001). Acute exposure to environmental tobacco smoke and heart rate
variability. Environmental Health Perspectives, 109, 711-716.
Porges, S. W. (1992). Vagal tone: a physiologic marker of stress vulnerability. Pediatrics, 90(3),
Porges, S. W. (2007). The polyvagal perspective. Biological Psychology, 74(2), 116-143.
Preejith, S. P., Alex, A., Joseph, J., & Sivaprakasam, M. (2016, May). Design, development and
clinical validation of a wrist-based optical heart rate monitor. In Medical Measurements
and Applications (MeMeA), 2016 IEEE International Symposium on (pp. 1-6). IEEE.
R Core Team. (2017). R: A language and environment for statistical computing. R Foundation
for Statistical Computing, Vienna, Austria. URL
Rauh, R., Limley, R., Bauer, R. D., Radespiel-Troger, M., & Mueck-Weymann, M. (2004,
August). Comparison of heart rate variability and pulse rate variability detected with
photoplethysmography. In Saratov Fall Meeting 2003: Optical Technologies in
Biophysics and Medicine V (pp. 115-126). International Society for Optics and Photonics.
Renevey, P., Vetter, R., Krauss, J., Celka, P., & Depeursinge, Y. (2001). Wrist-located pulse
detection using IR signals, activity and nonlinear artifact cancellation. In Engineering in
Medicine and Biology Society, 2001. Proceedings of the 23rd Annual International
Conference of the IEEE (Vol. 3, pp. 3030-3033). IEEE.
Romero, I., Berset, T., Buxi, D., Brown, L., Penders, J., Kim, S., ... & Yazicioglu, F. (2011,
October). Motion artifact reduction in ambulatory ECG monitoring: an integrated system
approach. In Proceedings of the 2nd Conference on Wireless Health (p. 11). ACM.
Rottenberg, J. (2007). Cardiac vagal control in depression: a critical analysis. Biological
Psychology, 74(2), 200-211.
Sahdra, B. K., Ciarrochi, J., & Parker, P. D. (2015). High-frequency heart rate variability linked
to affiliation with a new group. PLoS ONE, 10(6), e0129583.
Sandercock, G. R. H., Shelton, C., Bromley, P., & Brodie, D. A. (2004). Agreement between
three commercially available instruments for measuring short-term heart rate
variability. Physiological Measurement, 25, 1115-1124.
Saul, J. P., Albrecht, P., Berger, R. D., & Cohen, R. J. (1987). Analysis of long term heart rate
variability: methods, 1/f scaling and implications. Computers in cardiology, 14, 419-422.
Saul, J. P. (1990). Beat-to-beat variations of heart rate reflect modulation of cardiac autonomic
outflow. Physiology, 5, 32-37.
Schäfer, A., & Vagedes, J. (2013). How accurate is pulse rate variability as an estimate of heart
rate variability?: A review on studies comparing photoplethysmographic technology with
an electrocardiogram. International Journal of Cardiology, 166, 15-29.
Sherwood, A., Allen, M. T., Fahrenberg, J., Kelsey, R. M., Lovallo, W. R., & van Dooren, L.J.P.
(1990). Methodological guidelines for impedance cardiography. Psychophysiology, 27, 1-
Stein, P. K., & Kleiger, R. E. (1999). Insights from the study of heart rate variability. Annual
Review of Medicine, 50, 249-261.
Tamura, T., Maeda, Y., Sekine, M., & Yoshida, M. (2014). Wearable photoplethysmographic
sensors—past and present. Electronics, 3, 282-302.
Tarvainen, M. P., Niskanen, J. P., Lipponen, J. A., Ranta-Aho, P. O., & Karjalainen, P. A.
(2014). Kubios HRV–heart rate variability analysis software. Computer Methods and
Programs in Biomedicine, 113(1), 210-220.
Task Force of the European Society of Cardiology and the North American Society for Pacing
and Electrophysiology (1996). Heart rate variability: standards of measurement,
physiological interpretation, and clinical use. Circulation, 93, 1043-1065.
Thayer, J., Hansen, A., Saus-Rose, E., & Johnson, B. H. (2009). Heart rate variability, prefrontal
neural function, and cognitive performance: The neurovisceral integration perspective on
self-regulation, adaptation, and health. Annals of Behavioral Medicine Publication of the
Society of Behavioral Medicine, 37(2), 141–153.
Thayer, J. F., & Lane, R. D. (2007). The role of vagal function in the risk for cardiovascular
disease and mortality. Biological Psychology, 74, 224-242.
Yousefi, R., Nourani, M., Ostadabbas, S., & Panahi, I. (2014). A motion-tolerant adaptive
algorithm for wearable photoplethysmographic biosensors. IEEE Journal of Biomedical
and Health Informatics, 18, 670-681.
Table 1
Pearson’s correlations between data obtained from E4 and Biopac.
Seated baseline
Dominant grip
Supine baseline
Non-dominant grip
Standing baseline
Note. meanhr = mean heart rate; HFP = high frequency power; LFP = low frequency
power; TP = total power; N = number of participants with data.
* p < .05.
Table 2
Pearson’s correlations between data obtained from E4 and Biopac. Missingness <
Seated baseline
Dominant grip
Supine baseline
Non-dominant grip
Standing baseline
Note. meanhr = mean heart rate; HFP = high frequency power; LFP = low frequency
power; TP = total power; N = number of participants with data.
* p < .05.
Table 3
Pearson’s correlations between data obtained from E4 and Biopac. Missingness <
Seated baseline
Dominant grip
Supine baseline
Non-dominant grip
Standing baseline
Note. meanhr = mean heart rate; HFP = high frequency power; LFP = low frequency
power; TP = total power; N = number of participants with data.
* p < .05.
Figure 1. Schematic diagram of ECG waveform with IBI or R-R intervals.
Background: The incidence of anxiety in adults with spinal cord injury/disorder (SCI/D) exceeds that of the general population. Heart rate variability (HRV) biofeedback training is a potential treatment associated with a reduction in stress and anxiety; however, HRV training has not been explored in the SCI/D population. Objectives: To describe a modified protocol piloting HRV training to reduce anxiety associated with SCI/D and detail the COVID-19–related modifications. Methods: To test the feasibility of the biofeedback treatment, 30 adults with SCI/D will complete this pilot randomized controlled trial. Enrollment started in January 2020, halted in March 2020 due to the COVID-19 pandemic, and resumed in March 2021 with a modified protocol. Protocol modifications are documented using the Framework for Reporting Adaptations and Modifications (FRAME). Participants are allocated to the treatment or control arm and undergo eight sessions of physiological monitoring at home using a commercially available HRV sensor and mobile application, which also delivers biofeedback training for those in the treatment arm. Surveys are administered following each session to capture self-reported stress, anxiety, and mood. The study is approved by the HCA-HealthONE institutional review board and is registered with ClinicalTrials.gov (NCT# 03975075). Conclusion: COVID-19 has changed the research landscape, forcing scientists to rethink their study designs to address patient and staff safety in this new context. Our modified protocol accomplished this by moving the treatment setting and delivery out of the clinic and into the home. In doing so, we address patient and staff safety, increase external validity, and reduce participant burden.
Eating disorders (EDs) and social anxiety disorder (SAD) are characterized by high levels of fear and effectively treated with exposure therapy. Physiological markers of fear can elucidate how exposure influences psychophysiological processes underlying psychopathology. In the current study (N = 109), we measured heart rate variability, heart rate, and electrodermal activity (EDA) with wearable sensors during ED fear, SAD fear, and neutral scripts. Bayesian ridge-regression models tested physiological features during these scripts as predictors of momentary and trait ED and SAD symptoms and determined which physiological features most strongly predicted symptoms. Across models, prediction error was low, which indicates high predictive accuracy. The most salient predictors were EDA features during the neutral script. These findings suggest physiological markers can accurately predict ED and SAD symptoms. This research highlights the utility of wearable sensor technology as a complement to exposure therapy and informs research, assessment, and treatment for anxiety-based disorders.
Post-operative complications and hospital readmission are of great concern to surgical patients and health care providers. Wearable devices such as Fitbit wristbands enable long-term and non-intrusive monitoring of patients outside clinical environments. To build accurate predictive models based on wearable data, however, requires effective feature engineering to extract high-level features from time series data collected by the wearable sensors. This paper presents a pipeline for developing clinical predictive models based on wearable sensors. The core of the pipeline is a multi-level feature engineering framework for extracting high-level features from fine-grained time series data. The framework integrates a set of techniques tailored for noisy and incomplete wearable data collected in real-world clinical studies: (1) singular spectrum analysis for extracting high-level features from daily features over the course of the study; (2) a set of daily features that are resilient to missing data in wearable time series data; (3) a K-Nearest Neighbors (KNN) method for imputing short missing heart rate segments; (4) the integration of patients' clinical characteristics and wearable features. We evaluated the feature engineering approach and machine learning models in a clinical study involving 61 patients undergoing pancreatic surgery. Linear support vector machine (SVM) with integrated feature engineering achieved an AUROC of 0.8802 for predicting post-operative readmission or severe complications, which significantly outperformed the existing rule-based model used in clinical practice and other state-of-the-art feature engineering approaches.
Human physiology is a window to our physical, mental, and emotional states; our well-being. Today, a new wave of objective data, derived from consumer-grade body sensors such as those built into smartwatches, paves the way towards a new approach to measuring well-being continuously and unobtrusively. Here, we developed a framework for collecting and analyzing physiological data using smartwatches in-the-wild, and demonstrated its robustness in data obtained away from controlled laboratory settings. We found that changes in people's heart rate and heart rate variability are predictive of their well-being, but, to a greater extent, at daily level; a finding consistent with theoretical expectations.
Cardiac vagal tone (indexed via resting heart rate variability - HRV) has been previously associated with superior executive functioning. Is HRV related to wiser reasoning and less biased judgments? Here, we hypothesize that this will be the case when adopting a self-distanced (as opposed to a self-immersed) perspective, with self-distancing enabling individuals with higher HRV to overcome bias-promoting egocentric impulses and to reason wisely. However, higher HRV may not be associated with greater wisdom when adopting a self-immersed perspective. Participants were randomly assigned to reflect on societal issues from a self-distanced- or self-immersed perspective, with responses coded for reasoning quality. In a separate task, participants read about and evaluated a person performing morally ambiguous actions, with responses coded for dispositional vs. situational attributions. We simultaneously assessed resting cardiac recordings, obtaining 6 HRV indicators. As hypothesized, in the self-distanced condition, each HRV indicator was positively related to prevalence of wisdom-related reasoning (e.g., prevalence of recognition of limits of one’s knowledge, recognition that the world is in flux/change, consideration of others’ opinions and search for an integration of these opinions) and to balanced vs. biased attributions (recognition of situational and dispositional factors vs. focus on dispositional factors alone). In contrast, there was no relationship between these variables in the self-immersed condition. We discuss implications for research on psychophysiology, cognition, and wisdom.
In 57 normal subjects (age 20-60 years), we analyzed the spontaneous beat-to-beat oscillation in R-R interval during control recumbent position, 90° upright tilt, controlled respiration (n = 16) and acute (n = 10) and chronic (n = 12) β-adrenergic receptor blockade. Automatic computer analysis provided the autoregressive power spectral density, as well as the number and relative power of the individual components. The power spectral density of R-R interval variability contained two major components in power, a high frequency at ~ 0.25 Hz and a low frequency at ~ 0.1 Hz, with a normalized low frequency: high frequency ratio of 3.6 ± 0.7. With tilt, the low-frequency component became largely predominant (90 ± 1%) with a low frequency: high frequency ratio of 21 ± 4. Acute β-adrenergic receptor blockade (0.2 mg/kg IV propranolol) increased variance at rest and markedly blunted the increase in low frequency and low frequency: high frequency ratio induced by tilt. Chronic β-adrenergic receptor blockade (0.6 mg/kg p.o. propranolol, t.i.d.), in addition, reduced low frequency and increased high frequency at rest, while limiting the low frequency: high frequency ratio increase produced by tilt. Controlled respiration produced at rest a marked increase in the high-frequency component, with a reduction of the low-frequency component and of the low frequency: high frequency ratio (0.7 ± 0.1); during tilt, the increase in the low frequency: high frequency ratio (8.3 ± 1.6) was significantly smaller. In seven additional subjects in whom direct high-fidelity arterial pressure was recorded, simultaneous R-R interval and arterial pressure variabilities were examined at rest and during tilt. Also, the power spectral density of arterial pressure variability contained two major components, with a relative low frequency: high frequency ratio at rest of 2.8 ± 0.7, which became 17 ± 5 with tilt. 
These power spectral density components were numerically similar to those observed in R-R variability. Thus, invasive and noninvasive studies provided similar results. More direct information on the role of cardiac sympathetic nerves on R-R and arterial pressure variabilities was derived from a group of experiments in conscious dogs before and after bilateral stellectomy. Under control conditions, high frequency was predominant and low frequency was very small or absent, owing to a predominant vagal tone. During a 9% decrease in arterial pressure obtained with IV nitroglycerin, there was a marked increase in low frequency, as a result of reflex sympathetic activation. Bilateral stellectomy prevented this low-frequency increase in R-R but not in arterial pressure autospectra, indicating that sympathetic nerves to the heart are instrumental in the genesis of low-frequency oscillations in R-R interval.
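The low frequency: high frequency ratio described in this abstract can be illustrated with a minimal sketch. Note the assumptions: the abstract reports autoregressive spectral estimates, whereas this sketch swaps in a plain DFT periodogram for simplicity; the band edges (0.04–0.15 Hz and 0.15–0.40 Hz) are the conventional ones and are not stated in the abstract; the function names `resample_rr` and `band_power` and the synthetic R-R series are hypothetical and serve only as an illustration.

```python
import cmath
import math

def resample_rr(beat_times, rr, fs=4.0):
    """Linearly interpolate an unevenly sampled R-R series onto an even grid (fs Hz)."""
    out, j = [], 0
    n = int((beat_times[-1] - beat_times[0]) * fs)
    for i in range(n):
        t = beat_times[0] + i / fs
        while beat_times[j + 1] < t:      # advance to the beat pair bracketing t
            j += 1
        frac = (t - beat_times[j]) / (beat_times[j + 1] - beat_times[j])
        out.append(rr[j] + frac * (rr[j + 1] - rr[j]))
    return out

def band_power(x, fs, lo, hi):
    """Sum squared DFT magnitudes of the mean-removed signal for lo <= f < hi."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    total = 0.0
    for k in range(1, n // 2):
        f = k * fs / n
        if lo <= f < hi:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(x))
            total += abs(coeff) ** 2
    return total

# Synthetic R-R series (assumption): 0.8 s mean interval with a 0.10 Hz
# oscillation twice the amplitude of a 0.25 Hz oscillation.
beat_times, rr = [], []
t = 0.0
while t < 300.0:
    interval = (0.8 + 0.03 * math.sin(2 * math.pi * 0.10 * t)
                + 0.015 * math.sin(2 * math.pi * 0.25 * t))
    beat_times.append(t)
    rr.append(interval)
    t += interval

tachogram = resample_rr(beat_times, rr)
lf = band_power(tachogram, 4.0, 0.04, 0.15)   # low-frequency band power
hf = band_power(tachogram, 4.0, 0.15, 0.40)   # high-frequency band power
lf_hf = lf / hf
print(f"LF/HF ratio: {lf_hf:.2f}")
```

Because the synthetic low-frequency oscillation is larger than the high-frequency one, the low-frequency band should dominate here. On real recordings, ectopic-beat correction, detrending, and windowing would normally precede this step.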
This study tests the hypothesis that high levels of high-frequency heart rate variability (HF-HRV) predisposes individuals to affiliate with new groups. Resting cardiac physiological recordings were taken before and after experimental sessions to measure trait high-frequency heart rate variability as an index of dispositional autonomic influence on heart rate. Following an experimental manipulation of priming of caring-related words, participants engaged in a minimal group paradigm, in which they imagined being a member of one of two arbitrary groups, allocated money to members of the two groups, and rated their affiliation with the groups. High levels of HF-HRV were associated with ingroup favouritism while allocating money, an effect largely attributable to a positive relationship between HF-HRV and allocation of money to the ingroup, and less due to a negative relationship between HF-HRV and money allocation to the outgroup. HF-HRV was also associated with increased self-reported affiliation feelings for the ingroup but was unrelated to feelings towards the outgroup. These effects remained substantial even after controlling for age, gender, BMI, mood, caffeine consumption, time of day of data collection, smoking and alcohol behaviour, and respiration rate. Further, the effects were observed regardless of whether participants were primed with caring-related words or not. This study is the first to bridge a long history of research on ingroup favouritism to the relatively recent body of research on cardiac vagal tone by uncovering a positive association between HF-HRV and affiliation with a novel group.
The performance of portable and wearable biosensors is highly influenced by motion artifact. In this paper, a novel real-time adaptive algorithm is proposed for accurate motion-tolerant extraction of heart rate (HR) and pulse oximeter oxygen saturation (SpO₂) from wearable photoplethysmographic (PPG) biosensors. The proposed algorithm removes motion artifact due to various sources including tissue effect and venous blood changes during body movements and provides noise-free PPG waveforms for further feature extraction. A two-stage normalized least mean square adaptive noise canceler is designed and validated using a novel synthetic reference signal at each stage. Evaluation of the proposed algorithm is done by Bland–Altman agreement and correlation analyses against reference HR from commercial ECG and SpO₂ sensors during standing, walking, and running at different conditions for single- and multisubject scenarios. Experimental results indicate high agreement and high correlation (more than 0.98 for HR and 0.7 for SpO₂ extraction) between measurements by reference sensors and our algorithm.
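To make the idea of a normalized least mean square (NLMS) adaptive noise canceler concrete, here is a minimal single-stage sketch. It is not the two-stage design or the synthetic reference signal described in the abstract; it assumes a motion reference is already available (for example from an accelerometer), and the function name `nlms_cancel`, the tap count, the step size, and the synthetic signals are all hypothetical choices for illustration.

```python
import math

def nlms_cancel(primary, reference, taps=4, mu=0.02, eps=1e-8):
    """Single-stage NLMS noise canceler.

    Adapts FIR weights so the filtered reference tracks the artifact in
    `primary`; the error signal is returned as the cleaned estimate.
    """
    w = [0.0] * taps
    cleaned = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wi * xi for wi, xi in zip(w, x))       # artifact estimate
        e = primary[n] - y                             # cleaned sample
        norm = eps + sum(xi * xi for xi in x)          # input power normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        cleaned.append(e)
    return cleaned

def mse(a, b, start):
    """Mean squared difference between a and b from index `start` onward."""
    return sum((p - q) ** 2 for p, q in zip(a[start:], b[start:])) / (len(a) - start)

# Synthetic data (assumption): a 1.2 Hz pulse wave contaminated by a scaled,
# delayed copy of a 2.5 Hz motion signal, sampled at 50 Hz.
fs = 50.0
n_samples = 3000
pulse = [math.sin(2 * math.pi * 1.2 * i / fs) for i in range(n_samples)]
motion = [math.sin(2 * math.pi * 2.5 * i / fs) for i in range(n_samples)]
primary = [pulse[i] + 1.5 * (motion[i - 2] if i >= 2 else 0.0)
           for i in range(n_samples)]

cleaned = nlms_cancel(primary, motion)

# Compare errors after the filter has had time to converge.
raw_mse = mse(primary, pulse, 2000)
clean_mse = mse(cleaned, pulse, 2000)
print(f"raw MSE: {raw_mse:.3f}, cleaned MSE: {clean_mse:.3f}")
```

A small step size keeps the canceler from also suppressing the pulse component, at the cost of slower convergence; real PPG data would require tuning both the step size and the number of taps.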
What is most intriguing about heart rate (HR) variability is that there is so much of it. HR is constantly responding both rapidly and slowly to various physiological perturbations. We now understand that the frequency and amplitude of these HR fluctuations are indicative of the autonomic control systems underlying the response.
The Empatica E3 is a wearable wireless multisensor device for real-time computerized biofeedback and data acquisition. The E3 has four embedded sensors: photoplethysmograph (PPG), electrodermal activity (EDA), 3-axis accelerometer, and temperature. It is small, light and comfortable and it is suitable for almost all real-life applications. The E3 operates both in streaming mode for real-time data processing using a Bluetooth low energy interface and in recording mode using its internal flash memory. With E3, it is possible to conduct research outside of the lab by acquiring continuous data for ambulatory situations in a comfortable and non-distracting way.