Full Citation: Oniani, S., Woolley, S.I., Pires, I.M., Garcia, N.M., Collins, T., Ledger, S. and Pandyan, A. "Reliability Assessment of New and Updated Consumer-Grade Activity and Heart Rate Monitors." IARIA Conference on Sensor Device Technologies and Applications, Venice, SENSORDEVICES 2018
Reliability Assessment of New and Updated Consumer-Grade Activity and
Heart Rate Monitors
Salome Oniani
Faculty of Informatics and Control Systems
Georgian Technical University
Tbilisi, Georgia
Sandra I. Woolley
School of Computing and Mathematics
Keele University
Staffordshire, UK
Ivan Miguel Pires and Nuno M. Garcia
Instituto de Telecomunicações,
Universidade da Beira Interior
Covilhã, Portugal
Tim Collins
School of Engineering
Manchester Metropolitan University
Manchester, UK
Sean Ledger and Anand Pandyan
School of Health and Rehabilitation
Keele University
Staffordshire, UK
Abstract - The aim of this paper is to address the need for
reliability assessments of new and updated consumer-grade
activity and heart rate monitoring devices. This issue is central
to the use of these sensor devices and is particularly
important in their medical and assisted living applications.
Using an example lightweight empirical approach,
experimental results for heart rate acquisitions from Garmin
VivoSmart 3 (v4.10) smartwatch monitors are presented and
analyzed. The reliability issues of optically-acquired heart
rates, especially during periods of activity, are demonstrated
and discussed. In conclusion, the paper recommends the
empirical assessment of new and updated activity monitors, the
sharing of this data and the use of version information across
the literature.
Keywords- wearable sensing; activity monitoring; ambulatory
heart rate; inter-instrument reliability.
Consumer-grade wearable monitoring devices are used
across a spectrum of health, well-being and behavioral
studies as well as clinical trials. For example, the U.S.
National Library of Medicine clinical trials database reports
nearly 200 trials, with statuses from “Completed” to “Not yet
recruiting”, involving Fitbit devices (search accessed 01/05/2018).
However, the manufacturers of these devices are generally
very clear regarding the intended applications and suitability
of their devices, and do not make misleading clinical claims.
For example, the Garmin Vivosmart “Important Safety and
Product Information” [1] advises that the device is “for
recreational purposes and not for medical purposes” and
that “inherent limitations” may “cause some heart rate
readings to be inaccurate”. Similarly, the Fitbit
“Important Safety and Product Information” declares that the
device is “not a medical device” and that the “accuracy of Fitbit
devices is not intended to match medical devices or scientific
measurement devices” [2]. Given that these devices are
being used in clinical applications, and with future clinical
applications anticipated [3], it is important that device
reliability is assessed.
In terms of meeting user expectations, it is noteworthy
that, at the time of writing, Fitbit’s motion to dismiss a class
action has been denied. The complaint alleged “gross
inaccuracies and recording failures” [4] because “products
frequently fail to record any heart rate at all or provide
highly inaccurate readings, with discrepancies of up to 75
bpm” [5]. Indeed, ambulatory heart rate acquisition from
optical sensors is known to be very challenging [6]. One of
the main challenges is the range of severe interference
effects caused by movement [7, 8]. Optical heart rate signals
can also be affected by skin color [9] and aging [10]. Yet,
optical heart rate acquisition remains a desirable alternative
to chest strap electrocardiogram (ECG) monitoring in
consumer-level activity monitors, where comfort,
ease-of-use and low cost are prioritized.
After selection of an activity monitor model based on
recorded parameters, study requirements and deployment
needs [11], the calibration and validation of wearable
monitors [12, 13] can be onerous. Best practice requires a
substantial time and resource investment for researchers to
calibrate and validate sufficiently large numbers of their
devices with a large and diverse cohort of representative
users performing a range of anticipated activities. At the
same time, commercial monitors can frequently and
automatically update both software and firmware that can
alter device function, data collection and data reporting,
potentially compromising previous validation. But, of
course, manufacturers are under no obligation to report the
detail of their proprietary algorithms or the specifics of
version changes.
Devices that have the same model name, but operate with
different software and firmware versions, are distinct
devices; they should not be treated as identical devices.
Ideally, devices would be clearly differentiated in the
literature with manufacturer, model and version data.
While there may be limited (if any) opportunity for
researchers to reversion commercial device software to
repeat published experiments, the provision of version
information would, at least, limit the potential for incorrect
aggregations of data for devices that operate with different
software and firmware versions.
A number of studies have reported on the validity of
different monitoring device models. For example, Fokkema
et al. [14] reported on the step count validity and reliability
of ten different activity trackers. Thirty-one healthy
participants performed 30-minute treadmill walking
activities while wearing ten activity trackers. The research
concluded that, in general, consumer activity trackers
perform better at an average (4.8 km/h) and vigorous
(6.4 km/h) walking speed than at slower walking speeds.
In another study, Wahl et al. [15] evaluated the validity
of eleven wearable monitoring devices for step count,
distance and energy expenditure (EE) with participants
walking and running at different speeds. The study reported
results with the commonly used metrics: Mean Absolute
Percentage Error (MAPE) and IntraClass Correlation (ICC)
showing that most devices, except Bodymedia Sensewear,
Polar Loop, and Beurer AS80 models, had good validity
(low MAPE, high ICC) for step count. However, for
distance, all devices had low ICC (<0.1) and high MAPE (up
to 50%), indicating poor validity. The measurement of EE
was acceptable for Garmin, Fitbit and Withings devices
(comprising Garmin Vivofit; Garmin Vivosmart; Garmin
Vivoactive; Garmin Forerunner 920XT; Fitbit Charge; Fitbit
Charge HR; Withings Pulse Ox Hip; Withings Pulse Ox
Wrist) which had low-to-moderate MAPEs. The Bodymedia
Sensewear, Polar Loop, and Beurer AS80 devices had high
MAPEs (up to 56%) for all test conditions.
There is a growing number of similar studies that
compare different recordings from different models of
consumer activity monitors. However, across this literature,
and in reviews of this literature [16], it is common practice to
provide version data for the software used for statistical
analyses of device performance, but it is not common
practice to report version information for the devices
themselves. As an example of device ambiguity, a reference
to Garmin Vivosmart could refer to either Garmin
Vivosmart 3 or Garmin Vivosmart HR. The date of a given
publication might help disambiguate the model variant but
will not help identify the version. The Vivosmart HR had 14
versions from 2.10 to 4.30 over approximately 30 months
(each update comprising between 1 and 11 items, such as
“improved calculation of Intensity Minutes” and “Various
other improvements”) [17]. At the time of writing, the
Garmin Vivosmart 3 (v4.10) is the latest of 9 versions.
Four Garmin Vivosmart 3 smartwatches (all versioned
SW v4.10 throughout the data acquisitions during May 2018)
were worn, as shown in Figure 1, by four healthy researcher
participants, P1-P4 outlined in Table I, during the treadmill
walking activities summarized in Table II. The walking
speeds: slow, moderate, fast and vigorous, were selected
based on reports in the literature [18, 19] and were
performed on an h/p/cosmos Pulsar treadmill. To support
reproducibility [20], we report further details about materials
in the appendix.
TABLE I. Participant characteristics: age (yrs), gender, weight (kg) and BMI for P1-P4.
TABLE II. Treadmill walking activities: 20 minutes at each of the four walking speeds (2.4 km/h, 4.8 km/h, 6.4 km/h and vigorous).
All participants reported regularly partaking in brisk-to-
intensive exercise outside largely sedentary
academic/working roles. Participant 1 was ambidextrous. All
other participants were right-handed. (Ethical approval for
“Health Technology Assessment and Data Analytics”
(ERP2329) was obtained from Keele University.)
Figure 1. Activity monitor positions (color-coded for reference).
The slow walking activity was prefaced by two minutes
of standing with arms down. Pulse readings were taken from
a Polar H10 chest strap ECG monitor at 1-minute intervals
throughout the activity.
Data (from the logged Garmin .FIT files) were
downloaded from the watches after each activity,
converted into .CSV format and imported into Excel. Dates
and times were converted from the 16- and 32-bit Garmin
timestamps used in the .FIT files [21] into standard Excel
date-time serial numbers.
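A minimal Python sketch of this timestamp conversion, assuming the standard FIT epoch (00:00 UTC, 31 December 1989) and the Excel 1900 date system origin (30 December 1899); the function names and the 16-bit rollover handling shown are illustrative assumptions, not Garmin code:

```python
from datetime import datetime, timezone

FIT_EPOCH = datetime(1989, 12, 31, tzinfo=timezone.utc)    # FIT timestamps count seconds from here
EXCEL_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)  # Excel serial day 0 (1900 date system)

def fit32_to_excel(ts32):
    """Convert a 32-bit FIT timestamp (seconds since the FIT epoch)
    to an Excel date-time serial number (days since the Excel origin)."""
    seconds = (FIT_EPOCH - EXCEL_EPOCH).total_seconds() + ts32
    return seconds / 86400.0

def fit16_to_fit32(ts16, last_ts32):
    """Expand a 16-bit compressed FIT timestamp: it carries the low 16
    bits of the full timestamp and is assumed to have rolled over at
    most once since the last full 32-bit timestamp, last_ts32."""
    delta = (ts16 - (last_ts32 & 0xFFFF)) & 0xFFFF
    return last_ts32 + delta

print(fit32_to_excel(0))  # 32873.0, i.e., 31 Dec 1989 expressed in Excel serial days
```

Passing the resulting serial numbers to Excel and formatting the cells as date-times recovers human-readable acquisition times.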
Mean Absolute Percentage Error (MAPE) and the
IntraClass Correlation (ICC) [22] were used to compare the
heart rate recordings from the watches with the baseline
ECG device. Step counts were also acquired and analyzed
but, due to limitations of space, are not reported here.
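The calculations behind these metrics can be sketched in a few lines of Python. This is an illustrative example, not the study's analysis code; it assumes the two-way random-effects, absolute-agreement, single-measures form ICC(A,1) from McGraw and Wong [22] as one choice among the ICC variants, and the readings shown are hypothetical:

```python
import numpy as np

def mape(estimate, reference):
    """Mean Absolute Percentage Error (%) of device estimates vs. a reference."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * np.mean(np.abs(estimate - reference) / reference)

def icc_a1(X):
    """ICC(A,1): two-way random-effects, absolute-agreement, single-measures
    intraclass correlation for an (n measurements x k devices) matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    grand = X.mean()
    ssr = k * np.sum((X.mean(axis=1) - grand) ** 2)   # between-measurement SS
    ssc = n * np.sum((X.mean(axis=0) - grand) ** 2)   # between-device SS
    sse = np.sum((X - grand) ** 2) - ssr - ssc        # residual SS
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical example: watch readings vs. ECG reference (bpm)
ecg   = [72, 95, 118, 140]
watch = [70, 99, 110, 151]
print(mape(watch, ecg))                        # percentage error vs. reference
print(icc_a1(np.column_stack([watch, ecg])))   # agreement between the two columns
```

Because ICC(A,1) measures absolute agreement, a device with a constant offset from the ECG reference scores below 1 even when the two signals track each other perfectly.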
Figure 2 shows the heart rate recordings for P1-P4 from
the treadmill walking activities. Variability in recorded
values can be seen at both slower and faster walking speeds
and, notably, differs between participants. For analysis of the
acquired data we calculated the MAPE (compared with the
ECG chest strap reference) and ICC values listed in Table
III. As shown, treadmill acquisitions for participants P2 and
P3 produced higher MAPEs (including MAPEs over 10%:
the level often taken as the upper bound of “acceptable”
errors) and lower ICCs. This could, in part, be attributed to
the increased age of participants P2 and P3 compared to P1
and P4. As shown in Figure 2, for P2 there were some
abnormally low but sparse heart rate recordings from the
“blue” device and, to a lesser extent, the “red” device. For
P3, the bluedevice recorded decreasing heart rates when
the actual heart rate increased during the vigorous walking
activity. This produced a near zero ICC.
The devices were also worn by participants for 12-hour
periods during uncontrolled everyday activities. The
recorded heart rates are shown in Figure 3. Intraclass
correlations and confidence intervals for treadmill walking
and 12-hr use are plotted, respectively, in Figures 4 and 5. As
anticipated, these indicated poor performance during the
treadmill activity. However, as shown in Figure 5, the
devices performed more consistently during the prolonged
acquisitions of activities of everyday living, when activity
levels were generally lower on average.
Figure 2. Heart rate recordings acquired during treadmill walking activities.
Figure 3. Heart rate recordings acquired during 12-hr everyday living.
Figure 4. ICC for each device compared with ECG chest strap baseline recordings with 90% confidence intervals for treadmill activities.
Figure 5. Inter-instrument ICC values for 12-hrs everyday living.
The lightweight assessment approach exemplified here is
not, and could not be, prescriptive. A useful approach must
incorporate participants and activities that have relevance to
the intended study, otherwise it would have little value. It is
also important to ensure that the duration of activities is
sufficient for devices to record enough data. We established
20-minute durations empirically for each treadmill walking
speed by monitoring the frequency of logged readings and
expanding the window to ensure several readings would be
logged for each speed. For other devices where, for example,
per-minute records are available, the activity duration could
be reduced.
Of course, a comprehensive reliability assessment would
be preferable to the approach outlined here. Similarly, this
lightweight empirical approach is preferable to no
assessment at all or reliance on outdated, irrelevant or
unreproducible reports in the literature. The presented
approach has several limitations: intentionally, there was a
small number of participants and a limited sample of
unrepeated activities, and there were no reference
recordings for the 12-hr everyday activity. (Reference
readings from finger-worn pulse oximeters were attempted,
but the devices repeatedly failed to maintain accurate
readings.) However, with just four participants and two
activity acquisitions, we were able to quickly and simply
obtain an insight into the reliability of the devices at their
current version, have an appreciation of their limitations and,
also, a degree of confidence regarding their potential for
study acquisitions.
There is much scope for further work to improve
reproducibility across the activity monitoring domain and to
assist researchers in evaluating and re-evaluating new and
updated devices. We have demonstrated an example
lightweight empirical approach to device assessment that is
not onerous and could easily be repeated as and when
devices are updated.
Despite the challenges of reliably acquiring optical heart
rate at the wrist during activity, we might hope that
future and updated consumer devices would i) be better at
identifying erroneous values and avoid reporting them and ii)
be better at correctly estimating values. However, it would
be unwise to assume every device upgrade will necessarily
result in improved device performance in all aspects.
The U.S. Food and Drug Administration has established
a new “Digital Health Software Precertification (Pre-Cert)
Program” [23] that aspires toward a more agile approach to
digital health technology regulation. It recognizes the
iterative characteristics of new consumer devices [24]. In
addition, the Consumer Technology Association recently
defined CTA-2065, a new protocol to test and validate the
accuracy of heart rate monitoring devices under the
conditions of everyday living from dynamic indoor cycling
to sedentary lifestyles. We recommend that there is also
some means to enable and encourage the sharing of version-
by-version device reliability assessment data between
manufacturer/s, users and researchers.
In a systematic review of consumer-wearable activity
trackers, Evenson et al. [16] recommend that “future studies
on the measurement properties of the trackers should be sure
to initialize the tracker properly and indicate in the
publication how this was done so others can replicate the
process. Providing the specific tracker type, date purchased,
and date tested would also be important.” We additionally
recommend that full device details, including software and
firmware versions, are reported in the literature.
The authors wish to thank Professor Fiona Polack,
Software and Systems Engineering Research, Keele
University for her valuable input and support in resourcing
this work. The authors also thank Professor Barbara
Kitchenham for her advice on protocol design and statistics.
The authors also wish to acknowledge contributions from
FCT project UID/EEA/50008/2013 and COST Actions
IC1303 (AAPELE: Architectures, Algorithms and Protocols
for Enhanced Living Environments) and CA16226 (Indoor
living space improvement: Smart Habitat for the Elderly).
Further material details were as follows:
Garmin Vivosmart 3 software/firmware versions:
SW: v4.10; TSC: v1.10; SNS: v5.90. Devices were
initialized according to the arm worn and all data was taken
directly from logged .FIT files. Devices were purchased on
9th March 2018 and acquisitions made during May 2018.
Their serial numbers were as follows: Black 560185378,
Red 560185383, Blue 560640435, Green 560639717.
The treadmill was an h/p/cosmos Pulsar treadmill,
h/p/cosmos Sports & Medical GmbH, Nussdorf-Traunstein,
Germany (cos100420b; ID: X239W80479043; OP19: 0319).
The reference device was a Polar H10 chest heart rate
monitor (FCC ID: INW1W; Model: 1W; IC: 6248A-1W;
SN: C7301W0726005; ID: 14C00425; Firmware: 2.1.9),
with data acquired via Polar Beat 2.5.3.
[1] Garmin, “Important safety and product information,” instruction
leaflet supplied with Vivosmart 3 (v4.10), 2016, l90-02068-
[2] Fitbit, “Important safety and product information,” last updated
March 20, 2017. [Online]. Available from: 2018.06.02
[3] M. M. Baig, H. GholamHosseini, A. A. Moqeem, F. Mirza and M.
Lindén, “A systematic review of wearable patient monitoring
systems – current challenges and opportunities for clinical adoption,”
Journal of Medical Systems, vol. 41(7): 115, pp. 1-9, 2017.
[4] Business Wire, “Federal court denies Fitbit's motion to dismiss class
action lawsuit alleging gross inaccuracies and recording failures in
PurePulse™ heart rate monitors,” June 05, 2018. [Online]. Available
ral-Court-Denies-Fitbits-Motion-Dismiss-Class 2018.06.02
[5] Lieff Cabraser Civil Justice Blog, June 5, 2018. [Online]. Available
heart-rate-monitors/ 2018.06.07
[6] M. Lang, “Beyond Fitbit: A critical appraisal of optical heart rate
monitoring wearables and apps, their current limitations and legal
implications,” Albany Law Journal of Science & Technology, vol. 28(1),
pp. 39-72, 2017.
[7] Z. Zhang, “Heart rate monitoring from wrist-type
photoplethysmographic (PPG) signals during intensive physical
exercise,” in Proc. IEEE Global Conference on Signal and
Information Processing (GlobalSIP), pp. 698-702, December 2014.
[8] Z. Zhang, Z. Pi and B. Liu, “TROIKA: A general framework for
heart rate monitoring using wrist-type photoplethysmographic signals
during intensive physical exercise,” IEEE Transactions on
Biomedical Engineering, vol. 62(2), pp. 522-531, 2015.
[9] W. T. Cecil, K. J. Thorpe, E. E. Fibuch and G. F. Tuohy, “A clinical
evaluation of the accuracy of the Nellcor N-100 and Ohmeda 3700
pulse oximeters,” Journal of Clinical Monitoring, vol. 4(1), pp. 31-36,
[10] K. S. Hong, K. T. Park and J. M. Ahn, “Aging index using
photoplethysmography for a healthcare device: comparison with
brachial-ankle pulse wave velocity,” Healthcare Informatics
Research, vol. 21(1), pp. 30-34, 2015.
[11] T. Collins, S. Aldred, S. I. Woolley and S. Rai, “Addressing the
deployment challenges of health monitoring devices for a dementia
study,” in Proceedings of the 5th EAI International Conference on
Wireless Mobile Communication and Healthcare, pp. 202-205, 2015.
[12] D. R. Bassett Jr., A. V. Rowlands and S. G. Trost, “Calibration and
validation of wearable monitors,” Medicine and Science in Sports and
Exercise, vol. 44(1 Suppl 1), p. S32, 2012.
[13] P. Freedson, H. R. Bowles, R. Troiano and W. Haskell, “Assessment
of physical activity using wearable monitors: recommendations for
monitor calibration and use in the field,” Medicine and Science in
Sports and Exercise, vol. 44(1 Suppl 1): S1-S4, pp. 1-6, 2012.
[14] T. Fokkema, T. J. Kooiman, W. P. Krijnen, C. P. Van Der Schans
and M. De Groot, “Reliability and validity of ten consumer activity
trackers depend on walking speed,” Medicine and Science in Sports
and Exercise, vol. 49(4), pp. 793-800, 2017.
[15] Y. Wahl, P. Düking, A. Droszez, P. Wahl and J. Mester, “Criterion-
validity of commercially available physical activity tracker to
estimate step count, covered distance and energy expenditure during
sports conditions,” Frontiers in Physiology, vol. 8:725, pp. 1-12,
[16] K. R. Evenson, M. M. Goto and R. D. Furberg, “Systematic review of
the validity and reliability of consumer-wearable activity
trackers,” International Journal of Behavioral Nutrition and Physical
Activity, vol. 12(1):159, pp. 1-22, 2015.
[17] Garmin, “Updates & Downloads: vivosmart HR software –
version 4.30,” as of March 7, 2018. [Online]. Available from:
[18] P. M. Grant, P. M. Dall, S. I. Mitchell and M. H. Granat, “Activity-
monitor accuracy in measuring step number and cadence in
community-dwelling older adults,” Journal of Aging and Physical
Activity, vol. 16(2), pp. 201-214, 2008.
[19] J. Takacs, C. L. Pollock, J. R. Guenther, M. Bahar, C. Napier and
M. A. Hunt, “Validation of the Fitbit One activity monitor device
during treadmill walking,” Journal of Science and Medicine in
Sport, vol. 17(5), pp. 496-500, 2014.
[20] S. Krishnamurthi and J. Vitek, “The real software crisis: Repeatability
as a core value,” Communications of the ACM, vol. 58(3), pp. 34-36,
[21] Garmin, “FIT Software Development Kit (version 20.56.00),” 2018.
[Online]. Available from:
[22] K. O. McGraw and S. P. Wong, “Forming inferences about some
intraclass correlation coefficients,” Psychological Methods, vol. 1(1),
pp. 30-46, 1996.
[23] U.S. Food and Drug Administration, “Digital health software
precertification (pre-cert) program.” [Online]. Available from:
CertProgram/ucm567265.htm 2018.06.13
[24] “CTA announces standard to improve heart rate monitoring in
wearables,” May 2, 2018. [Online]. Available from:
Announces-Standard-to-Improve-Heart-Rate-Monit.aspx 2018.06.1
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
Fitness and health-care-oriented wearables and apps have been around for a couple of years and are still gaining momentum. Over time, they have begun to harness considerable computational power and to incorporate increasingly sophisticated sensors, eventually resulting in a blurring of the lines between consumer electronics and medical devices. While their benefits and potentials are undisputed, the overly optimistic appraisal commonly encountered in both mass media and academic literature does not adequately reflect unsolved problems and inherent limitations of these devices. This Article will argue that while these issues have long been known to the engineering community, their relevance and legal implications appear to have been grossly underestimated. January 2016 marked a turning point, as news of two class-action lawsuits filed against major manufacturer Fitbit brought widespread attention to accuracy, reliability, and safety concerns regarding these devices. This Article will provide a concise overview of optical heart rate monitoring technology, the current state of the art, and research trends. It will be argued that under real-world scenarios these apps and devices are currently inherently inaccurate and unreliable, with even greater problems on the horizon as the industry shifts towards areas such as heart rate variability monitoring or the detection of cardiac arrhythmias. Available at
Full-text available
Background: In the past years, there was an increasing development of physical activity tracker (Wearables). For recreational people, testing of these devices under walking or light jogging conditions might be sufficient. For (elite) athletes, however, scientific trustworthiness needs to be given for a broad spectrum of velocities or even fast changes in velocities reflecting the demands of the sport. Therefore, the aim was to evaluate the validity of eleven Wearables for monitoring step count, covered distance and energy expenditure (EE) under laboratory conditions with different constant and varying velocities. Methods: Twenty healthy sport students (10 men, 10 women) performed a running protocol consisting of four 5 min stages of different constant velocities (4.3; 7.2; 10.1; 13.0 km·h⁻¹), a 5 min period of intermittent velocity, and a 2.4 km outdoor run (10.1 km·h⁻¹) while wearing eleven different Wearables (Bodymedia Sensewear, Beurer AS 80, Polar Loop, Garmin Vivofit, Garmin Vivosmart, Garmin Vivoactive, Garmin Forerunner 920XT, Fitbit Charge, Fitbit Charge HR, Xaomi MiBand, Withings Pulse Ox). Step count, covered distance, and EE were evaluated by comparing each Wearable with a criterion method (Optogait system and manual counting for step count, treadmill for covered distance and indirect calorimetry for EE). Results: All Wearables, except Bodymedia Sensewear, Polar Loop, and Beurer AS80, revealed good validity (small MAPE, good ICC) for all constant and varying velocities for monitoring step count. For covered distance, all Wearables showed a very low ICC (<0.1) and high MAPE (up to 50%), revealing no good validity. The measurement of EE was acceptable for the Garmin, Fitbit and Withings Wearables (small to moderate MAPE), while Bodymedia Sensewear, Polar Loop, and Beurer AS80 showed a high MAPE up to 56% for all test conditions. 
Conclusion: In our study, most Wearables provide an acceptable level of validity for step counts at different constant and intermittent running velocities reflecting sports conditions. However, the covered distance, as well as the EE could not be assessed validly with the investigated Wearables. Consequently, covered distance and EE should not be monitored with the presented Wearables, in sport specific conditions.
Full-text available
The aim of this review is to investigate barriers and challenges of wearable patient monitoring (WPM) solutions adopted by clinicians in acute, as well as in community, care settings. Currently, healthcare providers are coping with ever-growing healthcare challenges including an ageing population, chronic diseases, the cost of hospitalization, and the risk of medical errors. WPM systems are a potential solution for addressing some of these challenges by enabling advanced sensors, wearable technology, and secure and effective communication platforms between the clinicians and patients. A total of 791 articles were screened and 20 were selected for this review. The most common publication venue was conference proceedings (13, 54%). This review only considered recent studies published between 2015 and 2017. The identified studies involved chronic conditions (6, 30%), rehabilitation (7, 35%), cardiovascular diseases (4, 20%), falls (2, 10%) and mental health (1, 5%). Most studies focussed on the system aspects of WPM solutions including advanced sensors, wireless data collection, communication platform and clinical usability based on a specific area or disease. The current studies are progressing with localized sensor-software integration to solve a specific use-case/health area using non-scalable and ‘silo’ solutions. There is further work required regarding interoperability and clinical acceptance challenges. The advancement of wearable technology and possibilities of using machine learning and artificial intelligence in healthcare is a concept that has been investigated by many studies. We believe future patient monitoring and medical treatments will build upon efficient and affordable solutions of wearable technology.
Conference Paper
Full-text available
This paper presents the findings of a technological adoption assessment of health monitoring devices for a dementia study. The work was motivated by the need to monitor physical activity interventions in a study cohort of dementia patients living with caregiver support in the community. The system requirements were for a discrete and unobtrusive solution with activity level (energy expenditure) and heart rate monitoring. In addition to fulfilling system requirements, successful technology adoption requires careful consideration of practical challenges in deployment. The paper addresses these challenges - in particular, aspects relating to the servicing and maintenance of units over the study period and the access and synchronisation of data. Test data visualisations and data mining results for sustained, long-term data capture are provided.
Full-text available
Background: Consumer-wearable activity trackers are electronic devices used for monitoring fitness- and other health-related metrics. The purpose of this systematic review was to summarize the evidence for validity and reliability of popular consumer-wearable activity trackers (Fitbit and Jawbone) and their ability to estimate steps, distance, physical activity, energy expenditure, and sleep. Methods: Searches included only full-length English language studies published in PubMed, Embase, SPORTDiscus, and Google Scholar through July 31, 2015. Two people reviewed and abstracted each included study. Results: In total, 22 studies were included in the review (20 on adults, 2 on youth). For laboratory-based studies using step counting or accelerometer steps, the correlation with tracker-assessed steps was high for both Fitbit and Jawbone (Pearson or intraclass correlation coefficients (CC) > =0.80). Only one study assessed distance for the Fitbit, finding an over-estimate at slower speeds and under-estimate at faster speeds. Two field-based studies compared accelerometry-assessed physical activity to the trackers, with one study finding higher correlation (Spearman CC 0.86, Fitbit) while another study found a wide range in correlation (intraclass CC 0.36-0.70, Fitbit and Jawbone). Using several different comparison measures (indirect and direct calorimetry, accelerometry, self-report), energy expenditure was more often under-estimated by either tracker. Total sleep time and sleep efficiency were over-estimated and wake after sleep onset was under-estimated comparing metrics from polysomnography to either tracker using a normal mode setting. No studies of intradevice reliability were found. Interdevice reliability was reported on seven studies using the Fitbit, but none for the Jawbone. 
Walking- and running-based Fitbit trials indicated consistently high interdevice reliability for steps (Pearson and intraclass CC 0.76-1.00), distance (intraclass CC 0.90-0.99), and energy expenditure (Pearson and intraclass CC 0.71-0.97). When wearing two Fitbits while sleeping, consistency between the devices was high. Conclusion: This systematic review indicated higher validity of steps, few studies on distance and physical activity, and lower validity for energy expenditure and sleep. The evidence reviewed indicated high interdevice reliability for steps, distance, energy expenditure, and sleep for certain Fitbit models. As new activity trackers and features are introduced to the market, documentation of the measurement properties can guide their use in research settings.
Full-text available
Recent studies have emphasized the potential information embedded in peripheral fingertip photoplethysmogram (PPG) signals for the assessment of arterial wall stiffening during aging. For the discrimination of arterial stiffness with age, the brachial-ankle pulse wave velocity (baPWV) has been widely used in clinical applications. The second derivative of the PPG (acceleration photoplethysmogram [APG]) has been reported to correlate with the presence of atherosclerotic disorders. In this study, we investigated the association among age, the baPWV, and the APG and found a new aging index reflecting arterial stiffness for a healthcare device. The APG and the baPWV were simultaneously applied to assess the accuracy of the APG in measuring arterial stiffness in association with age. A preamplifier and motion artifact removal algorithm were newly developed to obtain a high quality PPG signal. In total, 168 subjects with a mean ± SD age of 58.1 ± 12.6 years were followed for two months to obtain a set of complete data using baPWV and APG analysis. The baPWV and the B ratio of the APG indices were correlated significantly with age (r = 0.6685, p < 0.0001 and r = -0.4025, p < 0.0001, respectively). A regression analysis revealed that the c and d peaks were independent of age (r = -0.3553, p < 0.0001 and r = -0.3191, p < 0.0001, respectively). We determined the B ratio, which represents an improved aging index and suggest that the APG may provide qualitatively similar information for arterial stiffness.
Conference Paper
Full-text available
Heart rate monitoring from wrist-type photoplethys-mographic (PPG) signals during subjects' intensive exercise is a difficult problem, since the PPG signals are contaminated by extremely strong motion artifacts caused by subjects' hand movements. In this work, we formulate the heart rate estimation problem as a sparse signal recovery problem, and use a sparse signal recovery algorithm to calculate high-resolution power spectra of PPG signals, from which heart rates are estimated by selecting corresponding spectrum peaks. To facilitate the use of sparse signal recovery, we propose using bandpass filtering, singular spectrum analysis, and temporal difference operation to partially remove motion artifacts and sparsify PPG spectra. The proposed method was tested on PPG recordings from 10 subjects who were fast running at the peak speed of 15km/hour. The results showed that the averaged absolute estimation error was only 2.56 Beats/Minute, or 1.94% error compared to ground-truth heart rates from simultaneously recorded ECG.
Heart rate monitoring using wrist-type photoplethysmographic (PPG) signals during subjects' intensive exercise is a difficult problem, since the signals are contaminated by extremely strong motion artifacts caused by subjects' hand movements. So far, few works have studied this problem. In this work, a general framework, termed TROIKA, is proposed, consisting of signal decomposiTion for denoising, sparse signal RecOnstructIon for high-resolution spectrum estimation, and spectral peaK trAcking with verification. The TROIKA framework has high estimation accuracy and is robust to strong motion artifacts. Many variants can be straightforwardly derived from this framework. Experimental results on datasets recorded from 12 subjects during fast running at a peak speed of 15 km/h showed that the average absolute error of heart rate estimation was 2.34 beats per minute (BPM), and the Pearson correlation between the estimates and the ground truth of heart rate was 0.992. This framework is of great value to wearable devices, such as smartwatches, that use PPG signals to monitor heart rate for fitness.
Purpose: To examine the test-retest reliability and validity of ten activity trackers for step counting at three different walking speeds. Methods: Thirty-one healthy participants walked twice on a treadmill for 30 min while wearing 10 activity trackers (Polar Loop, Garmin Vivosmart, Fitbit Charge HR, Apple Watch Sport, Pebble Smartwatch, Samsung Gear S, Misfit Flash, Jawbone Up Move, Flyfit, and Moves). Participants walked three walking speeds for 10 min each; slow (3.2 km·h⁻¹), average (4.8 km·h⁻¹), and vigorous (6.4 km·h⁻¹). To measure test-retest reliability, intraclass correlations (ICC) were determined between the first and second treadmill test. Validity was determined by comparing the trackers with the gold standard (hand counting), using mean differences, mean absolute percentage errors, and ICC. Statistical differences were calculated by paired-sample t tests, Wilcoxon signed-rank tests, and by constructing Bland-Altman plots. Results: Test-retest reliability varied with ICC ranging from -0.02 to 0.97. Validity varied between trackers and different walking speeds with mean differences between the gold standard and activity trackers ranging from 0.0 to 26.4%. Most trackers showed relatively low ICC and broad limits of agreement of the Bland-Altman plots at the different speeds. For the slow walking speed, the Garmin Vivosmart and Fitbit Charge HR showed the most accurate results. The Garmin Vivosmart and Apple Watch Sport demonstrated the best accuracy at an average walking speed. For vigorous walking, the Apple Watch Sport, Pebble Smartwatch, and Samsung Gear S exhibited the most accurate results. Conclusion: Test-retest reliability and validity of activity trackers depends on walking speed. In general, consumer activity trackers perform better at an average and vigorous walking speed than at a slower walking speed.
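The agreement statistics this abstract relies on, mean difference (bias), mean absolute percentage error, and Bland-Altman 95% limits of agreement, can be computed in a few lines. The step counts below are hypothetical illustrative values, not data from the study.

```python
import numpy as np

def agreement_stats(tracker, gold):
    """Bias (mean difference), mean absolute percentage error (MAPE),
    and Bland-Altman 95% limits of agreement between tracker readings
    and gold-standard (hand-counted) step counts."""
    tracker = np.asarray(tracker, dtype=float)
    gold = np.asarray(gold, dtype=float)
    diff = tracker - gold
    bias = diff.mean()
    mape = 100.0 * np.mean(np.abs(diff) / gold)
    sd = diff.std(ddof=1)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)   # 95% limits of agreement
    return bias, mape, loa

# Hypothetical step counts for four trials (illustrative only)
gold = [1000, 1050, 980, 1020]
tracker = [990, 1060, 950, 1015]
bias, mape, loa = agreement_stats(tracker, gold)
```

Narrow limits of agreement centred near zero bias indicate good validity; the wide limits reported in the abstract are what "broad limits of agreement of the Bland-Altman plots" refers to.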