Accuracy of Commercially Available Smartwatches in Assessing
Energy Expenditure During Rest and Exercise
Zachary C. Pope
University of Minnesota
Nan Zeng
Colorado State University
Xianxiong Li and Wenfeng Liu
Hunan Normal University
Zan Gao
University of Minnesota
Background: This study examined the accuracy of the Microsoft Band (MB), Fitbit Surge HR (FS), TomTom Cardio Watch (TT), and Apple Watch (AW) for energy expenditure (EE) estimation at rest and at different physical activity (PA) intensities. Method: During summer 2016, 25 college students (13 females; mean age = 23.52 ± 1.04 years) completed four separate 10-minute sessions: rest (i.e., seated quietly), light PA (LPA; 3.0-mph walking), moderate PA (MPA; 5.0-mph jogging), and vigorous PA (VPA; 7.0-mph running) on a treadmill. Indirect calorimetry served as the criterion EE measure. The AW and TT were placed on the right wrist and the FS and MB on the left, serving as the comparison devices. Data were analyzed in late 2017. Results: Pearson correlation coefficients revealed only three significant relationships (r = 0.43–0.57) between smartwatches' EE estimates and indirect calorimetry: rest–TT, LPA–MB, and MPA–AW. Mean absolute percentage error (MAPE) values indicated the MB (35.4%) and AW (42.3%) possessed the lowest error across all sessions, with MAPE across all smartwatches lowest during the LPA (33.7%) and VPA (24.6%) sessions. During equivalence testing, no smartwatch's 90% CI fell within the equivalence region designated by indirect calorimetry. However, the greatest overlap between smartwatches' 90% CIs and indirect calorimetry's equivalence region was observed during the LPA and VPA sessions. Finally, EE estimate variation attributable to the use of different manufacturers' devices was greatest at rest (53.7 ± 12.6%) but decreased incrementally as PA intensity increased. Conclusions: The MB and AW appear most accurate for EE estimation. However, smartwatch manufacturers may consider concentrating most on improving EE estimate accuracy during MPA.
Keywords: measurement bias, indirect calorimetry, validity
Wearable technology devices offer tremendous promise in promoting physical activity (PA) and health among diverse populations (Case, Burwick, Volpp, & Patel, 2015; Kenney, Gortmaker, Evenson, Goto, & Furberg, 2015), with great potential to aid in the development of personalized health behavior change interventions (Bai et al., 2016; Ferguson, Rowlands, Olds, & Maher, 2015; Flores, Glusman, Brogaard, Price, & Hood, 2013; Hood, Balling, & Auffray, 2012; Sasaki et al., 2015). For example, advancing technology has facilitated the development of sophisticated smartwatches (Kenney et al., 2015), many providing health metric data output for heart rate, energy expenditure (EE), PA, and sleep, among other metrics. Notably, smartwatches' capability to provide EE estimates has played a crucial role in these devices' popularity, as consumers track this metric and modify kilocalorie (kcal) consumption and PA in a manner necessary to promote appropriate and sustainable weight loss (Kenney, Wilmore, & Costill, 2015b). Yet, if smartwatches are not providing accurate EE estimates, these inaccuracies may prevent the effective use of these devices as part of a weight loss strategy or, more generally, for health promotion purposes.
Currently, several smartwatches are popular among consumers. Using each manufacturer's proprietary algorithms, these smartwatches combine demographic (age, sex), anthropometric (height, weight), and bodily movement data collected via triaxial accelerometer technology to provide daily EE estimates at rest, during activities of daily living, and during PA or exercise (Fitbit, 2016; TomTom, 2017). However, only a small portion of the available literature has validated smartwatch EE estimates. Indeed, the literature has mostly examined the validity of smartwatches in the measurement of laboratory-based and free-living PA duration and steps (Bai et al., 2016; Bunn, Navalta, Fountaine, & Reece, 2018; Evenson, Goto, & Furberg, 2015; Lee & Gorelick, 2011). Among the few smartwatch EE estimate validation studies to date (Bai et al., 2016; Diaz et al., 2015; Ferguson et al., 2015), mean validity coefficients for EE were moderate to strong (range r = 0.74–0.85), with mixed findings regarding smartwatches' over- or under-estimation of EE versus various criterion measures. Notably, however, these studies were almost exclusively conducted using specific models of the Fitbit and Jawbone despite the rising popularity of other smartwatches (e.g., Apple Watch). Moreover, few studies have employed indirect calorimetry as the criterion EE measure, an assessment method commonly considered the 'gold standard' for EE measurement (Kenney, Wilmore, & Costill, 2015c). Finally, a newer statistical methodology, termed "equivalence testing" (Dixon, Saint-Maurice, Kim, Hibbing, & Welk, 2018), has been developed and may provide better insight into smartwatch health metric data accuracy than the validity statistics employed in past studies.

Pope is with the Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN. Zeng is with the Department of Food Science and Human Nutrition, Colorado State University, Fort Collins, CO. Li and Liu are with the School of Physical Education, Hunan Normal University, Changsha, China. Gao is with the School of Kinesiology, University of Minnesota, Minneapolis, MN. Pope (popex157@umn.edu) and Gao (gaoz@umn.edu) are corresponding authors.
These limitations are not only notable given how consumers often use smartwatch EE estimates (e.g., to monitor daily EE and subsequently modify PA and/or dietary behaviors), but they may also impair health professionals' ability to employ smartwatches as a health promotion tool. Specifically, smartwatches are increasingly cited as important components of a healthcare approach referred to as "systems medicine" (Flores et al., 2013; Hood et al., 2012), a multi-faceted wellness perspective leveraging novel technology (e.g., smartwatches, smartphones, social media) to collect and analyze (via big data analysis) an individual's health behaviors and thereafter develop personalized health behavior change interventions based on these data (Flores et al., 2013; Pope & Gao, 2017). Given smartwatch technology's emerging uses, a need exists to assess several popular smartwatches' EE estimates against a gold standard EE criterion measure like indirect calorimetry during different PA intensities, employing the statistical methodology of equivalence testing to conduct these analyses. Therefore, this study's purpose was to investigate the accuracy of the Microsoft Band (MB), Fitbit Surge HR (FS), TomTom Cardio Watch (TT), and Apple Watch (AW) in estimating EE at rest and at different PA intensities versus indirect calorimetry EE measurements. The current study's observations may inform consumers and health professionals alike of the capability of various popular smartwatches to provide accurate EE estimates capable of assisting in effective health behavior change intervention development.
Method
Participants and Research Setting
This cross-sectional study recruited a convenience sample of
healthy young adults at a south-central Chinese university in
summer 2016. Participant inclusion criteria were (a) 18–25 years
old; (b) body mass index ≥18.5 kg/m²; (c) no diagnosed physical
or mental disability; and (d) signed informed consent. Exclusion criteria included (a) current use of medication that might affect cardiovascular function (e.g., beta-blockers); (b) a history of documented cardiovascular or metabolic diseases/conditions; or (c) being unaccustomed to high-intensity exercise eliciting EE >300 kcal/session. Participants completed a comprehensive
medical and health history questionnaire prior to study participa-
tion, with the experiment conducted in a highly controlled labora-
tory setting. All procedures performed were in accordance with
the ethical standards of the institutional and/or national research
committee and with the 1964 Helsinki Declaration and its later
amendments or comparable ethical standards (World Medical
Association, 2018). Additionally, this research was completed in
agreement with the most recent ethical standards for sport and
exercise research (Harriss, Macsween, & Atkinson, 2017). Finally,
University Research Ethics Committee approval and informed
consent were obtained prior to testing.
Instrumentation
Criterion Device. Criterion EE data were collected via indirect
calorimetry with a Cortex Metalyzer II metabolic cart (Cortex;
Germany). Briefly, the exercise tests were performed on a Pulsar
treadmill (H/P/Cosmos; Willich, Germany), with participants
wearing a mask attached to the metabolic cart. The metabolic
cart conducted indirect calorimetry measurements via gas analyses at rest and during exercise, from which BTPS-adjusted (body temperature and pressure, saturated) EE values for each session were derived. In simplest terms, indirect calorimetry measures participants' respiratory exchange of oxygen and carbon dioxide, which is then used to provide EE measurements (Kenney et al., 2015c); more detailed descriptions are available regarding how indirect calorimetry measures EE and why it has been widely considered the 'gold standard' EE measurement method (Branson & Johannigman, 2004; Holdy, 2004). Impor-
tantly, the Pulsar treadmill and Cortex Metalyzer II have been used
in previous studies among various populations when assessing
EE (Bailey et al., 2012;Cockcroft et al., 2015;Peters, Heelan, &
Abbey, 2013). Notably, the Cortex Metalyzer II was calibrated
using a 3-liter syringe prior to each participant’s session, with the
calibration process completed per manufacturer specifications.
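The source does not specify the exact computation the metabolic cart's software applies to the gas-exchange data. For reference only, a commonly used conversion from measured gas exchange to EE is the abbreviated Weir equation; whether the Metalyzer software uses this exact form is an assumption:

\mathrm{EE}\ (\mathrm{kcal/min}) \approx 3.941\,\dot{V}\mathrm{O}_2 + 1.106\,\dot{V}\mathrm{CO}_2

where the oxygen uptake and carbon dioxide output are expressed in liters per minute.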
Comparison Devices. Four wrist-worn smartwatches provided
EE estimates and served as the comparison devices. The smart-
watches included were the MB (Microsoft; Redmond, WA, USA),
FS (Fitbit, Inc.; San Francisco, CA, USA), TT (TomTom;
Amsterdam, The Netherlands), and AW (Apple; Cupertino, CA,
USA). Each smartwatch can assess several metrics including heart
rate, activity (i.e., minutes of activity, steps/day), sleep, stairs
climbed, and calories burned (i.e., EE). Notably, only one smart-
watch from each of the preceding manufacturers was included.
Regarding smartwatch placement, the MB and FS were worn on the
left wrist while the TT and AW were placed on the right, with the
smartwatches spaced 1 cm apart. Smartwatches were monitored
throughout the sessions to ensure no contact was made between
devices that might have impacted the results. This study's smartwatch placement mirrored that of other studies placing multiple smartwatches on participants' wrists (Ferguson et al., 2015; Fokkema, Kooiman, Krijnen, Van Der Schans, & De Groot, 2017).
To ensure the most accurate EE estimates were provided by each
smartwatch, each participant’s age, sex, weight, and height were
entered into each smartwatch prior to initiating the testing session
(see procedures below), with the side upon which each smartwatch
was worn (i.e., left or right) programmed as well. Finally, while the
wrist upon which the smartwatches were placed did not differ
between participants, potential bias of smartwatch placement was
reduced by randomizing which smartwatch was distal and which
was more proximal from participant to participant.
Anthropometrics. Height and weight were measured using a
stadiometer and digital weight scale, respectively. Specifically,
height was measured using a Seca stadiometer (Seca; Hamburg,
Germany) and recorded to the nearest half-centimeter. As for
weight, this measurement was performed with a Detecto digital
weight scale (Detecto; Webb City, MO, USA), with weight
documented to the nearest tenth of a kilogram.
Procedures
All participants were instructed to abstain from eating or drinking anything except water for the eight hours prior to visiting the lab, in addition
to refraining from any vigorous PA (VPA) during the 24 hours prior
to study participation. Participants were asked to come into the lab
in a fasted state for two reasons. First, we wanted to ensure that the
indirect calorimetry measurements during the resting trial were as
accurate as possible. Indeed, basal metabolic rate assessed via
indirect calorimetry may be affected by prior food consumption
(Kenney et al., 2015c). Therefore, having the participants abstain
from food consumption until after study completion was important
to ensure the most valid comparison of indirect calorimetry EE
measurements to smartwatch EE estimates during the resting
(i.e., sitting) session. Second and more practically, participants
were requested to be fasted prior to study participation to ensure
no adverse gastrointestinal discomfort was experienced during the
study—particularly during the higher-intensity sessions. Partici-
pants were informed of all experimental procedures and encouraged
to ask any questions before providing consent. Next, participants
were asked to complete a comprehensive medical/health history
questionnaire and a demographic information sheet after which
anthropometric data (i.e., height and weight) were gathered. Demo-
graphic and anthropometric data were subsequently loaded into
each smartwatch and into the metabolic cart’s software to ensure
accurate EE estimation and measurement, respectively. Finally, a
mask connected to the metabolic cart was placed on each participant
to measure oxygen consumption for determination of criterion EE.
Participants completed an 80-minute experimental protocol
which included four 10-minute PA sessions, each at a different
PA intensity: resting (sitting quietly), light PA (LPA; walking at
3.0 mph on treadmill), moderate PA (MPA; jogging at 5.0 mph), and
VPA (running at 7.0 mph). Sessions were completed from lowest
(i.e., resting) to highest (i.e., VPA) intensity—ensuring the results of
the lower-intensity trials were not biased by prior high-intensity
physiological workload. The PA intensity classification criteria were consistent with a previous study among Chinese young adults
(Ren, Li, & Liang, 2017). Between each session, participants were
required to sit quietly until heart rate returned to within ±10 beats/minute of that observed during the initial resting session (Goto et al., 2007).
Following each of the four exercise sessions each participant
completed, EE data were obtained directly from the smartwatches
themselves, with these data recorded immediately to prevent any
data loss or misinterpretation. All four smartwatches in this study
provided “average calories burned [i.e., EE]” estimates over the
specified time interval pre-programmed by the researchers. There-
fore, prior to each of the participant’s resting and exercise trials, we
pre-programmed the smartwatches for a 10-minute exercise session
—starting each smartwatch’s 10-minute program immediately upon
each participant’s initiation of their session. This pre-programming
ensured that no EE data were included outside of the 10-minute
exercise session and, further, requested each smartwatch to save the
exercise session to its internal memory in case we needed to verify
these data at a later time. It is also noteworthy that two researchers
collected EE data from the smartwatches immediately after each
participant finished their respective exercise session—allowing
each participant’s smartwatch data from each exercise session to
be double-checked (i.e., data quality control protocol). Finally, data
regarding each participant’s EE were placed immediately into an
Excel file for later analysis by one researcher and double-checked
by a second researcher after each trial for each participant. Impor-
tantly, the times the smartwatches were started and stopped during
each testing session were recorded per the software reporting
indirect calorimetry EE measurements. Using this software’s time-
stamp allowed us to ensure that the start and stop times used to
segment indirect calorimetry EE measurements were identical to the
time segments during which the smartwatches were estimating EE.
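To illustrate the segmentation step described above, the following is a minimal sketch of how criterion EE could be summed strictly within each smartwatch's recorded start/stop window; the column names, timestamps, and values are hypothetical and do not reflect the Metalyzer software's actual export format.

```python
import pandas as pd

# Hypothetical metabolic-cart export: one row per measurement interval,
# with a timestamp and the kcal expended during that interval.
cart = pd.DataFrame({
    "time": pd.to_datetime(["2016-07-01 09:00:05",
                            "2016-07-01 09:05:10",
                            "2016-07-01 09:10:02"]),
    "kcal": [0.35, 0.36, 0.34],
})

def session_kcal(cart_df, start, stop):
    """Sum criterion EE only for rows falling within the smartwatch's
    pre-programmed 10-minute start/stop window."""
    in_window = (cart_df["time"] >= start) & (cart_df["time"] <= stop)
    return float(cart_df.loc[in_window, "kcal"].sum())

print(session_kcal(cart,
                   pd.Timestamp("2016-07-01 09:00:00"),
                   pd.Timestamp("2016-07-01 09:10:00")))
```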
Statistical Analysis
Data were analyzed in late 2017 and were first screened for physiologically implausible values. Next, Pearson correlation
coefficients were calculated to observe the association between
smartwatch EE estimates and indirect calorimetry EE measure-
ments at rest and each PA intensity (resting, LPA, MPA, and VPA).
Weak, moderate, and strong correlations were categorized as r values of 0.20–0.39, 0.40–0.59, and 0.60–0.79, respectively, with r values ≤0.19 classified as very weak and r values ≥0.80 classified as very strong (Thomas, Nelson, & Silverman, 2011).
Mean absolute percent errors (MAPE) were then calculated for
sitting and each PA intensity. Briefly, MAPE was calculated as the average of the absolute difference between smartwatch EE estimates and indirect calorimetry EE measurements, divided by the indirect calorimetry measurement and multiplied by 100. These
MAPE calculations were completed for each smartwatch at each
PA intensity, with MAPE calculations providing an examination of
individual-level measurement error—an approach used in other
smartwatch and accelerometer device validation studies (Fokkema et al., 2017; Kim & Welk, 2015).
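As a concrete illustration of the MAPE definition above, the following is a minimal sketch; the numeric values are made up for demonstration and are not the study's raw data.

```python
import numpy as np

def mape(watch_kcal, criterion_kcal):
    """Mean absolute percent error: mean of |watch - criterion| / criterion, times 100."""
    watch = np.asarray(watch_kcal, dtype=float)
    criterion = np.asarray(criterion_kcal, dtype=float)
    return float(np.mean(np.abs(watch - criterion) / criterion) * 100)

# Hypothetical 10-minute session values (kcal) for one device at one intensity
print(round(mape([38.8, 41.0, 35.2], [35.0, 36.5, 34.1]), 1))
```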
Equivalence testing was then used to assess the agreement of
smartwatch EE estimates with indirect calorimetry EE measure-
ments using this testing approach’s confidence interval (CI) method.
Equivalency testing is given fuller explanation in Dixon et al.
(2018), but the following two aspects of equivalence testing are
important: (1) equivalence testing’s null hypothesis states that the
two measurement methods being compared are not equivalent; and
(2) an alpha of 0.05 (i.e., 5%) is consistent with examining whether
the entire 90% CI for a given smartwatch at a given PA intensity
falls within a proposed equivalency region situated around the mean
indirect calorimetry EE measurement made at the same PA inten-
sity. Congruent with Kim and Welk (2015) and Bai et al. (2016), we
stated the equivalency region to be ±10% of the mean indirect
calorimetry EE measurements made at a given PA intensity. Finally,
coefficients of variation (CV) examined the percentage variation in smartwatch EE estimates attributable to the use of different manufacturers' devices, as done in prior literature (Driller, McQuillan, & O'Donnell, 2016). SPSS 25.0 (IBM Inc.; Armonk, NY) was
employed for all analyses, with alpha set at 0.05.
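The following sketch illustrates the two remaining analyses as described: the 90% CI inclusion check against a ±10% equivalence zone, and a between-device CV. Function and variable names are ours, and the CV computation (SD across the four devices' estimates divided by their mean, per participant and session) is one plausible reading of the description rather than necessarily the authors' exact implementation.

```python
import numpy as np
from scipy import stats

def ci90(x):
    """Two-sided 90% confidence interval for the mean (t distribution)."""
    x = np.asarray(x, dtype=float)
    half = stats.t.ppf(0.95, df=len(x) - 1) * stats.sem(x)
    return x.mean() - half, x.mean() + half

def equivalent(watch_kcal, criterion_mean, zone=0.10):
    """True only if the watch's entire 90% CI lies within +/-10% of the
    mean indirect calorimetry value (alpha = 0.05 equivalence test)."""
    lower, upper = ci90(watch_kcal)
    return criterion_mean * (1 - zone) <= lower and upper <= criterion_mean * (1 + zone)

def between_device_cv(device_estimates):
    """CV (%) across the four smartwatches for one participant and session."""
    x = np.asarray(device_estimates, dtype=float)
    return float(x.std(ddof=1) / x.mean() * 100)

# Hypothetical kcal values for one device during LPA vs. a 35.0-kcal criterion mean
print(equivalent([38.8, 34.4, 43.3, 36.0], 35.0))
print(round(between_device_cv([16.7, 18.4, 33.4, 36.3]), 1))
```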
Results
Participants were 25 college students (13 females; mean age = 23.52 ± 1.04 years; mean height = 168.6 ± 7.4 cm; mean weight = 61.5 ± 10.1 kg). Table 1 presents descriptive statistics for smartwatch EE estimates and indirect calorimetry EE measurements. As expected, EE values increased as PA intensity increased.
Pearson correlation coefficients between smartwatch EE estimates and indirect calorimetry EE measurements at each PA intensity revealed only three significant correlations (r range = −0.19 to 0.57; Table 2). Specifically, moderate correlations were seen for the following smartwatches at the denoted PA intensities versus indirect calorimetry: Rest–TT (r = 0.57, p < .01); LPA–MB (r = 0.43, p < .05); and MPA–AW (r = 0.43, p < .05). Notably, a marginally significant, but weak, correlation was observed between the AW and indirect calorimetry during LPA (r = 0.37, p = .07). No significant correlations were found between smartwatch EE estimates and indirect calorimetry EE measurements during VPA. Moreover, Table 3 contains MAPE values for each smartwatch's EE estimates at each PA intensity compared to indirect calorimetry. Overall, MAPE values were lowest for the MB (35.4%) and AW (42.3%), with the FS and TT demonstrating higher values (47.7% and 51.0%, respectively). Finally, MAPE values were higher during the resting (52.9%) and MPA (65.3%) sessions versus the LPA (33.7%) and VPA (24.6%) sessions.
The equivalence testing results for each smartwatch's EE estimates and indirect calorimetry's EE measurements are presented in Table 4. Further, Figures 1–4 graphically present these results during the resting, LPA, MPA, and VPA sessions, respectively. As indicated by Table 4 and each figure, no smartwatch's 90% CI fell completely within the ±10% equivalency region established by indirect calorimetry at rest (20.3–22.5 kcal) or during LPA (33.1–36.9 kcal), MPA (50.5–57.3 kcal), and VPA (93.8–99.3 kcal). Similar to the MAPE results, however, the greatest overlap between smartwatches' 90% CIs and indirect calorimetry's equivalency region was observed during the LPA and VPA sessions (see Table 4, Figures 2 and 4). Specifically, the MB, TT, and AW possessed 90% CIs which overlapped with indirect calorimetry's equivalency region during the LPA session, while all smartwatches achieved some overlap during the VPA session. Notably, only the FS demonstrated any overlap with indirect calorimetry's equivalency region during the resting session, whereas no smartwatch demonstrated any overlap during the MPA session. Lastly, Table 5 presents CVs. This metric indicated that EE estimate variation attributable to the use of different manufacturers' devices was highest at rest (53.7 ± 12.6%), but incrementally decreased as PA intensity increased (LPA: 31.1 ± 10.5%; MPA: 18.3 ± 8.9%; and VPA: 16.9 ± 8.0%).
Discussion
The present study examined the accuracy of four popular smart-
watches’EE estimates against indirect calorimetry at rest and at
different PA intensities. This comparison was significant given the
fact that few previous investigations have examined the accuracy
of multiple smartwatches’EE estimates to that of ‘gold standard’
indirect calorimetry measurements, with most previous studies
having only validated PA duration and step estimates made by
different models of the Fitbit and Jawbone in comparison to
research-grade accelerometers like the ActiGraph.
Our data suggested the MB and AW possess the greatest EE
estimate accuracy—particularly during LPA and VPA. Notably,
despite the fact that all smartwatches demonstrated some EE
Table 1 Descriptive Statistics for Smartwatch Energy Expenditure and Indirect Calorimetry*
(Columns: Microsoft Band, Fitbit, TomTom, Apple Watch, Indirect Calorimetry; values are M (SD))
Resting 16.7 (3.6) 18.4 (8.2) 33.4 (23.6) 36.3 (7.7) 21.4 (3.2)
Light Physical Activity 38.8 (13.0) 55.9 (16.0) 34.0 (11.8) 36.1 (9.8) 35.0 (5.4)
Moderate Physical Activity 86.7 (14.7) 90.6 (19.8) 82.5 (28.1) 79.9 (16.3) 53.9 (10.0)
Vigorous Physical Activity 102.2 (27.9) 94.4 (25.3) 95.7 (35.1) 88.0 (27.0) 96.5 (7.9)
*M ± SD total kilocalories burned during each 10-minute exercise session.
Table 2 Pearson Correlations Between Smartwatch Energy Expenditure and Indirect Calorimetry at Different PA Intensities#
(Indirect calorimetry vs. each smartwatch; columns: Microsoft Band, Fitbit, TomTom, Apple Watch)
Resting 0.02 0.21 0.57** 0.06
Light Physical Activity 0.43* 0.14 −0.19 0.37
Moderate Physical Activity 0.13 0.26 0.12 0.43*
Vigorous Physical Activity −0.03 0.25 −0.03 −0.09
# Energy expenditure unit is kilocalories, with the correlations reflective of this metric. *Significant correlation at p < .05; **significant correlation at p < .01.
Table 3 Mean Absolute Percent Error for Each Smartwatch's Energy Expenditure Measurement at Each Physical Activity Intensity Versus Indirect Calorimetry*
(Indirect calorimetry vs. each smartwatch; columns: Microsoft Band, Fitbit, TomTom, Apple Watch, Overall MAPE by PA Intensity; values are M (SD))
Resting 23.6 (15.6) 31.8 (30.2) 83.3 (66.4) 73.0 (44.8) 52.9 (22.0)
Light Physical Activity 23.3 (23.9) 64.9 (44.1) 27.8 (26.3) 18.9 (18.8) 33.7 (16.4)
Moderate Physical Activity 69.2 (44.3) 73.3 (43.7) 64.2 (54.0) 54.5 (35.3) 65.3 (38.1)
Vigorous Physical Activity 25.6 (18.5) 21.0 (14.7) 28.8 (23.3) 22.8 (21.1) 24.6 (15.3)
Overall MAPE by Smartwatch 35.4 (12.1) 47.7 (21.5) 51.0 (22.4) 42.3 (14.1)
*Mean absolute percent error ± standard deviation for total kilocalories burned during each 10-minute exercise session.
estimate inaccuracies, these inaccuracies are congruent with past studies assessing various smartwatches' capability to provide accurate EE estimates (Alharbi, Bauman, Neubeck, & Gallagher, 2016; Bai et al., 2016; Ferguson et al., 2015; Kenney et al., 2015; Sasaki et al., 2015). For example, Bai et al. (2016) suggested the MAPEs for four smartwatches' (Fitbit Flex, Jawbone Up24, Nike Fuel Band SE, Misfit Shine) EE estimates during aerobic activity varied between approximately 18% and 60%, congruent with the current investigation's mean MAPE values for all smartwatches during the
Table 4 90% Confidence Intervals for Energy Expenditure Measurements Made by Each Smartwatch and Indirect Calorimetry at Each Physical Activity Intensity
Kilocalories: M (90% CI lower limit, upper limit)
Resting Session
Indirect Calorimetry 21.4 (20.3, 22.5)
Microsoft Band 16.7 (15.5, 18.0)
Fitbit 18.4 (15.6, 21.2)
TomTom 33.4 (25.4, 41.5)
Apple Watch 36.3 (33.7, 38.9)
LPA Session
Indirect Calorimetry 35.0 (33.1, 36.9)
Microsoft Band 38.8 (34.4, 43.3)
Fitbit 55.9 (50.5, 61.4)
TomTom 34.0 (30.0, 38.1)
Apple Watch 36.1 (32.7, 39.4)
MPA Session
Indirect Calorimetry 53.9 (50.5, 57.3)
Microsoft Band 86.7 (81.7, 91.7)
Fitbit 90.6 (83.8, 97.4)
TomTom 82.5 (72.9, 92.1)
Apple Watch 79.9 (74.3, 85.5)
VPA Session
Indirect Calorimetry 96.5 (93.8, 99.3)
Microsoft Band 102.2 (92.6, 111.7)
Fitbit 94.4 (85.6, 103.2)
TomTom 95.7 (83.7, 107.7)
Apple Watch 88.0 (78.7, 97.2)
Abbreviations: CI = confidence interval; LL = lower limit of the 90% confidence interval; UL = upper limit of the 90% confidence interval.
Figure 1 — Comparisons of smartwatches vs. indirect calorimetry at rest.
Figure 2 — Comparisons of smartwatches vs. indirect calorimetry during light physical activity.
Figure 3 — Comparisons of smartwatches vs. indirect calorimetry during moderate physical activity.
Figure 4 — Comparisons of smartwatches vs. indirect calorimetry during vigorous physical activity.
resting, LPA, and VPA conditions. Moreover, Lee, Kim, and Welk (2014) confirmed the Fitbit Zip and Fitbit One accurately estimate EE in free-living conditions (mean overall MAPEs = 10.1% and 10.4%, respectively), with the other smartwatches tested (Jawbone Up, Directlife, Nike Fuel Band, Basis Band) possessing a MAPE range of 12.2–23.5%. Therefore, although the present study did not observe MAPE values for the MB and AW as low as those observed by Lee et al. (2014) for that study's two Fitbit devices, the fact that the MB and AW demonstrated relative accuracy versus similar literature suggests two additional smartwatch options may be considered by individuals desiring a wearable device to estimate EE. Further, the MB and AW's relative EE estimation accuracy during LPA is particularly promising given that most individuals are capable of being active at this PA intensity, with a growing body of literature highlighting the health-promoting benefits of LPA (Powell, Paluch, & Blair, 2011; U.S. Department of Health and Human Services, 2018). Therefore, consumers and health professionals might be able to use the MB and AW to develop PA programs which focus on higher LPA incorporation among previously sedentary cohorts. Nonetheless, even the MB and AW demonstrated some EE estimate inaccuracies, which suggests that these two devices' use within health programs should still take this error into account, a topic discussed further below.
It is also noteworthy that the accuracy of smartwatch EE estimates decreased (i.e., mean differences between smartwatch EE and indirect calorimetry EE values increased) as PA intensity increased up to the level of MPA, with the greatest smartwatch EE overestimation observed during the MPA session. This observation largely aligns with literature examining smartwatch EE estimate accuracy at different PA intensity levels (Bai et al., 2016; Diaz et al., 2015). For example, Diaz and colleagues (2015) examined smartwatch accuracy at different PA intensities and indicated smartwatches overestimated EE by 52.4% during moderate walking and 33.3% during brisk walking. These researchers' observation of greater smartwatch EE estimate accuracy
during the highest walking intensity session, but less accuracy at
lower walking intensities, is congruent with the current study’s
observation of increased accuracy during VPA, but decreased
smartwatch EE estimate accuracy as PA intensity increased to the
level of MPA. Bai and associates (2016) also made observations
similar to the current study. Indeed, these researchers observed
smartwatches to generally overestimate EE during MPA. Given the
observations of prior literature and the present study, speculation is
warranted as to the possible explanations for why smartwatch EE
estimates were quite accurate during VPA despite accuracy becom-
ing worse as participants increased PA intensity from LPA to MPA
—with the largest inaccuracies during MPA.
The most plausible explanation lies in the difference in how EE is calculated by a smartwatch versus measured by
indirect calorimetry. Specifically, a smartwatch uses proprietary
algorithms to combine the user’s demographic and anthropometric
data with bodily movement data determined via an accelerometer
to estimate EE (Fitbit, 2016;TomTom, 2017). Indirect calorimetry,
on the other hand, measures the respiratory gas exchange rates of
oxygen and carbon dioxide as the participant breathes into the mask
during exercise (Branson & Johannigman, 2004;Holdy, 2004;
Kenney et al., 2015c). Thus, when the participants were progres-
sing from LPA to MPA, the body may have experienced slight
increases in physiological demand but marked increases in bodily
movement. As smartwatches estimate EE based largely upon
bodily movement, it may be that the large changes in bodily
movement observed as PA intensity increased led to systematic
overestimation of EE by smartwatches versus the highly accurate
indirect calorimetry which measures actual physiological demand
via gas analysis. This explanation appears more plausible, too,
when considering that during VPA smartwatch EE estimates from
all devices were found to be most accurate compared to indirect calorimetry (see MAPE and equivalence testing results), a PA intensity requiring an even greater amount of bodily movement and physiological demand than observed during MPA. Indeed,
great amounts of bodily movement would have resulted in an
increased physiological demand (e.g., increased need for oxygen
and nutrients to be delivered to muscles/removal of carbon dioxide
and other metabolic waste products—all processes which are
facilitated via increased ventilation) and subsequently higher indi-
rect calorimetry EE measurements. As a final point, more research
is also warranted regarding smartwatch EE estimate inaccuracy at
rest given the continued calls for the ability to accurately track and
modify sedentary behavior (Lewis, Napolitano, Buman, Williams,
& Nigg, 2017;U.S. Department of Health and Human Services,
2018). Undoubtedly, the high MAPE values and the large variation in EE estimates between different manufacturers' smartwatches during the resting condition suggest improvements are necessary if health professionals are to develop sedentary behavior reduction interventions.
Smartwatches' capability to provide EE estimates has increased interest among health professionals in utilizing these devices to assist with the development and implementation of
personalized health behavior change programs among clients or
patients. Yet, the present study’s observations suggested that while
smartwatches may demonstrate relative accuracy at certain PA
intensities, no smartwatch provided EE estimates within the EE
equivalency regions designated by indirect calorimetry—even
under standardized, highly controlled laboratory-based conditions.
Aside from how these inaccuracies affect consumers' use of smartwatch EE estimates, they render problematic health professionals' use of patient/client smartwatch EE estimates collected under free-living conditions (i.e., conditions less standardized than the present study's) when developing health behavior change programs. Healthcare is experiencing a paradigm shift
from reactive treatment (i.e., treating diseases/conditions following
onset) to preventive/proactive treatment (i.e., treating diseases/
conditions prior to onset or in the early stages of development)
(Flores et al., 2013;Hood et al., 2012). Coinciding with this
paradigm shift has been the previously mentioned idea of “systems
medicine” and the development of a healthcare model which is
(a) predictive: using novel technology like smartwatches to track
health behaviors/indices (e.g., PA, sedentary behavior, EE, etc.)
may facilitate subsequent correlation of these health behaviors/
indices with biomarkers (e.g., blood lipid levels, blood sugar), with
disease risk able to be discerned thereafter; (b) preventive: health
Table 5 CVs for Smartwatch Energy Expenditure at Different Physical Activity Intensities
CV*: M (SD)
Resting 53.7 (12.6%)
Light Physical Activity 31.1 (10.5%)
Moderate Physical Activity 18.3 (8.9%)
Vigorous Physical Activity 16.9 (8.0%)
Note. CV = coefficient of variation. *CVs are percentages.
behavior change programs can be developed based upon a patient’s
health behaviors to improve the patient’s participation in health
behaviors conducive to better health and the prevention/attenuation
of disease; (c) personalized: these health behavior change programs
can be personalized to the patient’s unique physical activity and/or
dietary preferences which may improve program adherence and
effectiveness; and (d) participatory: providing health education to
patients via web-based platforms may further improve patients’
ability to engage in proper health behaviors in the long-term
(i.e., after cessation of the formal health behavior change program)
through promotion of increased health literacy.
Smartwatch EE overestimation is particularly detrimental to
smartwatch use within a systems medicine framework as overesti-
mation may diminish the effectiveness of weight loss programs
developed based upon smartwatch EE values. For instance, individuals may be led to believe they need to consume more kcal than needed based upon the inaccuracies observed for the current study's smartwatches, particularly during MPA. For example, an individual briskly walking for 30 minutes (i.e., MPA) may have an actual EE of 200 kcal. Yet, even the most accurate watch observed during MPA in the current study (i.e., the AW) could register an EE estimate of roughly 309 kcal during this 30-minute walking session, based upon the AW's MPA MAPE of 54.5% and the fact that all smartwatches overestimated EE during MPA. This, again, is not ideal within a systems medicine framework
and so caution is urged among health professionals using smart-
watches to develop health behavior change programs for patients/
clients. More broadly, these observations suggest more cross-
collaboration should be implemented between researchers and
smartwatch manufacturers to improve the algorithms used in smart-
watch EE estimation.
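For reference, the 309-kcal figure in the example above follows from applying the AW's observed MPA MAPE of 54.5% as an overestimate to the assumed 200-kcal actual expenditure: 200 kcal × (1 + 0.545) = 309 kcal.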
The present study has several merits: it (1) was conducted in a highly controlled laboratory setting, thus limiting many confounding variables (e.g., different wear times/locations/PA modality choices) which might have affected the analyses; (2) assessed EE at four different PA intensities; (3) examined smartwatch accuracy using equivalence testing; and (4) used indirect calorimetry as the criterion measure, an assessment method considered the 'gold standard' when assessing EE during aerobic exercise (Kenney et al., 2015c). However, several limitations in the present
study should be noted. First, all study participants were healthy
young adults (i.e., a homogeneous sample). Whether smartwatches'
EE estimates are accurate in other populations, particularly clinical
populations, remains unanswered. Second, the sample size was
relatively small. Notably, while the use of indirect calorimetry is a
strength of the current study, connecting each participant to the
metabolic cart for an 80-minute study session was intricate and
time-consuming—limiting the number of participants tested and
precluding comparisons of how sex and BMI differences may
influence smartwatch EE estimates. Yet, the researchers felt the
current study’s sample size to be adequate as the sample size was
congruent with the most recent smartwatch validation studies
conducted (Diaz et al., 2015;Ferguson et al., 2015;Fokkema
et al., 2017)—most of which did not employ indirect calorimetry as
the criterion measure. Third, this study only assessed participants’
EE while neglecting other relevant health metric data output. For
example, heart rate data accuracy might also be examined given
that heart rate is often used by health professionals to facilitate
individuals’participation in PA intensities necessary to promote
improved health outcomes like increased cardiovascular fitness and
aerobic capacity (Kenney, Wilmore, & Costill, 2015a). Fourth, the
exercise tests were conducted solely on a treadmill. The last
limitation is noteworthy as the exclusive use of this PA modality
limits the current study’s generalizability to other PA modalities
that may use different proportions of muscle mass (e.g., biking),
thus influencing EE values. Moreover, other PA modalities may
have differing degrees of upper body motion, thus contributing to
greater or lesser degrees of motion artifact which some researchers
have speculated might affect smartwatch EE calculations (Lee &
Gorelick, 2011). Finally, although unlikely, placing two smart-
watches on each wrist may have biased smartwatch EE measure-
ments. It must be remembered, however, that while the FS and
MB were always placed on the left wrist and AW and TT on the
right, which device was distal and which device was proximal
was randomized. Moreover, smartwatch placement, no matter
distal or proximal, was as close to manufacturer specifications as
possible. Therefore, future studies would benefit from larger and
more diverse samples and the assessment of smartwatch EE and/or
heart rate data accuracy during different PA modalities. These
studies may also assess EE estimate inter-device reliability when
employing multiple smartwatches from the same manufacturer
to evaluate the device-dependency of EE estimations at different
PA intensities.
Conclusion
Wearable technology devices like smartwatches are becoming
widely used by consumers, in addition to health professionals, for
health promotion. Therefore, establishing smartwatch data accuracy
is paramount. Indeed, greater smartwatch data accuracy will allow
consumers and, importantly, health professionals to leverage these
devices to track health metrics such as EE and PA—subsequently
developing highly personalized health behavior change programs to
improve health and prevent non-communicable diseases (Flores
et al., 2013; Hood et al., 2012). This study indicated that the MB and AW provided the most accurate EE estimates overall, particularly during LPA and VPA. Notably, however, the accuracy of all smartwatches decreased as PA intensity increased up to the level of MPA, with the most pronounced inaccuracies observed during MPA. These observations suggest
a prudent approach should be taken by consumers and health
professionals when interpreting smartwatch EE estimates—
particularly when one is engaging in MPA. Similarly, smartwatch
use in the development and implementation of PA and dietary
behavior change programs by health professionals may be cau-
tioned until health professionals can confirm the health metric data
accuracy these devices provide. In the future, researchers may work
alongside smartwatch manufacturers to ensure increased smart-
watch accuracy through the testing and manipulation of smartwatch
health metric data algorithms.
Acknowledgments
This research did not receive any specific grant from funding agencies in
the public, commercial, or not-for-profit sectors. While conducting this
study, the first author played a large role in data analysis and writing the
manuscript. The second author played a role in data sorting and editing
the manuscript. The third author played a role in data collection and editing
the manuscript. The fourth author played a role in data collection and
editing the manuscript. The fifth author played a role in developing the idea,
overseeing data collection/analysis, and writing the manuscript. No finan-
cial disclosures were reported by the authors of this paper. The authors
have no conflicts of interest to disclose in relation to the current research.
The results of this study are presented clearly, honestly, and without
fabrication, falsification, or inappropriate data manipulation.
References
Alharbi, M., Bauman, A., Neubeck, L., & Gallagher, R. (2016). Validation
of the fitbit-flex as a measure of free-living physical activity in
a community-based phase III cardiac rehabilitation population.
European Journal of Preventive Cardiology, 23(14), 1476–1485.
PubMed ID: 26907794 doi:10.1177/2047487316634883
Bai, Y., Welk, G., Nam, Y., Lee, J., Lee, J.-M., Kim, Y., . . . Dixon, P.
(2016). Comparison of consumer and research monitors under
semistructured settings. Medicine & Science in Sports & Exercise,
48(1), 151–158. PubMed ID: 26154336 doi:10.1249/MSS.
0000000000000727
Bailey, T., Jones, H., Gregson, W., Atkinson, G., Cable, N., & Thijssen, D.
(2012). Effect of ischemic preconditioning on lactate accumulation
and running performance. Medicine & Science in Sports & Exercise,
44(11), 2084–2089. PubMed ID: 22843115 doi:10.1249/MSS.
0b013e318262cb17
Branson, R., & Johannigman, J. (2004). The measurement of energy
expenditure. Nutrition in Clinical Practice, 19, 622–636. PubMed
ID: 16215161 doi:10.1177/0115426504019006622
Bunn, J., Navalta, J., Fountaine, C., & Reece, J. (2018). Current state of
commercial wearable technology in physical activity monitoring
2015–2017. International Journal of Exercise Science, 11(7), 503–
515. PubMed ID: 29541338
Case, M., Burwick, H., Volpp, K., & Patel, M. (2015). Accuracy of
smartphone applications and wearable devices for tracking physical
activity data. Journal of the American Medical Association, 313(6),
625–626. PubMed ID: 25668268 doi:10.1001/jama.2014.17841
Cockcroft, E., Williams, C., Tomlinson, O., Vlachopoulos, D., Jackman,
S., Armstrong, N., & Barker, A. (2015). High intensity interval
exercise is an effective alternative to moderate intensity exercise
for improving glucose tolerance and insulin sensitivity in adolescent
boys. Journal of Science and Medicine in Sport, 18(6), 720–724.
PubMed ID: 25459232 doi:10.1016/j.jsams.2014.10.001
Diaz, K., Krupka, D., Chang, M., Peacock, J., Ma, Y., Goldsmith, J., . . .
Davidson, K. (2015). Fitbit: an accurate and reliable device for
wireless physical activity tracking. International Journal of Cardiol-
ogy, 185, 138–140. PubMed ID: 25795203 doi:10.1016/j.ijcard.
2015.03.038
Dixon, P., Saint-Maurice, P., Kim, Y., Hibbing, P., & Welk, G. (2018). A
primer on the use of equivalence testing for evaluating measurement
agreement. Medicine & Science in Sports & Exercise, 50(4), 837–
845. PubMed ID: 29135817 doi:10.1249/MSS.0000000000001481
Driller, M., McQuillan, J., & O’Donnell, S. (2016). Inter-device reliability
of an automatic-scoring actigraph for measuring sleep in healthy
adults. Sleep Science, 9, 198–201. PubMed ID: 28123660 doi:10.
1016/j.slsci.2016.08.003
Evenson, K., Goto, M., & Furberg, R. (2015). Systematic review of the
validity and reliability of consumer-wearable activity trackers. International Journal of Behavioral Nutrition and Physical Activity, 12, 159. doi:10.1186/s12966-015-0314-1
Ferguson, T., Rowlands, A., Olds, T., & Maher, C. (2015). The validity of
consumer-level, activity monitors in healthy adults worn in free-
living conditions: A cross-sectional study. International Journal of
Behavioral Nutrition and Physical Activity, 12, 42. PubMed ID:
25890168 doi:10.1186/s12966-015-0201-9
Fitbit. (2016). How does fitbit estimate how many calories I’ve burned.
Retrieved from https://help.fitbit.com/articles/en_US/Help_article/1381
Flores, M., Glusman, G., Brogaard, K., Price, N., & Hood, L. (2013). P4
medicine: how systems medicine will transform the healthcare sector
and society. Personalized Medicine, 10(6), 565–576. PubMed ID:
25342952 doi:10.2217/pme.13.57
Fokkema, T., Kooiman, T., Krijnen, W., Van Der Schans, C., & De Groot,
M. (2017). Reliability and validity of ten consumer activity trackers
depend on walking speed. Medicine & Science in Sports & Exercise,
49(4), 793–800. PubMed ID: 28319983 doi:10.1249/MSS.
0000000000001146
Goto, C., Nishioka, K., Umemura, T., Jitsuiki, D., Sakagutchi, A.,
Kawamura, M., . . . Higashi, Y. (2007). Acute moderate-intensity
exercise induces vasodilation through an increase in nitric oxide
bioavailability in humans. American Journal of Hypertension, 20,
825–830. PubMed ID: 17679027 doi:10.1016/j.amjhyper.2007.
02.014
Harriss, D., Macsween, A., & Atkinson, G. (2017). Standards for ethics
in sport and exercise science research: 2018 update. International
Journal of Sports Medicine, 38, 1126–1131. PubMed ID: 29258155
doi:10.1055/s-0043-124001
Holdy, K. (2004). Monitoring energy metabolism with indirect calorime-
try: Instruments, interpretation, and clinical application. Nutrition in
Clinical Practice, 19, 447–454. PubMed ID: 16215138 doi:10.1177/
0115426504019005447
Hood, L., Balling, R., & Auffray, C. (2012). Revolutionizing medicine in the
21st century through systems approaches. Biotechnology Journal, 7,
992–1001. PubMed ID: 22815171 doi:10.1002/biot.201100306
Hopkins, W. (2000). Measures of reliability in sports medicine and
science. Sports Medicine, 30(1), 1–15. PubMed ID: 10907753 doi:10.
2165/00007256-200030010-00001
Kellar, S., & Kelvin, E. (2012). Munro's statistical methods for health care research (6th ed.). Philadelphia, PA: Lippincott Williams & Wilkins.
Kenney, E., Gortmaker, S., Evenson, K., Goto, M., & Furberg, R. (2015).
Systematic review of the validity and reliability of consumer-
wearable activity trackers. International Journal of Behavioral
Medicine and Physical Activity, 12(1), 5–10.
Kenney, W., Wilmore, J., & Costill, D. (2015a). Adaptations to aerobic
and anaerobic training. In W. Kenney, J. Wilmore, & D. Costill
(Eds.), Physiology of sport and exercise (6th ed., pp. 261–291).
Champaign, IL: Human Kinetics.
Kenney, W., Wilmore, J., & Costill, D. (2015b). Body composition and
nutrition for sport. In W. Kenney, J. Wilmore, & D. Costill (Eds.),
Physiology of sport and exercise (6th ed., pp. 371–405). Champaign,
IL: Human Kinetics.
Kenney, W., Wilmore, J., & Costill, D. (2015c). Energy expenditure and
fatigue. In W. Kenney, J. Wilmore, & D. Costill (Eds.), Physiology
of sport and exercise (6th ed., pp. 119–150). Champaign, IL: Human
Kinetics.
Kim, Y., & Welk, G. (2015). Criterion validity of competing
accelerometry-based activity monitoring devices. Medicine & Science
in Sports & Exercise, 47(11), 2456–2463. PubMed ID: 25910051
doi:10.1249/MSS.0000000000000691
Lee, C., & Gorelick, M. (2011). Validity of the smarthealth watch to
measure heart rate and energy expenditure during rest and exercise.
Measurement in Physical Education and Exercise Science, 15(1),
18–25. doi:10.1080/1091367X.2011.539089
Lee, C., Gorelick, M., & Mendoza, A. (2011). Accuracy of an infrared
LED device to measure heart rate and energy expenditure during rest
and exercise. Journal of Sports Science, 29(15), 1645–1653. doi:10.
1080/02640414.2011.609899
Lee, J., Kim, Y., & Welk, G. (2014). Validity of consumer-based physical
activity monitors. Medicine & Science in Sports & Exercise, 46(9), 1840–
1848. PubMed ID: 24777201 doi:10.1249/MSS.0000000000000287
Lewis, B., Napolitano, M., Buman, M., Williams, D., & Nigg, C. (2017).
Future directions in physical activity intervention research: Expand-
ing our focus to sedentary behaviors, technology, and dissemination.
Journal of Behavioral Medicine, 40(1), 112–126. PubMed ID:
27722907 doi:10.1007/s10865-016-9797-8
Peters, B., Heelan, K., & Abbey, B. (2013). Validation of omron
pedometers using MTI accelerometers for use with children.
International Journal of Exercise Science, 6(2), 106–113.
Pope, Z., & Gao, Z. (2017). Mobile device apps in enhancing physical
activity. In Z. Gao (Ed.), Technology in physical activity and promo-
tion (pp. 106–128). London, UK: Routledge Publisher.
Powell, K., Paluch, A., & Blair, S. (2011). Physical activity for health:
What kind? how much? how intense? on top of what? In J. Fielding,
R. Brownson, & L. Green (Eds.), Annual review of public health
(Vol. 32, pp. 349–365). Palo Alto, CA: Annual Reviews.
Ren, Q., Li, Z., & Liang, G. (2017). Comparison of active and passive
movement on treadmill in healthy individuals. Space Medicine &
Medical Engineering, 30(3), 185–190.
Sasaki, J., Hickey, A., Mavilia, M., Tedesco, J., John, D., Keadle, S., &
Freedson, P. (2015). Validation of the fitbit wireless activity tracker
for prediction of energy expenditure. Journal of Physical Activity
and Health, 12(2), 149–154. PubMed ID: 24770438 doi:10.1123/
jpah.2012-0495
Thomas, J., Nelson, J., & Silverman, S. (2011). Relationships among
variables. In J. Thomas, J. Nelson, & S. Silverman (Eds.), Research
methods in physical activity (pp. 125–144). Champaign, IL: Human
Kinetics.
TomTom. (2017). How calories are estimated on your watch. Retrieved
from http://uk.support.tomtom.com/app/answers/detail/a_id/19148/
~/how-calories-are-estimated-on-your-watch
U.S. Department of Health and Human Services. (2018). Physical activity
guidelines for Americans (2nd ed.). Washington, DC: Author.
World Medical Association. (2018). World medical association declara-
tion of Helsinki: Ethical principles for medical research involving
human subjects. Retrieved from https://www.wma.net/policies-post/
wma-declaration-of-helsinki-ethical-principles-for-medical-research-
involving-human-subjects/