
Abstract

Purpose: The purpose of this study was to assess the reliability of power output measurements of a Wahoo KICKR Power Trainer (KICKR) on two separate occasions, separated by fourteen months of regular use (~1 h per week). Methods: Using the KICKR to set power outputs, powers of 100-600W in increments of 50W were assessed at cadences of 80, 90 and 100rev.min(-1), which were controlled and validated by a dynamic calibration rig (CALRIG). Results: A small ratio bias of 1.002 (95%rLoA: 0.992-1.011) was observed over 100-600W at 80-100rev.min(-1) between Trial 1 and Trial 2. Similar ratio biases with acceptable limits of agreement were observed at 80rev.min(-1) (1.003 (95%rLoA: 0.987-1.018)), 90rev.min(-1) (1.000 (95%rLoA: 0.996-1.005)) and 100rev.min(-1) (1.002 (95%rLoA: 0.997-1.007)). The intraclass correlation coefficient (ICC) with 95% confidence interval (CI) for mean power (W) between trials was 1.00 (95%CI: 1.00-1.00), with a typical error (TE) of 3.1W (1.6%) observed between Trial 1 and Trial 2. Conclusion: When assessed at two separate time points fourteen months apart, the KICKR has acceptable reliability for combined power outputs of 100-600W at 80-100rev.min(-1), reporting overall small ratio biases with acceptable limits of agreement and low TE. Coaches and sports scientists should feel confident in the power output measured by the KICKR over an extended period of time when performing laboratory training and performance assessments.
Keywords: reproducibility, power, ergometry, training
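The ratio bias and 95% ratio limits of agreement quoted in the abstract are the kind of statistic obtained by log-transforming paired readings, as in the following minimal sketch. The function name and the power readings are hypothetical, and the study's exact computation is not given here, so this is an assumption about the general method rather than the authors' procedure.

```python
import math
import statistics

def ratio_limits_of_agreement(trial1, trial2):
    """Return (ratio bias, lower 95% limit, upper 95% limit) for paired readings."""
    # The difference of natural logs equals the log of the trial2/trial1 ratio.
    log_diffs = [math.log(b / a) for a, b in zip(trial1, trial2)]
    mean_log = statistics.mean(log_diffs)
    sd_log = statistics.stdev(log_diffs)
    # Back-transform the mean and the mean +/- 1.96 SD to the ratio scale.
    bias = math.exp(mean_log)
    lower = math.exp(mean_log - 1.96 * sd_log)
    upper = math.exp(mean_log + 1.96 * sd_log)
    return bias, lower, upper

# Hypothetical power readings (W) for the same set points in two trials.
trial1 = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
trial2 = [101.0, 199.0, 302.0, 401.0, 498.0, 603.0]

bias, lower, upper = ratio_limits_of_agreement(trial1, trial2)
print(f"ratio bias {bias:.3f} (95% rLoA: {lower:.3f}-{upper:.3f})")
```

A ratio bias of 1.002 would mean that Trial 2 readings were on average 0.2% higher than Trial 1 readings.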
... Since development of the laboratory-based 3MT in 2006, the availability of accurate and reliable power-measuring devices for use during indoor cycle training has increased dramatically (Hoon et al. 2016; Zadow et al. 2016, 2018). As the 3MT only requires accurate measurement of power output and time, it is theoretically possible that the 3MT could be used by athletes in remote settings using personal equipment without supervision for quantification of MMSS and W′. ...
... The sub-group of participants completing the validity phase of the study used either a Wahoo Kickr (N = 9, Wahoo Fitness, Atlanta, USA) or Tacx Neo 2 T (N = 1, Garmin®, KS, USA) to measure power output. The reliability and validity of the Wahoo Kickr have been established (Hoon et al. 2016; Zadow et al. 2016, 2018), whereas the Tacx Neo 2 T has, to our knowledge, not been validated in research; however, the data from this individual participant support credible reliability within this study (within-subject coefficient of variation for V̇O2 during 5-8 min of the standardised warm-up in the validity trials, 2.5%). ...
Article
Purpose: The three-minute all-out test (3MT), when performed on a laboratory ergometer in a linear mode, can be used to estimate the heavy-severe-intensity transition, or maximum metabolic steady state (MMSS), using the end-test power output. As the 3MT only requires accurate measurement of power output and time, it is possible the 3MT could be used in remote settings using personal equipment without supervision for quantification of MMSS. Methods: The aim of the present investigation was to determine the reliability and validity of remotely performed 3MTs (3MTR) for estimation of MMSS. Accordingly, 53 trained cyclists and triathletes were recruited to perform one familiarisation and two experimental 3MTR trials to determine its reliability. A sub-group (N = 10) was recruited to perform three-to-five 30 min laboratory-based constant-work rate trials following completion of one familiarisation and two experimental 3MTR trials. Expired gases were collected throughout constant-work rate trials and blood lactate concentration was measured at 10 and 30 min to determine the highest power output at which steady-state V̇O2 (MMSS-V̇O2) and blood lactate (MMSS-[La-]) were achieved. Results: The 3MTR end-test power (EPremote) was reliable (coefficient of variation, 4.5% [95% confidence limits, 3.7, 5.5%]), but overestimated MMSS (EPremote, 283 ± 51 W; MMSS-V̇O2, 241 ± 46 W, P = 0.0003; MMSS-[La-], 237 ± 47 W, P = 0.0003). This may have been due to failure to deplete the finite work capacity above MMSS during the 3MTR. Conclusion: These results suggest that the 3MTR should not be used to estimate MMSS in endurance-trained cyclists.
... Portable power meter devices overcome important drawbacks of laboratory testing, allowing the use of cyclists' own bicycles, so that decisive metrics such as the crank width (Q-factor), crank length, and geometry-related variables are replicated in the test [3]. Commercial indoor stationary cycle trainers, cycling treadmills, or rollers are a valid and reliable alternative to recreate outdoor cycling conditions, both for testing [4][5][6] and training [7]. While these tools simulate outdoor cycling, they do not allow recording in real outdoor environments (e.g., missing air drag and downhill sections or increasing dehydration), which may alter the metrics [8,9] and limit the applicability of the results to real-life situations. ...
Article
This study aimed to examine the validity and reliability of the recently developed Assioma Favero pedals under laboratory cycling conditions. In total, 12 well-trained male cyclists and triathletes (VO2max = 65.7 ± 8.7 mL·kg−1·min−1) completed five cycling tests including graded exercise tests (GXT) at different cadences (70–100 revolutions per minute, rpm), workloads (100–650 Watts, W), pedaling positions (seated and standing), vibration stress (20–40 Hz), and an 8-s maximal sprint. Tests were completed using a calibrated direct drive indoor trainer for the standing, seated, and vibration GXTs, and a friction belt cycle ergometer for the high-workload step protocol. Power output (PO) and cadence were collected from three different brand-new pedal units against the gold-standard SRM crankset. The three units of the Assioma Favero exhibited very high within-test reliability and an extremely high agreement between 100 and 250 W, compared to the gold standard (Standard Error of Measurement, SEM from 2.3–6.4 W). Greater PO produced a significant underestimating trend (p < 0.05, Effect size, ES ≥ 0.22), with pedals showing systematically lower PO than SRM (1–3%) but producing low bias for all GXT tests and conditions (1.5–7.4 W). Furthermore, vibrations ≥ 30 Hz significantly increased the differences up to 4% (p < 0.05, ES ≥ 0.24), whereas peak and mean PO differed importantly between devices during the sprints (p < 0.03, ES ≥ 0.39). These results demonstrate that the Assioma Favero power meter pedals provide trustworthy PO readings from 100 to 650 W, in either seated or standing positions, with vibrations between 20 and 40 Hz at cadences of 70, 85, and 100 rpm, or even at a freely chosen cadence.
... All cyclists used the same bike (2017 Roubaix One. 3 size 56; Fuji, Taichung, Taiwan) mounted on a cycle ergometer (KICKR; Wahoo Fitness, Atlanta, GA) considered to be valid and reliable. 22,23 Saddle position was individually adjusted, and measures were noted for replication. The bike was equipped with a crank-based power meter (SRAM S975; SRM, Jülich, Germany), from which power output and cadence were recorded. ...
Article
Purpose: Maximal oxygen uptake (V˙O2max) is a key determinant of endurance performance. Therefore, devising high-intensity interval training (HIIT) that maximizes stress of the oxygen-transport and -utilization systems may be important to stimulate further adaptation in athletes. The authors compared physiological and perceptual responses elicited by work intervals matched for duration and mean power output but differing in power-output distribution. Methods: Fourteen cyclists (V˙O2max 69.2 [6.6] mL·kg-1·min-1) completed 3 laboratory visits for a performance assessment and 2 HIIT sessions using either varied-intensity or constant-intensity work intervals. Results: Cyclists spent more time at >90%V˙O2max during HIIT with varied-intensity work intervals (410 [207] vs 286 [162] s, P = .02), but there were no differences between sessions in heart-rate- or perceptual-based training-load metrics (all P ≥ .1). When considering individual work intervals, minute ventilation (V˙E) was higher in the varied-intensity mode (F = 8.42, P = .01), but not respiratory frequency, tidal volume, blood lactate concentration [La], ratings of perceived exertion, or cadence (all F ≤ 3.50, P ≥ .08). Absolute changes (Δ) between HIIT sessions were calculated per work interval, and Δ total oxygen uptake was moderately associated with ΔV˙E (r = .36, P = .002). Conclusions: In comparison with an HIIT session with constant-intensity work intervals, well-trained cyclists sustain higher fractions of V˙O2max when work intervals involve power-output variations. This effect is partially mediated by an increased oxygen cost of hyperpnea and not associated with a higher [La], perceived exertion, or training-load metrics.
... After conclusion of the resting measure, subjects performed a five-minute, self-selected warm-up on their stationary bike. Athletes performed bicycle VO2max testing with their racing bicycles attached to a stationary ergometer (Zadow et al. 2018). Wattage was controlled via Bluetooth. ...
Thesis
PURPOSE To assess heart rate variability (HRV) as a sensitive marker to exercise training, health and fitness, acute exercise, and prolonged altitude exposure. METHODS 10-minute resting HRV was assessed in cohorts of: Miami University intercollegiate athletes and healthy, active controls; endurance-trained cyclists and healthy, active controls; & novice college-aged trekkers and experienced middle-aged (46±12yrs) Nepali trekking guides. HRV was compared with health biomarkers and between athlete types and controls. Cyclists and controls underwent a maximal exercise bout, and recovery HRV was calculated. HRV was assessed on novice trekkers at 3 altitudes and compared to Nepali guides. RESULTS Athlete type had no effect on HRV, while race did, and HRV correlated with health biomarkers. Cyclists experienced an improved HRV recovery post-exercise that was predicted by higher VO2max compared to healthy controls. Resting HRV decreased from baseline at 1900m altitude, but increased back to baseline at 4500m, where students had twofold higher HRV than experienced Nepali guides. This response was predicted by resting HR collected before ascent at 300m. CONCLUSION HRV was a sensitive marker to exercise training, health and fitness, acute exercise, and prolonged altitude exposure, but further confirmatory and more powerful research is needed.
... Thus, for a cycle trainer or power meter to be useful in a research setting it must have similar qualities of measurement. Different researchers have tested the validity of other mobile ergometers such as Tacx Fortius (Peiffer and Losco, 2011), KICKR Power Trainer (Zadow et al., 2017; Zadow et al., 2016), LeMond Revolution (Novak et al., 2015), and Elite Axiom Powertrain (Bertucci et al., 2005a), as well as other mobile power meters, including PowerTap Hub (Bertucci et al., 2005b; Bouillod et al., 2016; Gardner et al., 2004) and Garmin Vector (Bouillod et al., 2016; Nimmerichter et al., 2017; Novak and Dascombe, 2016). It should be noted that the SRM, as the reference power meter, is also affected by some measurement error. ...
Article
To validate the new PowerTap P1® pedals power meter (PP1), thirty-three cyclists performed 12 randomized and counterbalanced graded exercise tests (100–500 W), at 70, 85 and 100 rev·min-1 cadences, in seated and standing positions. A scientific SRM system and a pair of PP1 pedals continuously recorded cadence and power output data. Significantly lower power output values were detected for the PP1 compared to the SRM for all workloads, cadences, and pedalling conditions (2–10 W, p < 0.05), except for workloads ranging from 150 W to 350 W at 70 rev·min-1 in the seated position (p > 0.05). Strong Spearman’s correlation coefficients were found between the power output values recorded by both power meters in the seated position, independently of the cadence condition (rho ≥ 0.987), although slightly lower concordance was found for the standing position (rho = 0.927). The mean errors for power output values were 1.2%, 2.7% and 3.5% for 70, 85 and 100 rev·min-1, respectively. Bland-Altman analysis revealed that the PP1 pedals underestimate the power output data obtained by the SRM device in direct proportion to the cyclist’s cadence (from -2.4 W to -7.3 W, rho = 0.999). High absolute reliability values were detected in the PP1 pedals (150–500 W; CV = 2.3%; SEM < 1.0 W). This new portable power meter is a valid and reliable device to measure power output in cyclists and triathletes for assessment, training and competition using their own bicycle, although caution should be exercised in the interpretation of the results due to the slight power output underestimation of the PP1 pedals when compared to the SRM system and its dependence on both pedalling cadence and cyclist’s position (standing vs. seated).
Article
This study aimed to analyse the reproducibility of mean power output during 20-min cycling time-trials, in a remote home-based setting, using the virtual-reality cycling software, Zwift. Forty-four cyclists (11 women, 33 men; 37 ± 8 years old, 180 ± 8 cm, 80.1 ± 13.2 kg) performed 3 x 20-min time-trials on Zwift, using their own setup. Intra-class correlation coefficient (ICC), coefficient of variation (CV) and typical error (TE) were calculated for the overall sample, split into 4 performance groups based on mean relative power output (25% quartiles) and sex. Mean ICC, TE and CV of mean power output between time-trials were 0.97 [0.95-0.98], 9.36 W [8.02-11.28 W], and 3.7% [3.2-4.5], respectively. Women and men had similar outcomes (ICC: 0.96 [0.89-0.99] vs 0.96 [0.92-0.98]; TE: 8.30 W [6.25-13.10] vs. 9.72 W [8.20-12.23]; CV: 3.8% [2.9-6.1] vs. 3.7% [3.1-4.7], respectively), although cyclists from the first quartile showed a lower CV in comparison to the overall sample (Q1: 2.6% [1.9-4.1] vs. overall: 3.7% [3.2-4.5]). Our results indicate that power output during 20-minute cycling time-trials on Zwift is reproducible and provide sports scientists, coaches and athletes with benchmark values for future interventions in a virtual-reality environment.
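Typical error and its coefficient of variation can be derived from paired trials as sketched below. This is a two-trial simplification with hypothetical function names and power values; the study above used three trials and its exact computation is not stated here, so the sketch only illustrates the commonly used definitions (SD of the difference scores divided by √2, and the same on log-transformed data for the CV).

```python
import math
import statistics

def typical_error(trial1, trial2):
    """Typical error in raw units: SD of the difference scores divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / math.sqrt(2)

def typical_error_cv(trial1, trial2):
    """Typical error as a percentage (CV), computed from log-transformed data."""
    log_diffs = [math.log(b / a) for a, b in zip(trial1, trial2)]
    te_log = statistics.stdev(log_diffs) / math.sqrt(2)
    # Back-transform the log-scale error to a percentage.
    return 100.0 * (math.exp(te_log) - 1.0)

# Hypothetical mean power (W) of four cyclists in two 20-min time-trials.
trial1 = [250.0, 300.0, 280.0, 320.0]
trial2 = [255.0, 295.0, 290.0, 318.0]

te = typical_error(trial1, trial2)
cv = typical_error_cv(trial1, trial2)
print(f"TE = {te:.2f} W, CV = {cv:.2f}%")
```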
Article
A large number of power meters have become commercially available during the last decades to provide power output (PO) measurement. Some of these power meters have been evaluated for validity in the literature. This study aimed to review the available literature on the validity of cycling power meters. PubMed, SPORTDiscus, and Google Scholar were searched following PRISMA methodology. A total of 74 studies were extracted for the reviewing process. Validity is a general quality of the measurement determined by the assessment of different metrological properties: accuracy, sensitivity, repeatability, reproducibility, and robustness. Accuracy was the most often studied metrological property (74 studies). Reproducibility was the second most studied property (40 studies). Finally, repeatability, sensitivity, and robustness were considerably less studied, with only 7, 5, and 5 studies, respectively. The SRM power meter is the most used as a gold standard in the studies. Moreover, the number of participants differed widely among studies, from 0 (when using a calibration rig) to 56 participants. The PO tested was up to 1700 W, whereas the pedalling cadence ranged between 40 and 180 rpm, including submaximal and maximal exercises. Other exercise conditions were tested, such as torque, position, temperature, and vibrations. This review provides some caveats and recommendations when testing the validity of a cycling power meter, including all of the metrological properties (accuracy, sensitivity, repeatability, reproducibility, and robustness) and some exercise conditions (PO range, sprint, pedalling cadence, torque, position, participant, temperature, vibration, and field test).
Thesis
The doctoral thesis presented in this document is structured in three parts. The first part comprises studies I and II, which detail the validation of two cycling workload tools: the Cycleops Hammer direct drive indoor trainer and the PowerTap P1 pedals power meter. In both studies, randomized and counterbalanced incremental workload tests (100-500 W) were performed at cadences of 70, 85 and 100 rev·min-1, in seated and standing positions, using 3 different Hammer units. The results were then compared against the values measured by a professional SRM crankset. In general terms, no significant differences were detected between the Hammer devices and the SRM, while strong intraclass correlation coefficients were observed (≥0.996; p=0.001), with low bias (-5.5 to 3.8) and high values of absolute reproducibility (CV<1.2%, SEM<2.1). The PowerTap P1 pedals showed strong correlation coefficients in a seated position (rho ≥ 0.987). They underestimated the power output in direct proportion to the cadence, with an average error of 1.2%, 2.7% and 3.5% for 70, 85 and 100 rev·min-1, respectively. However, they showed high absolute reproducibility values (150-500 W, CV = 2.3%, SEM < 1.0 W). These results indicate that both are valid and reproducible devices to measure power output in cycling, although caution should be exercised in the interpretation of the results due to the slight underestimation. The second part of the thesis is devoted to study III, in which the time to exhaustion (TTE) at the workloads related to the main events of the aerobic and anaerobic pathways in cycling was analysed in duplicate in a randomized and counterbalanced manner: lactic anaerobic capacity (WAnTmean), the workload that elicits VO2max (MAP), the second ventilatory threshold (VT2) and the maximal lactate steady state (MLSS). TTE values were 00:28±00:07, 03:27±00:40, 11:03±04:45 and 76:35±12:27 mm:ss, respectively.
Moderate between-subject reproducibility values were found (CV = 22.2%, 19.3%, 43.1% and 16.3%), although low within-subject variability was found (CV = 7.6%, 6.9%, 7.0% and 5.4%). According to these results, the %MAP at which the physiological events occurred seems to be a useful covariate for predicting each TTE for training or competition purposes. Finally, the third part of the work includes the results of studies IV and V. The validity of two different methods to estimate the cyclists’ workload at MLSS was evaluated. The first method was a 20 min time trial test (20TT), while the second method was a one-day incremental protocol including 4 steps of 10 minutes (1day_MLSS). The absolute reproducibility of the 20TT test, performed in duplicate, was very high (CV = -0.3±2.2%, ICC = 0.966, bias = 0.7±6.3 W). 95% of the mean 20TT workload overestimated the MLSS (bias 12.3±6.1 W). In contrast, 91% of 20TT showed an accurate prediction of MLSS (bias 1.2±6.1 W), although the regression equation "MLSS (W) = 0.7489 * 20TT (W) + 43.203" gave even better MLSS estimates (bias 0.1±5.0 W). For the 1day_MLSS test, the physiological steady state was determined as the highest workload that could be maintained with a [Lact] rise lower than 1 mmol·L-1. No significant differences were detected between the MLSS (247±22 W) and the main construct of the test (DIF_10to10) (245±23 W), in which the difference in [Lact] between minute 10 of two consecutive steps was considered, with high correlations (ICC = 0.960), low bias (2.2 W) and high within-subject reliability (ICC = 0.846, CV = 0.4%, bias = 2.2±6.4 W). Both methods proved to be valid predictors of the MLSS, significantly reducing the requirements needed to determine this specific intensity individually.
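The two 20TT-based estimates quoted in this abstract (the regression equation and the 91% fraction) can be applied as in the short sketch below. The function names and the 270 W input are hypothetical; the coefficients are taken directly from the abstract and apply only to the population studied in that thesis.

```python
def mlss_from_regression(mean_20tt_power_w):
    """MLSS (W) = 0.7489 * 20TT (W) + 43.203, as quoted in the abstract."""
    return 0.7489 * mean_20tt_power_w + 43.203

def mlss_from_fraction(mean_20tt_power_w, fraction=0.91):
    """MLSS estimated as a fixed fraction (91% by default) of mean 20TT power."""
    return fraction * mean_20tt_power_w

# Hypothetical cyclist averaging 270 W in the 20-min time trial.
power_20tt = 270.0
print(f"regression estimate: {mlss_from_regression(power_20tt):.1f} W")
print(f"91% estimate:        {mlss_from_fraction(power_20tt):.1f} W")
```

For this input the two estimates differ by well under 1 W, which is in line with the small biases reported for both approaches.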
Article
Purpose: To (1) evaluate agreement between the PowerTap P1 (P1) pedals and the Lode Excalibur Sport cycle ergometer, (2) investigate the reliability of the P1 pedals between repeated testing sessions, and (3) compare the reliability and validity of the P1 pedals before (P1₀) and after (P1₁₀₀) ∼100 h of use. Methods: Ten participants completed four 5-min submaximal cycling bouts (100, 150, 200, and 250 W), a 2-min time trial, and two 10-s all-out sprints on 2 occasions. This protocol was repeated after 15 mo and ∼100 h of use. Results: Significant differences were seen between the P1₀ pedals and the Lode Excalibur Sport at 100 W (P = .006), 150 W (P = .006), 200 W (P = .001), and 250 W (P = .006) and during the all-out sprints (P = .020). After ∼100 h of use, the P1₁₀₀ pedals did not significantly differ from the Lode Excalibur Sport at 100 W (P = .799), 150 W (P = .183), 200 W (P = .289), and 250 W (P = .183), during the 2-min time trial (P = .583), or during the all-out sprints (P = .412). The coefficients of variation for the P1₀ and P1₁₀₀ ranged from 0.6% to 1.3% and 0.5% to 2.0%, respectively, during the submaximal cycling bouts. Conclusion: The P1 pedals provide valid data after ∼100 h of laboratory use. Furthermore, the pedals provide reliable data during submaximal cycling, even after prolonged use.
Article
The purpose of the present study was to evaluate the reliability of a laboratory-based 4 km cycling time trial using a Wahoo KICKR Power Trainer. Twelve trained male cyclists (age: 34.0 ± 6.5 years; height: 1.78 ± 0.62 m; training per week: 11.9 ± 2.6 hours) completed three 4 km time trials on the Wahoo KICKR Power Trainer, with each time trial separated by a minimum of two days. During all time trials, mean power (W), cadence (rpm), speed (km.h-1), heart rate (bpm) and total time (s) were recorded, with rating of perceived exertion (6-20) collected immediately post time trial. Average intraclass correlation coefficients (ICC) between time trials (2v1, 3v2, 3v1) were 0.94 (95%CI: 0.85-0.98) for power, 0.73 (95%CI: 0.46-0.90) for cadence, 0.54 (95%CI: 0.22-0.82) for speed, 0.93 (95%CI: 0.84-0.98) for heart rate and 0.64 (95%CI: 0.34-0.86) for total time. Mean reliability expressed as the coefficient of variation (CV) and typical error of measurement over the three time trials was 3.4%, 5.2%, 4.2%, 1.6% and 4.3% for power, cadence, speed, heart rate and total time, respectively. Average power measured during a laboratory-based 4 km cycling time trial is highly reliable in trained cyclists, making it a reliable method for monitoring cycling performance; however, caution should be applied when assessing cadence, speed and total time due to the larger typical errors when completed on the Wahoo KICKR Power Trainer.
Article
The purpose of this study was to assess the validity and reliability of the Wattbike cycle ergometer against the SRM Powermeter using a dynamic calibration rig (CALRIG) and trained and untrained human participants. Using the CALRIG, power outputs of 50-1250 W were assessed at cadences of 70 and 90 rev x min(-1). Validity and reliability data were also obtained from 3 repeated trials in both trained and untrained populations. 4 work rates were used during each trial ranging from 50-300 W. CALRIG data demonstrated significant differences (P<0.05) between SRM and Wattbike across the work rates at both cadences. Significant differences existed in recorded power outputs from the SRM and Wattbike during steady state trials (power outputs 50-300 W) in both human populations (156±72 W vs. 153±64 W for SRM and Wattbike respectively; P<0.05). The reliability (CV) of the Wattbike in the untrained population was 6.7% (95%CI 4.8-13.2%) compared to 2.2% with the SRM (95%CI 1.5-4.1%). In the trained population the Wattbike CV was 2.6% (95%CI 1.8-5.1%) compared to 1.1% with the SRM (95%CI 0.7-2.0%). These results suggest that when compared to the SRM, the Wattbike has acceptable accuracy. Reliability data suggest coaches and cyclists may need to use some caution when using the Wattbike at low power outputs in a test-retest setting.
Article
Minimal measurement error (reliability) during the collection of interval- and ratio-type data is critically important to sports medicine research. The main components of measurement error are systematic bias (e.g. general learning or fatigue effects on the tests) and random error due to biological or mechanical variation. Both error components should be meaningfully quantified for the sports physician to relate the described error to judgements regarding 'analytical goals' (the requirements of the measurement tool for effective practical use) rather than the statistical significance of any reliability indicators. Methods based on correlation coefficients and regression provide an indication of 'relative reliability'. Since these methods are highly influenced by the range of measured values, researchers should be cautious in: (i) concluding acceptable relative reliability even if a correlation is above 0.9; (ii) extrapolating the results of a test-retest correlation to a new sample of individuals involved in an experiment; and (iii) comparing test-retest correlations between different reliability studies. Methods used to describe 'absolute reliability' include the standard error of measurements (SEM), coefficient of variation (CV) and limits of agreement (LOA). These statistics are more appropriate for comparing reliability between different measurement tools in different studies. They can be used in multiple retest studies from ANOVA procedures, help predict the magnitude of a 'real' change in individual athletes and be employed to estimate statistical power for a repeated-measures experiment. These methods vary considerably in the way they are calculated and their use also assumes the presence (CV) or absence (SEM) of heteroscedasticity. Most methods of calculating SEM and CV represent approximately 68% of the error that is actually present in the repeated measurements for the 'average' individual in the sample. 
LOA represent the test-retest differences for 95% of a population. The associated Bland-Altman plot shows the measurement error schematically and helps to identify the presence of heteroscedasticity. If there is evidence of heteroscedasticity or non-normality, one should logarithmically transform the data and quote the bias and random error as ratios. This allows simple comparisons of reliability across different measurement tools. It is recommended that sports clinicians and researchers should cite and interpret a number of statistical methods for assessing reliability. We encourage the inclusion of the LOA method, especially the exploration of heteroscedasticity that is inherent in this analysis. We also stress the importance of relating the results of any reliability statistic to 'analytical goals' in sports medicine.
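One simple way to explore heteroscedasticity before choosing between absolute and ratio limits of agreement is to correlate the absolute test-retest differences with the individual means, as in this sketch. The function names and data are hypothetical, and what counts as a meaningful correlation is a judgement call that the code does not make.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient computed from first principles."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def heteroscedasticity_r(test, retest):
    """Correlate absolute test-retest differences with the individual means."""
    abs_diffs = [abs(b - a) for a, b in zip(test, retest)]
    means = [(a + b) / 2 for a, b in zip(test, retest)]
    return pearson_r(means, abs_diffs)

# Hypothetical data in which the error grows with the measured value.
test = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
retest = [101.0, 203.0, 296.0, 406.0, 492.0, 610.0]

r = heteroscedasticity_r(test, retest)
print(f"r between |difference| and mean = {r:.2f}")
```

A clearly positive correlation indicates that the error scales with the size of the measurement, which is the situation in which log transformation and ratio statistics are recommended.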
Article
The reliability of power in tests of physical performance affects the precision of assessment of athletes, patients, clients and study participants. In this meta-analytic review we identify the most reliable measures of power and the factors affecting reliability. Our measures of reliability were the typical (standard) error of measurement expressed as a coefficient of variation (CV) and the percent change in the mean between trials. We meta-analysed these measures for power or work from 101 studies of healthy adults. Measures and tests with the smallest CV in exercise of a given duration include field tests of sprint running (approximately 0.9%), peak power in an incremental test on a treadmill or cycle ergometer (approximately 0.9%), equivalent mean power in a constant-power test lasting 1 minute to 3 hours on a treadmill or cycle ergometer (0.9 to 2.0%), lactate-threshold power (approximately 1.5%), and jump height or distance (approximately 2.0%). The CV for mean power on isokinetic ergometers was relatively large (> 4%). CVs were larger for nonathletes versus athletes (1.3 x), female versus male nonathletes (1.4 x), shorter (approximately 1-second) and longer (approximately 1-hour) versus 1-minute tests (≤ 1.6 x), and respiratory- versus ergometer-based measures of power (1.4 to 1.6 x). There was no clear-cut effect of time between trials. The importance of a practice trial was evident in studies with > 2 trials: the CV between the first 2 trials was 1.3 times the CV between subsequent trials; performance also improved by 1.2% between the first 2 trials but by only 0.2% between subsequent trials. These findings should help exercise practitioners and researchers select or design good measures and protocols for tests of physical performance.
Article
Mobile power meters provide a valid means of measuring cyclists’ power output in the field. These field measurements can be performed with very good accuracy and reliability making the power meter a useful tool for monitoring and evaluating training and race demands. This review presents power meter data from a Grand Tour cyclist’s training and racing and explores the inherent complications created by its stochastic nature. Simple summary methods cannot reflect a session’s variable distribution of power output or indicate its likely metabolic stress. Binning power output data, into training zones for example, provides information on the detail but not the length of efforts within a session. An alternative approach is to track changes in cyclists’ modelled training and racing performances. Both critical power and record power profiles have been used for monitoring training-induced changes in this manner. Due to the inadequacy of current methods, the review highlights the need for new methods to be established which quantify the effects of training loads and models their implications for performance.
Purpose: The purpose of this study was to assess the validity of power output settings of the Wahoo KICKR Power Trainer (KICKR) using a dynamic calibration rig (CALRIG) over a range of power outputs and cadences. Methods: Using the KICKR to set power outputs, powers of 100-999W were assessed at cadences (controlled by the CALRIG) of 80, 90, 100, 110 and 120rpm. Results: The KICKR displayed accurate measurements of power between 250-700W at cadences of 80-120rpm with a bias of -1.1% (95%LoA: -3.6-1.4%). A larger mean bias in power was observed across the full range of power tested, 100-999W (4.2%, 95%LoA: -20.1-28.6%), due to larger biases between 100-200W and 750-999W (4.5%, 95%LoA: -2.3-11.3% and 13.0%, 95%LoA: -24.4-50.3%, respectively). Conclusion: When compared to a CALRIG, the Wahoo KICKR Power Trainer has acceptable accuracy, reporting a small mean bias and narrow limits of agreement in the measurement of power output between 250-700W at cadences of 80-120rpm. Caution should be applied by coaches and sports scientists when using the KICKR at power outputs of <200W and >750W due to the greater variability in recorded power.
In clinical measurement, comparison of a new measurement technique with an established one is often needed to see whether they agree sufficiently for the new to replace the old. Such investigations are often analysed inappropriately, notably by using correlation coefficients. The use of correlation is misleading. An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.
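The "simple calculations" behind this kind of agreement analysis are commonly summarised as a mean difference (bias) with 95% limits of agreement. The sketch below assumes the usual formulation, bias ± 1.96 × SD of the paired differences; the function name and the paired power readings are hypothetical and do not come from any of the studies listed here.

```python
import statistics


def limits_of_agreement(method_a, method_b):
    """Return (bias, lower, upper) for paired measurements.

    bias is the mean of the paired differences (a - b); the limits are
    bias +/- 1.96 times the sample standard deviation of those differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up paired readings in watts: a device under test and a reference rig.
device_w = [101.0, 149.0, 203.0, 251.0, 298.0]
reference_w = [100.0, 150.0, 200.0, 250.0, 300.0]

bias, lower, upper = limits_of_agreement(device_w, reference_w)
print(f"bias = {bias:.2f} W, 95% LoA = {lower:.2f} to {upper:.2f} W")
```

Unlike a correlation coefficient, these three numbers are in the units of the measurement, so they can be compared directly against the disagreement a practitioner is prepared to tolerate.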
The purpose of this study was to determine the accuracy of the Velotron cycle ergometer and the SRM power meter using a dynamic calibration rig over a range of exercise protocols commonly applied in laboratory settings. These trials included two sustained constant power trials (250 W and 414 W), two incremental power trials and three high-intensity interval power trials. To further compare the two systems, 15 subjects performed three dynamic 30 km performance time trials. The Velotron and SRM displayed accurate measurements of power during both constant power trials (<1% error). However, during high-intensity interval trials the Velotron and SRM were found to be less accurate (3.0%, CI = 1.6 to 4.5% and -2.6%, CI = -3.2 to -2.0% error, respectively). During the dynamic 30 km time trials, power measured by the Velotron was 3.7 ± 1.9% (CI = 2.9 to 4.8%) greater than that measured by the SRM. In conclusion, the accuracy of the Velotron cycle ergometer and the SRM power meter appears to be dependent on the type of test being performed. Furthermore, as each power monitoring system measures power at various positions (i.e. bottom bracket vs. rear wheel), caution should be taken when comparing power across the two systems, particularly when power is variable.
In this study we measured the accuracy of the following types of cycle ergometer against the criterion of a dynamic calibration rig (DCR): 35 friction-braked (Monark), 5 research-grade air-braked (Repco) and 5 electromagnetically braked (2 Siemens, 1 Elema-Schonander, 1 Ergoline, 1 Warren E. Collins). Monark ergometer power outputs over the range 58.9-353.2 W significantly (P < 0.001) underestimated those registered by the DCR with mean accuracies of 91.7-97.8%. The least accurate individual reading for each of the six up-scale (0-353.2 W) power outputs ranged from 81.6 to 91.6%; corresponding down-scale (353.2-0 W) accuracies were 85.1-92.5%. A hysteresis effect was furthermore evident for this ergometer in that up-scale measurements were significantly (P < 0.05) greater than down-scale ones. In addition, when the oldest [mean (SD): 11.3 (2.3) years old] and newest [1.4 (0.8) years old] eight ergometers were compared, the latter were significantly (P < 0.05) more accurate over the range 117.7-294.3 W. Apart from the two lowest power outputs of 47 W (62.2-96.0% accuracy) and 127 W (88.0-97.7% accuracy), the individual up-scale and down-scale accuracies of the Repco ergometers ranged from 98.0 to 104.2% for power outputs of 272.7-1137.8 W and the means were not significantly different from those of the DCR. There was also no evidence of hysteresis. Except for the initial power output of 50 W (40 rev/min: 83.8-99.2% accuracy; 60 rev/min: 93.2-122.6% accuracy), the individual accuracies of the electromagnetically braked ergometers ranged from 89.3 to 101.4% over the up-scale range of 100-400 W, and none of the means were significantly different from those of the DCR. The variability of individual errors for the preceding data emphasises that all cycle ergometers should be validated against the criterion of a DCR if accurate power outputs are required.