Enhancing specificity in proxy-design for the assessment of
bioenergetics (1st heading in importance)
A.D. Flouris1, Y. Koutedakis2/3, A. Nevill2, G.S. Metsios2, G. Tsiotra2, Y. Parasiris3
1Faculty of Applied Health Sciences, Brock University, St. Catharines, Canada;
2School of Sport, Performing Arts and Leisure, Wolverhampton University, Walsall, UK;
3Department of Physical Education and Sports Sciences, University of Thessaly, Trikala, Greece;
Submission date: 18/07/2003
Key words: VO2max, field test, validity, musculature.
Narrative text word count: 2034 words
Address of correspondence:
Andreas D. Flouris
Faculty of Applied Health Sciences
Academic South Building, Suite 335
500 Glenridge Avenue, St. Catharines
Ontario, L2S 3A1, Canada
Tel: +1 905 688 5550 ext. 3882
Fax: +1 905 688 8954
The purpose of this study was to examine the hypothesis that improved prediction of
bioenergetics may be achieved when proxies are designed to closely simulate gold
standard laboratory protocols. To accomplish this, a modified ‘square’ variation (SST) of
the classical 20m Multistage Shuttle Run Test (MST) was designed to reduce the
stopping, turning, and side-stepping manoeuvres. Within two weeks, 50 male volunteers
(age 21.5±1.6, BMI 24.4±2.2) underwent, in random order, three maximal oxygen uptake
(VO2max) assessments using a treadmill test (TT), the SST and MST. To assess SST
reproducibility, 10 randomly-selected subjects performed the test twice. Validity results
showed that mean predicted VO2max from SST was not significantly different compared
to TT (p>0.05). In contrast, the equivalent VO2max value from MST was significantly
higher (p<0.001) than TT. Furthermore, TT VO2max correlated with the SST and MST predictions at
r=0.88 (p<0.001) and r=0.61 (p<0.05), respectively. The ‘95% limits of agreement’
(LIMAG) for SST and MST indicated a range of error equal to -0.5±5.4 and
8.1±8.0 (ml·kg⁻¹·min⁻¹) with a coefficient of variation of ±6 and ±8.2%, respectively.
Test-retest results for SST revealed no mean difference in VO2max (p>0.05) and a
correlation coefficient of r=0.98 (p<0.001), while LIMAG demonstrated a range of error
equal to -0.2±2.6 (ml·kg⁻¹·min⁻¹) with a coefficient of variation of ±5.6%. It is concluded
that, compared to MST, the SST had a higher agreement with TT. The latter may well be
explained by the closer simulation in bioenergetics between the two protocols (i.e. the
continuous nature of SST provides a closer proxy of TT).
Abstract word count: 249
Introduction (2nd heading in importance)
Field assessment of bioenergetics [namely maximal oxygen uptake (VO2max)] with
minimal equipment and cost is of continuing interest to many researchers seeking
information on cardiorespiratory elements associated with health-related fitness and
performance enhancement (1-3). Although voluminous literature has appeared about the
attributes of this approach (4, 5), it remains curtailed mainly because the majority of
proxies represent field measures designed to predict laboratory bioenergetics which, in
turn, are used to provide information on ‘field performance’. It seems, therefore, that
minor methodological flaws in proxy-design may have significant impact on assessing
cardiorespiratory fitness and/or performance.
The majority of proxies assessing bioenergetics utilize various exercise protocols
and powerful statistical tools in order to link specific field-performance indices (e.g.
velocity, time, heart rate) with VO2max measured, usually, during laboratory treadmill
running (2, 6). However, it seems reasonable to suggest that prediction power may be
limited when physiological and/or biomechanical disparities exist between the proxy and the
gold standard laboratory test. Lack of specificity in factors such as
intensity, duration, exercise mode, technique and, particularly, musculature employed
may account for significant performance differences between the proxy and the gold
standard. This may explain the reduced precision frequently reported in relation to field-
testing (3, 7, 8).
The 20m multistage shuttle run test (MST) (6), a widely-used proxy-assessment
of treadmill VO2max, incorporates stopping, turning and side-stepping at the end of each
20-meter shuttle. However, such manoeuvres may considerably increase net muscle
activation compared to steady-state forward running (9). Since energy utilization depends
largely on the muscle mass being employed (10), variations between musculature
activated during the MST and the treadmill test will probably result in performance
discrepancies. Conversely, it seems reasonable to suggest that improved prediction of
bioenergetics may be achieved when proxies closely simulate the laboratory protocols.
Therefore, the main purpose of this study was to examine the effects of minimized
stopping, turning and side-stepping manoeuvres on MST precision. To achieve this, a
modified ‘square’ version of the MST was devised to incorporate a reduced turning angle
– thus resembling more the actions of forward treadmill running.
Methods and procedures (2nd heading in importance)
Subjects (3rd heading in importance)
Fifty adult males volunteered to participate in the study. The subjects were recreational
athletes, not specialized in a particular sport. For the purpose of data analysis subjects
were randomly assigned to either the model (n=40) or the validation (n=10) group.
Anthropometrical data appear in Table 1. Exclusion criteria included smoking and any
benign medical history. Written informed consent was obtained from all subjects after
full explanation of the procedures involved. This study received approval from the
Research Ethics Board of the University of Thessaly.
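The random split into model and validation groups described above can be sketched as follows. This is only an illustration: the manuscript does not state how the randomisation was carried out, so the function name `assign_groups`, the subject numbering 1 to 50 and the seed are assumptions.

```python
import random

def assign_groups(subject_ids, n_model=40, seed=None):
    """Randomly split subjects into a model group and a validation group."""
    rng = random.Random(seed)          # seeded generator so the split can be repeated
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    model = sorted(shuffled[:n_model])       # first n_model subjects after shuffling
    validation = sorted(shuffled[n_model:])  # remaining subjects
    return model, validation

# Hypothetical subject numbers 1..50; the seed value is arbitrary.
model, validation = assign_groups(range(1, 51), n_model=40, seed=2003)
print(len(model), len(validation))
```

Fixing the seed makes the allocation reproducible, which helps when the same split has to be reported or audited later.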
Each participant visited the data collection sites on three different occasions
within a 14-day period. One visit was reserved for the laboratory assessment of VO2max,
while field-testing [i.e., the ‘square’ variation (SST) and the classic MST] was conducted
in the same rubber-floored gymnasium during the two remaining occasions. To assess
whether the SST was reproducible, the validation group performed this test twice, seven
days apart. Prior to data collection visits, subjects were familiarised with all assessment
protocols. They were also advised to avoid stressful activities 36-48 hours prior to data
collection visits. Tests were conducted in a random order by the same investigators and at
approximately the same time of the day (late mornings or early afternoons).
*** Table 1 near here ***
Incremental treadmill test (TT) (4th heading in importance)
A modified Bruce treadmill test (TT) to exhaustion was used to elicit VO2max (11). The
test commenced at 9 km·h⁻¹ with speed increments of 1 km·h⁻¹ every 2 min until exhaustion.
Treadmill inclination throughout testing remained at 0° while VO2max was confirmed
when at least two of the following criteria were met: 1) maximal heart rate greater than
185 bpm, 2) respiratory quotient greater than 1.1, and 3) detection of a plateau in the
oxygen uptake curve. Oxygen uptake was measured via open circuit spirometry using an automated gas
analyser (Vmax 29, SensorMedics, USA). Respiratory parameters were recorded every
20 seconds during testing while subjects inspired room air through a low-resistance two-
way Rudolph valve. The gas analysers were calibrated with standard gases previously
checked by microtechniques. Spot checks were made on the calibration of the
pneumotachograph for volume flows up to 200 l·min⁻¹.
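The treadmill speed schedule and the "at least two of three" confirmation rule described above can be sketched as below. The function names are hypothetical, the speed is assumed to step up at the end of each completed 2-min stage, and plateau detection is passed in as a yes/no value because the manuscript does not say how the plateau was identified.

```python
def treadmill_speed_kmh(elapsed_min, start_kmh=9.0, step_kmh=1.0, stage_min=2.0):
    """Belt speed at a given elapsed time: starts at 9 km/h, +1 km/h per 2-min stage."""
    if elapsed_min < 0:
        raise ValueError("elapsed time must be non-negative")
    completed_stages = int(elapsed_min // stage_min)  # assumption: step at stage end
    return start_kmh + step_kmh * completed_stages

def vo2max_confirmed(max_hr_bpm, rq, plateau_detected):
    """True when at least two of the three stated criteria are met."""
    criteria = [
        max_hr_bpm > 185,        # 1) maximal heart rate greater than 185 bpm
        rq > 1.1,                # 2) respiratory quotient greater than 1.1
        bool(plateau_detected),  # 3) plateau in the oxygen uptake curve
    ]
    return sum(criteria) >= 2

print(treadmill_speed_kmh(0))               # 9.0
print(treadmill_speed_kmh(5.5))             # 11.0
print(vo2max_confirmed(190, 1.05, True))    # True
print(vo2max_confirmed(180, 1.12, False))   # False
```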
Unlike the inclined treadmill running adopted by Léger and Gadoury (6), the
horizontal treadmill protocol used herein has a closer agreement with field running (12).
Nevertheless, since the MST has been designed to predict VO2max of a specific treadmill
test, this protocol-diversity was addressed by introducing a new prediction model based
on the current data (see Statistical Analysis section).
20m square shuttle test (SST) (4th heading in importance)
This test involves running on the four 20m-long sides of a square marked on the floor of
a gymnasium (Fig. 1) with the choice of performing the test running either clockwise or
counter-clockwise. Four pairs of cones are placed at the corners of the square to ensure
adherence. One to four subjects can perform the test simultaneously. Each subject should
start the test at one of the cone stations and follow the prescribed pace for as long as
he/she is able to be at the cone stations in synchrony (i.e. ±1 s) with the sound signals
emitted from the classical MST pre-recorded audiotape. Individuals should be advised to
perform wide turns, thus avoiding disturbances in their running technique. The test is
terminated when subjects are unable to maintain the prescribed pace for three consecutive
signals. In the present study, subjects performed the test individually to eliminate
*** Figure 1 near here ***
20m multistage shuttle run test (MST) (4th heading in importance)
This test was conducted according to published procedures (6). Subjects performed the
test individually and were instructed to run between two lines 20m apart in synchrony
with a sound signal emitted from an audiocassette. The test was terminated when subjects
were unable to maintain the prescribed pace for three consecutive signals.
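The termination rule shared by both field tests (three consecutive missed signals) can be sketched as follows. The ±1 s tolerance is taken from the SST description; applying it to the MST as well, and counting only late arrivals as misses, are assumptions made for this illustration. The function name and the sample offsets are hypothetical.

```python
def termination_signal(arrival_offsets_s, tolerance_s=1.0, max_consecutive_misses=3):
    """Return the 1-based index of the signal at which the test ends, or None.

    arrival_offsets_s holds, for each signal, the arrival time at the marker
    minus the signal time in seconds (positive values mean a late arrival).
    """
    misses = 0
    for index, offset in enumerate(arrival_offsets_s, start=1):
        if offset > tolerance_s:      # assumption: only lateness counts as a miss
            misses += 1
            if misses == max_consecutive_misses:
                return index
        else:
            misses = 0                # pace regained, so the count restarts
    return None

# Hypothetical offsets: one isolated miss, then three misses in a row.
offsets = [0.2, 0.5, 1.4, 0.8, 1.2, 1.6, 2.1]
print(termination_signal(offsets))  # 7
```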
Statistical analysis (3rd heading in importance)
Stepwise linear regression analysis was used to develop a VO2max prediction equation for
the SST (EQSST) using data from the model group. A prediction equation for the
MST (EQMST) was also developed, using the same model group data, to cater for the fact
that a different treadmill protocol was originally utilized (6). Correlation coefficients and
analysis of variance (ANOVA) were used to detect possible bias between the actual and
the predicted values from the two models. Thereafter, data from the validation group
were used to cross-validate EQSST, EQMST, as well as the original equation reported by
Léger and Gadoury (6) (EQLÉG). Correlation coefficients, ANOVA, 95% limits of
agreement analyses (LIMAG) and percent coefficients of variation (CV%) were adopted
for both validity and reproducibility assessments according to known procedures (13).
The level of significance for all statistical analyses was set at p<0.05.
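As an illustration of the agreement statistics named above, the sketch below computes the bias and 95% limits of agreement in the usual Bland-Altman form (mean difference ± 1.96 × SD of the differences), together with a simple check of whether the size of the error grows with the size of the measurement. It assumes that values such as -0.5±5.4 are reported in this "bias ± half-width" form; the function names and the sample data are hypothetical and are not taken from the study.

```python
import math
import statistics

def limits_of_agreement(predicted, measured):
    """Return (bias, half_width) so that the 95% limits are bias ± half_width."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [p - m for p, m in zip(predicted, measured)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)  # sample SD of the differences
    return bias, half_width

def error_vs_size_r(predicted, measured):
    """Pearson r between absolute differences and the mean of each pair."""
    x = [(p + m) / 2 for p, m in zip(predicted, measured)]
    y = [abs(p - m) for p, m in zip(predicted, measured)]
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical VO2max values (ml/kg/min), for illustration only.
predicted = [46.0, 51.5, 43.2, 55.1, 48.7]
measured = [47.1, 50.2, 44.0, 53.9, 49.5]
bias, half_width = limits_of_agreement(predicted, measured)
print(f"bias = {bias:.2f}, 95% limits = ±{half_width:.2f}")
print(f"r(|error|, size) = {error_vs_size_r(predicted, measured):.2f}")
```

When the second function returns a clearly positive correlation, the limits are usually better expressed as ratios than as absolute values, which is the check the Results section refers to.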
Results (2nd heading in importance)
Prediction of VO2max (3rd heading in importance)
Stepwise linear regression analyses revealed that the maximal attained speed (MAS)
(km·h⁻¹) was the best predictor of VO2max (ml·kg⁻¹·min⁻¹) for both SST and MST.
Examination of residuals scatterplots detected no violation of normality, linearity, and
homoscedasticity between predicted VO2max scores and errors of prediction, while
Mahalanobis distance of each case to the centroid of all cases detected no multivariate
outliers at the χ² criterion of p<0.001. Relevant statistics from the calculated prediction models
for SST (EQSST) and MST (EQMST) appear in Table 2.
EQSST: VO2max = MAS × 3.679 – 7.185
EQMST: VO2max = MAS × 3.56 + 2.584
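The two prediction models can be applied directly, as in the short sketch below. The coefficients are those reported above; the function names and the example speeds are hypothetical and are chosen only to show the arithmetic.

```python
def predict_vo2max_sst(mas_kmh):
    """SST model: predicted VO2max (ml/kg/min) from maximal attained speed (km/h)."""
    return mas_kmh * 3.679 - 7.185

def predict_vo2max_mst(mas_kmh):
    """MST model: predicted VO2max (ml/kg/min) from maximal attained speed (km/h)."""
    return mas_kmh * 3.56 + 2.584

# Hypothetical maximal attained speeds, not values from the study.
print(round(predict_vo2max_sst(15.0), 1))  # 48.0
print(round(predict_vo2max_mst(12.5), 1))  # 47.1
```

Each model is valid only for speeds attained in its own test: an MAS from the shuttle run should not be entered into the SST equation, or vice versa.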
*** Table 2 near here ***
Validity assessments (3rd heading in importance)
Means (±SD) and correlation coefficients of various performance indices from all three
protocols appear in Table 3. Preliminary analyses for LIMAG revealed no positive
relationship between the differences/errors [either (EQSST - TT) or (EQMST - TT) or
(EQLÉG - TT)] and the size of measurements [given by either (the mean of EQSST and TT)
or (mean of EQMST and TT) or (mean of EQLÉG and TT)], respectively. Thus, the LIMAG
can be reported as absolute measurements (14). Finally, unlike EQSST and TT (t= -0.1,
p>0.05), the mean difference (error) between estimates from EQMST and TT (t= -2.4,
p<0.05) as well as EQLÉG and TT was biased (t= -8.1, p<0.001). Indices for LIMAG and
CV% appear in Table 3.
*** Table 3 near here ***
Reproducibility assessment (3rd heading in importance)
Table 4 demonstrates no significant differences (p>0.05) between the mean values from
the first (SST1) and the second (SST2) trial in the studied performance parameters. The
correlation coefficient between trials for all parameters was r= 0.98 (p<0.001).
Preliminary investigation for the LIMAG analysis revealed no positive relationship
between the differences/errors [SST1 – SST2] and the size of measurements
[given by the mean of SST1 and SST2]. The mean difference between estimates
on the first and second trial was not biased (t= -1.7, p>0.05). Results for LIMAG and CV%
appear in Table 4.
*** Table 4 near here ***
Discussion (2nd heading in importance)
The main purpose of this study was to examine the hypothesis that improved prediction
of bioenergetics may be achieved when proxies are designed to closely simulate gold
standard laboratory protocols. To fulfil this, we investigated the validity of the widely-
used MST against the SST. The latter test was designed to minimize stopping, turning
and side-stepping manoeuvres – thus closely resembling the gold standard forward
treadmill running. The main finding was that, compared to the classic MST, the SST had
a higher agreement with the gold standard laboratory test in predicting VO2max and
assessing relevant performance parameters. Furthermore, the SST preserved the high
reproducibility previously reported for the classical version of the test (i.e. MST) (1).
The main reason for assessing VO2max is to provide relevant data that will allow a
more precise planning of training and, ultimately, enhance field performance. Previous
studies examining different laboratory tests have stressed the importance of specificity
when assessing bioenergetics (15). For instance, the quantitative effects of training on
cardiovascular and respiratory functions are optimally evaluated only by adopting tests
that primarily activate the muscles used in this training (16). Despite the suggestion that
predicted VO2max is significantly influenced by the test utilized (17), to our
knowledge, the specificity of proxies has not been scrutinized hitherto. Application of the
specificity principle to proxies predicting VO2max would suggest similar exercise mode,
intensity, duration, technique and muscular action between the laboratory protocol used
as gold standard and the proxy. Results from the present study support this notion,
demonstrating that proxies designed to assess bioenergetics should mimic the
intensity, duration, exercise mode, technique, and muscular action of the gold standard
laboratory test in order to achieve the highest accuracy and precision.
The MST utilizes information from shuttle running to predict VO2max, which has
been derived from forward treadmill running. However, published reports suggest that
manoeuvres incorporated in shuttle running may increase net muscle activation compared
to forward running (9). In contrast, the SST VO2max prediction has been based on fairly
similar running modes (i.e. continuous ‘elliptic’ field-running and forward treadmill-running).
Since energy utilization depends largely on the muscle mass being employed
(10), variations in the mechanics – and, therefore, musculature activated – between the
two field tests and the gold standard contribute significantly to the observed variations in
VO2max, maximal velocity, and test duration. Furthermore, these differences allude to the
notion that intensity in the MST is markedly increased compared to the gold standard
test. These results are also in line with previous reports questioning metabolic (8, 18, 19)
and performance-based (20) aspects of the classical MST.
In addition, it seems tenable that the aforementioned manoeuvres incorporated in
MST represent biomechanical complexities which are dealt with by each subject according to
individual skills. Although agility, strength, and sport-specific skills are very important in
sport performance, these factors should be evaluated individually by element- and sport-specific
tests. The presence of these factors in a cardiorespiratory fitness field test
constitutes a significant source of inter-individual variation that is not present during the
gold standard test. As illustrated by the present CV% indices, the VO2max prediction of
EQMST and EQLÉG can be up to 1.4 times as ‘unreliable’ as the prediction of EQSST.
Although the limits of agreement in EQSST are still relatively wide, this range is more
likely to be acceptable by exercise scientists and coaches compared to EQMST and EQLÉG.
Due to the increased agreement and precision with the gold standard VO2max, results
from the SST can be used as a parsimonious means for cross-sectional as well as
longitudinal evaluation of training prescription and analysis of training adaptation.
Further, due to the elimination of factors unrelated to cardiorespiratory fitness, it seems
reasonable that results from SST can be employed in diverse sporting disciplines. Ergo,
the SST may represent a valid and cost-effective tool in circumstances where, although
laboratory testing is not feasible, an accurate and precise evaluation of bioenergetics is
required.
The present study is limited by the relatively small sample spectrum and by the
lack of examining the effect of diverse sporting backgrounds on SST and MST
performance. Within these limits, it is concluded that improved prediction of
bioenergetics may be achieved when proxies closely simulate laboratory protocols.
Although the rapid screening of large groups of individuals by practical proxies such as
the MST is acknowledged, scientists should appreciate the validity and precision required
to accurately assess cardiorespiratory fitness levels and advise the individual.
References (2nd heading in importance)
1 Léger LA, Mercier D, Gadoury C, et al. The multistage 20 metre shuttle run test
for aerobic fitness. J Sports Sci 1988;6:93-101.
2 Shephard RJ, Bailey DA, & Mirwald RL. Development of the Canadian Home
Fitness Test. Can Med Assoc J 1976;114:675-9.
3 Shephard RJ. Tests of maximum oxygen intake. A critical review. Sports Med
4 McNaughton L, Cooley PDC, Kearney V, et al. A comparison of two different
Shuttle Run tests for the estimation of VO2max. J Sports Med Phys Fitness
5 McNaughton L, Hall P, & Cooley D. Validation of several methods of estimating
maximal oxygen uptake in young men. Percept Mot Skills 1998;87:575-84.
6 Léger L, & Gadoury C. Validity of the 20 m shuttle run test with 1 min stages to
predict VO2max in adults. Can J Sports Sci 1989;14:21-26.
7 Berthoin S, Gerbeaux M, Turpin E, et al. Comparison of two field tests to
estimate maximum aerobic speed. J Sports Sci 1994;12:355-62.
8 Grant S, Corbett K, Amjad AM, et al. A comparison of methods of predicting
maximum oxygen uptake. Br J Sports Med 1995;29:147-152.
9 Besier TF, Lloyd DG, & Ackland TR. Muscle activation strategies at the knee
during running and cutting maneuvers. Med Sci Sports Exerc 2003;35:119-27.
10 Strømme SB, Ingjer F, & Meen HD. Assessment of maximal aerobic power in
specifically trained athletes. J Appl Physiol 1977; 42:833-877.
11 Koutedakis Y, Boreham C, Kabitsis C, et al. Seasonal deterioration of selected
physiological variables in elite male skiers. Int J Sports Med 1992;13:548-551.
12 Roecker K, Schotte O, Niess MA, et al. Predicting competition performance in
long-distance running by means of a treadmill test. Med Sci Sports Exerc
13 Bland JM, & Altman DG. Statistical methods for assessing agreement between
two methods of clinical measurement. Lancet 1986;8:307-310.
14 Nevill AM, & Atkinson G. Assessing agreement between measurements recorded on a
ratio scale in sports medicine and sports science. Br J Sports Med 1997;31:314-
15 Basset FA, & Boulay MR. Specificity of treadmill and cycle ergometer tests in
triathletes, runners, and cyclists. Eur J Appl Physiol 2000;81:214-221.
16 Verstappen FT, Huppertz RM, & Snoeckx LH. Effect of training specificity on
maximal treadmill and bicycle ergometer exercise. Int J Sports Med 1982;3:43-
17 Anderson GS. A comparison of predictive tests of aerobic capacity. Can J Sports
18 Ahmaidi S, Collomp K, & Prefaut C. The effect of shuttle test protocol and the
resulting lactacidaemia on maximal velocity and maximal oxygen uptake during
the shuttle exercise test. Eur J Appl Physiol 1992;65:475-479.
19 Sproule J, Kunalan C, McNeill M, et al. Validity of 20-MST for predicting
VO2max of adult Singaporean athletes. Br J Sports Med 1993;27:202-204.
20 Ahmaidi S, Collomp K, Caillaud C, et al. Maximal and functional aerobic
capacity as assessed by two graduated field methods in comparison to laboratory
exercise testing in moderately trained subjects. Int J Sports Med 1992;13:243-
Figure 1. The 20 meter square shuttle test.
Table 1. Anthropometrical data and dynamometry results for all subjects and
Parameters Model Group
Note: ANOVA detected no significant differences between the two sub-groups in any of the
Key: BMI = body mass index.
Table 2. Stepwise multiple regression for predicting VO2max using maximal attained speed in the model
Intercept B β SEE
Note: * p<0.05; ** p<0.001.
Key: MAS = maximal attained speed [mean(SD)]; R² = coefficient of determination; adjR² = adjusted coefficient of
determination; Intercept & B = unstandardized coefficients; β = standardized coefficient; SEE = standard error of the
estimate; VO2max = predicted VO2max values using the calculated models [mean(SD)]; r = correlation coefficient between
actual and predicted VO2max values; EQSST, EQMST = prediction models for each test developed from the model group.
Table 3. Comparison between all three tests [means(SD)] in the validation group (n=10).
Test   VO2max      LIMAG   CV%   r     MAS        Time
TT     47.2(6.0)   ---     ---   ---   15.4±1.2   14:30±2:25
Note: ANOVA against TT: † different at p<0.05; ‡ different at p<0.001.
Correlation coefficient against TT: * significant at p<0.05; ** significant at p<0.001.
Key: VO2max = maximal oxygen uptake (ml·kg⁻¹·min⁻¹); LIMAG = calculated limits of agreement for VO2max; CV% =
percent coefficient of variation for VO2max; r = correlation coefficient against TT for VO2max; MAS = maximal
attained speed (km·h⁻¹); Time = exercise time to exhaustion (min); EQSST, EQMST = prediction models for each test
developed from the model group; EQLÉG = prediction model for MST reported by Léger and Gadoury (1989).