A Meta-Analysis of Sex Differences in Physical Ability: Revised Estimates
and Strategies for Reducing Differences in Selection Contexts
Stephen H. Courtright
Texas A&M University
Brian W. McCormick
University of Iowa
Bennett E. Postlethwaite
Pepperdine University
Cody J. Reeves and Michael K. Mount
University of Iowa
Despite the wide use of physical ability tests for selection and placement decisions in physically
demanding occupations, research has suggested that there are substantial male–female differences on the
scores of such tests, contributing to adverse impact. In this study, we present updated, revised meta-
analytic estimates of sex differences in physical abilities and test 3 moderators of these differences—
selection system design, specificity of measurement, and training—in order to provide insight into
possible methods of reducing sex differences on physical ability test scores. Findings revealed that males
score substantially better on muscular strength and cardiovascular endurance tests but that there are no
meaningful sex differences on movement quality tests. These estimates differ in several ways from past
estimates. Results showed that sex differences are similar across selection systems that emphasize basic
ability tests versus job simulations. Results also showed that sex differences are smaller for narrow
dimensions of muscular strength and that there is substantial variance in the sex differences in muscular
strength across different body regions. Finally, we found that training led to greater increases in
performance for women than for men on both muscular strength and cardiovascular endurance tests.
However, training reduced the male–female differences on muscular strength tests only modestly and
actually increased male–female differences on cardiovascular endurance. We discuss the implications of
these findings for research on physical ability testing and adverse impact, as well as the practical
implications of the results.
Keywords: physical ability testing, adverse impact, personnel selection
Supplemental materials: http://dx.doi.org/10.1037/a0033144.supp
Physically demanding occupations make up a significant share
of most labor markets globally. For example, over 28% of the U.S.
labor force works in physically demanding occupations such as
public safety, construction, maintenance and repair, and the mili-
tary (Bureau of Labor Statistics, 2011). In such occupations, phys-
ical ability tests are widely used as tools for selection, placement,
and retention decisions (Salgado, Viswesvaran, & Ones, 2001).
Indeed, Hough, Oswald, and Ployhart (2001) noted that physical
abilities are among the “major types of skills and abilities that
industrial-organizational psychologists tend to use in personnel
selection decisions” (p. 153). The importance of physical ability
tests is difficult to overstate given that workers who fail to meet
job-related physical demands have lower performance, more inju-
ries, more absenteeism, and higher mortality rates (Gebhardt &
Baker, 2010a; Hogan, 1991a). Moreover, since low performance in
many physically demanding jobs can have catastrophic conse-
quences (e.g., public safety; Colquitt, LePine, Zapata, & Wild,
2011), physical ability tests benefit not only organizations but also society as a whole (Gebhardt & Baker, 2007).
This article was published Online First June 3, 2013.
Stephen H. Courtright, Department of Management, Texas A&M University; Brian W. McCormick, Department of Management and Organizations, University of Iowa; Bennett E. Postlethwaite, Seaver College, Business Administration Division, Pepperdine University; Cody J. Reeves and Michael K. Mount, Department of Management and Organizations, University of Iowa.
The first three authors contributed equally to this research and are listed in alphabetical order. We thank Frank Schmidt for his feedback on a previous version of this article.
Correspondence concerning this article should be addressed to Stephen H. Courtright, Department of Management, Texas A&M University, 4221 TAMU, College Station, TX 77843-4221, or to Brian W. McCormick, Department of Management and Organizations, University of Iowa, 108 Pappajohn Business Building, Iowa City, IA 52242-1994. E-mail: scourtright@mays.tamu.edu or brian-mccormick@uiowa.edu
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Journal of Applied Psychology © 2013 American Psychological Association
2013, Vol. 98, No. 4, 623–641 0021-9010/13/$12.00 DOI: 10.1037/a0033144

Nevertheless, physical ability tests are not without a history of controversy. Indeed, physical ability tests rank only behind unstructured interviews and cognitive ability tests in terms of how frequently they are challenged in U.S. federal court cases (Terpstra, Mohamed, & Kethley, 1999). The driving force of this controversy is the large male–female differences that exist on certain physical abilities. In fact, scholars have suggested that sex differences on certain physical abilities are larger than any other subgroup difference found on any other human ability or characteristic relevant to personnel selection (Ployhart, Schneider, & Schmitt, 2006). Moreover, research has shown that these sex differences are
not due to test bias because similar differences exist in corresponding physical performance criteria (Gebhardt & Baker, 2010a; Hogan, 1991a; Sackett & Wilk, 1994). This issue thus presents a key
challenge to personnel selection scholars and practitioners given
the potential of physical ability tests to have adverse impact on
females, as well as the fact that increasingly more females are
entering physically demanding occupations (Bureau of Labor Statistics, 2010; U.S. Army, 2012).
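To make concrete why large d values raise adverse impact concerns, the sketch below converts a d value into pass rates at a single cutoff. It is an illustrative model only: it assumes normally distributed scores with equal within-sex standard deviations, and the function name `adverse_impact_ratio` and the chosen cutoff are hypothetical. The 0.80 threshold in the printout is the four-fifths rule of thumb from the U.S. Uniform Guidelines on Employee Selection Procedures.

```python
from statistics import NormalDist


def adverse_impact_ratio(d: float, cutoff_z: float) -> tuple[float, float, float]:
    """Return (female pass rate, male pass rate, female/male ratio).

    Hypothetical model: female scores ~ N(0, 1) and male scores ~ N(d, 1),
    so d is the male-female difference in within-group SD units. An applicant
    passes when their score is at or above cutoff_z (same scale).
    """
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    female_pass = 1.0 - std_normal.cdf(cutoff_z)
    # Shift the cutoff into male-standardized units before looking up the tail area
    male_pass = 1.0 - std_normal.cdf(cutoff_z - d)
    return female_pass, male_pass, female_pass / male_pass


if __name__ == "__main__":
    # Cutoff placed at the female mean (cutoff_z = 0) purely for illustration
    for d in (0.0, 0.5, 1.0, 1.5, 2.0):
        f, m, ratio = adverse_impact_ratio(d, cutoff_z=0.0)
        flag = "below 0.80" if ratio < 0.80 else "at or above 0.80"
        print(f"d = {d:.1f}: female pass {f:.2f}, male pass {m:.2f}, ratio {ratio:.2f} ({flag})")
```

Under these assumptions, even moderate d values push the pass-rate ratio below 0.80 when the cutoff sits near the female mean; real pass rates also depend on where the cutoff is set and on the composition of the applicant pool.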
Although there is a general understanding among scholars that
large sex differences exist in physical ability test scores, unfortu-
nately very little is known regarding potential methods for reduc-
ing the magnitude of these differences (Ployhart & Holtz, 2008).
This gap in research makes it difficult to provide organizations
with guidance on leveraging the strong validity of physical ability
tests (Blakley, Quiñones, Crawford, & Jago, 1994; Henderson,
2010; Henderson, Berry, & Matic, 2007; Lewis, 1989; Schmitt,
Gooding, Noe, & Kirsch, 1984) while simultaneously reducing
their adverse impact on women. In addition, while there is little
debate regarding the existence of large sex differences on physical
abilities, there are several limitations in the current estimates of
these differences that must be resolved if scholars are to gain a
better understanding of their magnitude and the presence of po-
tential moderators of these differences.
Given these limitations in the extant literature, our study is
intended to make two key contributions to personnel selection
research. First, we conducted a meta-analytic review of the phys-
ical ability literature to provide the most up-to-date and accurate
estimates of sex differences in physical ability within work con-
texts. In doing so, we elucidated the large sex differences that exist
in physical abilities relative to subgroup differences on other traits
and abilities. Second, we empirically examined moderators of sex
differences in physical ability. In doing so, we provided conceptual clarification of contested issues in the physical ability literature, and we evaluated methods of potentially reducing sex differences on physical ability test scores.
Structure of Physical Abilities
The most commonly referenced taxonomy of physical abilities
in the personnel selection literature was presented by Hogan
(1991b). As shown in Table 1, Hogan (1991b) identified three
meta-categories of physical abilities: muscular strength, cardiovas-
cular endurance, and movement quality. Muscular strength is
broadly defined as the ability to exert or withstand force through
muscular contraction. The muscular strength dimension comprises
three lower order abilities: muscular tension (the ability to exert
force against an object—i.e., push, pull, lift, lower, or carry),
muscular power (the ability to exert muscular force quickly), and
muscular endurance (the ability to exert muscular force continu-
ously over a prolonged period of time). Cardiovascular endurance
is the second broad dimension of physical ability and is defined as
the ability to sustain physical activity that results in increased heart
rate. The last meta-category of physical ability is movement quality, which includes flexibility (the ability to flex or
extend body limbs to work in awkward positions), balance (the
ability to maintain one’s body in a stable position when confronted
with resisting forces), and coordination (the ability to sequence
movements of arms, legs, and body). Taken together, these repre-
sent physical ability constructs from which physical ability tests
are often developed (Hoffman, 1999). It should also be noted that
these physical abilities are distinct from psychomotor abilities such
as visual acuity, reaction time, or dexterity (Hunter & Hunter,
1984; Ployhart et al., 2006; Schmidt, 1994) that often favor females (Ployhart & Holtz, 2008).
Previous Estimates of Sex Differences
in Physical Abilities
To date, there have been a few narrative and quantitative re-
views that have attempted to estimate the magnitude of sex dif-
ferences in physical abilities. First, Hogan (1991a) conducted a
narrative review of 14 validation studies that also reported sex
differences data for a variety of physical ability tests. Sackett and
Wilk (1994) then expanded Hogan’s work by quantitatively sum-
marizing the 14 studies that Hogan reported in her review. In doing
so, they computed d values ranging from 2.28 for muscular tension to 0.05 for flexibility (note: positive d values indicate that males score better than females). Additionally, Sackett and Wilk presented separate results from three large nonoccupational data sets, two of which were adolescent samples (e.g., Cooper Institute for Aerobic Research; Fleishman, 1962; Hunsicker & Reiff, 1976). From these data, they found sex differences as large as d = 4.12 and d = 3.27 for certain muscular strength tests (softball throw and
pull-ups, respectively). Finally, Hough et al. (2001) meta-analyzed
all of the results presented by Sackett and Wilk (1994) to generate
overall d values for each dimension of physical ability. Their results revealed slightly lower sex differences on muscular strength (d = 1.66) compared with prior data. They also found that women scored significantly higher than men on flexibility tests (d = −0.64).

Table 1
Hogan’s (1991b) Taxonomy of Physical Abilities
Category / subdimension: Definition
Muscular strength
  Muscular tension: Exerting force against objects such as pushing, pulling, lifting, lowering, and carrying objects or materials.
  Muscular power: Exerting muscular force quickly.
  Muscular endurance: Exerting muscular force continuously over time while resisting fatigue.
Cardiovascular endurance: Sustaining physical activity that results in increased heart rate.
Movement quality
  Flexibility: Flexing or extending the body limbs to work in awkward or contorted positions.
  Balance: Maintaining the body in a stable position, including resisting forces that cause loss of stability.
  Coordination: Sequencing movements of the arms, legs, and/or body to result in skilled action.
While these previous studies are extremely helpful in terms of
providing initial evidence that large sex differences exist in phys-
ical ability, we argue that there is a critical need for increasing the
accuracy and precision of these estimates. First, in order for
practitioners to evaluate the potential for physical ability tests to
result in adverse impact in selection contexts specifically, it is
necessary to know the level of sex differences that are found in
adult occupational samples as opposed to those found in adolescent samples (Arvey, Landon, Nutting, & Maxwell, 1992; Henderson et al., 2007). While Sackett and Wilk (1994) provided these data to a certain extent when quantitatively summarizing the
validation studies reviewed in Hogan (1991a), their analysis was
limited to those 14 studies. In the last two decades, however, a
considerable number of studies have been conducted using a wide
variety of occupational samples. Incorporating these additional
studies would result in more robust estimates of sex differences in
that they would be based on a larger number of occupational
samples and participants than any of the previous estimates.
Second, key methodological limitations inherent in past estimates need to be corrected. For example, Sackett and Wilk’s (1994) study did not weight the effect sizes reported in Hogan’s (1991a) review by sample size, nor were they or Hough et al. (2001) able to correct for unreliability of the physical ability tests.¹ Rather,
they computed strict averages of the effect sizes reported in the
primary studies. This procedure produces estimates that do not
account for the impact of both sampling and measurement error on
reported effect sizes (Hunter & Schmidt, 2004). Furthermore,
previous studies appear to have computed average sex differences
in physical abilities based on the number of individual effect sizes
rather than the number of independent samples. Hunter and
Schmidt (2004) noted that this practice introduces nonindependence into the analysis and can distort the resulting estimates.
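The difference between strict averaging and the weighting and reliability corrections described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' computational procedure: each sample contributes one observed d and one test–retest reliability, each d is disattenuated as d / sqrt(r_yy), and corrected values are weighted by N × r_yy, by analogy with the weighting used for individually corrected correlations in the Hunter–Schmidt approach. The `Sample` class, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Sample:
    n: int        # total sample size (women + men)
    d: float      # observed d; positive = men score higher
    r_yy: float   # test-retest reliability of the physical ability test


def unweighted_mean_d(samples: list[Sample]) -> float:
    # Strict average of effect sizes (ignores sample size and reliability)
    return sum(s.d for s in samples) / len(samples)


def n_weighted_mean_d(samples: list[Sample]) -> float:
    # Larger samples carry more weight because their d values contain less sampling error
    return sum(s.n * s.d for s in samples) / sum(s.n for s in samples)


def corrected_mean_d(samples: list[Sample]) -> float:
    # Disattenuate each d for test unreliability, then weight by N * r_yy
    weights = [s.n * s.r_yy for s in samples]
    corrected = [s.d / sqrt(s.r_yy) for s in samples]
    return sum(w * dc for w, dc in zip(weights, corrected)) / sum(weights)


if __name__ == "__main__":
    # Invented data for illustration only
    data = [Sample(400, 1.5, 0.90), Sample(50, 2.5, 0.80), Sample(120, 1.2, 0.85)]
    print(f"unweighted: {unweighted_mean_d(data):.3f}")
    print(f"N-weighted: {n_weighted_mean_d(data):.3f}")
    print(f"corrected:  {corrected_mean_d(data):.3f}")
```

In this invented example the small sample with the largest d pulls the unweighted mean upward; weighting by N reduces its influence, and correcting for unreliability raises the estimate because measurement error attenuates observed d values.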
In sum, one of the key contributions of our study is to provide
the most up-to-date and accurate estimates of sex differences in
physical ability in work contexts. We do this by (a) expanding the
number of studies upon which estimates are based, (b) limiting the
analysis solely to adult occupational samples, (c) weighting effect
sizes reported in primary studies by sample size, (d) correcting for
unreliability of physical ability tests, and (e) clarifying the reporting of k values to reflect the number of independent samples rather than the number of effect sizes.
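Point (e) matters because several tests administered to the same people yield correlated effect sizes. One simple way to keep k equal to the number of independent samples is to average the d values within each sample before pooling, as sketched below. This is an assumption-laden simplification: averaging treats the tests as interchangeable, whereas forming a composite that uses the test intercorrelations is generally preferable when those correlations are reported. The function name and data are hypothetical.

```python
from collections import defaultdict


def collapse_by_sample(effects: list[tuple[str, int, float]]) -> dict[str, tuple[int, float]]:
    """Average all d values reported for the same sample.

    effects: (sample_id, n, d) triples, possibly several per sample.
    Returns {sample_id: (n, mean_d)}; n is the largest N reported for that sample.
    """
    ds_by_sample: dict[str, list[float]] = defaultdict(list)
    n_by_sample: dict[str, int] = {}
    for sample_id, n, d in effects:
        ds_by_sample[sample_id].append(d)
        n_by_sample[sample_id] = max(n, n_by_sample.get(sample_id, 0))
    return {
        sid: (n_by_sample[sid], sum(ds) / len(ds))
        for sid, ds in ds_by_sample.items()
    }


if __name__ == "__main__":
    # Invented example: sample "A" reports two strength tests, sample "B" reports one
    effects = [("A", 200, 1.8), ("A", 200, 1.4), ("B", 90, 1.1)]
    collapsed = collapse_by_sample(effects)
    print("k (effect sizes):", len(effects))
    print("k (independent samples):", len(collapsed))
    print(collapsed)
```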
Moderators of Sex Differences in Physical Abilities
In addition to the need for revising estimates of sex differences
in physical abilities, there is a need to identify ways in which
organizations can potentially reduce sex differences in scores on physical ability tests, a question that no study to our knowledge has
explicitly addressed. This was made apparent in review articles on
adverse impact by Hough et al. (2001) and Ployhart and Holtz
(2008) in which they reviewed the literature on strategies for
reducing subgroup differences on individual difference constructs
relevant to staffing. Specifically, in contrast to the extensive liter-
ature addressing moderators of subgroup differences on cognitive
ability, these authors were not able to review a single study that
explicitly addressed moderators of sex differences in physical
abilities. This is a surprising omission in the literature given that it
is necessary to understand what moderates sex differences in
physical abilities in order to determine what factors may help
mitigate the adverse impact potentially brought on by those dif-
ferences. Thus, as noted earlier, a key contribution of our study is
the examination of potential moderators of sex differences in
physical abilities.
The absence of empirical research and a wide range of opinions
on moderators of sex differences make it difficult to propose
specific hypotheses. Therefore, we propose research questions and
posit that a key contribution of our study is providing greater
conceptual clarity to the area of physical ability testing by helping
to resolve current debates in the literature regarding moderators of
sex differences in physical abilities (Klein & Zedeck, 2004). In
particular, physical ability researchers have generally debated the
existence and patterns of three different moderators: selection
system design, specificity of measurement, and training (Gebhardt
& Baker, 2010a, 2010b). Each of these moderators is important
from a practical perspective because they are all under the control
of organizations; that is, they reflect specific, practical measures
that organizations can take to reduce sex differences on physical
ability test scores. Furthermore, it is noteworthy that these same
moderators have been examined at length in the literature on
cognitive ability testing (Ployhart & Holtz, 2008).
In sum, the specific moderators examined in the current study
were chosen based on their capacity to (a) potentially resolve
current debates and thereby provide conceptual clarity within the
physical ability literature, (b) offer insight into various methods of
reducing sex differences on physical ability tests, and (c) allow for
a comparison of strategies for reducing subgroup differences
across physical and cognitive ability domains, which is important
since cognitive and physical ability tests are often used jointly for
selection and placement decisions in physically demanding jobs
(Barrett, Polomsky, & McDaniel, 1999; Henderson, 2010).
Selection System Design
One of the key debates among scholars who conduct research on
physical ability testing involves the best way to design physical
ability selection systems. In particular, there are two primary ways
for designing such systems. The first method is a construct-
oriented approach in which a battery of physical ability tests is
designed to directly measure specific physical ability constructs
that are related to high physical performance on the job. In other
words, these systems focus on the development and administration
of basic ability tests (Gebhardt & Baker, 2010a, 2010b), or tests
designed to measure a single physical ability construct. Such tests
are what most people would consider standard physical fitness exercises. For instance, push-ups and sit-ups are examples of basic ability tests of muscular strength; a 1.5-mile run is a type of basic ability test of cardiovascular endurance; and sit-reach tests and balance beam walks are two types of basic ability tests of movement quality. Basic ability tests are seen as beneficial because they are inexpensive and easy to administer and can be used to predict physical performance across a wide variety of settings.

¹ In the realm of physical ability testing, reliability is most frequently estimated using test–retest scores. This results in a coefficient of stability, which can be used to correct for random error and transient error. Other forms of measurement error do exist in physical ability testing. For example, measurement error can occur when testers (raters) use different test administration techniques, with the amount of error depending upon the degree to which testers deviate from an established testing protocol. Test reliability can also be estimated in other ways besides test–retest scores. For example, when multiple tests are administered to measure a single physical ability construct, or if a single test contains separately scored subtests, then internal consistency estimates such as Cronbach’s alpha can be calculated. However, internal consistency estimates are reported far less often than test–retest reliabilities.
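Footnote 1 distinguishes test–retest coefficients from internal consistency estimates. The sketch below computes both from raw scores: a Pearson correlation between two administrations of the same test, and Cronbach's alpha across several tests (or subtests) treated as measures of one construct. The function names and all scores are invented for illustration.

```python
from statistics import mean, variance


def pearson_r(x: list[float], y: list[float]) -> float:
    """Test-retest (stability) coefficient: correlation between two administrations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Internal consistency across k tests; each row holds one person's k scores."""
    k = len(scores[0])
    columns = list(zip(*scores))
    sum_item_var = sum(variance(col) for col in columns)  # sample variance of each test
    total_var = variance([sum(row) for row in scores])     # sample variance of the total score
    return k / (k - 1) * (1 - sum_item_var / total_var)


if __name__ == "__main__":
    # Invented push-up counts for six people tested twice
    time1 = [22, 35, 18, 41, 27, 30]
    time2 = [24, 33, 20, 43, 25, 31]
    print(f"test-retest r = {pearson_r(time1, time2):.2f}")

    # Invented scores on three tests intended to tap the same strength construct
    battery = [[22, 40, 55], [35, 52, 70], [18, 35, 50], [41, 60, 82], [27, 44, 61]]
    print(f"alpha = {cronbach_alpha(battery):.2f}")
```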
An alternative method of designing a physical ability selection
system is a content-oriented approach in which organizations
create a simulation of physical tasks that resemble actual physical
tasks of the job (i.e., a work sample test). For example, a job
simulation test for police officers may include a short sprint
followed by turning a 150-pound dummy from side to side in order
to simulate chasing down and wrestling a criminal to the ground
(Arvey, Landon, Nutting, & Maxwell, 1992). Similarly, a firefight-
ing job simulation may include quickly dragging a heavy fire hose
for a certain distance (e.g., Sothmann, Gebhardt, Baker, Kastello,
& Sheppard, 2004). Although job simulations are mainly designed
to resemble job tasks, they also measure multiple physical ability
constructs (Hogan, 1991a). For example, the police and firefighter
job simulations noted measure muscular strength, cardiovascular
endurance, and movement quality. Job simulations are seen as
beneficial because they possess greater content validity than basic
ability tests and, therefore, are related to more favorable applicant
reactions (Ryan, Greguras, & Ployhart, 1996). However, job sim-
ulations can only be used to predict physical performance within
the context in which they are administered, and they are far more
expensive to develop and administer than basic ability tests (Hen-
derson et al., 2010).
In sum, a key decision facing test developers is whether to
design a selection system that consists of multiple basic ability
tests (a construct-oriented approach) or to administer a job simu-
lation (a content-oriented approach). Research has shown that
these two types of selection systems demonstrate similar validities
(Arvey et al., 1992; Henderson et al., 2007; Hogan, 1991a). Thus,
a vital question to consider is which type of design scheme
produces greater sex differences. In that regard, there are mixed
claims as to whether systems focusing on basic ability tests or a job
simulation reduce sex differences to a greater extent. On one hand,
researchers such as Gebhardt and Baker (2010a) argued that a job
simulation “will result in no less adverse impact” (p. 285) than a
battery of basic ability tests. On the other hand, some scholars have
suggested that a job simulation may reduce sex differences to a
greater extent than a selection system emphasizing basic ability
tests because a job simulation is based on job performance tasks
for which sex differences could potentially be smaller; in addition,
applicant reactions to job simulations tend to be more positive
(Fleishman & Quaintance, 1984; Ryan et al., 1996). Finally, it is
also possible that a job simulation results in greater sex differences
because a simulation taps multiple basic ability constructs simul-
taneously (Williams, Rayson, & Jones, 1999). In turn, this may
compound sex differences that exist across various dimensions of
physical ability.
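The compounding argument can be made precise for a unit-weighted composite. Assuming each of k components is standardized within sex (within-group SD = 1), has a male–female difference d_i, and shares an average within-group intercorrelation r̄ with the other components, the composite's d is Σd_i / sqrt(k + k(k − 1)r̄): the numerator is the summed mean difference and the denominator is the within-group SD of the sum. The sketch below implements that expression. Treating a job simulation as a simple sum of ability components is a simplifying assumption, not a claim about any particular simulation, and the function name is hypothetical.

```python
from math import sqrt


def composite_d(component_ds: list[float], mean_intercorrelation: float) -> float:
    """d for a unit-weighted sum of k components, each standardized within sex.

    Assumes equal within-group variances and the same average intercorrelation
    among components for women and men.
    """
    k = len(component_ds)
    mean_difference = sum(component_ds)  # composite mean difference
    composite_sd = sqrt(k + k * (k - 1) * mean_intercorrelation)  # within-group SD of the sum
    return mean_difference / composite_sd


if __name__ == "__main__":
    # Hypothetical components with d = 1.0, 1.0, and 0.2
    for r in (0.0, 0.3, 0.6, 1.0):
        print(f"mean r = {r:.1f}: composite d = {composite_d([1.0, 1.0, 0.2], r):.2f}")
```

With weakly correlated components the composite d can exceed the largest component d, which is one way a multi-ability simulation could compound differences; with perfectly correlated components it equals the average component d.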
In any case, resolution of this debate seems critical. For in-
stance, even though job simulations are more expensive to de-
velop, if they reduce sex differences, then a simulation-based
selection system may be more desirable because it is more closely
related to the job and generates more favorable applicant reactions.
Conversely, if both systems result in similar magnitudes of sex differences, then practitioners can perhaps set aside adverse impact as a consideration when deciding whether to design physical ability selection systems emphasizing basic ability tests or job simulations. Finally, if job simulations result in greater sex differences, then a more cost-effective and context-generalizable selection system (that is, one that emphasizes basic ability tests)
may be warranted for assessing physical abilities (Henderson,
2010).
Research Question 1: Do selection systems that focus on basic abilities versus job simulations result in similar magnitudes of sex differences in physical ability?
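One way to examine a categorical moderator such as selection system design is a subgroup analysis: pool d values separately within each moderator level, then compare the pooled estimates and their residual variability. The sketch below follows the general logic of a bare-bones Hunter–Schmidt analysis (an N-weighted mean d, and observed variance minus the variance expected from sampling error) but uses a common large-sample approximation for the sampling variance of d with unequal group sizes. It illustrates the logic under those assumptions rather than reproducing the authors' exact procedure, and the labels and data are invented.

```python
from collections import defaultdict


def sampling_var_d(n_f: int, n_m: int, d: float) -> float:
    # Large-sample approximation to the sampling variance of d for groups of size n_f and n_m
    return (n_f + n_m) / (n_f * n_m) + d ** 2 / (2 * (n_f + n_m))


def subgroup_summary(records: list[tuple[str, int, int, float]]) -> dict[str, dict[str, float]]:
    """records: (moderator_level, n_female, n_male, d), one row per independent sample."""
    by_level: dict[str, list[tuple[int, int, float]]] = defaultdict(list)
    for level, n_f, n_m, d in records:
        by_level[level].append((n_f, n_m, d))

    summary = {}
    for level, rows in by_level.items():
        total_n = sum(n_f + n_m for n_f, n_m, _ in rows)
        mean_d = sum((n_f + n_m) * d for n_f, n_m, d in rows) / total_n
        # N-weighted observed variance of d around the pooled mean
        obs_var = sum((n_f + n_m) * (d - mean_d) ** 2 for n_f, n_m, d in rows) / total_n
        # N-weighted average of the variance expected from sampling error alone
        err_var = sum((n_f + n_m) * sampling_var_d(n_f, n_m, mean_d) for n_f, n_m, _ in rows) / total_n
        summary[level] = {
            "k": len(rows),
            "N": total_n,
            "mean_d": mean_d,
            "residual_sd": max(obs_var - err_var, 0.0) ** 0.5,  # truncated at zero
        }
    return summary


if __name__ == "__main__":
    # Invented samples coded by selection system design
    records = [
        ("basic ability", 40, 160, 1.3), ("basic ability", 25, 95, 1.6), ("basic ability", 60, 240, 1.1),
        ("job simulation", 30, 120, 1.4), ("job simulation", 50, 150, 1.2),
    ]
    for level, stats in subgroup_summary(records).items():
        print(level, {key: round(value, 3) for key, value in stats.items()})
```

The same function applies to the lower order dimensions or body regions discussed next; only the moderator labels change.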
Specificity of Measurement
Physical ability researchers have questioned whether broad
measures—such as those invoked by Hogan’s (1991b) taxono-
my— or narrow measures of physical abilities help reduce sex
differences. More specifically, Gebhardt and Baker (2010b)
criticized Hogan’s taxonomy for being too broad, arguing in
essence against aggregating muscular tension, muscular power,
and muscular endurance into an overall muscular strength fac-
tor. Instead, they contended that the lower order dimensions of
muscular strength are conceptually and empirically distinct and
should be treated accordingly when one is designing physical
ability tests (Fleishman, 1962; Fleishman & Quaintance, 1984).
This argument may also be true of movement quality, for which
there are three underlying factors (flexibility, balance, coordi-
nation). Physiologists tend to use more narrow taxonomies of
physical ability (e.g., Astrand, Rodahl, & Stromine, 2003;
McArdle, Katch, & Katch, 2007), basing their taxonomies on
lower order models that significantly predate Hogan’s higher
order model (e.g., Fleishman, 1964). In other words, these
lower order models in the biological sciences generally encom-
pass all of the subdimensions of Hogan’s model, but they do not
group the subdimensions into higher order factors such as
overall muscular strength or overall movement quality, as is the
case with Hogan’s model. Given these discrepancies, we exam-
ined whether sex differences vary across the narrower dimen-
sions of muscular strength and movement quality. If sex dif-
ferences do meaningfully vary across these narrow dimensions
of physical ability, then it may be possible to identify which
dimensions have smaller sex differences and perhaps use a
battery of corresponding narrow tests to potentially reduce
differences, while taking into account factors like validity and
the correlations between narrow-ability tests (Hogan, 1991a;
Lewis, 1989; Ployhart & Holtz, 2008).
Gebhardt and Baker (2010a) suggested an even narrower way of
measuring physical abilities—namely, by focusing on muscular
strength in specific regions of the body. For example, some mus-
cular strength tests specifically measure upper body strength (e.g.,
push-ups), whereas others measure trunk/core (e.g., sit-ups) or
lower body (e.g., leg press) strength. If sex differences are mean-
ingfully reduced through tests of muscular strength in specific
regions of the body, and if the tests of those individual body
regions have equal validities, then our findings would suggest a
need for more sophisticated job analyses in which the muscular
regions of the body most necessary for a particular job are identified. Furthermore, examining whether sex differences are meaningfully reduced on muscular strength tests across different regions of the body helps assess the utility of Hogan’s (1991b) taxonomy of broad physical abilities. Specifically, if d values differ across
various regions of the body, a narrower taxonomy of physical
ability may have greater utility for job analyses in physically
demanding jobs.
Research Question 2a: Do the magnitudes of sex differences
in muscular strength and movement quality abilities vary
across their lower order dimensions?
Research Question 2b: Does the magnitude of sex differences
in muscular strength vary across specific body regions?
Training
Most abilities are defined as enduring traits that remain stable
over time (Gottfredson, 1997). If that is indeed the case with
physical abilities, it may be difficult for individuals to improve
their scores through training. However, Hogan (1991a) argued
that the enduring nature of human abilities “may be less true of
physical abilities than cognitive abilities. It is clear that some
parameters of abilities are biologically determined and, there-
fore, will not markedly change. However, physical perfor-
mances are highly susceptible to training” (p. 754). For exam-
ple, in a military setting, recruits undergo a basic combat
training program designed to enable them to better meet the
physical demands of the tests and therefore the physical de-
mands of their occupation. Similarly, test training can help job
applicants and recruits become more familiar with the tests
themselves and thereby gain a better understanding of the test’s
requirements. For those reasons, research has often shown that
physical ability training improves physical performance for both men and women (Gebhardt & Baker, 2007).
A pertinent question for this study, however, is not just
whether men and women benefit from physical ability training,
but whether one group benefits more from training and, ulti-
mately, whether training reduces sex differences on physical
ability tests. On one hand, most physical ability researchers
have argued that physical ability training programs are primarily beneficial for women (Hogan, 1994; Hogan & Quigley,
1994). This claim is consistent with arguments made by some
training scholars that training interventions tend to help those
lower on a given ability because such individuals have higher
levels of motivation to overcome deficiencies in the ability
being trained (Colquitt, LePine, & Noe, 2000; Noe & Schmitt,
1986). On the other hand, training may exacerbate sex differences because it also benefits individuals who are already high on a given physical ability, which may widen existing gaps (Knapik, Wright, Kowal, & Vogel, 1980; Von Restorff, 2000). This perspective is consistent with research
showing a strong positive correlation between ability and train-
ing performance (Fleishman & Mumford, 1989). To resolve
these conflicting perspectives, we evaluated the efficacy of
training for reducing sex differences on physical ability test
scores.
Research Question 3a: Does training improve performance
on physical ability tests?
Research Question 3b: Does training reduce the magnitudes
of sex differences in physical abilities?
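Research Questions 3a and 3b can come apart: women can gain more than men in raw test units while the standardized gap (d) still grows, for example if training also shrinks within-group variability. The invented numbers below illustrate that possibility only; they are not drawn from any primary study, the function name is hypothetical, and the pooled SD assumes equal group sizes.

```python
from math import sqrt


def d_equal_n(mean_m: float, sd_m: float, mean_f: float, sd_f: float) -> float:
    """Male-minus-female d, pooling SDs as if the two groups were the same size."""
    return (mean_m - mean_f) / sqrt((sd_m ** 2 + sd_f ** 2) / 2)


# Invented pre- and post-training summary statistics: (mean, SD), higher = better
pre = {"men": (50.0, 10.0), "women": (35.0, 10.0)}
post = {"men": (53.0, 7.0), "women": (40.0, 7.0)}

gain_men = post["men"][0] - pre["men"][0]          # raw improvement for men
gain_women = post["women"][0] - pre["women"][0]    # raw improvement for women
d_pre = d_equal_n(*pre["men"], *pre["women"])
d_post = d_equal_n(*post["men"], *post["women"])

print(gain_men, gain_women, round(d_pre, 3), round(d_post, 3))  # → 3.0 5.0 1.5 1.857
```

Here the raw gap narrows from 15 to 13 points, yet d rises from 1.5 to about 1.86 because the post-training SDs are smaller. Whether training "reduces sex differences" therefore depends on whether one tracks raw scores, within-sex gains, or between-sex d.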
Method
Literature Search
We conducted an extensive multidisciplinary literature search
for articles that reported sex differences in physical ability. Spe-
cifically, we conducted searches utilizing electronic databases
including Google Scholar, PsycINFO, ProQuest Dissertations and
Theses, and the Defense Technical Information Center (DTIC)
using combinations of keywords such as physical ability, physical agility, physical fitness testing, physical performance, job performance, and sex differences. Moreover, we engaged in an extensive
process of back-searching the articles that we located in order to
expand our pool of potential studies. This back-searching method
was helpful in identifying articles, technical reports, and unpub-
lished studies (e.g., dissertations, conference articles) that ad-
dressed physical ability testing in the workplace.
Inclusion Criteria
We evaluated the studies and included them in the meta-analysis
if they met four criteria. First, they needed to include participants
drawn specifically from a work population in an occupational
setting (general population, youth, and student samples were ex-
cluded). Second, the studies needed to include a sample consisting
of both female and male participants (studies including solely
female or solely male participants were thus excluded). Third, the
studies needed to provide descriptive statistics (means and stan-
dard deviations) on physical ability for both female and male
participants or else include other data (e.g., correlations between
sex and physical ability) that would permit the calculation of effect
sizes (Cohen’s d values). Finally, we excluded duplicate studies
that reported identical data (e.g., a technical report and a scholarly
journal article with the same data) in order to ensure the validity of
our meta-analytic results (Wood, 2008). After excluding the stud-
ies that did not meet our inclusion requirements, we arrived at a
total of 113 studies and 140 unique samples that were included in
the meta-analysis.
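The third criterion allows an effect size to be derived from a correlation between sex and physical ability when means and standard deviations are missing. The sketch below shows one common conversion from a point-biserial correlation to d that accounts for unequal group sizes. The function name is illustrative, and so is the coding of sex as 1 = male, 0 = female (which makes a positive r favor males). The article does not state which conversion formula was used.

```python
import math

def d_from_point_biserial(r: float, n_male: int, n_female: int) -> float:
    """Convert a point-biserial r (sex coded 1 = male, 0 = female) to Cohen's d.

    Uses d = r / sqrt(p * q * (1 - r**2)), where p and q are the proportions
    of the sample in each group. With equal groups (p = q = .5) this reduces
    to the familiar d = 2r / sqrt(1 - r**2).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    n = n_male + n_female
    p, q = n_male / n, n_female / n
    return r / math.sqrt(p * q * (1.0 - r ** 2))

# Equal groups: matches 2r / sqrt(1 - r^2)
print(round(d_from_point_biserial(0.5, 100, 100), 3))  # 1.155
```

The group split matters. The same r implies a larger d when one sex is a small minority of the sample, as is typical in these occupational samples.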
In the online supplementary materials, Appendix A provides a
list of primary studies included in the meta-analysis and the
relevant data coded from these studies. Appendix B provides a list
of studies that were excluded from the meta-analysis and the
inclusion criteria that excluded them from consideration.
Coding Procedures
For each physical ability test that met the inclusion criteria, we
coded the number of female and male participants who were
tested, as well as the means and standard deviations of females’
and males’ test performance. Furthermore, we recorded correlation
coefficients and reliability coefficients whenever they were pro-
vided. We then coded the exact test that was administered (e.g.,
push-up) and the design scheme of the test (basic ability test or job
simulation). Specifically, tests were coded as basic ability tests or
job simulations depending on their resemblance to actual job tasks.
In particular, we looked for keywords in authors’ descriptions of the tests and determined two things. First, (a) whether the tests were designed to resemble actual job tasks (job simulations) or to directly measure physical ability constructs (basic ability tests). Second, (b) whether they were designed to predict performance in a single context (job simulations) or across different jobs (basic ability tests). For the most part, this information was provided in clear detail in the primary studies. However, some tests were described less thoroughly, and it was then unclear whether a test was a basic ability test or a job simulation. In those cases, we referred to Gebhardt and Baker’s (2010a, 2010b) review articles to see how they classified such tests. We then sought to keep our coding consistent both with Gebhardt and Baker’s classification of tests and with our coding of other studies in the database.
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Next, we coded for specificity of measurement. We first classified each test according to Hogan’s (1991a) taxonomy. Thus, we coded tests into the three broad dimensions (muscular strength, cardiovascular endurance, and movement quality) and six subdimensions (e.g., muscular tension) of physical ability. Because job simulations are multidimensional, we indicated all broad dimensions and subdimensions that each job simulation tapped.
Whenever the coding of a test within a study was ambiguous, we
reviewed the explanation of the test provided by the primary
author to determine which dimension of physical ability the test
was assessing. When necessary, we verified our coding with that
of Hogan (1991a, 1991b) to ensure the consistency of our coding
of each respective test. Moreover, any ambiguous tests we encoun-
tered were highlighted and discussed as a group to arrive at a
consensus for coding.
To further code for specificity of measurement, we coded mus-
cular strength tests for the regions of the body (upper, lower, core,
total) being assessed. For the majority of tests, the coding was
unequivocal. Specifically, tests that assessed muscular strength in
the shoulders, upper back, and arms were coded “upper”; tests that
assessed muscular strength in the legs were coded “lower”; and
tests that assessed muscular strength in the lumbar spine region
were coded “core” (Akuthota & Nadler, 2004; Faries & Greenwood, 2007). Some muscular strength tests loaded equally on multiple body regions (e.g., the firefighter test described earlier). Such tests were coded as “total.”
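The body-region rule described above amounts to a lookup plus a tie-break. Below is a minimal sketch. It assumes each test has already been summarized by the muscle groups it loads. The labels, the `REGION_OF` table, and the function are hypothetical rather than the authors' coding template. The sketch also simplifies the coders' judgment that a test "loaded equally" on several regions to the rule that any test loading more than one region is coded "total."

```python
# Hypothetical mapping from muscle groups to the body-region codes described above.
REGION_OF = {
    "shoulders": "upper",
    "upper back": "upper",
    "arms": "upper",
    "legs": "lower",
    "lumbar spine": "core",
}

def code_body_region(muscle_groups: list[str]) -> str:
    """Return 'upper', 'lower', 'core', or 'total' for a muscular strength test.

    A test whose muscle groups all map to one region gets that region's code;
    a test spanning several regions is coded 'total'. Unknown muscle groups
    raise KeyError so that they can be flagged for group discussion rather
    than silently guessed.
    """
    if not muscle_groups:
        raise ValueError("a test must load at least one muscle group")
    regions = {REGION_OF[g] for g in muscle_groups}
    if len(regions) == 1:
        return regions.pop()
    return "total"

print(code_body_region(["arms", "shoulders"]))              # upper
print(code_body_region(["legs", "arms", "lumbar spine"]))   # total
```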
In order to assess the impact of training on sex differences in
physical ability, we highlighted the studies that (a) involved tests
of the same physical abilities of the same participants at two points
in time and (b) discussed a specific physical fitness intervention
(e.g., Army boot camp) aimed at improving physical ability. We
then noted the duration of time that passed between the measurements of physical abilities (range of 6–16 weeks). The training
studies included in the meta-analysis are noted in Appendix A.
Finally, we coded for two sample characteristics for informa-
tional purposes: industry (military vs. civilian/nonmilitary) and test
setting (applicants vs. incumbents). Of the 140 unique samples that
were coded, 59% were from nonmilitary contexts (see Table 2 for
examples of nonmilitary samples included in the meta-analysis),
and 94% involved incumbent samples.²
Intercoder agreement. We took multiple steps to ensure the
accuracy and consistency of our coding. Specifically, three of the
authors coded a preliminary pool of studies together. In doing this,
we discussed how to categorize the tests being coded and eventu-
ally arrived at a consistent coding scheme. This process resulted in
the creation of a template that was used for coding the remaining
studies. The remaining studies were then divided among the first
four authors to code, with two coders independently coding each
respective study. We tracked and assessed the initial agreement on
coding decisions. For the coded categories, initial agreement was
as follows: methodological variables (97%), selection system de-
sign (97%), specificity of measurement in Hogan’s (1991a) broad
dimensions (99.5%), specificity of measurement in Hogan’s sub-
dimensions (95%), and specificity of measurement in body regions
(92%). When there was disagreement on coding, the authors
conferred with each other to arrive at a consensus.
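The initial agreement figures above are simple percentages of matching coding decisions. Below is a minimal sketch, assuming each coder's decisions are stored in the same item order. The function and the example codes are hypothetical. Percent agreement, as reported here, does not adjust for agreement expected by chance.

```python
def percent_agreement(coder_a: list[str], coder_b: list[str]) -> float:
    """Percentage of items on which two independent coders initially agree."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must rate the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)

# Hypothetical selection-system-design codes for five tests
a = ["basic", "basic", "simulation", "basic", "simulation"]
b = ["basic", "basic", "simulation", "simulation", "simulation"]
print(percent_agreement(a, b))  # 80.0
```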
Meta-Analytic Procedures
We followed the procedures outlined by Hunter and Schmidt (2004) for meta-analyzing effect sizes. We began by calculating Cohen’s d values for the male–female difference on each physical ability test in each independent sample. Negative d values indicated better performance by females, and positive d values indicated better performance by males. For the training moderation analysis, positive d values indicated improved performance from Time 1 to Time 2. Within studies, there was often more than one d value for the same construct (e.g., in Sell, 2006, five tests measured muscular strength: push-ups, sit-ups, vertical jump, bench press, and grip strength). Because between-measure correlations were rarely reported in primary studies, we followed Hunter and Schmidt’s (2004) advice to create mean effect sizes. That is, we averaged the effect sizes and computed a sample-size-weighted mean d for each dimension within a study. This approach slightly underestimates effect sizes; thus, our estimates should be considered conservative (Hunter & Schmidt, 2004).
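The effect size steps described above can be sketched as follows. This is a minimal illustration that assumes independent groups and a pooled-SD denominator; the article cites Hunter and Schmidt (2004) but does not print the formula. The function names and example scores are hypothetical. The sketch also assumes that for tests where a lower raw score is better (e.g., timed runs), the sign is flipped so that positive values always mean better male performance.

```python
import math

def cohens_d(mean_m, sd_m, n_m, mean_f, sd_f, n_f):
    """Male-minus-female standardized difference using the pooled SD.

    Positive values mean males scored higher. For tests where a lower raw
    score is better, reverse the sign before pooling (assumption).
    """
    pooled_var = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2)
    return (mean_m - mean_f) / math.sqrt(pooled_var)

def within_study_mean_d(effects):
    """Sample-size-weighted mean of several (d, N) pairs for one dimension in one study."""
    total_n = sum(n for _, n in effects)
    return sum(d * n for d, n in effects) / total_n

# Hypothetical study: two muscular strength tests on the same 80 men and 20 women
push = cohens_d(40, 10, 80, 25, 10, 20)   # 1.5
grip = cohens_d(50, 8, 80, 34, 8, 20)     # 2.0
print(within_study_mean_d([(push, 100), (grip, 100)]))  # 1.75
```

When every test in a study is given to the same participants, the weights are equal and the weighted mean reduces to a simple average.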
Data were analyzed using the Hunter–Schmidt Meta-Analysis Package Program 1.1 (Schmidt & Le, 2004). Many studies in our meta-analysis did not report reliability coefficients for the physical ability tests administered, so we used the artifact distribution procedure to correct for measurement error (Hunter & Schmidt, 2004). The studies that met our inclusion criteria reported multiple types of reliabilities, but test–retest reliabilities were by far the most prevalent. We therefore used test–retest estimates exclusively in our reliability analyses. Table 3 summarizes the test–retest reliabilities used to create the artifact distributions. The mean reliability values for each broad dimension are as follows: muscular strength = .90; cardiovascular endurance = .81; and movement quality = .65. Where available, other reliability estimates (e.g., Cronbach’s α) are reported in Appendix A in the online supplemental materials. We did not correct for range restriction, because many studies used unique or customized physical ability tests and did not report normative data for these tests. Furthermore, we could not find a study that reported sex differences in both an incumbent and an applicant sample. In short, this made it impossible to estimate u_x values and correct for range restriction; thus, we once again stress that the estimates provided should be considered conservative.
² We conducted a moderator test of these sample characteristics. Sex differences on every dimension of physical ability did not differ significantly between military and nonmilitary samples or between applicant and incumbent samples.
Results
Sex Differences on Broad Dimensions of
Physical Ability
Table 4 contains a brief overview of our findings in comparison to prior studies. For the results in subsequent tables, however, we provide more complete statistical data, in line with other meta-analyses of d values (Dean, Roth, & Bobko, 2008; Roth, Bobko, McFarland, & Buster, 2008; Roth, Purvis, & Bobko, 2012). Our findings revealed substantial sex differences in two meta-categories of physical ability: muscular strength and cardiovascular endurance. Specifically, males perform better on muscular strength (δ = 1.81) and cardiovascular endurance (δ = 2.01) assessments of physical ability. As noted in Table 4, our estimate of sex differences in muscular strength differed only moderately from Hough et al.’s (2001) estimate (d = 1.66). For cardiovascular endurance, however, we found much larger differences than past estimates (Sackett & Wilk, 1994: d = 1.27; Hough et al., 2001: d = 1.09). For movement quality, our results showed negligible sex differences (δ = −.06).
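Sex is treated as measured without error. Correcting for unreliability in the physical ability test therefore amounts to dividing the observed d by the square root of the test's reliability. The sketch below applies that correction using the mean test–retest reliabilities from Table 3 and the observed d values from Table 4. It is a simplification of the artifact distribution procedure, which works with the full distribution of attenuation factors rather than a single mean reliability. Even so, it reproduces the δ values reported above to within rounding of the two-decimal inputs.

```python
import math

# Mean test-retest reliabilities reported in Table 3
MEAN_RXX = {"muscular strength": 0.90,
            "cardiovascular endurance": 0.81,
            "movement quality": 0.65}

def correct_for_unreliability(d_obs: float, rxx: float) -> float:
    """Disattenuate an observed d for measurement error in the physical ability test."""
    return d_obs / math.sqrt(rxx)

# Mean observed d values from Table 4
observed = {"muscular strength": 1.71,
            "cardiovascular endurance": 1.81,
            "movement quality": -0.05}
for dim, d in observed.items():
    print(dim, round(correct_for_unreliability(d, MEAN_RXX[dim]), 2))
# muscular strength 1.8
# cardiovascular endurance 2.01
# movement quality -0.06
```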
Selection System Design Moderator
Table 5 summarizes sex differences across basic ability and job simulation tests at the individual-test level. At this level, job simulations resulted in greater sex differences in overall muscular strength (δ = 2.67 job simulation vs. δ = 1.60 basic ability) and cardiovascular endurance (δ = 2.26 job simulation vs. δ = 1.87 basic ability). We note, however, one large-sample study (Misner, Boileau, & Plowman, 1989) that reported particularly high male–female d values for some of the categories of physical ability that we coded (as high as d = 4.15 for the muscular endurance category). Because this outlier sample contained very unequal numbers of men (n = 9,763) and women (n = 35), we also ran the meta-analysis with the outlier sample of the Misner et al. study excluded. The differences were smaller in magnitude, but the pattern of results with the outlier excluded generally remained the same. Specifically, at the individual-test level, job simulations still resulted in greater sex differences than basic ability tests for overall muscular strength (δ = 1.94 job simulation vs. δ = 1.60 basic ability). For cardiovascular endurance, though, the difference between simulations and basic ability tests was very small (δ = 1.93 job simulation vs. δ = 1.87 basic ability). Too few studies examined job simulations of movement quality to merit a moderator analysis.
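Because the pooled estimates are sample-size-weighted means, the influence of the Misner et al. (1989) sample can be checked directly. Removing one sample of size N_out changes the weighted mean in a predictable way, so the outlier's own d can be backed out from an "all samples" row and the matching "excluding outlier" row. The sketch below does this with the muscular endurance rows of Table 7 (observed d values). The function name is illustrative. The inputs are rounded to two decimals, so the result is approximate, and simple weighting by total N is an assumption about how the program weighted samples.

```python
def implied_removed_d(d_all: float, n_all: int, d_rest: float, n_rest: int) -> float:
    """d of the removed sample implied by two sample-size-weighted means.

    Solves d_all * n_all = d_rest * n_rest + d_out * n_out for d_out.
    """
    n_out = n_all - n_rest
    return (d_all * n_all - d_rest * n_rest) / n_out

# Muscular endurance, Table 7: all samples vs. outlier excluded
d_out = implied_removed_d(d_all=1.47, n_all=78_862, d_rest=1.09, n_rest=69_064)
weight = (78_862 - 69_064) / 78_862   # outlier's share of the total weight
print(round(d_out, 2), round(weight, 3))  # 4.15 0.124
```

The recovered value matches the d = 4.15 reported above for muscular endurance. This single sample carries about 12% of the total weight, even though it includes only 35 women.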
At first glance, these findings may suggest that using job simulation tests rather than basic ability tests increases subgroup differences. The most relevant comparison, however, is made at the level of the selection system rather than at the individual-test level.³ This is because physical ability test developers are ultimately concerned with developing selection systems rather than just individual tests.
In order to compare job simulation tests and basic ability tests at the system level, we conducted several additional analyses. These were necessary because job simulation tests are content-oriented measures that are broader in scope and involve physical abilities across multiple dimensions, whereas basic ability tests are construct-oriented tests that measure narrow dimensions of physical ability. Suppose the additional analyses showed that the larger sex gap on job simulations was attributable to their multifaceted nature, that is, to the fact that they tap multiple subdimensions of workers’ strength. This would mean that selection systems built on batteries of basic ability tests and selection systems built on a job simulation do not meaningfully differ in the sex differences they produce. If, however, the gap was not attributable to the multifaceted nature of job simulations, this would indicate that selection systems based on job simulations result in larger subgroup differences than selection systems based on batteries of basic ability tests.
³ We thank the editor and an anonymous reviewer for asking that we test and interpret our results primarily at the selection system level rather than at the individual test level.
Table 2
Examples of Nonmilitary Samples Included in the Meta-Analysis
Occupation: Example studies
Police officers: Arvey, Landon, Nutting, & Maxwell (1992)
Firefighters: Sell (2006)
Emergency medical technicians: Gamble et al. (1991)
FBI agents: Knapik et al. (2011)
Steelworkers: Arnold, Rauschenberger, Soubel, & Guion (1982)
Grocers: Hogan, Ogden, & Fleishman (1979)
High pressure cleaners: Hogan & Arneson (1988)
Habilitation therapists: Hogan, Arneson, Hogan, & Jones (1986)
Coal miners: Jackson & Osburn (1983)
Outdoor telephone crafters: Reilly, Zedeck, & Tenopyr (1979)
Petrochemical workers: Hogan & Pederson (1984)
Maintenance workers: Denning (1984)
Municipal employees: Nygård, Eskelinen, Suvanto, Tuomi, & Ilmarinen (1991)
Factory production workers: Keyserling, Herrin, Chaffin, Armstrong, & Foss (1980)
Professional dancers: Hamilton, Hamilton, Marshall, & Molnar (1992)
Construction workers: Blakley, Quiñones, Crawford, & Jago (1994)
Correctional officers: Jamnik, Thomas, Burr, & Gledhill (2010a); Jamnik, Thomas, Shaw, & Gledhill (2010b)
Agricultural workers: Nag, Sebastian, Ramanathan, & Chatterjee (1978)
Postal workers: Van Der Beek, Kluver, Frings-Dresen, & Hoozemans (2000)
Beach lifeguards: Reilly, Wooler, & Tipton (2006)
Table 3
Reliability Coefficients Used for the Artifact Distribution Meta-Analysis
Study/test   Sample size   Muscular strength (r_xx)   Cardio endurance (r_xx)   Movement quality (r_xx)
Arnold et al. (1982)
Leg dynamometer 249 .86
Back dynamometer 249 .91
Arm dynamometer 249 .94
Balance test 249 .45
Blakley et al. (1994)
Torso lift 13,008 .88
Arm lift 13,010 .91
Grip strength 12,985 .88
Shoulder lift 13,012 .93
Bronner & Ojofeitimi (2006)
Hip flexion 12 .88
Chaffin et al. (1978)
Isometric strength lifts 551 .87
Holzl et al. (2008)
Casualty transportationᵃ 134 .94
Repetitive box liftingᵃ 143 .82
Jamnik et al. (2010a); Jamnik et al. (2010b)
Emergency response circuitᵃ 86 .98
Knapik et al. (1981)
Isometric pull 270 .97
Misner et al. (1989; 1974 sample)
Hose coupleᵃ 52 .39
Flexed arm hang 52 .79
Body lift/carryᵃ 52 .55
Obstacle runᵃ 52 .77
Stair climbᵃ 52 .66
Misner et al. (1989; 1983 sample)
Hose coupleᵃ 59 .37
Dummy dragᵃ 59 .72
Body lift/carryᵃ 59 .79
Stair climbᵃ 59 .88
Myers et al. (1984)
Push taskᵃ 122 .71
Carry taskᵃ 123 .64
Torque taskᵃ 104 .92
Lift taskᵃ 123 .90
Upright pull 944 .86
Handgrip 946 .92
Lift to 60 in. 933 .95
Lift to 72 in. 931 .95
VO₂ maximum 662 .66
Richmond, Rayson, et al. (2008)
Shovelingᵃ 100 .86
Fire maneuver taskᵃ 95 .96
Repetitive lift/carry bagsᵃ 104 .96
Sharp et al. (2002)
VO₂ maximum (1978 sample) 244 .95
VO₂ maximum (1998 sample) 326 .95
Sharp et al. (1993)
Dead lift 40 .98
Weiner (1998a & 1998b; taken from Krueger, 1984)
Hand grip
Sit-ups 233 .89
Push-ups 233 .77
Shuttle run 233 .89
Long jump 233 .78
Bench press 233 .88
Sit and reach 233 .93
Mean .90 .81 .65
k 12 6 3
N 15,225 1,429 494
Note. Reliabilities for each ability type were aggregated within each independent sample prior to inclusion in the reliability meta-analyses.
ᵃ Job simulation.
We began this analysis by meta-analytically estimating the
correlations among tests of basic ability subdimensions based on
our coding of these correlations from the studies in our database.
Results of this analysis are presented in Table 6, and the raw data
upon which these results are based can be found in Appendix C of
the online supplemental material. Our purpose in conducting this
analysis was to develop a correlation matrix among the subdimensions of physical ability based upon the basic ability tests. The
correlations in this matrix are necessary for creating composites of
the job simulations based on the underlying physical dimensions
being tested.
Our next step was to create a composite d value for each of the job simulation tests. The purpose was to determine the male–female difference that would be expected on each job simulation test, given the underlying dimensions of physical ability it tapped. To create this composite d value, we coded each job simulation test for every subdimension it measured (mean number of subdimensions = 3.2, SD = 1.3, minimum = 1, maximum = 5). Two authors coded each study, with an initial agreement of 88%. Following coding, we used the basic ability intercorrelations and meta-analytic d values to create the composite d value for each test. This gave us two d values for each job simulation test in our sample: (a) an actual d value derived from the study data and (b) a composite d value formulated from the underlying subdimensions of physical ability that the job simulation tapped.⁴ Our final step was to meta-analyze the new composite d values and compare the resulting estimate with a meta-analytic estimate of the d values reported in the actual studies.
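The article does not spell out its composite formula. One standard way to get the expected d for a unit-weighted composite of k standardized subdimension scores is to divide the sum of the subdimension d values by the standard deviation of the composite, which depends on the intercorrelations: d_comp = Σd_i / sqrt(k + 2Σ_{i<j} r_ij). The sketch below rests on two assumptions. First, it uses unit weights, in keeping with the article's note that specific subdimension weights could not be assigned. Second, it treats the Table 6 coefficients as the relevant intercorrelations, even though they were computed in mixed-sex samples rather than within each sex. The function name and the two-subdimension example are hypothetical.

```python
import math
from itertools import combinations

def composite_d(ds: list[float], r: list[list[float]]) -> float:
    """Expected group difference on a unit-weighted sum of standardized scores.

    ds: d value for each subdimension the job simulation taps.
    r:  k x k matrix of intercorrelations among those subdimensions
        (diagonal ignored; each score is assumed to have unit variance).
    """
    k = len(ds)
    if len(r) != k or any(len(row) != k for row in r):
        raise ValueError("r must be a k x k matrix")
    composite_var = k + 2 * sum(r[i][j] for i, j in combinations(range(k), 2))
    return sum(ds) / math.sqrt(composite_var)

# Hypothetical simulation tapping muscular tension and cardiovascular endurance,
# using observed d values from Table 4 and their correlation from Table 6
print(round(composite_d([2.13, 1.81], [[1.0, 0.32], [0.32, 1.0]]), 2))  # 2.42
```

Because the two components correlate only modestly, the composite difference (about 2.42) is larger than either component alone. This is the mechanism the analysis below credits for the larger sex gaps on job simulations.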
Our analysis revealed that the composite and actual estimates were very similar (d_comp = 1.85 vs. d_reported = 1.92; δ_comp = 2.04 vs. δ_reported = 2.04). This suggests that the average job simulation test produces larger sex differences than the average basic ability test simply because job simulations tap multiple dimensions of physical ability. Thus, at the level of the selection system, a battery of basic ability tests would be expected to produce sex differences virtually identical to those of a job simulation that taps the same dimensions of physical ability. Nevertheless, these results should be interpreted with caution, for three reasons. First, a large number of studies did not report intercorrelations, so our composites were based on meta-analytic rather than primary-study intercorrelations. Second, some of the correlation coefficients were derived from a single sample. Third, it was not feasible to assign specific weights to each subdimension that a particular job simulation tapped.
Specificity of Measurement Moderators
To analyze specificity of measurement as a moderator, we next
describe the results of our analyses conducted on subdimensions of
physical ability as well as on muscular strength in various regions
of the body. Results of the analyses appear in Table 7.
Subdimensions of meta-categories. Some specific subdimensions of physical ability showed sex differences even larger than those found for the broad dimensions. Muscular tension was the subdimension with the largest differences (δ = 2.24). Sex differences were smaller, however, for muscular endurance (δ = 1.55) and muscular power (δ = 1.17). Within movement quality, coordination (δ = .79) and balance (δ = .38) had larger effect sizes than overall movement quality. Flexibility also had a larger effect size (in absolute value) than overall movement quality, and performance on flexibility tests favored females (δ = −.18). Still, the 80% credibility intervals around the effect sizes for balance and flexibility included zero, suggesting that there are no meaningful sex differences on those dimensions. Moreover, the balance and coordination analyses rested on few studies and small samples. We therefore suggest caution in interpreting these latter results.
⁴ Actual and composite d values for each job simulation test can be found in Appendix D of the online supplementary materials.
Table 4
Summary of Current Results Contrasted With Prior Estimates
Dimensions/subdimensions   Sackett & Wilk (1994) mean d   No. of effects   Hough et al. (2001) mean d   No. of effects   Current study mean d   Mean δ   No. of effects   k   N
Muscular strength n/a n/a 1.66 52 1.71 1.81 596 118 108,736
Muscular strength by body region
Upper body n/a n/a n/a n/a 1.88 1.98 254 95 103,856
Lower body n/a n/a n/a n/a 1.60 1.68 95 35 8,296
Body core n/a n/a n/a n/a 0.25 0.27 96 43 48,899
Total body n/a n/a n/a n/a 2.22 2.34 146 40 49,681
Subdimensions of muscular strength
Muscular tension 2.28 30 1.86 31 2.13 2.24 310 92 63,154
Muscular power 2.10 6 2.10 6 1.11 1.17 97 33 31,673
Muscular endurance 1.52 14 1.02 15 1.47 1.55 175 60 78,862
Cardiovascular endurance 1.27 12 1.09 13 1.81 2.01 184 83 73,968
Movement quality n/a n/a 0.20 21 −0.05 −0.06 70 31 6,387
Subdimensions of movement quality
Flexibility 0.05 10 −0.64 11 −0.15 −0.18 56 29 6,322
Coordination 0.70 4 0.70 4 0.64 0.79 5 5 1,995
Balance 0.53 6 0.53 6 0.31 0.38 9 8 1,756
Note. Positive d and δ values indicate better performance by males. Negative d and δ values indicate better performance by females. The δ symbol represents a sample-size-weighted mean effect size corrected for sampling error and measurement error in the criterion.
Regions of the body. Sex differences were large for upper body (δ = 1.98) and lower body (δ = 1.68) muscular strength tests. However, total-body tests had the largest effect size (δ = 2.34). An interesting finding for the body region moderator analysis is the low d value for core muscular strength (δ = .27) and the fact that the 80% credibility interval for core muscular strength included zero. This d value is by far the lowest within the muscular strength dimension.
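The 80% credibility intervals in Tables 5 and 7 follow from the corrected mean effect and its standard deviation, assuming the true effects are normally distributed: δ ± 1.28 SD_δ. The short sketch below (function name illustrative) reproduces the muscular strength interval and shows how to check whether an interval spans zero.

```python
from statistics import NormalDist

def credibility_interval(delta: float, sd_delta: float, coverage: float = 0.80):
    """Central credibility interval for true effects, assuming normally distributed effects."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.2816 for 80%
    return delta - z * sd_delta, delta + z * sd_delta

lo, hi = credibility_interval(1.81, 0.75)   # muscular strength, Table 7
print(round(lo, 2), round(hi, 2))            # 0.85 2.77
lo, hi = credibility_interval(0.38, 0.36)   # balance, Table 7
print(lo < 0 < hi)                           # True
```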
Table 5
Individual Test-Level Sex Differences in Physical Ability
Category of analysis   k   N   Males (N)   Females (N)   Effects   d   SD_d   δ   SD_δ   80% credibility interval: Lower   Upper
Muscular strength
Job simulation 29 29,811 27,009 2,802 121 2.53 1.03 2.67 1.08 1.28 4.06
Job simulation (excluding outlier)ᵃ 28 20,013 17,246 2,767 119 1.84 0.37 1.94 0.38 1.46 2.43
Basic ability 106 106,962 88,683 18,279 475 1.52 0.56 1.60 0.59 0.85 2.35
Cardiovascular endurance
Job simulation 12 26,362 24,789 1,573 20 2.03 0.44 2.26 0.48 1.64 2.87
Job simulation (excluding outlier)ᵃ 11 15,664 15,026 1,538 18 1.73 0.26 1.93 0.28 1.56 2.29
Basic ability 76 48,008 37,663 10,345 164 1.68 0.54 1.87 0.59 1.12 2.62
Note. Positive d and δ values indicate better performance by males. Negative d and δ values indicate better performance by females. Column content is as follows: k = number of independent samples; effects = total number of effect sizes that factored into the calculation, contributing to the analysis (this value does not represent independent effects); d = sample-size-weighted mean observed effect size; SD_d = standard deviation of the observed mean effect size; δ = sample-size-weighted mean effect size corrected for sampling error and measurement error in the criterion; SD_δ = standard deviation of the population effect size. Lower and upper limits of the 80% credibility interval are for the corrected mean effect size.
ᵃ Denotes exclusion of the outlier sample from the Misner et al. (1989) study.
Table 6
Meta-Analytic Correlations Between the Subdimensions of Physical Ability on Basic Ability Tests
Variable   Muscular tension   Muscular power   Muscular endurance   Cardio endurance   Flexibility   Coordination   Balance
Muscular tension   .48
k; N   11; 15,863
80% credibility interval   [0.35, 0.61]
Muscular power   .35   .67
k; N   4; 1,186   2; 507
80% credibility interval   [0.22, 0.47]   [0.67, 0.67]
Muscular endurance   .37   .36   .43
k; N   10; 2,815   5; 1,372   10; 2,928
80% credibility interval   [0.24, 0.51]   [0.36, 0.36]   [0.32, 0.54]
Cardio endurance   .32   .28   .34   .65
k; N   9; 3,064   4; 1,309   12; 3,690   6; 1,632
80% credibility interval   [0.12, 0.53]   [0.08, 0.48]   [0.18, 0.49]   [0.55, 0.76]
Flexibility   .11   .11   .24   .14   .37ᵃ
k; N   7; 1,776   5; 1,374   8; 2,223   8; 2,339   2; 115
80% credibility interval   [−0.01, 0.24]   [0.04, 0.19]   [0.10, 0.38]   [0.02, 0.26]   [0.37, 0.37]
Coordination   .23   .28   .36   .19   .37   .26ᵃ
k; N   1; 739   1; 739   1; 739   1; 755   1; 739   2; 115
80% credibility interval   [0.23, 0.23]   [0.28, 0.28]   [0.36, 0.36]   [0.19, 0.19]   [0.37, 0.37]   [0.26, 0.26]
Balance   .10   −.05ᵃ   .14   .17   .04   .00   .15ᵃ
k; N   3; 587   2; 115   3; 514   4; 610   3; 587   2; 115   2; 115
80% credibility interval   [0.10, 0.10]   [−0.05, −0.05]   [0.14, 0.14]   [0.03, 0.30]   [0.04, 0.04]   [0.00, 0.00]   [0.15, 0.15]
Note. The correlation coefficients in the table are uncorrected. The k; N line beneath each row of correlation coefficients gives the number of respective studies (k) and participants (N) upon which the correlations were based. The lower and upper limits of the 80% credibility interval are for the uncorrected correlation. Correlation coefficients were obtained from the basic ability tests in the studies in the meta-analysis that had mixed (both male and female) samples.
ᵃ No mixed-sample (both male and female) correlation value was found in a study in the meta-analysis, so correlation coefficients obtained from male-only samples are provided.
Training Moderator
We analyzed 21 different independent samples that included
pre- and posttraining scores on physical ability tests across sex.
The studies that were included in the training analysis are denoted
in Appendix A (online). Eighteen of these studies came from the
military, where recruits engaged in 6–16 weeks of basic combat
training. The other three studies came from public safety (police
recruit training and FBI training) and professional dancing con-
texts. The training programs for each study were similar in that
they entailed intense training on the physical abilities being tested.
In each study, men and women went through training simultane-
ously in the same location and were thus exposed to similar
environmental factors.
Our examination of the impact of training on sex differences in
physical ability entailed two analyses. First, we meta-analyzed the
improvement scores of female and male participants to determine
the differential improvement gains made by each sex following a
training program. Second, we meta-analyzed difference scores in
physical ability before and after participants completed a training
program in order to evaluate the extent to which sex differences
were minimized as a result of training. The former analysis allowed us to determine which sex gained more improvement
from a training program; the latter analysis allowed us to
determine whether gains from training mitigated or exacerbated
sex differences. Results of the training analyses are presented
in Table 8.
Improvement scores. We had an adequate number of studies with which to evaluate the impact of training on both the muscular strength and cardiovascular endurance dimensions of physical ability. In both cases, females improved more than males after engaging in training programs. Specifically, improvement gains were as follows: muscular strength (female δ = 1.13; male δ = .84) and cardiovascular endurance (female δ = 1.10; male δ = .76).
Effects of training on overall sex differences. Even though females improved more than males did as a result of training, we also investigated whether this gain ultimately decreased sex differences in physical ability test scores. Our findings revealed mixed results in this regard. The sex-difference gap in muscular strength narrowed as a result of training (pretraining δ = 1.44; posttraining δ = 1.28), but the gap actually increased for cardiovascular endurance tests (pretraining δ = 1.90; posttraining δ = 2.36).
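The two training analyses standardize differently, which is why they can point in different directions. Improvement scores scale each sex's gain by that sex's own spread, whereas the sex difference scales the male–female gap by the pooled spread. The sketch below uses invented numbers, not data from the meta-analysis. It assumes equal group sizes and an improvement d standardized on the average of the Time 1 and Time 2 variances; the article does not state which standardizer it used.

```python
import math

def d_between(m1, s1, m2, s2):
    """(m1 - m2) / sqrt((s1^2 + s2^2) / 2); equal group sizes assumed."""
    return (m1 - m2) / math.sqrt((s1**2 + s2**2) / 2)

# Hypothetical cardiovascular scores (higher = better): (mean, SD)
women_pre, women_post = (10.0, 2.0), (12.0, 2.5)
men_pre, men_post = (20.0, 5.0), (24.0, 5.0)

women_gain = d_between(*women_post, *women_pre)   # improvement d for women
men_gain = d_between(*men_post, *men_pre)         # improvement d for men
gap_pre = d_between(*men_pre, *women_pre)         # sex difference before training
gap_post = d_between(*men_post, *women_post)      # sex difference after training
print(round(women_gain, 2), round(men_gain, 2), round(gap_pre, 2), round(gap_post, 2))
# 0.88 0.8 2.63 3.04
```

In this example, women's standardized gain exceeds men's even though their raw gain is half as large, because women's scores are less spread out. The absolute gap therefore grows, which is one way the pattern reported for cardiovascular endurance could arise.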
Discussion
Two core contributions of this study are, first, to provide the
most accurate estimates of sex differences in physical ability to
date in the literature and, second, to identify key moderators of sex
differences in physical ability in an effort to provide conceptual
clarification to the literature and offer insight into methods of
reducing sex differences on physical ability test scores.
Regarding the first contribution, our study shows that large sex
differences exist for muscular strength and cardiovascular endur-
ance abilities. However, no significant sex differences exist for
movement quality ability. For several different reasons, the esti-
mates provided in the current study represent the most accurate
Table 7
Specificity of Measurement as a Moderator of Sex Differences in Physical Ability
Category of analysis kNMales (N) Females (N) Effects dSD
d
␦SD
␦
80%
Credibility
intervals
Lower Upper
Muscular strength 118 108,736 90,099 18,637 596 1.71 0.72 1.81 0.75 0.85 2.77
Muscular strength (excluding outlier)
a
117 98,938 80,336 18,602 593 1.57 0.58 1.66 0.61 0.87 2.44
Muscular strength by body region
Upper body 95 103,856 87,202 16,654 254 1.88 0.50 1.98 0.52 1.31 2.65
Lower body 35 8,296 6,075 2,221 95 1.60 0.48 1.68 0.48 1.06 2.30
Body core 43 48,899 9,009 9,890 96 0.25 0.34 0.27 0.35 -0.18 0.32
Total body 40 49,681 44,080 5,601 146 2.22 1.01 2.34 1.06 0.98 3.70
Total body (excluding outlier)
a
39 39,883 34,317 5,566 144 1.79 0.60 1.89 .62 1.09 2.69
Subdimensions of muscular strength
Muscular tension 92 63,154 50,806 12,348 310 2.13 0.62 2.24 0.65 1.42 3.07
Muscular endurance 60 78,862 66,594 12,268 175 1.47 1.28 1.55 1.34 ⫺0.16 3.27
Muscular endurance (excluding outlier)
a
59 69,064 56,831 12,233 173 1.09 0.83 1.15 0.88 0.03 2.28
Muscular power 33 31,673 28,871 2,802 97 1.11 0.45 1.17 0.47 0.57 1.77
Cardiovascular endurance 83 73,968 62,168 11,800 184 1.81 0.53 2.01 0.58 1.27 2.75
Cardiovascular endurance (excluding outlier)
a
82 64,170 52,405 11,765 182 1.70 0.48 1.88 0.52 1.22 2.55
Movement quality 31 6,387 4,752 1,635 70 ⫺0.05 0.65 ⫺0.06 0.79 ⫺1.06 0.95
Subdimensions of movement quality
Flexibility 29 6,322 4,717 1,605 56 ⫺0.15 0.67 ⫺0.18 0.81 ⫺1.22 0.85
Balance 8 1,756 1,294 462 9 0.31 0.32 0.38 0.36 ⫺0.07 0.84
Coordination 5 1,995 1,670 325 5 0.64 0.50 0.79 0.60 0.02 1.56
Note. Positive d and δ values indicate better performance by males; negative d and δ values indicate better performance by females. Column content is as follows: k = number of independent samples; effects = total number of effect sizes that factored into the calculation, contributing to the analysis (this value does not represent independent effects); d = sample-size-weighted mean observed effect size; SD_d = standard deviation of the observed mean effect size; δ = sample-size-weighted mean effect size corrected for sampling error and measurement error in the criterion; SD_δ = standard deviation of the population effect size. Lower and upper limits of the 80% credibility interval are for the corrected mean effect size.
ᵃ Denotes exclusion of the outlier sample from the Misner et al. (1989) study.
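As a rough illustration of the quantities defined in the note (not the authors' code), the Python sketch below computes a sample-size-weighted mean d and its weighted SD from hypothetical study-level values, then reproduces an 80% credibility interval as δ ± 1.28 SD_δ, the usual 80% interval in the Hunter and Schmidt (2004) approach. The study-level inputs are invented; only the credibility-interval check uses values from Table 7.

```python
from math import sqrt

Z_80 = 1.2816  # 90th percentile of N(0, 1); an 80% interval spans +/- this many SDs

def weighted_mean_and_sd(ds, ns):
    """Sample-size-weighted mean of observed d values and their weighted SD."""
    total_n = sum(ns)
    mean_d = sum(n * d for d, n in zip(ds, ns)) / total_n
    var_d = sum(n * (d - mean_d) ** 2 for d, n in zip(ds, ns)) / total_n
    return mean_d, sqrt(var_d)

def credibility_interval_80(delta, sd_delta):
    """Lower and upper limits of an 80% credibility interval around delta."""
    half_width = Z_80 * sd_delta
    return delta - half_width, delta + half_width

# Hypothetical study-level inputs (not taken from this meta-analysis).
mean_d, sd_d = weighted_mean_and_sd([1.5, 2.0, 1.0], [100, 300, 100])
print(f"mean d = {mean_d:.2f}, SD_d = {sd_d:.2f}")  # mean d = 1.70, SD_d = 0.40

# Check against the "Muscular strength" row of Table 7 (delta = 1.81, SD = 0.75).
lower, upper = credibility_interval_80(1.81, 0.75)
print(f"{lower:.2f} {upper:.2f}")  # 0.85 2.77
```

The same two-line arithmetic can be used to spot-check any row of Tables 7 and 8 against its reported limits.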
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
633
SEX DIFFERENCES IN PHYSICAL ABILITY
estimates of sex differences in physical abilities. These reasons
include substantially increasing the number of studies used to
compute the estimates, focusing exclusively on adult occupational
samples, and correcting for sampling and measurement error in
computing estimates. In making these methodological improvements,
we observed several notable changes from prior estimates
(Hough et al., 2001; Sackett & Wilk, 1994). In particular, the
magnitude of the d value for cardiovascular endurance in our study
was substantially higher than that of prior estimates. We also found
a higher d value in muscular strength relative to past studies.
Equally important, we found negligible sex differences in movement
quality, whereas other studies had shown that such tests
slightly favor men.
Regarding the second contribution of our study, we uncovered
several interesting findings regarding moderators of sex differ-
ences in physical ability. First, in relation to how selection systems
moderate sex differences, we found that a composite of basic
ability tests yielded a d value virtually identical to that of a job
simulation. This suggests that, at the system level, developing
a selection system that emphasizes basic ability tests over a job
simulation (or vice versa) does not reduce sex differences. In terms
of other moderators, we found that sex differences often dimin-
ished as the measurement of physical abilities became narrower
(i.e., more specific). For example, while sex differences were
greatest for muscular tension, they decreased substantially in mus-
cular endurance and muscular power measurement. Sex differ-
ences decreased even further when muscular strength tests were
specifically geared toward measuring strength in the core area of
the body (i.e., lumbar spine and abdominal areas). However,
differences increased dramatically with tests that focused on total
body strength. Finally, we found that test training was advanta-
geous to women in the sense that women showed higher improve-
ment scores than men on muscular strength and cardiovascular
endurance tests. However, posttraining d values on muscular
strength decreased only slightly, and the gap between males’ and
females’ performance on cardiovascular endurance tests actually
increased after training. In prior scholarship, it has been noted that
subgroup members’ performance at an initial time point can have
an impact on such members’ performance at subsequent episodes
through regression to the mean or ceiling effects (Reilly, Smither,
& Vasilopoulos, 1996; Smither et al., 1995; Walker & Smither,
1999), so these findings in the training analysis may be attributable
to males initially scoring well over 1 standard deviation higher
than females at pretraining (δ = 1.44 for muscular strength and
δ = 1.90 for cardiovascular endurance; see Table 8).
Implications for Research and Practice
Our findings on the magnitude of sex differences in muscular
strength and cardiovascular endurance essentially confirm Ploy-
hart et al.’s (2006) assertion that sex differences in these two
abilities are larger than any other subgroup difference on any
other human ability or characteristic relevant to personnel selection.
For example, while there has been a great deal of discussion
among scholars and the media concerning the d = 0.99 Black–
White difference in cognitive ability (Roth, Bevier, Bobko, Switzer,
& Tyler, 2001), our study shows that substantially larger sex
Table 8
Training as a Moderator of Sex Differences in Physical Ability
Category of analysis  k  N  N_1  N_2  Effects  d  SD_d  δ  SD_δ  80% credibility interval (Lower, Upper)
Training improvement by sex
Muscular strength
Females 12 6,496 3,254 3,242 39 1.07 0.58 1.13 0.59 0.37 1.89
Males 12 13,669 6,854 6,815 39 0.80 0.47 0.84 0.49 0.22 1.47
Cardiovascular endurance
Females 13 4,976 2,486 2,490 19 1.06 0.41 1.10 0.40 0.59 1.61
Males 13 10,450 5,222 5,228 19 0.74 0.33 0.76 0.33 0.34 1.18
Effect of training on sex differences
Muscular strength
Time 1 sex difference (pretraining) 15 12,093 7,979 4,114 45 1.36 0.31 1.44 0.31 1.04 1.84
Time 2 sex difference (posttraining) 15 11,810 7,815 3,995 45 1.21 0.41 1.28 0.43 0.73 1.83
Cardiovascular endurance
Time 1 sex difference (pretraining) 16 9,693 6,347 3,346 22 1.84 0.57 1.90 0.58 1.15 2.64
Time 2 sex difference (posttraining) 16 9,471 6,228 3,243 22 2.29 0.38 2.36 0.38 1.88 2.85
Note. For the “training improvement by sex” analysis, positive d and δ values indicate improved performance from Time 1 to Time 2. For the “effect of training on sex differences” analysis, positive d and δ values indicate better performance by males; negative d and δ values indicate better performance by females. Column content is as follows: k = number of independent samples; N = total sample size; N_1 = total number of males/females in the analysis at Time 1 (training improvement by sex) or total number of males in the analysis (effect of training on sex differences); N_2 = total number of males/females in the analysis at Time 2 (training improvement by sex) or total number of females in the analysis (effect of training on sex differences); effects = total number of effect sizes that factored into the calculation, contributing to the analysis (this value does not represent independent effects); d = sample-size-weighted mean observed effect size; SD_d = standard deviation of the observed mean effect size; δ = sample-size-weighted mean effect size corrected for sampling error and measurement error in the criterion; SD_δ = standard deviation of the population effect size. Lower and upper limits of the 80% credibility interval are for the corrected mean effect size.
differences exist in muscular strength (δ = 1.81) and cardiovascular
endurance (δ = 2.01). However, compared with the amount
of attention devoted to racial differences on cognitive ability tests,
far less attention has been given by industrial–organizational
psychologists or the media to sex differences on physical ability tests.
Hence, our study provides impetus for further discussion
of this topic by scholars and practitioners alike.
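One common way to make these magnitudes concrete assumes two normal score distributions with equal SDs; the share of the lower-scoring group above the higher-scoring group's mean is then Φ(−d). The Python sketch below applies that translation to the values quoted above. It is an illustration under those distributional assumptions, not an analysis reported in this study, and, like the text, it sets an observed d beside corrected δ values.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def share_above_other_mean(d):
    """Share of the lower-scoring group that exceeds the higher-scoring
    group's mean, assuming normal scores with equal SDs separated by d."""
    return STANDARD_NORMAL.cdf(-d)

comparisons = {
    "Black-White cognitive ability (d)": 0.99,
    "muscular strength (delta)": 1.81,
    "cardiovascular endurance (delta)": 2.01,
}
for label, d in comparisons.items():
    print(f"{label}: {share_above_other_mean(d):.1%}")
```

Under these assumptions, roughly 16% of the lower-scoring group exceeds the other group's mean at d = 0.99, compared with about 3.5% at 1.81 and about 2.2% at 2.01.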
Interestingly, however, the lack of sex differences in movement
quality suggests that, to the degree supported by job analyses,
organizations might consider adopting tests of movement quality,
particularly flexibility (δ = −0.18, slightly favoring women), in order to
potentially reduce the adverse impact of an overall test battery. For
this reason, we suggest that job analysts take a highly detailed
approach to classifying the physical abilities needed for a particular
physically demanding job. For example, it seems reasonable
to suggest that many of the jobs listed in Table 2 require movement
quality abilities to some extent. Nevertheless, most of the studies
in our meta-analysis included assessments of only muscular
strength and/or cardiovascular endurance. A highly detailed job
analysis could reveal the job relevance of movement quality
and thereby support the development of selection instruments that
assess it.
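Why adding a low-difference test can shrink a battery's overall sex difference can be sketched with the standard expression for the standardized difference on a unit-weighted composite of k standardized tests: the sum of the tests' d values divided by the square root of k + k(k − 1)r̄, where r̄ is the average intercorrelation among the tests. The sketch below assumes equal within-group SDs, the same intercorrelations in both groups, and compensatory (summed) scoring. The value r̄ = .30 is hypothetical and was not estimated in this study; the δ values come from Table 7.

```python
from math import sqrt

def composite_d(ds, mean_r):
    """Standardized group difference on a unit-weighted sum of k standardized
    tests, assuming equal within-group SDs and a common average
    intercorrelation mean_r among the tests in both groups."""
    k = len(ds)
    return sum(ds) / sqrt(k + k * (k - 1) * mean_r)

MEAN_R = 0.30  # hypothetical average intercorrelation, not from this study

strength_cardio = composite_d([1.81, 2.01], MEAN_R)
with_movement = composite_d([1.81, 2.01, -0.06], MEAN_R)  # add movement quality
print(f"{strength_cardio:.2f} -> {with_movement:.2f}")
```

With these inputs the composite difference falls from about 2.37 for strength and cardiovascular endurance alone to about 1.72 once a movement quality test is added. Different intercorrelations, weights, or multiple-hurdle scoring would change the size of this reduction, so the numbers illustrate the mechanism rather than predict any particular battery.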
Although elucidating the large sex differences in physical abil-
ities has significant implications for research and practice, Hough
et al. (2001) argued over a decade ago that “a real science of
group-level differences on predictor measures will require an understanding
of . . . the substantive causes for group differences,
when to expect to find group differences, and to what extent group
differences can change” (p. 183). Although this type of research is
becoming increasingly common in the realm of cognitive ability
(Ployhart & Holtz, 2008), it has largely been overlooked in the
area of physical ability testing. In our view, this oversight repre-
sents a critical omission in the research literature, given the prev-
alence of physical ability testing in physically demanding occupa-
tions and the controversy surrounding the use of such tests
(Ployhart et al., 2006; Terpstra et al., 1999). Our study thus
contributes specifically to the physical ability testing literature by
being the first study in which the efficacy of various methods for
potentially reducing sex differences was empirically summarized.
First, as noted earlier, our study demonstrates that at the system
level, a composite of basic ability tests yields the same degree of
sex differences as a simulation-based selection system. This finding
supports Gebhardt and Baker’s (2010b) conceptual argument
that when test developers decide between designing a selection
system that emphasizes basic ability tests and one that emphasizes a job
simulation, organizations may not need to give much weight
to which type of system would result in greater adverse impact.
Rather, when choosing between different selection systems, orga-
nizations might give stronger consideration to factors such as
applicant reactions, the validity of specific tests for the exact job
being hired for, the reliability of the tests, the respective legal
climate, and cost concerns.
Second, compared with broad ability measures, narrow ability
measures, particularly tests of muscular power, muscular endurance,
and core muscular strength, were found to yield substantially
smaller sex differences. Thus, while Hogan’s (1991b) taxonomy
works extremely well to simplify the structure of physical
abilities, the finding that subdimensions of the meta-categories in
Hogan’s taxonomy exhibited very different d values shows that
organizations may benefit from a more fine-grained, lower order
taxonomy of physical abilities on which to base job analyses when
seeking to reduce sex differences (Gebhardt & Baker, 2010b).
However, whether organizations ultimately choose to adopt nar-
row basic ability tests over broad ability tests obviously depends
on the criterion in question as well as the job analysis on which a
battery of tests is based.
Third, physical fitness training appears highly beneficial for
improving women’s scores on muscular strength and cardiovascu-
lar endurance tests; however, overall sex differences on such tests
either narrow only slightly or, in the case of cardiovascular endurance,
are exacerbated with test training. Nonetheless, if all that is needed
to be selected into an organization and perform a job successfully
is to meet a set of minimum physical ability standards, then
training (either preemployment or postemployment) may be useful
for reducing adverse impact. In other words, training increases the
probability of females meeting minimum cutoff scores, thereby
potentially reducing adverse impact. This suggests that organiza-
tions should look for ways to target physical ability training
specifically toward women. For example, to the extent that it is
legally defensible, organizations could use resources to advertise
and promote training programs toward women. Indeed, the Los
Angeles Police Department offers both a candidate assistance
program for all prospective candidates as well as a candidate
assistance program targeted specifically toward women (Los An-
geles Police Department, n.d.).
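The cutoff-score argument can be illustrated with a simple normal model. The Python sketch below is hypothetical throughout: it places all scores on one scale in units of the pretraining male SD, assumes every group has SD = 1 before and after training, treats the Table 8 muscular strength values (δ = 1.44 at Time 1; improvement δs of 0.84 for men and 1.13 for women) as mean shifts on that scale, and uses an invented fixed cutoff of −1.0. It then compares the female-to-male pass-rate ratio with the 0.80 benchmark of the four-fifths rule. These simplifications are ours, not the authors'.

```python
from statistics import NormalDist

def pass_rate(group_mean, cutoff, sd=1.0):
    """Share of a normally distributed group scoring at or above the cutoff."""
    return 1.0 - NormalDist(group_mean, sd).cdf(cutoff)

def impact_ratio(female_mean, male_mean, cutoff):
    """Female pass rate divided by male pass rate; the four-fifths rule
    treats ratios below 0.80 as evidence of adverse impact."""
    return pass_rate(female_mean, cutoff) / pass_rate(male_mean, cutoff)

# Hypothetical common scale: pretraining male mean = 0, every SD = 1.
CUTOFF = -1.0                        # invented fixed minimum standard
D_PRE = 1.44                         # Table 8, muscular strength, Time 1 delta
GAIN_MALE, GAIN_FEMALE = 0.84, 1.13  # Table 8 improvement deltas, used as mean shifts

ratio_pre = impact_ratio(-D_PRE, 0.0, CUTOFF)
ratio_post = impact_ratio(-D_PRE + GAIN_FEMALE, GAIN_MALE, CUTOFF)
print(f"impact ratio: {ratio_pre:.2f} before training, {ratio_post:.2f} after")
```

With these inputs the impact ratio rises from about 0.39 before training to about 0.78 after, even though the gap between the group means narrows only modestly: a fixed minimum standard rewards absolute gains, not relative standing.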
Our study also has several implications for the adverse impact
literature at large. In particular, several of the moderators of sex
differences on physical ability tests examined in our study are
analogous to moderators discussed in the literature for reducing
subgroup differences in cognitive ability across certain racial
groups. Making comparisons between moderators of subgroup
differences on cognitive and physical ability tests allows research-
ers and practitioners to potentially identify complementary strate-
gies for reducing the adverse impact of both types of tests in
physically demanding occupations. For example, whether cognitive
ability should be assessed via paper-and-pencil tests (which
are analogous to basic ability tests because they are construct-oriented)
or work samples (which are analogous to job simulations
because they are content-oriented) is an issue in the cognitive
ability literature. Although research has generally found that
work samples help to reduce the adverse impact of cognitive ability
tests, our results show that, when viewed at the system level,
construct- and content-oriented approaches to designing physical
ability selection systems yield virtually identical sex differences. Another comparison that
can be made between the physical and cognitive ability domains is
in the use of narrow measures to reduce subgroup differences.
Ployhart and Holtz (2008) suggested, for instance, that using
narrow measures of ability meaningfully reduces subgroup differ-
ences on cognitive ability. Our results essentially mirror these
findings in the realm of physical ability testing. Furthermore,
another method of reducing adverse impact in the cognitive ability
domain is to assess other knowledge, skills, and abilities that tend
to engender small subgroup differences so that an overall predictor
battery has less adverse impact. As noted earlier, an implication of
our study is that assessments of movement quality could poten-
tially reduce sex differences in a battery of tests. Last, Ployhart and
Holtz (2008) noted that while training and practice opportunities
can sometimes help minority groups score better on cognitive
ability tests, training changes subgroup differences on cognitive
ability only slightly (Kulik, Bangert-Drowns, & Kulik, 1984).
Once again, our results mirrored these findings in the realm of
physical ability testing.
In short, our purpose in comparing our findings to those in the
cognitive ability literature is to suggest that in the future, research-
ers should explore potentially complementary methods for reduc-
ing adverse impact from physical and cognitive ability tests. Such
research would allow, in turn, for a more integrative perspective on
ways to potentially reduce adverse impact of human ability tests
relevant for selection. We view this integrative effort as an inter-
esting and important research endeavor given that cognitive and
physical ability tests are often used jointly in physically demanding
occupations (Henderson, 2010; Henderson et al., 2007), and as
seen in this study, some efforts to reduce adverse impact that work
for cognitive ability tests may or may not directly translate to the
realm of physical ability testing.
Limitations and Future Directions
We should note several limitations of our study. First, sex was
the only subgroup difference on physical abilities that we exam-
ined. Some studies have shown that racioethnic and age differ-
ences exist on certain physical abilities as well (Arvey et al., 1992;
Blakley, Quiñones, Crawford, & Jago, 1994). However, a lack of
available data precluded us from meta-analytically examining
other subgroup differences in physical ability. Hence, we suggest
this as an area meriting future research. In particular, more com-
plex analyses in the future could focus on race-by-sex differences
or even race-by-age-by-sex differences.
Second, there is a potential trade-off between validity and ad-
verse impact that we were not able to explicitly examine with
regard to using narrow measures of muscular strength. For exam-
ple, although core muscular strength tests result in substantially
reduced sex differences, they may possess less validity than tests
measuring overall muscular strength. At the same time, however,
they could possess stronger validity than overall muscular strength
depending on the specificity of the criterion. Either way, we were
not able to assess whether narrow muscular strength measures
relative to body region have stronger validity. This question re-
mains an important topic for future research.
Third, our analysis of training as a moderator of sex differences
has a few inherent weaknesses. In particular, none of the primary
studies we examined incorporated control groups. Thus, we cannot
be fully sure that training was the only factor that caused improve-
ments in males’ and females’ scores on muscular strength and
cardiovascular endurance tests. However, these concerns are mit-
igated to a certain degree by the fact that men and women in the
primary studies went through their respective trainings at the same
time and in the same location. Furthermore, in each study, men and
women completed the exact same tests in the exact same period of
time. Male and female training participants thus experienced
nearly identical training conditions. Moreover, given the rigidity
and relatively short time frame of the training studies, it is difficult
to identify other factors that may have contributed to males’ and
females’ improvement scores from training. Still, we suggest that
future studies assess the impact of training on sex differences in
physical ability with control groups in order to rule out other
extraneous factors that could be causing score improvements on
physical ability tests.
Finally, as noted previously, we did not correct for range re-
striction when computing sex differences in physical abilities
because many studies used unique or customized physical ability
tests and did not report normative data for these tests. Furthermore,
we could not locate any single study that reported sex differences
in both an incumbent and an applicant sample. These limitations in
the primary studies made it impossible to estimate u_x values (the
ratio of restricted to unrestricted standard deviations of test scores)
and, in turn, correct for range restriction.
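For readers unfamiliar with the quantity, the sketch below shows how a u_x value would be used if one were available: it converts d to a point-biserial r, applies Thorndike's Case II correction for direct range restriction, and converts back. This is an illustration of a standard correction under assumed inputs (a fixed proportion p of one group and a hypothetical u_x), not a correction the authors performed.

```python
from math import sqrt

def d_to_r(d, p):
    """Point-biserial r implied by a standardized difference d,
    with proportion p of the sample in one group."""
    return d / sqrt(d * d + 1.0 / (p * (1.0 - p)))

def r_to_d(r, p):
    """Inverse of d_to_r for the same group proportion p."""
    return r / sqrt(p * (1.0 - p) * (1.0 - r * r))

def correct_d_case_ii(d_obs, u_x, p):
    """Correct d for direct range restriction on test scores (Thorndike Case II),
    where u_x = restricted SD / unrestricted SD and 0 < u_x <= 1."""
    if not 0.0 < u_x <= 1.0:
        raise ValueError("u_x must be in (0, 1]")
    r = d_to_r(d_obs, p)
    big_u = 1.0 / u_x  # unrestricted SD / restricted SD
    r_corrected = big_u * r / sqrt(1.0 + (big_u ** 2 - 1.0) * r * r)
    return r_to_d(r_corrected, p)

# Hypothetical inputs: observed d = 1.00, u_x = 0.80, half of the sample female.
print(round(correct_d_case_ii(1.00, 0.80, 0.50), 2))  # 1.25
```

Holding p fixed, this round trip through r reduces algebraically to dividing d by u_x, so a u_x of .80 would raise an observed d of 1.00 to 1.25. That sensitivity is why the absence of usable u_x estimates matters for interpreting the uncorrected values.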
Conclusion
In this study, we have provided the most accurate estimates of
sex differences in physical ability within work contexts, and we
have identified various moderators of these differences. As phys-
ically demanding occupations continue to hold an important role in
labor economies globally and as more women enter physically
demanding occupations, we hope our study will be useful to
researchers and practitioners who seek to investigate ways to
leverage the validity of physical ability tests while employing
evidence-based strategies for potentially reducing sex differences
on such tests.
References
References marked with an asterisk indicate studies included in the
meta-analysis. References marked with a single dagger indicate that the
original report is not available and that the data were taken from Hogan
(1991a). References marked with a double dagger indicate that the original
report is not available and that the data were provided by the U.S. Army
Center for Health Promotion and Preventive Medicine.
*Agrawal, K. N., Singh, R. K. P., & Satapathy, K. K. (2009). Isometric
strength of agricultural workers of Meghalaya: A case study of an Indian
population. International Journal of Industrial Ergonomics, 39, 919 –
923. doi:10.1016/j.ergon.2009.06.008
*Agrawal, K. N., Tiwari, P. S., Gite, L. P., & Bhushanababu, V. (2010).
Isometric push/pull strength of agricultural workers of central India.
Agricultural Engineering International: CIGR Journal, 12, 1–12.
Akuthota, V., & Nadler, S. (2004). Core strengthening. Archives of Phys-
ical Medicine and Rehabilitation, 85(Suppl. 1), S86 –S92. doi:10.1053/
j.apmr.2003.12.005
*Anderson, G. S., & Plecas, D. B. (2000). Predicting shooting scores from
physical performance data. Policing: An International Journal of Police
Strategies & Management, 23, 525–537. doi:10.1108/
13639510010355611
*Arnold, J. D., Rauschenberger, J. M., Soubel, W. G., & Guion, R. M.
(1982). Validity and utility of a strength test for selecting steelworkers.
Journal of Applied Psychology, 67, 588 – 604. doi:10.1037/0021-9010
.67.5.588
*Arvey, R. D., Landon, T. E., Nutting, S. M., & Maxwell, S. E. (1992).
Development of physical ability tests for police officers: A construct
validation approach. Journal of Applied Psychology, 77, 996 –1009.
doi:10.1037/0021-9010.77.6.996
Astrand, P., Rodahl, K., Dahl, H. A., & Stromine, S. G. (2003). Textbook
of work physiology (4th ed.). Champaign, IL: Human Kinetics.
*Ayoub, M. M., Bethea, N. J., Deivanayagam, S., Asfour, S. S., Bakken,
G. M., Liles, D., . . . Sherif, M. (1978). Determination and modeling of
lifting capacity (Final report, HEW [NIOSH] Grant 5R01-OH-00545–
02). Washington, DC: Government Printing Office.
Barrett, G. V., Polomsky, M. D., & McDaniel, M. A. (1999). Selection
tests for firefighters: A comprehensive review and meta-analysis. Jour-
nal of Business and Psychology, 13, 507–513. doi:10.1023/A:
1022966820186
*Beckett, M. B., & Hodgdon, J. A. (1987). Lifting and carrying capacities
relative to physical fitness measures (Report No. 87–26). San Diego,
CA: Naval Health Research Center.
*Bell, N. S., Mangione, T. W., Hemenway, D., Amoroso, P. J., & Jones,
B. H. (2000). High injury rates among female Army trainees: A function
of gender? American Journal of Preventive Medicine, 18(Suppl. 1),
141–146. doi:10.1016/S0749-3797(99)00173-7
*Bilzon, J. L. J., Scarpello, E. G., Smith, C. V., Ravenhill, N. A., &
Rayson, M. P. (2001). Characterization of the metabolic demands of
simulated shipboard Royal Navy fire-fighting tasks. Ergonomics, 44,
766 –780.
*Blacker, S. D., Wilkinson, D. M., & Rayson, M. P. (2009). Gender
differences in the physical demands of British army recruit tactics.
Military Medicine, 174, 811– 816.
*Blakley, B. R., Quiñones, M. A., Crawford, M. S., & Jago, I. A. (1994).
The validity of isometric strength tests. Personnel Psychology, 47,
247–274. doi:10.1111/j.1744-6570.1994.tb01724.x
*Boyce, R. W., Ciulla, S., Jones, G. R., Boone, E. L., Elliott, S. M., &
Combs, C. S. (2008). Muscular strength and body composition compar-
ison between the Charlotte–Mecklenburg fire and police department.
International Journal of Exercise Science, 1, 125–135.
*Bronner, S., & Ojofeitimi, S. (2006). Gender and limb differences in
healthy elite dancers: Passé kinematics. Journal of Motor Behavior, 38,
71–79. doi:10.3200/JMBR.38.1.71-79
Bureau of Labor Statistics. (2010). Women in the labor force: A databook.
Retrieved from http://www.bls.gov/cps/wlf-databook.pdf
Bureau of Labor Statistics. (2011). Employed persons by detailed occupa-
tion, sex, race, and Hispanic or Latino ethnicity. Retrieved from http://
www.bls.gov/cps/cpsaat11.pdf
*Burton, S. (2006). Performance and injury predictability during
firefighter candidate training (Doctoral dissertation). Virginia Polytechnic
Institute and State University, Blacksburg, VA.
*Byström, S., Hall, C., Welander, T., & Kilbom, A. (1995). Clinical
disorders and pressure-pain threshold of the forearm and hand among
automobile assembly line workers. Journal of Hand Surgery, 20, 782–
790.
*Chaffin, D. B., Herrin, G. D., & Keyserling, W. M. (1978). Preemployment
strength testing: An updated position. Journal of Occupational
Medicine, 20, 403– 408.
*Cohen, J. L., Segal, K. R., Witriol, I., & McArdle, W. D. (1982).
Cardiorespiratory response to ballet exercise and the VO2max of elite
ballet dancers. Medicine & Science in Sports and Exercise, 14, 212–217.
doi:10.1249/00005768-198203000-00011
Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative
theory of training motivation: A meta-analytic path analysis of 20 years
of research. Journal of Applied Psychology, 85, 678 –707. doi:10.1037/
0021-9010.85.5.678
Colquitt, J. A., LePine, J. A., Zapata, C. P., & Wild, R. E. (2011). Trust in
typical and high-reliability contexts: Building and reacting to trust
among firefighters. Academy of Management Journal, 54, 999 –1015.
doi:10.5465/amj.2006.0241
Cooper Institute for Aerobics Research. (n.d.). Physical fitness assessment.
Dallas, TX: Author.
*Copay, A. G., & Charles, M. T. (1998). Police academy fitness training at
the Police Training Institute, University of Illinois. Policing: An Inter-
national Journal of Police Strategies & Management, 21, 416 – 431.
doi:10.1108/13639519810228732
*Cosgrove, R. J. (2006). A study of New Jersey State Police Physical
Qualification Test and its relationship to leadership, organizational
decision making, and policy implementation (Unpublished doctoral dis-
sertation). Seton Hall University, South Orange, NJ.
*Cox, M., Shephard, R. J., & Corey, P. (1981). Influence of an employee
fitness programme upon fitness, productivity and absenteeism. Ergo-
nomics, 24, 795– 806. doi:10.1080/00140138108924900
*Daniels, W. L., Kowal, D. M., Vogel, J. A., & Stauffer, R. M. (1979).
Physiological effects of a military training program on male and female
cadets. Aviation Space and Environmental Medicine, 50, 562–566.
Dean, M. A., Roth, P. L., & Bobko, P. (2008). Ethnic and gender subgroup
differences in assessment center ratings: A meta-analysis. Journal of
Applied Psychology, 93, 685– 691. doi:10.1037/0021-9010.93.3.685
*†Denning, D. L. (1984, August). Applying the Hogan model of physical
performance of occupational tasks. Paper presented at the American
Psychological Association annual convention, Toronto, Ontario, Can-
ada.
Faries, M. D., & Greenwood, M. (2007). Core training: Stabilizing the
confusion. Strength and Conditioning Journal, 29, 10 –25.
*Fitzgerald, P. I., Vogel, J. A., Daniels, W. L., Dziados, J. E., Teves, M. A.,
Mello, R. P., & Reich, P. J. (1986). The body composition project: A
summary report and descriptive data (Report No. T 5/87). Natick, MA:
U.S. Army Research Institute of Environmental Medicine.
Fleishman, E. A. (1962). The dimensions of physical fitness: The nation-
wide normative and developmental study of basic tests. (Tech. Report
No. 4). New Haven, CT: Yale University.
Fleishman, E. A., & Mumford, M. D. (1989). Abilities as causes of
individual differences in skill acquisition. Human Performance, 2, 201–
223. doi:10.1207/s15327043hup0203_4
Fleishman, E. A., & Quaintance, M. K. (1984). Taxonomies of human
performance. New York, NY: Academic Press.
*Gamble, R. P., Stevens, A. B., McBrien, H., Black, A., Cran, G. W., &
Boreham, C. A. G. (1991). Physical fitness and occupational demands of
the Belfast ambulance service. British Journal of Industrial Medicine,
48, 592–596.
Gebhardt, D. L., & Baker, T. A. (2007). Physical performance assessment.
In S. G. Rogelberg (Ed.), Encyclopedia of industrial and organizational
psychology (pp. 179 –199). Thousand Oaks, CA: Sage. doi:10.4135/
9781412952651.n239
Gebhardt, D. L., & Baker, T. A. (2010a). Physical performance. In J. C.
Scott & D. H. Reynolds (Eds.), Handbook of workplace assessment (pp.
165–197). San Francisco, CA: Jossey–Bass.
Gebhardt, D. L., & Baker, T. A. (2010b). Physical performance tests. In
J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (pp.
277–298). New York, NY: Routledge.
*Gollub, L. R. (1992). Validity of pre-employment physical ability test for
predicting industrial injuries (Unpublished doctoral dissertation). Uni-
versity of Houston, Houston, TX.
Gottfredson, L. S. (1997). Intelligence and social policy. Intelligence, 24,
203–320.
*Hamilton, W. G., Hamilton, L. H., Marshall, P., & Molnar, M. (1992). A
profile of the musculoskeletal characteristics of elite professional ballet
dancers. American Journal of Sports Medicine, 20, 267–273. doi:
10.1177/036354659202000306
Henderson, N. D. (2010). Predicting long-term firefighter performance
from cognitive and physical ability measures. Personnel Psychology, 63,
999 –1039. doi:10.1111/j.1744-6570.2010.01196.x
Henderson, N. D., Berry, M. W., & Matic, T. (2007). Field measures of
strength and fitness predict firefighter performance on physically de-
manding tasks. Personnel Psychology, 60, 431– 473. doi:10.1111/j
.1744-6570.2007.00079.x
*Hodgdon, J. A., Beckett, M. B., Sopchick, T., Prusaczyk, W. K., &
Goforth, H. W. (1998). Physical fitness requirements for explosive
ordnance disposal divers. (Report No. 98 –31). San Diego, CA: Naval
Health Research Center.
Hoffman, C. C. (1999). Generalizing physical ability test validity: A case
study using test transportability, validity generalization, and construct-
related validation evidence. Personnel Psychology, 52, 1019 –1041. doi:
10.1111/j.1744-6570.1999.tb00188.x
*Hoffman, T., Stauffer, R. W., & Jackson, A. S. (1979). Sex difference in
strength. American Journal of Sports Medicine, 7, 265–267. doi:
10.1177/036354657900700415
Hogan, J. C. (1991a). Physical abilities. In M. D. Dunnette & L. M. Hough
(Eds.), Handbook of industrial and organizational psychology (Vol. 2,
pp. 753– 831). Palo Alto, CA: Consulting Psychologists Press.
Hogan, J. C. (1991b). Structure of physical performance in occupational
tasks. Journal of Applied Psychology, 76, 495–507. doi:10.1037/0021-
9010.76.4.495
Hogan, J. C. (1994). Theoretical and applied developments in models of
individual differences: Physical abilities. In M. G. Rumsey, C. B.
Walker, & J. H. Harris (Eds.), Personnel selection and classification (pp.
233–245). Hillsdale, NJ: Erlbaum.
*†Hogan, J. C., & Arneson, S. (1988). Development and validation of
physical ability tests for high pressure cleaning worker jobs. Tulsa, OK:
University of Tulsa.
*†Hogan, J. C., Arneson, S., Hogan, R., & Jones, S. (1986). Development
and validation of personnel selection tests for the habilitation therapist
job. Tulsa, OK: Hogan Assessment Systems.
*Hogan, J. C., & Hogan, R. (1985). Psychological and physical performance
characteristics of successful explosive ordnance diver technicians.
(Report No. AD-A157 947). Arlington, VA: Office of Naval
Research.
*Hogan, J. C., Hogan, R., & Briggs, S. (1984). Psychological and physical
performance factors associated with attrition in explosive ordnance
disposal training. (Report No. AD-A141 475). Arlington, VA: Office of
Naval Research.
*†Hogan, J. C., Ogden, G. D., & Fleishman, E. A. (1979). The development
and validation of tests for the order selector job at Certified Grocers of
California, Ltd. Washington, DC: Advanced Research Resources Orga-
nization.
*†Hogan, J., & Pederson, K. (1984). Validity of physical tests for selecting
petrochemical workers. Unpublished manuscript.
Hogan, J. C., & Quigley, A. M. (1986). Physical standards for employment
and the courts. American Psychologist, 41, 1193–1217. doi:10.1037/
0003-066X.41.11.1193
Hogan, J. C., & Quigley, A. M. (1994). Effects of preparing for physical
ability tests. Public Personnel Management, 23, 85–104.
*Hölzl, T., Hofmann, P., Rausch, W., & Zeilinger, M. (2008). Gender
differences and their impact on physical performance in soldiers of the
Austrian Armed Forces. (Report No. MP-HFM-158 –15). Brussels, Bel-
gium: North Atlantic Treaty Organization.
*Hostler, D., Bednez, J. C., Kerin, S., Reis, S. E., Kong, P. W., Morley, J.,
. . . Suyama, J. (2010). Comparison of rehydration regimens for reha-
bilitation of firefighters performing heavy exercise in thermal protective
clothing: A report from the Fireground Rehab Evaluation (Fire) Trial.
Prehospital Emergency Care, 14, 194 –201. doi:10.3109/
10903120903524963
Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants,
detection and amelioration of adverse impact in personnel selection
procedures: Issues, evidence, and lessons learned. International Journal
of Selection and Assessment, 9, 152–194. doi:10.1111/1468-2389.00171
Hunsicker, P. A., & Reiff, G. G. (1976). Youth fitness test manual.
Washington, DC: American Alliance for Health, Physical Education,
and Recreation.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative
predictors of job performance. Psychological Bulletin, 96, 72–98. doi:
10.1037/0033-2909.96.1.72
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Cor-
recting error and bias in research findings (2nd ed.). Thousand Oaks,
CA: Sage.
*Imrhan, S. N., Jenkins, G. K., & Townes, M. (1992). The effect of forearm orientation on wrist-turning strength. In S. Kumar (Ed.), Advances in industrial ergonomics and safety (Vol. 4, pp. 687–691). London, England: Taylor & Francis.
*†Jackson, A. S., & Osburn, H. G. (1983). Validity of isometric strength tests for predicting performance in underground coal mining tasks. Houston: Shell Oil, Employment Services.
*Jamnik, V. K., Thomas, S. G., Burr, J. F., & Gledhill, N. (2010a). Construction, validation, and derivation of performance standards for a fitness test for correctional officer applicants. Applied Physiology, Nutrition, and Metabolism, 35, 59–70. doi:10.1139/H09-122
*Jamnik, V. K., Thomas, S. G., Shaw, J. A., & Gledhill, N. (2010b). Identification and characterization of the critical physically demanding tasks encountered by correctional officers. Applied Physiology, Nutrition, and Metabolism, 35, 45–58. doi:10.1139/H09-121
*Jette, M., Sidney, K., & Lewis, W. (1990). Fitness, performance and anthropometric characteristics of 19,185 Canadian Forces personnel classified according to body mass index. Military Medicine, 155, 120–126.
*Jones, B. H., Bovee, M. W., & Knapik, J. J. (1992). Associations among body composition, physical fitness, and injury in men and women army trainees. In B. M. Marriott & J. Grumstrup-Scott (Eds.), Body composition and physical performance (pp. 141–174). Washington, DC: National Academy Press.
*Keyserling, W. M., Herrin, G. D., Chaffin, D. B., Armstrong, T. J., & Foss, M. L. (1980). Establishing an industrial strength testing program. American Industrial Hygiene Association Journal, 41, 730–736. doi:10.1080/15298668091425572
*Kirkendall, D. T., & Calabrese, L. H. (1983). Physiological aspects of dance. Clinics in Sports Medicine, 2, 525–537.
Klein, K. J., & Zedeck, S. (2004). Theory in applied psychology: Lessons (re)learned. Journal of Applied Psychology, 89, 931–933. doi:10.1037/0021-9010.89.6.931
*Knapik, J., Banderet, L., Bahrke, M., O’Connor, J., Jones, B., & Vogel, J. (1994). Army Physical Fitness Test (APFT): Normative data on 6,022 soldiers (Technical Report No. T94-7). Natick, MA: U.S. Army Research Institute of Environmental Medicine.
*‡Knapik, J. J., Cuthie, J., Canham, M., Hewitson, W., Laurin, M. J., Nee, M. A., . . . Jones, B. H. (1998). Injury incidence, injury risk factors, and physical fitness of U.S. Army basic trainees at Ft. Jackson, SC, 1997 (USACHPPM Report No. 29-HE-7513-98). Aberdeen, MD: U.S. Army Center for Health Promotion and Preventive Medicine.
*Knapik, J. J., Darakjy, S., Hauret, K. G., Canada, S., Scott, S., Rieger, W., . . . Jones, B. H. (2006). Increasing the physical fitness of low fit recruits prior to basic combat training: An evaluation of fitness, injuries and training outcomes. Military Medicine, 171, 45–54.
*‡Knapik, J. J., Darakjy, S., Scott, S., Hauret, K. G., Canada, S., Marin, R., . . . Jones, B. H. (2004). Evaluation of two Army fitness programs: The TRADOC Standardized Physical Training Program for Basic Combat Training and the Fitness Assessment Program (USACHPPM Report No. 12-HF-5772B-04). Aberdeen, MD: U.S. Army Center for Health Promotion and Preventive Medicine.
*‡Knapik, J. J., Hauret, K., Bednarek, J. M., Arnold, S., Canham-Chervak, M., Mansfield, A., . . . Peterson, J. (2001). The Victory Fitness Program: Influence of the U.S. Army’s emerging physical fitness doctrine on fitness and injuries in Basic Combat Training (USACHPPM Report No. 12-MA-5762-01). Aberdeen, MD: U.S. Army Center for Health Promotion and Preventive Medicine.
*‡Knapik, J. J., Sharp, M. A., Canham, M. L., Hauret, K., Cuthie, J., Hewitson, W., . . . Jones, B. (1998). Injury incidence and injury risk factors among U.S. Army basic trainees at Ft. Jackson, SC (including fitness training unit personnel, discharges, and newstarts; USACHPPM Report No. 29-HE-8370-99). Aberdeen, MD: U.S. Army Center for Health Promotion and Preventive Medicine.
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
*Knapik, J. J., Spiess, A., Swedler, D., Grier, T., Hauret, K., Yoder, J., & Jones, B. H. (2011). Retrospective examination of injuries and physical fitness during Federal Bureau of Investigation new agent training. Journal of Occupational Medicine and Toxicology, 6, 26–37. doi:10.1186/1745-6673-6-26
*Knapik, J. J., Vogel, J. A., & Wright, J. E. (1981). Measurement of isometric strength in an upright pull at 38 cm (Report No. T 3/81). Natick, MA: U.S. Army Research Institute of Environmental Medicine.
*Knapik, J. J., Wright, J. E., Kowal, D., & Vogel, J. A. (1980). The influence of U.S. Army basic initial entry training on the muscular strength of men and women. Aviation, Space, and Environmental Medicine, 51, 1086–1090.
*Koutedakis, Y., Khaloula, M., Pacy, P. J., Murphy, M., & Dunbar, G. M. J. (1997). Thigh peak torques and lower-body injuries in dancers. Journal of Dance Medicine and Science, 1, 12–15.
*Krueger, K. G. (1984). [Job analysis of peace officer jobs]. Unpublished technical report. Sacramento, CA: Commission on Peace Officer Standards and Training.
Kulik, J. A., Bangert-Drowns, R. L., & Kulik, C. C. (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179–188. doi:10.1037/0033-2909.95.2.179
Lewis, R. E. (1989). Physical ability tests as predictors of job-related criteria: A meta-analysis (Unpublished manuscript).
Los Angeles Police Department. (n.d.). Candidate Assistance Program. Retrieved from http://joinlapd.com
*Maikala, R. V., Ciriello, V. M., Dempsey, P. G., & O’Brien, N. V. (2010). Comparison of psychophysiological responses in healthy men and women workers during cart pushing on two walkways of high and low coefficient of friction. International Journal of Industrial Ergonomics, 40, 171–179. doi:10.1016/j.ergon.2009.06.003
McArdle, W. D., Katch, F. I., & Katch, V. L. (2007). Exercise physiology: Energy, nutrition, and human performance physiology (5th ed.). Baltimore, MD: Lippincott Williams & Wilkins.
*McDaniel, J. W., Skandis, R. J., & Madole, S. W. (1983, May). Weight lift capabilities of Air Force basic trainees (Technical Report No. AFAMRL-TR-83-0001). Wright Air Force Base, OH: Air Force Aerospace Medical Research Laboratory.
*Meyer, L. G., Pokorski, L., Ortel, B. E., Saxton, J. L., & Collyer, P. D. (1996). Muscular strength and anthropometric characteristics of male and female naval aviation candidates (Report No. NAMRL-1396). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
*Misner, J. E., Boileau, R. A., & Plowman, S. A. (1989). Development of placement tests for firefighting: A long-term analysis by race and sex. Applied Ergonomics, 20, 218–224. doi:10.1016/0003-6870(89)90080-X
*Mital, A. (1984). Maximum weights of lift acceptable to male and female industrial workers for extended work shifts. Ergonomics, 27, 1115–1126. doi:10.1080/00140138408963594
*Moran, D. S., Evans, R., Hadid, A., Epstein, Y., & Yanovich, R. (2008). Gender differences in physical fitness of military recruits during Army basic training (Technical Report No. RTO-MP-HFM-158). Natick, MA: North Atlantic Treaty Organization.
*Myers, D. C., Gebhardt, D. L., Crump, C. E., & Fleishman, E. A. (1984). Validation of the Military Entrance Physical Strength Capacity Test (Report No. 610). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
*Nag, P. K., Sebastian, N. C., Ramanathan, N. L., & Chatterjee, S. K. (1978). Maximal oxygen uptake of Indian agricultural workers with reference to age and sex. Journal of the Madurai University, 7, 69–74.
Noe, R. A., & Schmitt, N. (1986). The influence of trainee attitudes on training effectiveness: Test of a model. Personnel Psychology, 39, 497–523. doi:10.1111/j.1744-6570.1986.tb00950.x
*Nordander, C., Ohlsson, K., Balogh, I., Hansson, G., Axmon, A., Persson, R., & Skerfving, S. (2008). Gender differences in workers with identical repetitive industrial tasks: Exposure and musculoskeletal disorders. International Archives of Occupational and Environmental Health, 81, 939–947. doi:10.1007/s00420-007-0286-9
*Nygård, C. H., Eskelinen, L., Suvanto, S., Tuomi, K., & Ilmarinen, J. (1991). Associations between functional capacity and work ability among elderly municipal employees. Scandinavian Journal of Work, Environment & Health, 17, 122–127.
*Nygård, C. H., Kilbom, A., Wigaeus-Hjelm, E., & Winkel, J. (1993). Life-time occupational physical activity and physical capacity. In R. Nielsen & K. Jorgensen (Eds.), Advances in industrial ergonomics and safety (Vol. 5, pp. 371–379). London, England: Taylor & Francis.
*Nygård, C. H., Luopajarvi, T., Suurnakki, T., & Ilmarinen, J. (1988). Muscle strength and muscle endurance of middle-aged women and men associated to type, duration and intensity of muscular load at work. International Archives of Occupational and Environmental Health, 60, 291–297. doi:10.1007/BF00378476
*Oh, S., & Radwin, R. G. (1993). Pistol grip power tool handle and trigger size effects on grip exertions and operator preference. Human Factors, 35, 551–569.
*Pappas, E., Kremenic, I., Liederbach, M., Orishimo, K., & Hagins, M. (2011). Time to stability differences between male and female dancers after landing from a jump on flat and inclined floors. Clinical Journal of Sport Medicine, 21, 325–329. doi:10.1097/JSM.0b013e31821f5cfb
*Patrick, R. W. (2000). An empirical analysis of the Physical Aptitude Exam as a predictor of performance on the physical readiness test (Unpublished master’s thesis). Naval Postgraduate School, Monterey, CA.
*Patton, J. F., Daniels, W. L., & Vogel, J. A. (1980). Aerobic power and body fat of men and women during Army basic training. Aviation, Space, and Environmental Medicine, 51, 492–496.
*Pedersen, O. F., Petersen, R., & Staffeldt, E. S. (1975). Back pain and isometric back muscle strength of workers in a Danish factory. Scandinavian Journal of Rehabilitation Medicine, 7, 125–128.
*Phillips, M. D., & Pepper, R. L. (1982). Shipboard fire-fighting performance of females and males. Human Factors, 24, 277–283.
Ployhart, R. E., & Holtz, B. C. (2008). The diversity–validity dilemma: Strategies for reducing racioethnic and gender subgroup differences and adverse impact in selection. Personnel Psychology, 61, 153–172. doi:10.1111/j.1744-6570.2008.00109.x
Ployhart, R. E., Schneider, B., & Schmitt, N. (2006). Staffing organizations: Contemporary practice and theory (3rd ed.). Mahwah, NJ: Erlbaum.
*Rayson, M., Holliman, D., & Delyavin, A. (2000). Development of physical selection procedures for the British Army. Phase 2: Relationship between physical performance tests and criterion tasks. Ergonomics, 43, 73–105. doi:10.1080/001401300184675
Reilly, R. R., Smither, J. W., & Vasilopoulos, N. L. (1996). A longitudinal study of upward feedback. Personnel Psychology, 49, 599–612. doi:10.1111/j.1744-6570.1996.tb01586.x
*Reilly, R. R., Zedeck, S., & Tenopyr, M. L. (1979). Validity and fairness of physical ability tests for predicting performance in craft jobs. Journal of Applied Psychology, 64, 262–274. doi:10.1037/0021-9010.64.3.262
*Reilly, T., Wooler, A., & Tipton, M. (2006). Occupational fitness standards for beach lifeguards. Phase 1: The physiological demands of beach lifeguarding. Occupational Medicine, 56, 6–11. doi:10.1093/occmed/kqi169
*Rice, V. J. (1992). A usability assessment of two harnesses for stretcher-carrying. In S. Kumar (Ed.), Advances in industrial ergonomics and safety (Vol. 4, pp. 1269–1274). London, England: Taylor & Francis.
*Rice, V. J., & Sharp, M. A. (1994). Prediction of performance on two stretcher-carry tasks. Work, 4, 201–210.
*Richmond, V. L., Carter, J. M., Wilkinson, D. M., Horner, F. E., Rayson, M. P., & Bilzon, J. L. J. (2008). Does training male and female recruits separately optimise their physiological response? (Technical Report No. RTO-MP-HFM-158). Natick, MA: North Atlantic Treaty Organization.
*Richmond, V. L., Rayson, M. P., Wilkinson, D. M., Carter, J. M., Blacker, S. D., Nevill, A., . . . Moore, S. (2008). Development of an operational fitness test for the Royal Air Force. Ergonomics, 51, 935–946. doi:10.1080/00140130801939725
*Robertson, D. W. (1992). Development of job performance standards for muscularly demanding military tasks. In S. Kumar (Ed.), Advances in industrial ergonomics and safety (Vol. 4, pp. 1299–1304). London, England: Taylor & Francis.
*†Robertson, D. W., & Trent, T. T. (1983). Predicting muscularly demanding job performance in Navy occupations. Paper presented at the annual meeting of the American Psychological Association, Anaheim, CA.
Roth, P. L., Bevier, C. A., Bobko, P., Switzer, F. S., & Tyler, P. (2001). Ethnic group differences in cognitive ability in employment and education settings: A meta-analysis. Personnel Psychology, 54, 297–330. doi:10.1111/j.1744-6570.2001.tb00094.x
Roth, P. L., Bobko, P., McFarland, L., & Buster, M. (2008). Work sample tests in personnel selection: A meta-analysis of black-white differences in overall and exercise scores. Personnel Psychology, 61, 637–661. doi:10.1111/j.1744-6570.2008.00125.x
Roth, P. L., Purvis, K. L., & Bobko, P. (2012). A meta-analysis of gender group differences for measures of job performance in field studies. Journal of Management, 38, 719–739. doi:10.1177/0149206310374774
Ryan, A. M., Greguras, G. J., & Ployhart, R. E. (1996). Perceived job relatedness of physical ability testing for firefighters: Exploring variations in reactions. Human Performance, 9, 219–240. doi:10.1207/s15327043hup0903_3
Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49, 929–954. doi:10.1037/0003-066X.49.11.929
Salgado, J. F., Viswesvaran, C., & Ones, D. S. (2001). Predictors used in personnel selection: An overview of constructs, methods, and techniques. In N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.), International handbook of work and organizational psychology (Vol. 1, pp. 165–199). London, England: Sage. doi:10.4135/9781848608320.n10
*Savinainen, M., Nygard, C., & Ilmarinen, J. (2004a). A 16-year follow-up study of physical capacity in relation to perceived workload among ageing employees. Ergonomics, 47, 1087–1102. doi:10.1080/00140130410001686357
*Savinainen, M., Nygard, C., Korhonen, O., & Ilmarinen, J. (2004b). Changes in physical capacity among middle-aged municipal employees over 16 years. Experimental Aging Research, 30, 1–22. doi:10.1080/0361073049025746
Schmidt, F. L. (1994). Future of personnel selection in the U.S. Army. In M. G. Rumsey, C. B. Walker, & J. H. Harris (Eds.), Personnel selection and classification (pp. 333–350). Florence, KY: Psychology Press.
Schmidt, F. L., & Le, H. (2004). Hunter–Schmidt Meta-Analysis Package Program 1.1 [Computer software]. Iowa City, IA: University of Iowa.
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analyses of validity studies published between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37, 407–422. doi:10.1111/j.1744-6570.1984.tb00519.x
*Sell, K. M. (2006). Development of minimal physical fitness standards for firefighters (Unpublished doctoral dissertation). Department of Exercise and Sport Science, University of Utah, Salt Lake City, UT.
*Sharp, D. S., Wright, J. E., Vogel, J. A., Patton, J. F., Daniel, W. L., Knapik, J., & Korval, D. M. (1980). Screening for physical capacity in the U.S. Army: An analysis of measures predictive of strength and stamina (Report No. T8/80). Natick, MA: U.S. Army Research Institute of Environmental Medicine.
*Sharp, M. A., Knapik, J. J., Hauret, K., Frykman, P., & Patton, J. F. (1999). Lifting ability of Army men and women in relation to occupational demands. Proceedings of the Human Factors and Ergonomics Society annual meeting, 43, 718–722.
*‡Sharp, M. A., Knapik, J. J., Patton, J. F., Smutok, M. A., Hauret, K., Chervak, M., . . . Jones, B. H. (2000). Physical fitness of soldiers entering and leaving basic combat training (Report No. T00-13). Natick, MA: U.S. Army Research Institute of Environmental Medicine.
*Sharp, M. A., Patton, J. F., Knapik, J. J., Hauret, K., Mello, R. P., Ito, M., & Frykman, P. N. (2002). Comparison of the physical fitness of men and women entering the U.S. Army: 1978–1998. Medicine & Science in Sports & Exercise, 34, 356–363. doi:10.1097/00005768-200202000-00026
*Sharp, M. A., Rice, V. J., Nindl, B. C., & Mello, R. P. (1995). Maximum acceptable load for lifting and carrying in two-person teams. Proceedings of the Human Factors and Ergonomics Society annual meeting, 39, 640–644.
*Sharp, M. A., Rice, V. J., Nindl, B. C., & Williamson, T. L. (1993). Maximum team lifting capacity as a function of team size (Report No. T94-2). Natick, MA: U.S. Army Research Institute of Environmental Medicine.
*Sharp, M. A., & Vogel, J. A. (1992). Maximal lifting strength in military personnel. In S. Kumar (Ed.), Advances in industrial ergonomics and safety (Vol. 4, pp. 1261–1268). London, England: Taylor & Francis.
*Sheaff, A. K. (2009). Physiological determinants of the candidate physical ability test in firefighters (Unpublished master’s thesis). Department of Kinesiology, University of Maryland, College Park, MD.
Smither, J. W., London, M., Vasilopoulos, N. L., Reilly, R. R., Millsap, R. E., & Salvemini, N. (1995). An examination of the effects of an upward feedback program over time. Personnel Psychology, 48, 1–34. doi:10.1111/j.1744-6570.1995.tb01744.x
*Snook, S. H., & Ciriello, V. M. (1974). Maximum weights and workloads acceptable to female workers. Journal of Occupational Medicine, 16, 527–534.
*Sothmann, M. S., Gebhardt, D. L., Baker, T. A., Kastello, G. M., & Sheppard, V. A. (2004). Performance requirements of physically strenuous occupations: Validating minimum standards for muscular strength and endurance. Ergonomics, 47, 864–875. doi:10.1080/00140130410001670372
*Stone, M. H., Sands, W. A., Pierce, K. C., Carlock, J., Cardinale, M., & Newton, R. U. (2005). Relationship of maximum strength to weightlifting performance. Medicine & Science in Sports & Exercise, 37, 1037–1043.
*Taha, Z., & Nazaruddin. (2005). Grip strength prediction for Malaysian industrial workers using artificial neural networks. International Journal of Industrial Ergonomics, 35, 807–816. doi:10.1016/j.ergon.2004.11.006
*Takken, T., Ribbink, A., Heneweer, H., Moolenaar, H., & Wittink, H. (2009). Workload demand in police officers during mountain bike patrols. Ergonomics, 52, 245–250. doi:10.1080/00140130802334553
Terpstra, D. A., Mohamed, A. A., & Kethley, R. B. (1999). An analysis of federal court cases involving nine selection devices. International Journal of Selection and Assessment, 7, 26–34. doi:10.1111/1468-2389.00101
*Tipton, M., Reilly, T., Iggleden, C., & Rees, A. (2002). Fitness and medical standards for beach lifeguards. Portsmouth, England: Institute of Biomedical & Biomolecular Sciences, Department of Sport and Exercise Science, University of Portsmouth.
*Tiwari, P. S., Gite, L. P., Majumder, J., Pharade, S. C., & Singh, V. V. (2010). Push/pull strength of agricultural workers in central India. International Journal of Industrial Ergonomics, 40, 1–7. doi:10.1016/j.ergon.2009.10.001
*Turner, N. L., Chiou, S., Zwiener, J., Weaver, D., & Spahr, J. (2010). Physiological effects of boot weight and design on men and women firefighters. Journal of Occupational and Environmental Hygiene, 7, 477–482. doi:10.1080/15459624.2010.486285
U.S. Army. (2012). Women in the U.S. Army. Retrieved from http://www.army.mil/women/today.html
*Van Der Beek, A. J., Kluver, B. D. R., Frings-Dresen, M. H. W., & Hoozemans, M. J. M. (2000). Gender differences in exerted forces and physiological load during pushing and pulling of wheeled cages by postal workers. Ergonomics, 43, 269–281. doi:10.1080/001401300184602
*Vogel, J. A., & Friedl, K. E. (1992). Army data: Body composition and physical capacity. In B. M. Marriott & J. Grumstrup-Scott (Eds.), Body composition and physical performance: Applications for the military services (pp. 89–103). Washington, DC: National Academy Press.
*Vogel, J. A., Patton, J. F., Mello, R. P., & Daniels, W. L. (1986). An analysis of aerobic capacity in a large United States population. Journal of Applied Physiology, 60, 494–500.
*Vogel, J. A., Ramos, M. U., Patton, J. F., & Vogel, J. A. (1977). Comparisons of aerobic power and muscle strength between men and women entering the U.S. Army [Abstract]. Medicine & Science in Sports, 9, 58. doi:10.1249/00005768-197721000-00062
*Von Restorff, W. (2000). Physical fitness of young women: Carrying simulated patients. Ergonomics, 43, 728–743. doi:10.1080/001401300404706
Walker, A. D., & Smither, J. W. (1999). A five-year study of upward feedback: What managers do with their results matters. Personnel Psychology, 52, 393–423. doi:10.1111/j.1744-6570.1999.tb00166.x
*Warr, B. J., Alvar, B., Dodd, D., Heumann, K., Mitros, M. R., Keating, C., & Swan, P. D. (2011). How do they compare? An assessment of predeployment fitness in the Arizona National Guard. Journal of Strength and Conditioning Research, 25, 2955–2962. doi:10.1519/JSC.0b013e31822dfba8
*Weiner, J. A. (1988). Criterion-related validation study of the POST Work Sample Test Battery. Sacramento, CA: Commission on Peace Officer Standards and Training.
*Weiner, J. A. (1994). Physical abilities test follow-up validation study. Sacramento, CA: Commission on Peace Officer Standards and Training.
*Wilkinson, D. M., Rayson, M. P., & Bilzon, J. L. (2005). Development and re-validation of physical selection standards for recruits in the British Army. Paper presented at the International Congress on Soldiers’ Physical Performance, Jyväskylä, Finland. Retrieved from http://www.mil.fi/images/ICSPP05_proceedings.pdf#page=106
*Williams, A. G., Rayson, M. P., & Jones, D. A. (1999). Effects of basic training on material handling ability and physical fitness of British Army recruits. Ergonomics, 42, 1114–1124. doi:10.1080/001401399185171
*Wilmore, J. H., & Davis, J. A. (1979). Validation of a physical abilities field test for the selection of state traffic officers. Journal of Occupational and Environmental Medicine, 21, 33–40.
Wood, J. (2008). Methodology for dealing with duplicate effects in a meta-analysis. Organizational Research Methods, 11, 79–95. doi:10.1177/1094428106296638
*Wright, J. E., Sharp, D. S., Vogel, J. A., & Patton, J. F. (1984). Assessment of muscle strength and prediction of lifting capacity in U.S. Army personnel (Report No. AD-A148 846). Natick, MA: U.S. Army Institute of Environmental Medicine.
*Wyon, M., Allen, N., Angioi, M., Nevill, A., & Twitchett, E. (2006). Anthropometric factors affecting vertical jump height in ballet dancers. Journal of Dance Medicine & Science, 10, 106–110.
*Yoopat, P. (2002). Cardiorespiratory capacity and strain of blue-collar workers in Thailand (Unpublished doctoral dissertation). Department of Physiology, University of Kuopio, Kuopio, Finland.
Received April 26, 2012
Revision received March 6, 2013
Accepted April 25, 2013
Content available from Journal of Applied Psychology