The Role of General Cognitive Ability
and Job Performance: Why There
Cannot Be a Debate
Frank L. Schmidt
Tippie College of Business
University of Iowa
Given the overwhelming research evidence showing the strong link between general
cognitive ability (GCA) and job performance, it is not logically possible for
industrial-organizational (I/O) psychologists to have a serious debate over whether
GCA is important for job performance. However, even if none of this evidence
existed in I/O psychology, research findings in differential psychology on the nature and correlates of
GCA provide a sufficient basis for the conclusion that GCA is strongly related to job
performance. In I/O psychology, the theoretical basis for the empirical evidence link-
ing GCA and job performance is rarely presented, but is critical to understanding and
acceptance of these findings. The theory explains the why behind the empirical find-
ings. From the viewpoint of the kind of world we would like to live in—and would like
to believe we live in—the research findings on GCA are not what most people would
hope for and are not welcome. However, if we want to remain a science-based field, we
cannot reject what we know to be true in favor of what we would like to be true.
In light of the evidence in the research literature, can there really be a debate about
whether general cognitive ability (GCA) predicts job performance? This was the
question that made me ambivalent about the proposal for the 2000 Society for In-
dustrial-Organizational Psychology (SIOP) Conference debate. One explicit as-
sumption underlying the debate proposal was that there is no broad agreement in
I/O psychology about the role of GCA in job performance. This is not my percep-
tion; I think there is broad agreement. Of course, there is never a consensus
omnium on any question in science. There are still biologists today who do not accept
the theory of evolution. However, broad agreement is another question. Given the
research literature as it exists today, it is almost impossible that there would not be
broad agreement. As a telling example of this, I see this agreement in court cases in
which expert I/O witnesses for plaintiffs challenging the use of ability tests agree
that GCA is generally important in job performance—despite the fact that it would
be in the interests of their case to maintain the opposite. Twenty years ago, when
the research evidence was less extensive, this was not the case.

HUMAN PERFORMANCE, 15(1/2), 187–210
Copyright © 2002, Lawrence Erlbaum Associates, Inc.
Requests for reprints should be sent to Frank Schmidt, Tippie College of Business, University of Iowa, Iowa City, IA 52242.
In any research-based field, given enough time and effort, research questions
are eventually answered. An example is the question of whether GCA and aptitude
tests underpredict minority job and training performance. By the early 1980s, hun-
dreds of studies had accumulated making it clear that they do not. Ultimately, two
reports by the National Academy of Sciences reviewed this research and con-
firmed the finding of no predictive bias (Hartigan & Wigdor, 1989; Wigdor & Gar-
ner, 1982; see also Schmidt & Hunter, 1999). Thus, this question can now be
regarded as settled scientifically (Sackett, Schmitt, Ellingson, & Kabin, 2001).
The question of whether GCA predicts job performance is another example of
this: The evidence is so overwhelming that it is no longer an issue among those
familiar with the research findings (Sackett et al., 2001). Hence, there cannot be a serious
debate. Yet, there was a debate at SIOP 2000—and now there is this special issue.
One of my purposes in this article is to explain how such a situation could arise.
General cognitive ability is essentially the ability to learn (Hunter, 1986; Hunter &
Schmidt, 1996). From a public relations point of view, it is perhaps unfortunate
that psychologists, including I/O psychologists, often refer to GCA as intelli-
gence—because to laymen this term implies genetic potential and not the concept
of developed ability at the time the test is administered that psychologists are in
fact referencing. This semantic confusion engenders reluctance to accept research
findings. Likewise, the term g refers not to genetic potential, but to developed
GCA. The fact that GCA scores are influenced—even strongly influenced—by
genes does not change the fact that GCA scores reflect more than just genetic
potential.
There are abilities other than GCA that are relevant to performance and learning
on many jobs: psychomotor ability, social ability, and physical ability. However,
validity for these abilities is much more variable across jobs and has a much lower
average validity across jobs.
Cognitive abilities that are narrower than GCA are called specific aptitudes—or
often just aptitudes. Examples include verbal aptitude, spatial aptitude, and numer-
ical aptitude. Until about 10 years ago, it was widely believed that job performance
could be better predicted by using a variety of aptitude measures than by using
GCA alone. Multiple aptitude theory hypothesized that different jobs required dif-
ferent aptitude profiles and that regression equations containing different aptitudes
for different jobs should therefore optimize the prediction of performance on the
job and in training. Despite the fact that to most people this theory had a compel-
ling plausibility, it has been disconfirmed. Differentially weighting multiple apti-
tude tests produces little or no increase in validity over the use of measures of gen-
eral mental ability. It has been found that aptitude tests measure mostly GCA; in
addition, each measures something specific to that aptitude (e.g., specifically nu-
merical aptitude, over and above GCA). The GCA component appears to be re-
sponsible for the prediction of job and training performance, whereas factors spe-
cific to the aptitudes appear to contribute little or nothing to prediction. The
research showing this is reviewed in Hunter (1986); Jensen (1986); Olea and Ree
(1994); Ree and Earles (1992); Ree, Earles, and Teachout (1994); and Schmidt,
Ones, and Hunter (1992), among other sources.
Some have turned this argument around and have asked whether GCA contrib-
utes anything to prediction over and above specific aptitudes. This question is not
as simple as it appears. If this question is asked for any single specific aptitude
(e.g., verbal ability), the answer from research is that GCA has higher validity than
any single aptitude and contributes incremental validity over and above that apti-
tude. In the context of GCA theory, this is not surprising, because any single apti-
tude measure is just one indicator of GCA and thus would be expected to have
lower validity than a multiple indicator measure of GCA. However, as discussed in
the next paragraph, any combination of two or three or more specific aptitudes is
actually a measure of GCA. Comparing the validity of such a combination with the
validity of a GCA measure amounts to comparing the validity of two GCA mea-
sures. So any difference in validity (typically small) would merely indicate that
one measure was a better measure of GCA than the other. The same would be true
for any findings of “incremental validity” in this situation.
If the specific aptitudes in the combination of specific aptitudes are differen-
tially weighted (i.e., using regression weights), the composite aptitude measure is
still a measure of GCA. The question examined in the research described here pit-
ting specific aptitude theory against GCA is whether such differentially weighted
measures of GCA predict job or training performance better than an ordinary mea-
sure of GCA. The finding is that they do not—meaning that the differential weight-
ing of the indicators of GCA is not effective in enhancing prediction. There are
good theoretical reasons why this should in fact be the case (Schmidt, Hunter, &
Pearlman, 1981).
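A toy simulation can illustrate why differential weighting buys nothing when prediction runs through g. Everything below is illustrative rather than a reanalysis of the cited studies: three hypothetical aptitudes are modeled as the latent g plus equal specific variance, and the criterion is assumed to depend on g alone.

```python
import numpy as np

# Illustrative model: each aptitude = latent g + specific variance,
# and job performance depends only on g (assumptions for the sketch).
rng = np.random.default_rng(0)
n = 100_000
g = rng.standard_normal(n)
aptitudes = np.stack([g + rng.standard_normal(n) for _ in range(3)], axis=1)
performance = g + rng.standard_normal(n)

# Unit-weighted composite: an ordinary GCA measure.
unit = aptitudes.sum(axis=1)
r_unit = np.corrcoef(unit, performance)[0, 1]

# Regression-weighted composite, as specific aptitude theory prescribes.
design = np.column_stack([aptitudes, np.ones(n)])
beta = np.linalg.lstsq(design, performance, rcond=None)[0]
r_weighted = np.corrcoef(design @ beta, performance)[0, 1]

print(round(r_unit, 3), round(r_weighted, 3))  # essentially identical
```

Under this symmetric setup the regression weights converge on equal weights, so the two validities differ only by sampling error, which is the pattern the research above reports.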
Despite the disconfirmation of specific aptitude theory, a variant of this theory is
still used by practitioners, especially in the public sector. Based on my experiences
and that of others, those developing tests to select police officers, firefighters, and
other public sector employees typically conduct a thorough task analysis of the job.
In the next step, they usually attempt to identify the specific cognitive skills required
to perform each specific task. Examples of such specific cognitive skills include
things such as perceptual skills, memory, and numerical problem solving. The moti-
vation for focusing on such specific skills is often the expectation of reduced adverse
impact. Typically, a dozen or more such specific skills are identified and included in
the resulting selection test. Such a test—a test that measures performance across a
number of such cognitive skills—functions as a measure of GCA. Hence, despite
their initial intention to avoid GCA and to focus on the specific cognitive skills actu-
ally used, test developers using this approach create and use tests of GCA (Hunter &
Schmidt, 1996). They are often unpleasantly surprised when they find that such tests
show the usual majority–minority mean differences. A better understanding of GCA
research and theory could prevent such surprises.
On the basis of meta-analysis of over 400 studies, Hunter and Hunter (1984) es-
timated the validity of GCA for supervisor ratings of overall job performance to be
.57 for high-complexity jobs (about 17% of U.S. jobs), .51 for medium-complexity
jobs (63% of jobs), and .38 for low-complexity jobs (20% of jobs). These findings
are consistent with those from other sources (Hunter & Schmidt, 1996; validities
are larger against objective job sample measures of job performance, Hunter,
1983a). For performance in job training programs, a number of large databases ex-
ist, many based on military training programs. Hunter (1986) reviewed military
databases totaling over 82,000 trainees and found an average validity of .63 for
GCA. This figure is similar to those for training performance reported in various
studies by Ree and his associates (e.g., Ree & Earles, 1991), by Thorndike
(1986), by Jensen (1986), and by Hunter and Hunter (1984).
Validity generalization research has advanced the understanding and prediction
of job performance by demonstrating that, for most jobs, GCA is the most important
trait determinant of job and training performance (Schmidt & Hunter, 1998). Tables
1–4 summarize many validity generalization findings on the validity of GCA and
specific aptitudes for predicting both job and training performance. The data in
Tables 1–3 illustrate an important research finding: Task differences between jobs do
not appear to affect the generalizability of validity. The results presented in Table 1
are for individual jobs (e.g., specific types of clerical jobs), those presented in Table 2
are for broader job groupings (e.g., all clerical jobs combined), and the results pre-
sented in Table 3 are from validity generalization studies in which completely differ-
ent jobs were included in the meta-analysis (e.g., cooks, welders, clerks). Yet, it can
be seen that validity generalizes about as well across completely different jobs as it
does within a single job. Standard deviations of validities increase very little as jobs
become more task-heterogeneous. Unlike differences in complexity levels, differences
between jobs in task make-up do not affect GCA and aptitude test validity
(Schmidt, Hunter, & Pearlman, 1981).
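The Best Case and Worst Case columns in Tables 1–3 are, per the table notes, the 90th and 10th percentiles of the estimated true-validity distribution. Assuming the usual normal approximation, they can be recovered from ρ and SDρ with a z of about 1.28; a minimal sketch (the cap at 1.00 reproduces entries such as the general clerks row of Table 1):

```python
def credibility_values(rho, sd_rho, z=1.28):
    """90th/10th percentiles of the estimated true-validity distribution
    (the Best Case / Worst Case columns), assuming normality."""
    best = min(1.0, rho + z * sd_rho)   # validities cannot exceed 1.00
    worst = rho - z * sd_rho
    return round(best, 2), round(worst, 2)

# First rows of Table 1 (Schmidt, Hunter, Pearlman, & Shane, 1979):
print(credibility_values(0.78, 0.12))  # mechanical comprehension -> (0.93, 0.63)
print(credibility_values(0.67, 0.35))  # general mental ability   -> (1.0, 0.22)
```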
Supervisory ratings of overall job performance were used as the measure of job
performance in most of the data summarized in Table 4. In some settings, supervisors
have limited ability to observe job performance, making the ratings potentially less
accurate. An example of this is the data for law enforcement occupations shown in
TABLE 1
Validity Generalization Findings for Aptitude and Ability Tests—
Results With Narrow Job Groupings: All Job Titles the Same

Job    Test Type    ρ    SDρ    Best Case^a    Worst Case^b    Criterion
Schmidt, Hunter, Pearlman, and Shane (1979)
Mechanical Repairman mechanical comprehension .78 .12 .93 .63 T
General Clerks general mental ability .67 .35 1.00 .22 P
Machine Tenders spatial ability .05 .45 .63 –.52 P
First Line Supervisor general mental ability .64 .23 .93 .35 P
First Line Supervisor mechanical comprehension .48 .27 .83 .14 P
First Line Supervisor spatial ability .43 .18 .66 .20 P
Lilienthal and Pearlman (1980)
Health Aid/Tech verbal .59 .20 .85 .33 T
Health Aid/Tech quantitative .65 .16 .85 .44 T
Health Aid/Tech perceptual speed .39 .03 .43 .36 T
Health Aid/Tech memory/follow directions .66 .00 .66 .66 T
Health Aid/Tech reasoning .42 .21 .69 .16 T
Health Aid/Tech spatial .36 .16 .56 .15 T
Health Aid/Tech verbal .48 .12 .63 .33 P
Health Aid/Tech quantitative .48 .17 .70 .26 P
Health Aid/Tech perceptual speed .36 .06 .44 .28 P
Health Aid/Tech reasoning .21 .21 .48 –.06 P
Health Aid/Tech spatial .38 .16 .58 .17 P
Science/Engineering Aid/Tech verbal .38 .07 .47 .29 T
Science/Engineering Aid/Tech quantitative .38 .07 .47 .29 T
Science/Engineering Aid/Tech perceptual speed .40 .00 .40 .40 T
Science/Engineering Aid/Tech reasoning .52 .29 .89 .14 T
Science/Engineering Aid/Tech spatial .61 .15 .80 .43 P
Science/Engineering Aid/Tech verbal .58 .14 .80 .40 P
Science/Engineering Aid/Tech quantitative .69 .17 .91 .48 P
Science/Engineering Aid/Tech perceptual speed .31 .22 .59 .04 P
Science/Engineering Aid/Tech spatial .48 .25 .80 .16 P
Pearlman, Schmidt, and Hunter (1980)
A general mental ability .50 .24 .81 .19 P
B general mental ability .49 .24 .80 .18 P
E general mental ability .43 .00 .43 .43 P
A verbal ability .39 .23 .68 .10 P
B verbal ability .41 .25 .69 .10 P
C verbal ability .37 .00 .37 .37 P
A quantitative .49 .13 .66 .32 P
B quantitative .52 .16 .72 .32 P
C quantitative .60 .09 .72 .49 P
E quantitative .45 .00 .45 .45 P
A reasoning .38 .00 .38 .38 P
B reasoning .63 .12 .78 .47 P
C reasoning .31 .18 .54 .08 P
A perceptual speed .45 .24 .76 .14 P
B perceptual speed .50 .14 .68 .33 P
C perceptual speed .45 .00 .45 .45 P
D perceptual speed .40 .22 .68 .12 P
E perceptual speed .39 .12 .54 .24 P
A memory .38 .20 .64 .13 P
B memory .42 .00 .42 .42 P
C memory .44 .14 .62 .25 P
A spatial/mechanical .20 .12 .36 .04 P
B spatial/mechanical .42 .19 .66 .17 P
C spatial/mechanical .48 .06 .56 .41 P
A general mental ability .80 .00 .80 .80 T
B general mental ability .66 .06 .74 .57 T
C general mental ability .70 .00 .70 .70 T
D general mental ability .54 .11 .68 .41 T
A verbal ability .75 .00 .75 .75 T
B verbal ability .62 .12 .77 .47 T
C verbal ability .60 .00 .60 .60 T
D verbal ability .46 .13 .63 .29 T
A quantitative .79 .00 .79 .79 T
B quantitative .66 .07 .75 .57 T
C quantitative .62 .05 .68 .55 T
D quantitative .46 .08 .56 .36 T
A reasoning .29 .00 .29 .29 T
B reasoning .26 .00 .26 .26 T
A perceptual speed .46 .27 .81 .11 T
B perceptual speed .38 .25 .70 .06 T
C perceptual speed .41 .14 .59 .23 T
D perceptual speed .36 .17 .58 .14 T
E perceptual speed .02 .17 .24 –.20 T
A spatial/mechanical .47 .10 .60 .35 T
B spatial/mechanical .36 .25 .68 .05 T
C spatial/mechanical .26 .15 .45 .07 T
Schmidt, Gast-Rosenberg, and Hunter (1979)
Computer programmer number series (1) .48 .38 .92 –.05 P
Computer programmer figure analogies (2) .46 .32 .87 .06 P
Computer programmer arithmetic reasoning (3) .57 .34 .72 .13 P
Computer programmer sum of (1), (2), & (3) .73 .27 1.00 .39 P
Computer programmer sum of (1), (2), & (3) .91 .17 1.00 .70 T
Linn, Harnish, and Dunbar (1981)
Law School Students LSAT (Law School Admission Test) .49 .12 .64 .33 T
Note. T = training performance; P = proficiency on the job. Jobs: A = stenography, typing, and filing clerks; B = computing and account-recording clerks; C
= production and stock clerks; D = information and message distribution clerks; E = public contact and service clerks.
^a 90th percentile. ^b 10th percentile.
TABLE 2
Results With Broader Job Groupings

Job    Test Type    ρ    SDρ    Best Case^a    Worst Case^b    Criterion
Pearlman (1980)
All clerical combined general mental ability .52 .24 .83 .21 P
All clerical combined verbal ability .39 .23 .68 .09 P
All clerical combined quantitative ability .47 .14 .65 .30 P
All clerical combined reasoning .39 .15 .58 .19 P
All clerical combined perceptual speed .47 .22 .75 .19 P
All clerical combined memory .38 .17 .60 .17 P
All clerical combined spatial mechanical .30 .19 .54 .05 P
All clerical combined motor ability .30 .21 .57 .03 P
All clerical combined performance tests .44 .43 .99 –.11 P
All clerical combined general mental ability .71 .12 .86 .56 T
All clerical combined verbal ability .64 .13 .81 .47 T
All clerical combined quantitative ability .70 .12 .86 .56 T
All clerical combined reasoning .39 .18 .62 .16 T
All clerical combined perceptual speed .39 .20 .63 .11 T
All clerical combined spatial/mechanical .37 .20 .63 .11 T
Results for law enforcement occupations (Hirsh, Northrup, & Schmidt, 1986)
All occupations combined Memory (M) .40 .00 .40 .40 T
Quantitative (Q) .53 .19 .77 .29 T
Reasoning (R) .57 .14 .77 .39 T
Spatial/mechanical .49 .00 .49 .49 T
Verbal (V) .62 .21 .89 .35 T
*Verbal + reasoning .75 .00 .75 .75 T
*V+Q+spatial/mechanical .76 .00 .76 .76 T
Memory (M) .11 .14 .29 –.07 P
Perceptual speed .31 .27 .66 –.04 P
Psychomotor .26 .15 .45 .07 P
Quantitative (Q) .33 .20 .59 .07 P
Reasoning (R) .20 .10 .33 .07 P
Spatial/mechanical .21 .10 .34 .08 P
Verbal (V) .21 .21 .48 –.06 P
V + R .34 .26 .67 .01 P
V+R+M+spatial/mechanical .27 .27 .62 .01 P
V + Q +spatial/mechanical .35 .27 .70 .00 P
Schmidt, Hunter, and Caplan (1981)
All petroleum plant operators (1) mechanical competence .32 .05 .38 .26 P
All maintenance trades (2) mechanical competence .36 .20 .62 .11 P
(1) and (2) mechanical competence .33 .10 .46 .20 P
(1), (2) and Lab technicians (3) mechanical competence .32 .08 .42 .22 P
(1) mechanical competence .47 .15 .66 .28 T
(1) and (2) mechanical competence .48 .10 .61 .35 T
(1), (2) and (3) mechanical competence .46 .12 .61 .30 T
(1) Chemical competence .30 .05 .36 .24 P
(2) Chemical competence .30 .05 .36 .24 P
(1) and (2) Chemical competence .28 .03 .32 .24 P
(1), (2) and (3) Chemical competence .29 .00 .29 .29 P
(2) Chemical competence .47 .00 .47 .47 T
(1) and (2) Chemical competence .28 .03 .32 .24 T
(1), (2) and (3) Chemical competence .48 .00 .48 .48 T
(1) General mental ability .22 .16 .42 .02 P
(2) General mental ability .31 .18 .49 .04 P
(1) and (2) General mental ability .26 .18 .49 .04 P
(1), (2) and (3) General mental ability .26 .16 .46 .06 P
(1) General mental ability .68 .00 .68 .68 T
(2) General mental ability .56 .00 .56 .56 T
(1) and (2) General mental ability .65 .00 .65 .65 T
(1), (2) and (3) General mental ability .63 .00 .63 .63 T
(1) Arithmetic reasoning .26 .20 .52 .01 P
(2) Arithmetic reasoning .15 .16 .36 –.05 P
(1) and (2) Arithmetic reasoning .31 .18 .54 .09 P
(1), (2) and (3) Arithmetic reasoning .31 .17 .53 .09 P
(1) Arithmetic reasoning .49 .30 .87 .11 T
(2) Arithmetic reasoning .70 .00 .70 .70 T
(1) and (2) Arithmetic reasoning .52 .22 .80 .13 T
(1), (2) and (3) Arithmetic reasoning .48 .27 .83 .13 T
(1) Numerical ability .33 .20 .59 .07 P
(1) and (2) Numerical ability .30 .14 .48 .12 P
(1), (2) and (3) Numerical ability .30 .11 .44 .15 P
Callender and Osburn (1981)*
(1), (2) and (3) General mental ability .31 .12 .46 .16 P
(1), (2) and (3) Mechanical competence .31 .16 .52 .11 P
(1), (2) and (3) Chemical competence .28 .00 .28 .28 P
(1), (2) and (3) Arithmetic reasoning .20 .19 .44 –.04 P
(1), (2) and (3) General mental ability .50 .00 .50 .50 T
(1), (2) and (3) Mechanical competence .52 .07 .61 .43 T
(1), (2) and (3) Chemical competence .47 .00 .47 .47 T
(1), (2) and (3) Quantitative ability .52 .15 .71 .33 T
Note. T = training performance; P = proficiency on the job. Asterisks denote approximate measures of general cognitive ability.
^a 90th percentile. ^b 10th percentile.
*Same job groups as in Schmidt, Hunter, and Caplan (1981).
TABLE 3
Results With Extremely Heterogeneous Job Groupings

Job    Test Type    ρ    SDρ    Best Case^a    Worst Case^b    Criterion
Schmidt, Hunter, and Pearlman (1981): 35 widely varying Army jobs^c
Vocabulary .51 .12 .66 .36 T
Arithmetic reasoning .56 .13 .73 .39 T
Spatial ability .48 .10 .61 .35 T
Mech. comp. .50 .11 .64 .36 T
Perceptual speed .41 .11 .55 .27 T
Hunter and Hunter (1984): over 500 widely varying jobs^d
Gen. mental ability .45 .12 .61 .29 P
Spatial/perceptual .37 .10 .50 .24 P
Psychomotor .37 .16 .57 .17 P
Gen. mental ability .54 .17 .76 .32 T
Spatial/perceptual .41 .10 .54 .28 T
Psychomotor .26 .13 .43 .09 T
Note. T = training performance; P = proficiency on the job.
^a 90th percentile. ^b 10th percentile. ^c 35 widely varying Army jobs. ^d Over 500 widely varying jobs.
TABLE 4
GATB Research Findings: Average Validity of Three Kinds of Ability

                                            Performance on the Job    Performance in Training Programs
Complexity Level of Job    % of Work Force    GMA    GPA    PMA          GMA    GPA    PMA
1 (high)                        14.7           .58    .35    .21          .59    .26    .13
2                                2.5           .56    .52    .30          .65    .53    .09
3                               62.7           .51    .52    .30          .65    .53    .09
4                               17.7           .40    .35    .43          .54    .53    .40
5 (low)                          2.4           .23    .24    .48
Note. Source: J. E. Hunter (1984), Validity Generalization for 12,000 Jobs: An Application of Synthetic
Validity and Validity Generalization to the General Aptitude Test Battery. Washington, DC: U.S.
Employment Service, U.S. Department of Labor.
GATB = General Aptitude Test Battery. GMA = general mental ability, which can be measured by
any of a number of commercially available tests (e.g., the Wonderlic Test and the Purdue Adaptability
Test). GPA = general perceptual ability, a combination of spatial ability and perceptual speed; GPA is
measured by the sum of standardized scores on three-dimensional spatial visualization tests (e.g., the
Revised Minnesota Paper Form Board Test) and perceptual speed tests (e.g., the Minnesota Clerical
Test). PMA = psychomotor ability; tests measuring this ability focus on manual dexterity. They are
usually not paper-and-pencil tests; they typically require such things as the rapid assembling and
disassembling of bolts, nuts, and washers or the rapid placement of differently shaped pegs or blocks in
holes of corresponding shape (e.g., the Purdue Pegboard Test). Psychomotor ability is particularly
useful in selecting workers for jobs low in cognitive (information-processing) complexity.
Table 2 (mostly police jobs; Hirsh, Northrup, & Schmidt, 1986); supervisors typi-
cally spend their day in the precinct office and do not observe their officers at work. In
law enforcement, the average validity of GCA measures for job performance ratings
(last three validities for this occupation in Table 2) is only .32, lower than the average
for other occupations (although still substantial). Yet, validities of GCA measures
for predicting performance in the police academy are very large, averaging .75 (last
two training performance validities), calling into question the mean GCA validity of
.32 for performance on the job. Hence, it seems clear that job performance ratings in
law enforcement are probably not as accurate as in the case of jobs in which supervi-
sors have greater opportunity to observe their subordinates.
A major concern in the case of GCA measures is lower rates of minority hiring
due to the mean differences between groups on GCA measures. These differences in
hiring rates can and should be reduced by including valid noncognitive measures in
the selection process (Hunter & Schmidt, 1996). However, within the context of
GCA measurement per se, there has been interest in the possibility of using video-
based tests and other nontraditional vehicles to measure GCA with the objective of
reducing adverse impact while holding validity constant. These attempts have gen-
erally failed; either group differences have not been reduced or they have but validity
has fallen (Sackett et al., 2001). From a theoretical point of view, it is easy to under-
stand these findings. The underlying assumption or hypothesis is that the typical
paper-and-pencil approach to measuring GCA creates a bias against minority
groups—a hypothesis that is contradicted both by the research findings on predic-
tive fairness and by the broader research findings in differential psychology that I
discuss later. If the underlying group differences in GCA are as revealed by previous
research, and if a predictor is a measure of GCA, it is not possible to reduce group dif-
ferences without reducing validity. For a measure of GCA, there are only two ways to
reduce group differences. First, one can reduce the reliability of the measure; adding
measurement error reduces group differences. This lowers validity. Second, one can
modify the measure so that it no longer is a measure only of GCA. This can be
achieved by introducing variance from other constructs; for example, one can add
personality items and make the measure partly a measure of conscientiousness. This
can reduce group differences—and can do so without reducing validity (in fact,
validity could increase)—but this would no longer be a measure of GCA (Sackett et al.,
2001). Hence, for a measure of GCA per se, it is not possible to reduce group
differences in any important way while avoiding loss of validity.
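The first route, lowering reliability, can be seen directly from the classical attenuation formulas: both the standardized group difference and the validity of a measure shrink by the square root of its reliability, so the trade is one-for-one. A small sketch under illustrative values (the true validity of .51 and true difference of 1.0 SD are round numbers chosen for the example, not estimates from the studies cited):

```python
import math

def attenuate(true_validity, true_d, reliability):
    """Classical-test-theory attenuation: adding measurement error
    (reliability < 1) shrinks the observed validity and the observed
    standardized group difference by the same sqrt(reliability) factor."""
    factor = math.sqrt(reliability)
    return true_validity * factor, true_d * factor

for rel in (1.0, 0.8, 0.5):
    r_obs, d_obs = attenuate(0.51, 1.0, rel)
    print(f"reliability={rel:.1f}  validity={r_obs:.2f}  group d={d_obs:.2f}")
```

Because the same factor multiplies both quantities, any reduction in the group difference obtained this way is paid for directly in validity.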
Furthermore, even if one could develop a reliable GCA measure with reduced
group differences and with no reduction in validity, such a test would be predic-
tively biased—against the majority group. This is because the mean difference in
job performance would remain the same; the reduction in the predictor mean dif-
ference would not change the criterion mean difference. For example, if such a hy-
pothetical test showed no mean Black–White differences, it would predict a zero
mean difference on job performance, whereas the actual difference would remain
the usual .50 SD. Hence, such a hypothetical test would be predictively biased; it
would underpredict for Whites and overpredict for Blacks.
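The arithmetic behind this point is straightforward. With a common regression line on standardized scores, the predicted criterion gap between groups equals the validity times the predictor gap; a test engineered to show no predictor gap therefore predicts a zero performance gap while the actual gap stays near .50 SD. A sketch in standardized units (the 1.0 SD predictor gap for a conventional GCA test is an illustrative round value):

```python
def predicted_criterion_gap(validity, predictor_gap_sd):
    """Predicted mean criterion difference (in SD units) under a common
    regression line: the slope (the validity, for standardized scores)
    times the predictor mean difference."""
    return validity * predictor_gap_sd

actual_gap = 0.50  # observed job-performance gap in SD units (from the text)

# Conventional GCA measure with an illustrative 1.0 SD predictor gap:
print(predicted_criterion_gap(0.51, 1.0))   # 0.51 SD predicted, near actual

# Hypothetical equally valid test engineered to show no predictor gap:
bias = predicted_criterion_gap(0.51, 0.0) - actual_gap
print(bias)  # -0.5: underpredicts the majority, overpredicts the minority
```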
The relative validity of different predictors of job performance was recently re-
viewed by Schmidt and Hunter (1998). Among predictors that can be used for en-
try-level hiring, none are close to GCA in validity. The next most valid predictor
was found to be integrity tests (shown by Ones, 1993, to measure mostly the per-
sonality trait of conscientiousness). Integrity tests can be used along with a test of
GCA to yield a combined validity of .65 (Ones, Viswesvaran, & Schmidt, 1993).
Among predictors suitable for hiring workers already trained or experienced, only
job sample tests and job knowledge tests are comparable in validity to GCA. How-
ever, job knowledge and job sample performance are consequences of GCA
(Hunter & Schmidt, 1996; Schmidt & Hunter, 1992). That is, GCA is the major
cause of both job knowledge and job sample performance. Hence, in using these
predictors, one is using an indirect measure of GCA. For the reasons discussed in
Schmidt and Hunter (1998), predictors other than ability are best used as supple-
ments to increment the validity of GCA alone. On the basis of available meta-anal-
yses, Schmidt and Hunter (1998) presented estimates of incremental validity for
15 such predictors for job performance and 8 for training performance. Some of
these predictors are positively correlated with GCA (e.g., employment interviews)
and so are in part measures of GCA.
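The .65 figure for GCA plus an integrity test follows from the standard formula for the multiple correlation of two predictors, R = sqrt((r1² + r2² − 2·r1·r2·r12)/(1 − r12²)). A quick check, assuming the Schmidt and Hunter (1998) estimates of .51 for GCA and .41 for integrity tests and the near-zero intercorrelation reported between the two:

```python
import math

def composite_validity(r1, r2, r12):
    """Multiple correlation of an optimally weighted composite of two
    predictors with validities r1 and r2 and intercorrelation r12."""
    return math.sqrt((r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2))

# GCA (.51) plus integrity tests (.41), assumed uncorrelated:
print(round(composite_validity(0.51, 0.41, 0.0), 2))  # -> 0.65
```

The low intercorrelation is what makes integrity tests such a useful supplement: a second predictor correlated with GCA would add much less.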
Could the findings summarized earlier have been different? Could it have turned
out that GCA was not very important to job performance? Actually, it could not
have. In fact, even if no validation studies had ever been conducted relating GCA to
job performance, we would still know that GCA predicted job performance. To see
why this is so, we must take a broader view; we must broaden the narrow view
common in I/O psychology. I/O psychology is not a social science island—it is
part of a larger social science continent. Part of the continent is differential psy-
chology: the general study of individual differences. Research in differential psy-
chology has shown that GCA is related to performances and outcomes in so many
areas of life—more than any other variable measured in the social sciences—that it
would not be possible for job performance to be an exception to the rule that GCA
impacts the entire life-space of individuals. The following list is a sampling of
some of the life outcomes, other than job performance, that GCA predicts (Brody,
1992; Herrnstein & Murray, 1994; Jensen, 1980; Jensen, 1998):
1. School performance and achievement through elementary school, high
school, and college.
2. Ultimate education level attained.
3. Adult occupational level.
4. Adult income.
5. A wide variety of indexes of “adjustment” at all ages.
6. Disciplinary problems in kindergarten through 12th grade (negative relation).
7. Delinquency and criminal behavior (negative relation).
8. Accident rates on the job (negative relation).
9. Poverty (negative relation).
10. Divorce (negative relation).
11. Having an illegitimate child (for women; negative relation).
12. Being on welfare (negative relation).
13. Having a low birth weight baby (negative relation).
GCA is also correlated with a variety of important health-related behaviors
(Lubinski & Humphreys, 1997) and a wide variety of tasks involved in everyday
life, such as understanding instructions on forms and reading bus schedules, that
are not part of one’s occupational or job role (Gottfredson, 1997). In fact, the
number of life outcomes that GCA predicts is too large to list in an article of this sort.
Research in differential psychology has shown there are few things of any impor-
tance in the lives of individuals that GCA does not impact. How likely would it be
that the one important exception to this rule would be something as important (and
cognitively complex) as performance on the job? Not very likely. This means that
the research showing a strong link between GCA and job performance could not
have turned out differently.
As I/O psychologists, we have a responsibility to take a broader, more inclusive
view. We have a responsibility to be aware of relevant research outside the immedi-
ate narrow confines of I/O psychology, research that has important implications for
conclusions in our field. One such research finding is the finding in differential
psychology that GCA is more important than any other trait or characteristic dis-
covered by psychologists in determining life outcomes. It would be irresponsible
to ignore this finding in considering the role of GCA in job performance. Yet, this
does sometimes happen, even when the individual differences research is con-
ducted by I/O psychologists (e.g., Wilk, Desmarais, & Sackett, 1995; Wilk &
Sackett, 1996). Why does this happen? One reason is that most I/O graduate pro-
grams no longer include a course in differential psychology. In our program at the
University of Iowa, I teach such a seminar, and I find that students are astounded by
the recent research findings in such areas as behavior genetics; GCA; personality;
interests and values; and trait differences by age, sex, social class, ethnicity, and re-
gion of the United States. They are astounded because they have never before been
exposed to this body of research. We need to ensure that no one becomes an I/O
psychologist without a graduate course in differential psychology.
It is especially difficult for people to accept facts and findings they do not like if
they see no reason why the findings should or could be true. When Alfred Wegener
advanced the theory of continental drift early in the 20th century, geologists could
think of no means by which continents or continental plates could move around.
Not knowing of any plausible mechanism or explanation for the movement of con-
tinents, they found Wegener’s theory implausible and rejected it. Many people
have had the same reaction to the empirical findings showing that GCA is highly
predictive of job performance. The finding does not seem plausible to them be-
cause they cannot think of a reason why such a strong relation should exist (in fact,
their intuition often tells them that noncognitive traits are more important than
GCA; Hunter & Schmidt, 1996). However, just as in the case of continental drift,
there is an explanation. Causal analyses of the determinants of job performance
show that the major effect of GCA is on the acquisition of job knowledge:
People higher in GCA acquire more job knowledge and acquire it faster. The
amount of job-related knowledge required even on less complex jobs is much
greater than is generally realized. Higher levels of job knowledge lead to higher
levels of job performance. Viewed negatively, not knowing what one should be do-
ing—or even not knowing all that one should about what one should be doing—is
detrimental to job performance. And knowing what one should be doing and how
to do it depends strongly on GCA.
However, the effect of GCA on performance is not mediated solely through job
knowledge; there is also a direct effect. That is, over and above the effects of job
knowledge, job performance requires direct problem solving on the job. Hence,
GCA has a direct effect on job performance independent of job knowledge. Space
limitations preclude a full description of the causal research that provides this ex-
planation; reviews of this research can be found in Hunter and Schmidt (1996) and
Schmidt and Hunter (1992). The key point here is that, even within the confines of
I/O research, there is more than just the brute empirical fact of the predictive valid-
ity of GCA. There is also an elaborated and empirically supported theoretical ratio-
nale that explains why GCA has such high validity. (And, of course, there are also
the broader converging and confirming findings in differential psychology, dis-
cussed earlier.)
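The mediated structure just described can be made concrete with a toy calculation. In a standardized path model, the GCA–performance correlation implied by the model is the direct path plus the product of the paths through job knowledge. The coefficients below are hypothetical round numbers chosen for illustration, not Hunter's published estimates:

```python
# Toy path model: GCA -> job knowledge -> job performance, plus a direct path.
# All coefficients are hypothetical illustrative values, not published estimates.
a = 0.60  # path: GCA -> job knowledge
b = 0.50  # path: job knowledge -> job performance
c = 0.25  # direct path: GCA -> job performance

indirect = a * b      # effect of GCA mediated through job knowledge
total = indirect + c  # implied GCA-performance correlation in this model
print(round(indirect, 2), round(total, 2))  # 0.3 0.55
```

In the causal studies cited above, the mediated path is substantially larger than the direct path, which is the pattern these toy numbers are meant to mimic.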
Would it be better if GCA were less important than it is in determining job perfor-
mance? In my opinion, it would. From many points of view, it would be better if
GCA were less important. For example, it would be better if specific aptitude the-
ory had been confirmed rather than disconfirmed. If specific aptitude theory fit re-
ality, then a larger percentage of people could be high on both predicted and actual
job performance. The majority of people could be in, say, the top 10% in predicted
job performance for at least one job and probably many. This outcome seems much
fairer and more democratic to me. It would also have been much better if
noncognitive traits, such as personality traits, had turned out to have the highest va-
lidity—rather than GCA. Personality trait measures show minimal group differ-
ences, and so adverse impact would have virtually disappeared as a problem. In ad-
dition, if all of the Big Five personality traits had high validity, whereas GCA did
not, and different, relatively uncorrelated personality traits predicted performance
for different jobs, then most people could again be in the top 10% in predicted (and
actual) performance for at least one job and probably many. Again, this outcome
seems much fairer and more desirable to me. So the world revealed by our research
(and research in differential psychology in general) is not the best of all worlds.
The world revealed by research is not one that is easy to accept, much less to em-
brace. However, can we reject what we know to be true in favor of what we would
like to be true, and still claim that we are a science-based field?
This is in fact what current social policy does. Current social policy, in effect,
pretends that the research findings summarized earlier do not exist. Current social
policy strongly discourages hiring and placing people in jobs on the basis of GCA,
even when the consequences of not doing so are severe. For example, in Washing-
ton, DC, in the late 1980s, GCA requirements were virtually eliminated in the hir-
ing of police officers, resulting in severe and socially dangerous decrements in the
performance of the police force (Carlson, 1993a, 1993b). More recently, under
pressure from the U.S. Department of Justice, changes in the selection process for
police hiring in Nassau County, NY, were made that virtually eliminated GCA re-
quirements (Gottfredson, 1996). In a large U.S. steel company, reduction in mental
ability requirements in the selection of applicants for skilled trades apprentice-
ships resulted in documented dramatic declines in quality and quantity of work
performed (Schmidt & Hunter, 1981). As an industrial psychologist, I am familiar
with numerous cases such as these resulting from current social policy, most of
which have not been quantified and documented. This social policy also has a neg-
ative effect on U.S. international competitiveness in the global economy (Schmidt,
The source of many of these policies has been the interpretation that the govern-
ment agencies, such as the Equal Employment Opportunity Commission and the
Department of Justice, and some courts have placed on Title VII of the 1964 Civil
Rights Act (and its subsequent amendments). Some minorities, in particular Blacks
and Hispanics, typically have lower average scores on employment tests of aptitude
and abilities, resulting in lower hiring rates. The theory of adverse impact holds that
such employment tests cause these differences rather than merely measuring them.
That is, this theory falsely attributes the score differences and the hiring rate differ-
ences to biases in the tests—biases which research has shown do not exist.
A large body of research shows that employment (and educational) tests of abil-
ity and aptitude are not predictively biased (Hartigan & Wigdor, 1989; Hunter,
1981b; Hunter & Schmidt, 1982a; Schmidt & Hunter, 1981; Schmidt et al., 1992;
Wigdor & Garner, 1982). That is, the finding is that any given GCA test score has
essentially the same implications for future job performance for applicants regard-
less of group membership. For example, Whites and Blacks with low test scores
are equally likely to fail on the job. Hence, research findings directly contradict the
theory of adverse impact and the requirements that social policy has imposed on
employers on the basis of that theory (Schmidt & Hunter, 1999).
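What "no predictive bias" means can be illustrated with a small simulation. Under the standard regression definition of fairness, both groups share the same regression line, so applicants with the same test score have the same expected performance even when the groups' mean scores differ. The numbers below (a 1 SD mean difference, a validity of .50) are illustrative values consistent with those discussed in this article:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
validity = 0.50  # illustrative operational validity

# Two groups differing by 1 SD in mean test score, but performance is
# generated from the SAME regression equation for both (no predictive bias).
gca_a = rng.normal(0.0, 1.0, n)
gca_b = rng.normal(-1.0, 1.0, n)
noise = np.sqrt(1 - validity**2)
perf_a = validity * gca_a + rng.normal(0.0, noise, n)
perf_b = validity * gca_b + rng.normal(0.0, noise, n)

# Applicants with the same (low) test score have essentially the same
# expected performance regardless of group:
band = lambda x: (x > -1.1) & (x < -0.9)
low_a = perf_a[band(gca_a)].mean()
low_b = perf_b[band(gca_b)].mean()
print(round(low_a, 2), round(low_b, 2))  # both close to -0.5
```

Predictive bias would correspond to giving one group its own intercept or slope in the performance equation; that is exactly what the studies cited above fail to find.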
The major requirement stemming from the theory of adverse impact has been
costly and complicated validation requirements for any hiring and promotion pro-
cedures that show group disparities. In particular, employers desiring to select on
the basis of GCA must meet these expensive and time-consuming requirements.
These requirements encourage the abandonment of ability requirements for job se-
lection, resulting in reduced levels of job performance and output among all em-
ployees, not merely for minority employees. In fact, the productivity losses are
much greater among nonminority employees than among minority employees
(Schmidt & Hunter, 1981).
What should social policy be in connection with GCA in the area of employee
selection? Social policy should encourage employers to hire on the basis of valid
predictors of performance, including GCA. The research findings discussed earlier
indicate that such a policy is likely to maximize economic efficiency and growth
(including job growth), resulting in increases in the general standard of living
(Hunter & Schmidt, 1982), benefiting all members of society. However, social pol-
icy should also encourage use in hiring of those noncognitive methods known to
both decrease minority–majority hiring rate differences and to increase validity
(and, hence, job performance). That is, social policy should take into account re-
search findings on the role of personality and other noncognitive traits in job per-
formance to simultaneously reduce hiring rate differences and increase the pro-
ductivity gains from personnel selection (Ones, Viswesvaran, & Schmidt, 1993;
Sackett et al., 2001; Schmidt et al., 1992).
The goal of current social policy is equal representation of all groups in all jobs
and at all levels of job complexity. Even with fully nondiscriminatory and predic-
tively fair selection methods, this goal is unrealistic, at least at this time, because
groups today differ in mean levels of job-relevant skills and abilities (Sackett et al.,
2001). They also differ greatly in mean age and education level, further reducing
the feasibility of this policy goal. The current pursuit of this unrealistic policy goal
results not only in frustration, but also in social disasters of the sort that befell the
Washington, DC, police force.
This unrealistic policy goal should give way to a policy of eradication of all re-
maining real discrimination against individuals in the workplace. The chief indus-
trial psychologist in a large manufacturing firm told me that his firm had achieved
nondiscrimination in employment, promotion, and other personnel areas. He
stated that the firm still had some discrimination against Blacks and women in
some locations, but that was balanced out by the discrimination (preference) in fa-
vor of Blacks and women in the firm’s affirmative action programs. So, on balance,
the firm was nondiscriminatory! Actually, the firm simply had two types of dis-
crimination, both of which should not have existed. Both types of discrimination
cause employee dissatisfaction and low morale. Discrimination causes resentment
and bitterness, because it violates the deeply held American value that each person
should be treated as an individual and judged on his or her own merits.
Defenders of the present policy often argue that the task of eliminating all indi-
vidual-level discrimination is formidable, excessively time consuming, and costly.
They argue that the use of minority preferences, hiring goals, time tables, and quo-
tas is much more resource-efficient. However, it is also ineffective, socially divi-
sive, and productive of social disasters of the type described earlier.
In this section I discuss some of the contentions advanced by other participants in
the SIOP debate. One participant stated that there would be no reluctance to accept
the finding of high validity for GCA if it were not for the fact of group differences
and adverse impact. I do not think this is the case. Although group differences and
adverse impact (in employment and education) do contribute substantially to this
reluctance, this is not the whole story. Even if there were no mean differences be-
tween groups, a world in which success at work and in life was determined by a
large number of relatively independent traits would be fairer and more desirable
than the world as revealed by research—a world in which GCA is the dominant de-
terminant of success and half of all people are by definition below average in GCA.
Hence, people prefer to believe in the former world and tend to want to reject the
research findings on GCA. I find this tendency among students who are not even
aware of group differences in GCA. As I indicated earlier, the world as revealed by
research is not the best world we can imagine. We can imagine a more desirable
world, and we want to believe that that world is the real world. We want to believe a
pleasant falsehood. We are in denial.
Another participant stated that the research by Claude Steele on stereotype
threat (Steele, 1997; Steele & Aronson, 1995) could explain the lower average
GCA scores of some minority groups. However, if this were the case, then GCA
scores would of necessity display predictive bias in the prediction of educational
and occupational performance. Stereotype threat is hypothesized to artificially
lower minority scores, resulting in an underestimate of actual GCA. Any artificial de-
pression of scores of a group must logically result in predictive bias against that
group. Yet, as noted earlier, the literature is clear in showing no predictive bias
against minority groups. There are also other problems. In the typical stereotype
threat study, minority and majority students are prematched on some index of
GCA (such as Scholastic Assessment Test scores). (For reasons that are not clear,
the researchers assume these scores have not been affected by stereotype threat.)
Hence, group differences existing in the wider population are artificially elimi-
nated in the study. Participants are then given other GCA tests under both condi-
tions designed to evoke stereotype threat and conditions without stereotype threat
(control condition). It is then often found that stereotype threat lowers minority
scores somewhat in comparison to the control condition. Therefore, the group dif-
ference created in this manner is in addition to the regularly observed group differ-
ence. Perhaps the best way to summarize this research is as follows: Minorities
typically average lower than nonminorities (and no predictive bias is found for
these scores); however, by creating stereotype threat we can make this difference
even larger (Sackett et al., 2001). (Note that this means that for scores obtained un-
der stereotype threat, predictive bias against minorities should be found. However,
no predictive bias studies have been conducted on GCA scores obtained under
conditions of stereotype threat.)
Another proposition advanced during the debate is that GCA is “not unique as a
predictor” because meta-analyses indicate that job sample (work sample) tests
have even higher validity, as shown in Schmidt and Hunter (1998). This review re-
ported a mean validity of .54 for work sample tests versus a validity of .51 for GCA
for medium complexity jobs (63% of all jobs in the United States). The problem
with this comment is that performance on work sample tests is a consequence of
GCA and hence reflects GCA, as noted earlier. Causal modeling studies have sup-
ported the following causal sequence: GCA causes the acquisition of job knowl-
edge, which in turn is the major cause of performance on work sample tests.
Hence, work sample tests are not independent of GCA, but rather reflect the effects
of GCA. There is also another consideration: Most hiring is done at the entry level,
and work sample tests cannot be used for entry-level hiring.
Another participant advanced the hypothesis that “the criterion may be the
problem.” That is, he suggested that the problem lies in the ways in which I/O psy-
chologists measure job performance, with the implication being that more
construct-valid performance measures might show that GCA tests are predictively
biased against minorities. Job performance is typically measured using supervisory
ratings, and the focus of this comment was that such ratings may have construct va-
lidity problems as performance measures. Could the research findings of predic-
tive fairness just be a function of construct deficiencies in job performance ratings?
This is an example of a case in which triangulation is essential in scientific re-
search. What are the research findings on predictive fairness when other criterion
measures are used? Although supervisory ratings are used in most studies, this is
not true for all studies. Exceptions fall into two major categories: studies of perfor-
mance in training using objective measures of amount learned and studies using
objective job or work sample measures as the criterion. The military has produced
large numbers of training performance studies based on objective measures of
amount learned—sometimes objective written tests and sometimes hands-on work
sample tests (and sometimes combinations of these). These studies show the same
lack of predictive bias as is found in studies based on supervisory ratings. Non-
training validation studies using work sample measures obtain the same result.
This is true even when those scoring the work sample are blind to the subgroup
membership of the participants (Campbell, Crooks, Mahoney, & Rock, 1973; Gael
& Grant, 1972; Gael, Grant, & Ritchie, 1975a, 1975b; Grant & Bray, 1970). The
findings are similar in the educational domain; different criterion measures used in
educational research show the same pattern of lack of predictive bias. Hence, it is
not the case that conclusions of predictive fairness rest on the foundation of rat-
ings. All criterion types support the same conclusion. Thus, there is enough re-
search information available to reject the hypothesis that “the criterion is the problem.”
One participant emphasized the fact that the structured employment interview
appears to have validity comparable to GCA but with smaller subgroup differ-
ences, with the implication being that something may be amiss with GCA tests. Al-
though no definitive studies have been conducted showing exactly what constructs
are measured by structured interviews, it seems likely that both cognitive and
noncognitive constructs are assessed. The finding of positive correlations between
structured interview scores and GCA scores (Huffcutt, Roth, & McDaniel, 1996)
indicates that GCA is assessed to some degree. Personality traits and other
noncognitive traits are probably also assessed. Any predictor score or composite
made of both noncognitive constructs and GCA can be expected to show smaller
group differences than GCA alone (Sackett et al., 2001). And if the noncognitive
dimensions are valid, such a measure may have validity equal to that of GCA mea-
sures (Schmidt & Hunter, 1998). This leads to the question raised by the partici-
pant: Why not use the structured interview in place of a GCA test and get the same
validity with less adverse impact? There is no question that this can be done. How-
ever, as noted and demonstrated by Schmidt and Hunter (1998), combinations of
predictors lead to higher validity and practical utility than single predictors. Use of
a structured interview and a GCA test together (in a compensatory model) yields a
validity of .63, versus a validity of .51 for the interview alone. Hence, dropping
GCA from the combination results in reduction in both validity and practical util-
ity of nearly 20%. We can also look at this from the opposite view: What would be
the consequences of using a GCA test alone, without the structured interview? The
answer is lower validity (.51 vs. .63) and greater adverse impact. This example,
again, points up the value of using valid noncognitive measures as supplements to
GCA measures to increase validity and utility and reduce adverse impact.
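The arithmetic behind combining predictors is the standard two-predictor multiple-correlation formula. The sketch below reproduces the figures quoted above under an assumed interview–GCA intercorrelation of .30; that intercorrelation is a hypothetical value chosen for illustration, not a parameter reported in the source:

```python
import math

def composite_validity(r1, r2, r12):
    """Validity of an optimally weighted composite of two predictors,
    from the standard two-predictor multiple-correlation formula."""
    return math.sqrt((r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2))

# .51 = illustrative validity of each predictor alone; r12 = .30 is assumed.
print(round(composite_validity(0.51, 0.51, 0.30), 2))  # 0.63
```

The less correlated the two predictors are, the more the composite validity exceeds either validity alone, which is why adding a largely noncognitive predictor to GCA pays off.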
Finally, one speaker was concerned that differences between group means are
smaller on job performance measures than on GCA. This comment must be con-
sidered against the backdrop of the established finding that GCA tests do not show
predictive bias against minorities—that is, the finding, discussed in detail, that
Blacks, Whites, and Hispanics with the same GCA test scores have essentially the
same later average job performance. Because GCA is only one of the determinants
of job performance, it is expected statistically that the difference on job perfor-
mance will be smaller than the difference on GCA scores—given the fact of pre-
dictive fairness (Hunter & Schmidt, 1977). For example, given that the difference
between Blacks and Whites on GCA is 1 SD and that the validity of GCA measures
is typically about .50, the expected difference on job performance must be approxi-
mately .50 SD, and this has generally been found to be the case. If the job perfor-
mance difference were larger or smaller than .50 SD, then research would not find
a lack of predictive bias for GCA measures.
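The expectation invoked here follows directly from the common (unbiased) regression line. With standardized variables, the predicted criterion score is $\hat{Y} = r_{XY}X$, so a predictor mean difference of $d_X$ implies a criterion mean difference of:

```latex
d_Y \;=\; r_{XY}\, d_X \;=\; (.50)(1.00\ SD) \;=\; .50\ SD
```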
Another consideration is reliability. Measures of job performance, especially
job performance ratings, are less reliable than measures of GCA. Unreliability arti-
ficially reduces apparent group differences—that is, adding the noise of measure-
ment error increases standard deviations, which in turn reduces mean differences
in standard deviation units. (This applies on the predictor end as well: GCA tests
with low reliability show much smaller group differences than highly reliable
GCA measures.) Hence, discussions of group differences that do not take reliabil-
ity into account are misleading (Sackett et al., 2001). Job sample performance
measures are usually more reliable (and more objective) than ratings. Job sample
measures typically show Black–White mean differences of about .50 SD or
slightly larger (e.g., see Campbell et al., 1973; Gael & Grant, 1972; Gael et al.,
1975a, 1975b; Grant & Bray, 1970). At the true score level, these differences are
about .55 to .60—about the same as the true score differences on job performance
ratings (Hunter & Hirsh, 1987). This complex of research findings is consistent
with the theory that GCA is a major determinant of job performance, but there are
other determinants that do not show group differences, such as conscientiousness.
This is why combining a measure of GCA and a measure of conscientiousness reduces group differences while increasing validity and maintaining predictive fairness.
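The reliability correction described above is simple: measurement error inflates the criterion standard deviation, so the observed standardized difference equals the true-score difference times the square root of the criterion reliability. A minimal sketch, using hypothetical reliabilities of .70 to .80 for work sample criteria (assumed values for illustration):

```python
import math

def true_score_d(observed_d, reliability):
    """Disattenuate a standardized group difference for criterion unreliability:
    observed d = true d * sqrt(reliability)."""
    return observed_d / math.sqrt(reliability)

# An observed work sample difference of about .50 SD maps to roughly
# .55-.60 SD at the true-score level under these assumed reliabilities.
print(round(true_score_d(0.50, 0.80), 2))  # 0.56
print(round(true_score_d(0.50, 0.70), 2))  # 0.6
```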
The purely empirical research evidence in I/O psychology showing a strong link
between GCA and job performance is so massive that there is no basis for ques-
tioning the validity of GCA as a predictor of job performance—a predictor that
is valid and predictively unbiased for majority and minority groups. In addition
to all the purely empirical evidence for the validity and predictive fairness of
GCA, there is also a well developed and empirically supported theory that ex-
plains why GCA predicts job performance. And even if I/O psychologists had
never researched the link between GCA and job performance, research findings
on GCA in the broader areas of differential psychology would still compel us to
conclude that GCA is predictive of job performance, because it is simply too im-
plausible that GCA could predict all major life performance outcomes except job
performance. These findings from differential psychology make the overall theo-
retical picture for GCA even clearer. Especially when combined with the fact of
group differences in mean GCA scores, these findings do not reflect the kind of
world most of us were hoping for and hence are not welcome to many people.
As a result, we see many attempts—desperate attempts—to somehow circum-
vent these research findings and reach more palatable conclusions. Such at-
tempts are clearly evident in some of the articles published in this special issue,
and I have addressed many of them in this article. Others are addressed in the lit-
erature cited. These attempts are in many ways understandable; years ago I was
guilty of this myself. However, in light of the evidence that we now have, these
attempts are unlikely to succeed. There comes a time when you just have to
come out of denial and objectively accept the evidence.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A
meta analysis. Personnel Psychology,41, 1–26.
Brody, N. (1992). Intelligence. (2nd Ed.). San Diego, CA: Academic Press.
Callender, J. C., & Osburn, H. G. (1981). Testing the constancy of validity with computer generated
sampling distributions of the multiplicative model variance estimate: Results for the petroleum in-
dustry validation research. Journal of Applied Psychology, 66, 274–281.
Campbell, J.T., Crooks, L.A., Mahoney, M.H., & Rock, D. A. (1973). An investigation of sources of
bias in the prediction of job performance: A six year study. (Final Project Report No. PR-73-37).
Princeton, NJ: Educational Testing Service.
Carlson, T. (1993a). D.C. blues: The rap sheet on the Washington police. Policy Review,21, 27–33.
Carlson, T. (1993b, November 3). Washington’s inept police force. The Wall Street Journal, p. 27.
Gael, S., & Grant, D. L. (1972). Employment test validation for minority and nonminority telephone
company service representatives. Journal of Applied Psychology, 56, 135-139.
Gael, S., Grant, D.L., & Ritchie, R.J. (1975). Employment test validation for minority and nonminority
clerks with work sample criteria. Journal of Applied Psychology, 60, 420-426.
Grant, D. L., & Bray, D.W. (1970). Validation of employment tests for telephone company installation
and repair occupations. Journal of Applied Psychology, 54, 7-14.
Gottfredson, L. S. (1996). Racially gerrymandering the content of police tests to satisfy the U.S. Justice
Department: A case study. Psychology, Public Policy, and Law,2, 418–446.
Gottfredson, L. S. (1997). Why gmatters: The complexity of everyday life. Intelligence,24, 79–132.
Hartigan, J. A., & Wigdor, A. K. (Eds.). (1989). Fairness in employment testing. Washington, DC: Na-
tional Academy of Sciences Press.
Herrnstein, R., & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in American
Life. New York: The Free Press.
Hirsh, H. R., Northrup, L., & Schmidt, F. L. (1986). Validity generalization results for law enforcement
occupations. Personnel Psychology, 39, 399–420.
Huffcutt, A. I., Roth, P. L., & McDaniel, M. A. (1996). A meta-analytic investigation of cognitive abil-
ity in employment interview evaluations: Moderating characteristics and implications for incremen-
tal validity. Journal of Applied Psychology,81, 459–473.
Hunter, J. E. (1980). Test validation for 12,000 jobs: An application of synthetic validity and validity gen-
eralization to the GeneralAptitude Test Battery (GATB).Washington,DC:U.S. Employment Service.
Hunter, J. E. (1981). Fairness of the General Aptitude Test Battery (GATB): Ability differences and their
impact on minority hiring rates. Washington, DC: U.S. Employment Service.
Hunter, J. E. (1983a). A causal analysis of cognitive ability, job knowledge, job performance, and su-
pervisor ratings. In F. Landy, S. Zedeck, & J. Cleveland (Eds.), Performance measurement theory
(pp. 257–266). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Hunter, J. E. (1983b). Validity generalization of the ASVAB: Higher validity for factor analytic compos-
ites. Rockville, MD: Research Applications.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Jour-
nal of Vocational Behavior,29, 340–362.
Hunter, J. E., & Hirsh, H. R. (1987). Application of meta-analysis. In C. L. Cooper & I. T. Robertson
(Eds.), Review of industrial psychology (Vol. 2, pp. 321–357). New York: Wiley.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin,96, 72–98.
Hunter, J. E., & Schmidt, F. L. (1977). A critical analysis of the statistical and ethical definitions of test
fairness. Psychological Bulletin,83, 1053–1071.
Hunter, J. E., & Schmidt, F. L. (1982). Ability tests: Economic benefits versus the issue of fairness. In-
dustrial Relations,21, 293–308.
Hunter, J. E., & Schmidt, F. L. (1996). Intelligence and job performance: Economic and social implica-
tions. Psychology, Public Policy, and Law,2, 447–472.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Jensen, A. R. (1986). g: Artifact or reality? Journal of Vocational Behavior,29, 301–331.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Lilienthal, R. A., & Pearlman, K. (1983). The validity of federal selection tests for aid/technician in the
health, science, and engineering fields. Washington, DC: U.S. Office of Personnel Management, Of-
fice of Personnel Research and Development.
Linn, R. L., & Dunbar, S. B. (1985). Validity generalization and predictive bias. In R. A. Burk (Ed.),
Performanceassessment: State of the art (pp. 205-243). Baltimore: Johns Hopkins University Press.
Lubinski, D., & Humphreys, L. G. (1997). Incorporating general intelligence into epidemiology and
the social sciences. Intelligence,24, 159–201.
McDaniel, M. A. (1985). The evaluation of a causal model of job performance: The interrelationships
of general mental ability, job experience, and job performance. Unpublished doctoral dissertation,
George Washington University, Washington, DC.
Mount, M. K., & Barrick, M. R. (1995). The Big Five personality dimensions: Implications for research
and practice in human resources management. In G. Ferris (Ed.), Research in personnel and human
resources management (Vol. 13, pp. 153–200). Greenwich, CT: JAI.
Olea, M. M., & Ree, M. J. (1994). Predicting pilot and navigator criteria: Not much more than g. Jour-
nal of Applied Psychology,79, 845–851.
Ones, D. S. (1993). The construct validity of integrity tests. Unpublished doctoral dissertation, Univer-
sity of Iowa, Iowa City, IA.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test
validities: Findings and implications for personnel selection and theories of job performance. Jour-
nal of Applied Psychology,78, 679–703.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests used to pre-
dict job proficiency and training criteria in clerical occupations. Journal of Applied Psychology, 65,
Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psy-
chology,44, 321–332.
Ree, M. J., & Earles, J. A. (1992). Intelligence is the best predictor of job performance. Current Direc-
tions in Psychological Science,1, 86–89.
Ree, M. J., Earles, J. A., & Teachout, M. (1994). Predicting job performance: Not much for than g.
Journal of Applied Psychology,79, 518–524.
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-states testing in employment,
credentialing, and higher education: Prospects in a post-affirmative action world. American Psychol-
ogist, 56, 302-318.
Schmidt, F. L. (1988). The problem of group differences in ability scores in employment selection.
Journal of Vocational Behavior,33, 272–292.
Schmidt, F. L. (1993). Personnel psychology at the cutting edge. In N. Schmitt & W. Borman (Eds.),
Personnel selection (pp. 497–515). San Francisco: Jossey-Bass.
Schmidt, F. L., Gast-Rosenberg, I. F., & Hunter, J. E. (1980). Validity generalization results for com-
puter programmers. Journal of Applied Psychology, 65, 643–661.
Schmidt, F. L., & Hunter, J. E. (1981). Employment testing: Old theories and new research findings.
American Psychologist,36, 1128–1137.
Schmidt, F. L., & Hunter, J. E. (1992). Causal modeling of processes determining job performance.
Current Directions in Psychological Science,1, 89–92.
Schmidt, F. L., & Hunter, J. E. (1998). The valadity and utility of selection methods in personal psy-
chology: Practical and theoretical implications of 85 years of research findings. Psychological Bulle-
tin, 124, 262-274.
Schmidt, F. L., & Hunter, J. E. (1999). Bias in standardized educational and employment tests as justifi-
cation for racial preferences in affirmative action programs. In K. T. Leicht (Ed.), The future of affir-
mative action (Vol. 17, pp. 285-302). Stanford, CT: JAI.
Schmidt, F. L., Hunter, J. E., & Caplan, J. R. (1981). Validity generalization results for two job groups
in the petroleum industry. Journal of Applied Psychology, 66, 261–273.
Schmidt, F. L., Hunter, J. E., Outerbridge, A. N., & Goff, S. (1988). The joint relation of experience and
ability with job performance: A test of three hypotheses. Journal of Applied Psychology,73, 46–57.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. (1981). Task differences and the validity of aptitude tests
in selection: A red herring. Journal of Applied Psychology, 66, 166–185.
Schmidt, F. L., Hunter, J. E., Pearlman, K., & Shane, G. S. (1979). Further tests of the Schmidt–Hunter
Bayesian validity generalization model. Personnel Psychology, 32, 257–281.
Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. Annual Review of Psychology,
43, 627–670.
Steele, C. M. (1997). A threat in the air: How stereotypes shape intellectual identity and performance.
American Psychologist,52, 613–629.
Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African
Americans. Journal of Personality and Social Psychology, 69, 797–811.
Thorndike, R. L. (1986). The role of general ability in prediction. Journal of Vocational Behavior, 29, 332–339.
Wigdor, A. K., & Garner, W. R. (Eds.). (1982). Ability testing: Uses, consequences, and controversies
(Report of the National Research Council Committee on Ability Testing). Washington, DC: National
Academy of Sciences Press.
Wilk, S. L., Desmarais, L. B., & Sackett, P. R. (1995). Gravitation to jobs commensurate with ability:
Longitudinal and cross-sectional tests. Journal of Applied Psychology, 80, 79–85.
Wilk, S. L., & Sackett, P. R. (1996). Longitudinal analysis of ability–job complexity fit and job change.
Personnel Psychology, 49, 937–967.