Submitted: 15th of February 2022 DOI: 10.26775/OP.2023.02.12
Published: 12th of February 2023 ISSN: 2597-324X
Intelligence Really Does Predict Job Performance: A
Long-Needed Reply to Richardson and Norgate
Peter Zimmer∗ Emil Ole William Kirkegaard†
OpenPsych

∗ Pseudonym, Independent Researcher, USA, Email: me@email.com
† Ulster Institute of Social Research, London, UK, Email: author@email.edu
Abstract
One commonly studied aspect of the importance of IQ is its validity in predicting job performance. Previous research on
this subject has yielded impressive results, regularly finding operational validities for general mental ability exceeding 0.50.
In 2015, Ken Richardson and Sarah Norgate criticized the research on the relationship between IQ and job performance, arguing that the true association is virtually nil. Their assessment of this topic has enjoyed little criticism since its publication, despite the crux of their arguments being undermined by readily available empirical evidence and thirty years of replication to the contrary. This article replies to their main criticisms, which concern the construct validity of IQ tests and supervisory ratings, the validity of the Hunter-Schmidt meta-analytic methods, and possible psychological confounders.
Keywords: intelligence, IQ, cognitive ability, g-factor, job performance, meta-analysis, general mental ability,
predictive validity, industrial-organizational psychology
1 Introduction
Richardson & Norgate (2015) presented a detailed critique of the dense literature on the relationship between
job performance and IQ test scores. Their review primarily targeted the meta-analytic procedures introduced
and popularized by John Hunter and Frank Schmidt, as well as the construct and predictive validity of IQ
tests. Few papers replying to Richardson and Norgate have been published. To our knowledge, only two commentaries (Sternberg, 2015; Kaufman & Kaufman, 2015) have been published in reference to Richardson & Norgate (2015), both of which view the paper more positively than negatively. The latter commentary is a father-and-son response wherein each writer took a different position on the article. The more critical response, the father’s, mainly attacked Richardson and Norgate’s views on the construct validity of IQ tests while conceding the claim that meta-analytic procedures drastically overestimate the true association between IQ and job performance. The generally positive outlook on the paper suggests that it is accurate throughout; we argue this is not the case.
Richardson and Norgate’s main arguments can be summarized as follows:

(1) IQ tests are indirect measures of poorly-defined concepts, causing them to lack construct validity;

(2) the supposed predictive validity of IQ tests is a poor defense, as the correlations argued to support predictive validity are built into the tests;

(3) supervisory ratings, the primary measurement of job performance, are unlikely to be measures of actual job performance but are, rather, a product of biases in supervisor judgement;

(4) meta-analytic results are riddled with uncertainty, and the procedures meant to reduce error in meta-analyses are error-prone in themselves;

(5) the large report produced by the National Academy of Sciences (NAS; Council 1989) showed the relationship between job complexity, job performance, and IQ to be much smaller than previously estimated by Hunter and colleagues; and

(6) non-cognitive causes drive the relationship between general mental ability and job performance.
While Richardson and Norgate did not attack any specific paper, much of their article was in response to an article by Hunter & Hunter (1984). This article, often regarded as seminal in test validity research, led the way for decades of additional research on job performance and IQ and had 3,423 citations on Google Scholar as of this writing (https://scholar.google.com/scholar?cites=13884901685679622391&as_sdt=2005&sciodt=0,5&hl=en). Most meta-analyses to date have found an average operational validity for general mental ability
of over 0.50, with lower operational validity typically observed in the least complex jobs along with greater
operational validity in more complex jobs. Schmidt (2002) summarized such findings:
On the basis of meta-analysis of over 400 studies, Hunter & Hunter (1984) estimated the validity of
GCA for supervisor ratings of overall job performance to be .57 for high-complexity jobs (about 17 %
of U.S. jobs), .51 for medium-complexity jobs (63 % of jobs), and .38 for low-complexity jobs (20 % of
jobs). These findings are consistent with those from other sources (Hunter & Schmidt, 1996Hunter
& Schmidt 1996; validities are larger against objective job sample measures of job performance,
Hunter, 1983a). For performance in job training programs, a number of large databases exist, many
based on military training programs. Hunter (1986) reviewed military databases totaling over
82,000 trainees and found an average validity of .63 for GCA. This figure is similar to those for
training performance reported in various studies by Ree and his associates (e.g., Ree and Earles,
1991), by Thorndike (1986), by Jensen (1986), and by Hunter and Hunter (1984) (p. 190).
Job performance aside, IQ has also been found to predict changes in occupational status (Schmitt et al., 1984), self-employment (de Wit & van Winden, 1989), and training success (Hülsheger, 2007). The relationship between job performance and IQ has played a substantial role in public policy debate and psychology, and the critiques proposed by Richardson and Norgate seemed to turn what was previously well-established on its head. However, fallacious arguments, errant assumptions about meta-analysis, and pure speculation drove a significant portion of their paper. If strong conclusions are to be drawn from it, a number of relevant replies should be considered first.
2 Construct Validity of IQ Tests
As has been well established, IQ tests index a powerful and sociologically consequential construct, general mental ability (g), which is often cited in support of their validity as mental tests (e.g., Carroll 1993). The authors might argue that, because the original test creators were aiming to create similar tests, a positive manifold was forced and hence the tests are highly g-loaded. As it happens, they did not go to any length to make this argument within their article. Regardless, tests created without the intention of being g-loaded, and even tests developed to discredit g theory, have nonetheless ended up with high g-loadings (Dalliard, 2013).
Famously illustrating this phenomenon, Thurstone (1938) attempted to develop a test measuring seven independent facets of mental ability. Shortly after he published his work on the test, Eysenck (1939) found that these seven facets of mental ability all actually loaded onto g. Later, the British Ability Scales were developed in order to measure multiple independent mental abilities. However, when the data for the British Ability Scales were analyzed by Elliott (1986), the scales still gave rise to a higher-order g factor. Finally, one could also point to the Cognitive Assessment System (CAS) battery. This was based on the Planning, Attention-Arousal, Simultaneous and Successive (PASS) theory of intelligence, which was intentionally meant to combat g theory (Naglieri, 2001). Despite this, Keith et al. (2001) found that the CAS battery is not a valid measure of PASS and is actually a measure of g.
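To make the logic of the positive manifold concrete, the following is a minimal sketch (in Python, with a purely hypothetical correlation matrix) of how a single general factor can be extracted from a battery of positively correlated subtests; it illustrates the principle only and is not a reanalysis of any of the batteries discussed above.

```python
import numpy as np

# Hypothetical correlation matrix for five diverse subtests (positive manifold).
R = np.array([
    [1.00, 0.55, 0.48, 0.42, 0.50],
    [0.55, 1.00, 0.52, 0.45, 0.47],
    [0.48, 0.52, 1.00, 0.40, 0.44],
    [0.42, 0.45, 0.40, 1.00, 0.41],
    [0.50, 0.47, 0.44, 0.41, 1.00],
])

# Principal-axis style shortcut: the leading eigenvector of R gives loadings on a
# single common factor; with a positive manifold all loadings come out positive.
eigvals, eigvecs = np.linalg.eigh(R)
first = np.argmax(eigvals)
g_loadings = np.sqrt(eigvals[first]) * np.abs(eigvecs[:, first])

print("First-factor (g) loadings:", np.round(g_loadings, 2))
print("Proportion of variance explained:", round(eigvals[first] / R.shape[0], 2))
```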
Recent studies have shown that performance on video games correlates very strongly with intelligence (latent correlations from .60 to .93), especially when prior practice on the games is relatively uniform and many games are used to extract a general gaming ability (Quiroga et al., 2015, 2019). Similarly, a meta-analysis by Burgoyne et al. (2016) found that chess skill is correlated with various measures of g. Indeed, at the national level, smarter nations perform better across a wide range of mental games, even when adjusting for variation in internet prevalence and adding regional dummies (r = .79, Kirkegaard 2019). Overall, the evidence suggests the intercorrelation of IQ tests and their loading onto a higher-order g factor is not an artifact of test construction.
How does one determine the validity of a test? In the case of the g factor and IQ tests, the traditional method has been to use factor analysis. This method is prone to error (Cooper, 2018), but it can be useful as a foundation for the validity of a given construct. Lubinski & Humphreys (1997) argued that “a measure’s meaning (technically, its construct validity) is found in its network of causes and correlates, not in the unique aspects of its item content or label.” A similar definition was given by Nunnally (1978), who argued a measure can be considered construct valid if it either strongly correlates with other measures of said construct or if the predictive validity of the measure is similar to the predictive validity of other measures of the same construct. Campbell & Fiske (1959), in their landmark paper on construct validity, argued the validity of a construct is assessed through its correlations with construct-relevant variables. Even further back, the people who originally formulated the concept of construct validity, Cronbach & Meehl (1955), argued there is no single method of construct validation. They argued the correlation of two tests presumed to measure the same construct could be partial but sufficient evidence for construct validity. Thus, given the prior evidence that IQ tests are all correlated, there is certainly some construct validity to IQ tests, even if it is limited.
Richardson & Norgate note that there is no accepted theory of intelligence, and hence IQ tests are not built
like many other forms of measurement, such as a breathalyzer (p. 154). They argue that IQ tests “rely on
correlations of scores with those from other IQ or achievement tests as evidence of validity” and therefore
cannot be construct valid. It should be noted that breathalyzers are measurements of internal, biological
criteria, whereas virtually all psychological tests measure traits indirectly as of now. Furthermore, while the
intercorrelation of tests is not perfect evidence of construct validity, it is surely useful as a foundation. There are more fitting methods of construct validation one could use, though Richardson & Norgate predictably disagree with the utility of these.
As a primary example, reaction times have a long-standing relationship with IQ tests (Der & Deary, 2017). But Richardson & Norgate dismiss this primarily on the grounds that the correlations are small and may be confounded by psychological variables. Both points may be true, but the correlations between IQ tests and various measures of reaction time pattern in the way that would be predicted if IQ tests measured mental speed. For example, as Der & Deary (2017) show, the correlation between IQ tests and choice reaction times is stronger than that between IQ tests and simple reaction times. Furthermore, there is a Jensen effect on the relationship between IQ and reaction times, meaning the subtests which load most highly on g are more strongly correlated with reaction times (Jensen, 1998). Given this, we may infer the relationship is at least partially due to differences in mental ability (further discussion of the causes of this relationship is given by Jensen (1993) and Jensen (1998)).
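The logic of such a Jensen-effect analysis (the method of correlated vectors) can be sketched as follows; the g-loadings and reaction-time correlations below are hypothetical and serve only to show the computation.

```python
import numpy as np

# Hypothetical vectors for eight subtests: their g-loadings and their
# (absolute) correlations with a reaction-time measure.
g_loadings = np.array([0.75, 0.68, 0.80, 0.55, 0.62, 0.71, 0.48, 0.66])
rt_correlations = np.array([0.32, 0.27, 0.35, 0.18, 0.24, 0.30, 0.15, 0.26])

# Method of correlated vectors: a strong positive correlation between the two
# vectors is what Jensen (1998) calls a "Jensen effect".
r = np.corrcoef(g_loadings, rt_correlations)[0, 1]
print(f"Vector correlation (Jensen effect): {r:.2f}")
```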
They also argue there is no accepted internal theory of intelligence which allows us to properly interpret IQ tests. They cite Haier et al. (2009) in this respect, but it is worth noting that much of Haier (2016)’s book is dedicated to supporting the parieto-frontal integration theory (P-FIT) of intelligence. Understanding the neurological basis of intelligence has proven difficult, though some theories have survived better than others. For example, while substantial criticism has been leveled at the idea that IQ tests measure mental efficiency, enough revision has been made to the theory to show it generally has some validity (see Haier 2016). Furthermore, researchers have found Jensen effects in the relation of IQ tests to various biological variables (Jensen, 1998; Gignac et al., 2003), further providing evidence of construct validity. Finally, since Richardson & Norgate deny that IQ tests validly measure mental ability, they argue that the predictive validity of IQ tests must be addressed, which we turn to next.
3 Predictive Validity of IQ Tests
Richardson & Norgate’s next arguments are against the predictive validity of IQ testing. The authors argue that the correlation between educational achievement and IQ is an artifact of test construction rather than a function of intelligence influencing educational outcomes. Richardson & Norgate say,
Since the first test designers such as Binet, Terman, and others, test items have been devised, either
with an eye on the kinds of knowledge and reasoning taught to, and required from, children in
schools, or from an attempt to match an impression of the cognitive processes required in schools.
This matching is an intuitively-, rather than a theoretically-guided, process, even with nonverbal
items such as those in the Raven’s Matrices. (p. 154)
If these processes are required in schools and such processes are truly mental ability, then Richardson & Norgate’s argument is entirely circular. Their claim is essentially the same as that of anyone who defends IQ testing: IQ and educational achievement are correlated because mental ability is required for school.
Richardson & Norgate argue the relationship is partially due to the fact that the mental processes required for intelligence tests are taught in modern curricula. This is not actually true for many early childhood tests, such as Piagetian tests. Yet factor analysis of such tests shows that they measure the same thing as ordinary intelligence tests (Lasker, 2022). In discussing age-related changes in the relationship between intelligence test scores and scholastic tests, Richardson & Norgate argue that the increase seen with age fits with their model. That may be true in the abstract, but if one checks the citation, it goes to Sternberg et al. (2001) and then to McGrew & Knopik (1993) for the actual results. However, interpretation of this study is difficult because the authors adopted a correlated factors model, i.e., no general factor, so it is difficult to say whether the intelligence score increased its correlation with their achievement tests (two tests of writing) with age. The multiple R (the square root of the more common R²) increased with age, but is this because g becomes more correlated with writing ability, or because non-g group factors increase their importance at later ages? Analysis of their reported regression coefficients (in their Table 3) suggests that it is not g’s correlation with the writing tests that is increasing: it is mainly crystallized ability (gc) that increases its relation to the writing tests over time. The other ability factors, such as long-term memory (glr) and fluid ability (gf), show no increases. One simple interpretation, then, is that as children accumulate school knowledge with age, this leads to an increasing overlap between the gc factor and the writing tests. As such, there is no need to invoke the interpretation proposed by Richardson & Norgate. It would be informative to reanalyze this study using modern structural equation methods.
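To illustrate why a rising multiple R is ambiguous between these interpretations, the sketch below computes R from standardized regression coefficients (R² = β'R_x β, where R_x is the predictor intercorrelation matrix). The coefficients and intercorrelations are hypothetical stand-ins for the values in McGrew & Knopik's Table 3; the point is simply that raising the gc coefficient alone is enough to raise the multiple R.

```python
import numpy as np

# Hypothetical standardized betas for three broad abilities (gc, gf, glr)
# predicting a writing test at a younger and an older age, plus an assumed
# predictor intercorrelation matrix (illustrative values only).
R_x = np.array([
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.4],
    [0.4, 0.4, 1.0],
])
beta_young = np.array([0.30, 0.20, 0.15])  # gc, gf, glr
beta_old   = np.array([0.55, 0.20, 0.15])  # only the gc coefficient rises

for label, b in (("younger", beta_young), ("older", beta_old)):
    r_squared = b @ R_x @ b  # R^2 = beta' R_x beta
    print(f"{label}: multiple R = {np.sqrt(r_squared):.2f}")
```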
Going further, longitudinal studies by Watkins et al. (2007) and Watkins & Styck (2017) both found that a model where g causes educational achievement fit better than the reverse. Similarly, while education has been shown to raise IQ scores (Ritchie & Tucker-Drob, 2018), Ritchie et al. (2015) found that the effect of years of education on IQ was only on specific skills rather than on g (i.e., schooling improved some broad abilities, but not general intelligence). These studies imply that the relationship between IQ and educational achievement is driven by individual differences in general mental ability. Another important criticism of Richardson & Norgate’s theory is that g and education have discriminant validity. Lubinski & Humphreys (1997) and Lubinski (2009), for example, showed that g is a better predictor of health outcomes than education (also see Gottfredson (2004) for a critical review of this topic). Gensowski et al. (2011) found IQ predicted income beyond its relationship with education. If intelligence were just a proxy for social class or educational background, it would be a mystery why employers keep paying smarter people more money if they were not also more productive. Indeed, numerous studies find correlations between income and intelligence, including when controlling for parental social status (Marks, 2022). Also in the economics literature, Altonji & Pierret (2001) found that education predicted income well at the beginning of a person’s career, but as people’s careers progressed, intelligence became a better predictor. This finding is in line with a model where employers use education as a proxy for intelligence and other traits, but over time they learn an employee’s characteristics, so they no longer need the education proxy as much. Much more detail about signaling vs. human capital models of the value of education can be found in Bryan Caplan’s book-length treatment (Caplan, 2018).
The authors also support their belief by pointing to the fact that the relationship between IQ and educational achievement increases with age, but they give no reason why this should support the point that the correlation between IQ and educational achievement is built into the tests. On the contrary, there are two pro-g interpretations of this finding. The first is an increasing role of cognitive ability at higher levels: as students continue through education, it becomes more difficult, and cognitive ability plays a greater role in determining educational achievement. The second is the cumulative effect of g over time. Scholastic tests are based on the amount individuals learn in school over a period of years. This is essentially a measure of the average learning rate over increasing spans of time, thereby increasing the correlation between g and educational achievement. This interpretation would make sense given that highly crystallized tests, like those which measure verbal ability, are often more g-loaded than fluid intelligence tests (Carroll, 1993).
Richardson & Norgate also argue that parental drive correlates with IQ. But the effects of parental drive seem to disappear by adulthood, whether viewed through a genetic lens (Bouchard, 2013) or through an environmental lens (Dickens & Flynn, 2001). This is because the effect of shared environment on phenotypic IQ almost entirely disappears by adulthood. Any remaining correlation between parental drive and IQ in adulthood could easily be explained as a gene-environment correlation: higher-IQ parents, who share half their genes with their children, likely motivate their kids more in addition to putting them in better schools. Putting all of this together, Richardson & Norgate make no convincing argument that the correlation between educational achievement and IQ is an artifact of test construction.
Richardson & Norgate’s arguments deconstructing the relationship of IQ with occupational level and income are entirely contingent on their argument about IQ and educational achievement being accurate. Since, as we have shown, it is not, they have little ground to stand on, and their claims about predictive validity for occupational level and income are not substantiated either.
4 Supervisory Ratings
The authors move on to challenge what supervisory ratings mean and what value they may provide. While
they are correct that most studies on IQ and job performance are done using supervisory ratings, Richardson &
Norgate seemingly ignore the related finding that IQ scores correlate with job performance as measured by
objective work tests too. In fact, the latter correlation seems to be stronger, whereas under Richardson & Norgate’s model it would be expected to be weaker or null. This has been shown in analyses by McHenry et al. (1990) and
Ree et al. (1994). The former found validities of 0.63-0.65 for predicting on-the-job military performance with
general mental ability. The latter found a validity of 0.45 in predicting on-the-job military performance. In a
meta-analysis by Nathan & Alexander (1988), many forms of criteria were used, including ratings, rankings,
work samples, and production quantities. General mental ability maintained high validity in predicting all
of these. Hunter (1986) also used large military databases and found an operational validity of 0.63 for IQ in
predicting job performance, using both supervisory ratings and objective job performance measures. Other reviews have used military service deaths rather than any sort of rating system and found that even this is related to intelligence (Laurence & Ramsberger, 1991).
While supervisor ratings have relatively low correlations with results of work sample tests, both appear to be valid measures of job performance. The relatively small correlation between work sample tests and supervisory ratings is due to “notable criterion deficiency inherent in objective records and problems of unreliability [in objective job performance measures]” (Ones et al., 2008). Still, other studies find statistically significant, positive correlations between the two variables (cf. Bommer et al. 1995; Heneman 1986; Viswesvaran 2002). Assessments of job performance at the supervisory level are correlated with job performance assessments at the peer level (Harris & Schaubroeck, 2006; Viswesvaran, 2002), and supervisory ratings have greater reliability than peer ratings (Viswesvaran et al., 1996). Given these findings, Richardson & Norgate’s argument that halo effects may substantially bias the correlation between IQ and job performance is unsound.
Addressing other alleged biases: taller people do seem to be objectively better at their jobs and at advancing in them, likely partly due to a greater sense of self-esteem (see Rosenberg 2009), and the evidence shows that height correlates with IQ as well (Pearce et al., 2005). The effect of height on wages (which correlate with supervisory ratings) is non-linear, with the effect concentrated primarily among the tallest people (Kim & Han, 2017). The most flawed bias argument concerns the effect of race on supervisory ratings. Dejung & Kaplan (1962) found that black raters rated black employees higher than white employees, whereas white raters did not rate white employees better than black employees. A meta-analysis by McKay & McDaniel (2006) looked at studies of both objective and subjective ratings of job performance in blacks and whites; the black-white differences on objective measures were no smaller than those on subjective supervisory ratings. Interestingly, a study by Roth et al. (2003) found that objective measures of job performance actually show a larger racial difference in job performance than do subjective measures. This is the exact opposite of what Richardson & Norgate would predict if there were meaningful bias in supervisory ratings. If anything, it would mean that racial bias in supervisory ratings leads to an underestimation of the correlation between job performance and IQ. Bobko & Roth (2013) found differences in job performance between blacks and whites are mediated by job knowledge and are largest in the most complex jobs, largely contradicting the discrimination hypothesis. To cement things further, Dahlke & Sackett (2017) analyzed job performance predictors and found that the predictors most correlated with intelligence are the ones with the largest black-white gaps.
Finally, it is worth mentioning a meta-analysis by Viswesvaran et al. (2005). The authors analyzed research spanning over 90 years and found that, after controlling for three different forms of measurement error and for halo error, there remained a general factor of job performance ratings. Similar results were found in a military database by Vance et al. (1988). Overall, supervisory ratings remain a useful measure of job performance, and regardless, more objective measures of job performance have even greater correlations with IQ. The various other findings from related research support the important causal role of intelligence in explaining job performance.
5 Meta-analytic Procedures
Richardson & Norgate cast a large amount of doubt on the ability of meta-analysis to produce unbiased results. Their first argument is that meta-analysis lumps large numbers of low-quality studies in with high-quality studies, which can skew the results. Similar arguments were made at the conception of meta-analysis (see Greco et al. (2013)). However, meta-analyses usually weight studies by quality and sample size/precision in order to give the best studies the most say in the final result (Hunter & Schmidt 2004; also see Borenstein et al. 2009). Additionally, many of the studies on job performance and IQ are high-quality, large-scale primary studies rather than meta-analyses, typically done in the military with objective measures (e.g., McHenry et al. 1990; Ree et al. 1994). These studies are thus not subject to the criticism that Richardson & Norgate put forward, yet they still produce the same finding: g predicts job performance fairly well.
Richardson & Norgate also express concern with the tests used in studies of IQ and job performance, primarily that one meta-analysis (Salgado, Anderson, Moscoso, Bertua, De Fruyt, & Rolland, 2003) classified tests as either ‘g-tests’ or ‘batteries’, thereby suggesting they do not measure the same thing. In fact, the authors of that study do not seem to make much of this distinction, because none of their results are broken down by it and it is not mentioned after the methods section. So we are curious as to why Richardson & Norgate make so much of this somewhat odd phrasing. Each standalone test or battery of tests measures some mix of general intelligence, group factors, and more specific abilities, as well as motivation indirectly (Duckworth et al. 2011; Gignac et al. 2019). It should be noted that Duckworth et al. (2011)’s meta-analysis was partially based on studies by Stephen Breuning, who has been accused of fraud in his scientific work (Witkowski, 2014). No single test is known to measure g and nothing else (aside from random error), but it is known that batteries of diverse tests measure the same g (Johnson et al., 2004, 2008). Generally, longer and more diverse tests provide better measures of g in the sense that they better capture the full construct and have higher reliabilities. For instance, in the Johnson et al. (2008) study, the g factor from the Cattell Culture Fair Test (CCFT) was less strongly correlated with the g factors from the other batteries, even accounting for reliability. This is because the CCFT is a nonverbal battery of four figural subtests, and thus does not capture general intelligence variation related to, for example, verbal or 3D spatial abilities; in other words, it is missing some of the construct variance and lacks perfect construct validity. McHenry et al. (1990) found that adding additional cognitively demanding tests to the ASVAB battery only marginally increased the validity of the battery. So, while there is some variation in the construct measured by different tests and batteries of tests, these differences are relatively minor and thus of little importance to researchers interested in the relationship between job performance and general intelligence.
Richardson & Norgate’s more concrete replies come through their criticism of how much Hunter & Schmidt (and others) have corrected for range restriction and measurement error in their meta-analyses. While some (Kaufman & Kaufman, 2015) have agreed that the job performance corrections are likely too large, Richardson & Norgate certainly overestimate the degree to which this is true. The primary analysis cited by Richardson & Norgate in defense of their argument is the report by Council (1989), which was commissioned by the National Academy of Sciences to investigate the relationship between job performance and IQ. Council (1989) primarily argued that inter-rater reliability should be estimated at about 0.80 rather than the 0.60 used by Hunter & Hunter (1984), and that correcting for range restriction causes a large upward bias because the meta-analyses of IQ and job performance are typically limited to specific job sectors.
It is interesting that Richardson & Norgate are willing to accept supervisory ratings when Council (1989) used them for their analysis, but not when Hunter & Hunter did. Many studies have come out since Council (1989)’s analysis showing that its estimate of inter-rater reliability was too high. Most studies find inter-rater reliability of around 0.50 to 0.60, somewhat lower than what Hunter & Hunter (1984) used. Reviews of this nature include Shen et al. (2014); Hirsh et al. (1986); Rothstein (1990); Salgado & Anderson (2003); Salgado, Anderson, Moscoso, Bertua, & De Fruyt (2003); Salgado & Moscoso (1996); and Viswesvaran et al. (1996). Presumably, Richardson & Norgate have read Viswesvaran et al. (1996), as they cite it within their own article. The meta-analyses of inter-rater reliability find that Hunter & Hunter (1984)’s original estimate of 0.60 was, if anything, an overestimate rather than an underestimate. The now widely accepted estimate of inter-rater reliability is 0.52 (Shen et al., 2014). As noted by Anderson et al. (2014), if the inter-rater reliability they found were applied to the Council (1989) analysis, the mean operational validity would be 0.38, which is substantially closer to that estimated by Hunter & Hunter (1984). In reviewing the evidence, Viswesvaran et al. (1996) noted that
the probability of the 0.80 figure that Council (1989) used being accurate is only 0.0026. Viswesvaran et al. (2014), Shen et al. (2014), Brown (2014), and Sackett (2014) have also provided replies to common criticisms of correcting for measurement error. If Richardson & Norgate wish to seriously criticize the issue of inter-rater reliability, they will need a much stronger basis than the Council (1989) analysis; the vast majority of research in this area is in strong disagreement with them.
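The practical stakes of the inter-rater reliability estimate can be illustrated with the standard disattenuation formula for criterion unreliability, operational validity = observed r / sqrt(r_yy). The observed validity below is a hypothetical value chosen only to show how moving from 0.80 to 0.60 to 0.52 changes the corrected estimate.

```python
import math

observed_r = 0.25  # hypothetical mean observed validity from a meta-analysis

# Correcting for criterion (supervisor rating) unreliability alone:
# operational validity = observed r / sqrt(inter-rater reliability).
for r_yy in (0.80, 0.60, 0.52):
    corrected = observed_r / math.sqrt(r_yy)
    print(f"assumed reliability {r_yy:.2f} -> operational validity {corrected:.2f}")
```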
In order to correct for range restriction, Hunter & Hunter (1984) had to estimate the standard deviations of job applicants’ test scores by assuming that the careers for which they could find these data were generalizable to the entire United States population. Council (1989) found this troubling and argued this assumption would strongly bias the results upward. Sackett & Ostgaard (1994) replied to Council (1989)’s analysis, which excluded a correction for range restriction, by empirically estimating the standard deviations for applicants across a wide range of jobs. Based on this analysis, Hunter & Hunter (1984)’s correction for range restriction was justified. Sackett & Ostgaard further argued that Council (1989) was wrong to exclude the correction for measurement error, as this omission leads to a much larger downward bias than the upward bias created by Hunter & Hunter (1984). Furthermore, until 2004, corrections were not made for indirect range restriction in meta-analyses of job performance and intelligence (Hunter & Schmidt, 2004). The method of correcting for indirect range restriction has been shown to provide more accurate estimates of validity. Schmidt et al. (2008) and Sjöberg et al. (2012) found that the traditional method of correcting only for direct range restriction has resulted in underestimates of the validity of intelligence and personality measures in predicting job performance. Schmidt et al. (2006) found that the operational validities for job performance measures were underestimated by 21 percent due to the failure to correct for indirect range restriction.
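As a rough illustration of what such a correction does, the sketch below applies the standard formula for direct range restriction (Thorndike’s Case II), corrected r = rU / sqrt(1 + r²(U² − 1)), with U the ratio of unrestricted to restricted predictor standard deviations. The numbers are hypothetical; the correction for indirect range restriction (Case IV) used by Hunter & Schmidt (2004) is more involved and is not reproduced here.

```python
import math

def correct_direct_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Thorndike Case II correction for direct range restriction.

    r_restricted: validity observed in the restricted (incumbent) sample.
    sd_ratio: restricted SD divided by unrestricted (applicant-pool) SD.
    """
    U = 1.0 / sd_ratio
    return (r_restricted * U) / math.sqrt(1 + r_restricted**2 * (U**2 - 1))

# Hypothetical numbers: an observed validity of .30 among hires whose test-score
# SD is 70% of the applicant pool's SD.
print(round(correct_direct_range_restriction(0.30, 0.70), 2))  # about 0.41
```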
Richardson & Norgate accuse Hunter & Hunter (1984) of over-correcting for sampling error. In their view, the true variability due to sampling error is three quarters or less of the size that Hunter & Hunter (1984) reported. However, their reasoning for this is ill-founded. First, they say that the samples used in IQ-job performance studies would be limited to a specific sort of employer, those “willing to have employees tested and finding supervisors willing to rate them” (p. 158). On the contrary, Hunter & Hunter (1984) reported on studies showing that the validity of GMA for job performance holds across virtually all sorts of careers, using both supervisory ratings and work sample tests. While it is true that IQ testing is used more liberally in more complex careers, this is the result of the greater association between IQ and job performance in more complex occupations and of a simple analysis of the cost of testing compared to the incremental increase in productivity.
Richardson & Norgate correctly note that the data needed to correct older, poorer studies individually are often unavailable, and that correcting after averaging the results could introduce some bias. However, Hunter & Schmidt (1994) found that correcting the correlations individually may bias the estimation of sampling error further, hence the “average correlation” method is preferable. This is likely because correcting each study using its own estimates of reliability introduces another source of sampling error (in the reliability coefficient) into the estimate of the association between g and job performance, whereas using artifact distributions or averages avoids this source of variance, at the cost of missing some true variation in reliability.
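A minimal sketch of the bare-bones Hunter & Schmidt logic at issue here, with hypothetical study correlations and sample sizes: the sample-size-weighted mean and observed variance are computed first, and the variance expected from sampling error alone is then subtracted to estimate how much true variability remains.

```python
import numpy as np

# Hypothetical validity studies: observed correlations and sample sizes.
rs = np.array([0.18, 0.35, 0.22, 0.41, 0.10, 0.30])
ns = np.array([68, 120, 45, 210, 95, 150])

# Sample-size-weighted mean correlation and observed variance across studies.
r_bar = np.sum(ns * rs) / np.sum(ns)
var_obs = np.sum(ns * (rs - r_bar) ** 2) / np.sum(ns)

# Variance expected from sampling error alone, (1 - r_bar^2)^2 / (n - 1) per study,
# averaged across studies with sample-size weights.
var_error = np.sum(ns * (1 - r_bar**2) ** 2 / (ns - 1)) / np.sum(ns)

var_true = max(var_obs - var_error, 0.0)
print(f"mean r = {r_bar:.3f}")
print(f"observed variance = {var_obs:.4f}, expected sampling error variance = {var_error:.4f}")
print(f"residual (true) variance = {var_true:.4f}")
```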
Since the report by Council (1989), even larger analyses have come out further supporting Hunter & Hunter
(1984)’s validity estimates. Kuncel et al. (2010) summarized these in a review concerning the role of intelligence
in life outcomes. The most notable is a meta-analysis by Ones et al. (2005) which reviewed over 20,000
studies and a sample of over 5,000,000 people. They found the validity of cognitive ability in predicting
job performance is around 0.50-0.60. Job complexity also correlated with the validity of IQ in predicting
job performance, but even in low-complexity jobs, the validity coefficients ranged from 0.30-0.40. Overall,
Richardson and Norgate’s criticism of standard meta-analytic procedures used in industrial and organizational
psychology falls short.
6 Job Complexity
In order to criticize the position that job complexity correlates with the validity of GMA in predicting job
performance, Richardson & Norgate refer, once again, to the Council (1989) analysis. This is flawed for the
same reasons discussed in the previous section. Council (1989) did not correct for measurement error or range
restriction. Once again, Ones et al. (2005) replicated the Hunter & Hunter (1984) finding on job complexity
with a much larger amount of studies and properly correcting for measurement error and range restriction.
Any occupation which is more complex will demand more of the employee in a variety of ways, so there is
no reason why the same shouldn’t happen for intelligence. For example, job knowledge is correlated with job
performance to a greater degree in more complex jobs as well (Dye et al.,1993).
Richardson & Norgate also bring up psychological variables that may confound the correlation, such as self-esteem and the fact that people in jobs of lower complexity communicate less with their managers. However, the relationship between job complexity, job performance, and IQ has been shown on work sample tests as well (Salgado & Moscoso, 2019), i.e., objective tests not based on the opinions of supervisors or peers. Another reason to be wary of Richardson & Norgate’s criticism is that while people who are higher in intelligence do tend to have higher self-esteem, this did not translate into greater confidence in job ability in a study by Lynch & Clark (1985).
Richardson & Norgate are concerned that there is more communication between supervisors and employees in more complex positions, which may render the correlation artifactual. This is improbable for multiple reasons. As stated before, the role of job complexity is also seen on work sample tests (Salgado & Moscoso, 2019). A more telling reason why Richardson & Norgate are incorrect about this is that conscientiousness, the willingness to do tasks thoroughly, actually has greater validity in lower-complexity jobs, as shown by Wilmot & Ones (2019). If Richardson & Norgate were correct that interaction with supervisors confounds this relationship, then there should always be greater validity at higher levels of job complexity.
Finally, there are a few more reasons to expect job complexity to moderate the relationship between job performance and IQ. Schmidt & Hunter (2004), for example, showed that the standard deviations of IQ are smaller in more complex jobs. Ganzach et al. (2013) found occupational complexity mediated the relationship between IQ and income. The correlation between IQ and educational achievement increases at higher levels of education (Arneson et al., 2011). An older study used the subjectively assessed degree of intelligence required for an occupation and found it correlated to a remarkably strong degree (r = 0.91) with the occupation’s subjectively rated level of prestige (Jensen 1980: 340). Given the latter, it seems that IQ is important in predicting how well one can perform more complex occupations. When comparing simple reaction times to choice reaction times (the latter being more complex), choice reaction times have a greater correlation with IQ (Der & Deary, 2017). Intelligence becomes more predictive at higher ranges of complexity in a wide range of mental tasks, so there is no reason to assume the relationship would not be the same for job performance.
7 Supposed Non-Cognitive Causes
Richardson & Norgate attribute any leftover relationship between IQ and job performance to confounding by other psychological traits. However, they make some major errors in their analysis of this topic. They cite a study which showed the relationship between cognitive ability and job performance was entirely mediated by job knowledge (Palumbo et al., 2005). However, this argument assumes IQ is not a cause of how easily and quickly individuals can attain job knowledge. As Schmidt & Hunter (2004, p. 170) explained:
As can be seen, in both data sets, the major effect of GMA is on the acquisition of job knowledge,
and job knowledge in turn is the major determinant of job performance (measured using hands-on
job sample tests). GMA does have a direct effect on job performance independent of job knowledge
in both data sets but this effect is smaller than its indirect effect through job knowledge. . . These
results also show that supervisory ratings of job performance are determined in both data sets by
both job knowledge and job sample performance.
The path analyses referred to by Schmidt & Hunter (2004) are reproduced in Figure 1. It is more likely that intelligence predicts job knowledge. James & Carretta (2002) noted that before people can perform occupational tasks, they need to learn what to do and how to do it. This requires the ability to retain and apply knowledge in the real world. Even job knowledge has its limits. For example, Joseph (1997) tested the success of “right-to-know” training programs (additional information provided about toxic substances so as to reduce workplace injury). These programs had no significant effect on related workplace injuries for people with an IQ below 70. Periodic assessments over many years show that the validity of g in predicting job knowledge, supervisory ratings, and performance on objective work sample tests does not decline (Schmidt et al., 1986, 1988). If job knowledge were as important as or more important than intelligence in predicting occupational performance, the validity of g should decline over time as one becomes more familiar with one’s job, duties, and occupational network.

Figure 1: A path analysis of relations among general mental ability (GMA), job knowledge, job performance, and supervisor ratings. Reprinted from Hunter & Schmidt (1996).
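The arithmetic of such a path model is simple to illustrate. The standardized coefficients below are hypothetical placeholders in the spirit of Figure 1, not Schmidt & Hunter’s actual estimates; they merely show how the indirect effect of GMA through job knowledge can exceed its direct effect while the total effect remains substantial.

```python
# Hypothetical standardized path coefficients for the model
# GMA -> job knowledge -> job performance, plus a direct GMA path.
gma_to_knowledge = 0.60
knowledge_to_performance = 0.50
gma_to_performance_direct = 0.20

# In a simple path model, the indirect effect is the product of the paths
# along the route, and the total effect is direct + indirect.
indirect = gma_to_knowledge * knowledge_to_performance
total = gma_to_performance_direct + indirect

print(f"indirect effect of GMA via job knowledge: {indirect:.2f}")
print(f"direct effect of GMA: {gma_to_performance_direct:.2f}")
print(f"total effect of GMA on job performance: {total:.2f}")
```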
Much of their argument is based on the view that g is invalid or that IQ lacks construct validity. There is not much reason to argue this case further beyond what we detailed in Section 2 and what Kaufman & Kaufman (2015) wrote in their commentary on Richardson & Norgate’s paper. Richardson & Norgate also point to evidence showing that IQ test performance can be improved through “presumably knowledge based - experience with compatible cognitive tasks” (p. 162). It is unlikely such IQ gains are on g rather than on specific skills, though (see te Nijenhuis et al. 2001, 2007, 2014; Ritchie et al. 2015).
The authors also assert that “emotional intelligence” is a better predictor of job performance than IQ, citing a review by Goleman (2000) which found “emotional competence mattered twice as much” compared to IQ. One issue with this argument is that emotional intelligence has been argued to be a very vague concept (Locke, 2005). Emotional intelligence could also be characterized as downstream of personality, which has been shown to predict job performance as well (Judge et al., 2013). Schulte et al. (2004) found emotional intelligence is mostly just a measure of g and personality. As a consequence, the incremental validity gained from using emotional intelligence measures is unimpressive (O’Boyle Jr et al., 2011). As detailed in a review by Antonakis (2004), emotional intelligence likewise does not appear to hold any validity beyond its relationship to IQ and personality in predicting leadership effectiveness.
In defense of the position that occupational structure and networks are more important in predicting job performance, Richardson & Norgate point to a study (Groysberg, 2012) which showed that high performers on Wall Street who switched firms suffered a decline in performance. However, pushing this as a major argument seems detached from reality: the mere existence of this decline does not mean that intelligence does not remain an important factor. Studying the highest performers on Wall Street also tells us little about the general American population. The results could be an example of regression to the mean, though the primary source does not seem to investigate this.
Finally, the authors present some arguments concerning anxiety, motivation, and test scores. A study by Gignac et al. (2019) found that motivation had a modest correlation with IQ, but the effect was non-linear and entirely centered in the low-to-moderate range of intelligence. It is primarily less intelligent people who are uninterested in taking IQ tests, as the tests are not personally relevant to them (cf. Dang et al., 2015), so greater motivation would result in slightly better performance. Reeve & Lam (2007) found the effect of motivation on IQ was not on g. Furthermore, greater g predicted greater test motivation, as would be expected.

If motivation were truly a confounding variable, it would have to predict job performance as well as IQ. Many studies have found motivation is unrelated to educational achievement and job performance (Gagné & St Père, 2001; Bloom, 1976; McHenry et al., 1990; Schmidt & Hunter, 1998; Terborg, 1977). One study also finds proactive behavior explains less than one percent of the variance in objective sales performance (Pitt et al., 2002). Since its relation to IQ is minimal, it does not predict g, and it has no relation to job performance, motivation is not going to be a confounding psychological variable for the association between intelligence and job performance. Finally, since the IQ tests in question would be taken for employment purposes, the potential employees are likely to be more motivated than normal.
The effect of anxiety on IQ scores in general seems to be up for debate. In questioning the existing literature at
the time, Jensen (1980) wrote,
In brief, many studies have reported generally low but significant negative correlations between
various measures of the subject’s anxiety level, such as the Taylor Manifest Anxiety Scale and the
Sarason Test Anxiety Scale, and performance on various mental ability tests. Many nonsignificant
correlations are also reported, although they are in the minority, and are usually rationalized by the
investigators in various ways, such as atypical samples, restriction of range on one or both variables,
and the like (e.g., Spielberger, 1958). I suspect that this literature contains a considerably larger
proportion of “findings” that are actually just Type I errors (i.e., rejection of the null hypothesis
when it is in fact true) than of Type II errors (i.e., failure to reject the null hypothesis when it is
in fact false). Statistically significant correlations are more often regarded as a “finding” than are
nonsignificant results, and Type I errors are therefore more apt to be submitted for publication.
Aside from that, sheer correlations are necessarily ambiguous with respect to the direction of
causality. Persons who, because of low ability, have had the unpleasant experience of performing
poorly on tests in the past may for that reason find future test situations anxiety provoking—hence
a negative correlation between measures of test anxiety and ability test scores (p. 615).
There are other issues with this literature. For one, a bivariate correlation between IQ and anxiety does not establish causality. Second, there must be a distinction between trait anxiety and state anxiety: the former would be recognizable through a typical questionnaire, whereas the latter is aroused in specific situations. Jensen & Figueroa (1975) noted that digit span scores are associated with state anxiety rather than trait anxiety. Additionally, research shows test anxiety and motivation are negatively correlated with one another (Rajiah et al., 2014). If those taking IQ tests for employment purposes are more motivated, which they may very well be, it is likely they are less anxious as well. Therefore, anxiety is unlikely to be an important confounder. Overall, there is no reason to take these non-cognitive causes as major detriments to the prior research on intelligence and job performance.
8 Summary
Ken Richardson & Sarah Norgate’s case that IQ is an invalid predictor of job performance is uncompelling on account of its neglect of the general literature surrounding job performance and its misinterpretation of intelligence assessments and their validity. Their first arguments relied on treating face validity as a necessity for construct validity, which is an incredibly stringent requirement. As we demonstrate, the intercorrelation of IQ tests and their relation to a higher-order g factor is sufficient to make a judgement as to the construct they measure. As a large body of research now shows, IQ is also predictive of life outcomes (Strenze, 2015; Herrnstein & Murray, 1996), and these correlations are not built into the tests. For IQ tests to seriously be considered measures of social class rather than intelligence, as Richardson might argue, one has to deal with the facts that:
1) the correlation between parental socioeconomic status and child IQ is relatively low (Hanscombe et al. 2012; r = 0.08-0.37, differing by age of child and by when socioeconomic status was estimated),

2) multiple studies have found individual IQ is nearly as predictive of life outcomes within families as it is within cohorts (Murray, 1998, 2002; Frisell et al., 2012; Hegelund et al., 2019), and

3) IQ test scores correlate with neurological variables, both structural (e.g., whole brain volume) and activation patterns, implying processes are occurring within the brain that determine someone’s score on the test (Haier, 2016).
Richardson & Norgate spent a short amount of time addressing certain biases which could occur in the workplace and influence supervisory ratings. However, this appears to be a distraction. The authors take little to no time to investigate the correlation between IQ and work sample tests or other objective criteria for measuring job performance. In tackling biases, Richardson & Norgate fail to address the contemporary literature which has already dealt with these objections. The end result is an insubstantial argument with little empirical support to hold itself up.
The longest section of Richardson & Norgate’s argument concerns meta-analytic procedures. Whereas some have applauded them for their discussion of this matter, it is largely flawed, as even their own sources seem to contradict the arguments they put forward. Most notably, they cite the argument from Council (1989) that Hunter & Hunter (1984) largely underestimated the inter-rater reliability of supervisor ratings; however, Viswesvaran et al. (1996), whom Richardson & Norgate cite in their paper, conducted a meta-analysis disproving this. Their usage of the Council (1989) analysis is also strange for other reasons. First, it was viewed by the authors as a positive replication of the work done by Hunter & Hunter (1984). Second, Richardson & Norgate cite the General Aptitude Test Battery as a test used as a ‘battery’ rather than a ‘g-test’, questioning its psychometric validity; however, Council (1989) dedicated an entire chapter to establishing exactly that. These patterns indicate further omission of important details by the original authors.
Richardson & Norgate end their paper by arguing there should be far greater skepticism regarding the relationship between job performance and IQ. However, as we have shown, Richardson & Norgate’s position on the matter is on the fringes. Aside from smaller studies, Hunter & Hunter (1984)’s results have enjoyed positive replication for over thirty years, and even before the paper by Hunter & Hunter (1984), the results were the same. The most notable earlier critic was McClelland (1973), whose criticisms are very similar to those of Richardson & Norgate. He was later answered by Barrett & Depinet (1991) as well as Barrett et al. (2003), who showed that none of the major claims about the predictive validity of IQ tests, biases in the workplace and in testing, or the underlying cause of the IQ and job performance relationship held up. This all goes to show that Richardson & Norgate’s arguments are not new, ignore relevant counter-evidence, and are not well-supported by data.
Declaration of Interest
The authors declare that they have no known competing financial interests or personal relationships that could
have appeared to influence the work reported in this paper.
Funding
The study did not receive external funding.
References
Altonji, J. G., & Pierret, C. R. (2001). Employer learning and statistical discrimination. The Quarterly Journal of Economics, 116(1), 313–350.
Anderson, N., Ones, D. S., Kepir Sinangil, H., & Viswesvaran, C. (2014). Handbook of industrial, work and
organizational psychology (vol. 1). SAGE.
Antonakis, J. (2004). On why “emotional intelligence” will not predict leadership effectiveness beyond iq or the
“big five”: An extension and rejoinder. Organizational Analysis, 171–182.
Arneson, J. J., Sackett, P. R., & Beatty, A. S. (2011). Ability-performance relationships in education and
employment settings: Critical tests of the more-is-better and the good-enough hypotheses. Psychological
Science,22(10), 1336–1342. doi: 10.1177/0956797611417004
Barrett, G. V., & Depinet, R. L. (1991). A reconsideration of testing for competence rather than for intelligence.
American Psychologist,13.
Barrett, G. V., Kramen, A. J., & Lueke, S. B. (2003). Chapter 19 - new concepts of intelligence: Their practical
and legal implications for employee selection. In H. Nyborg (Ed.), The scientific study of general intelligence
(p. 411-439). Oxford: Pergamon. doi: https://doi.org/10.1016/B978-008043793-4/50058-1
Bloom, B. S. (1976). Human characteristics and school learning. New York: McGraw-Hill.
Bobko, P., & Roth, P. L. (2013). Reviewing, categorizing, and analyzing the literature on black–white mean
differences for predictors of job performance: Verifying some perceptions and updating/correcting others.
Personnel Psychology,66(1), 91–126. doi: 10.1111/peps.12007
Bommer, W. H., Johnson, J., Rich, G. A., Podsakoff, P. M., & MacKenzie, S. B. (1995). On the interchangeability
of objective and subjective measures of employee performance: A meta-analysis. Personnel Psychology,48,
587–605.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. John
Wiley & Sons, Ltd. doi: 10.1002/9780470743386
Bouchard, T. J. (2013). The wilson effect: The increase in heritability of iq with age. Twin Research and Human
Genetics,16(5), 923–930. doi: 10.1017/thg.2013.54
Brown, R. D. (2014). In defense of the accuracy of the criterion reliability adjustment of bivariate correlations.
Industrial and Organizational Psychology,7(4), 524–526. doi: 10.1111/iops.12188
Burgoyne, A. P., Sala, G., Gobet, F., Macnamara, B. N., Campitelli, G., & Hambrick, D. Z. (2016). The
relationship between cognitive ability and chess skill: A comprehensive meta-analysis. Intelligence,59, 72-83.
doi: https://doi.org/10.1016/j.intell.2016.08.002
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod
matrix. Psychological Bulletin,56(2), 81–105. doi: 10.1037/h0046016
Caplan, B. (2018). The case against education: Why the education system is a waste of time and money. Princeton
University Press.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge University Press.
doi: 10.1017/CBO9780511571312
Cooper, C. (2018). Psychological testing.
Council, N. R. (1989). Fairness in employment testing: Validity generalization, minority issues, and the general aptitude test battery (J. A. Hartigan & A. K. Wigdor, Eds.). Washington, DC: The National Academies Press. Retrieved from https://nap.nationalacademies.org/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the doi: 10.17226/1338
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin,52(4),
281–302. doi: 10.1037/h0040957
Dahlke, J. A., & Sackett, P. R. (2017). The relationship between cognitive-ability saturation and subgroup mean
differences across predictors of job performance. The Journal of Applied Psychology,102(10), 1403–1420. doi:
10.1037/apl0000234
Dalliard. (2013, April 3). Is psychometric g a myth? Human Varieties. Retrieved from https://humanvarieties.org/2013/04/03/is-psychometric-g-a-myth/
Dang, J., Xiao, S., & Dewitte, S. (2015). Commentary: "poverty impedes cognitive function" and "the poor’s
poor mental power". Frontiers in psychology,6, 1037. doi: 10.3389/fpsyg.2015.01037
Dejung, J. E., & Kaplan, H. (1962). Some differential effects of race of rater and ratee on early peer ratings of
combat aptitude. Journal of Applied Psychology,46(5), 370–374. doi: 10.1037/h0048376
Der, G., & Deary, I. J. (2017). The relationship between intelligence and reaction time varies with age:
Results from three representative narrow-age age cohorts at 30, 50 and 69years. Intelligence,64, 89-97. doi:
https://doi.org/10.1016/j.intell.2017.08.001
de Wit, G., & van Winden, F. A. A. M. (1989). An empirical analysis of self-employment in the netherlands.
Small Business Economics,1(4), 263–272. doi: 10.1007/BF00393805
Dickens, W. T., & Flynn, J. R. (2001). Heritability estimates versus large environmental effects: The iq paradox
resolved. Psychological Review,108(2), 346–369. doi: 10.1037/0033-295x.108.2.346
Duckworth, A. L., Quinn, P. D., Lynam, D. R., Loeber, R., & Stouthamer-Loeber, M. (2011). Role of test
motivation in intelligence testing. Proceedings of the National Academy of Sciences,108(19), 7716-7720. doi:
10.1073/pnas.1018601108
Dye, D. A., Reck, M., & McDaniel, M. A. (1993). The validity of job knowledge measures. International Journal
of Selection and Assessment,1(3), 153–157. doi: 10.1111/j.1468-2389.1993.tb00103.x
Elliott, C. D. (1986). The factorial structure and specificity of the British Ability Scales. British Journal of Psychology,77(2), 175–185. doi: 10.1111/j.2044-8295.1986.tb01992.x
Eysenck, H. J. (1939). Primary mental abilities. British Journal of Educational Psychology,9(3), 270–275. doi:
10.1111/j.2044-8279.1939.tb03214.x
Frisell, T., Pawitan, Y., & Långström, N. (2012). Is the association between general cognitive ability and violent
crime caused by family-level confounders? PLOS ONE,7(7), e41783. doi:
10.1371/journal.pone.0041783
Gagné, F., & St Père, F. (2001). When IQ is controlled, does motivation still predict achievement? Intelligence,30(1), 71-100. doi: 10.1016/S0160-2896(01)00068-X
Ganzach, Y., Gotlibobski, C., Greenberg, D., & Pazy, A. (2013). General mental ability and pay: Nonlinear
effects. Intelligence,41(5), 631-637. doi: 10.1016/j.intell.2013.07.015
Gensowski, M., Heckman, J., & Savelyev, P. (2011). The effects of education, personality, and IQ on earnings of high-ability men. IZA Institute of Labor Economics.
Gignac, G. E., Bartulovich, A., & Salleo, E. (2019). Maximum effort may not be required for valid intelligence
test score interpretations. Intelligence,75, 73-84. doi: 10.1016/j.intell.2019.04.007
Gignac, G. E., Vernon, P. A., & Wickett, J. C. (2003). Chapter 6 - factors influencing the relationship between
brain size and intelligence. In H. Nyborg (Ed.), The scientific study of general intelligence (p. 93-106). Oxford:
Pergamon. doi: 10.1016/B978-008043793-4/50042-8
Gottfredson, L. S. (2004). Intelligence: Is it the epidemiologists' elusive "fundamental cause" of social class inequalities in health? Journal of Personality and Social Psychology,86(1), 174–199. doi: 10.1037/0022-3514.86.1.174
Greco, T., Zangrillo, A., Biondi-Zoccai, G., & Landoni, G. (2013). Meta-analysis: Pitfalls and hints. Heart, Lung
and Vessels,5(4), 219–225. Retrieved from https://pubmed.ncbi.nlm.nih.gov/24364016/
Groysberg, B. (2012). Chasing stars: The myth of talent and the portability of performance (Reprint ed.). Princeton
University Press.
Haier, R. J. (2016). The neuroscience of intelligence. Cambridge University Press.
Haier, R. J., Colom, R., Schroeder, D. H., Condon, C. A., Tang, C., Eaves, E., & Head, K. (2009). Gray matter and
intelligence factors: Is there a neuro-g? Intelligence,37(2), 136-144. doi: 10.1016/j.intell.2008.10.011
Hanscombe, K. B., Trzaskowski, M., Haworth, M. A., Davis, S. P., Dale, P. S., & Plomin, R. (2012). Socioeconomic
status (SES) and children’s intelligence (IQ): In a UK-representative sample SES moderates the environmental,
not genetic, effect on IQ. PLoS ONE,7(2), e30320. doi: 10.1371/journal.pone.0030320
Harris, M., & Schaubroeck, J. (2006). A meta-analysis of self-supervisor, self-peer, and peer-supervisor ratings.
Personnel Psychology,41, 43–62. doi: 10.1111/j.1744-6570.1988.tb00631.x
Hegelund, E. R., Flensborg-Madsen, T., Dammeyer, J., Mortensen, L. H., & Mortensen, E. L. (2019). The influence of familial factors on the association between IQ and educational and occupational achievement: A sibling approach. Personality and Individual Differences,149, 100-107. doi: 10.1016/j.paid.2019.05.045
Heneman, R. L. (1986). The relationship between supervisory ratings and results-oriented measures of performance: A meta-analysis. Personnel Psychology,39, 811–826. doi: 10.1111/j.1744-6570.1986.tb00596.x
Herrnstein, R. J., & Murray, C. (1996). The bell curve: Intelligence and class structure in American life. Free Press.
Hirsh, H. R., Northrop, L. C., & Schmidt, F. L. (1986). Validity generalization results for law enforcement
occupations. Personnel Psychology,39(2), 399–420. doi: 10.1111/j.1744-6570.1986.tb00589.x
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of
Vocational Behavior,29(3), 340-362. doi: 10.1016/0001-8791(86)90013-8
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin,96(1), 72–98. doi: 10.1037/0033-2909.96.1.72
Hunter, J. E., & Schmidt, F. L. (1994). Estimation of sampling error variance in the meta-analysis of correlations:
Use of average correlation in the homogeneous case. Journal of Applied Psychology,79(2), 171–177. doi:
10.1037/0021-9010.79.2.171
Hunter, J. E., & Schmidt, F. L. (1996). Intelligence and job performance: Economic and social implications.
Psychology, Public Policy, and Law,2(2-3), 447–472. doi: 10.1037/1076-8971.2.3-4.447
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings.
SAGE.
Hülsheger, U. R., Maier, G. W., & Stumpp, T. (2007). Validity of general mental ability for the prediction of job performance and training success in Germany: A meta-analysis. International Journal of Selection and Assessment,15(1), 3–18. doi: 10.1111/j.1468-2389.2007.00363.x
James, M., & Carretta, T. R. (2002). g2K. Human Performance,15(1-2), 3-23. doi: 10.1080/08959285.2002.9668081
Jensen, A. R. (1980). Bias in mental testing. Free Press.
Jensen, A. R. (1993). Why is reaction time correlated with psychometric g? Current Directions in Psychological
Science,2(2). doi: 10.1111/1467-8721.ep10770697
Jensen, A. R. (1998). The g factor: The science of mental ability. Praeger.
Jensen, A. R., & Figueroa, R. A. (1975). Forward and backward digit span interaction with race and IQ: Predictions from Jensen's theory. Journal of Educational Psychology,67(6), 882–893. doi: 10.1037/0022-0663.67.6.882
Johnson, W., Bouchard, T. J., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: consistent results
from three test batteries. Intelligence,32(1), 95-107. doi: 10.1016/S0160-2896(03)00062-X
Johnson, W., te Nijenhuis, J., & Bouchard, T. J. (2008). Still just 1 g: Consistent results from five test batteries.
Intelligence,36(1), 81-95. doi: 10.1016/j.intell.2007.06.001
Joseph, A. J. (1997). Right-to-know training of workers with IQ less than 70: A pilot study. American Journal of Industrial Medicine,32(4), 417–420. doi: 10.1002/(SICI)1097-0274(199710)32:4<417::AID-AJIM14>3.0.CO;2-6
Judge, T. A., Rodell, J. B., Klinger, R. L., Simon, L. S., & Crawford, E. R. (2013). Hierarchical representations of
the five-factor model of personality in predicting job performance: Integrating three organizing frameworks
with two theoretical perspectives. Journal of Applied Psychology,98(6), 875–925. doi: 10.1037/a0033901
Kaufman, J. C., & Kaufman, A. S. (2015). It can be very tempting to throw out the baby with the bathwater: A father-and-son commentary on "Does IQ really predict job performance?". Applied Developmental Science,19(3), 176-181. doi: 10.1080/10888691.2015.1008922
Keith, T. Z., Kranzler, J. H., & Flanagan, D. P. (2001). What does the Cognitive Assessment System (CAS) measure? Joint confirmatory factor analysis of the CAS and the Woodcock-Johnson Tests of Cognitive Ability. School Psychology Review,30(1), 89–119.
Kim, T. H., & Han, E. (2017). Height premium for job performance. Economics & Human Biology,26, 13–20.
Kirkegaard, E. O. (2019). Is national mental sport ability a sign of intelligence? An analysis of the top players of 12 mental sports. Mankind Quarterly,59(3). doi: 10.46469/mq.2019.59.3.2
Kuncel, N. R., Ones, D. S., & Sackett, P. R. (2010). Individual differences as predictors of work, educational, and broad life outcomes. Personality and Individual Differences,49(4), 331-336. (Collected works from the Festschrift for Tom Bouchard, June 2009: A tribute to a vibrant scientific career) doi: 10.1016/j.paid.2010.03.042
Lasker, J. (2022). Are Piagetian scales just intelligence tests? Intelligence,95, 101702. doi: 10.1016/j.intell.2022.101702
Laurence, J. H., & Ramsberger, P. F. (1991). Low-aptitude men in the military: Who profits, who pays? Praeger
Publishers.
Locke, E. A. (2005). Why emotional intelligence is an invalid concept. Journal of Organizational Behavior,26(4), 425–431. doi: 10.1002/job.318
Lubinski, D. (2009). Cognitive epidemiology: With emphasis on untangling cognitive ability and socioeco-
nomic status. Intelligence,37(6), 625-633. (Intelligence, health and death: The emerging field of cognitive
epidemiology) doi: 10.1016/j.intell.2009.09.001
Lubinski, D., & Humphreys, L. G. (1997). Incorporating general intelligence into epidemiology and the social sciences. Intelligence,24(1), 159-201. (Special Issue Intelligence and Social Policy) doi: 10.1016/S0160-2896(97)90016-7
Lynch, A. D., & Clark, P. (1985). Relationship of self-esteem, IQ, and task performance for a sample of USA undergraduates. Psychological Reports,56(3), 955–962. doi: 10.2466/pr0.1985.56.3.955
Marks, G. N. (2022). Cognitive ability has powerful, widespread and robust effects on social stratification: Evidence from the 1979 and 1997 US National Longitudinal Surveys of Youth. Intelligence,94, 101686. doi: 10.1016/j.intell.2022.101686
McClelland, D. C. (1973). Testing for competence rather than for "intelligence." American Psychologist,28(1), 1–14. doi: 10.1037/h0034092
McGrew, K. S., & Knopik, S. N. (1993). The relationship between the WJ-R Gf-Gc cognitive clusters and writing achievement across the life-span. School Psychology Review,22(4), 687-695. doi: 10.1080/02796015.1993.12085684
McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results: The relationship between predictor and criterion domains. Personnel Psychology,43(2), 335–354. doi: 10.1111/j.1744-6570.1990.tb01562.x
McKay, P. F., & McDaniel, M. A. (2006). A reexamination of Black-White mean differences in work performance: More data, more moderators. Journal of Applied Psychology,91(3), 538–554. doi: 10.1037/0021-9010.91.3.538
Murray, C. (1998). Income inequality and IQ. AEI Press.
Murray, C. (2002). IQ and income inequality in a sample of sibling pairs from advantaged family backgrounds. American Economic Review,92(2), 339-343. doi: 10.1257/000282802320191570
Naglieri, J. A. (2001). Cognitive Assessment System (CAS). Understanding Psychological Assessment,96(1), 235–257. doi: 10.1007/978-1-4615-1185-4_12
Nathan, B. R., & Alexander, R. A. (1988). A comparison of criteria for test validation: A meta-analytic investigation. Personnel Psychology,41(3), 517–535. doi: 10.1111/j.1744-6570.1988.tb00642.x
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). McGraw-Hill Book Company.
O’Boyle Jr, E. H., Humphrey, R. H., Pollack, J. M., Hawver, T. H., & Story, P. A. (2011). The relation between
emotional intelligence and job performance: A meta-analysis. Journal of Organizational Behavior,32(5),
788–818. doi: 10.1002/job.714
Ones, D. S., Viswesvaran, C., & Dilchert, S. (2005). Cognitive ability in selection decisions. Handbook of
understanding and measuring intelligence, 431–468. doi: 10.4135/9781452233529
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (2008). No new terrain: Reliability and construct validity of job performance ratings. Industrial and Organizational Psychology,1(2), 174–179. doi: 10.1111/j.1754-9434.2008.00033.x
Palumbo, M. V., Miller, C. E., Shalin, V. L., & Steele-Johnson, D. (2005). The impact of job knowledge in the
cognitive ability-performance relationship. Applied HRM Research,10(1), 13–20.
Pearce, M. S., Deary, I. J., Young, A. H., & Parker, L. (2005). Growth in early life and childhood IQ at age 11 years: The Newcastle Thousand Families Study. International Journal of Epidemiology,34(3), 673-677. doi: 10.1093/ije/dyi038
Pitt, L. F., Ewing, M. T., & Berthon, P. R. (2002). Proactive behavior and industrial salesforce performance.
Industrial Marketing Management,31(8), 639-644. doi: 10.1016/S0019-8501(01)00171-7
Quiroga, M., Diaz, A., Román, F., Privado, J., & Colom, R. (2019). Intelligence and video games: Beyond
“brain-games”. Intelligence,75, 85-94. doi: 10.1016/j.intell.2019.05.001
Quiroga, M., Escorial, S., Román, F. J., Morillo, D., Jarabo, A., Privado, J., ... Colom, R. (2015). Can we reliably measure the general factor of intelligence (g) through commercial video games? Yes, we can! Intelligence,53, 1-7. doi: 10.1016/j.intell.2015.08.004
Rajiah, K., Coumaravelou, S., & Ying, O. W. (2014). Relationship of test anxiety, psychological distress and
academic motivation among first year undergraduate pharmacy students. International Journal of Applied
Psychology,4(2), 68–72. doi: 10.5923/j.ijap.20140402.04
Ree, M. J., Earles, J. A., & Teachout, M. S. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology,79(4), 518–524. doi: 10.1037/0021-9010.79.4.518
Reeve, C. L., & Lam, H. (2007). Consideration of g as a common antecedent for cognitive ability test performance, test motivation, and perceived fairness. Intelligence,35(4), 347-358. doi: 10.1016/j.intell.2006.08.006
Richardson, K., & Norgate, S. H. (2015). Does IQ really predict job performance? Applied Developmental Science,19(3), 153-169. doi: 10.1080/10888691.2014.983635
Ritchie, S. J., Bates, T. C., & Deary, I. J. (2015). Is education associated with improvements in general cognitive ability, or in specific skills? Developmental Psychology,51(5), 573–582. doi: 10.1037/a0038981
Ritchie, S. J., & Tucker-Drob, E. M. (2018). How much does education improve intelligence? A meta-analysis. Psychological Science,29(8), 1358–1369. doi: 10.1177/0956797618774253
Rosenberg, I. B. (2009). Height discrimination in employment. SSRN Electronic Journal. doi: 10.2139/ssrn.1344817
Roth, P. L., Huffcutt, A. I., & Bobko, P. (2003). Ethnic group differences in measures of job performance: A new
meta-analysis. Journal of Applied Psychology,88(4), 694. doi: 10.1037/0021-9010.88.4.694
Rothstein, H. R. (1990). Interrater reliability of job performance ratings: Growth to asymptote level with increasing opportunity to observe. Journal of Applied Psychology,75(3), 322–327. doi: 10.1037/0021-9010.75.3.322
Sackett, P. R. (2014). When and why correcting validity coefficients for interrater reliability makes sense.
Industrial and Organizational Psychology,7(4), 501–506. doi: 10.1111/iops.12185
Sackett, P. R., & Ostgaard, D. J. (1994). Job-specific applicant pools and national norms for cognitive ability
tests: Implications for range restriction corrections in validation research. Journal of Applied Psychology,79(5),
680–684. doi: 10.1037/0021-9010.79.5.680
Salgado, J. F., & Anderson, N. (2003). Validity generalization of GMA tests across countries in the European Community. European Journal of Work and Organizational Psychology,12(1), 1-17. doi: 10.1080/13594320244000292
Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., & De Fruyt, F. (2003). International validity generalization of GMA and cognitive abilities: A European Community meta-analysis. Personnel Psychology,56(3), 573–605. doi: 10.1111/j.1744-6570.2003.tb00751.x
Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., De Fruyt, F., & Rolland, J. P. (2003). A meta-analytic study of general mental ability validity for different occupations in the European Community. Journal of Applied Psychology,88(6), 1068. doi: 10.1037/0021-9010.88.6.1068
Salgado, J. F., & Moscoso, S. (1996). Meta-analysis of interrater reliability of job performance ratings in validity studies of personnel selection. Perceptual and Motor Skills,83(3), 1195–1201. doi: 10.2466/pms.1996.83.3f.1195
Salgado, J. F., & Moscoso, S. (2019). Meta-analysis of the validity of general mental ability for five performance criteria: Hunter and Hunter (1984) revisited. Frontiers in Psychology,10. doi: 10.3389/fpsyg.2019.02227
Schmidt, F. L. (2002). The role of general cognitive ability and job performance: Why there cannot be a debate.
Human Performance,15(1-2), 187-210. doi: 10.1080/08959285.2002.9668091
Schmidt, F. L., & Hunter, J. (2004). General mental ability in the world of work: Occupational attainment and job performance. Journal of Personality and Social Psychology,86(1), 162–173. doi: 10.1037/0022-3514.86.1.162
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin,124(2), 262–274. doi: 10.1037/0033-2909.124.2.262
Schmidt, F. L., Hunter, J. E., & Outerbridge, A. N. (1986). Impact of job experience and ability on job knowledge,
work sample performance, and supervisory ratings of job performance. Journal of Applied Psychology,71(3),
432–439. doi: 10.1037/0021-9010.71.3.432
Schmidt, F. L., Hunter, J. E., Outerbridge, A. N., & Goff, S. (1988). Joint relation of experience and ability with job performance: Test of three hypotheses. Journal of Applied Psychology,73(1), 46–57. doi: 10.1037/0021-9010.73.1.46
Schmidt, F. L., Oh, I.-S., & Le, H. (2006). Increasing the accuracy of corrections for range restriction: Implications
for selection procedure validities and other research results. Personnel Psychology,59(2), 281–305. doi:
10.1111/j.1744-6570.2006.00065.x
Schmidt, F. L., Shaffer, J. A., & Oh, I.-S. (2008). Increased accuracy for range restriction corrections: Implications
for the role of personality and general mental ability in job and training performance. Personnel Psychology,
61(4), 827–868. doi: 10.1111/j.1744-6570.2008.00132.x
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analyses of validity studies published between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology,37(3), 407–422. doi: 10.1111/j.1744-6570.1984.tb00519.x
Schulte, M. J., Ree, M. J., & Carretta, T. R. (2004). Emotional intelligence: not much more than g and personality.
Personality and Individual Differences,37(5), 1059–1068. doi: 10.1016/j.paid.2003.11.014
Shen, W., Cucina, J. M., Walmsley, P. T., & Seltzer, B. K. (2014). When correcting for unreliability of job performance ratings, the best estimate is still .52. Industrial and Organizational Psychology,7(4), 519–524. doi: 10.1111/iops.12187
Sjöberg, S., Sjöberg, A., Näswall, K., & Sverke, M. (2012). Using individual differences to predict job performance: Correcting for direct and indirect restriction of range. Scandinavian Journal of Psychology,53(4), 368–373. doi: 10.1111/j.1467-9450.2012.00956.x
Sternberg, R. J. (2015). Competence versus performance models of people and tests: A commentary on Richardson and Norgate. Applied Developmental Science,19(3), 170-175. doi: 10.1080/10888691.2015.1008920
Sternberg, R. J., Grigorenko, E. L., & Bundy, D. A. (2001). The predictive value of IQ. Merrill-Palmer Quarterly,47(1), 1–41. Retrieved from https://www.jstor.org/stable/23093686
Strenze, T. (2015). Intelligence and success. In S. Goldstein, D. Princiotta, & J. A. Naglieri (Eds.), Handbook of
intelligence: Evolutionary theory, historical perspective, and current concepts (pp. 405–413). Springer New York.
doi: 10.1007/978-1-4939-1562-0_25
te Nijenhuis, J., Jongeneel-Grimen, B., & Kirkegaard, E. O. (2014). Are Headstart gains on the g factor? A meta-analysis. Intelligence,46, 209-215. doi: 10.1016/j.intell.2014.07.001
te Nijenhuis, J., van Vianen, A. E., & van der Flier, H. (2007). Score gains on g-loaded tests: No g. Intelligence,
35(3), 283-300. doi: 10.1016/j.intell.2006.07.006
te Nijenhuis, J., Voskuijl, O. F., & Schijve, N. B. (2001). Practice and coaching on IQ tests: Quite a lot of g. International Journal of Selection and Assessment,9(4), 302–308. doi: 10.1111/1468-2389.00182
Terborg, J. R. (1977). Validation and extension of an individual differences model of work performance.
Organizational Behavior and Human Performance,18(1), 188–216. doi: 10.1016/0030-5073(77)90028-9
Thurstone, L. L. (1938). Primary mental abilities (New impression ed.). University of Chicago Press.
Vance, R. J., MacCallum, R. C., Coovert, M. D., & Hedge, J. W. (1988). Construct validity of multiple job
performance measures using confirmatory factor analysis. Journal of Applied Psychology,73(1), 74–80. doi:
10.1037/0021-9010.73.1.74
Viswesvaran, C. (2002). Absenteeism and measures of job performance: A meta-analysis. International Journal
of Selection and Assessment,10(1-2), 12–17. doi: 10.1111/1468-2389.00190
Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (1996). Comparative analysis of the reliability of job performance
ratings. Journal of Applied Psychology,81(5), 557–574. doi: 10.1037/0021-9010.81.5.557
Viswesvaran, C., Ones, D. S., Schmidt, F. L., Le, H., & Oh, I.-S. (2014). Measurement error obfuscates
scientific knowledge: Path to cumulative knowledge requires corrections for unreliability and psychometric
meta-analyses. Industrial and Organizational Psychology,7(4), 507–518. doi: 10.1017/S1754942600006799
Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2005). Is there a general factor in ratings of job performance? A meta-analytic framework for disentangling substantive and error influences. Journal of Applied Psychology,90(1), 108–131. doi: 10.1037/0021-9010.90.1.108
Watkins, M. W., Lei, P.-W., & Canivez, G. L. (2007). Psychometric intelligence and achievement: A cross-lagged
panel analysis. Intelligence,35(1), 59-68. doi: 10.1016/j.intell.2006.04.005
Watkins, M. W., & Styck, K. M. (2017). A cross-lagged panel analysis of psychometric intelligence and
achievement in reading and math. Journal of Intelligence,5(3). doi: 10.3390/jintelligence5030031
Wilmot, M. P., & Ones, D. S. (2019). A century of research on conscientiousness at work. Proceedings of the
National Academy of Sciences,116(46), 23004–23010. doi: 10.1073/pnas.1908430116
Witkowski, T. (2014, May 17). From the archives of scientific fraud – Stephen Breuning. Psychology Gone Wrong. Retrieved from https://forbiddenpsychology.wordpress.com/2014/05/17/from-the-archives-of-scientific-fraud-stephen-breuning/