Psychology in the Schools, Vol. 00(00), 2014 © 2014 Wiley Periodicals, Inc.
View this article online at wileyonlinelibrary.com/journal/pits • DOI: 10.1002/pits
IQs ARE VERY STRONG BUT IMPERFECT INDICATORS OF PSYCHOMETRIC g:
RESULTS FROM JOINT CONFIRMATORY FACTOR ANALYSIS
RYAN L. FARMER AND RANDY G. FLOYD
The University of Memphis
MATTHEW R. REYNOLDS
The University of Kansas
JOHN H. KRANZLER
The University of Florida
The most global scores yielded by intelligence tests, IQs, are supported by substantial validity evidence and have historically been central to the identification of intellectual disabilities, learning disabilities, and giftedness. This study examined the extent to which IQs measure the ability they target, psychometric g. Data from three samples of children and adolescents (Ns = 200, 150, and 135) who completed varying pairs of individually administered, multidimensional intelligence tests were analyzed using a joint confirmatory factor analysis to generate correlations between IQs and general factors representing psychometric g. The resulting values, expressed as g loadings, for the six IQs ranged from .88 to .95 (M = .92). The accuracy of IQs in measuring psychometric g, the meaning of reliable specific ability variance in IQs not accounted for by psychometric g, and the use of IQs in schools and related settings are discussed. © 2014 Wiley Periodicals, Inc.
Interpreting IQs, the most global scores yielded by intelligence tests, remains a controversial
practice in schools, clinics, and employment settings due to perceptions of their limited utility,
accusations of systematic bias against those from minority racial/ethnic groups, and risks associated
with self-fulfilling prophecies (Kranzler & Floyd, 2013; Urbina, 2011). Despite this controversy,
there are at least three reasons for the continued reference to IQs in such settings. First, federal legislation requires the use of IQs to determine special education eligibility for intellectual disability
and developmental delay (Individuals with Disabilities Education Improvement Act [IDEIA], 2004),
and diagnostic guidelines continue to stipulate the use of IQs for diagnosis of intellectual disability
(American Association on Intellectual and Developmental Disabilities [AAIDD], 2010; American
Psychiatric Association, 2013). Second, IQs typically have excellent psychometric properties—
including traditional internal consistency reliability estimates that are .95 or higher. Third, IQs
are better predictors of broad academic achievement, job performance, social status, health, and
mortality than any other measurable psychological variable (Deary & Batty, 2011; Hunt, 2011;
Jensen, 1998b). Thus, IQs can provide important information about individuals and the types of
interventions that may best address their needs.
This research was completed as a partial requirement for the first author's receipt of a doctoral degree in school psychology at The University of Memphis. Portions of this research were presented at the 2011 and 2012 annual meetings of the National Association of School Psychologists. We thank NCS Pearson, Inc., Lawrence Weiss, William Schryver, and Ying Meng for providing data from the DAS-II and KABC-II validity studies (Sample 1), and the Woodcock–Muñoz Foundation, Richard Woodcock, Fredrick Schrank, and Kevin McGrew for providing data from the WJ III validity studies (Samples 2 and 3). Standardization data from the Differential Ability Scales–Second Edition (DAS-II). Copyright © 2007 by NCS Pearson, Inc. Used with permission. All rights reserved. Standardization data from the Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV). Copyright © 2003 by NCS Pearson, Inc. Used with permission. All rights reserved.
Correspondence to: Ryan L. Farmer, Department of Psychology, The University of Memphis, Memphis, TN
38018. E-mail: rlfarmer@memphis.edu
PSYCHOMETRIC g AND ITS MEASUREMENT
Psychometric g is the latent variable underlying all cognitive ability tasks. In the predominant contemporary theories of the structure of human cognitive abilities, ability factors are arranged hierarchically in terms of their generality (i.e., the number of other factors with which a particular factor is correlated). In Carroll's (1993) three-stratum theory and the closely aligned Cattell–Horn–Carroll theory (Schneider & McGrew, 2012), psychometric g is located at the top of the hierarchical structure at the level of stratum III. Stratum II consists of group factors associated with broad classes of tasks employing similar content or requiring similar cognitive processes, and stratum I comprises a rather large number of specialized abilities.
Although psychometric g is a latent variable—a psychological construct—that cannot be directly measured, a composite score can be calculated from a diverse set of cognitive tasks to represent it (Jensen, 1998b). This method produces the most commonly used IQs in the practice of psychology in the schools. It is, however, important to know the extent to which these IQs measure psychometric g. From a measurement perspective, evaluating the relations between IQs and latent variables representing psychometric g provides information about the construct validity of IQs (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999). From a practical perspective, in high-stakes decision making—especially regarding special education placement and diagnosis of intellectual disability—it is important to determine how accurately varying IQs measure psychometric g. It is logical that the better an IQ measures psychometric g, the better the IQ will (a) reflect the ability to learn new information, grasp concepts, draw distinctions, and infer meaning from experience and (b) predict key social and health-related outcomes (see Jensen, 1998b).
Evaluation of the extent to which IQs measure psychometric g can be accomplished through the calculation of g loadings. These g loadings are standardized coefficients that have a hypothetical range of .00 to 1.00, and they represent the effect, in standard deviation units, of psychometric g on intelligence test scores. Typically, g loadings of .70 or higher for any intelligence test score are considered strong (Floyd, McGrew, Barry, Rafael, & Rogers, 2009; McGrew & Flanagan, 1998). To date, most researchers who have evaluated g loadings have done so at the level of the subtests. Subtest g loadings vary widely, from .16 to .87 (Reynolds, Keith, Fine, Fisher, & Low, 2007; Sanders, McIntosh, Dunham, Rothlisberg, & Finch, 2007; Watkins, Wilson, Kotz, Carbone, & Babula, 2006). The g loadings of composites representing stratum II abilities have been uniformly higher than the g loadings of subtests; composite g loadings have ranged from .45 to .89 (Floyd et al., 2009; Maynard, Floyd, Acklie, & Houston, 2011). Despite IQs being used to index psychometric g, their g loadings have not been widely studied.
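To make the meaning of a g loading concrete, the brief simulation below (not taken from the present study; all loadings are hypothetical) generates four subtest scores from a single latent g and shows that an equal-weighted composite of those diverse tasks correlates more strongly with g than any single subtest does, which is why IQs are expected to carry the highest g loadings of all intelligence test scores.

```python
# Illustrative sketch: simulate subtest scores as linear functions of a latent g
# plus noise, then estimate the g loading of an equal-weighted composite as its
# correlation with the simulated latent variable. Loadings are hypothetical.
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000
g = rng.standard_normal(n)                          # latent psychometric g

subtest_loadings = np.array([0.8, 0.7, 0.6, 0.5])   # hypothetical subtest g loadings
noise_sd = np.sqrt(1 - subtest_loadings ** 2)       # keep each subtest at unit variance
subtests = g[:, None] * subtest_loadings + rng.standard_normal((n, 4)) * noise_sd

composite = subtests.sum(axis=1)                    # equal-weighted "IQ" composite
g_loading = np.corrcoef(composite, g)[0, 1]         # the composite's g loading
print(f"Composite g loading: {g_loading:.2f}")      # ~.87, above the best subtest (.80)
```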
DETERMINING THE RELATIONS BETWEEN IQS AND PSYCHOMETRIC g
There are several methods that could be used to determine how accurately IQs measure psychometric g (Jensen, 1998b; McDonald, 1999; Schneider, 2013; Spearman, 1927). Using one of these methods, Reynolds, Floyd, and Niileksela (2013) calculated hierarchical omega values for IQs using norming sample data from three popular intelligence tests: the Differential Ability Scales, Second Edition (DAS-II; Elliott, 2007), the Kaufman Assessment Battery for Children, Second Edition (KABC-II; Kaufman & Kaufman, 2004), and the Wechsler Intelligence Scale for Children–IV (WISC-IV; Wechsler, 2003). In this most comprehensive study of the relations between IQs and psychometric g to date, the resulting hierarchical omega values represent the g saturation of the IQs (i.e., the proportion of variance in the IQs that can be attributed to psychometric g); each g loading is therefore equivalent to the square root of the corresponding hierarchical omega value (McDonald, 1999). Across the three total norming
samples and samples at 1-year intervals, Reynolds et al. (2013) found that hierarchical omega values ranged from .78 to .87. Thus, the IQ g loadings ranged from .88 to .93.
The methods for measuring the relation between IQs and psychometric g applied by Reynolds et al. (2013) and others (see Jensen, 1998b) have two potential limitations and one clear-cut limitation. The first potential limitation relates to the authenticity of representing the g loading of the IQ through methods that rely primarily on statistics derived from subtests. For example, to obtain the hierarchical omega values, Reynolds et al. (2013) estimated the g loading of IQs based on analysis of the g loadings of the subtests contributing to the IQ and not based on analysis of the IQ variable, per se. Although conceptually and psychometrically similar, they are not exactly the same. A second potential limitation with this method is that it relies on subtest scores from a single intelligence test to yield g loadings for subtests. These g loadings may vary somewhat based on the composition of the battery of subtests subjected to the factor analysis (McGrew & Flanagan, 1998; Woodcock, 1990); therefore, an IQ g loading calculated from subtest g loadings may be unduly affected by this source of error variance. Although this method of relying on subtest g loadings to measure the relation between IQs and psychometric g produces defensible estimates, some skeptics may not accept its results due to these two potential limitations.
In any case, the clear-cut limitation of this method is that it cannot, at present, be used to obtain g loadings for IQs that result from differential weighting of subtest scores. Although IQs are typically calculated by summing equal-weighted subtest scores, one intelligence test, the Woodcock–Johnson III Tests of Cognitive Abilities (WJ III; Woodcock, McGrew, & Mather, 2001), employs differential weighting of subtest scores in calculating its IQ. In a manner consistent with the calculation of factor scores that represent the latent variables derived from factor analysis, the WJ III subtests are weighted to contribute to the WJ III General Intellectual Ability (GIA) scores at levels proportional to their g loadings (McGrew & Woodcock, 2001). Thus, hierarchical omega values and related statistics cannot accurately represent the g saturation of the WJ III GIA, because the effects of differential weighting of subtests on its variance cannot be considered in the analysis.
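The contrast between the two weighting schemes can be sketched as follows; the g loadings and weights here are hypothetical placeholders, since the actual WJ III weights vary across age groups and are documented in McGrew and Woodcock (2001).

```python
# Sketch of equal versus g-loading-proportional weighting of a seven-subtest
# composite. Subtest g loadings and scores below are hypothetical.
import numpy as np

g_loadings = np.array([0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45])  # hypothetical

equal_weights = np.full(g_loadings.size, 1 / g_loadings.size)  # equal-weighted IQ
g_weights = g_loadings / g_loadings.sum()       # weights proportional to g loadings

subtest_scores = np.array([105.0, 98.0, 110.0, 95.0, 102.0, 99.0, 107.0])  # one examinee
print("Equal-weighted composite:", round(float(equal_weights @ subtest_scores), 1))
print("g-weighted composite:   ", round(float(g_weights @ subtest_scores), 1))
```

Weighting by g loadings lets the most g-saturated subtests dominate the composite, which is why a score like the GIA behaves more like a factor score than a simple sum.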
An alternative to relying on subtest g loadings from a single intelligence test in the calculation of IQ g loadings was offered by Keith, Kranzler, and Flanagan (2001). Drawing data from a sample of children and adolescents who completed two intelligence tests, the WJ III (Woodcock et al., 2001) and the Cognitive Assessment System (CAS; Naglieri & Das, 1997), Keith et al. (2001) completed a joint confirmatory factor analysis to produce a g loading for the global composite score from the CAS, called the Full Scale score. After demonstrating that a second-order general factor (representing psychometric g) affecting WJ III subtest scores and a second-order general factor affecting CAS subtest scores were effectively perfectly correlated, they examined the relation between the WJ III general factor and the CAS Full Scale score. Keith et al.'s (2001) method provides researchers with the opportunity to produce IQ g loadings without the previously identified potential limitations associated with using subtest scores from a single intelligence test. We know of no other published study in which this method was used to generate g loadings for IQs from individually administered intelligence tests.
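The logic of this method can be illustrated with a simulation: two batteries are generated from one latent g, a stand-in for battery A's general factor is extracted, and that factor is correlated with battery B's equal-weighted IQ. The authors fit joint confirmatory factor models in Mplus; the first principal component used below is only a convenient proxy for a second-order general factor, and all loadings are hypothetical.

```python
# Conceptual sketch of the joint-analysis logic, not the authors' Mplus models.
import numpy as np

rng = np.random.default_rng(seed=2)
n = 5_000
g = rng.standard_normal(n)                        # the shared latent g

def simulate_battery(loadings):
    """Generate unit-variance subtest scores from g plus unique variance."""
    loadings = np.asarray(loadings)
    noise_sd = np.sqrt(1 - loadings ** 2)
    return g[:, None] * loadings + rng.standard_normal((n, loadings.size)) * noise_sd

battery_a = simulate_battery([0.80, 0.75, 0.70, 0.65, 0.60, 0.55])  # hypothetical
battery_b = simulate_battery([0.78, 0.72, 0.68, 0.62, 0.58, 0.52])  # hypothetical

# First principal component of battery A stands in for its general factor.
a_centered = battery_a - battery_a.mean(axis=0)
_, _, vt = np.linalg.svd(a_centered, full_matrices=False)
general_factor_a = a_centered @ vt[0]

iq_b = battery_b.sum(axis=1)                      # battery B's equal-weighted "IQ"
print(f"Estimated IQ g loading: {abs(np.corrcoef(general_factor_a, iq_b)[0, 1]):.2f}")
```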
PURPOSE AND CONTRIBUTIONS
In the current study, we sought to determine the g loadings of IQs derived from five mainstream, multidimensional intelligence tests using the same method as Keith et al. (2001). After Floyd, Reynolds, Farmer, and Kranzler (2013), using a joint confirmatory factor analysis, established that three pairs of second-order general factors representing psychometric g from six intelligence tests were effectively perfectly correlated, this study examined the relations between these general factors and the IQs from the same intelligence tests.
This study contributes to the existing body of research in four ways. First, it models a straightforward and conceptually clear way to produce the g loading of an IQ—by correlating it with a latent variable representing psychometric g derived from another intelligence test. Other methods of obtaining such g loadings are not as clear or straightforward. Second, it promotes examination of the g loadings of IQs from tests not targeted in previous studies. These tests include the WJ III (Woodcock et al., 2001) as well as the immediately previous editions of the WISC-IV (Wechsler, 2003) and the DAS-II (Elliott, 2007), that is, the WISC-III and the DAS. Third, the comparison of the results of this study to those from Reynolds et al. (2013) provides insights into the effects of the methods used to determine IQ g loadings. For example, if the IQ g loadings from this study were uniformly lower than those from Reynolds et al. (2013), this finding might indicate that Keith et al.'s (2001) method, which produced a relatively low g loading for the CAS Full Scale score, produces downwardly biased estimates or that hierarchical omega values are somehow upwardly biased.
Finally, based on findings by Reynolds et al. (2013) and Schneider (2013) indicating that stratum II and stratum I factors explain 5–10% of the variance in IQs beyond psychometric g and that subtest-specific influences can account for another 6% of the variance in IQs beyond psychometric g, we examined the specificity of IQs using the traditional analytic methods (see Floyd et al., 2009; McGrew & Flanagan, 1998). Specificity refers to reliable variance associated with a variable that is independent of psychometric g. Using the traditional analytic methods, specificity can be attributed only to undefined lower stratum abilities uncorrelated with psychometric g. Because IQs are intended to index psychometric g, specificity represents construct-irrelevant influences on IQs (AERA, APA, & NCME, 1999).
METHOD
Participants
Three independent samples, obtained from the norming and validation process supporting two intelligence tests, the DAS-II (Elliott, 2007) and the WJ III (Woodcock et al., 2001), with permission from NCS Pearson, Inc. and the Woodcock–Muñoz Foundation, were employed in this study. Demographic data for each sample are available in Table 1, and details are provided in the sections that follow.
Sample 1. As previously described by Elliott (2007), 200 children ages 6 years, 0 months to 16 years, 11 months completed the DAS-II (Elliott, 2007) followed by the WISC-IV (Wechsler, 2003).
Sample 2. As previously described by several others (Floyd, Bergeron, McCormack, Anderson, & Hargrove-Owens, 2005; Floyd, Clark, & Shadish, 2008; McGrew & Woodcock, 2001; Phelps, McGrew, Knopik, & Ford, 2005), 150 children ages 8 years, 0 months to 12 years, 4 months, randomly selected from three public elementary schools, completed the WJ III (Woodcock et al., 2001) and the WISC-III (Wechsler, 1991) in a counterbalanced order.
Sample 3. As previously described by several others (Floyd et al., 2005; McGrew & Woodcock, 2001; Sanders et al., 2007), 135 children ages 8 years, 0 months to 12 years, 3 months, randomly selected from public and private elementary schools, completed the DAS (Elliott, 1990) and the WJ III (Woodcock et al., 2001) in a counterbalanced order.
Measures
Differential Ability Scales. A total of six subtests are used to calculate the DAS (Elliott, 1990) General Conceptual Ability (GCA) composite. The GCA has a mean of 100 and a standard deviation of 15. The DAS GCA's mean internal consistency reliability coefficient, based on its norming sample data, was .95 across ages 6 years to 17 years, 11 months. The DAS GCA correlated strongly with the WISC-III Full Scale IQ (FSIQ; Wechsler, 1991), r = .92 (Elliott, 1990).
Table 1
Demographic Data for the Three Samples

Characteristic               Sample 1             Sample 2              Sample 3
Intelligence tests           DAS-II and WISC-IV   WISC-III and WJ III   WJ III and DAS
Total sample size            200                  150                   135
Mean age in years (SD)       11.27 (3.42)         9.79 (.89)            10.04 (.88)
Girls (percentage)           100 (50%)            66 (44%)              69 (51.1%)
Race/ethnicity (percentage)
  African American           50 (25%)             2 (1.3%)              6 (4.4%)
  Asian                      13 (6.5%)            –                     –
  Hispanic                   54 (27%)             –                     –
  Native American            –                    –                     –
  White                      70 (35%)             148 (98.7%)           127 (94.1%)
  Other                      13 (6.5%)            –                     –
  Did not report             –                    –                     2 (1.5%)

Note. DAS = Differential Ability Scales; DAS-II = Differential Ability Scales, Second Edition; WISC-III = Wechsler Intelligence Scale for Children, Third Edition; WISC-IV = Wechsler Intelligence Scale for Children, Fourth Edition; WJ III = Woodcock–Johnson III Tests of Cognitive Abilities.
Differential Ability Scales, Second Edition. A total of six subtests are used to calculate the DAS-II (Elliott, 2007) GCA composite. The GCA has a mean of 100 and a standard deviation of 15. The DAS-II GCA's mean internal consistency reliability coefficient, based on its norming sample data, was .96 across ages 7 years to 17 years, 11 months. The DAS-II GCA correlated strongly with the DAS GCA (r = .88) and the WISC-IV FSIQ (r = .84; Elliott, 2007).
Wechsler Intelligence Scale for Children, Fourth Edition. A total of 10 subtests are used to calculate the WISC-IV (Wechsler, 2003) FSIQ. The FSIQ has a mean of 100 and a standard deviation of 15. The WISC-IV FSIQ's mean internal consistency reliability coefficient, based on its norming sample data, was .97 across ages 6–16, whereas its test–retest reliability was .89 in a smaller sample (Wechsler, 2003). The WISC-IV FSIQ correlated strongly with the WISC-III FSIQ (r = .89).
Wechsler Intelligence Scale for Children, Third Edition. A total of 10 WISC-III (Wechsler, 1991) subtests are used to calculate the FSIQ. The WISC-III FSIQ's mean internal consistency reliability coefficient, based on its norming sample data, was .96 across ages 6–16. The test–retest reliability coefficient was .93 in a smaller sample (Wechsler, 1991). The WISC-III FSIQ correlated strongly with the DAS GCA (r = .84).
Woodcock–Johnson III Tests of Cognitive Abilities. A total of seven subtests are used to calculate the WJ III (Woodcock et al., 2001) GIA, Standard (GIA-Std). The WJ III employs differential weighting of the subtest scores contributing to the GIA-Std (weighting that varies across age groups) to represent the varying strength of relations between subtests and the general factor underlying the GIA-Std. The GIA-Std has a mean of 100 and a standard deviation of 15. The WJ III GIA-Std's mean internal consistency reliability coefficient was .97 across ages 8–12 in its norming sample; further, the test–retest reliability coefficient was .98 in a smaller sample (McGrew & Woodcock, 2001). The WJ III GIA-Std correlated strongly with the WISC-III FSIQ (r = .71) and the DAS GCA (Elliott, 1990; r = .67).
Analyses
All modeling was conducted using Mplus version 6.1 (Muthén & Muthén, 1998–2010). There were no missing data from Sample 1. From Sample 2, two participants did not complete the WISC-III, but there were no other missing data. For this dataset, Little's test (Little, 1988) revealed that the hypothesis of data missing completely at random (MCAR) could not be rejected, χ²(44, N = 150) = 39.632, p = .659. From Sample 3, two participants did not complete the DAS, and another two participants did not complete the WJ III. There were also occasional other missing data for four WJ III variables. Little's test revealed that the null MCAR hypothesis could not be rejected, χ²(113, N = 135) = 128.713, p = .148. Due to the nature and number of missing data in Samples 2 and 3, maximum likelihood estimation was employed to accommodate the presence of missing data. According to Baraldi and Enders (2010), "Rather than filling in the missing values, maximum likelihood uses all of the available data – complete and incomplete – to identify the parameter values that have the highest probability of producing the sample data" (p. 18).
Following Keith et al. (2001), we constructed models in which a second-order general factor modeled from the subtest scores of one intelligence test was correlated with the IQ from the other intelligence test. Floyd et al. (2013) previously demonstrated, through the use of joint confirmatory factor analysis, that the second-order general factors representing psychometric g from the pair of tests included in each of the three samples in this study are effectively perfectly correlated. Figure 1 contains models for the two tests associated with Sample 1. As is evident in the left half of Panel A and the right half of Panel B, the models included multiple first-order factors and a second-order general factor derived from one intelligence test. Panel A presents the factor structure model for the DAS-II, and Panel B presents the factor structure model for the WISC-IV. In addition, the models specify that the second-order general factor from one intelligence test is correlated with the IQ from the other test. This correlation is represented by the curved arrows in Figure 1; it represents the IQ g loading. In Figure 1, the rectangle on the right side of Panel A is the IQ from the WISC-IV, the FSIQ; it is correlated with the DAS-II second-order general factor. The rectangle on the left side of Panel B is the IQ from the DAS-II, the GCA; it is correlated with the WISC-IV second-order general factor. This series of modeling and analysis steps was followed for the three pairs of intelligence tests across samples. To produce the most accurate results and to better compare the results of this study to those of Reynolds et al. (2013), who employed nationally representative norm sample data, the IQ g loadings reported in this study were corrected for range restriction and expansion (controlling for sampling error; Cohen, Cohen, West, & Aiken, 2003, p. 56).¹
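As a sketch of this correction step, the function below applies the standard adjustment for direct range restriction (a formula of the kind given by Cohen et al., 2003), in which the observed correlation is scaled by the ratio of the population standard deviation to the sample standard deviation. Applied to the Sample 2 WJ III GIA values reported later in Tables 2 and 4, it reproduces the reported range-corrected g loading.

```python
# Standard correction for direct restriction (or expansion) of range,
# assuming this is the form of the Cohen et al. (2003, p. 56) formula.
import math

def correct_range_restriction(r, sd_sample, sd_population=15.0):
    """Adjust a correlation for restriction (or expansion) of range."""
    k = sd_population / sd_sample
    return (r * k) / math.sqrt(1 - r ** 2 + (r ** 2) * (k ** 2))

# Sample 2 WJ III GIA (Tables 2 and 4): uncorrected g loading .92, SD = 11.46.
print(f"{correct_range_restriction(0.92, 11.46):.2f}")  # ≈ .95, as reported in Table 4
```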
RESULTS
Descriptive Statistics
IQs. As shown in Table 2, the mean IQs for the three samples ranged from 99.89 to 106.76 (M = 104.88). Only the mean for the DAS-II GCA from Sample 1 was lower than the population mean; all others were at least one point higher and up to almost 9 points higher than the population mean. Standard deviations ranged between 11.46 and 15.78 (M = 13.31); all but one were smaller than the population standard deviation of 15, demonstrating evidence of range restriction.
Subtests. The general factors specified in this study were derived from subtest data from each
intelligence test. Descriptive statistics for each subtest and data set are presented in supplemental
materials, which are available upon request.
¹ To control for slight variation in the internal consistency reliability of the IQs when making comparisons across intelligence tests, g loadings were also disattenuated using the formula offered by Cohen et al. (2003, p. 58). They are not reported herein, but they can be obtained by contacting the second author.
FIGURE 1. A joint factor model of the relations between second-order general factors and IQs. Panel A includes a model in which the general factor is formed using the Differential Ability Scales, Second Edition (DAS-II) subtests and the IQ is the Full Scale IQ (FSIQ) of the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV). Panel B includes a model in which the general factor is formed using the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV) subtests and the IQ is the General Conceptual Ability (GCA) of the Differential Ability Scales, Second Edition (DAS-II). Rectangles represent subtest scores, and large ovals represent psychometric g and first-order factors. Small ovals above the large ovals represent first-order factor unique variances (measuring specific abilities not due to psychometric g or variance specific to subtests). Curved arrows represent correlations. Measurement residuals that contain specific variance and measurement error associated with subtest scores are not shown. g = psychometric g, Gc = comprehension–knowledge, Gs = processing speed, Gv = visual processing, Gf = fluid reasoning, Gsm = short-term memory, Speed of Info. Processing = speed of information processing, Seq. & Qual. Reasoning = sequential and qualitative reasoning, and Recall of Seq. Order = recall of sequential order.
Table 2
Means and Standard Deviations for IQs across Samples

                 Sample 1                   Sample 2                    Sample 3
       WISC-IV FSIQ   DAS-II GCA   WISC-III FSIQ   WJ III GIA   DAS GCA   WJ III GIA
Mean   103.28         99.89        106.76          103.10       109.08    107.15
SD     13.37          11.78        12.92           11.46        13.47     15.78

Note. Means for IQs are expected to be 100, and standard deviations are expected to be 15. DAS GCA = Differential Ability Scales, General Conceptual Ability; DAS-II GCA = Differential Ability Scales, Second Edition, General Conceptual Ability; WISC-III FSIQ = Wechsler Intelligence Scale for Children, Third Edition, Full Scale IQ; WISC-IV FSIQ = Wechsler Intelligence Scale for Children, Fourth Edition, Full Scale IQ; WJ III GIA = Woodcock–Johnson III Tests of Cognitive Abilities, General Intellectual Ability–Standard.
Table 3
Fit Statistics and Indexes for All Models across Samples

Sample    Test Yielding IQ   Test Yielding g Factor   χ²       p      df    CFI    TLI    RMSEA (90% CI)      SRMR
Sample 1  WISC-IV            DAS-II                   65.78    .001   40    .966   .953   .057 (.031, .081)   .040
Sample 1  DAS-II             WISC-IV                  181.49   .001   101   .941   .930   .063 (.048, .078)   .050
Sample 2  WISC-III           WJ III                   130.17   .120   114   .970   .964   .031 (.000, .053)   .058
Sample 2  WJ III             WISC-III                 72.49    .310   61    .983   .978   .035 (.000, .064)   .050
Sample 3  DAS                WJ III                   89.81    .350   73    .966   .958   .041 (.000, .068)   .058
Sample 3  WJ III             DAS                      37.10    .590   73    .962   .943   .064 (.013, .102)   .054

Note. DAS = Differential Ability Scales; DAS-II = Differential Ability Scales, Second Edition; WISC-III = Wechsler Intelligence Scale for Children, Third Edition; WISC-IV = Wechsler Intelligence Scale for Children, Fourth Edition; WJ III = Woodcock–Johnson III Tests of Cognitive Abilities. CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual.
Table 4
Uncorrected g Loadings, Range-Corrected g Loadings, g Saturation, and Specificity

IQ              Sample   Uncorrected g Loading   Range-Corrected g Loading   g Saturation (%)   Specificity (%)
DAS GCA         3        .90                     .91                         83                 12
DAS-II GCA      1        .87                     .92                         85                 11
WISC-III FSIQ   2        .84                     .88                         77                 19
WISC-IV FSIQ    1        .92                     .94                         88                 9
WJ III GIA      2        .92                     .95                         90                 7
WJ III GIA      3        .93                     .94                         88                 9
M/SD            –        .90/.04                 .92/.03                     85/4.73            11/4.19

Note. DAS = Differential Ability Scales; DAS-II = Differential Ability Scales, Second Edition; WISC-III = Wechsler Intelligence Scale for Children, Third Edition; WISC-IV = Wechsler Intelligence Scale for Children, Fourth Edition; WJ III = Woodcock–Johnson III Tests of Cognitive Abilities.
Joint Confirmatory Factor Analysis
Although this study is focused on the g loadings of the IQs, considering the quality of the models from which these values are derived remains important. Multiple stand-alone indicators of model fit are reported in Table 3. Comparative fit index values and Tucker–Lewis index values greater than .95 indicate adequate fit, whereas values approaching .97 indicate excellent fit. Root mean square error of approximation (RMSEA) values and standardized root mean square residual (SRMR) values of .05 or less indicate excellent fit, whereas the RMSEA may be as high as .08 and the SRMR may be as high as .10 for adequate fit. These values, taken as a whole, indicate that the models we employed to generate IQ g loadings demonstrate adequate to excellent fit (Schermelleh-Engel, Moosbrugger, & Müller, 2003).
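These guidelines can be summarized in a small helper function; the thresholds encoded below are the ones named in the preceding paragraph, offered as a reading aid rather than as part of the authors' analysis.

```python
def classify_fit(cfi, tli, rmsea, srmr):
    """Classify model fit using the thresholds described in the text."""
    if min(cfi, tli) >= 0.97 and rmsea <= 0.05 and srmr <= 0.05:
        return "excellent"
    if min(cfi, tli) >= 0.95 and rmsea <= 0.08 and srmr <= 0.10:
        return "adequate"
    return "below the adequate thresholds"

# Sample 2, WJ III IQ with the WISC-III general factor (Table 3):
print(classify_fit(cfi=0.983, tli=0.978, rmsea=0.035, srmr=0.050))  # "excellent"
```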
As evident in Table 4, uncorrected g loadings for the IQs ranged from .84 to .93 (M = .90, SD = .04), and range-corrected g loadings ranged from .88 to .95 (M = .92, SD = .03). To obtain estimates of the percentage of variance in IQs attributable to psychometric g (i.e., g saturation), the range-corrected IQ g loadings were squared; these values ranged from 77% to 90% (M = 85%, SD = 4.73%). Specificity estimates were also calculated by (a) subtracting the g saturation value for an IQ from its mean internal consistency coefficient across relevant age groups (as reported in the "Measures" section) and (b) expressing the difference as a percentage. Specificity values ranged from 7% to 19% (M = 11%, SD = 4.19%).
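The arithmetic behind these two columns of Table 4 is simple enough to verify directly: g saturation is the squared range-corrected g loading, and specificity is the IQ's mean internal consistency reliability (from the Measures section) minus its g saturation.

```python
# Reproducing the g saturation and specificity columns of Table 4 from the
# range-corrected g loadings and the reliabilities given in the Measures section.
table_4_inputs = [
    ("DAS GCA", 0.91, 0.95),
    ("DAS-II GCA", 0.92, 0.96),
    ("WISC-III FSIQ", 0.88, 0.96),
    ("WISC-IV FSIQ", 0.94, 0.97),
]

for iq, g_loading, reliability in table_4_inputs:
    saturation = g_loading ** 2              # proportion of variance due to g
    specificity = reliability - saturation   # reliable non-g variance
    print(f"{iq}: g saturation = {saturation:.0%}, specificity = {specificity:.0%}")
```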
DISCUSSION
Given that the most global scores produced by intelligence tests (IQs) are widely used in schools, clinics, and employment settings to estimate psychometric g and to make certain diagnoses and special education classification decisions, it is important to know the extent to which IQs actually measure psychometric g. Evaluating the relations between IQs and latent variables representing psychometric g provides information about the construct validity of these IQs (AERA, APA, & NCME, 1999) and contributes to a better understanding of the evidence supporting current practices in interpreting IQs. Based on our results, IQs are very strong indicators of psychometric g; the average range-corrected IQ g loading was .92 (SD = .03, range = .88 to .95). All IQs well exceeded the traditional lower-end standard for a strong g loading (i.e., .70; Floyd et al., 2009; McGrew & Flanagan, 1998) that has been applied to intelligence test subtests and stratum II composites. These findings support the use of IQs as indicators of psychometric g.
The results of this study and those from Reynolds et al. (2013) are remarkably consistent, despite the fact that Reynolds et al. (2013) employed a different methodology, different samples, and some different intelligence tests. According to results presented by Reynolds et al. (2013), g loadings from the DAS-II, KABC-II, and WISC-IV IQs ranged from .88 to .93 across samples representing 1-year divisions of the tests' norming samples; this range of values matches the range in this study (.88–.95) extremely well. Considering the specific tests shared across these two studies, Reynolds et al.'s (2013) analysis yielded a g loading of .91 for the DAS-II GCA for the total DAS-II norm sample and a range of g loadings from .89 to .93 across the 1-year divisions; the current study yielded a range-corrected g loading of .92 for the DAS-II GCA. Reynolds et al.'s (2013) analysis yielded a g loading of .91 for the WISC-IV FSIQ for the total WISC-IV norm sample and a range of g loadings from .89 to .92 across 1-year divisions of the norm sample; the current study yielded a range-corrected g loading of .94 for the WISC-IV FSIQ. Furthermore, these results are consistent with Schneider's (2013) analysis of the WISC-IV FSIQ, which produced an IQ g loading of .93, almost identical to our finding. It appears that there is little evidence to support the claim that using subtest g loadings from a single intelligence test to estimate the IQ g loading for the same test produces biased results (see Introduction).
Because hierarchical omega values (like those produced by Reynolds et al., 2013) and related statistics cannot accurately represent the g saturation of the WJ III GIA-Std, which employs differential weighting of subtests, this study targeted this IQ to obtain its g loading. Across the six IQ g loadings produced in this study, the WJ III GIA-Std produced the highest range-corrected g loading (.95) in one analysis and tied the WISC-IV FSIQ for second highest in another analysis (.94). Although these values are not substantially greater than those from the IQs from the DAS-II, DAS, and WISC-III found in this study, their slightly higher magnitude may be due to the WJ III's weighting procedures, in which subtests contribute to the GIA-Std at levels proportional to their g loadings in the same manner as factor scores do (Schneider, 2013).
Nevertheless, based on our findings of minimal differences in IQ g loadings across intelligence tests and similar findings from Reynolds et al. (2013), it is unclear exactly what characteristics lead to higher and lower IQ g loadings across the intelligence tests we targeted. Across the small body of research to which this study belongs, the outlier among IQ g loadings is that from the CAS Full Scale score (.79, produced by Keith et al., 2001). It is clear that this lower g loading for the CAS Full Scale score was not due to the method employed by Keith et al. (2001), because our findings, stemming from use of Keith et al.'s (2001) method, are so similar to those of Reynolds et al. (2013). As noted by Keith et al. (2001), the CAS subtests contributing to its Full Scale score were not designed to target psychometric g and have demonstrated lower g loadings (see Canivez, 2011) than the subtests employed in this study and the Reynolds et al. (2013) study. More systematic research is needed to understand the reasons for differences in IQ g loadings.
Specificity values represent the percentage of variance attributable to stratum II and stratum I abilities independent of psychometric g. In this study, these values represented 7–19% of the IQ score variance (M = 11%) and indicate that, in addition to measurement error, IQs are influenced by abilities that are not psychometric g. This range of specificity values is very similar to the range of variance in the DAS-II, KABC-II, and WISC-IV IQs attributed to stratum II abilities (5–12%) by Reynolds et al. (2013) and the percentage of variance in the WISC-IV FSIQ (16%) attributed to stratum II and stratum I abilities as well as to WISC-IV subtests by Schneider (2013). Although the percentage of variance attributed to specific (non-g) ability influences is relatively small (McGrew & Flanagan, 1998), and the variance in IQs attributable to psychometric g is almost 7 to more than 13 times greater than that attributable to more specific abilities, these specificity values are nontrivial. Specificity evident in IQs weakens their construct validity and likely contributes to lower score exchangeability across IQs (Floyd et al., 2008). The presence of these specific ability influences on IQs represents a sobering reality across measurement from a hierarchical ability framework: no cognitive ability can be indexed with perfect accuracy without construct-irrelevant influences from another stratum (Carroll, 1993; Gustafsson, 2002).
Limitations
There are some limitations associated with our samples and our methodology that should be addressed. Although participants from Samples 2 and 3 were randomly selected from larger populations of children, sampling was geographically limited and somewhat age-restricted and range-restricted, reducing the generalizability of our findings. Future researchers should employ larger and representative samples of children and adolescents, as did Reynolds et al. (2013). In terms of methodology, although each pair of general factors modeled from each sample's data was effectively perfectly correlated, two g-factor correlations were not exactly 1.0 (.95 and .97; see Floyd et al., 2013). This imprecision may have affected the resultant g loadings reported in this study. Finally, g loadings change as a function of g (Detterman & Daniel, 1989). It has been shown that the accuracy of IQs in measuring g decreases as g increases (Reynolds, 2013; Tucker-Drob, 2009), but we did not test for these effects.
Implications
Our results have implications for psychologists' test use in schools, clinics, and employment settings. First, the results of this study show that IQs are very strong and reasonably accurate measures of psychometric g across children and adolescents. Thus, based on prior research, all IQs targeted in this study should be expected to reflect the ability to learn new information, grasp concepts, draw distinctions, and infer meaning from experience and to predict key social and health-related outcomes later in adulthood (see Deary & Batty, 2011). Accordingly, evidence-based practices should include IQs in some situations. For example, in school contexts, IQs provide a benchmark against which to compare academic achievement in the present and near future, and knowledge of IQs may be useful for matching children and adolescents who are experiencing educational difficulties to their optimal learning environments (Frisby, 2013; Jensen, 1998a).
Second, the results of this study indicate that psychologists should be aware that IQs are not perfectly accurate measures of psychometric g. They are affected not only by random measurement error—as are all measurements—but also, to some extent, by lower stratum abilities. In sum, IQs are multidimensional, like every other test score, from the perspective of three-stratum theory and CHC theory. On one hand, multidimensionality (and thus, lessened accuracy) weakens the construct validity of IQs, if g is considered a psychometrically unitary construct (Gottfredson, 2009). On the other hand, this multidimensionality may enhance the predictive validity of IQs, because lower stratum abilities affecting IQs (as evident in specificity estimates) may add to the prediction of key outcomes if these lower stratum abilities are uniquely correlated with those outcomes (Schneider, 2013).
Test developers and others targeting psychometric g in their measurement should consider ways to control for or reduce the effects of construct-irrelevant influences from lower stratum abilities on IQs. For example, factor scores reflecting psychometric g appear to reduce the effects of these confounding influences (see Schneider, 2013). This conclusion may be supported by the finding that the WJ III GIA-Std produced some of the highest IQ g loadings in this study, as it is calculated using methods similar to those used in calculating factor scores. Recent research by Schneider (2013) on producing factor scores from traditional intelligence test scores indicates that this goal may be nearing actualization in the practice of psychology. Furthermore, to increase confidence in interpreting intelligence test scores, validity-based confidence intervals using g saturation values (viz., hierarchical omega values or squared g loadings) can be calculated and plotted around IQs rather than the traditional, true-score confidence intervals obtained, in part, from reliability coefficients. These validity-based confidence intervals are described in detail by Schneider (2013) and applied in an example in Reynolds et al. (2013); a rough sketch follows this paragraph. Finally, from a pragmatic perspective, researchers should investigate how to employ measurement of both psychometric g and relevant lower stratum abilities to maximize the predictive power of IQs and other aptitude scores.
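As the rough sketch referenced above, the function below assumes a Kelley-style regressed estimate of the latent score in which the g saturation plays the role that reliability plays in a traditional true-score interval; the exact procedure, which may differ in its details, is given by Schneider (2013).

```python
import math

def validity_interval(iq, g_saturation, mean=100.0, sd=15.0, z=1.96):
    """Confidence interval around a regressed estimate of psychometric g.

    Assumption for illustration only: g saturation (hierarchical omega or a
    squared g loading) is substituted for reliability in Kelley-style
    true-score estimation; see Schneider (2013) for the actual procedure.
    """
    estimate = mean + g_saturation * (iq - mean)             # regress toward the mean
    se = sd * math.sqrt(g_saturation * (1 - g_saturation))   # standard error of estimation
    return estimate - z * se, estimate + z * se

low, high = validity_interval(iq=120.0, g_saturation=0.85)   # hypothetical observed IQ
print(f"95% validity-based interval: {low:.1f} to {high:.1f}")  # ≈ 106.5 to 127.5
```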
CONCLUSIONS
Intelligence tests produce measures of one of the most explanatory variables in all of the social
sciences, psychometric g, and IQs represent this construct more than reasonably well—but not
perfectly. Obtaining IQs for individual children and adolescents during special education eligibility determination and diagnosis of common childhood conditions continues to be a common practice. Based on more than a century of research examining the correlates of psychometric g,
considering the IQs of individual children and adolescents may be useful when seeking answers
to their academic problems and prescribing the most effective instructional interventions. School
psychologists and other professionals should engage in evidence-based interpretations of IQs, and
researchers should continue to evaluate the extent to which interpretations of other scores and score
profiles from intelligence tests have a sufficient evidence base to support their application in practice.
REFERENCES
American Association on Intellectual and Developmental Disabilities. (2010). Intellectual disability: Definition, classification, and systems of supports (11th ed.). Washington, DC: Author.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC:
Author.
Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48,
5 – 37.
Canivez, G. L. (2011). Hierarchical structure of the Cognitive Assessment System: Variance partitions from the Schmid-
Leiman (1957) procedure. School Psychology Quarterly, 26, 305 – 317.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York, NY: Cambridge University
Press.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral
sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Deary, I. J., & Batty, G. D. (2011). Intelligence as a predictor of illness, health, and death. In R. J. Sternberg & S. B. Kaufman
(Eds.), The Cambridge handbook of intelligence (pp. 683 – 707). Cambridge, UK: Cambridge University Press.
Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other and with cognitive variables are
highest for low-IQ groups. Intelligence, 13, 349 – 359.
Elliott, C. (1990). Differential Ability Scales. San Antonio, TX: Psychological Corporation.
Elliott, C. (2007). Differential Ability Scales, Second Edition. San Antonio, TX: Psychological Corporation.
Floyd, R. G., Bergeron, R., McCormack, A. C., Anderson, J. L., & Hargrove–Owens, G. L. (2005). Are Cattell–Horn–Carroll
broad ability composite scores exchangeable across batteries? School Psychology Review, 34, 329 – 357.
Floyd, R. G., Clark, M. H., & Shadish, W. R. (2008). The exchangeability of IQs: Implications for professional psychology.
Professional Psychology: Research and Practice, 39, 414 – 423.
Floyd, R. G., McGrew, K. S., Barry, A., Rafael, F. A., & Rogers, J. (2009). General and specific effects on Cattell–Horn–
Carroll broad ability composites: Analysis of the Woodcock–Johnson III Normative Update CHC factor clusters across
development. School Psychology Review, 38, 249– 265.
Floyd, R. G., Reynolds, M. R., Farmer, R. L., & Kranzler, J. H. (2013). Are the general factors from different child and adolescent intelligence tests the same? Results from a five-sample, six-test analysis. School Psychology Review, 42, 383 – 401.
Frisby, C. L. (2013). Meeting the psychoeducational needs of minority students: Evidence-based guidelines for school
psychologists and other school personnel. Hoboken, NJ: Wiley.
Gottfredson, L. S. (2009). Logical fallacies used to dismiss the evidence on intelligence testing. In R. Phelps (Ed.), Cor-
recting fallacies about educational and psychological testing (pp. 11 – 65). Washington, DC: American Psychological
Association.
Gustafsson, J.-E. (2002). Measurement from a hierarchical point of view. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.),
The role of constructs in psychological and educational measurement (pp. 73 –96). Mahwah, NJ: Erlbaum.
Hunt, E. (2011). Human intelligence. Cambridge, UK: Cambridge University Press.
Individuals with Disabilities Education Improvement Act of 2004, Pub. L. No. 108-446 (2004).
Jensen, A. R. (1998a). The g factor and the design of education. In R. J. Sternberg & W. M. Williams (Eds.), Intelligence, instruction, and assessment: Theory into practice (pp. 111 – 131). Mahwah, NJ: Erlbaum.
Jensen, A. R. (1998b). The g factor: The science of mental ability. Westport, CT: Praeger.
Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children, Second Edition. Circle Pines, MN:
American Guidance Service.
Keith, T. Z., Kranzler, J. H., & Flanagan, D. P. (2001). What does the Cognitive Assessment System (CAS) measure? Joint
confirmatory factor analysis of the CAS and the Woodcock–Johnson Tests of Cognitive Ability (3rd Edition). School
Psychology Review, 30, 89 – 119.
Kranzler, J. H., & Floyd, R. G. (2013). Assessing intelligence in children and adolescents: A practical guide. New York, NY:
Guilford Press.
Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198 – 1202.
Maynard, J. L., Floyd, R. G., Acklie, T.J., & Houston, L. (2011). General factor loadings and specific effects of the Differential
Ability Scales, Second Edition composites. School Psychology Quarterly, 26, 108 –118.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
McGrew, K. S., & Flanagan, D. P. (1998). The intelligence test desk reference (ITDR): Gf-Gc cross-battery assessment.
Boston, MA: Allyn & Bacon.
McGrew, K. S., & Woodcock, R. W. (2001). Woodcock–Johnson III technical manual. Itasca, IL: Riverside Publishing.
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Naglieri, J. A., & Das, J. P. (1997). Cognitive Assessment System. Itasca, IL: Riverside Publishing.
Phelps, L., McGrew, K. S., Knopik, S. N., & Ford, L. A. (2005). The general (g factor), broad, and narrow CHC stratum characteristics of the WJ III and WISC-III tests: A confirmatory cross-battery investigation. School Psychology Quarterly, 20, 66 – 88.
Reynolds, M. R. (2013). Interpreting intelligence test composite scores in light of Spearman’s law of diminishing returns.
School Psychology Quarterly, 28, 63 – 76.
Reynolds, M. R., Floyd, R. G., & Niileksela, C. R. (2013). How well is psychometric g indexed by global composites? Evidence from three popular intelligence tests. Psychological Assessment, 25, 1314 – 1321. doi:10.1037/a0034102
Reynolds, M. R., Keith, T. Z., Fine, J. G., Fisher, M. E., & Low, J. (2007). Confirmatory factor structure of the Kaufman
Assessment Battery for Children—Second Edition: Consistency with Cattell–Horn–Carroll Theory. School Psychology
Quarterly, 22, 511 – 539.
Sanders, S., McIntosh, D. A., Dunham, M., Rothlisberg, B. A., & Finch, H. (2007). Joint confirmatory factor analysis of the
Differential Ability Scales and the Woodcock–Johnson Tests of Cognitive Abilities–Third Edition. Psychology in the
Schools, 44, 119 – 138.
Schermelleh-Engel, K., Moosbrugger, H., & Müller, H. (2003). Evaluating the fit of structural equation models: Tests of significance and descriptive goodness-of-fit measures. Methods of Psychological Research Online, 8, 23 – 74.
Schneider, W. J. (2013). What if we took our models seriously? Estimating latent scores in individuals. Journal of Psychoe-
ducational Assessment, 31, 186 – 201.
Schneider, W. J., & McGrew, K. S. (2012). The Cattell–Horn–Carroll model of intelligence. In D. P. Flanagan & P. Harrison
(Eds.), Contemporary intellectual assessment (3rd ed., pp. 99 –144). New York: Guilford Press.
Spearman, C. (1927). The abilities of man: Their nature and measurement. New York, NY: Macmillan.
Tucker-Drob, E. M. (2009). Differentiation of cognitive abilities across the lifespan. Developmental Psychology, 45, 1097 –
1118.
Urbina, S. (2011). Tests of intelligence. In R. J. Sternberg & S. B. Kaufman (Eds.), The Cambridge handbook of intelligence (pp. 20 – 38). Cambridge, UK: Cambridge University Press.
Watkins, M. W., Wilson, S. M., Kotz, K. M., Carbone, M. C., & Babula, T. (2006). Factor structure of the Wechsler Intelligence Scale for Children–Fourth Edition among referred students. Educational and Psychological Measurement, 66, 975 – 983.
Wechsler, D. (1991). The Wechsler Intelligence Scale for Children–Third Edition. San Antonio, TX: Psychological Corpo-
ration.
Wechsler, D. (2003). The Wechsler Intelligence Scale for Children–Fourth Edition. San Antonio, TX: Psychological Corpo-
ration.
Woodcock, R. W. (1990). The theoretical foundations of the WJ-R measures of cognitive ability. Journal of Psychoeducational
Assessment, 8, 231 – 258.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock–Johnson III Tests of Cognitive Abilities. Itasca, IL:
Riverside Publishing.