pits21785 W3G-pits.cls June 23, 2014 21:40
Psychology in the Schools, Vol. 00(00), 2014 © 2014 Wiley Periodicals, Inc.
View this article online at wileyonlinelibrary.com/journal/pits DOI: 10.1002/pits.21785
IQs ARE VERY STRONG BUT IMPERFECT INDICATORS OF PSYCHOMETRIC g:
RESULTS FROM JOINT CONFIRMATORY FACTOR ANALYSIS
RYAN L. FARMER AND RANDY G. FLOYD
The University of Memphis
MATTHEW R. REYNOLDS
The University of Kansas
JOHN H. KRANZLER
The University of Florida
The most global scores yielded by intelligence tests, IQs, are supported by substantial validity
evidence and have historically been central to the identification of intellectual disabilities, learning
disabilities, and giftedness. This study examined the extent to which IQs measure the ability they
target, psychometric g. Data from three samples of children and adolescents (Ns = 200, 150, and
135) who completed varying pairs of individually administered, multidimensional intelligence tests
were analyzed using a joint confirmatory factor analysis to generate correlations between IQs and
general factors representing psychometric g. The resulting values, expressed as g loadings, for the
six IQs ranged from .88 to .95 (M = .92). The accuracy of IQs in measuring psychometric g, the
meaning of reliable specific ability variance in IQs not accounted for by psychometric g, and the
use of IQs in schools and related settings are discussed. © 2014 Wiley Periodicals, Inc.
Interpreting IQs, the most global scores yielded by intelligence tests, remains a controversial
practice in schools, clinics, and employment settings due to perceptions of their limited utility,
accusations of systematic bias against those from minority racial/ethnic groups, and risks associated
with self-fulfilling prophecies (Kranzler & Floyd, 2013; Urbina, 2011). Despite this controversy,
there are at least three reasons for the continued reference to IQs in such settings. First, federal
legislation requires the use of IQs to determine special education eligibility for intellectual disability
and developmental delay (Individuals with Disabilities Education Improvement Act [IDEIA], 2004),
and diagnostic guidelines continue to stipulate the use of IQs for diagnosis of intellectual disability
(American Association on Intellectual and Developmental Disabilities [AAIDD], 2010; American
Psychiatric Association, 2013). Second, IQs typically have excellent psychometric properties—
including traditional internal consistency reliability estimates that are .95 or higher. Third, IQs
are better predictors of broad academic achievement, job performance, social status, health, and
mortality than any other measurable psychological variable (Deary & Batty, 2011; Hunt, 2011;
Jensen, 1998b). Thus, IQs can provide important information about individuals and the types of
interventions that may best address their needs.
This research was completed as a partial requirement for the first author’s receipt of a doctoral degree in school
psychology at The University of Memphis. Portions of this research were presented at 2011 and 2012 annual meetings
of the National Association of School Psychologists. We thank NCS Pearson, Inc., Lawrence Weiss, William Schryver,
and Ying Meng for providing data from the DAS-II and KABC-II validity studies (Sample 1), and Woodcock–Munoz
Foundation, Richard Woodcock, Fredrick Schrank, and Kevin McGrew for providing data from the WJ III validity
studies (Samples 2 and 3). Standardization data from the Differential Ability ScalesSecond Edition (DASII).
Copyright C2007 by NCS Pearson, Inc. Used with permission. All rights reserved. Standardization data from the
Wechsler Intelligence Scale for ChildrenFourth Edition (WISCIV). Copyright C2003 by NCS Pearson, Inc. Used
with permission. All rights reserved.
Correspondence to: Ryan L. Farmer, Department of Psychology, The University of Memphis, Memphis, TN
38018. E-mail: rlfarmer@memphis.edu
PSYCHOMETRIC g AND ITS MEASUREMENT
Psychometric g is the latent variable underlying all cognitive ability tasks. In the predominant
contemporary theories of the structure of human cognitive abilities, ability factors are arranged
hierarchically in terms of their generality (i.e., the number of other factors with which a particular
factor is correlated). In Carroll's (1993) three-stratum theory and the closely aligned Cattell–Horn–
Carroll theory (Schneider & McGrew, 2012), psychometric g is located at the top of the hierarchical
structure at the level of stratum III. Stratum II consists of group factors associated with broad classes
of tasks employing similar content or requiring similar cognitive processes, and stratum I comprises
a rather large number of specialized abilities.
Although psychometric g is a latent variable—a psychological construct—that cannot be directly
measured, a composite score can be calculated from a diverse set of cognitive tasks to represent
it (Jensen, 1998b). It is just this method that produces the most commonly used IQs in the practice
of psychology in the schools. It is, however, important to know the extent to which these IQs
measure psychometric g. From a measurement perspective, evaluating the relations between IQs
and latent variables representing psychometric g provides information about the construct validity
of IQs (American Educational Research Association [AERA], American Psychological Association
[APA], & National Council on Measurement in Education [NCME], 1999). From a practical
perspective, in high-stakes decision making—especially regarding special education placement and
diagnosis of intellectual disability—it is important to determine how accurately varying IQs measure
psychometric g. It is logical that the better an IQ measures psychometric g, the better the IQ will
(a) reflect the ability to learn new information, grasp concepts, draw distinctions, and infer meaning
from experience and (b) predict key social and health-related outcomes (see Jensen, 1998b).
Evaluation of the extent to which IQs measure psychometric g can be accomplished through
the calculation of g loadings. These g loadings are standardized coefficients that have a hypothetical
range of .00 to 1.00, and they represent the effect, in standard deviation units, of psychometric g
on intelligence test scores. Typically, g loadings of .70 or higher for any intelligence test score are
considered strong (Floyd, McGrew, Barry, Rafael, & Rogers, 2009; McGrew & Flanagan, 1998).
To date, most researchers who have evaluated g loadings have done so at the level of the subtests.
Subtest g loadings vary widely, from .16 to .87 (Reynolds, Keith, Fine, Fisher, & Low, 2007; Sanders,
McIntosh, Dunham, Rothlisberg, & Finch, 2007; Watkins, Wilson, Kotz, Carbone, & Babula, 2006).
The g loadings of composites representing stratum II abilities have been uniformly higher than
the g loadings of subtests, and composite g loadings have ranged from .45 to .89 (Floyd et al., 2009;
Maynard, Floyd, Acklie, & Houston, 2011). Despite IQs being used to index psychometric g, their
g loadings have not been widely studied.
DETERMINING THE RELATIONS BETWEEN IQS AND PSYCHOMETRIC g
There are several methods that could be used to determine how accurately IQs measure psychometric
g (Jensen, 1998b; McDonald, 1999; Schneider, 2013; Spearman, 1927). Using one of
these methods, Reynolds, Floyd, and Niileksela (2013) calculated hierarchical omega values for IQs
using norming sample data from three popular intelligence tests: the Differential Ability Scales,
Second Edition (DAS-II; Elliott, 2007), the Kaufman Assessment Battery for Children, Second Edition
(KABC-II; Kaufman & Kaufman, 2004), and the Wechsler Intelligence Scale for Children–IV
(WISC–IV; Wechsler, 2003). In this most comprehensive study of the relations between IQs and
psychometric g to date, the resulting hierarchical omega values represent the g saturation of the IQs
(i.e., the proportion of variance in the IQs that can be attributed to psychometric g), and their
square roots are equivalent to g loadings (McDonald, 1999). Across the three total norming
samples and samples at 1-year intervals, Reynolds et al. (2013) found that hierarchical omega values
ranged from .78 to .87. Thus, the IQ g loadings ranged from .88 to .93.
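The arithmetic connecting these two ranges can be checked with a short sketch; following McDonald (1999), a g loading is taken here as the square root of the corresponding hierarchical omega value:

```python
import math

def g_loading_from_omega_h(omega_h: float) -> float:
    """Convert a hierarchical omega value (the proportion of IQ variance
    attributable to psychometric g) to the corresponding g loading, a
    standardized coefficient, by taking its square root."""
    return math.sqrt(omega_h)

# Endpoints of the omega range reported by Reynolds et al. (2013)
print(round(g_loading_from_omega_h(0.78), 2))  # 0.88
print(round(g_loading_from_omega_h(0.87), 2))  # 0.93
```

The recovered endpoints (.88 and .93) match the g-loading range stated above.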
The methods for measuring the relation between IQs and psychometric g applied by Reynolds
et al. (2013) and others (see Jensen, 1998b) have two potential limitations and one clear-cut
limitation. The first potential limitation relates to the authenticity of representing the g loading of the
IQ through the use of methods that rely primarily on statistics derived from subtests. For example,
to obtain the hierarchical omega values, Reynolds et al. (2013) estimated the g loading of IQs based
on analysis of the g loadings of the subtests contributing to the IQ and not based on analysis of the
IQ variable per se. Although conceptually and psychometrically similar, they are not exactly the
same. A second potential limitation with this method is that it relies on subtest scores from a single
intelligence test to yield g loadings for subtests. These g loadings may vary somewhat based on the
composition of the battery of subtests subjected to the factor analysis (McGrew & Flanagan, 1998;
Woodcock, 1990); therefore, an IQ g loading calculated from subtest g loadings may be unduly
affected by this source of error variance. Although this method of relying on subtest g loadings to
measure the relation between IQs and psychometric g produces defensible estimates, some skeptics
may not accept its results due to these two potential limitations.
In any case, the clear-cut limitation of this method is that it cannot, at present, be used to
obtain g loadings for IQs that result from differential weighting of subtest scores. Although IQs
are typically calculated from summing equal-weighted subtest scores, one intelligence test, the
Woodcock–Johnson III Tests of Cognitive Abilities (WJ III; Woodcock, McGrew, & Mather, 2001),
employs differential weighting of subtest scores in calculating its IQ. In a manner consistent with
the calculation of factor scores that represent the latent variables derived from factor analysis, the
WJ III subtests are weighted to contribute to the WJ III General Intellectual Ability (GIA) scores
at levels proportional to their g loadings (McGrew & Woodcock, 2001). Thus, hierarchical omega
values and related statistics cannot accurately represent the g saturation of the WJ III GIA, because
the effects of differential weighting of subtests on its variance cannot be considered in the analysis.
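As a rough illustration of g-loading-proportional weighting (the loadings below are hypothetical; the WJ III's actual age-specific weights are not reproduced here):

```python
def g_weighted_composite(z_scores, g_loadings):
    """Combine standardized subtest scores using weights proportional to
    each subtest's g loading, so that subtests with stronger relations to
    the general factor contribute more to the composite. The loadings
    here are illustrative, not the WJ III's proprietary weights."""
    total = sum(g_loadings)
    weights = [g / total for g in g_loadings]  # weights sum to 1
    return sum(w * z for w, z in zip(weights, z_scores))

# Hypothetical example: three subtests with unequal g loadings
z = [1.0, 0.5, -0.2]
g = [0.80, 0.60, 0.40]
print(round(g_weighted_composite(z, g), 3))  # 0.567
```

Because the weights differ across subtests, a simple equal-weighted sum of the same scores would yield a different composite, which is why omega-type statistics built on equal weighting cannot recover this IQ's g saturation.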
An alternative method to relying on subtest g loadings from a single intelligence test in the
calculation of IQ g loadings was offered by Keith, Kranzler, and Flanagan (2001). Drawing data from
a sample of children and adolescents who completed two intelligence tests, the WJ III (Woodcock
et al., 2001) and the Cognitive Assessment System (CAS; Naglieri & Das, 1997), Keith et al.
(2001) completed a joint confirmatory factor analysis to produce a g loading for the global composite
score from the CAS, called the Full Scale score. After demonstrating that a second-order general
factor (representing psychometric g) affecting WJ III subtest scores and a second-order general
factor affecting CAS subtest scores were effectively perfectly correlated, they examined the relation
between the WJ III general factor and the CAS Full Scale score. Keith et al.'s (2001) method provides
researchers with the opportunity to produce IQ g loadings without the previously identified potential
limitations associated with using subtest scores from a single intelligence test. We know of no other
published study in which this method was used to generate g loadings for IQs from individually
administered intelligence tests.
PURPOSE AND CONTRIBUTIONS
In the current study, we sought to determine the g loadings of IQs derived from five mainstream,
multidimensional intelligence tests using the same method as Keith et al. (2001). After Floyd,
Reynolds, Farmer, and Kranzler (2013), using a joint confirmatory factor analysis, established that
three pairs of second-order general factors representing psychometric g from six intelligence tests
were effectively perfectly correlated, this study examined the relations between these general factors
and the IQs from the same intelligence tests.
This study contributes to the existing body of research in four ways. First, it models a straightforward
and conceptually clear way to produce the g loading of an IQ—by correlating it with a latent
variable representing psychometric g derived from another intelligence test. Other methods of
obtaining such g loadings are not as clear or straightforward. Second, it promotes examination of the g
loadings of IQs from tests not targeted in previous studies. These tests include the WJ III (Woodcock
et al., 2001) as well as the immediately previous editions of both the WISC-IV (Wechsler, 2003) and
the DAS-II (Elliott, 2007). Third, the comparison of the results of this study to those from Reynolds
et al. (2013) provides insights into the effects of the methods used to determine IQ g loadings.
For example, if the IQ g loadings from this study are uniformly lower than those from Reynolds
et al. (2013), this may indicate that Keith et al.'s (2001) method, which produced a relatively low
g loading for the CAS Full Scale score, produces downwardly biased estimates or that hierarchical
omega values are somehow upwardly biased.
Finally, based on findings by Reynolds et al. (2013) and Schneider (2013) indicating that stratum
II and stratum I factors explain 5–10% of variance in IQs beyond psychometric g and that subtest-specific
influences can account for another 6% of variance in IQs beyond psychometric g (Schneider, 2013),
we examined the specificity of IQs using the traditional analytic methods (see Floyd et al., 2009;
McGrew & Flanagan, 1998). Specificity refers to reliable variance associated with a variable that is
independent of psychometric g. Using the traditional analytic methods, specificity can be attributed
only to undefined lower stratum abilities uncorrelated with psychometric g. Because IQs are intended
to index psychometric g, specificity represents construct-irrelevant influences on IQs (AERA, APA,
& NCME, 1999).
METHOD
Participants
Three independent samples, obtained from the norming and validation process supporting two
intelligence tests, the DAS-II (Elliott, 2007) and WJ III (Woodcock et al., 2001), with permission
from NCS Pearson, Inc. and the Woodcock–Muñoz Foundation, were employed in this study.
Demographic data for each sample are available in Table 1, and details are provided in the sections
that follow.
Sample 1. As previously described by Elliott (2007), 200 children ages 6 to 16 years, 11 months completed
the DAS-II (Elliott, 2007) followed by the WISC-IV (Wechsler, 2003).
Sample 2. As previously described by several others (Floyd, Bergeron, McCormack, Anderson,
& Hargrove-Owens, 2005; Floyd, Clark, & Shadish, 2008; McGrew & Woodcock, 2001; Phelps,
McGrew, Knopik, & Ford, 2005), 150 children ages 8 to 12 years, 4 months randomly
selected from three public elementary schools completed the WJ III (Woodcock et al., 2001) and
WISC-III (Wechsler, 1991) in a counterbalanced order.
Sample 3. As previously described by several others (Floyd et al., 2005; McGrew &
Woodcock, 2001; Sanders et al., 2007), 135 children ages 8 to 12 years, 3 months randomly selected from
public and private elementary schools completed the DAS (Elliott, 1990) and WJ III (Woodcock
et al., 2001) in a counterbalanced order.
Measures
Differential Ability Scales. A total of six subtests are used to calculate the DAS (Elliott, 1990)
General Conceptual Ability (GCA) composite. The GCA has a mean of 100 and standard deviation
of 15. The DAS GCA's mean internal consistency reliability coefficient, based on its norming sample
data, was .95 across ages 6 to 17 years, 11 months. The DAS GCA correlated strongly with the WISC-III
Full Scale IQ (FSIQ; Wechsler, 1991), r = .92 (Elliott, 1990).
Table 1
Demographic Data for the Three Samples

Characteristic                Sample 1            Sample 2             Sample 3
Intelligence tests            DAS-II and WISC-IV  WISC-III and WJ III  WJ III and DAS
Total sample size             200                 150                  135
Mean age in years (SD)        11.27 (3.42)        9.79 (.89)           10.04 (.88)
Girls (percentage)            100 (50%)           66 (44%)             69 (51.1%)
Race/ethnicity (percentage)
  African American            50 (25%)            2 (1.3%)             6 (4.4%)
  Asian                       13 (6.5%)
  Hispanic                    54 (27%)
  Native American
  White                       70 (35%)            148 (98.7%)          127 (94.1%)
  Other                       13 (6.5%)
  Did not report                                                       2 (1.5%)

Note. DAS = Differential Ability Scales; DAS-II = Differential Ability Scales, Second Edition; WISC-III = Wechsler
Intelligence Scale for Children, Third Edition; WISC-IV = Wechsler Intelligence Scale for Children, Fourth Edition; WJ III = Woodcock–
Johnson III Tests of Cognitive Abilities.
Differential Ability Scales, Second Edition. A total of six subtests are used to calculate the
DAS-II (Elliott, 2007) GCA composite. The GCA has a mean of 100 and standard deviation of 15.
The DAS-II GCA's mean internal consistency reliability coefficient, based on its norming sample
data, was .96 across ages 7 to 17 years, 11 months. The DAS-II GCA correlated strongly with the DAS GCA
(r = .88) and the WISC-IV FSIQ (r = .84; Elliott, 2007).
Wechsler Intelligence Scale for Children, Fourth Edition. A total of 10 subtests are used to
calculate the WISC-IV (Wechsler, 2003) FSIQ. The FSIQ has a mean of 100 and standard deviation
of 15. The WISC-IV FSIQ's mean internal consistency reliability coefficient, based on its norming
sample data, was .97 across ages 6–16, whereas its test–retest reliability was .89 in a smaller sample
(Wechsler, 2003). The WISC-IV FSIQ correlated strongly with the WISC-III FSIQ (r = .89).
Wechsler Intelligence Scale for Children, Third Edition. A total of 10 WISC-III (Wechsler,
1991) subtests are used to calculate the FSIQ. The WISC-III FSIQ's mean internal consistency
reliability coefficient, based on its norming sample data, was .96 across ages 6–16. The test–retest
reliability coefficient was .93 in a smaller sample (Wechsler, 1991). The WISC-III FSIQ correlated
strongly with the DAS GCA (r = .84).
Woodcock–Johnson III Tests of Cognitive Abilities. A total of seven subtests are used to
calculate the WJ III (Woodcock et al., 2001) GIA, Standard (GIA-Std). The WJ III employs differential
weighting of the subtest scores contributing to the GIA-Std (which varies across age groups) to
represent the varying strength of relations between subtests and the general factor underlying the
GIA-Std. The GIA-Std has a mean of 100 and standard deviation of 15. The WJ III GIA-Std's mean
internal consistency reliability coefficient was .97 across ages 8–12 in its norming sample; further,
the test–retest reliability coefficient was .98 in a smaller sample (McGrew & Woodcock, 2001). The
WJ III GIA-Std correlated strongly with the WISC-III FSIQ (r = .71) and the DAS GCA (Elliott,
1990; r = .67).
Analyses
All modeling was conducted using Mplus version 6.1 (Muthén & Muthén, 1998–2010).
There were no missing data from Sample 1. From Sample 2, two participants did not complete the
WISC-III, but there were no other missing data. For this dataset, Little's test (Little, 1988) revealed
that the hypothesis of data missing completely at random (MCAR) could not be rejected, χ²(44,
N = 150) = 39.632, p = .659. From Sample 3, two participants did not complete the DAS, and
another two participants did not complete the WJ III. There were also occasional other missing data
for four WJ III variables. Little's test revealed that the null MCAR hypothesis could not be rejected,
χ²(113, N = 135) = 128.713, p = .148. Due to the nature and number of missing data in Samples
2 and 3, maximum likelihood estimation was employed to accommodate the presence of missing
data. According to Baraldi and Enders (2010), "Rather than filling in the missing values, maximum
likelihood uses all of the available data – complete and incomplete – to identify the parameter values
that have the highest probability of producing the sample data" (p. 18).
Following Keith et al. (2001), we constructed models in which a second-order general factor
modeled from subtest scores from one intelligence test was correlated with an IQ from the other
intelligence test. Floyd et al. (2013) previously demonstrated, through the use of joint confirmatory
factor analysis, that the second-order general factors representing psychometric g from the pair of
tests included in each of the three samples in this study are effectively perfectly correlated.
Figure 1 contains models for the two tests associated with Sample 1. As is evident in the left half of
Panel A and the right half of Panel B, the models included multiple first-order factors and a second-order
general factor derived from one intelligence test. Panel A presents the factor structure model for the
DAS-II, and Panel B presents the factor structure model for the WISC-IV. In addition, the models
specify that the second-order general factor from one intelligence test is correlated with the IQ from
the other test. This correlation is represented by the curved arrows in Figure 1; it represents the IQ
g loading. In Figure 1, the rectangle to the right side of Panel A is the IQ from the WISC-IV, the
FSIQ; it is correlated with the DAS-II second-order general factor. The rectangle to the left side of
Panel B is the IQ from the DAS-II, the GCA; it is correlated with the WISC-IV second-order general
factor. This series of modeling and analysis steps was followed for the three pairs of intelligence tests
across samples. To produce the most accurate results and to better compare the results of this study
to those of Reynolds et al. (2013), who employed nationally representative norm sample data, the IQ
g loadings reported in this study were corrected for range restriction and expansion (controlling for
sampling error; Cohen, Cohen, West, & Aiken, 2003, p. 56).¹
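A standard univariate correction for direct range restriction (Thorndike's Case II formula) can be sketched as follows; we assume, but have not verified, that it matches the equation Cohen et al. (2003, p. 56) present. Applied to the DAS-II GCA values reported in Tables 2 and 4 (uncorrected loading .87, sample SD 11.78, population SD 15), it yields a value close to the range-corrected .92 reported in Table 4; small differences presumably reflect the exact formula and standard deviations used.

```python
import math

def correct_range_restriction(r: float, sd_sample: float, sd_pop: float) -> float:
    """Thorndike Case II correction for direct range restriction:
    rescales a correlation observed in a sample whose standard deviation
    differs from the population standard deviation (15 for IQs)."""
    k = sd_pop / sd_sample
    return (r * k) / math.sqrt(1 + r**2 * (k**2 - 1))

# DAS-II GCA in Sample 1: observed g loading .87, sample SD 11.78
print(round(correct_range_restriction(0.87, 11.78, 15.0), 3))  # 0.914
```

Note that when the sample SD equals the population SD (k = 1), the correction leaves the correlation unchanged, as it should.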
RESULTS
Descriptive Statistics
IQs. As shown in Table 2, the mean IQs for the three samples ranged from 99.89 to 106.76
(M = 104.88). Only the mean for the DAS-II GCA from Sample 1 was lower than the population mean;
all others were at least one point higher and up to almost 9 points higher than the population mean.
Standard deviations ranged between 11.46 and 15.78 (M = 13.31), and all but one were smaller than
the population standard deviation of 15 and demonstrated evidence of restriction of range.
Subtests. The general factors specified in this study were derived from subtest data from each
intelligence test. Descriptive statistics for each subtest and data set are presented in supplemental
materials, which are available upon request.
¹ To control for slight variation in the internal consistency reliability of the IQs when making comparisons across
intelligence tests, g loadings were also disattenuated using the formula offered by Cohen et al. (2003, p. 58). They are
not reported herein, but they can be obtained by contacting the second author.
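The disattenuation described in this footnote can be sketched as follows. This is a standard correction for unreliability (we have not confirmed it matches the exact presentation in Cohen et al., 2003, p. 58); because the general factor is a latent, error-free variable, only the IQ's reliability enters the denominator, and the values shown are illustrative:

```python
import math

def disattenuate(r: float, reliability_iq: float) -> float:
    """Correct an IQ-to-latent-g correlation for the IQ's unreliability.
    The latent general factor is treated as error-free, so only the IQ's
    internal consistency reliability appears in the correction."""
    return r / math.sqrt(reliability_iq)

# Illustrative values: a g loading of .92 for an IQ with reliability .96
print(round(disattenuate(0.92, 0.96), 3))  # 0.939
```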
FIGURE 1. A joint factor model of the relations between second-order general factors and IQs. Panel A includes a model in
which the general factor is formed using the Differential Ability Scales, Second Edition (DAS-II) subtests and the IQ is the
Full Scale IQ (FSIQ) of the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV). Panel B includes a model
in which the general factor is formed using the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV) subtests
and the IQ is the General Conceptual Ability (GCA) of the Differential Ability Scales, Second Edition (DAS-II). Rectangles
represent subtest scores, and large ovals represent psychometric g and first-order factors. Small ovals above the large ovals
represent first-order factor unique variances (measuring specific abilities not due to psychometric g or variance specific to
subtests). Curved arrows represent correlations. Measurement residuals that contain specific variance and measurement error
associated with subtest scores are not shown. g = psychometric g, Gc = comprehension–knowledge, Gs = processing speed,
Gv = visual processing, Gf = fluid reasoning, Gsm = short-term memory, Speed of Info. Processing = speed of information
processing, Seq. & Qual. Reasoning = sequential and qualitative reasoning, and Recall of Seq. Order = recall of sequential
order.
Table 2
Means and Standard Deviations for IQs across Samples

Characteristic   Sample 1                     Sample 2                      Sample 3
IQ               WISC-IV FSIQ   DAS-II GCA    WISC-III FSIQ   WJ III GIA    DAS GCA   WJ III GIA
Mean             103.28         99.89         106.76          103.10        109.08    107.15
SD               13.37          11.78         12.92           11.46         13.47     15.78

Note. Means for IQs are expected to be 100, and standard deviations are expected to be 15.
DAS GCA = Differential Ability Scales, General Conceptual Ability; DAS-II GCA = Differential Ability Scales, Second
Edition, General Conceptual Ability; WISC-III FSIQ = Wechsler Intelligence Scale for Children, Third Edition, Full Scale
IQ; WISC-IV FSIQ = Wechsler Intelligence Scale for Children, Fourth Edition, Full Scale IQ; WJ III GIA = Woodcock–Johnson III
Tests of Cognitive Abilities, General Intellectual Ability-Standard.
Table 3
Fit Statistics and Indexes for All Models across Samples

Sample     Test Yielding IQ   Test Yielding g Factor   χ²       p      df    CFI    TLI    RMSEA (90% CI)      SRMR
Sample 1   WISC-IV            DAS-II                   65.78    .001   40    .966   .953   .057 (.031, .081)   .040
Sample 1   DAS-II             WISC-IV                  181.49   .001   101   .941   .930   .063 (.048, .078)   .050
Sample 2   WISC-III           WJ III                   130.17   .120   114   .970   .964   .031 (.000, .053)   .058
Sample 2   WJ III             WISC-III                 72.49    .310   61    .983   .978   .035 (.000, .064)   .050
Sample 3   DAS                WJ III                   89.81    .350   73    .966   .958   .041 (.000, .068)   .058
Sample 3   WJ III             DAS                      37.10    .590   73    .962   .943   .064 (.013, .102)   .054

Note. DAS = Differential Ability Scales; DAS-II = Differential Ability Scales, Second Edition; WISC-III = Wechsler
Intelligence Scale for Children, Third Edition; WISC-IV = Wechsler Intelligence Scale for Children, Fourth Edition; WJ III
= Woodcock–Johnson III Tests of Cognitive Abilities. CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA =
root mean square error of approximation; SRMR = standardized root mean square residual.
Table 4
Uncorrected g Loadings, Range-Corrected g Loadings, g Saturation, and Specificity

IQ              Sample   Uncorrected g Loading   Range-Corrected g Loading   g Saturation (%)   Specificity (%)
DAS GCA         3        .90                     .91                         83                 12
DAS-II GCA      1        .87                     .92                         85                 11
WISC-III FSIQ   2        .84                     .88                         77                 19
WISC-IV FSIQ    1        .92                     .94                         88                 9
WJ III GIA      2        .92                     .95                         90                 7
WJ III GIA      3        .93                     .94                         88                 9
M/SD                     .90/.04                 .92/.03                     85/4.73            11/4.19

Note. DAS = Differential Ability Scales; DAS-II = Differential Ability Scales, Second Edition; WISC-III = Wechsler
Intelligence Scale for Children, Third Edition; WISC-IV = Wechsler Intelligence Scale for Children, Fourth Edition; WJ III
= Woodcock–Johnson III Tests of Cognitive Abilities.
Joint Confirmatory Factor Analysis
Although this study is focused on the g loadings of the IQs, considering the quality of the models
from which these values are derived remains important. Multiple stand-alone indicators of model fit
are reported in Table 3. Comparative fit index values and Tucker–Lewis index values greater than .95
indicate adequate fit, whereas values approaching .97 indicate excellent fit. Root mean square error
of approximation (RMSEA) values and standardized root mean square residual (SRMR) values
of .05 or less indicate excellent fit, whereas the RMSEA may be as high as .08 and the SRMR may
be as high as .10 for adequate fit. These values, taken as a whole, indicate the models we employed
to generate IQ g loadings demonstrate adequate to excellent fit (Schermelleh-Engel, Moosbrugger,
& Müller, 2003).
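These cutoffs can be expressed as a simple screening function (the thresholds are those stated above; the three labels are our shorthand, not the cited authors', and the authors' overall judgment was holistic rather than rule-based):

```python
def classify_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> str:
    """Screen model fit using the thresholds stated in the text:
    CFI/TLI > .95 adequate (approaching .97 excellent); RMSEA and SRMR
    of .05 or less excellent; RMSEA up to .08 and SRMR up to .10 adequate."""
    if cfi >= 0.97 and tli >= 0.97 and rmsea <= 0.05 and srmr <= 0.05:
        return "excellent"
    if cfi > 0.95 and tli > 0.95 and rmsea <= 0.08 and srmr <= 0.10:
        return "adequate"
    return "poor"

# Sample 2, WJ III model (Table 3): CFI .983, TLI .978, RMSEA .035, SRMR .050
print(classify_fit(0.983, 0.978, 0.035, 0.050))  # excellent
# Sample 1, WISC-IV model (Table 3): CFI .966, TLI .953, RMSEA .057, SRMR .040
print(classify_fit(0.966, 0.953, 0.057, 0.040))  # adequate
```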
As evident in Table 4, uncorrected g loadings for the IQs ranged from .84 to .93 (M = .90,
SD = .04), and range-corrected g loadings ranged from .88 to .95 (M = .92, SD = .03). To obtain
estimates of the percentage of variance in IQs attributable to psychometric g (a.k.a. g saturation),
the range-corrected IQ g loadings were squared; these values ranged from 77% to 90% (M = 85%,
SD = 4.73%). Specificity estimates were also calculated by (a) subtracting the g saturation value
for an IQ from its mean internal consistency coefficient across relevant age groups (as reported in
the "Measures" section) and (b) expressing the difference as a percentage. Specificity values ranged
from 7% to 19% (M = 11%, SD = 4.19%).
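Steps (a) and (b), together with the squaring of the g loadings, can be reproduced directly; the example values come from Table 4 and the Measures section:

```python
def saturation_and_specificity(g_loading: float, reliability: float):
    """g saturation = squared g loading (percentage of IQ variance due to
    psychometric g); specificity = reliability minus g saturation
    (reliable variance independent of g). Both returned as rounded
    percentages, as in Table 4."""
    saturation = round(g_loading ** 2 * 100)
    specificity = round(reliability * 100) - saturation
    return saturation, specificity

# WJ III GIA, Sample 2: range-corrected loading .95, mean reliability .97
print(saturation_and_specificity(0.95, 0.97))  # (90, 7)
# WISC-III FSIQ, Sample 2: range-corrected loading .88, mean reliability .96
print(saturation_and_specificity(0.88, 0.96))  # (77, 19)
```

These values match the g saturation and specificity columns of Table 4.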
DISCUSSION
Given that the most global scores produced by intelligence tests (IQs) are widely used in
schools, clinics, and employment settings to estimate psychometric gand to make certain diagnoses
and special education classification decisions, it is important to know the extent to which IQs actually
measure psychometric g. Evaluating the relations between IQs and latent variables representing
psychometric gprovides information about the construct validity of these IQs (AERA, APA, NCME,
1999) and contributes to a better understanding of the evidence supporting current practices in
interpreting IQs. Based on our results, IQs are very strong indicators of psychometric g; the average
range-corrected IQ gloading was .92 (SD =.03, range =.88 to .95). All IQs well exceeded the
traditional lower end standard for a strong gloading (i.e., .70; Floyd et al., 2009; McGrew &
Flanagan, 1998) that has been applied to intelligence test subtests and stratum II composites. These
findings support the use of IQs as indicators of psychometric g.
The results of this study and those from Reynolds et al. (2013) are remarkably consistent—
despite the fact that Reynolds et al. (2013) employed a different methodology, different samples,
and some different intelligence tests. According to results presented by Reynolds et al. (2013), g
loadings from the DAS-II, KABC-II, and WISC-IV IQs ranged from .88 to .93 across samples
representing 1-year divisions of the tests' norming samples; this range of values matches those in
this study (.88–.95) extremely well. Considering the specific tests shared across these two studies,
Reynolds et al.'s (2013) analysis yielded a g loading of .91 for the DAS-II GCA for the total DAS-II
norm sample and a range of g loadings from .89 to .93 across the 1-year divisions; the current study
yielded a range-corrected g loading of .92 for the DAS-II GCA. Reynolds et al.'s (2013) analysis
yielded a g loading of .91 for the WISC-IV FSIQ for the total WISC-IV norm sample and a range
of g loadings from .89 to .92 across 1-year divisions of the norm sample; the current study yielded
a range-corrected g loading of .94 for the WISC-IV FSIQ. Furthermore, these results are consistent
with Schneider's (2013) analysis of the WISC-IV FSIQ, which produced an IQ g loading of .93 that
is almost identical to our finding. It appears that there is little evidence to support the claim that
using subtest g loadings from a single intelligence test to estimate the IQ g loading for the same
test produces biased results (see Introduction).
Because hierarchical omega values (like those produced by Reynolds et al., 2013) and related
statistics cannot accurately represent the g saturation of the WJ III GIA-Std, which employs
differential weighting of subtests, this study targeted this IQ to obtain its g loading. Across the six IQ g loadings
produced in this study, the WJ III GIA-Std produced the highest range-corrected g loading (.95) in
one analysis and tied the WISC-IV FSIQ for second highest in another analysis (.94). Although these
values are not substantially greater than those from the IQs from the DAS-II, DAS, and WISC-III
found in this study, their slightly higher magnitude may be due to the WJ III's weighting procedures,
in which subtests contribute to the GIA-Std at levels proportional to their g loadings, in the same
manner as factor scores do (Schneider, 2013).
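The weighting idea can be sketched in a few lines (the subtest scores and g loadings below are hypothetical, and this is a conceptual sketch rather than the WJ III's actual scoring algorithm):

```python
# Hypothetical z-scored subtest scores and their hypothetical g loadings.
subtest_z = [0.5, -0.2, 1.1, 0.3]
g_loads   = [0.80, 0.55, 0.70, 0.60]

# Unit-weighted composite: every subtest counts equally, as in most
# traditional IQ composites.
unit_weighted = sum(subtest_z) / len(subtest_z)

# g-weighted composite: each subtest contributes in proportion to its
# g loading, as factor scores do, so highly g-loaded subtests count more.
g_weighted = sum(z * w for z, w in zip(subtest_z, g_loads)) / sum(g_loads)
```

Because highly g-loaded subtests dominate the g-weighted composite, its correlation with the general factor tends to be somewhat higher than that of the unit-weighted sum of the same subtests.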
Nevertheless, based on our findings of minimal differences in IQ g loadings across intelligence
tests and similar findings from Reynolds et al. (2013), it is unclear exactly what characteristics lead
to higher and lower IQ g loadings across the intelligence tests we targeted. Across the small body
of research to which this study belongs, the outlier among IQ g loadings is that from the CAS Full
Scale score (.79, produced by Keith et al., 2001). It is clear that this lower g loading for the CAS
Full Scale score was not due to the method employed by Keith et al. (2001), because our findings,
stemming from use of Keith et al.'s (2001) method, are so similar to those of Reynolds et al. (2013). As
noted by Keith et al. (2001), the CAS subtests contributing to its Full Scale score were not designed
to target psychometric g and have demonstrated lower g loadings (see Canivez, 2011) than those
subtests employed in this study and the Reynolds et al. (2013) study. More systematic research is
needed to understand the reasons for differences in IQ g loadings.
Specificity values represent the percentage of variance attributable to stratum II and stratum
I abilities independent of psychometric g. In this study, these values represented 7-19% of the IQ
score variance (M = 11%) and indicate that, in addition to measurement error, IQs are influenced
by abilities that are not psychometric g. This range of specificity values is very similar to the range
of variance in the DAS-II, KABC-II, and WISC-IV IQs attributed to stratum II abilities (5-12%)
by Reynolds et al. (2013) and the percentage of variance in the WISC-IV FSIQ (16%) attributable
to stratum II and stratum I abilities as well as WISC-IV subtests by Schneider (2013). Although the
percentage of variance attributed to specific (non-g) ability influences is relatively small (McGrew &
Flanagan, 1998), and the variance in IQs attributable to psychometric g is almost 7 to more than 13
times greater than that attributable to more specific abilities, these specificity values are nontrivial.
Specificity evident in IQs weakens their construct validity and likely contributes to lower score
exchangeability across IQs (Floyd et al., 2008). The presence of these specific ability influences
on IQs represents a sobering reality of measurement from a hierarchical ability framework: no
cognitive ability can be indexed with perfect accuracy without construct-irrelevant influences from
another stratum (Carroll, 1993; Gustafsson, 2002).
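A minimal sketch of this variance decomposition (the numbers are illustrative, not estimates from this study): under a hierarchical model, reliable IQ variance splits into a g portion (the squared g loading) and a specific, non-g portion, with the remainder being measurement error.

```python
# Illustrative decomposition of IQ score variance (proportions of total).
reliability = 0.97   # hypothetical reliability coefficient for the IQ
g_loading   = 0.92   # correlation between the IQ and psychometric g

g_variance     = g_loading ** 2            # variance shared with g (~.85)
error_variance = 1 - reliability           # random measurement error (.03)
specificity    = reliability - g_variance  # reliable non-g variance (~.12)

# The three components account for all of the score variance.
assert abs(g_variance + error_variance + specificity - 1.0) < 1e-9
```

With these illustrative inputs, specificity works out to about 12% of the score variance, which falls within the 7-19% range reported above.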
Limitations
There are some limitations associated with our samples and our methodology that should
be acknowledged. Although participants from Samples 2 and 3 were randomly selected from larger
populations of children, sampling was geographically limited and somewhat age- and
range-restricted, reducing the generalizability of our findings. Future researchers should employ
larger and more representative samples of children and adolescents, as did Reynolds et al. (2013). In terms
of methodology, although the general factors modeled from each sample's data
were effectively perfectly correlated, two g-factor correlations were not exactly 1.0
(.95 and .97; see Floyd et al., 2013). This imprecision may have affected the resultant g loadings
reported in this study. Finally, g loadings change as a function of g (Detterman & Daniel, 1989). It
has been shown that the accuracy of IQs in measuring g decreases as g increases (Reynolds, 2013;
Tucker-Drob, 2009), but we did not test for these effects.
Implications
Our results have implications for psychologists' test use in schools, clinics, and employment
settings. First, the results of this study show that IQs are very strong and reasonably accurate measures
of psychometric g across children and adolescents. Thus, based on prior research, all IQs targeted
in this study should be expected to reflect the ability to learn new information, grasp concepts,
draw distinctions, and infer meaning from experience and to predict key social and health-related
outcomes later in adulthood (see Deary & Batty, 2011). Accordingly, evidence-based practices should
include IQs in some situations. For example, in school contexts, IQs provide a benchmark against
which to compare academic achievement in the present and near future, and knowledge of IQs may
be useful for matching children and adolescents who are experiencing educational difficulties to
their optimal learning environments (Frisby, 2013; Jensen, 1998a).
Second, the results of this study indicate that psychologists should be aware that IQs are not
perfectly accurate measures of psychometric g. They are affected not only by random measurement
error, as are all measurements, but also, to some extent, by lower stratum abilities. In sum, from the
perspective of three-stratum theory and CHC theory, IQs are multidimensional like every other test
score. On one hand, multidimensionality (and thus, lessened accuracy) weakens the construct
validity of IQs, if g is considered a psychometrically unitary construct (Gottfredson, 2009). On
the other hand, this multidimensionality may enhance the predictive validity of IQs, because the lower
stratum abilities affecting IQs (as evident in specificity estimates) may add to the prediction of key
outcomes if these lower stratum abilities are uniquely correlated with those outcomes (Schneider,
2013).
Test developers and others targeting psychometric g in their measurement should consider ways
to control for or reduce the effects of construct-irrelevant influences from lower stratum abilities
on IQs. For example, factor scores reflecting psychometric g appear to reduce the effects of these
confounding influences (see Schneider, 2013). This conclusion may be supported by the finding that
the WJ III GIA-Std produced some of the highest IQ g loadings in this study, as it is calculated
using methods similar to those used in calculating factor scores. Recent research by Schneider
(2013) on producing factor scores from traditional intelligence test scores indicates that this goal may
be nearing actualization in the practice of psychology. Furthermore, to increase confidence in
interpreting intelligence test scores, validity-based confidence intervals using g saturation values (viz.,
hierarchical omega values or squared g loadings) can be calculated and plotted around IQs rather than
the traditional true-score confidence intervals obtained, in part, from reliability coefficients. These
validity-based confidence intervals are described in detail by Schneider (2013) and applied in an example in
Reynolds et al. (2013). Finally, from a pragmatic perspective, researchers should investigate how to
employ measures of both psychometric g and relevant lower stratum abilities to maximize
the predictive power of IQs and other aptitude scores.
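A hedged sketch of such a validity-based interval (the formula follows a standard regression-based estimate of a latent score; the values are illustrative, and readers should consult Schneider, 2013, and Reynolds et al., 2013, for the exact procedure):

```python
import math

iq, mean, sd = 112, 100, 15
g_loading = 0.92  # g saturation used as a validity coefficient

# Regression-based (Kelley-style) estimate of the latent g score,
# expressed on the familiar IQ metric.
estimated_g = mean + g_loading * (iq - mean)

# Standard error of the estimate derived from g saturation,
# not from a reliability coefficient.
see = sd * math.sqrt(1 - g_loading ** 2)

# 95% validity-based confidence interval around the estimated g score.
ci_95 = (estimated_g - 1.96 * see, estimated_g + 1.96 * see)
```

Because the standard error here reflects imperfect g saturation rather than imperfect reliability, the resulting interval is wider than a conventional true-score confidence interval built from the same test's reliability coefficient.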
CONCLUSIONS
Intelligence tests produce measures of one of the most explanatory variables in all of the social
sciences, psychometric g, and IQs represent this construct more than reasonably well, but not
perfectly. Obtaining IQs for individual children and adolescents during special education
eligibility determination and the diagnosis of common childhood conditions continues to be a common
practice. Based on more than a century of research examining the correlates of psychometric g,
considering the IQs of individual children and adolescents may be useful when seeking answers
to their academic problems and prescribing the most effective instructional interventions. School
psychologists and other professionals should engage in evidence-based interpretations of IQs, and
researchers should continue to evaluate the extent to which interpretations of other scores and score
profiles from intelligence tests have a sufficient evidence base to support their application in practice.
REFERENCES
American Association on Intellectual and Developmental Disabilities. (2010). Intellectual disability: Definition, classification,
and systems of supports (11th ed.). Washington, DC: Author.
American Educational Research Association, American Psychological Association, & National Council on Measurement
in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational
Research Association.
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC:
Author.
Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48,
5 – 37.
Canivez, G. L. (2011). Hierarchical structure of the Cognitive Assessment System: Variance partitions from the Schmid-
Leiman (1957) procedure. School Psychology Quarterly, 26, 305 – 317.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York, NY: Cambridge University
Press.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral
sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Deary, I. J., & Batty, G. D. (2011). Intelligence as a predictor of illness, health, and death. In R. J. Sternberg & S. B. Kaufman
(Eds.), The Cambridge handbook of intelligence (pp. 683 – 707). Cambridge, UK: Cambridge University Press.
Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other and with cognitive variables are
highest for low-IQ groups. Intelligence, 13, 349 – 359.
Elliott, C. (1990). Differential Ability Scales. San Antonio, TX: Psychological Corporation.
Elliott, C. (2007). Differential Ability Scales, Second Edition. San Antonio, TX: Psychological Corporation.
Floyd, R. G., Bergeron, R., McCormack, A. C., Anderson, J. L., & Hargrove–Owens, G. L. (2005). Are Cattell–Horn–Carroll
broad ability composite scores exchangeable across batteries? School Psychology Review, 34, 329 – 357.
Floyd, R. G., Clark, M. H., & Shadish, W. R. (2008). The exchangeability of IQs: Implications for professional psychology.
Professional Psychology: Research and Practice, 39, 414 – 423.
Floyd, R. G., McGrew, K. S., Barry, A., Rafael, F. A., & Rogers, J. (2009). General and specific effects on Cattell–Horn–
Carroll broad ability composites: Analysis of the Woodcock–Johnson III Normative Update CHC factor clusters across
development. School Psychology Review, 38, 249– 265.
Floyd, R. G., Reynolds, M. R., Farmer, R. L., & Kranzler, J. H. (2013). Are the general factors from different child and
adolescent intelligence tests the same? Results from a five-sample, six-test analysis. School Psychology Review, 42,
383 – 401.
Frisby, C. L. (2013). Meeting the psychoeducational needs of minority students: Evidence-based guidelines for school
psychologists and other school personnel. Hoboken, NJ: Wiley.
Gottfredson, L. S. (2009). Logical fallacies used to dismiss the evidence on intelligence testing. In R. Phelps (Ed.), Correcting
fallacies about educational and psychological testing (pp. 11 – 65). Washington, DC: American Psychological
Association.
Gustafsson, J.-E. (2002). Measurement from a hierarchical point of view. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.),
The role of constructs in psychological and educational measurement (pp. 73 –96). Mahwah, NJ: Erlbaum.
Hunt, E. (2011). Human intelligence. Cambridge, UK: Cambridge University Press.
Individuals with Disabilities Education Improvement Act of 2004, Pub. L. No. 108-446, 20 U.S.C. § 1401 (2004).
Jensen, A. R. (1998a). The g factor and the design of education. In R. J. Sternberg & W. M. Williams (Eds.), Intelligence,
instruction, and assessment: Theory into practice (pp. 111 – 131). Mahwah, NJ: Erlbaum.
Jensen, A. R. (1998b). The g factor: The science of mental ability. Westport, CT: Praeger.
Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children, Second Edition. Circle Pines, MN:
American Guidance Service.
Keith, T. Z., Kranzler, J. H., & Flanagan, D. P. (2001). What does the Cognitive Assessment System (CAS) measure? Joint
confirmatory factor analysis of the CAS and the Woodcock–Johnson Tests of Cognitive Ability (3rd Edition). School
Psychology Review, 30, 89 – 119.
Kranzler, J. H., & Floyd, R. G. (2013). Assessing intelligence in children and adolescents: A practical guide. New York, NY:
Guilford Press.
Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American
Statistical Association, 83, 1198 – 1202.
Maynard, J. L., Floyd, R. G., Acklie, T.J., & Houston, L. (2011). General factor loadings and specific effects of the Differential
Ability Scales, Second Edition composites. School Psychology Quarterly, 26, 108 –118.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
McGrew, K. S., & Flanagan, D. P. (1998). The intelligence test desk reference (ITDR): Gf-Gc cross-battery assessment.
Boston, MA: Allyn & Bacon.
McGrew, K. S., & Woodcock, R. W. (2001). Woodcock–Johnson III technical manual. Itasca, IL: Riverside Publishing.
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Naglieri, J. A., & Das, J. P. (1997). Cognitive Assessment System. Itasca, IL: Riverside Publishing.
Phelps, L., McGrew, K. S., Knopik, S. N., & Ford, L. A. (2005). The general (g factor), broad, and narrow CHC stratum
characteristics of the WJ III and WISC-III tests: A confirmatory cross-battery investigation. School Psychology Quarterly,
20, 66 – 88.
Reynolds, M. R. (2013). Interpreting intelligence test composite scores in light of Spearman’s law of diminishing returns.
School Psychology Quarterly, 28, 63 – 76.
Reynolds, M. R., Floyd, R. G., & Niileksela, C. R. (2013). How well is psychometric g indexed by global composites? Evidence
from three popular intelligence tests. Psychological Assessment, 25, 1314 – 1321. doi:10.1037/a0034102
Reynolds, M. R., Keith, T. Z., Fine, J. G., Fisher, M. E., & Low, J. (2007). Confirmatory factor structure of the Kaufman
Assessment Battery for Children—Second Edition: Consistency with Cattell–Horn–Carroll Theory. School Psychology
Quarterly, 22, 511 – 539.
Sanders, S., McIntosh, D. A., Dunham, M., Rothlisberg, B. A., & Finch, H. (2007). Joint confirmatory factor analysis of the
Differential Ability Scales and the Woodcock–Johnson Tests of Cognitive Abilities–Third Edition. Psychology in the
Schools, 44, 119 – 138.
Schermelleh-Engel, K., Moosbrugger, H., & Müller, H. (2003). Evaluating the fit of structural equation models: Tests of
significance and descriptive goodness-of-fit measures. Methods of Psychological Research Online, 8, 23 – 74.
Schneider, W. J. (2013). What if we took our models seriously? Estimating latent scores in individuals. Journal of Psychoe-
ducational Assessment, 31, 186 – 201.
Schneider, W. J., & McGrew, K. S. (2012). The Cattell–Horn–Carroll model of intelligence. In D. P. Flanagan & P. Harrison
(Eds.), Contemporary intellectual assessment (3rd ed., pp. 99 –144). New York: Guilford Press.
Spearman, C. (1927). The abilities of man: Their nature and measurement. New York, NY: Macmillan.
Tucker-Drob, E. M. (2009). Differentiation of cognitive abilities across the lifespan. Developmental Psychology, 45,
1097 – 1118.
Urbina, S. (2011). Tests of intelligence. In R. J. Sternberg & S. B. Kaufman (Eds.), The Cambridge handbook of intelligence
(pp. 20 – 38). Cambridge, UK: Cambridge University Press.
Watkins, M. W., Wilson, S. M., Kotz, K. M., Carbone, M. C., & Babula, T. (2006). Factor structure of the Wechsler Intelligence
Scale for Children-Fourth Edition among referred students. Educational and Psychological Measurement, 66, 975 – 983.
Wechsler, D. (1991). The Wechsler Intelligence Scale for Children–Third Edition. San Antonio, TX: Psychological Corpo-
ration.
Wechsler, D. (2003). The Wechsler Intelligence Scale for Children–Fourth Edition. San Antonio, TX: Psychological Corpo-
ration.
Woodcock, R. W. (1990). The theoretical foundations of the WJ-R measures of cognitive ability. Journal of Psychoeducational
Assessment, 8, 231 – 258.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock–Johnson III Tests of Cognitive Abilities. Itasca, IL:
Riverside Publishing.
... All modeling was conducted using MPlus version 7.11 (L. K. Muthén & B. O. Muthén, 1998 based on models reported in Farmer et al. (2014). Maximum likelihood estimation with robust standard errors was used to estimate free parameters, with the assumption that data are missing at random (Baraldi & Enders, 2010). ...
... Maximum likelihood estimation with robust standard errors was used to estimate free parameters, with the assumption that data are missing at random (Baraldi & Enders, 2010). Following Keith et al. (2001), Farmer et al. (2014, and M. , the general intelligence composite score from one of the intelligence tests was correlated with the second-order g factor from the other intelligence test to produce the g loading for the composite. Higher-order models including a second-order general factor representing psychometric g were selected over bifactor models for several reasons. ...
... Although the highest composite g loading from this study exceeded the previously reported highest composite g loading only slightly (cf. Farmer et al., 2014), far lower composite g loadings than those previously reported (cf. M. were evidenced. ...
Article
Full-text available
Intelligence tests produce composite scores that are interpreted as indexes of psychometric g. Like all measures, general intelligence composites are not pure representations of their intended construct, so it is important to evaluate the score characteristics that affect accuracy in measurement. In this study, we identified three characteristics of general intelligence composite scores that vary across intelligence tests, including the number, the g loadings, and the heterogeneity of contributing subtests. We created 77 composite scores to test the influence of these characteristics in measuring psychometric g. Internal consistency reliability coefficients and g loadings were calculated for the composites. General intelligence composites most accurately index psychometric using numerous highly g-loaded subtests. Considering confidence intervals, composites stemming from 4 subtests produced scores as highly g loaded as those composites that stem from additional subtests. Discussion focuses on what methods should be use to optimally measure psychometric g and how standards in constructing composites should balance psychometric and practical considerations.
... In addition, we (Farmer, Floyd, Reynolds, & Kranzler, 2014) drew data from three validity studies, in each of which two test batteries were administered to the same sample of children and adolescents. Confirmatory factor analysis was used to create a latent variable representing the construct of psychometric g from one test battery. ...
... Reynolds and colleagues (2013) employed complete norming sample datasets, and we (Farmer et al., 2014) employed much smaller validity study samples including typically developing students. These samples were not "clinical samples," selected on the basis of some condition or diagnosis. ...
... What is a practitioner striving to represent the true constructs of interest to do? Although our results and several recent studies (Farmer, Floyd, Reynolds, and Kranzler, 2014;Reynolds et al., 2013;Schneider, 2013) have demonstrated that non-refined factor scores (a.k.a., SSS) representing psychometric g (i.e., IQs) tend to measure the targeted latent variable extremely well, non-refined factor scores representing broad or other specific abilities do not appear to represent their targeted constructs well. For example, our results and a multitude of others have revealed that their consistent intercorrelations indicate the influence of psychometric g, and one of our preliminary analyses revealed that only approximately one-third of their variance could be attributed to the broad ability they target. ...
... In every case (36 pairs of analyses), the effects of psychometric g were estimated to be higher in the former than in the latter. Thus, it is possible that, due the fact that the non-refined factor score representing psychometric g employed in the hierarchical regression analysis measures variance associated with psychometric g, as well as more specific abilities (Farmer et al., 2014;Schneider, 2013), its predictive power appears to be over-estimated (due to those more specific abilities also relating to the achievement outcomes). Most likely, as a result, the effects of measures of broad abilities were consistently nil when interpreting sr 2 estimates from the hierarchical regression analyses. ...
Article
Full-text available
Prior research examining cognitive ability and academic achievement relations have been based on different theoretical models, have employed both latent variables as well as observed variables, and have used a variety of analytic methods. Not surprisingly, results have been inconsistent across studies. The aims of this study were to (a) examine how relations between psychometric g, Cattell–Horn–Carroll (CHC) broad abilities, and academic achievement differ across higher-order and bifactor models; (b) examine how well various types of observed scores corresponded with latent variables; and (c) compare two types of observed scores (i.e., refined and non-refined factor scores) as predictors of academic achievement. Results suggest that cognitive–achievement relations vary across theoretical models and that both types of factor scores tend to correspond well with the models on which they are based. However, orthogonal refined factor scores (derived from a bifactor model) have the advantage of controlling for multicollinearity arising from the measurement of psychometric g across all measures of cognitive abilities. Results indicate that the refined factor scores provide more precise representations of their targeted constructs than non-refined factor scores and maintain close correspondence with the cognitive–achievement relations observed for latent variables. Thus, we argue that orthogonal refined factor scores provide more accurate representations of the relations between CHC broad abilities and achievement outcomes than non-refined scores do. Further , the use of refined factor scores addresses calls for the application of scores based on latent variable models.
... Factor analytic work has shown that intelligence tests measure a number of cognitive abilities (Carroll, 1993;Keith & Reynolds, 2010). IQs yielded from these tests mostly measure psychometric g or, simply, g (Farmer, Floyd, Reynolds, & Kranzler, 2014;Keith, Kranzler, & Flanagan, 2001; M. R. Reynolds, Floyd, & Niileksela, 2013), though g is measured more prominently in low IQs than in high IQs (M. R. Reynolds, 2013;Tucker-Drob, 2009). ...
Article
Intelligence tests and adaptive behavior scales measure vital aspects of human functioning. Assessment of each is a required component in the diagnosis or identification of intellectual disability. The present study investigated the population correlation between intelligence and adaptive behavior using psychometric meta-analysis. The main analysis included 148 samples with 16,468 participants. Following correction for sampling error, measurement error, and range departure, analysis resulted in a population correlation of ρ = .51. The most pertinent moderator analysis indicated that the relation between intelligence and adaptive behavior tended to decrease as IQ increased. The theoretical prevalence of intellectual disability is affected not only by IQ and adaptive behavior cut scores but also by the correlation between the two; thus, these findings inform practice and policy related to eligibility criteria and prevalence of intellectual disability.
... It is given on the IQ metric, where the mean in the standardization/norming sample is transformed to 100 and the standard deviation is transferred to 15 (Petersen et al. 1989). The FSIQ score is correlated with g (Farmer et al. 2014;Floyd et al. 2008Floyd et al. , 2013, but does not measure g. 2 Instead, the FSIQ represents some formative construct whose meaning is tied to a particular instrument (Guyon 2018). ...
Article
Full-text available
Scores derived from intelligence instruments predict many important outcomes in life, so it is not surprising that researchers and clinicians seek out interventions aimed at increasing these scores. Dixon et al. (J Behav Educ, 2019. https://doi.org/10.1007/ s10864-019-09344-7) recently investigated the relation between instruction based on relational frame theory programming and intelligence and found resulted in addition gains. Nonetheless, there are some conceptual and methodological concerns with Dixon and colleagues' study that makes the interpretation of their findings unclear. In this commentary, we address some of these concerns and re-analyze Dixon and colleagues' data. We conclude that while their intervention may have potential, more investigation is needed to support the claim that it can increase intelligence.
... xii). Results of a recent study by Farmer et al. (2014) showed that the overall IQ score on the most widely used tests of intelligence with children and youth is an excellent measure of psychometric g. Not only are widely used tests of intelligence among the most reliable and valid of all psychological measures, but the overall score on these tests is more predictive of many important social outcomes than any other measurable psychological trait independent of IQ, but especially academic achievement (e.g., Gottfredson 2018). ...
Article
Full-text available
The aim of this study was to examine whether the use of a response-to-intervention (RTI) model to identify specific learning disability (SLD) over-identifies children and youth with population-relative (normative) weaknesses in general cognitive ability (IQs < 90). We compared the overall score on the Kaufman Brief Intelligence Test-Second Edition (KBIT-2; Kaufman and Kaufman 2004) for a group of students with SLD (n = 30) who had been identified in an RTI model with a group of same-age peers (n = 249) in general education. Statistically significant differences were observed between the SLD and general education groups, with considerably lower mean scores for the SLD group. Effect sizes of the mean differences were large. Approximately three-fourths (73.3%) of the SLD group had overall scores on the KBIT-2 that were below the mean of the normative sample and almost half (43.3%) had IQ scores that are below 90. In sum, results of this study support Reynolds’ (2009) assertion that use of the RTI model for the identification of SLD over-identifies children and youth with IQs less than 90, thereby fundamentally altering the conceptualization of SLD from the traditional narrow sense (unexpected underachievement) to the broad sense (expected underachievement). A modified hybrid model is presented as an alternative method of SLD identification that addresses the main shortcomings of other current models.
... Or alternatively, a model implied correlation between g and the FSIQ was 0.91 (i.e., square root of 0.82). This correlation has been referred to as the "g loading" of the FSIQ (Farmer, Floyd, Reynolds, & Kranzler, 2014;Jensen, 1998). Those estimates were based on the 10 subtest WISC-IV FSIQ. ...
Article
The purpose of this research was to test the consistency in measurement of Wechsler Intelligence Scale for Children-Fifth Edition (WISC-V; Wechsler, 2014) constructs across the 6 through 16 age span and to understand the constructs measured by the WISC-V. First-order, higher-order, and bifactor confirmatory factor models were used. Results were compared with two recent studies using higher-order and bifactor exploratory factor analysis (Canivez, Watkins, & Dombrowski, 2015; Dombrowski, Canivez, Watkins, & Beaujean, 2015) and two using confirmatory factor analysis (Canivez, Watkins, & Dombrowski, 2016; Chen, Zhang, Raiford, Zhu, & Weiss, 2015). We found evidence of age-invariance for the constructs measured by the WISC-V. Further, both g and five distinct broad abilities (Verbal Comprehension, Visual Spatial Ability, Fluid Reasoning, Working Memory, and Processing Speed) were needed to explain the covariances among WISC-V subtests, although Fluid Reasoning was nearly equivalent to g. These findings were consistent whether a higher-order or a bifactor hierarchical model was used, but they were somewhat inconsistent with factor analyses from the prior studies. We found a correlation between Fluid Reasoning and Visual Spatial factors beyond a general factor (g) and that Arithmetic was primarily a direct indicator of g. Composite scores from the WISC-V correlated well with their corresponding underlying factors. For those concerned about the fewer numbers of subtests in the Full Scale IQ, the model implied relation between g and the FSIQ was very strong.
... Bias in the estimation of g is an important topic. There is research finding large overlaps between g factors derived from different batteries (e.g., Floyd, Reynolds, Farmer, & Kranzler, 2013; Johnson, Nijenhuis, & Bouchard, 2008; Keith, Kranzler, & Flanagan, 2001), or between IQ scores derived from one battery and the general factor derived from another (Farmer, Floyd, Reynolds, & Kranzler, 2014). Indeed, Salthouse (2014) found large correspondence between the general factors of the WAIS-IV and the Virginia Cognitive Aging Project battery. ...
Article
IQ summary scores may not involve equivalent psychological meaning for different educational levels. Ultimately, this relates to the distinction between constructs and measurements. Here, we explore this issue by studying the standardization of the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) for Spain. A representative sample of 743 individuals (374 females and 369 males) who completed the 15 subtests comprising this intelligence battery was considered. We analyzed (a) the best latent factor structure for modeling WAIS-IV subtest performance, (b) measurement invariance across educational levels, and (c) the relationships of educational level/attainment with latent factors, Full Scale IQ (FSIQ), and index factor scores. These were the main findings: (a) the bifactor model provides the best fit; (b) there is partial invariance, and therefore it is concluded that the battery is a proper measure of the constructs of interest for the educational levels analyzed (nevertheless, the relevance of g decreases at high educational levels); (c) at the latent level, g and, to a lesser extent, Verbal Comprehension and Processing Speed, are positively related to educational level/attainment; (d) despite the previous finding, we find that Verbal Comprehension and Processing Speed factor index scores have reduced incremental validity beyond FSIQ; and (e) FSIQ is a slightly biased measure of g.
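One common way to quantify how closely a unit-weighted composite such as the FSIQ tracks g under a bifactor model is omega-hierarchical, whose square root approximates the composite's g loading. A minimal sketch with hypothetical loadings (not WAIS-IV values):

```python
import numpy as np

def omega_hierarchical(g_load, group_load):
    """Omega-hierarchical for a unit-weighted composite from a bifactor model.

    g_load     : (n,) subtest loadings on the general factor
    group_load : (n, k) subtest loadings on k orthogonal group factors
    Uniquenesses are taken as 1 - communality (standardized loadings).
    """
    g_load = np.asarray(g_load, dtype=float)
    group_load = np.atleast_2d(np.asarray(group_load, dtype=float))
    common_g = g_load.sum() ** 2                         # variance due to g
    common_groups = (group_load.sum(axis=0) ** 2).sum()  # variance due to group factors
    unique = (1.0 - g_load**2 - (group_load**2).sum(axis=1)).sum()
    return common_g / (common_g + common_groups + unique)

# Four subtests, two group factors (illustrative numbers only)
g = [0.7, 0.7, 0.7, 0.7]
groups = [[0.4, 0.0], [0.4, 0.0], [0.0, 0.3], [0.0, 0.3]]
omega_h = omega_hierarchical(g, groups)
print(round(omega_h, 3), round(omega_h**0.5, 3))  # -> 0.755 0.869
```

A value of omega-hierarchical below 1 is one way of expressing the abstract's point that the FSIQ is a slightly biased, rather than pure, measure of g.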
Article
This study examined the factor structure of the Detroit Tests of Learning Abilities, Fifth Edition (DTLA‐5) using principal axis factoring, multiple factor extraction criteria, and Schmid–Leiman orthogonalization, procedures not utilized by the test publisher. Results suggest that the publisher's six‐factor model was overfactored. Although there is support for five group factors and a hierarchical general factor that correspond with some of the test's most reliable composites, there was no evidence warranting a distinction between the test's hypothesized Reasoning Abilities and Processing Abilities domains or separation of Acquired Knowledge and Verbal Comprehension subdomains. Some test users may prefer to interpret the four DTLA‐5 subdomains that correspond to some of our basic findings in a profile indicating intra‐individual strengths and weaknesses. Based on our hierarchical factor analysis and omega statistics, we believe that the DTLA‐5 should be interpreted as a measure of psychometric g and not used to examine intra‐individual strengths and weaknesses for most test‐takers.
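The Schmid–Leiman step can be sketched from its standard formulas: a subtest's loading on g is the product of its first-order loading and the second-order loading of its group factor, and the residualized group loadings are scaled by the second-order uniquenesses. The loading matrices below are hypothetical, not DTLA-5 values:

```python
import numpy as np

def schmid_leiman(first_order, second_order):
    """Schmid-Leiman orthogonalization.

    first_order  : (n_subtests, n_factors) subtest loadings on group factors
    second_order : (n_factors,) group-factor loadings on g

    Returns (g_loadings, residual_group_loadings): subtest loadings on the
    general factor and on the residualized (orthogonal) group factors.
    """
    first_order = np.asarray(first_order, dtype=float)
    second_order = np.asarray(second_order, dtype=float)
    g_loadings = first_order @ second_order
    # Scale each column by the second-order uniqueness of its factor
    residual = first_order * np.sqrt(1.0 - second_order**2)
    return g_loadings, residual

F = np.array([[0.8, 0.0],
              [0.7, 0.0],
              [0.0, 0.6],
              [0.0, 0.5]])   # hypothetical first-order loadings
s = np.array([0.9, 0.7])     # hypothetical second-order loadings on g
g_load, group_load = schmid_leiman(F, s)
print(g_load)  # -> [0.72 0.63 0.42 0.35]
```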
Chapter
This chapter begins with an overview of the eligibility and diagnostic criteria for intellectual disability (ID), continues to discuss test and score properties that are important when assessing for ID, follows with an evaluation of the Woodcock–Johnson IV (WJ IV) Tests of Cognitive Abilities (COG) in terms of it addressing these properties, and ends with a case study. Evidence supporting the use and interpretation of the General Intellectual Ability and Gf–Gc Composite from the WJ IV COG is highlighted and evaluated, and applications of these scores to practice are presented.
Article
Many school psychologists focus their interpretation on composite scores from intelligence test batteries designed to measure the broad abilities from the Cattell-Horn-Carroll theory. The purpose of this study was to investigate the general factor loadings and specificity of the broad ability composite scores from one such intelligence test battery, the Woodcock-Johnson III Tests of Cognitive Abilities Normative Update (Woodcock, McGrew, Schrank, & Mather, 2007). Results from samples beginning at age 4 and continuing through age 60 indicate that Comprehension-Knowledge, Long-Term Retrieval, and Fluid Reasoning appear to be primarily measures of the general factor at many ages. In contrast, Visual-Spatial Thinking, Auditory Processing, and Processing Speed appear to be primarily measures of specific abilities at most ages. We offer suggestions for considering both the general factor and specific abilities when interpreting Cattell-Horn-Carroll broad ability composite scores.
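Specificity in this sense is the reliable variance of a score not accounted for by the modeled common factors: reliability minus communality. A minimal sketch with hypothetical values (function name and numbers are ours):

```python
def specificity(reliability, general_loading, group_loading=0.0):
    """Reliable specific variance of a composite: its reliability minus
    the variance it shares with the modeled general and group factors."""
    communality = general_loading**2 + group_loading**2
    return reliability - communality

# Hypothetical composite: reliability .90, g loading .75, no group loading
print(round(specificity(0.90, 0.75), 4))  # -> 0.3375
```

A composite with high specificity carries reliable information beyond g, which is the basis for interpreting it as a measure of a specific ability.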
Article
Results of recent research by Kranzler and Keith (1999) raised important questions concerning the construct validity of the Cognitive Assessment System (CAS; Naglieri & Das, 1997), a new test of intelligence based on the planning, attention, simultaneous, and sequential (PASS) processes theory of human cognition. Their results indicated that the CAS lacks structural fidelity, leading them to hypothesize that the CAS Scales are better understood from the perspective of Cattell-Horn-Carroll (CHC) theory as measures of psychometric g, processing speed, short-term memory span, and fluid intelligence/broad visualization. To further examine the constructs measured by the CAS, this study reports the results of the first joint confirmatory factor analysis (CFA) of the CAS and a test of intelligence designed to measure the broad cognitive abilities of CHC theory - the Woodcock-Johnson Tests of Cognitive Abilities-3rd Edition (WJ III; Woodcock, McGrew, & Mather, 2001). In this study, 155 general education students between 8 and 11 years of age (M = 9.81) were administered the CAS and the WJ III. A series of joint CFA models was examined from both the PASS and the CHC theoretical perspectives to determine the nature of the constructs measured by the CAS. Results of these analyses do not support the construct validity of the CAS as a measure of the PASS processes. These results, therefore, question the utility of the CAS in practical settings for differential diagnosis and intervention planning. Moreover, results of this study and other independent investigations of the factor structure of preliminary batteries of PASS tasks and the CAS challenge the viability of the PASS model as a theory of individual differences in intelligence.
Article
A common concern when faced with multivariate data with missing values is whether the missing data are missing completely at random (MCAR); that is, whether missingness depends on the variables in the data set. One way of assessing this is to compare the means of recorded values of each variable between groups defined by whether other variables in the data set are missing or not. Although informative, this procedure yields potentially many correlated statistics for testing MCAR, resulting in multiple-comparison problems. This article proposes a single global test statistic for MCAR that uses all of the available data. The asymptotic null distribution is given, and the small-sample null distribution is derived for multivariate normal data with a monotone pattern of missing data. The test reduces to a standard t test when the data are bivariate with missing data confined to a single variable. A limited simulation study of empirical sizes for the test applied to normal and nonnormal data suggests that the test is conservative for small samples.
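The bivariate special case described at the end can be sketched as a pooled two-sample t statistic comparing the fully observed variable between cases with and without missing values; the function name and simulated data below are illustrative assumptions, not from the original paper:

```python
import numpy as np

def mcar_t_stat(x, y_missing):
    """Pooled two-sample t statistic comparing the fully observed
    variable x between cases where y is missing vs. observed."""
    a, b = x[y_missing], x[~y_missing]
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))

# Simulated bivariate data with MCAR missingness on y (illustration only)
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y_missing = rng.random(200) < 0.25  # which cases have y missing
t = mcar_t_stat(x, y_missing)
# Compare |t| to a t distribution with n_a + n_b - 2 df; a
# nonsignificant value is consistent with MCAR.
```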