Assessment Service Bulletin Number 12
Use of the Woodcock-Johnson III NU Tests of Cognitive Abilities and Tests
of Achievement with Canadian Populations
University of British Columbia
Kevin S. McGrew
The use of U.S.-normed tests with Canadian populations is common practice.
Few individual batteries of cognitive and achievement abilities have reported
independent validation with Canadian populations. This study examines the use of the
Woodcock-Johnson III Normative Update (WJ III NU) Tests of Cognitive Abilities and
Tests of Achievement in a random sample of 310 school-age Canadian students.
Results were compared with a matched sample of
U.S. subjects selected from the WJ III NU standardization sample using WJ III NU
norms. While some minor score differences are reported across the two samples, the
study findings generally support the use of the U.S.-based WJ III NU norms with
Canadian school-age populations.
The authors would like to thank Fredrick Schrank, Richard W. Woodcock, Mary Ruef, Peter Cameron, and
Emily Brooks for their assistance in the preparation of this manuscript, Krista Smart for her assistance with
the data collection, and David Dailey for the conversion of the original WJ III norm-based scores to scores
based on the WJ III Normative Update (NU).
Copyright © 2010 by The Riverside Publishing Company. All rights reserved. No part of this work may be
reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying
and recording or by any information storage or retrieval system without the prior written permission of
The Riverside Publishing Company unless such copying is expressly permitted by federal copyright law.
Address inquiries to Permissions, The Riverside Publishing Company, 3800 Golf Road, Suite 100, Rolling
Meadows, IL 60008-4015.
Printed in the United States of America.
Woodcock-Johnson®, Woodcock-Muñoz, WJ III, the WJ III logo, and the WJ III NU logo are trademarks or
registered trademarks of Houghton Mifflin Harcourt Publishing Company.
DAS and design (logo), Differential Ability Scales and design, KTEA, KeyMath3, WAIS®, Wechsler,
Wechsler Adult Intelligence Scale®, Wechsler Individual Achievement Test®, Wechsler Intelligence Scale
for Children®, WIAT®, WISC®, and WPPSI® are trademarks or registered trademarks of NCS Pearson,
To cite this document, use:
Ford, L., Swart, S., Negreiros, J., Lacroix, S., & McGrew, K. S. (2010). Use of the Woodcock-Johnson
III NU Tests of Cognitive Abilities and Tests of Achievement with Canadian Populations (Woodcock-
Johnson III Assessment Service Bulletin No. 12). Rolling Meadows, IL: Riverside Publishing.
For technical information, please call 800.323.9540 or visit our website at www.woodcock-johnson.com
Use of the Woodcock-Johnson III NU Tests of
Cognitive Abilities and Tests of Achievement
with Canadian Populations
In the past decade several widely used comprehensive batteries of cognitive and
achievement abilities have undergone significant revision. These revisions include the
Differential Ability Scales–Second Edition (DAS-II) (Elliott, 2007), Kaufman Assessment
Battery for Children–Second Edition (KABC-II) (Kaufman & Kaufman, 2004a), Kaufman
Tests of Educational Achievement–Second Edition (KTEA-II) (Kaufman & Kaufman,
2004b), Stanford-Binet Intelligence Scales, Fifth Edition (SB5) (Roid, 2003), Wechsler
Individual Achievement Test–Third Edition (WIAT-III) (Wechsler, 2009), Wechsler
Intelligence Scale for Children–Fourth Edition (WISC-IV) (Wechsler, 2003), Wechsler
Adult Intelligence Scale–Fourth Edition (WAIS-IV) (Wechsler, 2008a), Wechsler Preschool
and Primary Scale of Intelligence–Third Edition (WPPSI-III) (Wechsler, 2002), Woodcock-
Johnson III (WJ III) (Woodcock, McGrew, & Mather, 2001a), and the subsequent WJ III
Normative Update (WJ III NU) (Woodcock, McGrew, Schrank, & Mather, 2001, 2007).
All these test batteries have been well standardized in the United States and meet or
exceed most standards articulated in the Standards for Educational and Psychological
Testing (AERA/APA/NCME, 1999) and the International Test Commission (ITC, 2000).
While a number of special purpose or special population studies are reported on some
batteries of cognitive and achievement abilities with Canadian samples (Beal, Dumont,
Branche, & Cruse, 1996; Beal, Dumont, Cruse, & Branche, 1996; Iverson, Lange, &
Viljoen, 2006; Mark, Beal, & Dumont, 1998; Reddon, Whippler, & Reddon, 2007;
Saklofske, Tulsky, Wilkins, & Weiss, 2003; Saklofske, Hildebrand, Reynolds, & Wilson,
1998; Weiss, Saklofske, Prifitera, Chen, & Hildebrand, 1999), surprisingly only three of
the major cognitive measures, the WAIS-IV, WISC-IV, and WPPSI-III, have completed
norming studies with a Canadian population: the Wechsler Adult Intelligence Scale–
Fourth Edition, Canadian (WAIS-IV CND) (Wechsler, 2008b); the Wechsler Intelligence Scale
for Children–Fourth Edition, Canadian (WISC-IV CND) (Wechsler, 2004); and the Wechsler
Preschool and Primary Scale of Intelligence–Third Edition, Canadian (WPPSI-III CND) (Wechsler,
2003).[1] In the case of individually administered measures of academic achievement,
the KeyMath3 Diagnostic Assessment (KeyMath3) (Connolly, 2007) and the Wechsler
Individual Achievement Test–Second Edition (WIAT-II) (Wechsler, 2001) are the only
widely used U.S. achievement tests to have undergone standardization with a Canadian
sample: the KeyMath3 Canadian Edition (KeyMath3 CND) (Connolly, 2008) and the
Wechsler Individual Achievement Test–Second Edition, Canadian (WIAT-II CND) (Wechsler,
[1] Standardization studies conducted with the WISC-III and WAIS-III found significant differences in the
performance of Canadian and U.S. populations, and separate norms were published. With the subsequent
revisions of the Wechsler scales (WPPSI-III, WISC-IV, and WAIS-IV), separate standardizations
and norms have been published with Canadian samples.
[2] A standardization of the WIAT-III is currently underway, with publication anticipated in late 2010.
In the Standards for Educational and Psychological Testing (AERA/APA/NCME, 1999),
Standards 7.1 to 7.12 address issues of fairness in testing and test use. Standard 7.1 states:
When credible research reports that test scores differ in meaning across examinee
subgroups for the type of test in question, then to the extent feasible, the same
forms of validity evidence collected for the examinee population as a whole should
also be collected for each relevant subgroup. (p. 80).
Further, in the Guidelines for Educational and Psychological Testing, the Canadian
Psychological Association (CPA, 1996) indicates that the user of a test developed for
other than local-only use must understand the applicability of the test to
different groups. Norms and summary information about group differences are important,
and test users should be aware of situations in which the norms are less appropriate for one
group than another. When a test user has reason to question the use of the norms for a
specific population, it is the user's responsibility to further examine their appropriateness
(Joint Advisory Commission, 1993).
The question of whether U.S. norms are appropriate for use with Canadian
populations is not new. While a number of U.S.-normed batteries of cognitive and
achievement abilities are used extensively throughout Canada, surprisingly few
comprehensive validation and/or standardization studies are reported in the literature.
Most of the few published studies on the use of U.S. cognitive and achievement
test norms in Canada have examined differences
in the various versions and editions of the Wechsler scales standardized in the United
States and administered to Canadian populations. All have pointed to significant score
differences across the Canadian and U.S. populations with Canadian samples scoring,
on average, 2 to 5 standard score points higher than the U.S. sample, depending on the
factor or subtest (Hildebrand & Saklofske, 1996; Wechsler, 1996; 2001b; 2003; 2004;
2008b). These findings have suggested the need for Canadian standardization of the
Given the widespread use of many cognitive and achievement batteries normed in
the United States with Canadian populations for diagnosis, treatment, and program
planning, more research is needed. There is a need to determine if the U.S. norms are
“transportable” and applicable to Canadian populations and, if not, whether additional
norming with a Canadian sample is needed, and/or if special adjustments are necessary
to the norms for tests standardized in the United States to make them more applicable for
The primary purpose of the present study was to examine the comparability of WJ III
NU (Woodcock, McGrew, Schrank, & Mather, 2001, 2007) cognitive and achievement
scores in matched school-age Canadian/U.S. samples. The following research questions
guided the investigation:
a) Are there significant mean differences on the WJ III NU Tests of Cognitive Abilities test and cluster scores between matched Canadian and U.S. samples?
b) Are there significant distribution differences on the WJ III NU Tests of Cognitive Abilities test and cluster scores between matched Canadian and U.S. samples?
c) Are there significant mean differences on the WJ III NU Tests of Achievement test and cluster scores between matched Canadian and U.S. samples?
d) Are there significant distribution differences on the WJ III NU Tests of Achievement test and cluster scores between matched Canadian and U.S. samples?
Overview of the WJ III NU
The WJ III (Woodcock, McGrew, & Mather, 2001) is the most recent edition of the
Woodcock-Johnson Psycho-Educational Battery (WJ) originally published in 1977
(Woodcock & Johnson, 1977). The WJ III is based on the Cattell-Horn-Carroll (CHC)
theory of cognitive abilities (Schrank, Flanagan, Woodcock, & Mascola, 2002). The
WJ III was published in 2001 and the norms were “freshened” in the WJ III NU in
2007. Briefly, the original 2001 WJ III norms were based on year 2000 U.S. Census
projections available at the time the standardization of the WJ III commenced (1996).
Census projections are estimates of the population for future dates and are subsequently
replaced by census statistics. The 2000 census final statistics produced a somewhat
different description of the U.S. population than was available from the last projections
issued in 1996. The WJ III NU updated the WJ III norms to reflect the final U.S. 2000
census statistics. In addition, innovative bootstrap resampling methods were used in the
development of the WJ III NU norms—methods not fully developed at the time of the
2001 publication of the WJ III (see McGrew, Dailey, & Schrank, 2007 for details).
McGrew (1997) was the first to synthesize Cattell-Horn’s Gf-Gc and Carroll’s Three-
Stratum models in an attempt to provide a comprehensive integrative framework for
interpreting human cognitive abilities. The result is the CHC (Cattell-Horn-Carroll)
theory, which serves as the theoretical blueprint for the WJ III (McGrew & Woodcock,
2001a). The latest updates of contemporary CHC theory can be found in McGrew (2005)
and McGrew (2009). The theoretical underpinnings of the WJ III are different from many
other measures of cognitive ability and achievement (e.g., WISC-IV, WIAT-III, DAS-II).
In order to appropriately interpret the WJ III, an understanding of the CHC model is
The CHC model combines Carroll’s (1993) Tri-Stratum theory of intelligence and
Cattell-Horn’s Gf-Gc theory, organizing cognitive abilities into an integrated three-level
hierarchy. Carroll (1993) identified over 69 specific, or narrow, cognitive abilities at
Stratum I. The narrow abilities are subsumed under the broad (Stratum II) cognitive
ability domains of Fluid Intelligence or Reasoning (Gf), Crystallized Intelligence or
Comprehension-Knowledge (Gc), Broad Visual-Spatial Processing (Gv), Broad Auditory
Processing (Ga), and Processing Speed (Gs). At the apex of his model (Stratum III),
Carroll identified a higher-order factor above the broad factors, which he interpreted
as General Intelligence, or g. (For a more extensive discussion of the CHC model and
Carroll’s Tri-Stratum theory, see Carroll [1993] and McGrew [2005, 2009].) In the WJ
III COG, clusters represent the broad abilities (e.g., Gf, Gc, Gv) and the individual tests
(e.g., Verbal Comprehension, Retrieval Fluency) are intended to represent the narrow
The WJ III NU (Woodcock, McGrew, Schrank, & Mather, 2001, 2007) is a
comprehensive measure of cognitive abilities and achievement organized into three
distinct, co-normed test batteries: the Woodcock-Johnson III NU Tests of Cognitive Abilities
(WJ III NU COG); the Woodcock-Johnson III Diagnostic Supplement to the Tests of Cognitive
Abilities (WJ III DS); and the Woodcock-Johnson III NU Tests of Achievement (WJ III NU
ACH). The WJ III is designed to measure a wide array of cognitive, oral language, and
academic achievement abilities for individuals from preschool (2 years) through the
geriatric (90+ years) age levels.
Each battery is organized into a Standard and Extended battery that can be used
independently, together, or in conjunction with other tests (including tests from the WJ
III DS). In addition to the CHC clusters, the complete set of 31 WJ III NU cognitive tests
(20 in the original WJ III cognitive battery plus 11 in the WJ III DS) is also organized
into three broader categories related to cognitive performance (Cognitive Performance
Model [CPM] Clusters): Verbal Ability, Thinking Ability, and Cognitive Efficiency;
and five clinical clusters: Broad Attention, Executive Functioning, Working Memory,
Cognitive Efficiency, and Phonemic Awareness. The 22 achievement tests are organized
by curricular area (reading, mathematics, written language, and academic knowledge)
and oral language and by clusters within these areas (e.g., Basic Reading Skills, Math
Reasoning), with additional groupings for special purpose clusters (e.g., Academic Skills,
Phoneme/Grapheme Knowledge). These batteries have particular diagnostic utility
because they encourage examiners to be selective in their testing and select different
evaluation tools based on specific referrals.
Like the earlier versions of the WJ, the WJ III has been viewed as state of the art
in the individual measurement of cognitive abilities and achievement (Cizek, 2003;
Cummings, 1995; Hicks & Bolan, 1996; Lee & Stefany, 1995; Sandoval, 2003). The
WJ has long been one of the most widely used individually administered academic
achievement batteries. Furthermore, the WJ III COG is being taught as a primary
measure of intelligence in over one third of all school psychology training programs
across the United States and Canada (Braden & Alfonzo, 2003; Ford, Percy, & Negreiros,
2010). Its strong psychometric properties, the co-normed tests of cognitive abilities and
achievement, its utility for use with individuals throughout the lifespan, and features
that assist in understanding unique processing strengths and weaknesses contribute to its
frequent use in Canada. The widespread use of the WJ III NU in the absence of norm
transportability research heightens the importance of the current investigation.
The study comprises two matched samples: one strategically sampled from Canada
and one drawn from the WJ III NU standardization subjects obtained in the United
States. This section describes the sample selection and comparison procedures.
The Canadian sample consisted of 341 English-speaking school-aged children from
three geographical areas (Western Canada, Central Canada, and Atlantic Canada). The
sampling procedures mirrored those used in the standardization of the WJ III (McGrew
& Woodcock, 2001a). A three-stage procedure of sampling communities, then schools,
and finally subjects was used to identify and select a sample that would be broadly
representative of the English-speaking Canadian school-age population. Communities
were sampled by census region and type of community as defined by Statistics Canada
(1996). Participants were obtained from six provinces (British Columbia, Saskatchewan,
Manitoba, Ontario, Prince Edward Island, and Newfoundland). Communities were
targeted for selection within each of the three geographical areas based on geographic
distribution, size of community, and socioeconomic status (SES) characteristics (high,
average, and low SES communities). School board participation was then solicited
from the targeted communities. When school board participation was not obtained for
a targeted community, a similarly matched community from the same geographic area
was identified and school board participation was subsequently sought. In summary,
final inclusion of a community in the sample reflects: a) a targeted community based on
geographical area, community size, and community SES and b) school board agreement
to participate in the study.
In small communities, testing was conducted in all schools. In larger communities,
testing was conducted in a subset of schools. The general guideline for selecting the
subset of schools was to obtain an equal distribution of schools in high versus low
SES areas. This guideline was specified to avoid any potential selection bias. To best
represent a cross-section of students in the community, Catholic schools were included
in communities where these schools were available and agreement to participate was
obtained. Thirty-four Catholic schools were included in the study.
Sampling of participants was based on a quota-by-grade level criterion. The
solicitation of subjects was entirely random. The permission forms included subject
identifying information (e.g., date of birth, grade, sex, and ethnic origin), mother’s and
father’s education level, and mother’s and father’s current occupation. Any subject who
had less than 1 year of experience in an English-only classroom was excluded from the
From among the returned permission forms, subjects were selected based on the
identified subject-level variables needed to fill the sampling plan (male versus female,
highest grade completed by parents, ethnic origin) and were subsequently tested at
school. Although the total sample was 341 in grades kindergarten through 12, only 310
students in grades 1 through 12 were included in the present study due to missing data
from some tests or clusters. The 310 Canadian children ranged in age from 6 years,
8 months to 19 years, 5 months (M = 149.41 months, SD = 40.37) and were closely
distributed by sex (148 males and 162 females).
U.S. Matched Sample
The 310 Canadian subjects served as the foundation for the U.S. matched sample that
was selected from the 8,782 participants in the WJ III standardization sample. A U.S.
subject that best matched each Canadian subject was selected from the complete WJ III
Subject matching was based on a hierarchical sequence of matching variables—age
(in months), parent education (highest level of either mother’s or father’s education),
race/ethnicity (white or nonwhite), and sex. If more than one U.S. subject met the
match criteria, a U.S. subject was randomly selected from the available pool. Although
a concerted effort was made to collect common demographic indicators across the
two samples, an exact match was not possible, given differences in the way census
information and demographic variables are defined in Canada and in the United States.
For example, Statistics Canada defines ethnic groups according to ancestry (e.g., British,
French or European, Multiple Origins, or Other); the U.S. Census categorizes individuals
according to race (e.g., White, Black/African-American, American Indian, Asian/Pacific
Islander) and Hispanic origin (Hispanic or non-Hispanic). The 310 U.S. subjects selected
ranged in age from 6 years, 8 months to 19 years, 5 months (M = 149.74 months, SD =
40.11) and were closely distributed by sex (150 males and 160 females).
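The hierarchical matching described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the `Subject` fields, the integer coding of parent education, the nearest-age fallback, the rule of skipping a variable when no candidate agrees on it, and matching without replacement are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    subject_id: str
    age_months: int
    parent_education: int  # hypothetical ordinal code for highest parent education
    white: bool            # white versus nonwhite, as in the matching variables
    sex: str


def match_one(target, pool, rng):
    """Narrow the pool one matching variable at a time, in priority order."""
    # 1. Age in months: keep the candidates closest in age (assumed fallback).
    best_gap = min(abs(s.age_months - target.age_months) for s in pool)
    candidates = [s for s in pool
                  if abs(s.age_months - target.age_months) == best_gap]
    # 2-4. Parent education, race/ethnicity, then sex. A variable is skipped
    # when no remaining candidate agrees on it (assumption, not from the study).
    for attr in ("parent_education", "white", "sex"):
        narrowed = [s for s in candidates
                    if getattr(s, attr) == getattr(target, attr)]
        if narrowed:
            candidates = narrowed
    # Ties are broken at random, as the text describes.
    return rng.choice(candidates)


def match_samples(targets, pool, seed=0):
    """Pair each target with one pool subject, without replacement (assumed)."""
    rng = random.Random(seed)
    remaining = list(pool)
    pairs = []
    for target in targets:
        match = match_one(target, remaining, rng)
        remaining.remove(match)
        pairs.append((target, match))
    return pairs


# Hypothetical records, not study data.
canadian = [Subject("C1", 120, 3, True, "F"), Subject("C2", 121, 1, False, "M")]
us_pool = [
    Subject("U1", 120, 3, True, "M"),
    Subject("U2", 120, 3, True, "F"),
    Subject("U3", 121, 3, True, "F"),
    Subject("U4", 120, 2, True, "F"),
]
pairs = match_samples(canadian, us_pool)
print([(t.subject_id, m.subject_id) for t, m in pairs])
```

Because each variable only filters the candidates left by the previous one, earlier variables (age first) take priority over later ones, which is what makes the sequence hierarchical.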
A comparison of the two samples on the matching and other variables revealed a
high degree of comparability. Chi-square analyses revealed no significant differences
in frequencies of subjects in the Canadian and U.S. samples as a function of parent
education level (chi-square = 4.56; df = 4, p = 0.34), race (chi-square = 0.37; df = 1,
p = 0.54), or gender (chi-square = 0.03; df = 1, p = 0.87). Comparison of the ages
(in months) of the Canadian and U.S. samples (t test) revealed no significant difference
(M difference = 0.40, t (618) = 0.12, p = 0.90). A similar t-test comparison of grade
placement in tenths of a year (M difference = 0.29, t (618) = 1.09, p = 0.27) also was not
significant. Comparisons of the distributional characteristics of the two samples also
suggested strong comparability of the two samples. Summary statistics for the matching
variables are presented in Table 1.
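As a worked check of the gender comparison, the sketch below computes a Pearson chi-square from the male and female counts reported in Table 1. It assumes no continuity correction, an assumption that happens to reproduce the reported values; the function name `chi_square_2x2` is illustrative only.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With df = 1, the upper-tail probability has a closed form via erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: Canadian, U.S.; columns: male, female (counts from Table 1).
sex_counts = [[148, 162], [150, 160]]
chi2, p = chi_square_2x2(sex_counts)
print(f"chi-square = {chi2:.2f}, df = 1, p = {p:.2f}")
# prints: chi-square = 0.03, df = 1, p = 0.87
```

The same function would not apply directly to the parent education comparison, which has df = 4 and therefore needs the general chi-square distribution.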
Table 1. Sample Characteristics (N = 310)

                                              Canadian          U.S.
Characteristic                                N      %          N      %
Sex
  Male                                        148    47.7       150    48.4
  Female                                      162    52.3       160    51.6
Grade
  1–4                                         91     29.4       99     31.9
  5–8                                         105    33.8       112    36.2
  9–12                                        114    36.8       99     31.9
Father’s Education Level
  < High School Diploma                       76     24.5       5      1.6
  High School Diploma                         62     20.0       50     16.1
  Post Secondary/Diploma                      75     24.2       80     25.8
  University Degree                           80     25.8       90     29.1
  Not Reported                                17     5.5        85     27.4
Mother’s Education Level
  < High School Diploma                       65     21.0       47     15.2
  High School Diploma                         61     19.7       84     27.1
  Post Secondary/Diploma                      105    33.8       107    34.5
  University Degree                           75     24.2       72     23.2
  Not Reported                                4      1.3        0      0
Race/Ethnicity
  White/Anglo/European                        247    79.7       252    81.3
  Asian-Pacific Islander                      46     14.8       13     4.2
  First Nations/Aboriginal/Native American    10     3.2        7      2.3
  Black/African/African American              7      2.3        38     12.2
  Hispanic (a)                                4      1.3        33     10.6
Type of Community
  Central Place                               89     28.7       81     26.1
  Urban Fringe                                76     24.5       75     24.2
  10,000 to 50,000                            38     12.3       73     23.6
  <10,000                                     107    34.5       81     26.1

(a) Calculated independently of the other ethnic categories. Does not figure in total percentage.
Canadian participants in this study were administered selected tests from the
standardization edition of the Woodcock-Johnson III Tests of Cognitive Abilities (WJ III
COG) (Woodcock, McGrew, & Mather, 2001c) and the Woodcock-Johnson III Tests of
Achievement (WJ III ACH) (Woodcock, McGrew, & Mather, 2001b) and the complete
Wechsler Intelligence Scale for Children–Third Edition (WISC-III) (Wechsler, 1991). Only
the WJ III COG tests that comprise the Broad CHC Ability clusters and the WJ III ACH
tests that comprise the primary academic clusters and two oral language tests were
included in the present study. Tables 2 and 3 describe the tests, clusters, and abilities
measured by the WJ III. Students were administered U.S. standardization versions of
the WJ III tests, and scores were calculated with the WJ III NU norms. WJ III NU ACH
Form B is the Canadian version of the WJ III NU ACH and was used in the present
study. Most of the WJ III NU ACH Form B Canadian is identical to
the WJ III NU ACH Form B; however, a number of items were changed to more
appropriately reflect Canadian content (e.g., coins, measurement, spelling).
Table 2. Descriptions of WJ III NU COG Clusters and Tests Used in the Study

Cluster, Tests, and Descriptions

Intellectual Ability Clusters

  General Intellectual Ability–Standard
    Global score considered to be the best single-score predictor of a performance, on average, across a wide
    variety of academic and cognitive outcomes; the single best (psychometric) measure of g. Includes one
    measure of each CHC ability.

  General Intellectual Ability–Extended: Tests 1–7 and 11–17
    A broader global score considered to be the single best (psychometric) measure of theoretical g. Includes
    two measures of each CHC ability.

  Brief Intellectual Ability (BIA): Tests 1, 5, & 6
    A brief measure of intelligence. Useful in screening.

Cognitive Performance Clusters

  Tests 1 & 11
    A measure of language-based acquired knowledge development that includes the comprehension of
    individual words and the comprehension of relationships among words and the ability to communicate that

  Tests 2, 3, 4, & 5
  Tests 2, 3, 4, 5, 12, 13, 14, & 15
    Represents different thinking processes invoked when information in short-term memory cannot be

  Tests 6 & 7
  Tests 6, 7, 16, & 17
    Represents the capacity of the cognitive system to process information automatically.

Broad Cognitive CHC Ability

  Comprehension Knowledge (Gc): Tests 1 & 11
    Test 1: Verbal Comprehension: Identifying Objects: Knowledge of synonyms and antonyms; completing
    Test 11: General Information: Identifying where objects are found and what people typically do with an

  Long-Term Retrieval (Glr): Tests 2 & 12
    Test 2: Visual Auditory Learning: Learning and recalling pictographic representations of words.
    Test 12: Retrieval Fluency: Naming as many examples as possible from a given category.

  Visual-Spatial Thinking (Gv): Tests 3 & 13
    Test 3: Spatial Relations: Identifying the subset of pieces needed to form a complete shape.
    Test 13: Picture Recognition: Identifying a subset of previously presented pictures within a field of

  Auditory Processing (Ga): Tests 4 & 14
    Test 4: Sound Blending: Synthesizing phonemes.
    Test 14: Auditory Attention: Identifying orally presented words amid increasingly intense background noise.

  Fluid Reasoning (Gf): Tests 5 & 15
    Test 5: Concept Formation: Identifying, categorizing, and determining rules.
    Test 15: Analysis-Synthesis: Analyzing puzzles (using symbolic formulations) to determine missing

  Processing Speed (Gs): Tests 6 & 16
    Test 6: Visual Matching: Rapidly locating and circling identical numbers from a defined set of numbers.
    Test 16: Decision Speed: Locating and circling two pictures most similar conceptually in a row.

  Short-Term Memory (Gsm): Tests 7 & 17
    Test 7: Numbers Reversed: Holding a span of numbers in immediate awareness while reversing the
    Test 17: Memory for Words: Repeating a list of unrelated words in correct sequence.
Table 3. Descriptions of WJ III NU ACH Clusters and Tests Used in the Study

  Tests 1, 2, 5, 6, 7, 8, 9, 10, & 11
    Provides an overall score of achievement.

  Tests 1, 2, & 9
    Test 2: Reading Fluency: Reading printed statements rapidly and responding true or false.

  Basic Reading Skills: Tests 1 & 13
    Test 1: Letter-Word Identification: Identifying and pronouncing printed letters and words; sight word
    Test 13: Word Attack: Pronouncing nonwords that conform to English spelling rules.

  Tests 9 & 17
    Test 9: Passage Comprehension: Identifying a missing key word that makes sense in the context of a written
    Test 17: Reading Vocabulary: Reading words and providing synonyms, and antonyms; completing

  Tests 5, 6, & 10

  Math Calculation Skills: Tests 5 & 6
    Test 5: Calculation: Performing various mathematical calculations.
    Test 6: Math Fluency: Adding, subtracting, and multiplying rapidly.

  Tests 10 & 18
    Test 10: Applied Problems: Analyzing and solving orally presented, practical mathematical problems.
    Test 18: Quantitative Concepts: Identifying math terms and formulae; identifying number patterns.

Written Language (Grw)

  Broad Written Language: Tests 7, 8, & 11

  Basic Writing Skills: Tests 7 & 16
    Test 7: Spelling: Spelling letter combinations that are regular patterns in written English.
    Test 16: Editing: Identifying and correcting errors in written passages.

  Tests 8 & 11
    Test 8: Writing Fluency: Formulating and writing simple sentences rapidly.
    Test 11: Writing Samples: Writing meaningful sentences for a given purpose.

Special Purpose Clusters

  Tests 1, 5, & 7
    Overall measure of basic achievement skills.

  Tests 2, 6, & 8
    Overall measure of academic fluency.

  Tests 9, 10, & 11
    Overall measure of application of academic knowledge.

  Test 19
    A measure of information in curricular areas of science, social studies, and humanities.
Two sets of analyses were completed to determine if the WJ III NU scores from the
Canadian sample were similar to the scores for the U.S. sample. The first analysis
evaluated the comparability of the U.S. and Canadian samples based on the similarity
of each sample's distribution (variance) of general intelligence and overall achievement.
The second analysis evaluated mean score differences for the WJ III NU cognitive and
achievement clusters and individual tests.
Canadian/U.S. Sample Comparability
A two-sample test for the equality of variance for the WJ III NU General Intellectual
Ability Index–Extended (GIA-Ext) score was nonsignificant (F = 1.14, p = .26),
indicating that the distribution of general intelligence in the U.S. (W-score variance
= 195.46) and Canadian (W-score variance = 172.00) samples was not significantly
different. The two-sample test for the equality of variance for the WJ III NU Total
Achievement cluster score was also nonsignificant (F = 0.99, p = .93), indicating
that the distribution of overall achievement (reading, math, and written language
combined) in the U.S. (W-score variance = 486.81) and Canadian (W-score variance
= 492.01) samples was also similar. These findings indicate that the sample matching
process was successful in producing two samples that were similar in their distribution of
general intelligence and overall academic achievement abilities.
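The F statistics above appear to be simple ratios of the two W-score variances. The short sketch below reproduces that arithmetic. Placing the U.S. variance in the numerator is an inference, made because it recovers the reported values; the text does not state the ordering. The p values are not recomputed here because they require the F distribution, which is outside the Python standard library.

```python
def variance_ratio(variance_a, variance_b):
    """F statistic for a two-sample equality-of-variance test: the ratio of the variances."""
    return variance_a / variance_b


# W-score variances reported in the text (U.S. in the numerator, by assumption).
gia_f = variance_ratio(195.46, 172.00)          # General Intellectual Ability-Extended
achievement_f = variance_ratio(486.81, 492.01)  # Total Achievement

print(f"GIA-Ext F = {gia_f:.2f}; Total Achievement F = {achievement_f:.2f}")
# prints: GIA-Ext F = 1.14; Total Achievement F = 0.99
```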
Canadian and U.S. WJ III NU Score Comparisons
To better understand the performance of the Canadian sample compared to the U.S.
standardization sample, means and standard deviations of the tests and clusters for both
the Canadian and U.S. sample were calculated. Paired sample t tests (see Table 4) were
calculated to evaluate differences between the results of the Canadian and U.S. samples.
Due to the large number of t tests conducted, which can produce significant findings
based on chance alone, each set of t-test comparisons was evaluated against familywise
Bonferroni-corrected p values. The results indicate that while the U.S. sample typically
scored slightly higher than the Canadian sample on the WJ III NU COG clusters, the
differences were not statistically significant, with one exception: the Long-Term Retrieval
cluster. The mean General Intellectual Ability–Extended (GIA-Ext) score
for the U.S. sample (M = 100.74, SD = 15.77) did not differ significantly from the mean
for the Canadian sample (M = 98.88, SD = 13.73). Specific CHC cluster scores
for the Canadian sample ranged from 95.05 on the Long-Term Retrieval cluster to 100.78
on the Short-Term Memory cluster, while the range for the U.S. sample was from 100.09
on the Fluid Reasoning cluster to 102.05 on the Short-Term Memory cluster. While the
overall standard deviation is somewhat smaller for the Canadian sample, indicating a
slightly more restricted range than the range of scores for the U.S. sample, the previously
discussed tests of the equality of variances for general intelligence and total achievement
indicate that, overall, these differences are not significant.
Table 4. Means and Standard Deviations of the WJ III NU Clusters for the Canadian (CND) and U.S. Samples (N = 310)
Cluster                                            CND M  CND SD  U.S. M  U.S. SD   Diff.        t      p
General Intellectual Ability–Standard (GIA-Std)    98.57   13.52  100.52    15.72    1.95     1.84   .066
General Intellectual Ability–Extended (GIA-Ext)    98.88   13.73  100.74    15.77    1.86     1.78   .075
Brief Intellectual Ability (BIA)                  100.62   13.06  100.78    15.48     .16      .15   .882
Verbal Ability–Standard                           101.25   14.37  100.08    15.91   -1.17     1.10   .276
Verbal Ability–Extended                            99.98   14.45  100.42    16.39     .44      .41   .685
Thinking Ability–Standard                          98.05   13.52  101.01    15.66    2.96*    2.75   .006
Thinking Ability–Extended                          98.79   13.36  101.35    14.88    2.56*    2.52   .012
Cognitive Efficiency–Standard                      99.79   16.12  101.65    15.26    1.86     1.47   .143
Cognitive Efficiency–Extended                     101.11   15.69  102.55    15.42    1.44     1.22   .225
Comprehension-Knowledge                            99.98   14.45  100.42    16.39     .44      .41   .685
Long-Term Retrieval                                95.05   15.59  100.52    15.09    5.47*    4.84   .000
Visual-Spatial Thinking                            97.90   15.02  100.23    14.06    2.33*    2.08   .038
Auditory Processing                               100.18   14.62  101.07    16.21     .89      .77   .441
Fluid Reasoning                                    98.92   13.08  100.09    15.51    1.17     1.07   .284
Processing Speed                                  100.07   15.92  101.39    14.45    1.32     1.16   .248
Short-Term Memory                                 100.78   16.00  102.05    15.62    1.27     1.04   .299
ᵃThe 16 cognitive cluster comparisons were evaluated at the p = .003 (p = .05/16 = .003) level of significance to reflect an overall familywise error rate per
the Bonferroni adjustment. N = 310 for all clusters for both samples. All t tests had df = 309.
*Designates differences significant at the .05 level.
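The Bonferroni screening described in the table note can be sketched in a few lines of Python. This is an illustrative check, not the authors' analysis code: the p values are copied from the cluster table as printed (so ".000" is entered as 0.0), and the function and variable names are hypothetical.

```python
# Illustrative sketch: apply a Bonferroni-adjusted threshold to the printed p values.
# The p values are transcribed from the cluster table; ".000" is entered as 0.0.
# All names are hypothetical and not part of any WJ III software.

CLUSTER_P_VALUES = {
    "GIA-Std": 0.066,
    "GIA-Ext": 0.075,
    "BIA": 0.882,
    "Verbal Ability-Std": 0.276,
    "Verbal Ability-Ext": 0.685,
    "Thinking Ability-Std": 0.006,
    "Thinking Ability-Ext": 0.012,
    "Cognitive Efficiency-Std": 0.143,
    "Cognitive Efficiency-Ext": 0.225,
    "Comprehension-Knowledge": 0.685,
    "Long-Term Retrieval": 0.0,
    "Visual-Spatial Thinking": 0.038,
    "Auditory Processing": 0.441,
    "Fluid Reasoning": 0.284,
    "Processing Speed": 0.248,
    "Short-Term Memory": 0.299,
}


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison level that holds the familywise error rate at alpha."""
    return alpha / n_comparisons


def flag_significant(p_values: dict, threshold: float) -> list:
    """Names of the comparisons whose p value falls below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


threshold = bonferroni_threshold(0.05, len(CLUSTER_P_VALUES))
unadjusted = flag_significant(CLUSTER_P_VALUES, 0.05)
adjusted = flag_significant(CLUSTER_P_VALUES, threshold)

print(f"Adjusted threshold: {threshold:.4f}")
print("Below .05 before adjustment:", unadjusted)
print("Below the adjusted threshold:", adjusted)
```

With the printed values, four clusters fall below the unadjusted .05 level, but only Long-Term Retrieval falls below .05/16.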
A review of the CPM clusters (see Table 4) again indicates that the U.S. sample scores
were slightly higher. However, no significant differences were found between the
two samples on any of the CPM clusters. Given that the Thinking Ability cluster is
composed of tests from the Fluid Reasoning, Long-Term Retrieval, Auditory Processing,
and Visual-Spatial Thinking clusters, the somewhat larger difference on that cluster is
not surprising. An examination of the tests that make up the Long-Term Retrieval cluster
revealed that the Visual-Auditory Learning test was the primary reason for the significant
difference on that cluster. A summary of the standard scores and differences across the
WJ III NU COG tests for the Canadian and U.S. samples is presented in Table 5. A review of
the mean score test comparisons (Table 5) indicated that on the 14 primary cognitive
tests, the samples differed significantly only on the Visual-Auditory Learning test, with
the Canadian subjects (M = 94.09, SD = 16.78) scoring approximately 6 standard score
points lower than the U.S. subjects (M = 100.28, SD = 15.40), t(309) = 5.27, p < .001.
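To see why a result such as t(309) = 5.27 is listed in the tables with p = .000, the two-tailed p value can be recomputed from the t statistic and its degrees of freedom. The sketch below integrates the Student t density numerically with Simpson's rule using only the Python standard library. The function names, the integration cutoff, and the step count are assumptions chosen for illustration; a statistics package would normally be used instead.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat: float, df: int, upper: float = 60.0, steps: int = 20000) -> float:
    """Two-tailed p value: twice the upper-tail area beyond |t_stat|.

    The tail is integrated from |t_stat| to `upper` with Simpson's rule
    (`steps` must be even); the area beyond `upper` is treated as zero,
    which is only a reasonable assumption for moderately large df.
    """
    a = abs(t_stat)
    if a >= upper:
        return 0.0
    h = (upper - a) / steps
    total = t_pdf(a, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(a + i * h, df)
    return 2 * total * h / 3


p_val = two_tailed_p(5.27, 309)
print(f"t(309) = 5.27 gives a two-tailed p of about {p_val:.1e}")
print("Rounds to .000 at three decimals:", p_val < 0.0005)
```

Any p value below .0005 rounds to .000 at three decimals, which is why such results are conventionally reported as p < .001.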
Table 5. Means and Standard Deviations of the WJ III NU COG Tests for the Canadian (CND) and U.S. Samples (N = 310)
Test                           CND M  CND SD  U.S. M  U.S. SD   Diff.        t      p
Verbal Comprehension          101.25   14.37  100.08    15.91   -1.17    -1.09   .277
General Information            99.16   14.11  101.15    16.18    1.99     1.86   .063
Visual-Auditory Learning       94.09   16.78  100.28    15.40    6.19*    5.27   .000
Retrieval Fluency              98.24   14.37   99.86    15.11    1.62     1.38   .168
Spatial Relations              99.05   16.29  100.59    15.53    1.54     1.21   .228
Picture Recognition            97.71   15.03   99.75    13.78    2.04     1.79   .074
Sound Blending                 98.41   14.05   99.83    16.41    1.42     1.27   .207
Auditory Attention            101.97   14.49  101.04    12.80    -.93     -.86   .391
Concept Formation              98.78   12.26   99.99    16.02    1.21     1.11   .268
Analysis-Synthesis             99.86   14.52  100.64    14.52     .78      .70   .484
Visual Matching               100.65   15.12  100.81    13.98     .16      .14   .887
Decision Speed                 98.95   15.94  101.26    14.98    2.31     1.96   .051
Numbers Reversed               99.11   16.65  101.44    15.61    2.33     1.78   .076
Memory for Words              101.81   15.84  101.47    15.34    -.34     -.29   .774
ᵃThe 14 cognitive test comparisons were evaluated at the p = .004 (p = .05/14 = .004) level of significance to reflect an overall familywise error rate per the
Bonferroni adjustment. N = 310 for all tests for both samples. All t tests had df = 309.
*Designates differences significant at the .05 level.
The means and standard deviations, as well as the mean score comparisons, for the WJ
III NU Tests of Achievement clusters and tests for both samples are summarized in Tables
6 and 7, respectively. While the Canadian sample scored slightly higher (M = 101.30,
SD = 14.16) than the U.S. sample (M = 100.90, SD = 15.37) on the Total Achievement
cluster, the difference was not statistically significant (t = -.37, p = .715). The
achievement cluster scores are more variable in direction: the Canadian sample scored
higher than the U.S. sample, although not significantly, on 7 of the 14 WJ III NU ACH
clusters (e.g., Broad Reading, Broad Written Language, Math Reasoning, Academic Fluency).
However, it is important to note that the Canadian and U.S. samples displayed no
statistically significant achievement cluster differences.
At the test level (Table 7), five statistically significant differences are noted (Reading
Fluency, Reading Vocabulary, Quantitative Concepts, Editing, and Oral Comprehension).
The Canadian sample scored significantly higher on the Reading Fluency, Quantitative
Concepts, and Oral Comprehension tests, while the U.S. sample scored significantly
higher on the Reading Vocabulary and Editing tests.
Table 6. Means and Standard Deviations of the WJ III NU ACH Clusters for the Canadian (CND) and U.S. Samples (N = 310)
Cluster                        CND M  CND SD  U.S. M  U.S. SD   Diff.        t      p
Total Achievement             101.30   14.16  100.90    15.37    -.40     -.37   .715
Broad Reading                 102.07   15.21  101.44    15.69    -.63     -.56   .573
Basic Reading Skills          100.52   15.21  101.45    15.55     .93      .81   .416
Reading Comprehension          93.59   14.58   96.29    15.73    2.70*    2.63   .009
Broad Math                     99.38   15.65  100.97    15.95    1.59     1.27   .204
Math Calculation Skills        98.33   15.64  100.63    15.67    2.30     1.91   .058
Math Reasoning                101.49   14.26  100.22    15.90   -1.27    -1.09   .278
Broad Written Language        101.48   14.67   99.54    14.41   -1.94    -1.81   .070
Basic Writing Skills          100.09   14.84  101.96    15.88    1.87     1.87   .123
Written Expression            100.32   12.05   99.77    13.48    -.55    -.574   .566
Academic Skills               101.83   14.82  101.41    16.31    -.42     -.35   .726
Academic Fluency               99.89   14.12   98.73    14.50   -1.16    -1.06   .289
Academic Applications         100.53   13.90  100.55    15.13     .02     .019   .984
Academic Knowledge             98.51   12.74  100.44    15.69    1.93     1.87   .062
ᵃThe 14 achievement cluster comparisons were evaluated at the p = .004 (p = .05/14 = .004) level of significance to reflect an overall familywise error rate
per the Bonferroni adjustment. N = 310 for all clusters for both samples. All t tests had df = 309. No significant differences were noted.
Table 7. Means and Standard Deviations of the WJ III NU ACH Tests for the Canadian (CND) and U.S. Samples (N = 310)
Test                           CND M  CND SD  U.S. M  U.S. SD   Diff.        t      p
Letter-Word Identification    101.97   15.19  102.34    16.42     .37      .31   .753
Word Attack                    99.07   12.59  100.37    13.74    1.30     1.19   .233
Reading Fluency               102.31   14.33   99.09    14.04   -3.22*   -3.02   .003
Passage Comprehension         100.04   15.30  100.68    15.15     .64      .58   .559
Reading Vocabulary             90.29   11.93   94.00    14.02    3.71*    4.33   .000
Calculation                    98.25   15.84  100.56    16.29    2.31     1.84   .067
Math Fluency                   99.40   14.78  100.80    13.98    1.40     1.25   .124
Applied Problems              100.24   14.74  100.78    14.37     .54      .46   .644
Quantitative Concepts         102.41   13.82   98.83    16.57   -3.58*   -3.07   .002
Spelling                      102.87   15.46   99.81    15.17   -3.06*   -2.47   .014
Editing                        97.02   15.01  101.43    18.99    4.41*    2.95   .003
Writing Fluency                97.92   12.59   98.25    13.94     .33      .33    .74
Writing Samples               102.18   11.34  100.85    12.50   -1.33    -1.51   .132
Picture Vocabulary            102.26   14.82   99.94    15.43   -2.32*   -2.18    .03
Oral Comprehension            107.03   11.72  100.73    14.40   -6.30*   -6.24   .000
ᵃThe 15 achievement test comparisons were evaluated at the p = .003 (p = .05/15 = .003) level of significance to reflect an overall familywise error rate per the
Bonferroni adjustment. N = 310 for all tests for both samples. All t tests had df = 309.
*Designates differences significant at the .05 level.
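Because statistical significance with N = 310 says little about the size of a difference, it can help to express the test-level differences in standard deviation units. The sketch below computes Cohen's d from the tabled means and standard deviations. It is an illustrative addition, not an analysis reported in the study: the pooled-SD formula (the root mean of the two variances, which assumes equal sample sizes) and the function name are assumptions, and other effect size definitions exist for matched designs.

```python
import math


def cohens_d(mean_us: float, sd_us: float, mean_cnd: float, sd_cnd: float) -> float:
    """Standardized U.S.-minus-Canadian mean difference with a pooled SD (equal n assumed)."""
    pooled_sd = math.sqrt((sd_us ** 2 + sd_cnd ** 2) / 2)
    return (mean_us - mean_cnd) / pooled_sd


# (CND M, CND SD, U.S. M, U.S. SD) transcribed from the achievement test table
rows = {
    "Oral Comprehension": (107.03, 11.72, 100.73, 14.40),
    "Reading Vocabulary": (90.29, 11.93, 94.00, 14.02),
    "Editing": (97.02, 15.01, 101.43, 18.99),
}

effect_sizes = {
    name: cohens_d(us_m, us_sd, cnd_m, cnd_sd)
    for name, (cnd_m, cnd_sd, us_m, us_sd) in rows.items()
}

for name, d in effect_sizes.items():
    print(f"{name}: d = {d:+.2f}")
```

Negative values indicate a higher Canadian mean, matching the sign convention of the Diff. column. Under these assumptions the largest difference, Oral Comprehension, is a little under half a standard deviation.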
While a handful of statistically significant mean score differences were
found between matched U.S. and Canadian school-age subjects on certain WJ III
NU COG and ACH tests and clusters, the majority of the analyses reveal no systematic
WJ III mean score differences. These findings support, with some caution, the use, or
transportability, of the WJ III NU U.S.-based norms with Canadian populations. While
these findings differ somewhat from those of previous Canadian/U.S. comparison studies
with the Wechsler scales (e.g., Wechsler, 1996; 2001b; 2003; 2004; 2008b), the present
study employed somewhat different procedures for comparing the two samples and
used a test with several different types of measures of cognitive abilities grounded in
a different theoretical framework (i.e., CHC theory). Instead of administering the two
tests to both a Canadian and a U.S. sample, scoring the Canadian sample using U.S.
norms and comparing the Canadian sample results with the entire U.S. standardization
sample, or conducting a Canadian standardization with the full test, the present study
used a matched sample where the Canadian sample was compared to a demographically
matched U.S. sample drawn from the WJ III NU standardization sample. This may
account for the differences in the findings of the present study from previous research
with the Wechsler scales. While differences between Canadian and U.S. samples are
widely reported in the Canadian testing literature, these reports are based largely
on studies with the Wechsler scales. Few others have studied and published differences
across U.S. and Canadian samples on individually administered tests of cognitive abilities.
A review of the only non-Wechsler comparisons of U.S. and Canadian samples, those for
the KeyMath Second Edition and KeyMath3, revealed results similar to the WJ III NU ACH
results reported in the present study, with similar overall scores and the Canadian sample
scoring slightly lower on several subtests (e.g., Applications and Operations).
The issues related to the transportability of norms for measures of cognitive
abilities and achievement standardized in the United States are complex. There are no
simple answers. Consumers and users of tests must recognize that what may be gained
from Canadian norms may come at a cost in other areas (e.g., reliability
and/or breadth of constructs measured or needed to answer referral questions). The
decision is not black and white. Examiners must use the test in a responsible manner and
understand both the strengths and limitations of using a given test with any population.
Additional research is needed to better understand the need for Canadian norms on all
widely used measures of cognitive ability and achievement. The present study is the first
to explore the use of the WJ III NU and its U.S.-based norms in Canada. Further, it is
one of the few studies to explore these issues with a test other than the Wechsler scales.
One should not automatically assume that separate Canadian norms are needed for
tests that are well standardized with U.S. populations and are used in Canada. And one
should not assume that any single study should result in an immediate call for separate
Canadian norms or special adjustments to scores and interpretations of scores. Simple
explanations of complex measurement issues do not provide the answer. Even the CPA
guidelines (1996) remind consumers that norms are both difficult and costly to
construct properly and may not be required for all tests standardized in the United States
and used with Canadian populations.
American Educational Research Association (AERA), American Psychological Association
(APA), and National Council on Measurement in Education (NCME). (1999).
Standards for educational and psychological testing. Washington, DC: American
Educational Research Association.
Beal, A. L., Dumont, R. P., Branche, A. H., & Cruse, C. L. (1996). Validation of the
WISC-III short form for Canadian students with learning disabilities. Canadian Journal
of School Psychology, 12, 1–6.
Beal, A. L., Dumont, R. P., Cruse, C. L., & Branche, A. H. (1996). Practical implication of
differences between the American and Canadian norms for the WISC-III and a short
form for children with learning disabilities. Canadian Journal of School Psychology, 12,
Braden, J., & Alfonzo, V. P. (2003). The WJ III Tests of Cognitive Abilities in cognitive
assessment courses. In F. A. Schrank & D. P. Flanagan (Eds.), WJ III Clinical Use and
Interpretation: Scientist-Practitioner Perspectives. San Diego: Academic Press.
Canadian Psychological Association. (1996). Guidelines for Educational and Psychological
Testing. Ottawa, Ontario: Author.
Carroll, J. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York:
Cambridge University Press.
Cizek, G. J. (2003). Review of the Woodcock-Johnson III. In B. S. Plake & J. C. Impara
(Eds.), The fifteenth mental measurements yearbook (pp. 1020–1024). Lincoln, NE:
Buros Institute of Mental Measurements.
Connolly, A. (2000). KeyMath - Revised Canadian Education Normative Update.
San Antonio, TX: Pearson.
Connolly, A. (2007). KeyMath3 Diagnostic Assessment. San Antonio, TX: The
Connolly, A. (2008). KeyMath3 Canadian Edition. San Antonio, TX: Pearson.
Cummings, J. A. (1995). Review of the Woodcock-Johnson Psycho-Educational Battery-
Revised. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental measurements
yearbook (pp. 1113–1116). Lincoln, NE: Buros Institute of Mental Measurements.
Elliott, C. D. (2007). Differential Ability Scales–Second Edition. San Antonio, TX: Harcourt
Ford, L., Percy, A., & Negreiros, J. (2010). Canadian cognitive assessment training.
Paper accepted for presentation at the annual meeting of the Canadian Psychological
Association, Winnipeg, Manitoba.
Kaufman, A. S., & Kaufman, N. L. (2004a). Kaufman Assessment Battery for Children
(2nd ed.). Circle Pines, MN: American Guidance Service.
Kaufman, A. S., & Kaufman, N. L. (2004b). Kaufman Test of Educational Achievement,
Second Edition. Circle Pines, MN: American Guidance Service.
Hicks, P., & Bolan, L. P. (1996). Review of the Woodcock-Johnson Psycho-Educational
Battery–Revised. Journal of School Psychology, 6(4), 93–102.
Hildebrand, D., & Saklofske, D. H. (1996). The Wechsler Adult Intelligence Scale–Third
Edition: A Canadian Standardization Study. Canadian Journal of School Psychology, 12,
International Test Commission. (2000). International guidelines for test use. http://www.
Iverson, G. L., Lange, R. T., & Viljoen, H. (2006). Comparing the Canadian and
American WAIS-III normative systems in inpatient neuropsychiatry and forensic
psychiatry. Canadian Journal of Behavioural Science, 348–353.
Joint Advisory Committee. (1993). Principles for fair student assessment practices for
education in Canada. Edmonton, Alberta: Author.
Lee, S. W., & Stefany, E. (1995). Review of the Woodcock-Johnson Psycho-Educational
Battery–Revised. In J. C. Conoley & J. C. Impara (Eds.), The twelfth mental
measurements yearbook. (pp. 1116–1117). Lincoln, NE: The University of Nebraska
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a
proposed comprehensive Gf-Gc framework. In D. P. Flanagan, J. L. Genshaft, & P. L.
Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp.
151–179). New York: Guilford.
McGrew, K. S. (2005). The Cattell-Horn-Carroll theory of cognitive abilities: Past,
present, and future. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary
intellectual assessment: Theories, tests, and issues (pp. 136–182). New York: Guilford.
McGrew, K. S. (2009). Editorial: CHC theory and the human cognitive abilities project:
Standing on the shoulders of the giants of psychometric intelligence research.
Intelligence, 37, 1–10.
McGrew, K. S., & Woodcock, R. W. (2001). Technical Manual. Woodcock-Johnson III.
Rolling Meadows, IL: Riverside Publishing.
McGrew, K., Dailey, D., & Schrank, F. (2007). Woodcock-Johnson III/Woodcock-Johnson
III Normative Update score differences: What the user can expect and why (Woodcock-
Johnson III Assessment Service Bulletin No. 9). Rolling Meadows, IL: Riverside
Mark, R., Beal, A. L., & Dumont, R. (1998). Validation of a WISC-III short form for the
identification of Canadian gifted students. Canadian Journal of School Psychology, 14,
Reddon, J. R., Whippler, S. M., & Reddon, J. E. (2007). Seemingly anomalous WISC-IV
Full Scale IQ scores in the American and Canadian standardization samples. Current
Psychology, 26, 60–69.
Roid, G. (2003). The Stanford-Binet Intelligence Scale: Fifth Edition. Rolling Meadows, IL:
Saklofske, D. H., Hildebrand, D. K., Reynolds, C. R., & Wilson, V. L. (1998). Substituting
symbol search for coding on the WISC-III: Canadian normative tables for Performance
and Full Scale IQ scores. Canadian Journal of Behavioural Science, 20, 57–68.
Saklofske, D. H., Tulsky, D. S., Wilkins, C., & Weiss, L. G. (2003). Canadian WISC-
III directional base rates of score discrepancies by ability level. Canadian Journal of
Behavioural Science, 35, 210–218.
Sandoval, J. (2003). Review of the Woodcock-Johnson III. In B. S. Plake & J. C. Impara
(Eds.), The fifteenth mental measurements yearbook (pp. 1024–1027). Lincoln, NE:
Buros Institute of Mental Measurements.
Statistics Canada. (1996). 1996 Census of Population. Ottawa, Ont: Statistics Canada.
Schrank, F. A., Flanagan, D. P., Woodcock, R. W., & Mascola, J. (2002). Essentials of the
WJ III Tests of Cognitive Abilities Assessment. New York: John Wiley.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children–Third Edition. San Antonio,
TX: The Psychological Corporation.
Wechsler, D. (1996). Wechsler Intelligence Scale for Children–Third Edition, Canadian. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (2001). Wechsler Individual Achievement Test–Second Edition. San Antonio,
TX: The Psychological Corporation.
Wechsler, D. (2001b). Wechsler Adult Intelligence Scale–Third Edition, Canadian. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (2002). The Wechsler Preschool and Primary Scale of Intelligence–Third
Edition. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2003). The Wechsler Preschool and Primary Scale of Intelligence–Third
Edition, Canadian. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children–Fourth Edition. San Antonio,
TX: The Psychological Corporation.
Wechsler, D. (2004). Wechsler Intelligence Scale for Children–Fourth Edition, Canadian. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (2008a). Wechsler Adult Intelligence Scale–Fourth Edition. San Antonio, TX:
The Psychological Corporation.
Wechsler, D. (2008b). Wechsler Adult Intelligence Scale–Fourth Edition, Canadian. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (2009). Wechsler Individual Achievement Test–Third Edition. San Antonio, TX:
The Psychological Corporation.
Weiss, L. G., Saklofske, D. H., Prifitera, A., Chen, H. Y., & Hildebrand, D. K. (1999). The
calculation of the WISC-III General Ability Index using Canadian norms. Canadian
Journal of School Psychology, 14, 1–10.
Woodcock, R. W., & Johnson, M. B. (1977). Woodcock-Johnson Psycho-Educational
Battery. Rolling Meadows, IL: Riverside Publishing.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001a). Woodcock-Johnson III. Rolling
Meadows, IL: Riverside Publishing.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001b). Woodcock-Johnson III Tests of
Achievement. Rolling Meadows, IL: Riverside Publishing.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001c). Woodcock-Johnson III Tests of
Cognitive Abilities. Rolling Meadows, IL: Riverside Publishing.
Woodcock, R. W., McGrew, K. S., Schrank, F. A., & Mather, N. (2001, 2007). Woodcock-
Johnson III Normative Update. Rolling Meadows, IL: Riverside Publishing.