Sandra Weintraub, Sureyya S. Dikmen, Robert K. Heaton, et al.
Cognition assessment using the NIH Toolbox
This information is current as of March 26, 2013.
Neurology® is the official journal of the American Academy of Neurology. Published continuously
since 1951, it is now a weekly with 48 issues per year. Copyright © 2013 American Academy of
Neurology. All rights reserved. Print ISSN: 0028-3878. Online ISSN: 1526-632X.
Sandra Weintraub, PhD
Sureyya S. Dikmen, PhD
Robert K. Heaton, PhD
David S. Tulsky, PhD
Philip D. Zelazo, PhD
Patricia J. Bauer, PhD
Noelle E. Carlozzi, PhD
Jerry Slotkin, PhD
David Blitz, MA
Nathan A. Fox, PhD
Jennifer L. Beaumont, MS
Dan Mungas, PhD
Cindy J. Nowinski, MD,
Jennifer Richler, PhD
Joanne A. Deocampo, PhD
Jacob E. Anderson, MA
Jennifer J. Manly, PhD
Beth Borosh, PhD
Richard Havlik, MD
Kevin Conway, PhD
Emmeline Edwards, PhD
Lisa Freund, PhD
Jonathan W. King, PhD
Claudia Moy, PhD
Ellen Witt, PhD
Richard C. Gershon, PhD
Cognition assessment using the NIH Toolbox
Cognition is 1 of 4 domains measured by the NIH Toolbox for the Assessment of Neurological and
Behavioral Function (NIH-TB), and complements modules testing motor function, sensation, and
emotion. On the basis of expert panels, the cognition subdomains identified as most important for
health, success in school and work, and independence in daily functioning were Executive Function,
Episodic Memory, Language, Processing Speed, Working Memory, and Attention. Seven measures
were designed to tap constructs within these subdomains. The instruments were validated in
English, in a sample of 476 participants ranging in age from 3 to 85 years, with representation from
both sexes, 3 racial/ethnic categories, and 3 levels of education. This report describes the develop-
ment of the Cognition Battery and presents results on test-retest reliability, age effects on perfor-
mance, and convergent and discriminant construct validity. The NIH-TB Cognition Battery is
intended to serve as a brief, convenient set of measures to supplement other outcome measures
in epidemiologic and longitudinal research and clinical trials. With a computerized format and
national standardization, this battery will provide a “common currency” among researchers for
comparisons across a wide range of studies and populations. Neurology® 2013;80 (Suppl 3):S54–S64
CAT = computer adaptive testing; CB = Cognition Battery; EF = executive function; NIH-TB = NIH Toolbox for the
Assessment of Neurological and Behavioral Function; PS = processing speed; WM = working memory.
Cognition is 1 of the 4 domains of behavioral and neurologic health assessed in the NIH Toolbox
for the Assessment of Neurological and Behavioral Function (NIH-TB). All domain measures were
intended to be freely accessible, to be usable with individuals from 3 to 85 years of age, with each
domain battery not to exceed 30 minutes in duration. Expert surveys were conducted and panels of
research scientists and clinicians consulted in an iterative manner to rank cognitive subdomains in
order of their perceived importance for health. Information was requested from experts (N = 102)
who reported sufficient familiarity with cognition to make recommendations for specific subdo-
mains of importance. The 2 top-ranked subdomains were Executive Function (EF) (95%) and
Episodic Memory (93%), followed by Language (55%), Processing Speed (52%), and Attention
(50%). Many (57%) also listed a “Global Score” as important. Other cognitive subdomains were
excluded because of lower priority in the rankings, coupled with the stringent time constraints on
the length of the battery.
The rationale for specific cognitive constructs within subdomains and instrument selection was
based on a systematic review of the literature, including evidence of the known biological associ-
ations of each. The EF subdomain was deemed to include several distinct constructs, including
Switching/Set Shifting, Inhibitory Control and Attention, and Working Memory. Because of
the heavy weighting of EF by respondents, these 3 constructs were considered separate
subdomains, with single instruments addressing each. Cognitive subdomains and specific
constructs selected for measurement follow.

From the Cognitive Neurology and Alzheimer’s Disease Center (S.W., B.B.), Northwestern Feinberg School of Medicine, Chicago, IL; Department
of Rehabilitation Medicine (S.S.D.), University of Washington, Seattle, WA; Department of Psychiatry (R.K.H.), University of California, San
Diego, CA; Department of Physical Medicine and Rehabilitation (D.S.T., N.E.C.), University of Michigan, Ann Arbor, MI; Institute of Child
Development (P.D.Z., J.E.A.), University of Minnesota, Minneapolis, MN; Department of Psychology (P.J.B., J.A.D.), Emory University, Atlanta,
GA; Department of Medical Social Sciences (J.S., D.B., J.L.B., C.J.N., R.C.G.), Northwestern University, Chicago, IL; Westat (K.W.-A., R.H.),
Rockville, MD; Department of Human Development (N.A.F.), University of Maryland, College Park, MD; Department of Neurology (D.M.),
University of California, Davis, CA; Department of Psychological and Brain Sciences (J.R.), Indiana University, Bloomington, IN; Cognitive
Neuroscience Division (J.J.M.), Taub Institute for Research in Alzheimer’s Disease and the Aging Brain, Sergievsky Center, Columbia University,
New York, NY; National Institute on Drug Abuse (K.C.), Rockville, MD; National Center for Complementary and Alternative Medicine (E.E.),
Bethesda; National Institute of Child Health and Human Development (L.F.), Bethesda; National Institute on Aging (J.W.K.), Bethesda; National
Institute of Neurological Disorders and Stroke (C.M.), Bethesda; and National Institute on Alcohol Abuse and Alcoholism (E.W.), Bethesda, MD.
Go to Neurology.org for full disclosures. Funding information and disclosures deemed relevant by the authors, if any, are provided at the end of the article.

EF, also called “cognitive control,” refers to
the top-down cognitive modulation of goal-
directed activity. Development of EF in child-
hood parallels the development of prefrontal,
anterior cingulate, and parietal cortex, and the
basal ganglia, as well as the growth of connections
between these regions and others.1 EF
emerges in infancy2 and grows rapidly between
the ages of 2 and 5 years3 with more gradual
changes continuing into adolescence and early
adulthood. EF is very vulnerable to aging,4 and
comparisons across the lifespan yield an inverted
U-shaped pattern, with early age-related im-
provement followed by later age-related decline.5
Based on factor-analytic work, there is an emerg-
ing consensus that EF can be divided into 3
partially independent subcomponents: set shifting,
inhibitory control, and updating/working
memory.6 These distinctions are clearest in
middle childhood and beyond, and far less
distinct in children younger than age 6.7
There is also evidence that prefrontal activa-
tion during the performance of EF tasks be-
comes increasingly focal and differentiated in
the course of development.8
The set-shifting component of EF con-
sists of the ability to shift responses based
on rules or contingencies. It is measured in
the Cognition Battery by a paradigm initially
developed for children, the NIH-TB
Dimensional Change Card Sort Test.9 This
aspect of EF is supported by a distributed
neuroanatomical network involving lateral
prefrontal, anterior cingulate, and inferior
The ability to focus, sustain, and shift atten-
tion is a prerequisite for performing most con-
scious cognitive operations frequently tested
syndrome of attention deficit hyperactivity disorder
has been associated with poor outcomes in
academic achievement and adult life adaptation,
including increased risk of accidents.11 Visual
spatial attention, critical at many developmental
time points and important for safety in a variety
of environments, is mediated by a well-studied,
distributed large-scale neuroanatomical network
composed of the frontal eye fields, the posterior
parietal cortex, and the anterior cingulate area
and their interconnections with one another
and with subcortical structures in the thalamus
and basal ganglia.12,13 A measure of visuospatial
attention, the NIH-TB Flanker Inhibitory Control and Attention Test, was chosen for
the Cognition Battery.
Working memory (WM) refers to a limited-
capacity storage buffer that becomes overloaded
when the amount of information exceeds that
capacity. Conceptually, WM refers to the ability
to 1) process information across a series of tasks
and modalities, 2) hold the information in a
short-term buffer, 3) manipulate the informa-
tion, and 4) hold the products of that manipu-
lation in the same short-term buffer. Cortical
networks associated with spatial and nonspatial
gions.14 WM has been studied extensively across
the lifespan.15,16 Its integrity has been linked to
improves significantly as children develop and
WM span is thought to double in capacity
between the ages of 5 and 10.20 WM is relatively
stable throughout adulthood. A reduction in
performance in older adults may be attributable
changes in WM per se.21 The test chosen to
measure this construct is the NIH-TB List Sort-
ing Working Memory Test.
Memory is composed of different systems of
information storage and retrieval. The memory
events or experiences encoded in a time-specific
decay and interference and to both “normal”
aging and many brain diseases. Episodic mem-
ory provides the building blocks for cognitive
growth during development and is the system
we rely on to update reality. Its absence, as in
the historic case of the patient H.M., results
in an existence in which there is only the present.22
Episodic memory has a protracted
course of development, with pronounced
changes throughout the first 2 decades of
life.23,24 It is one of the first cognitive functions
to show age-related decline and the most
susceptible to developmental disorders,25 brain
trauma, and neurodegenerative diseases such as
Alzheimer disease.26 The large-scale neuroana-
in addition to the hippocampus includes the
hypothalamus, thalamus, medial temporal re-
gions, cingulate cortex, and prefrontal cor-
The NIH-TB Picture Sequence
Memory Test is the measure of episodic mem-
ory in the Cognition Battery.
Language is a system of conventional symbols
anatomical network in the left cerebral hemisphere.28
Developmental disorders of language
and communication (e.g., autism, dyslexia) and
limited opportunities to acquire literacy have
a significant impact on academic achievement
and life adaptation. Language scores can predict
occupational attainment and performance.29
Many acquired conditions can impair language
in adulthood, including aphasia due to stroke
and neurodegenerative brain disease. After much
deliberation considering the various language
sures were designed: a single-word oral reading
test, the NIH-TB Oral Reading Recognition
Test, and a single-word vocabulary comprehen-
sion test, the NIH-TB Picture Vocabulary Test.
Reading was selected because it is a proxy for
a broad range of cognitive, educational, and
socioeconomic factors. The ability to pronounce
low-frequency words with irregular orthography
has been used as an estimate of overall intelligence.30
Single-word reading recognition tasks
comes across the lifespan, and performance on
these tasks is also an estimate of the quality of
nic differences on neuropsychological test per-
formance seen in older adults.31,32
Vocabulary represents the lexical compo-
nent of language and is highly associated with
general measures of “crystallized intelligence,”
or “gc,”33 overall cognitive functioning,
and success in school and work.29,34
Single-word auditory comprehension is a fun-
damental language skill that children learn
very early, even before they are able to speak.
Infants may have a repertoire of as many as 50
words they can understand before age 1.35
Syntactic proficiency is equally important
for development,36,37 but is more challenging
to measure and to translate across different
languages than single-word processing.
The final subdomain, Processing Speed
(PS), is defined as either the amount of time
it takes to process a set amount of information,
or the amount of information that can be processed
within a certain unit of time.38 Simple
PS tasks require a simple motor response to a
target stimulus. Measures of complex PS, in
contrast, require more concentration, as well
as some mental manipulation.
The greatest growth in PS is observed relatively
early and becomes more attenuated during
childhood and adolescence.39 Performance
declines in young adulthood and steadily as
people age.40 PS measures are among the most
sensitive indicators of cerebral dysfunction,41
and slowed PS has been demonstrated in trau-
matic brain injury, multiple sclerosis, Parkinson
disease, symptomatic HIV, chronic fatigue syndrome,
dementia, and schizophrenia.42 Slowed
PS has been associated with changes in neurotransmitter
activity (e.g., reduced cholinergic
function, reduced numbers of D2 dopamine
receptors, and altered glutamate activity), white
matter integrity, glucose metabolism, and nerve
conduction velocities (e.g., as measured by
evoked potentials, event-related potentials, and
EEG).42 For the Cognition Battery, the NIH-TB
Pattern Comparison Processing Speed Test
was chosen to measure PS.
The data reported in this article are derived
from the validation study of the Cognition Bat-
tery. Results are reported for test-retest reliabil-
ity, the effects of age on performance, and
convergent and discriminant construct validity.
More extensive details of test design and administration
and scoring are available for the pediatric
portion of the sample (ages 3–15),43 and
similar details will be presented for the adult
sample (ages 20–85) in future publications.
METHODS Although the entire battery is computerized and
includes automated scoring, it is necessary for an examiner to pre-
sent task instructions, monitor compliance, and ensure valid re-
sults. For accessibility, all instructions are administered visually
on the screen and also presented orally.
NIH-TB Cognition Battery tests. NIH-TB Flanker Inhib-
itory Control and Attention Test (Executive/Attention). This
test is a version of the Eriksen flanker task derived from the Attention
Network Test.44 It tests the ability to inhibit visual attention
to irrelevant task dimensions. On each trial, a central directional
target (fish for children younger than 8, arrows for ages 8 and
older) is flanked by similar stimuli on the left and right. The task
is to indicate the direction of the central stimulus. On congruent
trials, the flankers face the same direction as the target. On incon-
gruent trials, they face the opposite direction. A scoring algorithm
integrates accuracy, a suitable measure in early childhood, and
reaction time, a more relevant measure of adult performance on
this task, yielding scores from 0 to 10. There are 40 trials and the
average time to complete the task is 4 minutes.
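The scoring formula itself is not given in this report, only that accuracy and reaction time are integrated into a score from 0 to 10. As a rough illustration of how such a combined score might work, the sketch below assumes an accuracy component worth up to 5 points and a reaction-time component worth up to 5 points that is added only when accuracy is high. The function name, the 80% accuracy gate, and the 500 to 3,000 ms reaction-time bounds are illustrative assumptions and are not taken from this article.

```python
import math


def flanker_score(n_correct, median_rt_ms, n_trials=40,
                  rt_floor_ms=500.0, rt_ceiling_ms=3000.0,
                  accuracy_gate=0.80):
    """Hypothetical 0-10 score combining accuracy and reaction time.

    Every constant here is an assumption made for illustration only.
    """
    if not 0 <= n_correct <= n_trials:
        raise ValueError("n_correct must lie between 0 and n_trials")

    # Accuracy component: proportion correct scaled to 0-5 points.
    accuracy_points = 5.0 * n_correct / n_trials

    # Reaction time contributes only when accuracy is high enough
    # for speed to be meaningful (assumed gate).
    if n_correct / n_trials < accuracy_gate:
        return accuracy_points

    # Clamp the reaction time, then map it on a log scale so that the
    # floor earns 5 points and the ceiling earns 0 points.
    rt = min(max(median_rt_ms, rt_floor_ms), rt_ceiling_ms)
    span = math.log(rt_ceiling_ms) - math.log(rt_floor_ms)
    rt_points = 5.0 * (1.0 - (math.log(rt) - math.log(rt_floor_ms)) / span)

    return accuracy_points + rt_points
```

Under these assumed constants, a participant with 20 of 40 trials correct is scored on accuracy alone, whereas a fully accurate participant is differentiated by speed, which matches the stated rationale that accuracy is informative in early childhood and reaction time in adulthood.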
NIH-TB Dimensional Change Card Sort Test (Executive/
Shifting). The NIH-TB Dimensional Change Card Sort Test,9 originally
designed for children, was adapted for adults to assess the
set-shifting component of EF. A target visual stimulus must be matched
to 1 of 2 choice stimuli according to shape or color. Participants
sion is critical. Those who succeed following the switch also receive a
mixed block, in which color is relevant on the majority of trials with
receive only the mixed block. The relevant criterion word, “color” or
“shape,” appears on the screen and for young children is also delivered
orally via the computer. Scoring is similar to that for the flanker task,
for adults. A total of 40 trials require 4 minutes.
NIH-TB List Sorting Working Memory Test (Working
Memory). This task is an adaptation of Mungas’ List Sorting task
from the Spanish and English Neuropsychological Assessment
Scales.45,46 A series of stimuli is presented on the computer screen
visually (object) and orally (spoken name), 1 at a time. Participants
are instructed to repeat the stimuli to the examiner in order of size,
from smallest to largest. In 1 condition, all stimuli come from 1 cat-
egory. In the second, stimuli are presented from 2 categories, follow-
then from the other, in order of size within each. The number of
items in each series increases from one trial to the next and the test is
discontinued when 2 trials of the same length are failed. The prototype
task has been previously validated in an elderly sample.45,47 The
List Sorting task takes approximately 7 minutes to administer. Test
scores consist of total items correct across all trials.
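The ordering and discontinue rules described above can be sketched in a few lines. This is a minimal illustration, not the battery's implementation: it assumes that a trial is passed only when the whole response is in the expected order, that a passed trial contributes its number of items to the total, and that the order in which categories must be recalled is supplied by the caller. The function names and the tuple layout are hypothetical.

```python
from collections import Counter


def expected_order(items, category_order):
    """Return item names sorted by category, then by size within category.

    `items` is a list of (name, category, size) tuples. `category_order`
    lists the categories in the order they must be recalled (assumed input).
    """
    rank = {category: position
            for position, category in enumerate(category_order)}
    ordered = sorted(items, key=lambda item: (rank[item[1]], item[2]))
    return [name for name, _category, _size in ordered]


def score_list_sorting(trials, responses, category_order):
    """Total items correct, stopping after 2 failed trials of one length."""
    failures_by_length = Counter()
    total_items_correct = 0
    for items, response in zip(trials, responses):
        if list(response) == expected_order(items, category_order):
            # Assumed scoring: a passed trial earns one point per item.
            total_items_correct += len(items)
        else:
            failures_by_length[len(items)] += 1
            if failures_by_length[len(items)] == 2:
                break  # discontinue rule
    return total_items_correct
```

In the single-category condition the same functions apply with a one-element `category_order`.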
NIH-TB Picture Sequence Memory Test (Episodic Memory).
The NIH-TB Picture Sequence Memory Test is a new measure
derived from imitation-based tasks (elicited and deferred imitation)
used in research with infants and young children.48–50 The original
stimuli were 3-dimensional props used to produce action sequences
that the infant or child imitates. For the NIH-TB, the stimuli are
pictured objects and activities, thematically related but with no
inherent order. For each trial, pictures appear in the center of the
computer screen and then are moved 1 at a time into a fixed spatial
order, as an audio file simultaneously describes the content of each
(e.g., “Plant the tomatoes”), until the entire sequence is displayed on
the screen. Then the pictures return to the center of the screen in a
random display and the participant must move them into the
sequence demonstrated. The score is derived from the cumulative
number of adjacent pairs of pictures remembered correctly over 3
learning trials. Based on pilot testing, level of task difficulty was
adjusted for the various age groups. Thus, for ages 3 to 4 years,
6 pictures were administered; 5 to 6 years, 9 pictures; 8 years,
12 pictures; 9 to 60 years, 15 pictures; and 65 to 85 years, 9 pictures.
Administration time is approximately 10 minutes.
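The adjacent-pairs score lends itself to a short worked example. The sketch below assumes that a pair is credited whenever two pictures are placed next to each other in the same order as in the demonstrated sequence, regardless of where in the row they fall, and that every picture in a sequence is unique. Both function names are hypothetical.

```python
def adjacent_pairs_correct(target, response):
    """Count adjacent pairs in `response` that are adjacent in `target`.

    A pair is credited only if the two pictures appear side by side in
    the same left-to-right order as in the demonstrated sequence.
    """
    target_pairs = set(zip(target, target[1:]))
    return sum(1 for pair in zip(response, response[1:])
               if pair in target_pairs)


def picture_sequence_score(target, responses_by_trial):
    """Cumulative adjacent-pair count over the learning trials."""
    return sum(adjacent_pairs_correct(target, response)
               for response in responses_by_trial)
```

For a 6-picture sequence the maximum on any one trial is 5 pairs, so a participant who places the first two pictures in reversed order still earns credit for the 3 pairs that remain intact.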
NIH-TB Oral Reading Recognition Test (Language). This
recognize letters. An English item bank, controlled for frequency of
word use, complexity of letter-sound relationships, and orthographic
typicality, was developed with an initial set of item response theory
is asked to read them aloud. Items are administered by computer
adaptive testing (CAT) and participant responses are entered by the
examiner. The CAT item bank in final form will contain approxi-
ing on performance. Average administration time is 4 minutes.
NIH-TB Picture Vocabulary Test (Language). Single
words are presented via an audio file, paired simultaneously with
4 screen images of objects, actions, and/or depictions of concepts.
The task is to pick the picture that matches the spoken word. The
test is CAT administered, which reduces the amount of time to
identify performance level. The test does not require speaking
and can be performed by individuals who are preliterate and illit-
erate. Items were recalibrated and final parameter estimates were
obtained after norming for optimal CAT administration. Total
administration time is approximately 5 minutes.
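Because both language tests are administered by CAT, a brief sketch of the general idea may be useful. The article does not state which item response theory model, item-selection rule, or ability estimator the battery uses, so the example below makes three assumptions for illustration: a one-parameter (Rasch) model, selection of the unadministered item with maximum information at the current ability estimate, and an expected a posteriori estimate under a standard normal prior. The item bank, function names, and stopping rule are hypothetical.

```python
import math


def p_correct(theta, difficulty):
    """Rasch model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def next_item(theta, bank, administered):
    """Pick the unadministered item with maximum information at theta."""
    def information(item_id):
        p = p_correct(theta, bank[item_id])
        return p * (1.0 - p)

    candidates = [item_id for item_id in bank if item_id not in administered]
    return max(candidates, key=information)


def estimate_theta(responses, bank):
    """Expected a posteriori ability on a grid from -4 to 4."""
    grid = [k / 10.0 for k in range(-40, 41)]
    weights = []
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for item_id, correct in responses.items():
            p = p_correct(theta, bank[item_id])
            weight *= p if correct else (1.0 - p)
        weights.append(weight)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def run_cat(bank, answer, max_items=5):
    """Administer up to `max_items` items; `answer(item_id)` returns a bool."""
    theta, responses = 0.0, {}
    for _ in range(min(max_items, len(bank))):
        item_id = next_item(theta, bank, responses)
        responses[item_id] = answer(item_id)
        theta = estimate_theta(responses, bank)
    return theta, list(responses)
```

The sketch shows why adaptive administration shortens testing: each response moves the ability estimate, and the next item is drawn from the part of the bank that is most informative at that estimate.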
NIH-TB Pattern Comparison Processing Speed Test (Processing
Speed). This test is modeled after Salthouse’s Pattern Comparison
Task,51 an extensively researched assessment of choice
reaction time, easily adapted for computerized administration. Par-
(“Yes” button) or “not the same” (“No” button). Children younger
than 8 years indicate these choices with a “smiley” or “frowny” face
85 years. The NIH-TB Pattern Comparison Processing Speed Test
requires 3 minutes to administer and the score is the number of
correct items (of a possible 130) completed in 90 seconds.
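The scoring rule stated above, the number of correct items completed within 90 seconds out of a possible 130, can be written as a short function. The input format (a list of elapsed time and correctness per response) and the function name are assumptions made for this illustration.

```python
def pattern_comparison_score(responses, time_limit_s=90.0, max_items=130):
    """Count correct responses given within the time limit.

    `responses` is a list of (elapsed_seconds, is_correct) tuples in the
    order the items were answered (hypothetical log format).
    """
    correct = 0
    for index, (elapsed_seconds, is_correct) in enumerate(responses):
        if index >= max_items or elapsed_seconds > time_limit_s:
            break  # item bank exhausted or time expired
        if is_correct:
            correct += 1
    return correct
```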
Subjects. The sample (N = 476) was based on a stratification plan
to include adequate numbers of individuals within age bands, level
of education, and racial/ethnic backgrounds. A marketing research
firm assisted in recruitment of community-dwelling individuals.
Testing was completed at Northwestern University (Chicago)
and 5 additional sites: Emory University (Atlanta), the University
of Minnesota (Minneapolis), the University of Washington (Seat-
tle), NorthShore University HealthSystem (Evanston, IL), and
Kessler Foundation Research Center (West Orange, NJ). Eligible
participants were 3 to 85 years of age living in the community. See
table 1 for age, sex, race, and education strata. Not all ages were
sampled in this study. Education levels in the table are defined as
One-third of the sample was randomly selected to repeat testing
7 to 21 days later to assess test-retest reliability.
Analyses. This initial report includes results from analyses of test-
retest reliability, associations of test scores with age, and convergent
and discriminant construct validity. Age associations reflect the
validity of the tests for measuring cognitive development during
childhood and age-related cognitive decline during adulthood.
Convergent and discriminant validity results provide evidence that
the Cognition Battery is measuring the intended constructs.
were calculated separately for children and adults. Intraclass correla-
vergent validity was assessed by correlations between each NIH-TB
measure and a well-established “gold standard” measure of the same
construct; evidence of discriminant validity was assessed by correla-
tions with gold standards of a different cognitive construct. Gold
standard measures for each NIH-TB instrument are listed in table 2.
were unable to identify well-established measures to test convergent
or discriminant validity in this age group for the measures of
attention, episodic memory, EF, and PS. Thus, in this age group,
only convergent validity was measured between the Cognition Bat-
tery measures and a measure of general cognitive ability (i.e., “g”)
obtained by averaging z scores of the Wechsler Preschool and Primary
Scale of Intelligence–3rd edition Block Design subtest52 and
the Peabody Picture Vocabulary Test–4th edition.53 More detailed
psychometric information on individual measures and on challenges
related to testing for construct validity in very young children is
provided in Zelazo et al. (in press).9
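The composite measure of general cognitive ability described above can be illustrated as follows. The sketch assumes that each test is standardized against the mean and standard deviation of the sample at hand; the article does not say whether sample or normative values were used, and the function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize raw scores against the sample mean and sample SD."""
    centre, spread = mean(values), stdev(values)
    return [(value - centre) / spread for value in values]


def general_ability(block_design, picture_vocabulary):
    """Average the two z scores for each participant."""
    return [(a + b) / 2.0
            for a, b in zip(z_scores(block_design),
                            z_scores(picture_vocabulary))]
```

Averaging z scores rather than raw scores gives the two tests equal weight even though they are reported on different scales.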
RESULTS Test-retest reliability. Test-retest reliability
was strong for the entire sample and separately for
children (ages 3–15 years) and adults (ages 20–85
years; table 3). Intraclass correlation coefficients for
the entire sample on the NIH Toolbox measures
ranged from 0.78 for the Picture Sequence Memory
Test to 0.99 on the Oral Reading Recognition Test,
with most other values falling above 0.90.
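For readers who wish to compute a comparable statistic, the sketch below calculates an intraclass correlation coefficient from two testing sessions. The article does not state which form of the coefficient was used, so the example assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name and data layout are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a list of per-participant score lists.

    Each inner list holds one score per session, in the same session
    order for every participant.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into participants, sessions, error.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Under this form of the coefficient, a uniform practice effect at retest lowers the value even when participants keep exactly the same rank order, which is one reason the choice of form matters when reliability figures are compared across studies.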
Age effects. All cognitive abilities are expected to
improve during childhood, and most are expected
to show some age-related decline during adulthood,
with the exception of language skills and other
Table 1 Initial validation sample demographics (N = 476)a
Age groups        Education (self/parent)   Male   Female   White   Black   Hispanic/other/multiple
3–6 y, n = 120
                  High school graduate        29       27      23      19   14
                                              29       24      26      16   11
8–15 y, n = 88
                  High school graduate        22       23      18      13   14
                                              14       19      16       7   10
20–60 y, n = 159
                  <High school                22       26      21      15   12
                  High school graduate        29       31      26      19   15
                                              24       27      24      15   12
65–85 y, n = 109
                  <High school                 9       11       9      10   1
                  High school graduate        12       27      26      11   2
a Parental education was used for participants ages 3 to 15 years and participant education was used for adults (ages 20+).
Table 2 Convergent and discriminant validity (“gold standard”) measures used for ages 8 to 85 years
NIH Toolbox measure    Convergent validity measure    Discriminant validity measure
WISC-IV/WAIS-IV Letter-Number Sequencing/Coding/Symbol
WISC-IV/WAIS-IV Letter-Number Sequencinga/PASAT average    PPVT-4
WISC-IV/WAIS-IV Coding/Symbol Search averagea
Abbreviations: BVMT-R = Brief Visuospatial Memory Test–revised; DCCS = Dimensional Change Card Sort; D-KEFS =
Delis-Kaplan Executive Function System; EF = executive function; PASAT = Paced Auditory Serial Addition Test;
PPVT-4 = Peabody Picture Vocabulary Test–4th edition; PSMT = Picture Sequence Memory Test; RAVLT = Rey Auditory
Verbal Learning Test; WAIS-IV = Wechsler Adult Intelligence Scale–4th edition; WISC-IV = Wechsler Intelligence Scale for
Children–4th edition; WRAT-4 = Wide Range Achievement Test–4th edition.
a Depending on subject’s age.
aspects of “crystallized” intelligence. Therefore, cor-
relations between age and NIH-TB test performance
were conducted separately for children (ages 3–15
years) and adults (ages 20–85 years). Table 4
presents these results.
All NIH-TB Cognition Battery measures showed
robust associations between test performance and age
in the child group (r = 0.58–0.87), where scores
improved with age. In the adult group, with the exception of the language
measures (Vocabulary and Reading, r = 0.15 and
−0.02, respectively), age and test scores (r = −0.46
to −0.65) on the remaining NIH-TB Cognition Battery
measures were negatively associated, with lower
scores at higher age levels. Thus, on the Picture
Sequence Memory Test, performance improved during
childhood and early adolescence, with gradual decline
in scores across adult age ranges beginning in the 30s
(figure, A). In contrast, the NIH-TB Picture Vocabu-
lary Test showed gradual, linear improvement with age
until the mid 50s and then stabilized, whereas Oral
Reading Recognition showed a much sharper increase
until early grade-school years (age 7–8) and then fol-
lowed the same pattern of more gradual improvement
and then stability in the older age groups (figure, B).
Convergent and discriminant validity. In children from
3 to 6 years of age, all NIH-TB Cognition Battery measures
were significantly correlated (ranging from r =
0.54 to r = 0.74) with our measure of general cognitive
ability (“g”), indicating that they are sensitive to a range
Table 5 shows results for convergent and discriminant
validity for ages 8 to 85 years. For all NIH-TB CB
instruments, correlations for convergent validity measures
ranged from r = 0.48 to r = 0.93 (all p < 0.0001),
suggesting that the NIH-TB measures are tapping the
desired constructs. Correlations for discriminant validity
measures ranged from r = 0.05 to r = 0.30, indicating
lack of, or relatively weak, relationship with measures
that tap different constructs.
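The logic of the comparison reported above can be shown with a small worked example: an instrument supports convergent validity when its scores correlate strongly with a gold standard of the same construct, and discriminant validity when they correlate weakly with a gold standard of a different construct. The function below computes the Pearson correlation coefficient; the function name is hypothetical, and any data passed to it in practice would come from the study files rather than from this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(var_x * var_y)
```

Applied to each NIH-TB measure in turn, once against its same-construct gold standard and once against its different-construct gold standard, this yields the pair of coefficients that the convergent and discriminant columns of table 5 report.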
DISCUSSION This article introduces the NIH Tool-
box Cognition Battery, a brief series of cognitive tests
for the purpose of supplementing measures in epidemiologic
and longitudinal studies to constitute a “common
currency” among researchers. The Cognition
Battery has 7 computerized instruments that measure
6 ability subdomains important for cognitive health
from the ages of 3 to 85 years. Data are presented for
208 normal children (age 3–15 years) and 268 normal
adults (age 20–85 years) on 3 important psychometric
characteristics: test-retest reliability, sensitivity to cog-
nitive growth during childhood and age-related decline
during adulthood, and construct validity.
Table 3 Test-retest reliability of instruments in the NIH-TB Cognition Battery
NIH-TB cognition subdomain/instrument    All                      Children aged 3–15 y     Adults aged 20–85 y
                                         No.  ICC (95% CI)        No.  ICC (95% CI)        No.  ICC (95% CI)
                                         125  0.96 (0.94, 0.97)   52   0.95 (0.92, 0.97)   73   0.80 (0.70, 0.87)
                                         123  0.94 (0.92, 0.96)   49   0.92 (0.86, 0.95)   74   0.88 (0.82, 0.92)
                                         130  0.95 (0.93, 0.96)   57   0.95 (0.92, 0.97)   73   0.94 (0.91, 0.96)
                                         155  0.78 (0.71, 0.83)   66   0.76 (0.64, 0.84)   89   0.77 (0.67, 0.84)
Working Memory/List Sorting              155  0.89 (0.85, 0.92)   66   0.87 (0.80, 0.92)   89   0.77 (0.67, 0.84)
Processing Speed/Pattern Comparison      148  0.82 (0.76, 0.87)   59   0.84 (0.75, 0.90)   89   0.72 (0.60, 0.81)
                                         155  0.94 (0.92, 0.96)   66   0.84 (0.75, 0.90)   89   0.81 (0.73, 0.87)
                                         154  0.99 (0.99, 0.99)   65   0.99 (0.98, 0.99)   89   0.91 (0.87, 0.94)
Abbreviations: CI = confidence interval; DCCS = Dimensional Change Card Sort; ICC = intraclass correlation coefficient;
NIH-TB = NIH Toolbox for the Assessment of Neurological and Behavioral Function; PSMT = Picture Sequence Memory
Test.
Table 4 Correlations between NIH-TB cognition subdomain scores and age
Abbreviations: DCCS = Dimensional Change Card Sort;
NIH-TB = NIH Toolbox for the Assessment of Neurological
and Behavioral Function; r = Pearson correlation coefficient.
a Correlations adjusted for education in adult group.
b p < 0.001.
The subject sample for this study deliberately
emphasized representation of ethnic minorities (almost
50%) and the oldest and youngest groups (3–6 years
and 65–85 years; together, 48%) to ensure that the tests
would perform as needed in these important segments
of the population. Thus, the participants in this study
(or their parents, in the case of children) tended to be
rather highly educated, particularly in the youngest and
oldest groups (see table 1). A more representative
population-based sampling strategy was implemented
for the NIH-TB norming study.
Adequate test-retest reliability was considered essen-
tial for the NIH-TB Cognition Battery, particularly
because of its anticipated use in longitudinal studies.
The results suggest that test-retest reliability of all
NIH-TB measures is good to excellent across a large
ity than individual test scores, are being developed to
Figure Episodic memory scores vs language scores across age
(A) NIH Toolbox Picture Sequence Memory Test scores show improvement into early adulthood and then decline from the
50s on. Administration set size varied by age group as follows: ages 3 to 4 years, 6 pictures; 5 to 6 years, 9 pictures; 8
years, 12 pictures; 9 to 60 years, 15 pictures, and 65 to 85 years, 9 pictures. (B) Results for NIH Toolbox Reading and
Vocabulary scores (reported as a “theta,” or individual ability score, based on item response theory analyses) show improvement
sustained into adulthood. The data points in both A and B represent the mean score ± standard error.
increase the potential for use of the battery in clinical
trials and other longitudinal research.
Evidence of test validity can take many forms, and
derives from both clinical and nonclinical subject sam-
performance with age in cognitively normal children
and adults. The Reading and Vocabulary scores showed
the expected associations with age, growing through
adolescence and stabilizing in older adulthood. Not sur-
prisingly, Reading showed an especially steep improve-
ment from age 3 to the early school years, when both
formal and informal educational experiences ideally pro-
mote such development. The Language subdomain
tests, as expected, are experience-based and peaked
somewhat later than measures of other cognitive subdo-
mains, and then remained relatively stable even into the
ninth decade of life. The nonlanguage subdomain tests
in the battery, in contrast, conformed to the pattern
expected with cognitive ability measures in that they
peaked in early adulthood and then declined in later
adulthood at different rates, depending on the measure.
Another validity measure we included in this report
expresses how well the tests in the battery measure the
intended constructs (convergent validity) as opposed to
different cognitive constructs (discriminant validity).
The “gold standard” tests related to the Cognition Bat-
tery instruments in the expected ways for participants
across a wide age band (ages 8–85 years), demonstrating
both convergent and discriminant validity. Evaluating
construct validity in young children (ages 3–6
years) was challenging because of the absence of specific
gold standard measures of targeted constructs appropriate
for these ages. (See Zelazo et al., in press,54 for discussion.)
The lack of such measures may reflect the fact
that different subdomains of cognition become more
differentiated with experience and development.55 The
correlations between the NIH-TB measures and gen-
young children (ages 3–6 years), were high (0.54–
0.74), possibly supporting such a notion.
The NIH-TB Cognition Battery was designed as a
brief, diverse, accessible, and psychometrically sound
set of instruments that will be broadly applicable in
research studies of normal and abnormal groups across
a wide age range. The current results regarding age ef-
fects, test-retest reliability, and construct validity are
promising. The next phase of development established
normative standards for the NIH-TB measures using
a large, demographically diverse sample, including a
information will also be available about associations of
test performances with various aspects of everyday func-
tioning (e.g., school performance in the child sample)
and relationships with additional demographic charac-
teristics (educational level/socioeconomic status, sex,
and ethnicity). The NIH-TB Cognition Battery was
not developed as a clinical measure to either screen for
cognitive impairment or to substitute for a full, compe-
tent neuropsychological evaluation. However, future
another source of validation of the Cognition Battery as
a sound set of measures of a broad range of normal and
abnormal cognitive functioning, with implications for
brain health in large-scale research studies.
Sandra Weintraub: drafting/revising the manuscript, study concept or
design, analysis or interpretation of data, acquisition of data. Sureyya
Table 5 Convergent and discriminant validity of NIH-TB cognitive function battery measures
Ages 8–85 y Ages 8–85 y Ages 3–6 y
Correlation with average of WPPSI
Block Design and PPVT-4 z scores
No. of subjects r No. of subjects r No. of subjects r
List Sorting/Working Memory
351 0.08 111 0.74b
Abbreviations: DCCS = Dimensional Change Card Sort; NIH-TB = NIH Toolbox for the Assessment of Neurological and
Behavioral Function; PPVT-4 = Peabody Picture Vocabulary Test–4th edition; PSMT = Picture Sequence Memory Test; r =
Pearson correlation coefficient; WPPSI = Wechsler Preschool and Primary Scale of Intelligence–3rd edition.
a Gold standard measures used to test convergent and discriminant validity for ages 8 to 85 y appear in table 2.
b p < 0.001.
c p < 0.01.
d p < 0.05.
Neurology 80 (Suppl 3) March 12, 2013 S61
Dikmen: drafting/revising the manuscript, study concept or design, anal-
ysis or interpretation of data, statistical analysis, study supervision. Robert
Heaton: drafting/revising the manuscript, study concept or design, anal-
ysis or interpretation of data. David Tulsky: drafting/revising the manu-
script, study concept or design, analysis or interpretation of data,
acquisition of data, statistical analysis, study supervision. Philip Zelazo:
drafting/revising the manuscript, study concept or design, analysis or
interpretation of data. Patricia Bauer, Noelle Carlozzi, and Jerry Slotkin:
drafting/revising the manuscript, study concept or design, analysis or
interpretation of data, acquisition of data, study supervision. David Blitz:
analysis or interpretation of data, statistical analysis, study supervision.
Kathleen Wallner-Allen: drafting/revising the manuscript, study concept
or design, analysis or interpretation of data. Nathan Fox: drafting/revising
the manuscript, study concept or design, analysis or interpretation of
data, study supervision. Jennifer Beaumont: analysis or interpretation
of data, statistical analysis. Dan Mungas: drafting/revising the manu-
script, study concept or design, analysis or interpretation of data, statisti-
cal analysis. Cindy Nowinski: drafting/revising the manuscript, study
concept or design, study supervision. Jennifer Richler: drafting/revising
the manuscript, study concept or design, acquisition of data. Joanne Deo-
campo and Jacob Anderson: study concept or design, analysis or interpre-
tation of data, contribution of vital reagents/tools/patients, acquisition of
data, statistical analysis, study supervision. Jennifer Manly: drafting/revis-
ing the manuscript, acquisition of data. Beth Borosh: drafting/revising the
manuscript, study concept or design, analysis or interpretation of data,
acquisition of data, study supervision. Richard Havlik: drafting/revising
the manuscript, study concept or design, analysis or interpretation of
data, acquisition of data. Kevin Conway: drafting/revising the manu-
script, study concept or design, study supervision. Emmeline Edwards:
drafting/revising the manuscript, study concept or design, analysis or
interpretation of data, member of the NIH Project team managing the
Tool Box Contract that produced data for this manuscript. Lisa Freund:
study concept or design, study supervision, obtaining funding. Jonathan
King and Claudia Moy: drafting/revising the manuscript. Ellen Witt:
drafting/revising the manuscript, analysis or interpretation of data. Ri-
chard Gershon: drafting/revising the manuscript, study concept or design,
analysis or interpretation of data, acquisition of data, statistical analysis,
study supervision, obtaining funding.
The authors thank Abigail Sivan and Edmond Bedjeti (Northwestern Uni-
versity) for their valuable assistance in the validation phase of testing. The
authors also acknowledge the following individuals for their helpful con-
sultation during the development of the NIH Toolbox Cognition Battery:
Jean Berko Gleason (Boston University), Rachel Byrne (Kessler Founda-
tion), Gordon Chelune (University of Utah), Nancy Chiaravalotti (Kessler
Foundation), Dean Delis (University of California, San Diego), Adele
Diamond (University of British Columbia), Roberta Golinkoff (University of
Delaware), Kathy Hirsh-Pasek (Temple University), Marilyn Jager Adams
(Brown University), Joel Kramer (University of California, San Francisco),
Joanie Machamer (University of Washington), Amanda O’Brien (Kessler
Foundation), Timothy Salthouse (University of Virginia), Jerry Sweet
(University of Chicago), Keith O. Yeates (Ohio State University), and
Frank Zelkoe (Northwestern University).
This study is funded in whole or in part with Federal funds from the
Blueprint for Neuroscience Research, NIH, under contract no. HHS-
S. Weintraub is funded by NIH grants R01DC008552, P30AG013854,
and the Ken and Ruth Davee Foundation, and conducts clinical neuro-
psychological evaluations (35% effort) for which her academic-based
practice clinic bills. S. Dikmen receives research grant funding from
NIH R01 NS058302 and R01HD061400, NIDRR H133A080035,
NIDRR H133G090022, and NIDRR, H133A980023, and DoD
is funded by NIH grants
R01MH073433, R01MH058076, R01MH078748, R01MH078737,
U01MH083506, R01MH083552, and R01MH081861. D. Tulsky
is funded by NIH contracts
H133G070138, B6237R, cooperative agreement U01AR057929, and
grant R01HD054659. He has received consultant fees from the Institute
for Rehabilitation and Research, Frazier Rehabilitation Institute/Jewish
Hospital, Craig Hospital, and Casa Colina Centers for Rehabilitation.
P. Zelazo serves on the editorial boards of Child Development, Develop-
ment and Psychopathology, Frontiers in Human Neuroscience, Cognitive
Development, Emotion, and Developmental Cognitive Neuroscience, and
Monographs of the Society for Research in Child Development. He is a
Senior Fellow of the Mind and Life Institute and President of the Jean
Piaget Society. He receives research funding from the Canadian Institute
for Health Research (grant 201963), NIDDK/NICHD (1699-662-
6312), and the Baumann Foundation. P. Bauer is funded by NIH grant
HD067359. N. Carlozzi is funded by NIH grant R03NS065194 and by
contracts H133B090024, B6237R, and H133G070138; she previously
received funding from NIH grant H133A070037-08A and a grant from
the NJ Department of Health and Senior Services. J. Slotkin, D. Blitz,
and K. Wallner-Allen report no disclosures. N. Fox is funded by NIH
grants R37HD017899, MH074454, U01MH080759, R01MH091363,
P50MH078105, and P01HD064653. He serves on the scientific board of
the National Scientific Council for the Developing Child. J. Beaumont served
as a consultant for NorthShore University HealthSystem, FACIT.org, and
Georgia Gastroenterology Group PC. She received funding for travel as an
invited speaker at the North American Neuroendocrine Tumor Symposium.
D. Mungas is funded by research grants from the National Institute on Aging
and a grant from the California Department of Public Health California
Alzheimer’s Disease Centers program. C. Nowinski receives or has received
research support from the NIH (contracts HHSN265200423601C,
HHSN260200600007C, and HHSN267200700027C), the Department of
Veteran’s Affairs, the Analysis Group, Novartis, and Teva Pharmaceuticals.
She has also received honoraria for writing and updating an article for Med-
link. J. Richler is funded by NIH/NCRR grant UL1RR025761. J. Deocampo
and J. Anderson report no disclosures. J. Manly is funded by NIH grants
R01AG028786 and R01AG037212; she previously received funding from
NIH grant R01AG016206 and a grant from the Alzheimer’s Association
(IIRG 05-14236). B. Borosh, R. Havlik, and K. Conway report no disclo-
sures. E. Edwards is the Director of the Division of Extramural Research at
NCCAM. Dr. Edwards declares that except for income received from her
primary employer, no financial support or compensation has been received
from any individual or corporate entity over the past 3 years for research or
professional service and there are no personal financial holdings that could be
perceived as constituting a potential conflict of interest. L. Freund reports no
disclosures. J. King is the NIA Project Scientist for the NIH cooperative
agreements U01AG014289, U01AG014276, U01AG14260, U01AG14282,
and U01AG014263 (comprising the ACTIVE clinical trial). C. Moy and
E. Witt report no disclosures. R. Gershon has received personal compensation
for activities as a speaker and consultant with Sylvan Learning, Rockman,
and the American Board of Podiatric Surgery. He has several grants awarded
by NIH: N01-AG-6-0007, 1U5AR057943-01, HHSN260200600007,
1U01DK082342-01, AG-260-06-01, HD05469; NINDS: U01 NS 056
975 02; NHLBI K23: K23HL085766 NIA; 1RC2AG036498-01;
NIDRR: H133B090024; OppNet: N01-AG-6-0007. Go to Neurology.
org for full disclosures.
The views and opinions expressed in this report are those of the authors
and should not be construed to represent the views of NIH or any of the
sponsoring organizations, agencies, or the U.S. government.
Received June 6, 2012. Accepted in final form November 15, 2012.
1. Diamond A. Normal development of prefrontal cortex from birth to young adulthood: cognitive functions anatomy and biochemistry. In: Stuss D, Knight B, editors. Principles of Frontal Lobe Function. New York: Oxford University Press;
2. Diamond A. Frontal lobe involvement in cognitive changes during the first year of life. In: Gibson KR, Petersen AC; Council SSR, editors. Brain Maturation and Cognitive Development: Comparative and Cross-cultural Perspectives. New York: De Gruyter; 1991:127–180.
3. Zelazo PD, Carlson SM, Kesek A. Development of executive function in children. In: Nelson CA, Luciana M, editors. Handbook of Developmental Cognitive Neuroscience, 2nd ed. Cambridge, MA: MIT Press; 2008:553–574.
4. Daniels K, Toth J, Jacoby L. The aging of executive functions. In: Bialystok E, Craik FIM, editors. Lifespan Cognition: Mechanisms of Change. New York: Oxford University
5. Zelazo PD, Craik FI, Booth L. Executive function across the life span. Acta Psychol 2004;115:167–183.
6. Miyake A, Friedman NP, Emerson MJ, Witzki AH, Howerter A, Wager TD. The unity and diversity of executive functions and their contributions to complex "Frontal Lobe" tasks: a latent variable analysis. Cogn Psychol 2000;
7. Wiebe SA, Sheffield T, Nelson JM, Clark CAC, Chevalier N, Espy K. The structure of executive function in 3-year-olds. J Exp Child Psychol 2011;108:436–452.
8. Durston S, Davidson MC, Tottenham N, et al. A shift from diffuse to focal cortical activity with development. Dev Sci 2006;9:1–8.
9. Zelazo PD. The Dimensional Change Card Sort (DCCS): a method of assessing executive function in children. Nature Protocols 2006;1:297–301.
10. Weintraub S. Neuropsychological assessment of mental state. In: Mesulam MM, editor. Principles of Cognitive and Behavioral Neurology. New York: Oxford University
11. Swensen A, Birnbaum HG, Ben Hamadi R, Greenberg P, Cremieux PY, Secnik K. Incidence and costs of accidents among attention-deficit/hyperactivity disorder patients. J Adolesc Health 2004;35:346.e341–346.e349.
12. Mesulam M. A cortical network for directed attention and unilateral neglect. Ann Neurol 1981;10:309–325.
13. Posner MI, Petersen SE. The attention system of the human brain. Annu Rev Neurosci 1990;13:25–42.
14. D’Esposito M, Aguirre GK, Zarahn E, Ballard D, Shin RK, Lease J. Functional MRI studies of spatial and nonspatial working memory. Cogn Brain Res
15. Conlin JA, Gathercole SE, Adams JW. Children’s working memory: investigating performance limitations in complex span tasks. J Exp Child Psychol 2005;90:303–317.
16. Salthouse TA, Meinz EJ. Aging, inhibition, working memory, and speed. J Gerontol B Psychol Sci Soc Sci 1995;50:
17. Hitch GJ, Towse JN, Hutton U. What limits children’s working memory span? Theoretical accounts and applications for scholastic development. J Exp Psychol Gen 2001;
18. de Jong PF, Olson RK. Early predictors of letter knowledge. J Exp Child Psychol 2004;88:254–273.
19. de Jong PF. Working memory deficits of reading disabled children. J Exp Child Psychol 1998;70:75–96.
20. Riggs KJ, McTaggart J, Simpson A, Freeman RP. Changes in the capacity of visual working memory in 5- to 10-year-olds. J Exp Child Psychol 2006;95:18–26.
21. Salthouse TA, Coon VE. Interpretation of differential deficits: the case of aging and mental arithmetic. J Exp Psychol Learn Mem Cogn 1994;20:1172–1182.
22. Corkin S, Amaral DG, Gonzalez RG, Johnson KA, Hyman BT. H. M.’s medial temporal lobe lesion: findings from magnetic resonance imaging. J Neurosci 1997;17:
23. Bauer PJ. Remembering the Times of Our Lives: Memory in Infancy and Beyond. Mahwah, NJ: Lawrence Erlbaum
24. Perner J, Ruffman T. Episodic memory and autonoetic consciousness: developmental evidence and a theory of childhood amnesia. J Exp Child Psychol 1995;59:516–548.
25. Gadian DG, Aicardi J, Watkins KE, Porter DA, Mishkin M, Vargha-Khadem F. Developmental amnesia associated with early hypoxic-ischaemic injury. Brain 2000;123(pt 3):499–507.
26. Albert MS. Cognitive and neurobiologic markers of early Alzheimer disease. Proc Natl Acad Sci USA 1996;93:
27. Mitchell AS, Dalrymple-Alford JC. Lateral and anterior thalamic lesions impair independent memory systems. Learn Mem 2006;13:388–396.
28. Price CJ. The anatomy of language: contributions from functional neuroimaging. J Anat 2000;197(pt 3):335–359.
29. Schmidt FL, Hunter J. General mental ability in the world of work: occupational attainment and job performance. J Pers Soc Psychol 2004;86:162–173.
30. Grober E, Sliwinski M. Development and validation of a model for estimating premorbid verbal intelligence in the elderly. J Clin Exp Neuropsychol 1991;13:933–949.
31. Manly JJ, Byrd DA, Touradji P, Stern Y. Acculturation, reading level, and neuropsychological test performance among African American elders. Appl Neuropsychol 2004;11:37–46.
32. Manly JJ, Jacobs DM, Sano M, et al. Effect of literacy on neuropsychological test performance in nondemented, education-matched elders. J Int Neuropsychol Soc 1999;5:191–202.
33. Cattell RB. Intelligence: Its Structure, Growth and Action. Amsterdam: Elsevier; 1987.
34. Kastner JW, May W, Hildman L. Relationship between language skills and academic achievement in first grade. Percept Mot Skills 2001;92:381–390.
35. Fenson L, Dale PS, Reznick JS, Bates E, Thal DJ, Pethick SJ. Variability in early communicative development. Monogr Soc Res Child Dev 1994;59:1–173.
36. Gleason JB, Ratner NB, editors. The Development of Language, 7th ed. Boston: Pearson/Allyn & Bacon; 2009.
37. Hirsh-Pasek K, Golinkoff RM. The Origins of Grammar: Evidence from Early Language Comprehension. Cambridge, MA: M.I.T. Press; 1996.
38. Kalmar JH. Information processing speed in multiple sclerosis: a primary deficit? In: DeLuca J, Kalmar JH, editors. Information Processing Speed in Clinical Populations. New York: Taylor & Francis; 2007:153–172.
39. Fry AF, Hale S. Relationships among processing speed, working memory and fluid intelligence in children. Biol
40. Salthouse TA. Aging and measures of processing speed. Biol Psychol 2000;54:35–54.
41. Hawkins KA. Indicators of brain dysfunction derived from graphic representations of the WAIS-III/WMS-III technical manual clinical samples data: a preliminary approach to clinical utility. Clin Neuropsychologist 1998;12:535–551.
42. DeLuca J, Kalmar JH. Information Processing Speed in Clinical Populations. London: Psychology Press; 2007.
43. Zelazo PD, Bauer PJ, editors. National Institutes of Health Toolbox—Cognitive Function Battery (NIH Toolbox CFB): Validation for Children Between 3 and 15 Years. Monogr Soc Res Child Dev (in press).
44. Rueda MR, Fan J, McCandliss BD, et al. Development of attentional networks in childhood. Neuropsychologia
45. Mungas D, Reed BR, Tomaszewski Farias S, DeCarli C. Criterion-referenced validity of a neuropsychological test battery: equivalent performance in elderly Hispanics and non-Hispanic Whites. J Int Neuropsychol Soc 2005;11:620–630.
46. Crane PK, Narasimhalu K, Gibbons LE, et al. Item response theory facilitated cocalibrating cognitive tests and reduced bias in estimated rates of decline. J Clin Epi-
47. Mungas D, Widaman KF, Reed BR, Tomaszewski Farias S. Measurement invariance of neuropsychological tests in diverse older persons. Neuropsychology 2011;25:260–269.
48. Bauer PJ. Long-term recall memory: behavioral and neuro-developmental changes in the first two years of life. Curr Dir Psychol Sci 2002;11:137–141.
49. Bauer PJ. Developments in declarative memory. Psychol
50. Bauer PJ. Constructing a past in infancy: a neuro-developmental account. Trends Cogn Sci 2006;10:175–181.
51. Salthouse TA, Babcock RL, Shaw RJ. Effects of adult age on structural and operational capacities in working memory. Psychol Aging 1991;6:118–127.
52. Wechsler D. WPPSI-III Wechsler Preschool Primary Scale of Intelligence, 3rd ed. San Antonio: Psychological Cor-
53. Dunn LM, Dunn LM. Peabody Picture Vocabulary Test Fourth Edition (PPVT-IV). Bloomington, MN: NCS
54. Zelazo PD, Anderson JE, Richler J, Wallner-Allen K, Beaumont JL, Weintraub S. NIH Toolbox cognitive function battery (CFB): measuring executive function and attention. Monogr Soc Res Child Dev (in press).
55. Johnson MH. Interactive specialization: a domain-general framework for human functional brain development? Developmental Cognitive Neuroscience 2011;1:7–21.
DOI 10.1212/WNL.0b013e3182872ded