Applied Neuropsychology: Child
ISSN: 2162-2965 (Print) 2162-2973 (Online) Journal homepage: http://www.tandfonline.com/loi/hapc20
To cite this article: Laszlo A. Erdodi, Jonathan D. Lichtenstein, Jaspreet K. Rai & Lloyd Flaro
(2016): Embedded validity indicators in Conners’ CPT-II: Do adult cutoffs work the same way in
children?, Applied Neuropsychology: Child, DOI: 10.1080/21622965.2016.1198908
To link to this article: http://dx.doi.org/10.1080/21622965.2016.1198908
Published online: 06 Jul 2016.
APPLIED NEUROPSYCHOLOGY: CHILD
http://dx.doi.org/10.1080/21622965.2016.1198908
Embedded validity indicators in Conners’ CPT-II: Do adult cutoffs work the same
way in children?
Laszlo A. Erdodia, Jonathan D. Lichtensteinb, Jaspreet K. Raia, and Lloyd Flaroc
aDepartment of Psychology, University of Windsor, Windsor, Ontario, Canada; bDepartment of Psychiatry, Geisel School of Medicine at
Dartmouth, Hanover, New Hampshire, USA; cPrivate Practice, Edmonton, Alberta, Canada
ABSTRACT
In previous research, several subscales of Conners’ CPT-II were found to be useful as performance
validity tests (PVTs) when administered to adults with traumatic brain injury (TBI). Furthermore,
invalid response sets were associated with inflated scores on several CPT-II scales. The present
study proposed to investigate whether these findings would replicate in a pediatric sample. The
analyses were based on archival data from 15 children with TBI. The Omissions, Hit RT,
Perseverations, and Hit RT BC scales proved effective at differentiating valid and invalid response
sets. However, Commission errors were unrelated to scores on PVTs. A composite measure based
on these four scores was a superior and more stable validity indicator than individual scales. Two or
more T-scores >65 on any of these scales resulted in acceptable overall specificity (.86–1.00) and
variable sensitivity (.00–1.00). Scores on CPT-II scales were generally higher among those who failed
the reference PVTs. Results suggest that embedded CPT-II validity indices developed in adult TBI
samples function similarly in children with TBI, with some notable exceptions. Although the use of
adult PVT cutoffs in pediatric assessment is a common practice, and broadly supported by the
present findings, there remains a clear need for the independent empirical validation of adult PVTs
in children.
KEYWORDS: Conners’ CPT-II; pediatric neuropsychology; performance validity assessment; traumatic brain injury
Until recently, research on test-taking effort in
neuropsychological assessment had been largely restricted to
adults. However, within the last decade, there has been
an increased interest in the use of performance validity
tests (PVTs) with pediatric populations (Baker,
Connery, Kirk, & Kirkwood, 2014). This area of
investigation appears especially worthwhile in certain
diagnostic groups. For example, in one study, 17% of
a sample of children with mild traumatic brain injury
(mTBI) produced noncredible response sets (Kirkwood
& Kirk, 2010). Such findings have clinical significance as
children who fail PVTs have been found to perform
more poorly on cognitive testing as compared to
children who demonstrate adequate effort (Kirkwood,
Yeates, Randolph, & Kirk, 2012).
In the context of clinical evaluation, embedded
validity indicators (EVIs) hold considerable advantage
over stand-alone PVTs. Even though individually, their
signal detection performance is generally inferior to
stand-alone PVTs, EVIs provide a valuable alternative
method of assessing test-taking effort. First, because
they utilize data already collected for clinical
purposes, EVIs are more cost-effective and expedient.
They also tend to be more resistant to coaching, as they
cannot be easily identified as validity measures (Miele,
Gunner, Lynch, & McCaffrey, 2010). Finally, they allow
for continuous monitoring of cognitive effort in
neuropsychological evaluation, a practice that is consist-
ent with the highest forensic standards (Boone, 2009).
Although stand-alone PVTs remain the gold standard
for assessing performance validity in cognitive testing,
there is a growing body of evidence suggesting that
the aggregation of multiple EVIs into a single composite
produces results similar to those of stand-alone
PVTs (Erdodi & Roth, 2016; Erdodi, Roth, Kirsch,
Lajiness-O’Neill, & Medoff, 2014). Such findings, which
demonstrate the potential of EVIs to serve as effective
PVTs, are encouraging and warrant further research.
Although EVIs are an attractive tool for the scientist-practitioner,
pediatric research on them has lagged
behind that of their adult counterparts. Moreover, most
studies investigating the effect of effort on assessment
outcome in children rely on instruments modeled after
adult PVTs, and few have provided cutoff scores specific
to pediatric populations. Nevertheless, empirical
support for the use of adult PVTs in pediatrics
(Constantinou & McCaffrey, 2003; Donders, 2005;
Green & Flaro, 2003; Green, Flaro, Brockhaus, &
Montijo, 2012; MacAllister, Nakhutina, Bender,
Karantzoulis, & Carlson, 2009; Nagle, Everhart,
Durham, McCammon, & Walker, 2006) has been
accumulating in the research literature.

CONTACT: Laszlo A. Erdodi, lerdodi@gmail.com, Department of Psychology, University of Windsor, 168 Chrysler Hall South, 401 Sunset Avenue, Windsor, ON N9B 3P4, Canada.
© 2016 Taylor & Francis
On the other hand, there is evidence that the appli-
cation of adult cutoffs to pediatric assessment may not
always be appropriate (Constantinou & McCaffrey,
2003). Such findings may be due, at least in part, to
the practice of extending existing measures to a new
population rather than designing new measures that
meet the unique challenges of assessing any given
group. The downward extension of adult assessment
tools for use with children is common in neuropsychol-
ogy, although perhaps not optimal and certainly not
without limitations. However, the fact that EVIs were
originally conceived as dynamic models (Babikian &
Boone, 2007; Boone, 2013; Larrabee, 2005; Lu, Rogers,
& Boone, 2007; Sweet & Nelson, 2007) places them in
a better position, as compared with stand-alone PVTs,
to respond to the unique challenges of pediatric neurop-
sychology and provide the flexibility to accommodate
population-specific idiosyncrasies (developmental
trajectories, higher volatility of the target construct,
increased reactivity to variables that are extraneous to
the assessment process).
Conners’ Continuous Performance Test, Second
Edition (CPT-II) was designed to quantify various
aspects of an individual’s ability to focus and sustain
attention during a basic vigilance task, and is widely
used in attention deficit/hyperactivity disorder (ADHD)
research and clinical assessments for respondents aged
6 or older (Conners, 2004). Although the instrument
was primarily developed to detect ADHD, certain subt-
ests have shown promise as EVIs. For example, Suhr,
Hammers, Dobbins-Buckland, Zimak, and Hughes
(2008) reported that individuals who failed the Word
Memory Test (WMT; Green, 2003) were also more
impaired on the CPT-II. Similarly, Marshall et al.
(2010) found that the Omissions (OMI) and Commis-
sions (COM) subtests performed well (sensitivity [SENS]
.04–.57; specificity [SPEC] .87–1.00) against established
PVTs in 268 adults referred for ADHD assessment. In
a sample of 82 adults with TBI, Ord, Boettcher, Greve,
and Bianchini (2010) found that the OMI and Hit Reac-
tion Time Standard Errors (HRT-SE) were successful at
identifying invalid response sets with a SPEC of .95
and SENS of .30–.44. In addition to OMI and COM,
Lange et al. (2013) identified Perseverations (PER) as
effective at separating valid and invalid response sets
(operationalized as failing the WMT) in a military sample
of 158 individuals assessed for TBI. Results are mixed on
HRT: Ord et al. (2010) found a significant difference on
this scale between valid and invalid response sets, while
Lange et al. (2013) did not.
More recently, Erdodi et al. (2014) added the
Variability (VAR) subtest to the list of potential EVIs
in the CPT-II. In their sample of 104 adults with
TBI, at cutoffs with SPEC ≥ .90, five EVIs produced
SENS values hovering around .50. Interestingly, the
subsample that passed reference PVTs (refPVT) per-
formed within normal limits on all 12 scales, while
those who failed refPVTs produced mean scores in
the clinical range (T >60) on six scales, suggesting that
elevations on the CPT-II following TBI are more likely
to reflect invalid responding than acquired attention
deficit.
Despite growing empirical support for the CPT-II
based EVIs in adult TBI, to our knowledge, no research
has been published on the topic in pediatric popula-
tions. The present study was conducted to investigate
whether the patterns of findings observed in adults with
TBI would replicate in a pediatric sample. Specifically,
we were interested in the signal detection profile of EVIs
within the CPT-II in children with TBI and whether the
link between impaired scores on the clinical scales and
invalid performance would replicate in a pediatric
sample.
Method
Participants
The study was based on archival data from 15 children
(53% male, M_Age = 12.6, SD_Age = 3.2) referred for neuropsychological
assessment following TBI to a private practice in a Canadian
metropolitan area. Two-thirds of the sample had a comorbid psychiatric
diagnosis: ADHD (n = 4), learning disability (n = 3), conduct
disorder (n = 1), drug and alcohol abuse (n = 1), and somatization
disorder (n = 1).
Overall intellectual functioning fell in the borderline range
(M_FSIQ = 76.9, SD_FSIQ = 12.5), while performance on a picture
vocabulary test was average (M_PPVT-3 = 95.5, SD_PPVT-3 = 16.1).
A similar discrepancy between verbal vs. perceptual, and receptive
vs. expressive abilities was also noted within the WISC-III
(M_VIQ = 79.5, SD_VIQ = 11.8; M_PIQ = 84.1, SD_PIQ = 16.2). The mean
reading level in the sample was equivalent to grade 5.3 (SD = 3.3),
which is about two years below what would be expected based on
chronological age. On the Rey Complex Figure Test, performance was
borderline to low average [M_IR = 35.2, SD_IR = 12.2; M_DR = 35.8,
SD_DR = 13.8; M_REC = 41.4, SD_REC = 15.5 (T-scores)].
Likewise, on average, the sample performed broadly
within normal limits on the Wisconsin Card Sorting Test
(WCST; M_CATcomp = 4.3, SD_CATcomp = 2.1; M_FMS = 1.7, SD_FMS = 1.7).
With respect to executive functioning, caregiver ratings on the
BRIEF produced mean scores in the clinically significant range
(T ≥ 65) on all subscales. The highest elevation was on the general
composite (M_GEC = 70.1, SD_GEC = 16.3). On the PAI-A, the only
mean scores that reached a mild elevation (T ≥ 60) were the
Depression (M_DEP = 61.5, SD_DEP = 12.5) and Schizophrenia
(M_SCZ = 62.4, SD_SCZ = 7.2) subscales.
Materials
A core battery of standard neuropsychological tests
was administered to all participants, along with a
set of PVTs and rating scales (Table 1). The WMT,
Medical Symptom Validity Test (MSVT; Green,
2004) and Non-Verbal Medical Symptom Validity Test
(NV-MSVT; Green, 2008) were used, at the standard
dichotomous (Pass/Fail) cutoffs, as the free-standing
refPVTs. The embedded PVTs included the Trail Making Test B/A
ratio (TMT_B/A), at the cutoff published by Iverson, Lange, Green,
and Franzen (2002), and the logistic regression equation (LRE)
developed by Suhr and Boyer (1999), which uses variables from
the WCST (Heaton, Chelune, Talley, Kay, & Curtiss, 1993). Given
that the Suhr-Boyer LRE (S-B_LRE) was calibrated on adults, a more
conservative cutoff [≥1.69, associated with P(invalid profile) ≥ .80]
was used in the present sample to protect against false positives.
Although equating a score in the failing range on any given PVT
with a globally invalid cognitive profile remains an
epistemologically contentious practice (Bigler, 2012, 2015), for
purely practical reasons, performance validity was operationalized
as the outcome of these individual refPVTs (Pass/Fail) in the
present study. Data (T-scores) were only available on five CPT-II
clinical scales: OMI, COM, HRT, PER, and HRT Block Change (HRT-BC).
Data analysis
Descriptive statistics were computed for the main
variables of interest. Between-group contrasts on con-
tinuous variables were computed using independent
t tests. The statistical significance of the difference in
frequency distributions was assessed using χ². SENS and SPEC of
the target CPT-II variables against established PVTs were computed
using standard formulas. Area under the curve was not reported,
given the recent controversy around its validity as a single-number
summary of overall classification accuracy (Hanczar et al., 2010;
Wald & Bestwick, 2014).
SENS is the proportion of invalid response sets cor-
rectly identified as such, or true positive rate. In con-
trast, SPEC is the proportion of correctly classified
valid response sets or true negative rate. In PVT
research, SPEC is the key parameter, given the impera-
tive to protect against false positive errors (incorrectly
labeling a valid response set as invalid). As a benchmark, .90 is
considered the lower threshold for desirable SPEC (Boone, 2013),
with .84 being the lowest acceptable value (Larrabee, 2003).
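The two parameters can be computed from a 2 × 2 classification table; a minimal Python sketch (the function and variable names are ours, not from the study), where failing a reference PVT defines a truly invalid response set and an EVI score beyond its cutoff counts as a positive classification:

```python
def sens_spec(evi_positive, refpvt_fail):
    """Compute sensitivity and specificity of an embedded validity
    indicator (EVI) against a reference PVT (the criterion).

    evi_positive : list of bool, True if the EVI flags the record as invalid
    refpvt_fail  : list of bool, True if the reference PVT was failed
    """
    pairs = list(zip(evi_positive, refpvt_fail))
    tp = sum(e and r for e, r in pairs)                  # true positives
    fn = sum((not e) and r for e, r in pairs)            # false negatives (missed invalid sets)
    tn = sum((not e) and (not r) for e, r in pairs)      # true negatives
    fp = sum(e and (not r) for e, r in pairs)            # false positives (valid sets mislabeled)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")  # true positive rate
    spec = tn / (tn + fp) if (tn + fp) else float("nan")  # true negative rate
    return sens, spec
```

For example, an EVI that flags 1 of 2 truly invalid sets while clearing 9 of 10 valid ones yields SENS = .50 and SPEC = .90, which illustrates why SPEC, the protection against false positives, is the parameter held fixed in PVT research.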
Results
Base rates of failure (BR_Fail) on refPVTs ranged from 8.3%
(MSVT and TMT_B/A) to 30.0% (WMT). Given previous reports that
reading level might be a confound in the application of adult PVTs
to pediatric populations (Constantinou & McCaffrey, 2003), this
link was explicitly investigated. There was minimal overlap between
reading skills below the 3rd-grade level and BR_Fail. Only one
child with a reading level lower than 3rd grade failed the S-B_LRE.
All children failing other refPVTs had higher reading levels.
The relationship between PVT failures and key
sample characteristics was evaluated to rule out
potential background variables that might account for
performance validity (Table 2). Gender was unrelated
to passing or failing any of the refPVTs. Similarly, no
age effects emerged as a function of Pass/Fail status
Table 1. List of tests administered.
Test Abbreviation Reference
Behavior Rating Inventory of Executive Function BRIEF Gioia, Isquith, Guy, and Kenworthy (2000)
Conners’ Continuous Performance Test, 2nd edition CPT-II Conners (2004)
Medical Symptom Validity Test MSVT Green (2004)
Non-Verbal Medical Symptom Validity Test NV-MSVT Green (2008)
Peabody Picture Vocabulary Test, 3rd edition PPVT-3 Dunn and Dunn (1997)
Personality Assessment Inventory - Adolescent PAI-A Morey (1991, 2007)
Rey Complex Figure Test RCFT Meyers and Meyers (1995)
Trail Making Test TMT A & B Reitan and Wolfson (1985)
Wechsler Intelligence Scale for Children, 3rd edition WISC-III Wechsler (1991)
Wide Range Achievement Test, 3rd edition WRAT-III Wilkinson (1993)
Wisconsin Card Sorting Test WCST Heaton, Chelune, Talley, Kay, and Curtiss (1993)
Word Memory Test WMT Green (2003)
on refPVTs, with the exception of TMT_B/A. Caregiver ratings of
overall executive function as captured on the BRIEF did not differ
between children who passed and those who failed refPVTs. Finally,
with the exception of TMT_B/A, presence of psychiatric comorbidity
was unrelated to Pass/Fail status on refPVTs. Given the low BR_Fail
on TMT_B/A in combination with the atypical inner logic behind this
PVT (“abnormal pattern of impairment”), the isolated age and
comorbidity effects observed on it might reflect an instrumentation
artifact.
Of the CPT-II scales, OMI >65 achieved acceptable SPEC against
most refPVTs, except the MSVT (.82) and TMT_B/A (.80). The only
conceivable cutoff on COM was >60, as no participant scored ≥65.
As a result, the COM >60 cutoff produced unacceptably low SPEC
(.63–.75) against all refPVTs. HRT >65 cleared the minimum SPEC
benchmark against all refPVTs, with extreme fluctuations in SENS
(.00–1.00). PER >65 produced acceptable SPEC against all refPVTs
except the MSVT (.73), but variable SENS (.00–.50). Finally,
HRT-BC >65 produced uniformly good SPEC (.88–1.00), in the backdrop
of fluctuating SENS (.00–.40). Further details are displayed in
Table 3.
Based on these findings, four of the CPT-II scales that produced
acceptable signal detection profiles were aggregated into a single
composite labeled the CPT-II Validity Indicator (CVI-4). Each
CPT-II scale was recoded into a four-point scale (0–3). The clearly
valid range was assigned the value of zero (PASS). The first level
of suspect performance was assigned the value of one (Borderline),
the next level of invalid performance was assigned the value of two
(Fail), while the most conservative (low SENS, high SPEC) cutoff
was assigned the value of three (FAIL), following the methodology
described by Erdodi, Abeare, et al. (2016) and Erdodi, Tyson, et al.
(2016). The BR_Fail at the most liberal cutoff (Borderline; high
SENS, low SPEC) ranged from 20% to 33%, which is broadly consistent
with the values produced by the stand-alone refPVTs. The details of
the re-scaling procedure are displayed in Table 4.
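The recoding and aggregation described above can be sketched in a few lines; a hypothetical Python illustration using the Table 4 thresholds (the dictionary and function names are ours, not part of the CPT-II):

```python
# Thresholds from Table 4: a T-score below the first value scores 0 (PASS);
# reaching each successive threshold scores 1 (Borderline), 2 (Fail), 3 (FAIL).
CUTOFFS = {
    "OMI":    (60, 65, 70),
    "HRT":    (60, 65, 70),
    "PER":    (65, 80, 90),
    "HRT-BC": (60, 65, 70),
}

def cvi4(t_scores):
    """Sum the 0-3 recoded values of the four CPT-II scales.

    t_scores: dict mapping scale name to its T-score, e.g. {"OMI": 62, ...}
    """
    total = 0
    for scale, thresholds in CUTOFFS.items():
        t = t_scores[scale]
        total += sum(t >= cut for cut in thresholds)  # contributes 0-3 per scale
    return total
```

For example, a child with OMI = 70, HRT = 66, PER = 90, and HRT-BC = 61 would receive 3 + 2 + 3 + 1 = 9, well past the unequivocal FAIL range described below in the Results.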
Next, the distribution of the CVI-4 scores was exam-
ined and classification ranges were established. The
majority of the sample (8/15 or 53.3%) had a score of
zero, which means that they passed the most liberal cut-
off on all four components of the CVI-4. Thus, this
Table 2. Differences in age, BRIEF GEC T-scores, and rate of
psychiatric comorbidity as a function of passing or failing PVTs.

                       WMT    MSVT   NV-MSVT A1  NV-MSVT A2  TMT_B/A  S-B_LRE
Age (years)
  Pass   M             14.9   12.8   13.8        13.5        14.2     13.6
         SD             1.9    3.3    2.9         2.9         2.5      3.0
  Fail   M             13.7   14.0   14.0        15.0        10.0     12.6
         SD             1.5    0.0    0.0         1.4         1.4      3.0
BRIEF GEC (T-score)
  Pass   M             72.5   74.9   72.5        73.0        71.8     68.3
         SD            22.0   15.6   16.7        18.0        18.0     20.8
  Fail   M             74.5   80.0   80.0        74.0        67.5     71.5
         SD             7.8    0.0    0.0         7.8        13.4      9.0
% PSY-COM
  Pass                 85.7   63.6   77.8        75.0        90.0     75.0
  Fail                 66.7  100.0  100.0       100.0         0.0     80.0

Note. BRIEF GEC = Behavior Rating Inventory of Executive Function
Global Executive Composite; % PSY-COM = percentage of the sample
with comorbid psychiatric diagnoses; WMT = Word Memory Test
(standard cutoffs); MSVT = Medical Symptom Validity Test (standard
cutoffs); NV-MSVT = Non-Verbal MSVT (standard cutoffs); TMT_B/A =
Trail Making Test, B/A ratio (cutoff <1.50; Iverson et al., 2002);
S-B_LRE = logistic regression equation developed by Suhr and Boyer
(1999) using Wisconsin Card Sorting Test variables [cutoff ≥1.69;
P(invalid) ≥ .80]. The only contrasts that reached statistical
significance (p < .05) were age and % PSY-COM on TMT_B/A.
Table 3. The signal detection properties of the embedded CPT-II
validity indicators against reference PVTs.

                              WMT    MSVT   NV-MSVT A1  NV-MSVT A2  TMT_B/A  S-B_LRE
refPVT BR_Fail                30%    8%     10%         20%         17%      23%
OMI >65     (BR_Fail 20%)
  SENS                        .67    1.00   1.00        .50         .00      .80
  SPEC                        .86    .82    .89         .88         .80      1.00
COM >60     (BR_Fail 27%)
  SENS                        .00    .00    .00         .50         .50      .00
  SPEC                        .71    .64    .67         .75         .70      .63
HRT >65     (BR_Fail 20%)
  SENS                        .33    1.00   1.00        .50         .00      .60
  SPEC                        .86    .91    1.00        1.00        .90      1.00
PER >65     (BR_Fail 23%)
  SENS                        .00    .00    –           –           .50      .40
  SPEC                        .86    .73    –           –           1.00     1.00
HRT-BC >65  (BR_Fail 13%)
  SENS                        .33    .00    .00         .00         .00      .40
  SPEC                        1.00   .91    .89         .88         .90      1.00
CVI-4 ≥7    (BR_Fail 20%)
  SENS                        .50    –      –           .00         1.00     .75
  SPEC                        .86    –      –           .88         .89      1.00

Note. CPT-II = Conners’ Continuous Performance Test, 2nd edition;
EVI = embedded validity indicator; BR_Fail = base rate of failure
(% scoring in the failing range); OMI = Omissions; COM = Commissions;
HRT = Hit Reaction Time; PER = Perseverations; HRT-BC = Hit Reaction
Time Block Change; CVI-4 = validity composite based on OMI, HRT,
PER, and HRT-BC cutoffs; SENS = sensitivity; SPEC = specificity;
WMT = Word Memory Test (standard cutoffs); MSVT = Medical Symptom
Validity Test (standard cutoffs); NV-MSVT = Non-Verbal MSVT (standard
cutoffs); TMT_B/A = Trail Making Test, B/A ratio (cutoff <1.50;
Iverson et al., 2002); S-B_LRE = logistic regression equation
developed by Suhr and Boyer (1999) using Wisconsin Card Sorting
Test variables [cutoff ≥1.69; P(invalid) ≥ .80].
range was labeled an unequivocal PASS. The next
observed value of CVI-4 was 2, which could either mean
one score at the second level of CVI-4 or two scores at
the first level. Neither combination provides sufficient
evidence of invalid performance, so this range was also
labeled Pass. The next highest value of CVI-4 was 3,
which could mean one score at the most conservative
cutoff, three at the most liberal cutoff, or a combination
of one score at the CVI-4 level of one and one at level
two. While this range of performance is suspect, it also
does not provide incontrovertible evidence for invalid
performance. As such, it was labeled Borderline. The
next highest value of 7, however, is extreme enough to
consider an unequivocal FAIL. Only 3/15 (20%)
performed in this range or above.
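The classification ranges derived above can be expressed as a simple lookup; a sketch under the assumption, which the text does not state explicitly, that the unobserved totals of 4–6 would also be treated as Borderline (the function name is ours):

```python
def classify_cvi4(total):
    """Map a CVI-4 total to the classification ranges described above.

    Totals of 4-6 were not observed in the sample; grouping them with
    Borderline is our assumption, not a rule stated by the authors.
    """
    if total <= 2:
        return "Pass"        # 0 = unequivocal PASS; 2 still insufficient evidence
    if total <= 6:
        return "Borderline"  # suspect, but not incontrovertible
    return "FAIL"            # >= 7: extreme enough for an unequivocal FAIL
```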
Finally, the effect of performance validity on the five
CPT-II scales was examined as a function of Pass/Fail
status on the six refPVTs, creating a 6 × 6 matrix. In
about half of the cases, those who failed refPVTs also
produced means in the clinical range (T >60), while
those who passed refPVTs performed in the nonclinical
range. In one case, this trend was reversed (PER >65
against MSVT). In the remaining cases the means were
within normal limits regardless of Pass/Fail status.
Overall, six of the contrasts (20%) reached statistical sig-
nificance. Table 5 displays the details of these analyses.
Discussion
This study examined the classification accuracy of the
adult cutoffs on the CPT-II based EVIs in a sample of
children with TBI against six refPVTs to control for
instrumentation artifacts in the criterion measure. Prior
to performing the signal detection analyses, the poten-
tial confounding effects of reading level, age, gender,
executive functioning and psychiatric comorbidity were
independently assessed and largely ruled out. Overall,
results suggest that obtaining two T-scores >65 on
select CPT-II scales raises concerns about the validity
of a given response set.
Of the five potential EVIs examined, OMI >65 pro-
duced the single best signal detection profile, resulting
in a good combination of SENS and SPEC against four
refPVTs. This is broadly consistent with research on
adults (Erdodi et al., 2014; Lange et al., 2013; Ord
et al., 2010), although some authors (Marshall et al.,
2010) recommended more conservative cutoffs (>80).
HRT produced a good combination of SENS and SPEC
against five refPVTs, but zero SENS against the sixth.
Similarly, while HRT-BC had a perfect SPEC against
two refPVTs, it also produced zero SENS on four occa-
sions. Although it produced a good combination of
SENS and SPEC against two refPVTs, overall, PER
>65 resulted in zero SENS on two occasions, rendering
its ability to reliably differentiate valid from invalid
response sets in children questionable. This finding
recapitulates the inconsistent results on PER as an
EVI in adults. Some studies found perseverative errors
Table 4. Cumulative base rates (BR) of failure on four CPT-II
scales across levels of CVI-4 (T-scores).

CPT-II scale     PASS (0)   Borderline (1)   Fail (2)   FAIL (3)
OMI              <60        ≥60              ≥65        ≥70
  Base rate      73.3%      26.7%            26.7%      20.0%
HRT              <60        ≥60              ≥65        ≥70
  Base rate      73.3%      26.7%            20.0%      13.3%
PER              <65        ≥65              ≥80        ≥90
  Base rate      66.7%      33.3%            20.0%      13.3%
HRT-BC           <60        ≥60              ≥65        ≥70
  Base rate      80.0%      20.0%            20.0%      13.3%

Note. CPT-II = Conners’ Continuous Performance Test, 2nd edition;
CVI-4 = CPT-II Validity Indicator; OMI = Omissions; HRT = Hit
Reaction Time; PER = Perseverations; HRT-BC = Hit Reaction Time
Block Change.
Table 5. Mean CPT-II OMI, COM, HRT, PER, and HRT-BC T-scores as a
function of Pass/Fail status on reference PVTs.

                     OMI    COM    HRT     PER     HRT-BC  CVI-4
WMT        Pass      53.3   49.8   57.0    53.9    49.8*   1.3
           Fail      67.1   45.9   59.0    55.6    62.3    3.3
MSVT       Pass      57.7   51.4   55.2    72.1    52.3    2.0
           Fail      60.1   49.0   67.8    58.6    55.4    3.0
NV-MSVT A1 Pass      55.5   50.5   53.5**  52.3    55.7    1.0
           Fail      60.1   49.0   67.8    58.6    55.4    3.0
NV-MSVT A2 Pass      55.1   49.1   53.3    50.1**  55.0    0.9
           Fail      59.3   55.4   61.7    64.5    58.5    2.5
TMT_B/A    Pass      55.6   50.0   55.6    52.7*   58.9    1.2
           Fail      47.9   61.4   46.7   123.2    45.2    1.5
S-B_LRE    Pass      56.9   52.1   55.4**  55.0*   54.1    0.3**
           Fail      61.9   46.4   69.3    96.8    66.8    5.8

Note. Statistical significance was determined using independent
t tests; CPT-II = Conners’ Continuous Performance Test, 2nd edition;
CVI-4 = CPT-II Validity Indicator; OMI = Omissions; COM = Commissions;
HRT = Hit Reaction Time; PER = Perseverations; HRT-BC = Hit Reaction
Time Block Change; WMT = Word Memory Test (standard cutoffs);
MSVT = Medical Symptom Validity Test (standard cutoffs); NV-MSVT =
Non-Verbal MSVT (standard cutoffs); TMT_B/A = Trail Making Test,
B/A ratio (cutoff <1.50; Iverson et al., 2002); S-B_LRE = logistic
regression equation developed by Suhr and Boyer (1999) using
Wisconsin Card Sorting Test variables [cutoff ≥1.69; P(invalid) ≥ .80].
*p < .05 (one-tailed). **p < .01 (one-tailed).
on the CPT-II to be effective at separating valid and
invalid response sets (Erdodi et al., 2014; Lange et al.,
2013), whereas others produced inconclusive results
(Ord et al., 2010).
COM was a notable outlier—both within this study
and in comparison to the adult literature (Erdodi
et al., 2014; Lange et al., 2013; Marshall et al., 2010;
Ord et al., 2010). Its only conceivable cutoff (>60) consistently
produced SPEC values ≤ .75, which imply an unacceptably high false
positive rate. In addition, COM >60 resulted in zero SENS against
four refPVTs.
Finally, the CVI-4, a composite EVI, produced the most
efficient and stable estimate of performance on the
criterion measures. The advantage of the CVI-4 over
individual EVIs in signal detection likely stems from
combining the diagnostic power of several indicators.
Its differential success rate is predicted by the central
limit theorem, and exemplifies the methodological
superiority of multivariate models in performance
validity assessment. As such, the results of the present
study are consistent with previous research on CPT-
II based EVIs in adult samples (Erdodi et al., 2014;
Lange et al., 2013).
Even so, the CVI-4 demonstrated extreme fluctua-
tions in SENS, ranging from .00 to 1.00. SPEC values
were less variable, ranging from .80 to 1.00. The
notable discrepancy between SENS and SPEC is likely
caused by a number of factors: the natural variability
in effort across PVTs; small sample size; low BR_Fail on both EVIs
and refPVTs; and a deliberate attempt to protect against false
positives at the expense of a higher false negative rate. The fact
that Marshall
et al. (2010) observed a similarly wide range of SENS
despite a vastly larger sample lends support to the first
two explanations.
Careful examination of Table 3 also suggests that sig-
nal detection profiles may be domain-specific. In other
words, they could be partially driven by the inherent
variability in the stimulus properties of the refPVTs.
The lowest classification accuracy among EVIs was
observed against the TMT_B/A, while the best combinations of SENS
and SPEC occurred on the S-B_LRE. This
heterogeneity in signal detection performance may be
another contributing factor to the fluctuation in SENS
and SPEC previously described.
Two of the individual CPT-II based EVIs developed
in adult samples (OMI and PER) remained effective
when applied to pediatric TBI. However, COM had a
consistently poor performance against all refPVTs.
Hence, it does not appear to be useful in dissociating
impairment from effort in children. HRT and HRT-
BC performed reasonably well, suggesting that they
may warrant further research in larger samples.
This pattern of findings fits the clinical interpretation
of the underlying constructs that the EVIs of interest are
designed to measure and the larger context of perfor-
mance validity assessment. Of the four successful
CPT-II based EVIs examined in the present study, the
OMI scale can be conceptualized as the purest measure
of task engagement as it only requires the willingness to
respond to the abundant and salient targets (Erdodi
et al., 2014). Thus, inflated error rates likely reflect poor
compliance with simple instructions. The manual itself
identifies extreme scores on this scale as potential
indication of invalid responding (Conners, 2004).
Similarly, HRT and its temporal derivative, HRT-BC,
are measures of motor speed, a basic construct that is
under conscious control. An overall RT that is unusually
slow or is becoming unusually slower over time could
indicate failure to put forth full effort throughout the
task or a gradual disengagement from the testing
process. Since these two scales may potentially capture
distinct forms of invalid responding, a larger scale study
exploring their divergent validity would be a worthwhile
pursuit in subtyping performance validity and, in com-
bination with more opaque scales such as HRT-SE and
VAR, modeling intent, a highly elusive construct in
signal detection analyses (Boone, 2013; Delis & Wetter,
2007; Frederick, Crosby, & Wynkoop, 2000; Frederick &
Bowden, 2009).
On the other hand, the COM scale is a measure of
the examinee’s ability to inhibit a basic response that
quickly becomes automatic as the test progresses.
Because it requires higher-order cognitive abilities that
are still developing and maturing in youth (Baron,
2004), COM errors may be equally likely in children
with full and questionable effort, and therefore they
may not differentiate between them. Given the consist-
ently poor signal detection performance of COM, it
should not be used as an EVI in children until further
research suggests otherwise.
Lastly, even though Conners (2004) endorsed
extreme elevations on the PER scale as a measure of
performance validity, perseverative errors, as defined
on the CPT-II, are multifactorial. At face value, they
are equally likely to indicate slow responses to the pre-
vious targets, repeat responses to the same target, or
random responses, resulting in an unstable validity
index in pediatric populations. This would explain
why PER >65 behaved so inconsistently against
refPVTs. Nevertheless, this scale showed enough
promise to warrant further investigation in pediatrics.
Using Pass/Fail status on the refPVTs as independent
variables and the CPT-II scales as dependent variables
produced a pattern of findings that was consistent with
the results previously described, suggesting that the
CPT-II is sensitive to poor test taking effort. On one
hand, this makes some of its scales good candidates to
become EVIs. On the other hand, it provides a note
of caution to assessors that an elevated score on the
CPT-II could mean impaired sustained visual attention
as well as poor test taking effort. Given the far reaching
implications of this differential diagnosis, scores in the
clinical range should be interpreted carefully and the
alternative conceptualization of invalid responding
should be considered (Boone, 2013).
Adequate effort during neuropsychological
testing is often assumed rather than formally assessed.
This practice is concerning, as it may allow a powerful
confounding variable to contaminate the measurement
model. Even within the present sample, there was a
notable difference between overall intellectual func-
tioning and receptive vocabulary. There is no obvious
clinical reason for such a wide discrepancy (d = 1.08).
However, if the base rate of failure (BR_Fail) on various
established and experimental PVTs (10–30%) is taken into
account, that could provide a tentative explanation of this
otherwise puzzling finding.
Procedurally, the FSIQ comprises a series of
tests, many of which require considerable mental effort
to demonstrate maximal ability level. In contrast, the
picture vocabulary test involves minimal response from
the examinee (pointing), who is being presented with
constantly changing, colorful, and pleasant visual stim-
uli. Thus, while children with poor effort are expected to
perform below their true ability on the more challenging
IQ test, they may produce valid response sets on the less
demanding picture vocabulary test. In other words, the
construct of “cognitive effort” itself may change across
different tests, potentially requiring an instrument-
specific definition of “performance validity” (Bigler,
2015; Boone, 2013; Frederick et al., 2000; Frederick &
Bowden, 2009).
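The mechanism described in this paragraph, a poor-effort subset depressing scores on the effortful measure only, can be illustrated with a toy mixture calculation. All numbers below are assumptions chosen for illustration; they are not the study's data and are not meant to reproduce the observed discrepancy exactly.

```python
# Toy mixture model (hypothetical numbers, not the study's data): every child
# has one true standard score, but a poor-effort subset underperforms only on
# the effortful FSIQ battery, not on the low-demand vocabulary task.
import statistics

true_ability = [85, 90, 95, 100, 100, 105, 110, 115, 120, 125]
poor_effort = {0, 1, 2}        # 30% of the sample, within the 10-30% BR_Fail range
effort_penalty = 25            # assumed score depression under poor effort

vocabulary = list(true_ability)                     # effort barely matters here
fsiq = [score - effort_penalty if i in poor_effort else score
        for i, score in enumerate(true_ability)]

mean_gap = statistics.mean(vocabulary) - statistics.mean(fsiq)
print(mean_gap)  # a 7.5-point gap despite identical true ability on both measures
```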
The overall findings are largely consistent with
existing literature on EVIs in the CPT-II, with the
notable exception that in the present sample COM
was not associated with performance on formal mea-
sures of test-taking effort (although this was at least
partly due to restricted range). The most obvious limi-
tation of the study is the small sample size, which
may render the parameter estimates unstable. There-
fore, replication using a larger sample and different
refPVTs is needed before the results can be incorporated
into routine clinical decision making. Even though, con-
sistent with previous reports (Lichtenstein, Erdodi, &
Linnea, 2016), comorbid psychiatric and neurodevelop-
mental conditions were unrelated to BR_Fail on refPVTs
or the newly developed EVIs within this sample, the
cumulative effect of multiple disorders as a confound
in pediatric PVT research should continue to be
monitored. Finally, given that estimates of signal
detection parameters (SENS, SPEC) are dependent on
BR_Fail (Baldessarini, Finklestein, & Arana, 1983;
Grimes & Schultz, 2005), the values obtained in this
study likely reflect the failure rates observed in our
sample, and may not generalize to populations with
vastly different BR_Fail. All things considered, the con-
sistency of the findings across instruments and
methods, even within a limited sample, is encouraging
and suggests that the topic is worth pursuing in future
research.
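The base-rate dependence raised in the closing paragraph (Baldessarini et al., 1983) can be made concrete with Bayes' theorem. The sensitivity and specificity values below are hypothetical, chosen only to show how predictive values shift across the 10–30% BR_Fail range discussed above.

```python
# Illustration of base-rate dependence (hypothetical SENS/SPEC values, not the
# study's estimates): predictive values of a PVT cutoff shift with BR_Fail.
def predictive_values(sens, spec, base_rate):
    """Positive and negative predictive values via Bayes' theorem."""
    tp = sens * base_rate                # true positives: failed and invalid
    fp = (1 - spec) * (1 - base_rate)    # false positives: failed but valid
    fn = (1 - sens) * base_rate          # false negatives: passed but invalid
    tn = spec * (1 - base_rate)          # true negatives: passed and valid
    return tp / (tp + fp), tn / (tn + fn)

# The same cutoff (SENS = .50, SPEC = .90) across plausible failure base rates:
for br in (0.10, 0.20, 0.30):
    ppv, npv = predictive_values(sens=0.50, spec=0.90, base_rate=br)
    print(f"BR_Fail = {br:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With fixed accuracy parameters, PPV roughly doubles as BR_Fail moves from 10% to 30%, which is one reason cutoffs validated in one referral context may classify differently in another.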
References
Babikian, T., & Boone, K. B. (2007). Intelligence tests as
measures of effort. In K. B. Boone (Ed.), Assessment of
feigned cognitive impairment (pp. 103–127). New York,
NY: Guilford.
Baker, D. A., Connery, A. K., Kirk, J. W., & Kirkwood, M. W.
(2014). Embedded performance validity indicators within
the California Verbal Learning Test, Children’s Version.
The Clinical Neuropsychologist, 28, 116–127. doi:10.1080/
13854046.2013.858184
Baldessarini, R. J., Finklestein, S., & Arana, G. W. (1983). The
predictive power of diagnostic tests and the effect of
prevalence of illness. Archives of General Psychiatry, 40,
569–573. doi:10.1001/archpsyc.1983.01790050095011
Baron, I. S. (2004). Neuropsychological evaluation of the child.
New York, NY: Oxford University Press.
Bigler, E. D. (2012). Symptom validity testing, effort, and
neuropsychological assessment. Journal of the International
Neuropsychological Society, 18, 632–642. doi:10.1017/
S1355617712000252
Bigler, E. D. (2015). Neuroimaging as a biomarker in
symptom validity and performance validity testing. Brain
Imaging and Behavior, 9(3), 421–444. doi:10.1007/s11682-
015-9409-1
Boone, K. B. (2009). The need for continuous and
comprehensive sampling of effort/response bias during
neuropsychological examination. The Clinical Neuropsy-
chologist, 23(4), 729–741. doi:10.1080/13854040802427803
Boone, K. B. (2013). Clinical practice of forensic neuropsychol-
ogy. New York, NY: Guilford.
Conners, K. C. (2004). Conners’ Continuous Performance Test
(CPT II) version 5 for Windows technical guide and software
manual. North Tonawanda, NY: Multi-Health Systems.
Constantinou, M., & McCaffrey, R. J. (2003). Using the
TOMM for evaluating children’s effort to perform opti-
mally on neuropsychological measures. Child Neuropsy-
chology, 9(2), 81–90. doi:10.1076/chin.9.2.81.14505
Delis, D., & Wetter, S. R. (2007). Cogniform disorder and
cogniform condition: Proposed diagnoses for excessive
cognitive symptoms. Archives of Clinical Neuropsychology,
22, 589–604. doi:10.1016/j.acn.2007.04.001
Donders, J. (2005). Performance on the test of memory
malingering in a mixed pediatric sample. Child Neurop-
sychology, 11(2), 221–227. doi:10.1080/09297040490
917298
Dunn, L. M., & Dunn, L. M. (1997). The Peabody Picture
Vocabulary Test (3rd ed.). Bloomington, MN: Pearson
Assessments.
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T.,
Kucharski, B., Zuccato, B. G., & Roth, R. M. (2016). WAIS-
IV processing speed scores as measures of non-credible
responding: The third generation of embedded perfor-
mance validity indicators. Psychological Assessment.
doi:10.1037/pas0000319
Erdodi, L. A., & Roth, R. M. (2016). Low scores on BDAE
complex ideational material are associated with invalid
performance in adults without aphasia. Applied Neuropsy-
chology: Adult. doi:10.1080/23279095.2016.1154856
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O’Neill, R.,
& Medoff, B. (2014). Aggregating validity indicators
embedded in Conners’ CPT-II outperforms individual
cutoffs at separating valid from invalid performance in
adults with traumatic brain injury. Archives of Clinical
Neuropsychology, 29(5), 456–466. doi:10.1093/arclin/
acu026
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D.,
Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE
complex ideational material: A measure of receptive lan-
guage or performance validity? Psychological Injury and
Law. doi:10.1007/s12207-016-9254-6
Frederick, R. I., & Bowden, S. C. (2009). Evaluating constructs
represented by symptom validity tests in forensic neuropsy-
chological assessment of traumatic brain injury. Journal of
Head Trauma Rehabilitation, 24(2), 105–122. doi:10.1097/
HTR.0b013e31819b1210
Frederick, R. I., Crosby, R. D., & Wynkoop, T. F. (2000).
Performance curve classification of invalid responding
on the validity indicator profile. Archives of Clinical
Neuropsychology, 15(4), 281–300. doi:10.1093/arclin/
15.4.281
Gioia, G. A., Isquith, P. K., Guy, S. C., & Kenworthy, L.
(2000). BRIEF: Behavior rating inventory of executive
function. Lutz, FL: Psychological Assessment Resources.
Green, P. (2003). Manual for the Computerized Word Memory
Test for Windows (revised 2005). Edmonton, AB: Green’s
Publishing.
Green, P. (2004). Manual for the Medical Symptom Validity
Test for Windows. Edmonton, AB: Green’s Publishing.
Green, P. (2008). Manual for the Green’s Non-Verbal Medical
Symptom Validity Test for Windows. Edmonton, AB,
Canada: Green’s Publishing.
Green, P., & Flaro, L. (2003). Word memory test performance
in children. Child Neuropsychology, 9(3), 189–207.
doi:10.1076/chin.9.3.189.16460
Green, P., Flaro, L., Brockhaus, R., & Montijo, J. (2012).
Performance on the WMT, MSVT, and NV-MSVT in
children with developmental disabilities and in adults with
mild traumatic brain injury. In R. Reynolds & A. M.
Horton (Eds.), Detection of malingering during head injury
litigation (pp. 201–219). New York, NY: Springer.
Grimes, D. A., & Schultz, K. F. (2005). Refining clinical
diagnosis with likelihood ratios. Lancet, 365, 1500–1505.
doi:10.1016/S0140-6736(05)66422-7
Hanczar, B., Hua, J., Sima, C., Weinstein, J., Bittner, M., &
Dougherty, E. R. (2010). Small-sample precision of ROC-
related estimates. Bioinformatics, 26(6), 822–830.
doi:10.1093/bioinformatics/btq037
Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G. G., &
Curtiss, G. (1993). Wisconsin Card Sorting Test manual:
Revised and expanded. Odessa, FL: Psychological
Assessment Resources.
Iverson, G. L., Lange, R. T., Green, P., & Franzen, M. D.
(2002). Detecting exaggeration and malingering with
the trail making test. Clinical Neuropsychologist, 16(3),
398–406. doi:10.1076/clin.16.3.398.13861
Kirkwood, M. W., & Kirk, J. W. (2010). The base rate of
suboptimal effort in a pediatric mild TBI sample: Perfor-
mance on the Medical Symptom Validity Test. The Clinical
Neuropsychologist, 24(5), 860–872. doi:10.1080/
13854040903527287
Kirkwood, M. W., Yeates, K. O., Randolph, C., & Kirk, J. W.
(2012). The implications of Symptom Validity Test failure
for ability-based test performance in a pediatric sample.
Psychological Assessment, 24(1), 36–45. doi:10.1037/
a0024628
Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T.,
Pancholi, S., Bhagwant, A., & French, L. M. (2013). Clinical
utility of the Conners’ Continuous Performance Test-II to
detect poor effort in US military personnel following
traumatic brain injury. Psychological Assessment, 25(2),
339–352. doi:10.1037/a0030915
Larrabee, G. J. (2003). Detection of malingering using atypical
performance patterns on standard neuropsychological
tests. The Clinical Neuropsychologist, 17(3), 410–425.
doi:10.1076/clin.17.3.410.18089
Larrabee, G. J. (2005). Forensic neuropsychology – A scientific
approach. New York, NY: Oxford University Press.
Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2016).
Introducing a forced-choice recognition task to the
California Verbal Learning Test – Children’s Version. Child
Neuropsychology. Advance online publication. doi:10.1080/
09297049.2015.1135422
Lu, P. H., Rogers, S. A., & Boone, K. B. (2007). Use of
standard memory tests to detect suspect effort. In K. B.
Boone (Ed.), Assessment of feigned cognitive impairment
(pp. 128–151). New York, NY: Guilford.
MacAllister, W. S., Nakhutina, L., Bender, H. A., Karantzoulis,
S., & Carlson, C. (2009). Assessing effort during neuro-
psychological evaluation with the TOMM in children and
adolescents with epilepsy. Child Neuropsychology, 15(6),
521–531. doi:10.1080/09297040902748226
Marshall, P., Schroeder, R., O’Brien, J., Fischer, R., Ries, A.,
Blesi, B., & Barker, J. (2010). Effectiveness of symptom
validity measures in identifying cognitive and behavioral
symptom exaggeration in adult attention deficit hyperactiv-
ity disorder. Clinical Neuropsychologist, 24, 1204–1237.
doi:10.1080/13854046.2010.514290
Meyers, J., & Meyers, K. (1995). Rey Complex Figure Test
and recognition trial professional manual. Lutz, FL:
Psychological Assessment Resources.
Miele, A. S., Gunner, J. H., Lynch, J. K., & McCaffrey, R. J.
(2010). Are embedded validity indices equivalent to
free-standing symptom validity tests? Archives of Clinical
Neuropsychology, 27(1), 10–22. doi:10.1093/arclin/acr084
Morey, L. (1991). Personality Assessment Inventory professional
manual. Odessa, FL: Psychological Assessment Resources.
Morey, L. (2007). Personality Assessment Inventory –
Adolescent professional manual. Lutz, FL: Psychological
Assessment Resources.
Nagle, A. M., Everhart, D. E., Durham, T. W., McCammon, S.
L., & Walker, M. (2006). Deception strategies in children:
Examination of forced choice recognition and verbal
learning and memory techniques. Archives of Clinical
Neuropsychology, 21(8), 777–785. doi:10.1016/j.acn.
2006.06.011
Ord, J. S., Boettcher, A. C., Greve, K. J., & Bianchini, K. J.
(2010). Detection of malingering in mild traumatic brain
injury with the Conners’ Continuous Performance Test-
II. Journal of Clinical and Experimental Neuropsychology,
32(4), 380–387. doi:10.1080/13803390903066881
Reitan, R. M., & Wolfson, D. (1985). The Halstead-Reitan
Neuropsychological Test Battery: Theory and interpretation.
Tucson, AZ: Neuropsychology Press.
Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin card
sorting test in the detection of malingering in student
simulator and patient samples. Journal of Clinical and
Experimental Neuropsychology, 21(5), 701–708.
doi:10.1076/jcen.21.5.701.868
Suhr, J. A., Hammers, D., Dobbins-Buckland, K., Zimak, E., &
Hughes, C. (2008). The relationship of malingering test
failure to self-reported symptoms and neuropsychological
findings in adults referred for ADHD evaluation. Archives
of Clinical Neuropsychology, 23(5), 521–530. doi:10.1016/j.
acn.2008.05.003
Sweet, J. J., & Nelson, N. W. (2007). Validity indicators within
executive function measures: Use and limits in detection of
malingering. In K. B. Boone (Ed.), Assessment of feigned
cognitive impairment (pp. 152–177). New York, NY:
Guilford.
Wald, N. J., & Bestwick, J. P. (2014). Is the area under an ROC
curve a valid measure of performance of a screening or
diagnostic test? Journal of Medical Screening, 21(1),
51–56. doi:10.1177/0969141313517497
Wechsler, D. A. (1991). Wechsler intelligence scale for children
(3rd ed.). San Antonio, TX: The Psychological Corporation.
Wilkinson, G. S. (1993). Wide Range Achievement Test—
Revision 3. Wilmington, DE: Jastak Association.