Embedded validity indicators in Conners’ CPT-II: Do adult cutoffs work the same way in children?

Applied Neuropsychology: Child, DOI: 10.1080/21622965.2016.1198908. Published online: 06 Jul 2016.
Laszlo A. Erdodi (a), Jonathan D. Lichtenstein (b), Jaspreet K. Rai (a), and Lloyd Flaro (c)
(a) Department of Psychology, University of Windsor, Windsor, Ontario, Canada; (b) Department of Psychiatry, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA; (c) Private Practice, Edmonton, Alberta, Canada
ABSTRACT
In previous research, several subscales of Conners’ CPT-II were found to be useful as performance
validity tests (PVTs) when administered to adults with traumatic brain injury (TBI). Furthermore,
invalid response sets were associated with inflated scores on several CPT-II scales. The present
study proposed to investigate whether these findings would replicate in a pediatric sample. The
analyses were based on archival data from 15 children with TBI. The Omissions, Hit RT,
Perseverations, and Hit RT BC scales proved effective at differentiating valid and invalid response
sets. However, Commission errors were unrelated to scores on PVTs. A composite measure based
on these four scores was a superior and more stable validity indicator than individual scales. Two or
more T-scores >65 on any of these scales resulted in acceptable overall specificity (.86–1.00) and
variable sensitivity (.00–1.00). Scores on CPT-II scales were generally higher among those who failed
the reference PVTs. Results suggest that embedded CPT-II validity indices developed in adult TBI
samples function similarly in children with TBI, with some notable exceptions. Although the use of
adult PVT cutoffs in pediatric assessment is a common practice, and broadly supported by the
present findings, there remains a clear need for the independent empirical validation of adult PVTs
in children.
KEYWORDS: Conners' CPT-II; pediatric neuropsychology; performance validity assessment; traumatic brain injury
Until recently, research on test-taking effort in
neuropsychological assessment had been confined to
adults. However, within the last decade, there has been
an increased interest in the use of performance validity
tests (PVTs) with pediatric populations (Baker,
Connery, Kirk, & Kirkwood, 2014). This area of
investigation appears especially worthwhile in certain
diagnostic groups. For example, in one study, 17% of
a sample of children with mild traumatic brain injury
(mTBI) produced noncredible response sets (Kirkwood
& Kirk, 2010). Such findings have clinical significance as
children who fail PVTs have been found to perform
more poorly on cognitive testing as compared to
children who demonstrate adequate effort (Kirkwood,
Yeates, Randolph, & Kirk, 2012).
In the context of clinical evaluation, embedded
validity indicators (EVIs) hold considerable advantage
over stand-alone PVTs. Even though their signal detection
performance is, individually, generally inferior to that of
stand-alone PVTs, EVIs provide a valuable alternative
method of assessing test-taking effort. First, because
they utilize data that are already collected for clinical
purposes, EVIs are more cost-effective and expedient.
They also tend to be more resistant to coaching, as they
cannot be easily identified as validity measures (Miele,
Gunner, Lynch, & McCaffrey, 2010). Finally, they allow
for continuous monitoring of cognitive effort in
neuropsychological evaluation, a practice that is consist-
ent with the highest forensic standards (Boone, 2009).
Although stand-alone PVTs remain the gold standard
for assessing performance validity in cognitive testing,
there is a growing body of evidence suggesting that
the aggregation of multiple EVIs into a single composite
produces results similar to those of stand-alone
PVTs (Erdodi & Roth, 2016; Erdodi, Roth, Kirsch,
Lajiness-O’Neill, & Medoff, 2014). Such findings, which
demonstrate the potential of EVIs to serve as effective
PVTs, are encouraging and warrant further research.
Despite being an attractive tool for the scientist-
practitioner, research on EVIs in pediatric populations
has lagged behind research with adults. Moreover, most
studies investigating the effect of effort on assessment
outcome in children rely on instruments modeled after
adult PVTs, and few have provided cutoff scores specific
to pediatric populations. Nevertheless, empirical
support for the use of adult PVTs in pediatrics
(Constantinou & McCaffrey, 2003; Donders, 2005;
Green & Flaro, 2003; Green, Flaro, Brockhaus, &
Montijo, 2012; MacAllister, Nakhutina, Bender,
Karantzoulis, & Carlson, 2009; Nagle, Everhart,
Durham, McCammon, & Walker, 2006) has been
accumulating in the research literature.
On the other hand, there is evidence that the appli-
cation of adult cutoffs to pediatric assessment may not
always be appropriate (Constantinou & McCaffrey,
2003). Such findings may be due, at least in part, to
the practice of extending existing measures to a new
population rather than designing new measures that
meet the unique challenges of assessing any given
group. The downward extension of adult assessment
tools for use with children is common in neuropsychol-
ogy, although perhaps not optimal and certainly not
without limitations. However, the fact that EVIs were
originally conceived as dynamic models (Babikian &
Boone, 2007; Boone, 2013; Larrabee, 2005; Lu, Rogers,
& Boone, 2007; Sweet & Nelson, 2007) places them in
a better position, as compared with stand-alone PVTs,
to respond to the unique challenges of pediatric neurop-
sychology and provide the flexibility to accommodate
population-specific idiosyncrasies (developmental
trajectories, higher volatility of the target construct,
increased reactivity to variables that are extraneous to
the assessment process).
Conners’ Continuous Performance Test, Second
Edition (CPT-II) was designed to quantify various
aspects of an individual’s ability to focus and sustain
attention during a basic vigilance task, and is widely
used in attention deficit/hyperactivity disorder (ADHD)
research and clinical assessments for respondents aged
6 or older (Conners, 2004). Although the instrument
was primarily developed to detect ADHD, certain subt-
ests have shown promise as EVIs. For example, Suhr,
Hammers, Dobbins-Buckland, Zimak, and Hughes
(2008) reported that individuals who failed the Word
Memory Test (WMT; Green, 2003) were also more
impaired on the CPT-II. Similarly, Marshall et al.
(2010) found that the Omissions (OMI) and Commis-
sions (COM) subtests performed well (sensitivity [SENS]
.04–.57; specificity [SPEC] .87–1.00) against established
PVTs in 268 adults referred for ADHD assessment. In
a sample of 82 adults with TBI, Ord, Boettcher, Greve,
and Bianchini (2010) found that the OMI and Hit Reac-
tion Time Standard Errors (HRT-SE) were successful at
identifying invalid response sets with a SPEC of .95
and SENS of .30–.44. In addition to OMI and COM,
Lange et al. (2013) identified Perseverations (PER) as
effective at separating valid and invalid response sets
(operationalized as failing the WMT) in a military sample
of 158 individuals assessed for TBI. Results are mixed on
HRT: Ord et al. (2010) found a significant difference on
this scale between valid and invalid response sets, while
Lange et al. (2013) did not.
More recently, Erdodi et al. (2014) added the
Variability (VAR) subtest to the list of potential EVIs
in the CPT-II. In their sample of 104 adults with
TBI, at cutoffs with SPEC ≥ .90, five EVIs produced
SENS values hovering around .50. Interestingly, the
subsample that passed reference PVTs (refPVT) per-
formed within normal limits on all 12 scales, while
those who failed refPVTs produced mean scores in
the clinical range (T >60) on six scales, suggesting that
elevations on the CPT-II following TBI are more likely
to reflect invalid responding than acquired attention
deficit.
Despite growing empirical support for the CPT-II
based EVIs in adult TBI, to our knowledge, no research
has been published on the topic in pediatric popula-
tions. The present study was conducted to investigate
whether the patterns of findings observed in adults with
TBI would replicate in a pediatric sample. Specifically,
we were interested in the signal detection profile of EVIs
within the CPT-II in children with TBI and whether the
link between impaired scores on the clinical scales and
invalid performance would replicate in a pediatric
sample.
Method
Participants
The study was based on archival data from 15 children (53% male; M_Age = 12.6, SD_Age = 3.2) referred for neuropsychological assessment following TBI to a private practice in a Canadian metropolitan area. Two-thirds of the sample had a comorbid psychiatric diagnosis: ADHD (n = 4), learning disability (n = 3), conduct disorder (n = 1), drug and alcohol abuse (n = 1), and somatization disorder (n = 1).
Overall intellectual functioning fell in the borderline range (M_FSIQ = 76.9, SD_FSIQ = 12.5), while performance on a picture vocabulary test was average (M_PPVT-3 = 95.5, SD_PPVT-3 = 16.1). A similar discrepancy between verbal vs. perceptual, and receptive vs. expressive abilities was also noted within the WISC-III (M_VIQ = 79.5, SD_VIQ = 11.8; M_PIQ = 84.1, SD_PIQ = 16.2). The mean reading level in the sample was equivalent to grade 5.3 (SD = 3.3), which is about two years below what would be expected based on chronological age. On the Rey Complex Figure Test, performance was borderline to low average (M_IR = 35.2, SD_IR = 12.2; M_DR = 35.8, SD_DR = 13.8; M_REC = 41.4, SD_REC = 15.5; T-scores). Likewise, on average, the sample performed broadly within normal limits on the Wisconsin Card Sorting Test (WCST; M_CATcomp = 4.3, SD_CATcomp = 2.1; M_FMS = 1.7, SD_FMS = 1.7).
With respect to executive functioning, caregiver ratings on the BRIEF produced mean scores in the clinically significant range (T ≥ 65) on all subscales. The highest elevation was on the general composite (M_GEC = 70.1, SD_GEC = 16.3). On the PAI-A, the only mean scores that reached a mild elevation (T ≥ 60) were the Depression (M_DEP = 61.5, SD_DEP = 12.5) and Schizophrenia (M_SCZ = 62.4, SD_SCZ = 7.2) subscales.
Materials
A core battery of standard neuropsychological tests
was administered to all participants, along with a
set of PVTs and rating scales (Table 1). The WMT,
Medical Symptom Validity Test (MSVT; Green,
2004) and Non-Verbal Medical Symptom Validity Test
(NV-MSVT; Green, 2008) were used, at the standard
dichotomous (Pass/Fail) cutoffs, as the free-standing
refPVT. The embedded PVTs included the Trail
Making Test B/A ratio (TMT
B/A
), at the cutoff pub-
lished by Iverson, Lange, Green, and Franzen (2002),
and the logistic regression equation (LRE) developed
by Suhr and Boyer (1999), which uses variables from
the WCST (Heaton, Chelune, Talley, Kay, & Curtiss,
1993). Given that the Suhr-Boyer LRE (S-B
LRE
) was
calibrated on adults, a more conservative cutoff
[1.69, associated P(invalid profile) .80] was used
in the present sample to protect against false positives.
Although equating a score in the failing range on any
given PVT with a globally invalid cognitive profile
remains an epistemologically contentious practice
(Bigler, 2012, 2015), for purely practical reasons, per-
formance validity was operationalized as the outcome of these individual refPVTs (Pass/Fail) in the present study. Data (T-scores) were available on only five CPT-II clinical scales: OMI, COM, HRT, PER, and
HRT Block Change (HRT-BC).
Data analysis
Descriptive statistics were computed for the main
variables of interest. Between-group contrasts on con-
tinuous variables were computed using independent
t tests. The statistical significance of the difference in frequency distributions was assessed using χ². SENS and SPEC of the target CPT-II variables against established PVTs were computed using standard formulas. Area under the curve was not reported, given the recent controversy around its validity as a single-number summary of overall classification accuracy (Hanczar et al., 2010; Wald & Bestwick, 2014).
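As a concrete illustration of these contrasts, the sketch below runs an independent-samples t test and a χ² test; the scores and counts are invented for the example, and scipy is an assumed dependency (the article does not name its statistical software).

```python
from scipy import stats

# Invented T-scores for children who passed vs. failed a reference PVT.
omi_pass = [53, 48, 61, 55, 50, 57, 49, 52]
omi_fail = [68, 72, 61, 66]

# Independent-samples t test on a continuous variable (e.g., OMI T-scores).
# Halving the two-tailed p assumes the difference lies in the predicted
# direction, matching the one-tailed tests reported in Table 5.
t_stat, p_two_tailed = stats.ttest_ind(omi_pass, omi_fail)
print(f"t = {t_stat:.2f}, one-tailed p = {p_two_tailed / 2:.3f}")

# Chi-square test of a frequency distribution (e.g., comorbidity by Pass/Fail).
# Rows: comorbidity present/absent; columns: Pass/Fail (counts are invented).
table = [[7, 3], [4, 1]]
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```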
SENS is the proportion of invalid response sets cor-
rectly identified as such, or true positive rate. In con-
trast, SPEC is the proportion of correctly classified
valid response sets or true negative rate. In PVT
research, SPEC is the key parameter, given the impera-
tive to protect against false positive errors (incorrectly
labeling a valid response set as invalid). As a bench-
mark, .90 is considered the lower threshold for desirable
SPEC (Boone, 2013), with .84 being the lowest accept-
able value (Larrabee, 2003).
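The standard formulas referenced above reduce to simple counts over paired outcomes; the following sketch (the function name and the toy Pass/Fail flags are ours, not part of the study materials) makes the computation explicit.

```python
def sens_spec(evi_flags, refpvt_flags):
    """Compute sensitivity and specificity of an embedded validity
    indicator (EVI) against a reference PVT.

    Both arguments are sequences of booleans, one entry per examinee:
    True = flagged invalid (Fail), False = judged valid (Pass).
    The reference PVT is treated as the criterion measure.
    """
    pairs = list(zip(evi_flags, refpvt_flags))
    true_pos = sum(1 for evi, ref in pairs if evi and ref)
    false_neg = sum(1 for evi, ref in pairs if not evi and ref)
    true_neg = sum(1 for evi, ref in pairs if not evi and not ref)
    false_pos = sum(1 for evi, ref in pairs if evi and not ref)

    sens = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else float("nan")
    spec = true_neg / (true_neg + false_pos) if (true_neg + false_pos) else float("nan")
    return sens, spec

# Hypothetical example: OMI > 65 flags vs. WMT Pass/Fail for six children.
evi = [True, False, False, True, False, False]
ref = [True, True, False, False, False, False]
print(sens_spec(evi, ref))  # (0.5, 0.75)
```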
Results
Base rates of failure (BR_Fail) on refPVTs ranged from 8.3% (MSVT and TMT_B/A) to 30.0% (WMT). Given previous reports that reading level might be a confound in the application of adult PVTs to pediatric populations (Constantinou & McCaffrey, 2003), this link was explicitly investigated. There was minimal overlap between reading skills below the 3rd-grade level and BR_Fail. Only one child with a reading level lower than 3rd grade failed the S-B_LRE. All children failing other refPVTs had higher reading levels.
The relationship between PVT failures and key
sample characteristics was evaluated to rule out
potential background variables that might account for
performance validity (Table 2). Gender was unrelated
to passing or failing any of the refPVTs. Similarly, no
age effects emerged as a function of Pass/Fail status
on refPVTs, with the exception of TMT_B/A. Caregiver ratings of overall executive function as captured on the BRIEF did not differ between children who passed and those who failed refPVTs. Finally, with the exception of TMT_B/A, presence of psychiatric comorbidity was unrelated to Pass/Fail status on refPVTs. Given the low BR_Fail on TMT_B/A, in combination with the atypical inner logic behind this PVT ("abnormal pattern of impairment"), the isolated age and comorbidity effects observed on it might reflect an instrumentation artifact.

Table 1. List of tests administered.
Behavior Rating Inventory of Executive Function (BRIEF): Gioia, Isquith, Guy, and Kenworthy (2000)
Conners' Continuous Performance Test, 2nd edition (CPT-II): Conners (2004)
Medical Symptom Validity Test (MSVT): Green (2004)
Non-Verbal Medical Symptom Validity Test (NV-MSVT): Green (2008)
Peabody Picture Vocabulary Test, 3rd edition (PPVT-3): Dunn and Dunn (1997)
Personality Assessment Inventory - Adolescent (PAI-A): Morey (1991, 2007)
Rey Complex Figure Test (RCFT): Meyers and Meyers (1995)
Trail Making Test (TMT A & B): Reitan and Wolfson (1985)
Wechsler Intelligence Scale for Children, 3rd edition (WISC-III): Wechsler (1991)
Wide Range Achievement Test, 3rd edition (WRAT-III): Wilkinson (1993)
Wisconsin Card Sorting Test (WCST): Heaton, Chelune, Talley, Kay, and Curtiss (1993)
Word Memory Test (WMT): Green (2003)

Table 2. Differences in age, BRIEF GEC T-scores, and rate of psychiatric comorbidity as a function of passing or failing PVTs.

                          WMT     MSVT    NV-MSVT A1   NV-MSVT A2   TMT_B/A   S-B_LRE
Age (years)
  Pass   M                14.9    12.8    13.8         13.5         14.2      13.6
         SD                1.9     3.3     2.9          2.9          2.5       3.0
  Fail   M                13.7    14.0    14.0         15.0         10.0      12.6
         SD                1.5     0.0     0.0          1.4          1.4       3.0
BRIEF GEC (T-score)
  Pass   M                72.5    74.9    72.5         73.0         71.8      68.3
         SD               22.0    15.6    16.7         18.0         18.0      20.8
  Fail   M                74.5    80.0    80.0         74.0         67.5      71.5
         SD                7.8     0.0     0.0          7.8         13.4       9.0
% PSY-COM
  Pass                    85.7    63.6    77.8         75.0         90.0      75.0
  Fail                    66.7   100.0   100.0        100.0          0.0      80.0

Note. BRIEF GEC = Behavior Rating Inventory of Executive Function Global Executive Composite; % PSY-COM = percentage of the sample with comorbid psychiatric diagnoses; WMT = Word Memory Test (standard cutoffs); MSVT = Medical Symptom Validity Test (standard cutoffs); NV-MSVT = Non-Verbal MSVT (standard cutoffs); TMT_B/A = Trail Making Test, B/A ratio (cutoff <1.50; Iverson et al., 2002); S-B_LRE = logistic regression equation developed by Suhr and Boyer (1999) using Wisconsin Card Sorting Test variables [cutoff ≥1.69; P(invalid) ≥ .80]. The only contrasts that reached statistical significance (p < .05) were age and % PSY-COM on TMT_B/A.
Of the CPT-II scales, OMI >65 achieved acceptable SPEC against most refPVTs, except the MSVT (.82) and TMT_B/A (.80). The only conceivable cutoff on COM was >60, as no participant scored ≥65. As a result, the COM >60 cutoff produced unacceptably low SPEC (.63–.75) against all refPVTs. HRT >65 cleared the minimum SPEC benchmark against all refPVTs, with extreme fluctuations in SENS (.00–1.00). PER >65 produced acceptable SPEC against all refPVTs except the MSVT (.73), but variable SENS (.00–.50). Finally, HRT-BC >65 produced uniformly good SPEC (.88–1.00), in the backdrop of fluctuating SENS (.00–.40). Further details are displayed in Table 3.

Table 3. The signal detection properties of the embedded CPT-II validity indicators against reference PVTs.

                                     WMT    MSVT   NV-MSVT A1   NV-MSVT A2   TMT_B/A   S-B_LRE
refPVT BR_Fail                       30%    8%     10%          20%          17%       23%
CPT-II EVI   Cutoff   BR_Fail
OMI          >65      20%    SENS    .67    1.00   1.00         .50          .00       .80
                             SPEC    .86    .82    .89          .88          .80       1.00
COM          >60      27%    SENS    .00    .00    .00          .50          .50       .00
                             SPEC    .71    .64    .67          .75          .70       .63
HRT          >65      20%    SENS    .33    1.00   1.00         .50          .00       .60
                             SPEC    .86    .91    1.00         1.00         .90       1.00
PER          >65      23%    SENS    .00    .00    .50          .40
                             SPEC    .86    .73    1.00         1.00
HRT-BC       >65      13%    SENS    .33    .00    .00          .00          .00       .40
                             SPEC    1.00   .91    .89          .88          .90       1.00
CVI-4        ≥7       20%    SENS    .50    .00    1.00         .75
                             SPEC    .86    .88    .89          1.00

Note. CPT-II = Conners' Continuous Performance Test, 2nd edition; EVI = embedded validity indicator; BR_Fail = base rate of failure (% of the sample failing the cutoff); OMI = Omissions; COM = Commissions; HRT = Hit Reaction Time; PER = Perseverations; HRT-BC = Hit Reaction Time Block Change; CVI-4 = validity composite based on OMI, HRT, PER, and HRT-BC cutoffs; SENS = sensitivity; SPEC = specificity; WMT = Word Memory Test (standard cutoffs); MSVT = Medical Symptom Validity Test (standard cutoffs); NV-MSVT = Non-Verbal MSVT (standard cutoffs); TMT_B/A = Trail Making Test, B/A ratio, cutoff <1.50 (Iverson et al., 2002); S-B_LRE ≥1.69 [P(invalid) ≥ .80] = logistic regression equation developed by Suhr and Boyer (1999) using Wisconsin Card Sorting Test variables.
Based on these findings, the four CPT-II scales that produced acceptable signal detection profiles were aggregated into a single composite labeled the CPT-II Validity Indicator (CVI-4). Each CPT-II scale was recoded into a four-point scale (0–3). The clearly valid range was assigned the value of zero (PASS). The first level of suspect performance was assigned the value of one (Borderline), the next level of invalid performance was assigned the value of two (Fail), while the most conservative (low SENS, high SPEC) cutoff was assigned the value of three (FAIL), following the methodology described by Erdodi, Abeare, et al. (2016) and Erdodi, Tyson, et al. (2016). The BR_Fail at the most liberal cutoff (Borderline; high SENS, low SPEC) ranged from 20% to 33%, which is broadly consistent with the values produced by the stand-alone refPVTs. The details of the re-scaling procedure are displayed in Table 4.

Table 4. Cumulative base rates (BR) of failure on four CPT-II scales across levels of CVI-4.

                      Levels of CVI-4
                      PASS (0)   Borderline (1)   Fail (2)   FAIL (3)
OMI (T-score)         <60        ≥60              ≥65        ≥70
  Base rate           73.3%      26.7%            26.7%      20.0%
HRT (T-score)         <60        ≥60              ≥65        ≥70
  Base rate           73.3%      26.7%            20.0%      13.3%
PER (T-score)         <65        ≥65              ≥80        ≥90
  Base rate           66.7%      33.3%            20.0%      13.3%
HRT-BC (T-score)      <60        ≥60              ≥65        ≥70
  Base rate           80.0%      20.0%            20.0%      13.3%

Note. CPT-II = Conners' Continuous Performance Test, 2nd edition; CVI-4 = CPT-II Validity Indicator; OMI = Omissions; HRT = Hit Reaction Time; PER = Perseverations; HRT-BC = Hit Reaction Time Block Change.
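A minimal sketch of this re-scaling follows, using the cutoffs from Table 4; the dictionary layout and function names are our own illustration, and the code assumes T-scores are available for all four component scales.

```python
# Borderline / Fail / FAIL thresholds (T-scores) for each scale, per Table 4.
CVI4_THRESHOLDS = {
    "OMI":    (60, 65, 70),
    "HRT":    (60, 65, 70),
    "PER":    (65, 80, 90),
    "HRT_BC": (60, 65, 70),
}

def recode_scale(scale, t_score):
    """Map a CPT-II T-score onto the 0-3 CVI-4 coding:
    0 = PASS, 1 = Borderline, 2 = Fail, 3 = FAIL."""
    borderline, fail, extreme = CVI4_THRESHOLDS[scale]
    if t_score >= extreme:
        return 3
    if t_score >= fail:
        return 2
    if t_score >= borderline:
        return 1
    return 0

def cvi4(t_scores):
    """Sum the recoded values across the four component scales (range 0-12)."""
    return sum(recode_scale(scale, t) for scale, t in t_scores.items())

# Hypothetical child: elevated OMI and HRT, unremarkable PER and HRT-BC.
print(cvi4({"OMI": 72, "HRT": 66, "PER": 55, "HRT_BC": 58}))  # 3 + 2 = 5
```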
Next, the distribution of the CVI-4 scores was exam-
ined and classification ranges were established. The
majority of the sample (8/15 or 53.3%) had a score of
zero, which means that they passed the most liberal cut-
off on all four components of the CVI-4. Thus, this
range was labeled an unequivocal PASS. The next
observed value of CVI-4 was 2, which could either mean
one score at the second level of CVI-4 or two scores at
the first level. Neither combination provides sufficient
evidence of invalid performance, so this range was also
labeled Pass. The next highest value of CVI-4 was 3,
which could mean one score at the most conservative
cutoff, three at the most liberal cutoff, or a combination
of one score at the CVI-4 level of one and one at level
two. While this range of performance is suspect, it also
does not provide incontrovertible evidence for invalid
performance. As such, it was labeled Borderline. The
next highest value of 7, however, is extreme enough to
consider an unequivocal FAIL. Only 3/15 (20%)
performed in this range or above.
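Read together with the re-scaling sketch above, these classification ranges can be expressed as a small lookup. Because only totals of 0, 2, 3, and 7 or more were observed, the handling of the intermediate values (1, and 4 through 6) below is our assumption rather than a rule stated in the text.

```python
def classify_cvi4(total):
    """Map a CVI-4 total (0-12) onto the classification ranges described
    in the text: 0 = unequivocal PASS, 2 = Pass, 3 = Borderline, >=7 = FAIL.
    Totals of 1 and 4-6 were not observed in the sample; assigning them to
    the adjacent ranges below is our assumption, not the authors' rule."""
    if total >= 7:
        return "FAIL"
    if total >= 3:
        return "Borderline"
    if total >= 1:
        return "Pass"
    return "PASS"

for total in (0, 2, 3, 7):
    print(total, classify_cvi4(total))
```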
Finally, the effect of performance validity on the five CPT-II scales was examined as a function of Pass/Fail status on the six refPVTs, creating a 6 × 6 matrix. In about half of the cases, those who failed refPVTs also produced means in the clinical range (T > 60), while those who passed refPVTs performed in the nonclinical range. In one case, this trend was reversed (PER > 65 against MSVT). In the remaining cases the means were within normal limits regardless of Pass/Fail status. Overall, six of the contrasts (20%) reached statistical significance. Table 5 displays the details of these analyses.

Table 5. Mean CPT-II OMI, COM, HRT, PER, and HRT-BC T-scores as a function of Pass/Fail status on reference PVTs.

                         OMI     COM     HRT      PER      HRT-BC   CVI-4
WMT           Pass       53.3    49.8    57.0     53.9     49.8*    1.3
              Fail       67.1    45.9    59.0     55.6     62.3     3.3
MSVT          Pass       57.7    51.4    55.2     72.1     52.3     2.0
              Fail       60.1    49.0    67.8     58.6     55.4     3.0
NV-MSVT A1    Pass       55.5    50.5    53.5**   52.3     55.7     1.0
              Fail       60.1    49.0    67.8     58.6     55.4     3.0
NV-MSVT A2    Pass       55.1    49.1    53.3     50.1**   55.0     0.9
              Fail       59.3    55.4    61.7     64.5     58.5     2.5
TMT_B/A       Pass       55.6    50.0    55.6     52.7*    58.9     1.2
              Fail       47.9    61.4    46.7     123.2    45.2     1.5
S-B_LRE       Pass       56.9    52.1    55.4**   55.0*    54.1     0.3**
              Fail       61.9    46.4    69.3     96.8     66.8     5.8

Note. Statistical significance was determined using independent t tests. CPT-II = Conners' Continuous Performance Test, 2nd edition; CVI-4 = CPT-II Validity Indicator; OMI = Omissions; COM = Commissions; HRT = Hit Reaction Time; PER = Perseverations; HRT-BC = Hit Reaction Time Block Change; WMT = Word Memory Test (standard cutoffs); MSVT = Medical Symptom Validity Test (standard cutoffs); NV-MSVT = Non-Verbal MSVT (standard cutoffs); TMT_B/A = Trail Making Test, B/A ratio (cutoff <1.50; Iverson et al., 2002); S-B_LRE = logistic regression equation developed by Suhr and Boyer (1999) using Wisconsin Card Sorting Test variables [cutoff ≥1.69; P(invalid) ≥ .80].
*p < .05 (one-tailed). **p < .01 (one-tailed).
Discussion
This study examined the classification accuracy of the
adult cutoffs on the CPT-II based EVIs in a sample of
children with TBI against six refPVTs to control for
instrumentation artifacts in the criterion measure. Prior
to performing the signal detection analyses, the poten-
tial confounding effects of reading level, age, gender,
executive functioning and psychiatric comorbidity were
independently assessed and largely ruled out. Overall,
results suggest that obtaining two T-scores >65 on
select CPT-II scales raises concerns about the validity
of a given response set.
Of the five potential EVIs examined, OMI >65 pro-
duced the single best signal detection profile, resulting
in a good combination of SENS and SPEC against four
refPVTs. This is broadly consistent with research on
adults (Erdodi et al., 2014; Lange et al., 2013; Ord
et al., 2010), although some authors (Marshall et al.,
2010) recommended more conservative cutoffs (>80).
HRT produced a good combination of SENS and SPEC
against five refPVTs, but zero SENS against the sixth.
Similarly, while HRT-BC had a perfect SPEC against
two refPVTs, it also produced zero SENS on four occa-
sions. Although it produced a good combination of
SENS and SPEC against two refPVTs, overall, PER
>65 resulted in zero SENS on two occasions, rendering
its ability to reliably differentiate valid from invalid
response sets in children questionable. This finding
recapitulates the inconsistent results on PER as an
EVI in adults. Some studies found perseverative errors
on the CPT-II to be effective at separating valid and
invalid response sets (Erdodi et al., 2014; Lange et al.,
2013), whereas others produced inconclusive results
(Ord et al., 2010).
COM was a notable outlier—both within this study
and in comparison to the adult literature (Erdodi
et al., 2014; Lange et al., 2013; Marshall et al., 2010;
Ord et al., 2010). Its only conceivable cutoff (>60) con-
sistently produced SPEC values .75, which imply an
unacceptably high false positive rate. In addition,
COM >60 resulted in zero SENS against four refPVTs.
Finally, the CVI-4, a composite EVI, produced the most
efficient and stable estimate of performance on the
criterion measures. The advantage of the CVI-4 over
individual EVIs in signal detection likely stems from
combining the diagnostic power of several indicators.
Its differential success rate is predicted by the central
limit theorem, and exemplifies the methodological
superiority of multivariate models in performance
validity assessment. As such, the results of the present
study are consistent with previous research on CPT-
II based EVIs in adult samples (Erdodi et al., 2014;
Lange et al., 2013).
Even so, the CVI-4 demonstrated extreme fluctua-
tions in SENS, ranging from .00 to 1.00. SPEC values
were less variable, ranging from .80 to 1.00. The
notable discrepancy between SENS and SPEC is likely
caused by a number of factors: the natural variability in effort across PVTs; small sample size; low BR_Fail on both EVIs and refPVTs; and a deliberate attempt to protect against false positives at the expense of a higher false negative rate.
et al. (2010) observed a similarly wide range of SENS
despite a vastly larger sample lends support to the first
two explanations.
Careful examination of Table 3 also suggests that sig-
nal detection profiles may be domain-specific. In other
words, they could be partially driven by the inherent
variability in the stimulus properties of the refPVTs.
The lowest classification accuracy among EVIs was observed against the TMT_B/A, while the best combinations of SENS and SPEC occurred on the S-B_LRE. This
heterogeneity in signal detection performance may be
another contributing factor to the fluctuation in SENS
and SPEC previously described.
Two of the individual CPT-II based EVIs developed
in adult samples (OMI and PER) remained effective
when applied to pediatric TBI. However, COM had a
consistently poor performance against all refPVTs.
Hence, it does not appear to be useful in dissociating
impairment from effort in children. HRT and HRT-
BC performed reasonably well, suggesting that they
may warrant further research in larger samples.
This pattern of findings fits the clinical interpretation
of the underlying constructs that the EVIs of interest are
designed to measure and the larger context of perfor-
mance validity assessment. Of the four successful
CPT-II based EVIs examined in the present study, the
OMI scale can be conceptualized as the purest measure
of task engagement as it only requires the willingness to
respond to the abundant and salient targets (Erdodi
et al., 2014). Thus, inflated error rates likely reflect poor
compliance with simple instructions. The manual itself
identifies extreme scores on this scale as potential
indication of invalid responding (Conners, 2004).
Similarly, HRT and its temporal derivative, HRT-BC,
are measures of motor speed, a basic construct that is
under conscious control. An overall RT that is unusually
slow or is becoming unusually slower over time could
indicate failure to put forth full effort throughout the
task or a gradual disengagement from the testing
process. Since these two scales may potentially capture
distinct forms of invalid responding, a larger scale study
exploring their divergent validity would be a worthwhile
pursuit in subtyping performance validity and, in com-
bination with more opaque scales such as HRT-SE and
VAR, modeling intent, a highly elusive construct in
signal detection analyses (Boone, 2013; Delis & Wetter,
2007; Frederick, Crosby, & Wynkoop, 2000; Frederick &
Bowden, 2009).
On the other hand, the COM scale is a measure of
the examinee’s ability to inhibit a basic response that
quickly becomes automatic as the test progresses.
Because it requires higher-order cognitive abilities that
are still developing and maturing in youth (Baron,
2004), COM errors may be equally likely in children
with full and questionable effort, and therefore they
may not differentiate between them. Given the consist-
ently poor signal detection performance of COM, it
should not be used as an EVI in children until further
research suggests otherwise.
Lastly, even though Conners (2004) endorsed
extreme elevations on the PER scale as a measure of
performance validity, perseverative errors, as defined
on the CPT-II, are multifactorial. At face value, they
are equally likely to indicate slow responses to the pre-
vious targets, repeat responses to the same target, or
random responses, resulting in an unstable validity
index in pediatric populations. This would explain
why PER >65 behaved so inconsistently against
refPVTs. Nevertheless, this scale showed enough
promise to warrant further investigation in pediatrics.
Using Pass/Fail status on the refPVTs as independent
variables and the CPT-II scales as dependent variables
produced a pattern of findings that was consistent with
the results previously described, suggesting that the
CPT-II is sensitive to poor test taking effort. On one
hand, this makes some of its scales good candidates to
become EVIs. On the other hand, it provides a note
of caution to assessors that an elevated score on the
CPT-II could mean impaired sustained visual attention
as well as poor test taking effort. Given the far reaching
implications of this differential diagnosis, scores in the
clinical range should be interpreted carefully and the
alternative conceptualization of invalid responding
should be considered (Boone, 2013).
Adequate effort during neuropsychological testing
is often assumed rather than formally assessed. This
practice is concerning, as it may enable a powerful
confounding variable to contaminate the measurement
model. Even within the present sample, there was a
notable difference between overall intellectual functioning and receptive vocabulary. There is no obvious clinical reason for such a wide discrepancy (d = 1.08). However, if the BR_Fail on various established and experimental PVTs (10–30%) is taken into account, that could provide a tentative explanation of this otherwise puzzling finding.
Procedurally, the FSIQ is composed of a series of
tests, many of which require considerable mental effort
to demonstrate maximal ability level.
picture vocabulary test involves minimal response from
the examinee (pointing), who is being presented with
constantly changing, colorful, and pleasant visual stim-
uli. Thus, while children with poor effort are expected to
perform below their true ability on the more challenging
IQ test, they may produce valid response sets on the less
demanding picture vocabulary test. In other words, the
construct of “cognitive effort” itself may change across
different tests, potentially requiring an instrument-
specific definition of “performance validity” (Bigler,
2015; Boone, 2013; Frederick et al., 2000; Frederick &
Bowden, 2009).
The overall findings are largely consistent with
existing literature on EVIs in the CPT-II, with the
notable exception that in the present sample COM
was not associated with performance on formal mea-
sures of test taking effort (although this was at least
partly due to restricted range). The most obvious limi-
tation of the study is the small sample size, which
may render the parameter estimates unstable. There-
fore, replication using a larger sample and different
refPVTs is needed before the results can be incorporated
into routine clinical decision making. Even though, consistent with previous reports (Lichtenstein, Erdodi, & Linnea, 2016), comorbid psychiatric and neurodevelopmental conditions were unrelated to BR_Fail on refPVTs or the newly developed EVIs within this sample, the cumulative effect of multiple disorders as a confound in pediatric PVT research should continue to be
monitored. Finally, given that estimates of signal detection parameters (SENS, SPEC) are dependent on BR_Fail (Baldessarini, Finklestein, & Arana, 1983; Grimes & Schultz, 2005), the values obtained in this study likely reflect the failure rates observed in our sample and may not generalize to populations with vastly different BR_Fail. All things considered, the consistency of the findings across instruments and methods, even within a limited sample, is encouraging and suggests that the topic is worth pursuing in future research.
References
Babikian, T., & Boone, K. B. (2007). Intelligence tests as measures of effort. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment (pp. 103–127). New York, NY: Guilford.
Baker, D. A., Connery, A. K., Kirk, J. W., & Kirkwood, M. W. (2014). Embedded performance validity indicators within the California Verbal Learning Test, Children's Version. The Clinical Neuropsychologist, 28, 116–127. doi:10.1080/13854046.2013.858184
Baldessarini, R. J., Finklestein, S., & Arana, G. W. (1983). The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry, 40, 569–573. doi:10.1001/archpsyc.1983.01790050095011
Baron, I. S. (2004). Neuropsychological evaluation of the child. New York, NY: Oxford University Press.
Bigler, E. D. (2012). Symptom validity testing, effort, and neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 632–642. doi:10.1017/S1355617712000252
Bigler, E. D. (2015). Neuroimaging as a biomarker in symptom validity and performance validity testing. Brain Imaging and Behavior, 9(3), 421–444. doi:10.1007/s11682-015-9409-1
Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examination. The Clinical Neuropsychologist, 23(4), 729–741. doi:10.1080/13854040802427803
Boone, K. B. (2013). Clinical practice of forensic neuropsychology. New York, NY: Guilford.
Conners, K. C. (2004). Conners' Continuous Performance Test (CPT II) version 5 for Windows technical guide and software manual. North Tonawanda, NY: Multi-Health Systems.
Constantinou, M., & McCaffrey, R. J. (2003). Using the TOMM for evaluating children's effort to perform optimally on neuropsychological measures. Child Neuropsychology, 9(2), 81–90. doi:10.1076/chin.9.2.81.14505
Delis, D., & Wetter, S. R. (2007). Cogniform disorder and cogniform condition: Proposed diagnoses for excessive cognitive symptoms. Archives of Clinical Neuropsychology, 22, 589–604. doi:10.1016/j.acn.2007.04.001
Donders, J. (2005). Performance on the Test of Memory Malingering in a mixed pediatric sample. Child Neuropsychology, 11(2), 221–227. doi:10.1080/09297040490917298
Dunn, L. M., & Dunn, L. M. (1997). The Peabody Picture Vocabulary Test (3rd ed.). Bloomington, MN: Pearson Assessments.
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski, B., Zuccato, B. G., & Roth, R. M. (2016). WAIS-IV processing speed scores as measures of non-credible responding: The third generation of embedded performance validity indicators. Psychological Assessment. doi:10.1037/pas0000319
Erdodi, L. A., & Roth, R. M. (2016). Low scores on BDAE Complex Ideational Material are associated with invalid performance in adults without aphasia. Applied Neuropsychology: Adult. doi:10.1080/23279095.2016.1154856
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O'Neill, R., & Medoff, B. (2014). Aggregating validity indicators embedded in Conners' CPT-II outperforms individual cutoffs at separating valid from invalid performance in adults with traumatic brain injury. Archives of Clinical Neuropsychology, 29(5), 456–466. doi:10.1093/arclin/acu026
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE Complex Ideational Material: A measure of receptive language or performance validity? Psychological Injury and Law. doi:10.1007/s12207-016-9254-6
Frederick, R. I., & Bowden, S. C. (2009). Evaluating constructs represented by symptom validity tests in forensic neuropsychological assessment of traumatic brain injury. Journal of Head Trauma Rehabilitation, 24(2), 105–122. doi:10.1097/HTR.0b013e31819b1210
Frederick, R. I., Crosby, R. D., & Wynkoop, T. F. (2000). Performance curve classification of invalid responding on the Validity Indicator Profile. Archives of Clinical Neuropsychology, 15(4), 281–300. doi:10.1093/arclin/15.4.281
Gioia, G. A., Isquith, P. K., Guy, S. C., & Kenworthy, L. (2000). BRIEF: Behavior Rating Inventory of Executive Function. Lutz, FL: Psychological Assessment Resources.
Green, P. (2003). Manual for the Computerized Word Memory Test for Windows (revised 2005). Edmonton, AB: Green's Publishing.
Green, P. (2004). Manual for the Medical Symptom Validity Test for Windows. Edmonton, AB: Green's Publishing.
Green, P. (2008). Manual for the Green's Non-Verbal Medical Symptom Validity Test for Windows. Edmonton, AB: Green's Publishing.
Green, P., & Flaro, L. (2003). Word Memory Test performance in children. Child Neuropsychology, 9(3), 189–207. doi:10.1076/chin.9.3.189.16460
Green, P., Flaro, L., Brockhaus, R., & Montijo, J. (2012). Performance on the WMT, MSVT, and NV-MSVT in children with developmental disabilities and in adults with mild traumatic brain injury. In R. Reynolds & A. M. Horton (Eds.), Detection of malingering during head injury litigation (pp. 201–219). New York, NY: Springer.
Grimes, D. A., & Schultz, K. F. (2005). Refining clinical diagnosis with likelihood ratios. Lancet, 365, 1500–1505. doi:10.1016/S0140-6736(05)66422-7
Hanczar, B., Hua, J., Sima, C., Weinstein, J., Bittner, M., & Dougherty, E. R. (2010). Small-sample precision of ROC-related estimates. Bioinformatics, 26(6), 822–830. doi:10.1093/bioinformatics/btq037
Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G. G., & Curtiss, G. (1993). Wisconsin Card Sorting Test manual: Revised and expanded. Odessa, FL: Psychological Assessment Resources.
Iverson, G. L., Lange, R. T., Green, P., & Franzen, M. D. (2002). Detecting exaggeration and malingering with the Trail Making Test. The Clinical Neuropsychologist, 16(3), 398–406. doi:10.1076/clin.16.3.398.13861
Kirkwood, M. W., & Kirk, J. W. (2010). The base rate of suboptimal effort in a pediatric mild TBI sample: Performance on the Medical Symptom Validity Test. The Clinical Neuropsychologist, 24(5), 860–872. doi:10.1080/13854040903527287
Kirkwood, M. W., Yeates, K. O., Randolph, C., & Kirk, J. W. (2012). The implications of symptom validity test failure for ability-based test performance in a pediatric sample. Psychological Assessment, 24(1), 36–45. doi:10.1037/a0024628
Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S., Bhagwant, A., & French, L. M. (2013). Clinical utility of the Conners' Continuous Performance Test-II to detect poor effort in U.S. military personnel following traumatic brain injury. Psychological Assessment, 25(2), 339–352. doi:10.1037/a0030915
Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17(3), 410–425. doi:10.1076/clin.17.3.410.18089
Larrabee, G. J. (2005). Forensic neuropsychology: A scientific approach. New York, NY: Oxford University Press.
Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2016). Introducing a forced-choice recognition task to the California Verbal Learning Test – Children's Version. Child Neuropsychology. Advance online publication. doi:10.1080/09297049.2015.1135422
Lu, P. H., Rogers, S. A., & Boone, K. B. (2007). Use of standard memory tests to detect suspect effort. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment (pp. 128–151). New York, NY: Guilford.
MacAllister, W. S., Nakhutina, L., Bender, H. A., Karantzoulis, S., & Carlson, C. (2009). Assessing effort during neuropsychological evaluation with the TOMM in children and adolescents with epilepsy. Child Neuropsychology, 15(6), 521–531. doi:10.1080/09297040902748226
Marshall, P., Schroeder, R., O'Brien, J., Fischer, R., Ries, A., Blesi, B., & Barker, J. (2010). Effectiveness of symptom validity measures in identifying cognitive and behavioral symptom exaggeration in adult attention deficit hyperactivity disorder. The Clinical Neuropsychologist, 24, 1204–1237. doi:10.1080/13854046.2010.514290
Meyers, J., & Meyers, K. (1995). Rey Complex Figure Test and Recognition Trial professional manual. Lutz, FL: Psychological Assessment Resources.
Miele, A. S., Gunner, J. H., Lynch, J. K., & McCaffrey, R. J. (2010). Are embedded validity indices equivalent to free-standing symptom validity tests? Archives of Clinical Neuropsychology, 27(1), 10–22. doi:10.1093/arclin/acr084
Morey, L. (1991). Personality Assessment Inventory professional manual. Odessa, FL: Psychological Assessment Resources.
Morey, L. (2007). Personality Assessment Inventory – Adolescent professional manual. Lutz, FL: Psychological Assessment Resources.
Nagle, A. M., Everhart, D. E., Durham, T. W., McCammon, S. L., & Walker, M. (2006). Deception strategies in children: Examination of forced choice recognition and verbal learning and memory techniques. Archives of Clinical Neuropsychology, 21(8), 777–785. doi:10.1016/j.acn.2006.06.011
Ord, J. S., Boettcher, A. C., Greve, K. J., & Bianchini, K. J. (2010). Detection of malingering in mild traumatic brain injury with the Conners' Continuous Performance Test-II. Journal of Clinical and Experimental Neuropsychology, 32(4), 380–387. doi:10.1080/13803390903066881
Reitan, R. M., & Wolfson, D. (1985). The Halstead-Reitan Neuropsychological Test Battery: Theory and interpretation. Tucson, AZ: Neuropsychology Press.
Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin Card Sorting Test in the detection of malingering in student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology, 21(5), 701–708. doi:10.1076/jcen.21.5.701.868
Suhr, J. A., Hammers, D., Dobbins-Buckland, K., Zimak, E., & Hughes, C. (2008). The relationship of malingering test failure to self-reported symptoms and neuropsychological findings in adults referred for ADHD evaluation. Archives of Clinical Neuropsychology, 23(5), 521–530. doi:10.1016/j.acn.2008.05.003
Sweet, J. J., & Nelson, N. W. (2007). Validity indicators within executive function measures: Use and limits in detection of malingering. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment (pp. 152–177). New York, NY: Guilford.
Wald, N. J., & Bestwick, J. P. (2014). Is the area under an ROC curve a valid measure of performance of a screening or diagnostic test? Journal of Medical Screening, 21(1), 51–56. doi:10.1177/0969141313517497
Wechsler, D. A. (1991). Wechsler Intelligence Scale for Children (3rd ed.). San Antonio, TX: The Psychological Corporation.
Wilkinson, G. S. (1993). Wide Range Achievement Test – Revision 3. Wilmington, DE: Jastak Association.
APPLIED NEUROPSYCHOLOGY: CHILD 9
Downloaded by [University of Windsor], [Dr Laszlo A. Erdodi] at 06:40 07 July 2016
... One appeal of embedded PVTs is that little to no additional testing time is needed in order to obtain evidence of validity on that particular score or battery, although we agree that the time needed for stand-alone PVTs is medically necessary for clinical evaluations , and the combination of multiple stand-alone and embedded PVTs is most appropriate. Embedded PVTs have started to receive more attention over the past decade, with evidence being provided for both non-memory tests [e.g., Reliable Digit Span (Kirkwood, Hargrave, & Kirk, 2011;Welsh, Bender, Whitman, Vasserman, & Macallister, 2012); Matrix Reasoning (Sussman, Peterson, Connery, Baker, & Kirkwood, 2017); CNS Vital Signs (Brooks, Sherman, & Iverson, 2014); Automatized Sequences Test ; Conners Continuous Performance Test (Erdodi, Lichtenstein, Rai, & Flaro, 2017;Lichtenstein, Flaro, Baldwin, Rai, & Erdodi, 2019); Wisconsin Card Sorting Test (Lichtenstein, Erdodi, Rai, Mazur-Mosiewicz, & Flaro, 2018)] and memory tests [e.g., California Verbal Learning Test, Children's Version (CVLT-C) Brooks & Ploetz, 2015;Lichtenstein, Erdodi, & Linnea, 2017); Child and Adolescent Memory Profile (ChAMP) Lists subtest (Brooks, Plourde, MacAllister, & Sherman, 2018); and ChAMP Objects subtest (Brooks, MacAllister, Fay-McClymont, Vasserman, & Sherman, 2019b)]. ...
Article
Objective It is essential to interpret performance validity tests (PVTs) that are well-established and have strong psychometrics. This study evaluated the Child and Adolescent Memory Profile (ChAMP) Validity Indicator (VI) using a pediatric sample with traumatic brain injury (TBI). Method A cross-sectional sample of N = 110 youth (mean age = 15.1 years, standard deviation [SD] = 2.4 range = 8–18) on average 32.7 weeks (SD = 40.9) post TBI (71.8% mild/concussion; 3.6% complicated mild; 24.6% moderate-to-severe) were administered the ChAMP and two stand-alone PVTs. Criterion for valid performance was scores above cutoffs on both PVTs; criterion for invalid performance was scores below cutoffs on both PVTs. Classification statistics were used to evaluate the existing ChAMP VI and establish a new VI cutoff score if needed. Results There were no significant differences in demographics or time since injury between those deemed valid (n = 96) or invalid (n = 14), but all ChAMP scores were significantly lower in those deemed invalid. The original ChAMP VI cutoff score was highly specific (no false positives) but also highly insensitive (sensitivity [SN] = .07, specificity [SP] = 1.0). Based on area under the curve (AUC) analysis (0.94), a new cutoff score was established using the sum of scaled scores (VI-SS). A ChAMP VI-SS score of 32 or lower achieved strong SN (86%) and SP (92%). Using a 15% base rate, positive predictive value was 64% and negative predictive value was 97%. Conclusions The originally proposed ChAMP VI has insufficient SN in pediatric TBI. However, this study yields a promising new ChAMP VI-SS, with classification metrics that exceed any other current embedded PVT in pediatrics.
... EVIs are different from freestanding tests in that they are cutoff scores derived from tests measuring various cognitive functions versus tests solely measuring performance validity. EVIs are becoming more frequently used because they limit coaching exposure (Schutte & Axelrod, 2013;Victor & Abeles, 2004;Youngjohn, 1995), aging effects (Erdodi, Lichtenstein, Rai, & Flaro, 2017;Lichtenstein, Erdodi, & Linnea, 2017), time and cost, and assess validity in various cognitive tests and domains throughout the evaluation (Boone, 2018;Bush, Heilbronner, & Ruff, 2014;Chafetz et al., 2015;Miele, Gunner, Lynch, & McCaffrey, 2012). ...
Article
Objective Few studies have examined the use of embedded validity indicators (EVIs) in criminal-forensic practice settings, where judgements regarding performance validity can carry severe consequences for the individual and society. This study sought to examine how various EVIs perform in criminal defendant populations, and determine relationships between EVI scores and intrapersonal variables thought to influence performance validity. Method Performance on 16 empirically established EVI cutoffs were examined in a sample of 164 criminal defendants with valid performance who were referred for forensic neuropsychological evaluation. Subsequent analyses examined the relationship between EVI scores and intrapersonal variables in 83 of these defendants. Results Half of the EVIs (within the Wechsler Adult Intelligence Scale Digit Span Total, Conners’ Continuous Performance Test Commissions, Wechsler Memory Scale Logical Memory I and II, Controlled Oral Word Association Test, Trail Making Test Part B, and Stroop Word and Color) performed as intended in this sample. The EVIs that did not perform as intended were significantly influenced by relevant intrapersonal variables, including below-average intellectual functioning and history of moderate–severe traumatic brain injury and neurodevelopmental disorder. Conclusions This study identifies multiple EVIs appropriate for use in criminal-forensic settings. However, based on these findings, practitioners may wish to be selective in choosing and interpreting EVIs for forensic evaluations of criminal court defendants.
... Further, the CPT-II and CPT-III administration procedures are comparable and these tests measure largely the same outcome scores (including variability) across editions (Conners, 2000(Conners, , 2014Strauss et al., 2006). The CPT has been used with pediatric research for concussion (Keightley et al., 2010;Lecci et al., 2021;O'Neill et al., 2015;Studer et al., 2014), performance validity (Erdodi, Lichtenstein, Rai, & Flaro, 2016;Lichtenstein, Flaro, Baldwin, Rai, & Erdodi, 2019), ADHD (Epstein et al., 2003;Reddy, Newman, Pedigo, & Scott, 2010;Rovet & Hepworth, 2001), and elementary and high school athletes (O'Neill et al., 2015). The findings for the participants on these measures in the current study were reported as T-scores. ...
Article
Objective In concussion populations, suboptimal task engagement detected by performance validity tests (PVTs) has been associated with poorer neuropsychological scores and greater post-concussive symptoms (PCS). This study examined if Pass/Fail status on the Test of Memory Malingering—TOMM Trial 1—differentiated the neurocognitive, emotional, and behavioral profile of pediatric patients with concussion. Method This study utilized archival data from 93 patients (mean age = 14.56 and SD = 2.01) with a history of concussion who were assessed at ~5–6 weeks post-injury (mean days = 40.27 and SD = 35.41). Individuals were divided into “Pass” and “Fail” groups based on TOMM Trial 1 performance. The testing battery included ACT, CPT-II and III, HVLT-R, WJ-III and IV ACH, ImPACT, BASC-2, and BRIEF. Results The overall pass rate on Trial 1 was 70% (mean = 46.04 and SD = 4.55). Findings suggested that a passing score on Trial 1 may be associated with adequate performance across the remaining two trials of the TOMM. The Fail group scored significantly lower across attention, memory, and processing speed measures when compared with the Pass group. On rating scales, significantly more concerns were endorsed with the Fail group for attention and executive functioning relative to the Pass group. Parents generally endorsed significantly more concerns for executive functioning when compared with their children’s self-reported symptoms. There was a trend for the Fail group to report more PCS; however, they did not significantly differ from the Pass group for depression, anxiety, or somatization. Conclusions This study highlights the importance of utilizing PVTs when evaluating concussion recovery.
Article
Objective Performance validity tests (PVTs) and symptom validity tests (SVTs) are essential to neuropsychological evaluations, helping ensure findings reflect true abilities or concerns. It is unclear how PVTs and SVTs perform in children who received radiotherapy for brain tumors. Accordingly, we investigated the rate of noncredible performance on validity indicators as well as associations with fatigue and lower intellectual functioning. Method Embedded PVTs and SVTs were investigated in 98 patients with pediatric craniopharyngioma undergoing proton radiotherapy (PRT). The contribution of fatigue, sleepiness, and lower intellectual functioning to embedded PVT performance was examined. Further, we investigated PVTs and SVTs in relation to cognitive performance at pre-PRT baseline and change over time. Results SVTs on parent measures were not an area of concern. PVTs identified 0-31% of the cohort as demonstrating possible noncredible performance at baseline, with stable findings one-year following PRT. Reliable Digit Span (RDS) noted highest PVT failure rate; RDS has been criticized for false positives in pediatric populations, especially children with neurological impairment. Objective sleepiness was strongly associated with PVT failure, stressing need to consider arousal level when interpreting cognitive performance in children with craniopharyngioma. Lower intellectual functioning also needs to be considered when interpreting task engagement indices as it was strongly associated with PVT failure. Conclusions Embedded PVTs should be used with caution in pediatric craniopharyngioma patients who have received PRT. Future research should investigate different cut-off scores and validity indicator combinations to best differentiate noncredible performance due to task engagement versus variable arousal and/or lower intellectual functioning.
Article
Full-text available
Objective This study was design to evaluate the potential of the recognition trials for the Logical Memory (LM), Visual Reproduction (VR), and Verbal Paired Associates (VPA) subtests of the Wechsler Memory Scales–Fourth Edition (WMS-IV) to serve as embedded performance validity tests (PVTs). Method The classification accuracy of the three WMS-IV subtests was computed against three different criterion PVTs in a sample of 103 adults with traumatic brain injury (TBI). Results The optimal cutoffs (LM ≤ 20, VR ≤ 3, VPA ≤ 36) produced good combinations of sensitivity (.33–.87) and specificity (.92–.98). An age-corrected scaled score of ≤5 on either of the free recall trials on the VPA was specific (.91–.92) and relatively sensitive (.48–.57) to psychometrically defined invalid performance. A VR I ≤ 5 or VR II ≤ 4 had comparable specificity, but lower sensitivity (.25–.42). There was no difference in failure rate as a function of TBI severity. Conclusions In addition to LM, VR, and VPA can also function as embedded PVTs. Failing validity cutoffs on these subtests signals an increased risk of non-credible presentation and is robust to genuine neurocognitive impairment. However, they should not be used in isolation to determine the validity of an overall neurocognitive profile.
This study was designed to expand on a recent meta-analysis that identified ≤42 as the optimal cutoff on the Word Choice Test (WCT). We examined the base rate of failure and the classification accuracy of various WCT cutoffs in four independent clinical samples (N = 252) against psychometrically defined criterion groups. WCT ≤ 47 achieved acceptable specificity (.86–.89) at .49 to .54 sensitivity. Lowering the cutoff to ≤45 improved specificity (.91–.98) at a reasonable cost to sensitivity (.39–.50). Making the cutoff even more conservative (≤42) disproportionately sacrificed sensitivity (.30–.38) for specificity (.98–1.00), while still classifying 26.7% of patients with genuine and severe deficits as non-credible. Critical item (.23–.45 sensitivity at .89–1.00 specificity) and time-to-completion cutoffs (.48–.71 sensitivity at .87–.96 specificity) were effective alternative/complementary detection methods. Although WCT ≤ 45 produced the best overall classification accuracy, scores in the 43 to 47 range provide comparable objective psychometric evidence of non-credible responding. Results question the need for designating a single cutoff as “optimal,” given the heterogeneity of signal detection environments in which individual assessors operate. As meta-analyses often fail to replicate, ongoing research is needed on the classification accuracy of various WCT cutoffs.
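The cutoff trade-off described above lends itself to a simple computation. The following is a minimal Python sketch, using hypothetical scores and criterion labels rather than the study's data, of how sensitivity and specificity are derived for a fixed WCT cutoff; the function name and data are illustrative assumptions.

```python
# Minimal sketch: classification accuracy of a fixed WCT cutoff against a
# psychometrically defined criterion. Scores and labels are hypothetical;
# "invalid" marks examinees who failed the criterion PVTs.

def cutoff_accuracy(scores, invalid, cutoff):
    """Treat score <= cutoff as a PVT failure; return (sensitivity, specificity)."""
    true_pos = sum(1 for s, bad in zip(scores, invalid) if bad and s <= cutoff)
    true_neg = sum(1 for s, bad in zip(scores, invalid) if not bad and s > cutoff)
    n_invalid = sum(invalid)
    n_valid = len(invalid) - n_invalid
    sensitivity = true_pos / n_invalid if n_invalid else float("nan")
    specificity = true_neg / n_valid if n_valid else float("nan")
    return sensitivity, specificity

# Hypothetical data: WCT raw scores (0-50) and criterion-group membership.
scores = [50, 49, 47, 44, 48, 41, 38, 46, 50, 43]
invalid = [False, False, False, True, False, True, True, False, False, True]

for cutoff in (42, 45, 47):  # cutoffs examined in the study
    sens, spec = cutoff_accuracy(scores, invalid, cutoff)
    print(f"WCT <= {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Sweeping the cutoff this way makes the reported pattern concrete: a more liberal cutoff flags more of the invalid group (higher sensitivity) while misclassifying more credible examinees (lower specificity).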
Objective: The study was designed to expand on the results of previous investigations of the D-KEFS Stroop as a performance validity test (PVT), which produced diverging conclusions. Method: The classification accuracy of previously proposed validity cutoffs on the D-KEFS Stroop was computed against four different criterion PVTs in two independent samples: patients with uncomplicated mild TBI (n = 68) and disability benefit applicants (n = 49). Results: Age-corrected scaled scores (ACSSs) ≤6 on individual subtests often fell short of specificity standards. Making the cutoffs more conservative improved specificity, but at a significant cost to sensitivity. In contrast, multivariate models (≥3 failures at ACSS ≤6 or ≥2 failures at ACSS ≤5 on the four subtests) produced good combinations of sensitivity (.39-.79) and specificity (.85-1.00), correctly classifying 74.6-90.6% of the sample. A novel validity scale, the D-KEFS Stroop Index, correctly classified between 78.7% and 93.3% of the sample. Conclusions: A multivariate approach to performance validity assessment provides a methodological safeguard against sample- and instrument-specific fluctuations in classification accuracy, strikes a reasonable balance between sensitivity and specificity, and mitigates the “invalid before impaired” paradox.
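The multivariate rule reported above is, at bottom, a failure count across the four subtests evaluated at two levels of stringency. A minimal sketch, assuming only a list of ACSS values for the four subtests and using the cutoffs named in the abstract (the function name and example scores are illustrative, not from the study):

```python
# Minimal sketch of the two-tier multivariate decision rule described above:
# flag a profile as non-credible if >=3 of the four D-KEFS Stroop subtests
# fall at an age-corrected scaled score (ACSS) <= 6, or >=2 fall at ACSS <= 5.

def multivariate_stroop_flag(acss_scores):
    """acss_scores: ACSS values for the four Stroop subtests."""
    failures_liberal = sum(1 for s in acss_scores if s <= 6)
    failures_strict = sum(1 for s in acss_scores if s <= 5)
    return failures_liberal >= 3 or failures_strict >= 2

print(multivariate_stroop_flag([6, 6, 6, 9]))    # True: three scores at <=6
print(multivariate_stroop_flag([5, 5, 10, 10]))  # True: two scores at <=5
print(multivariate_stroop_flag([6, 8, 9, 10]))   # False: one isolated low score
```

The design choice is visible in the third example: a single low subtest score, which genuine impairment can easily produce, never triggers the flag; only converging failures across subtests do.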
Objective: This study was designed to empirically investigate the signal detection profile of various multivariate models of performance validity tests (MV-PVTs) and explore several contested assumptions underlying validity assessment in general and MV-PVTs specifically. Method: Archival data were collected from 167 patients (52.4% male; mean age = 39.7) clinically evaluated subsequent to a TBI. Performance validity was psychometrically defined using two free-standing PVTs and five composite measures, each based on five embedded PVTs. Results: MV-PVTs had superior classification accuracy compared to univariate cutoffs. The similarity between predictor and criterion PVTs influenced signal detection profiles. False positive rates (FPR) in MV-PVTs can be effectively controlled using more stringent multivariate cutoffs. In addition to Pass and Fail, Borderline is a legitimate third outcome of performance validity assessment. Failing memory-based PVTs was associated with elevated self-reported psychiatric symptoms. Conclusions: Concerns about elevated FPR in MV-PVTs are unsubstantiated. In fact, MV-PVTs are psychometrically superior to their individual components. Instrumentation artifacts are endemic to PVTs and represent both a threat and an opportunity during the interpretation of a given neurocognitive profile. There is no such thing as too much information in performance validity assessment. Psychometric issues should be evaluated based on empirical, not theoretical, models. As the number and severity of embedded PVT failures accumulate, assessors must consider the possibility of non-credible presentation and its clinical implications for neurorehabilitation.
This study was designed to investigate the potential of extreme scores on the Behavior Rating Inventory of Executive Function–Adult Self-Report Version (BRIEF-A-SR) to serve as validity indicators. The BRIEF-A-SR was administered to 73 university students and 50 clinically referred adults. In the student sample, symptom validity was operationalized as the outcome on the Inventory of Problems (IOP-29). In the patient sample, performance validity was operationalized as the outcome on a combination of free-standing and embedded indicators. The BRIEF-A-SR had better classification accuracy in the student sample (.13–.56 sensitivity at .88–.95 specificity) than in the patient sample (.22–.44 sensitivity at .85–.97 specificity). Combining individual cutoffs into a multivariate model improved specificity (.93) and stabilized sensitivity (.33) in the clinical sample. Failing the newly introduced cutoffs (T ≥ 65/T ≥ 80 in the student sample and T ≥ 80/T ≥ 90 in the clinical sample) was associated with failure on performance validity tests and elevations on other symptom inventories. Results provide preliminary support for an alternative method for establishing the credibility of symptom reports both within the BRIEF-A-SR and other inventories. Pending replication by future research, the newly proposed cutoffs could provide a much-needed psychometric safeguard against over-diagnosing neuropsychiatric disorders due to undetected symptom exaggeration.
Objective: This study was designed to evaluate the classification accuracy of the recently introduced forced-choice recognition trial of the Hopkins Verbal Learning Test–Revised (FCR HVLT-R) as a performance validity test (PVT) in a clinical sample. Time-to-completion (T2C) for the FCR HVLT-R was also examined. Method: Forty-three students were assigned to either the control or the experimental malingering (expMAL) condition. Archival data were collected from 52 adults clinically referred for neuropsychological assessment. Invalid performance was defined using expMAL status, two free-standing PVTs, and two validity composites. Results: Among students, FCR HVLT-R ≤11 or T2C ≥45 seconds was specific (0.86-0.93) to invalid performance. Among patients, FCR HVLT-R ≤11 was specific (0.94-1.00), but relatively insensitive (0.38-0.60) to non-credible responding. T2C ≥35 seconds produced notably higher sensitivity (0.71-0.89), but variable specificity (0.83-0.96). The T2C achieved superior overall correct classification (81-86%) compared to the accuracy score (68-77%). The FCR HVLT-R provided incremental utility in performance validity assessment compared to previously introduced validity cutoffs on Recognition Discrimination. Conclusions: Combined with T2C, the FCR HVLT-R has the potential to function as a quick, inexpensive, and effective embedded PVT. The time cutoff effectively attenuated the low ceiling of the accuracy scores, increasing sensitivity by 19%. Replication in larger and more geographically and demographically diverse samples is needed before the FCR HVLT-R can be endorsed for routine clinical application.
The purpose of this article is to provide evidence for the validity of performance curve classification on the nonverbal subtest of the Validity Indicator Profile (VIP-NV). A four-fold classification scheme of performance on cognitive testing is proposed. This scheme combines effort and motivation to generate four response classifications: compliant, careless, irrelevant, and malingering. Data are presented across six studies of cognitive and personality testing for 737 male pretrial criminal defendants. Additionally, computer-generated VIP-NV performances were subjected to four levels of randomization to investigate VIP-NV carelessness indicators. The findings support the validity of the four-fold classification scheme and the classification of responses on the basis of motivation and effort.
Scores on the Complex Ideational Material (CIM) were examined in reference to various performance validity tests (PVTs) in 106 adults clinically referred for neuropsychological assessment. The main diagnostic categories, reflecting a continuum between neurological and psychiatric disorders, were epilepsy, psychiatric disorders, postconcussive disorder, and psychogenic non-epileptic seizures. Cross-validation analyses suggest that in the absence of bona fide aphasia, a raw score ≤9 or T score ≤29 on the CIM is more likely to reflect non-credible presentation than impaired receptive language skills. However, these cutoffs may be associated with unacceptably high false positive rates in patients with longstanding, documented neurological deficits. Therefore, more conservative cutoffs (≤8/≤23) are recommended in such populations. Contrary to the widely accepted assumption that psychiatric disorders are unrelated to performance validity, results were consistent with the psychogenic interference hypothesis, suggesting that emotional distress increases the likelihood of PVT failures even in the absence of apparent external incentives to underperform on cognitive testing.
Research suggests that select processing speed measures can also serve as embedded validity indicators (EVIs). The present study examined the diagnostic utility of Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) subtests as EVIs in a mixed clinical sample of 205 patients medically referred for neuropsychological assessment (53.3% female, mean age = 45.1). Classification accuracy was calculated against 3 composite measures of performance validity as criterion variables. A PSI ≤79 produced a good combination of sensitivity (.23-.56) and specificity (.92-.98). A Coding scaled score ≤5 resulted in good specificity (.94-1.00), but low and variable sensitivity (.04-.28). A Symbol Search scaled score ≤6 achieved a good balance between sensitivity (.38-.64) and specificity (.88-.93). A Coding-Symbol Search scaled score difference ≥5 produced adequate specificity (.89-.91) but consistently low sensitivity (.08-.12). A 2-tailed cutoff on the Coding/Symbol Search raw score ratio (≤1.41 or ≥3.57) produced acceptable specificity (.87-.93), but low sensitivity (.15-.24). Failing ≥2 of these EVIs produced variable specificity (.81-.93) and sensitivity (.31-.59). Failing ≥3 of these EVIs stabilized specificity (.89-.94) at a small cost to sensitivity (.23-.53). Results suggest that processing-speed-based EVIs have the potential to provide a cost-effective and expedient method for evaluating the validity of cognitive data. Given their generally low and variable sensitivity, however, they should not be used in isolation to determine the credibility of a given response set. They also produced unacceptably high rates of false positive errors in patients with moderate-to-severe head injury. Combining evidence from multiple EVIs has the potential to improve overall classification accuracy.
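The two-tailed ratio cutoff above differs from the other EVIs in that both unusually low and unusually high values are flagged. A minimal sketch using the bounds quoted in the abstract; the function name, the handling of a zero denominator, and the example raw scores are illustrative assumptions, not part of the study.

```python
# Minimal sketch of the two-tailed validity cutoff described above: the
# Coding / Symbol Search raw score ratio is flagged when it is unusually
# low (<= 1.41) or unusually high (>= 3.57).

def ratio_flag(coding_raw, symbol_search_raw, low=1.41, high=3.57):
    """Return True if the Coding/Symbol Search ratio falls in either tail."""
    if symbol_search_raw == 0:
        return True  # degenerate profile; treat as flagged for review
    ratio = coding_raw / symbol_search_raw
    return ratio <= low or ratio >= high

print(ratio_flag(45, 40))  # ratio 1.12 -> True (low tail)
print(ratio_flag(60, 30))  # ratio 2.00 -> False (within expected range)
print(ratio_flag(75, 20))  # ratio 3.75 -> True (high tail)
```

The rationale for a two-tailed rule is that the two subtests draw on overlapping abilities, so a markedly discrepant ratio in either direction is an atypical profile regardless of overall level of performance.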
Complex Ideational Material (CIM) is a sentence comprehension task designed to detect pathognomonic errors in receptive language. Nevertheless, patients with apparently intact language functioning occasionally score in the impaired range. If these instances reflect poor test-taking effort, CIM has potential as a performance validity test (PVT). Indeed, in 68 adults medically referred for neuropsychological assessment, CIM was a reliable marker of psychometrically defined invalid responding. A raw score ≤9 or T-score ≤29 achieved acceptable combinations of sensitivity (.34-.40) and specificity (.82-.90) against two reference PVTs, and produced a zero overall false positive rate when scores on all available PVTs were considered. More conservative cutoffs (≤8/≤23) with higher specificity (.95-1.00) but lower sensitivity (.14-.17) may be warranted in patients with longstanding, documented neurological deficits. Overall, results indicate that in the absence of overt aphasia, poor performance on CIM is more likely to reflect invalid responding than true language impairment. The implications for the clinical interpretation of CIM are discussed.
The importance of performance validity tests (PVTs) is increasingly recognized in pediatric neuropsychology. To date, research has focused on investigating whether PVTs designed for adults function similarly in children. The downward extension of adult cutoffs is counter-intuitive considering the robust effect of age-related changes in basic cognitive skills in children and adolescents. The purpose of this study was to examine the signal detection properties of a forced-choice recognition trial (FCR-C) of the California Verbal Learning Test - Children's Version. A total of 72 children aged 6-15 years (M = 11.1, SD = 2.6) completed the FCR-C as part of a larger neuropsychological assessment battery. Cross-validation analyses revealed that the FCR-C had good signal detection performance against reference PVTs. The first level of failure (≤14/15) produced the best combination of overall sensitivity (.31) and specificity (.87). A more conservative FCR-C cutoff (≤13) resulted in a predictable trade-off between sensitivity (.15) and specificity (.94), but also a net loss in discriminant power. Lowering the cutoff to ≤12 resulted in a slight improvement in specificity (.97) but further deterioration in sensitivity (.14). These preliminary findings suggest that the FCR-C has the potential to become the newest addition to a growing arsenal of pediatric PVTs.
How neuropsychological assessment findings are deemed valid has been the topic of numerous articles, but few have addressed the role that neuroimaging studies could play. Within military and various clinical samples of individuals undergoing neuropsychological evaluations, high rates of failure on measures of symptom validity testing (SVT) and/or performance validity testing (PVT) have been reported. Where 'failure' is defined as performance below a predetermined cut-score on an SVT/PVT measure, are such failures always indicative of invalid test findings, or are there other explanations, especially in light of informative neuroimaging findings? This review starts with the premise that even though the SVT/PVT task is designed to be simple and easy to perform, it nonetheless requires intact frontoparietal attention, working memory, and task engagement (motivation) networks. If there is damage or pathology within any aspect of these networks, as demonstrated by neuroimaging findings, the patient may perform below the cut-point as a result of the underlying damage or pathophysiology. The argument is made that neuroimaging findings should be considered when SVT/PVT cut-points are established, and that there should be much greater flexibility in SVT/PVT measures based on other personal, demographic, and neuroimaging information. Several case studies are used to demonstrate these points.