Psychological Assessment
The Stroop Test as a Measure of Performance Validity in
Adults Clinically Referred for Neuropsychological
Assessment
Laszlo A. Erdodi, Sanya Sagar, Kristian Seke, Brandon G. Zuccato, Eben S. Schwartz, and Robert
M. Roth
Online First Publication, February 22, 2018. http://dx.doi.org/10.1037/pas0000525
CITATION
Erdodi, L. A., Sagar, S., Seke, K., Zuccato, B. G., Schwartz, E. S., & Roth, R. M. (2018, February 22).
The Stroop Test as a Measure of Performance Validity in Adults Clinically Referred for
Neuropsychological Assessment. Psychological Assessment. Advance online publication.
http://dx.doi.org/10.1037/pas0000525
The Stroop Test as a Measure of Performance Validity in Adults Clinically
Referred for Neuropsychological Assessment
Laszlo A. Erdodi, Sanya Sagar, Kristian Seke,
and Brandon G. Zuccato
University of Windsor
Eben S. Schwartz
Waukesha Memorial Hospital, Waukesha, Wisconsin
Robert M. Roth
Geisel School of Medicine at Dartmouth/Dartmouth-Hitchcock Medical Center
This study was designed to develop performance validity indicators embedded within the Delis-Kaplan Executive Function System (D-KEFS) version of the Stroop task. Archival data from a mixed clinical sample of 132 patients (50% male; M_Age = 43.4; M_Education = 14.1) clinically referred for neuropsychological assessment were analyzed. Criterion measures included the Warrington Recognition Memory Test—Words and 2 composites based on several independent validity indicators. An age-corrected scaled score ≤6 on any of the 4 trials reliably differentiated psychometrically defined credible and noncredible response sets with high specificity (.87–.94) and variable sensitivity (.34–.71). An inverted Stroop effect was less sensitive (.14–.29), but comparably specific (.85–.90) to invalid performance. Aggregating the newly developed D-KEFS Stroop validity indicators further improved classification accuracy. Failing the validity cutoffs was unrelated to self-reported depression or anxiety. However, it was associated with elevated somatic symptom report. In addition to processing speed and executive function, the D-KEFS version of the Stroop task can function as a measure of performance validity. A multivariate approach to performance validity assessment is generally superior to univariate models.
Public Significance Statement
The Stroop test can function as a performance validity indicator by identifying unusual patterns of
responding. Invalid performance was associated with higher levels of self-reported somatic symptoms.
Keywords: Stroop task, performance validity, embedded validity indicators
The validity of the neuropsychological evaluation hinges on the
examinees’ ability and willingness to demonstrate their typical
level of cognitive functioning (Bigler, 2015). Therefore, there is a
broad consensus within the profession that a thorough performance
validity assessment is an essential part of the examination (Bush,
Ruff, & Heilbronner, 2014; Chafetz et al., 2015; Heilbronner et al.,
2009). As a result, the administration of multiple, nonredundant
performance validity tests (PVTs) has become a widely accepted
practice standard (Boone, 2013; Larrabee, 2014).
Although stand-alone instruments are considered the gold stan-
dard for validity assessment (Green, 2013), embedded validity
indicators (EVIs) are increasing in popularity. EVIs are derived
from traditional neuropsychological tests originally designed to
measure cognitive ability, but were subsequently coopted as PVTs.
Many EVIs have strong empirical support and a long presence in
the research literature. Some predate the most acclaimed stand-
alone PVTs, such as those based in verbal fluency (Hayward, Hall,
Hunt, & Zubrick, 1987), digit span (Greiffenstein, Baker, & Gola,
1994), or symbol substitution (Trueblood, 1994) tasks.
EVIs have several advantages over stand-alone PVTs. First,
they allow clinicians to use multiple validity indicators without
adding new measures, resulting in significant savings in test ma-
terial and administration time. Compressing the battery also lowers
the demand on patients’ mental stamina, which is especially im-
portant when assessing individuals with complex medical and
psychiatric history (Lichtenstein, Erdodi, & Linnea, 2017). EVIs
may also be more resistant to coaching, as they are less likely to be
identified as PVTs than stand-alone instruments (Chafetz et al.,
2015; Schutte, Axelrod, & Montoya, 2015). Finally, they automat-
ically address concerns about the generalizability of the PVT
scores to the rest of the battery (Bigler, 2014). Overall, these
features enable EVIs to achieve the ideal of ongoing monitoring of
test-taking effort (Boone, 2009) without placing a significant ad-
ditional burden on either the examiner or examinee.
Laszlo A. Erdodi and Sanya Sagar, Department of Psychology, Univer-
sity of Windsor; Kristian Seke, Brain-Cognition-Neuroscience Program,
University of Windsor; Brandon G. Zuccato, Department of Psychology,
University of Windsor; Eben S. Schwartz, Waukesha Memorial Hospital,
Waukesha, Wisconsin; Robert M. Roth, Geisel School of Medicine at
Dartmouth/Dartmouth-Hitchcock Medical Center.
Correspondence concerning this article should be addressed to Laszlo A.
Erdodi, 168 Chrysler Hall South, 401 Sunset Avenue, Windsor, ON N9B
3P4, Canada. E-mail: lerdodi@gmail.com
The Stroop (1935) paradigm, of which there are many variants,
has the potential to function as an EVI. The task usually consists
of at least three trials (MacLeod & MacDonald, 2000). In the first
trial, the participant is asked to read a series of color words, printed
in black ink, as quickly as possible. In the second trial, the
participant is asked to look at a series of color squares, and name
the colors as quickly as possible. The third trial is the test of
interference and the evoker of the classic Stroop effect: the par-
ticipant is asked to look at a series of color words, printed in
incongruent ink colors, and name the color of the ink instead of
reading the word, as quickly as possible. For example, if the word
“red” is printed in green ink, the examinee is asked to say “green”
instead of “red.” Because reading words is more automatized than
naming ink colors, inhibiting the overlearned response requires
additional cognitive resources, which results in increased comple-
tion time relative to the word reading and color naming trials
(MacLeod & MacDonald, 2000).
The Stroop task within the Delis-Kaplan Executive Function
System (D-KEFS; Delis, Kaplan, & Kramer, 2001) includes a
fourth trial (inhibition/switching) designed to further increase the
cognitive load by requiring examinees to switch back and forth
between two sets of rules. On Trial 4, half of the words are
enclosed in boxes. The examinee is instructed to name the color of
the ink for free-standing items (as in Trial 3 of the classic Stroop
task), but read the word (rather than name the ink color) for items
inside a box. Trial 4 was meant to be more difficult than the
interference trial to capture more subtle executive deficits. How-
ever, the empirical evidence on this difficulty gradient is mixed
(Lippa & Davis, 2010).
The Stroop paradigm has been shown to be sensitive to neuro-
psychiatric conditions with executive dysfunction as a common
feature, such as traumatic brain injury (TBI; Larson, Kaufman,
Schmalfuss, & Perlstein, 2007; Schroeter et al., 2007) and
attention-deficit-hyperactivity disorder (ADHD; Lansbergen, Ken-
emans, & Van Engeland, 2007). However, there is limited research
examining the utility of the Stroop paradigm as a measure of
noncredible performance. Arentsen and colleagues (2013) introduced validity cutoffs for the word reading (≥66 s), color naming (≥93 s), and interference (≥191 s) trials in the Comalli Stroop Test (Comalli, Wapner, & Werner, 1962). All of these cutoffs achieved specificity ≥.90 in a mixed clinical population, with .29–.53 sensitivity.
A raw residual score (i.e., predicted score minus actual score) of ≥47 on the word reading trial of the Stroop Color and Word
Test (Golden & Freshwater, 2002) discriminated noncredible from
credible responders at .95 specificity and .29 sensitivity using
Slick, Sherman, and Iverson’s (1999) criteria for malingered neu-
rocognitive dysfunction (Guise, Thompson, Greve, Bianchini, &
West, 2014). Other studies (Egeland & Langfjaeran, 2007;
Osimani, Alon, Berger, & Abarbanel, 1997) have found that non-
credible performers may display slower overall reaction time (RT)
and an inverted Stroop effect (i.e., better performance on the
interference trial than the word reading or color naming trials).
While Osimani and colleagues (1997) did not perform signal
detection analyses, Egeland and Langfjaeran (2007) reported un-
acceptably low specificity (.59) for the inverted Stroop effect, even
though the majority of noncredible performers exhibited this vio-
lation of the difficulty gradient. Furthermore, the inverted Stroop
effect as an index of validity has not been replicated consistently
in the literature (Arentsen et al., 2013).
To our knowledge, the potential of the D-KEFS Stroop to
function as an EVI has not been investigated. Given its more
nuanced difficulty gradient because of the unique combined inhi-
bition/switching task (Trial 4), it may be particularly useful as a
measure of performance validity. The purpose of this study is to
examine the utility of the D-KEFS Stroop in differentiating cred-
ible and noncredible response sets in a clinical setting.
Method
Participants
Data were collected from a consecutive sequence of 132 patients
(50% male, 89.4% right-handed), clinically referred for neuropsy-
chological assessment at a northeastern academic medical center.
The vast majority of them (95%) were White, reflecting the
demographic composition of the region. Age (M = 43.4, SD = 16) followed a bimodal distribution, with one peak around 20 years and another around 55 years. Mean level of education was 14.1 years (SD = 2.8). Overall intellectual functioning was in the average range (FSIQ: M = 101.2, SD = 16.4), as were scores on a single word reading test (WRAT-4: M = 104.4, SD = 14.8).
The most common primary diagnosis was psychiatric (46.2%),
followed by TBI (35.6%), neurological disorders (14.4%) and
other medical conditions (3.8%). Within the psychiatric sub-
sample, most patients had been diagnosed with depression
(45.9%), followed by somatic (19.7%) and anxiety disorders
(13.1%). The majority of the TBI patients (81.1%) had sustained a
mild injury. Likewise, the average self-reported depression was in the mild range (BDI-II: M = 16.4, SD = 14.8). Most patients (45.6%) scored in the minimal range (≤13), while 21.6% scored in the mild (14–19), 17.6% in the moderate (20–28), and 15.2% in the severe (≥29) range for self-reported depression.
Procedures
Data were collected through a retrospective chart review from
patients assessed between December 2012 and July 2014. The
main inclusion criterion was a complete administration of the
D-KEFS Stroop. The study was approved by the ethics board of
the hospital where the data were collected, and that of the univer-
sity where the research project was finalized. Relevant guidelines
regulating research with human participants were followed
throughout the study.
The names and abbreviations of the tests administered are provided
in Table 1. The percentage of the sample with scores on each test
is also listed. A core battery of tests was administered to most
patients, while the rest of the instruments were selected based on
the specific referral question. Therefore, they vary from patient to
patient.
The main stand-alone PVT was Warrington’s Recognition
Memory Test—Words (RMT). Failure was defined as an accuracy score of ≤43 or a completion time of ≥192 s (Erdodi, Tyson, et al., 2017). In addition, a composite of 11 validity indicators labeled
“Effort Index Eleven” (EI-11) was developed to provide a com-
prehensive measure of performance validity (Erdodi, Abeare, et
al., 2017; Erdodi, Kirsch, Lajiness-O’Neill, Vingilis, & Medoff,
2014; Erdodi & Roth, 2017). The constituent PVTs were dichot-
omized into Pass (0) and Fail (1) along published cutoffs.
Some PVTs have multiple indicators; failing any indicator was
considered as failing the entire PVT (1). Failing multiple indi-
cators nested within the same measure was counted as a single
failure (1). Missing data were coded as Pass (0), although it
is recognized that this may increase error variance by potentially
misclassifying noncredible patients as credible.
The value of the EI-11 is the sum of failures on its components. Given the relatively large number of indicators, and that the most liberal cutoffs were used to maximize sensitivity (see Table 2), the EI-11 is prone to false positive errors by design. To correct for that, the more conservative threshold of ≥3 independent PVT
Table 1
List of Tests Administered: Abbreviations, Scales, and Norms

Test name | Abbreviation | Norms | % ADM
Beck Depression Inventory, 2nd Edition | BDI-II | | 94.4
Beck Anxiety Inventory | BAI | | 69.7
California Verbal Learning Test, 2nd Edition | CVLT-II | Manual | 100.0
Complex Ideational Material | CIM | Heaton | 32.6
Conners' Continuous Performance Test, 2nd Edition | CPT-II | Manual | 78.8
Delis-Kaplan Executive Function System–Stroop | D-KEFS | Manual | 100.0
Finger Tapping Test | FTT | Heaton | 81.1
Letter and Category Fluency Test | FAS & Animals | Heaton | 84.1
Personality Assessment Inventory | PAI | Manual | 43.9
Recognition Memory Test–Words | RMT | | 100.0
Rey 15-Item Test | Rey-15 | | 81.8
Rey Complex Figure Test | RCFT | Manual | 96.2
Trail Making Test (A & B) | TMT (A & B) | Heaton | 56.8
Wechsler Adult Intelligence Scale, 4th Edition | WAIS-IV | Manual | 99.2
Wechsler Memory Scale, 4th Edition | WMS-IV | Manual | 99.2
Wide Range Achievement Test, 4th Edition | WRAT-4 | Manual | 83.3
Wisconsin Card Sorting Test | WCST | Manual | 91.7

Note. Heaton = demographically adjusted norms published by Heaton, Miller, Taylor, and Grant (2004); Manual = normative data published in the technical manual; % ADM = percent of the sample to which the test was administered.
Table 2
Base Rates of Failure for EI-11 Components, Cutoffs, and References for Each Indicator

Test | BR_Fail | Indicator | Cutoff | Reference
Rey-15 | 10.6 | Free recall | ≤9 | Lezak, 1995; Boone et al., 2002
TMT | 15.9 | A + B (seconds) | ≥137 | Shura et al., 2016
Digit Span | 25.0 | RDS | ≤7 | Greiffenstein et al., 1994; Pearson, 2009
 | | ACSS | ≤6 | Axelrod et al., 2006; Spencer et al., 2013; Trueblood, 1994
 | | LDF | ≤4 | Heinly et al., 2005
WCST | 14.4 | FMS | ≥2 | Larrabee, 2003; Suhr & Boyer, 1999
 | | LRE | ≥1.9 | Greve et al., 2002; Suhr & Boyer, 1999
CIM | 6.1 | Raw | ≤9 | Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2016
 | | T-score | ≤29 | Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2016
LM (WMS-IV) | 15.9 | I ACSS | ≤3 | Bortnik et al., 2010
 | | II ACSS | ≤4 | Bortnik et al., 2010
 | | Recognition | ≤20 | Bortnik et al., 2010; Pearson, 2009
VR (WMS-IV) | 18.2 | Recognition | ≤4 | Pearson, 2009
CVLT-II | 12.9 | Recognition hits | ≤10 | Bauer et al., 2005; Greve et al., 2009; Wolfe et al., 2010
 | | FCR | ≤15 | Bauer et al., 2005; D. Delis (personal communication, May 10, 2012)
RCFT | 34.1 | Copy raw | ≤26 | Lu et al., 2003; Reedy et al., 2013
 | | 3-min raw | ≤9.5 | Lu et al., 2003; Reedy et al., 2013
 | | Recognition true positives | ≤6 | Lu et al., 2003; Reedy et al., 2013
 | | Atyp RE | ≥1 | Blaskewitz et al., 2009; Lu et al., 2003
FAS | 9.8 | T-score | ≤33 | Curtis et al., 2008; Sugarman & Axelrod, 2015
Animals | 16.7 | T-score | ≤33 | Hayward et al., 1987; Sugarman & Axelrod, 2015

Note. BR_Fail = base rate of failure (% of the sample that failed one or more indicators within the test); TMT = Trail Making Test; RDS = reliable digit span; ACSS = age-corrected scaled score; LDF = longest digit span forward; WCST = Wisconsin Card Sorting Test; FMS = failure to maintain set; UE = unique errors; LRE = logistic regression equation; CIM = Complex Ideational Material from the Boston Diagnostic Aphasia Battery; WMS-IV = Wechsler Memory Scale, 4th Edition; LM = Logical Memory; VR = Visual Reproduction; CVLT-II = California Verbal Learning Test, 2nd Edition; FCR = forced choice recognition; RCFT = Rey Complex Figure Test; Atyp RE = atypical recognition errors.
failures was used to define Fail on the EI-11. At the same time, to maintain the purity of the credible group, Pass was defined as ≤1. Hence, patients with EI-11 scores of 2 were considered Borderline and excluded from the analyses (Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2017).
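To make the aggregation rule explicit, the following minimal sketch illustrates how the EI-11 could be scored from the dichotomized components; the function and variable names are illustrative and do not come from the original study.

```python
def score_ei11(component_failures):
    """Aggregate the 11 dichotomized component PVTs into the EI-11.

    component_failures: sequence of 11 entries, each True (Fail = 1),
    False (Pass = 0), or None (missing data, coded as Pass per the
    convention described above). Returns the EI-11 total and the
    trichotomized classification used to form the criterion groups.
    """
    assert len(component_failures) == 11
    total = sum(1 for outcome in component_failures if outcome)  # None/False add 0
    if total <= 1:
        label = "Pass"
    elif total == 2:
        label = "Borderline"  # excluded from signal detection analyses
    else:
        label = "Fail"        # 3 or more independent PVT failures
    return total, label


# Example: failures on three components, one component missing
print(score_ei11([True, False, None, True, False, False, True, False, False, False, False]))
# -> (3, 'Fail')
```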
Relying on a mixture of PVTs representing a wide range of
sensory modalities, cognitive domains, and testing paradigms is a
desirable feature of the EI-11, as it provides an ecologically valid
index of performance validity. However, this heterogeneity could
also become a source of error variance, especially when the pur-
pose of the instrument is to establish the credibility of the perfor-
mance on a specific test, and not on the overall neurocognitive
profile. The issue of modality-specificity as a confound in signal
detection analyses was raised as a theoretical concern (Leighton,
Weinborn, & Maybery, 2014) and has found empirical support
(Erdodi, Abeare, et al., 2017; Erdodi, Tyson, et al., 2017).
Therefore, because the D-KEFS Stroop is timed, another validity composite was developed from constituent PVTs based on processing speed measures, labeled "Erdodi Index Seven" (EI-7_PSP). The unique feature of the EI-7_PSP is that instead of the traditional Pass/Fail dichotomy, each of its components is coded on a 4-point scale ranging from zero (unequivocal Pass) to three (unequivocal Fail), with one and two reflecting intermediate levels of failure (see Table 3). As such, the EI-7_PSP captures both the number and extent of PVT failures, recognizing the underlying continuity in test-taking effort (Erdodi, Roth, Kirsch, Lajiness-O'Neill, & Medoff, 2014; Erdodi, Tyson, et al., 2016).
Because the practical demands of clinical classification require a dichotomous outcome, EI-7_PSP scores ≤1 were defined as Pass, and ≥4 as Fail. EI-7_PSP values of two and three represent an indeterminate range, as they could reflect either multiple failures at a liberal cutoff or a single failure at the most conservative cutoff. As these performances are considered "near-passes" by some (Bigler, 2012, 2015), patients with EI-7_PSP scores in this range were excluded from signal detection analyses in the interest of obtaining diagnostically pure criterion groups, following methodological guidelines established by previous researchers (Axelrod, Meyers, & Davis, 2014; Greve & Bianchini, 2004). The majority of the sample (62.1%) performed in the passing range; only 15.9% scored ≥4 (see Table 4).
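The polytomous coding can be summarized in a similar minimal sketch (illustrative names, not the authors' scoring code), assuming the seven component scores have already been mapped onto the 0-3 scale of Table 3.

```python
def score_ei7_psp(component_values):
    """Sum seven components coded 0 (unequivocal Pass) through 3
    (unequivocal Fail) and apply the classification ranges of Table 4."""
    assert len(component_values) == 7
    assert all(0 <= value <= 3 for value in component_values)
    total = sum(component_values)
    if total <= 1:
        label = "Pass"
    elif total <= 3:
        label = "Borderline"  # indeterminate range, excluded from analyses
    else:
        label = "Fail"        # total of 4 or more
    return total, label


# Example: one intermediate failure (1) plus one unequivocal failure (3)
print(score_ei7_psp([0, 1, 0, 0, 3, 0, 0]))  # -> (4, 'Fail')
```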
Data Analysis
Descriptive statistics (mean, SD, BR_Fail) are reported for the relevant variables. The main inferential statistics were one-way analyses of variance (ANOVAs) and independent-sample t tests. Effect size estimates are reported as Cohen's d and partial eta squared (η²). Classification accuracy (sensitivity and specificity) was calculated using standard formulas (Grimes & Schulz, 2005). The emerging standard for specificity is ≥.90 (Boone, 2013), with the minimum acceptable value at .84 (Larrabee, 2003).
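For reference, treating a failed criterion PVT as the positive (noncredible) class, the standard formulas reduce to:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

where TP and FN are noncredible cases that the D-KEFS Stroop indicator correctly flags or misses, and TN and FP are credible cases that it correctly passes or incorrectly flags.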
Results
One-way ANOVAs using the trichotomized EI-11 (Pass-Borderline-Fail) as the independent variable, and the RMT accuracy score, completion time, and the EI-7_PSP scores as the dependent variables, were statistically significant. Associated effect sizes were large (η²: .16–.23). Scores in the Pass range always differed significantly from scores in the Fail range, reflecting better performance (higher RMT accuracy, shorter completion time, and lower EI-7_PSP values). However, scores in the Borderline range did not differ consistently from the other two classification ranges (see Table 5).
These analyses were repeated using the trichotomized EI-7_PSP (Pass-Borderline-Fail) as the independent variable, and the RMT accuracy score, completion time, and the EI-11 scores as dependent variables. All contrasts were significant with large effects (η²: .11–.34). As before, scores in the Pass range always differed significantly from scores in the Fail range, reflecting better performance, but scores in the Borderline range did not differ consistently from the other two classification ranges (see Table 6). Overall, these findings provide empirical support for eliminating participants with EI-11 and EI-7_PSP scores in the Borderline range when computing the classification accuracy of the D-KEFS Stroop to minimize error variance and establish criterion groups with neurocognitive profiles that are either clearly valid or invalid.
Mean D-KEFS Stroop age-corrected scaled scores (ACSS) were
in the average range on all four trials. However, they were signif-
icantly below the nominal mean of 10, with small to medium effect
sizes (Cohen's d: .25–.44). Skew and kurtosis were well within ±1.0 (see Table 7). However, visual inspection revealed bimodal
distributions with one peak in the impaired range and another in
the average-to-high average range.
Trial 1 ACSS ≤7 failed to clear the minimum threshold for specificity (≥.84; Larrabee, 2003) against the RMT and EI-11. The ≤6 cutoff produced good combinations of sensitivity (.43–.71) and specificity (.86–.94) against all three reference PVTs. Lowering the cutoff to ≤5 produced negligible changes in classification accuracy. The more conservative ≤4 cutoff resulted in predictable tradeoffs: improved specificity (.93–.99) at the expense of sensitivity (.29–.43).
Trial 2 ACSS ≤7 cleared the minimum threshold for specificity against the EI-11 and EI-7_PSP, but fell short against the RMT. The ≤6 cutoff produced good combinations of sensitivity (.45–.62) and specificity (.87–.91) against all three reference PVTs. Lowering the cutoff to ≤5 improved specificity across all reference PVTs (.92–.96) with minimal loss in sensitivity (.38–.57). The more conservative ≤4 cutoff produced excellent specificity (.96–.99) with relatively well-preserved sensitivity (.33–.48).
Trial 3 ACSS ≤7 cleared the minimum threshold for specificity against the EI-11 and EI-7_PSP, but once again fell short of expectations against the RMT. Lowering the cutoff to ≤6 resulted in the predictable tradeoffs, but still failed to reach minimum specificity against the RMT. Lowering the cutoff to ≤5 improved specificity across all reference PVTs (.87–.99) with minimal loss in sensitivity (.26–.62). The more conservative ≤4 cutoff produced excellent specificity (.94–1.00) with acceptable sensitivity (.26–.52).
Trial 4 ACSS ≤7 failed to clear the minimum threshold for specificity against the RMT and EI-11. The ≤6 cutoff produced acceptable combinations of sensitivity (.29–.48) and specificity (.84–.91) against all three reference PVTs. Lowering the cutoff to ≤5 produced negligible changes in classification accuracy. The more conservative ≤4 cutoff resulted in predictable tradeoffs: improved specificity (.92–.96) at the expense of sensitivity (.21–.33).
To evaluate whether the pattern of performance across the trials can reveal invalid responding, two additional derivative validity indices were examined: the Trials 4/3 raw score ratio, and the (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio. The former index is a measure of the absolute difficulty gradient, as examinees are expected to take longer to finish Trial 4, given its added cognitive load of switching between two sets of rules. Indeed, on average, patients produced higher completion times on Trial 4 relative to Trial 3 (Trials 4/3 raw score ratio: M = 1.17, SD = 0.32, range: 0.65–2.59). The distribution was bimodal, with the bulk of the sample forming a bell-shaped distribution around the mean and a small group of positive outliers. The latter index, a ratio of aggregated easy (Trials 1 and 2) versus hard (Trials 3 and 4) trials, is a measure of the relative difficulty gradient. Norm-referencing (i.e., age-correction) is expected to equalize the increase in task demands from the first two to the last two trials. As expected, the overall (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio was close to 1.00: M = 1.05, SD = 0.44, range: 0.33–3.50. Again, the distribution was bimodal, with a bell-shaped majority around the mean and a small group of positive outliers.
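A minimal computational sketch of the two derivative indices follows; the function and dictionary keys are illustrative, with raw scores taken as completion times in seconds and ACSS as the age-corrected scaled scores.

```python
def stroop_derivative_indices(raw_time, acss):
    """Compute the two pattern-based validity indices described above.

    raw_time: dict with completion times (seconds) for Trials 3 and 4.
    acss: dict with age-corrected scaled scores for Trials 1-4.
    """
    # Absolute difficulty gradient: Trial 4 is expected to take longer than
    # Trial 3, so ratios well below 1 reflect an inverted gradient.
    trials_4_3_raw_ratio = raw_time[4] / raw_time[3]
    # Relative difficulty gradient: age-corrected scores should roughly
    # equalize the easy (1 + 2) and hard (3 + 4) trials (ratio near 1.00).
    easy_hard_acss_ratio = (acss[1] + acss[2]) / (acss[3] + acss[4])
    return trials_4_3_raw_ratio, easy_hard_acss_ratio


# Example: an examinee who finishes Trial 4 faster than Trial 3
print(stroop_derivative_indices({3: 80.0, 4: 62.0}, {1: 5, 2: 6, 3: 9, 4: 10}))
# -> (0.775, 0.578...)
```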
A Trials 4/3 raw score ratio ≤0.90 cleared the minimum threshold for specificity against all reference PVTs, but sensitivity was low (.14–.24). Lowering the cutoff to ≤0.85 resulted in predictable tradeoffs, with a notable increase in specificity (.95–.97), but further loss in sensitivity (.10–.14). Lowering the cutoff to ≤0.80 produced negligible changes in classification accuracy. The more conservative ≤0.75 cutoff produced excellent specificity (.98–1.00), but very low sensitivity (.07–.14).
A (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio ≤0.80 failed to achieve minimum specificity against any of the reference PVTs. Lowering the cutoff to ≤0.75 cleared the lower threshold for specificity against all reference PVTs (.88–.89), with acceptable sensitivity (.26–.29). Lowering the cutoff to ≤0.70 produced predictable tradeoffs: improved specificity (.92–.94) at the expense of sensitivity (.21–.24). The more conservative ≤0.65 cutoff produced excellent specificity (.98–1.00), but low sensitivity (.12–.24).
Finally, the effect of cumulative failures on independent D-KEFS Stroop validity indicators was examined (see Table 8). Failing at least two of the six newly introduced embedded PVTs produced good combinations of sensitivity (.61–.81) and specificity (.86–.87) against the EI-11 and EI-7_PSP, but fell short of the minimum specificity standard against the RMT. Failing at least three indicators cleared the specificity threshold against all three reference PVTs (.84–.93), at the expense of sensitivity (.36–.57). Failing at least four indicators produced consistently high specificity (.92–.99), with further loss in sensitivity (.29–.52). Failing five indicators (the highest value observed) was associated with perfect specificity, but low sensitivity (.10–.14).
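The cumulative-failure tally can be sketched as follows, using the cutoffs reported in the note to Table 8 (Trials 1-4 ACSS ≤6, Trials 4/3 raw score ratio ≤.90, and (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio ≤.75); the code is illustrative rather than the authors' implementation.

```python
def dkefs_stroop_cumulative_failures(acss, raw_time):
    """Count failures across the six embedded validity indicators,
    using the cutoffs listed in the Table 8 note."""
    failures = 0
    # Single-trial indicators: ACSS of 6 or lower on any of the four trials
    failures += sum(1 for trial in (1, 2, 3, 4) if acss[trial] <= 6)
    # Derivative indicators: violations of the difficulty gradient
    if raw_time[4] / raw_time[3] <= 0.90:
        failures += 1
    if (acss[1] + acss[2]) / (acss[3] + acss[4]) <= 0.75:
        failures += 1
    return failures  # compare against multivariate cutoffs such as >= 2 or >= 3


# Example: two low trial scores plus an inverted raw-time gradient -> 3 failures
print(dkefs_stroop_cumulative_failures({1: 6, 2: 8, 3: 5, 4: 9}, {3: 90.0, 4: 75.0}))
```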
Given the high base rate of psychiatric disorders in the sample,
we examined the relationship between self-reported emotional
distress and PVT failure. There was no difference in BDI-II scores
between patients who passed and those who failed the three
reference PVTs and five of the newly developed validity cutoffs
embedded in the D-KEFS Stroop. Trial 4 ACSS ≤6 was an isolated exception; failing this cutoff was associated with lower levels of depression (d = .42). Similarly, no difference was found in BAI scores between patients who passed and those who failed the three reference PVTs and four of the newly developed validity cutoffs embedded in the D-KEFS Stroop. The exceptions were Trial 2 ACSS ≤6 and the (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio ≤0.75. In both cases, failing the cutoff was associated with increased levels of anxiety (d: .41–.55).
To further explore the potential contribution of psychiatric symptoms to PVT failure, we performed a series of t tests using Pass/Fail status on the PVTs as independent variables, and the PAI clinical scales as dependent variables (see Table 9). All patients passed the validity cutoff on the Negative Impression Management scale. The Somatic Concerns scale was the only scale with significant contrasts. Effect sizes ranged from medium (d = .54) to large (d = .77). No difference emerged against the EI-7_PSP and the derivative D-KEFS Stroop validity indices. Within the Somatic Concerns scale, effect sizes were generally larger on the Conversion subscale (d: .50–1.06), but again, contrasts involving the
Table 3
The Components of the EI-7_PSP With Base Rates of Failure Corresponding to Each Cutoff

EI-7_PSP component | Value = 0 | Value = 1 | Value = 2 | Value = 3
FTT number of failures | 0 | 1 | 2 |
Base rate | 96.2 | .8 | 3.0 |
FAS T-scores | >33 | 32–33 | 28–31 | ≤27
Base rate | 87.9 | 5.3 | 2.3 | 4.5
Animals T-scores | >33 | 25–33 | 21–24 | ≤20
Base rate | 83.3 | 8.3 | 3.8 | 4.5
TMT A + B raw scores | <137 | 137–221 | 222–255 | ≥256
Base rate | 84.1 | 10.6 | 2.3 | 3.0
CPT-II number of failures | 0 | 1 | 2 | ≥3
Base rate | 66.7 | 15.2 | 5.3 | 12.9
WAIS-IV CD (ACSS) | >5 | 5 | 4 | ≤3
Base rate | 90.2 | 1.5 | 5.3 | 3.0
WAIS-IV SS (ACSS) | >5 | 5 | 4 | ≤3
Base rate | 85.6 | 6.8 | 4.5 | 3.0

Note. EI-7_PSP = "Erdodi Index Seven" based on processing speed measures; FTT failures = Finger Tapping Test, number of scores at ≤35 (men)/≤28 (women) dominant hand and ≤66 (men)/≤58 (women) combined mean raw scores (Arnold et al., 2005; Axelrod, Meyers, & Davis, 2014); FAS = letter fluency T-score (Curtis et al., 2008; Sugarman & Axelrod, 2015); Animals = category fluency T-score (Sugarman & Axelrod, 2015); CPT-II failures = Conners' Continuous Performance Test, 2nd Edition, number of T-scores ≥70 on Omissions, Hit Reaction Time Standard Error, Variability, and Perseverations (Erdodi, Pelletier, & Roth, 2016; Erdodi, Roth, et al., 2014; Lange et al., 2013; Ord, Boettcher, Greve, & Bianchini, 2010); WAIS-IV CD (ACSS) = age-corrected scaled score on the Coding subtest of the Wechsler Adult Intelligence Scale, 4th Edition (Erdodi, Abeare, et al., 2017; Etherton et al., 2006; N. Kim et al., 2010; Trueblood, 1994); WAIS-IV SS (ACSS) = age-corrected scaled score on the Symbol Search subtest of the Wechsler Adult Intelligence Scale, 4th Edition (Erdodi, Abeare, et al., 2017; Etherton et al., 2006; Trueblood, 1994).
Table 4
Frequency Distribution of the EI-7_PSP With Classification Ranges

EI-7_PSP | f | % | Cumulative % | Classification
0 | 61 | 46.2 | 46.2 | Pass
1 | 21 | 15.9 | 62.1 | Pass
2 | 12 | 9.1 | 71.2 | Borderline
3 | 17 | 12.9 | 84.1 | Borderline
4 | 5 | 3.8 | 87.9 | Fail
5 | 2 | 1.5 | 89.4 | Fail
6 | 4 | 3.0 | 92.4 | Fail
7 | 3 | 2.3 | 94.7 | Fail
8 | 0 | .0 | 94.7 | Fail
9 | 1 | .8 | 95.5 | Fail
derivative D-KEFS Stroop validity indices failed to reach significance. The Somatization subscale was only associated with failing the RMT (d = .54). Significant differences reemerged on the Health Concerns subscale, with effect sizes ranging from medium (d = .52) to large (d = .84). Contrasts involving the EI-7_PSP, D-KEFS Stroop Trial 2, and the derivative validity indices failed to reach significance.
Discussion
This study explored the potential of the D-KEFS Stroop to function as a PVT. A scaled score more than 1 SD below the normative mean on any of the four trials was a reliable indicator of psychometrically defined
invalid performance. Violating the difficulty gradient (i.e., scoring
better on difficult tasks than on easier tasks) was also reliably asso-
ciated with failure on reference PVTs. All six EVIs produced bimodal
distributions with a distinct cluster of outliers in the range of non-
credible impairment, indicating that valid and invalid performance
may start to diverge at the level of descriptive statistics. Overall,
results suggest that in addition to measuring basic processing speed
and executive function, the D-KEFS Stroop is also an effective PVT.
This finding is consistent with earlier investigations using different
versions of the Stroop task (Arentsen et al., 2013; Egeland & Lang-
fjaeran, 2007; Guise et al., 2014; Osimani et al., 1997).
Labeling a score that is only 1 SD below the mean as invalid
may appear an extreme measure at first, as it implies that as many
as 16% of the original normative sample demonstrated invalid
performance. However, the practice is not without precedent.
Shura, Miskey, Rowland, Yoash-Gantz, and Denning (2016) dem-
onstrated that an ACSS ≤7 (Low Average) on Letter-Number
Sequencing was a reliable indicator of noncredible responding.
Moreover, Baker, Connery, Kirk, and Kirkwood (2014) found a
recognition discriminability z-score of 0.5 (Average) to be the
marker of invalid performance on the California Verbal Learning
Test—Children’s Version.
This phenomenon of the noncredible range of performance
expanding into the traditional range of normal cognitive function-
ing has recently been labeled the "invalid-before-impaired" paradox. Erdodi and Lichtenstein (2017) argued that this
apparent psychometric anomaly has multiple possible explana-
tions, one of which is that few (if any) normative samples are
screened for invalid performance. Therefore, noncredible respond-
ing contaminates the scaling process used to establish ACSSs. In
Table 5
Results of One-Way ANOVAs on RMT and EI-7_PSP Scores Across EI-11 Classification Ranges

Outcome measure | EI-11 = 0–1 (PASS, n = 76) | EI-11 = 2 (BOR, n = 18) | EI-11 ≥ 3 (FAIL, n = 38) | F | p | η² | Significant post hocs
RMT Accuracy | M = 47.4 (SD = 3.4) | M = 44.5 (SD = 6.6) | M = 42.2 (SD = 7.7) | 12.1 | <.001 | .16 | PASS vs. BOR; PASS vs. FAIL
RMT Time | M = 130.0 (SD = 63.4) | M = 147.3 (SD = 52.9) | M = 191.3 (SD = 74.4) | 11.1 | <.001 | .15 | PASS vs. FAIL; BOR vs. FAIL
EI-7_PSP | M = 0.8 (SD = 1.6) | M = 1.8 (SD = 1.4) | M = 4.2 (SD = 4.5) | 19.1 | <.001 | .23 | PASS vs. FAIL; BOR vs. FAIL

Note. Post hoc pairwise contrasts were computed using the least significant difference method; EI-11 = Effort Index Eleven; BOR = Borderline; RMT Accuracy = Recognition Memory Test–Words (accuracy score); RMT Time = Recognition Memory Test–Words (completion time in seconds); EI-7_PSP = "Erdodi Index Seven" based on processing speed measures; ANOVA = analysis of variance.
Table 6
Results of One-Way ANOVAs on RMT and EI-11 Scores Across EI-7_PSP Classification Ranges

Outcome measure | EI-7_PSP = 0–1 (PASS, n = 82) | EI-7_PSP = 2–3 (BOR, n = 29) | EI-7_PSP ≥ 4 (FAIL, n = 21) | F | p | η² | Significant post hocs
RMT Accuracy | M = 47.0 (SD = 4.1) | M = 43.3 (SD = 7.2) | M = 42.7 (SD = 7.7) | 8.08 | <.001 | .11 | PASS vs. BOR; PASS vs. FAIL
RMT Time | M = 123.2 (SD = 53.8) | M = 189.9 (SD = 70.4) | M = 207.9 (SD = 75.3) | 21.5 | <.001 | .25 | PASS vs. BOR; PASS vs. FAIL
EI-11 | M = 1.0 (SD = 1.3) | M = 2.2 (SD = 1.7) | M = 4.2 (SD = 2.2) | 33.6 | <.001 | .34 | PASS vs. FAIL; PASS vs. BOR; BOR vs. FAIL

Note. Post hoc pairwise contrasts were computed using the least significant difference method; EI-7_PSP = "Erdodi Index Seven" based on processing speed measures; BOR = Borderline; RMT Accuracy = Recognition Memory Test–Words (accuracy score); RMT Time = Recognition Memory Test–Words (completion time in seconds); EI-11 = Effort Index Eleven; ANOVA = analysis of variance.
turn, later research discovers that scores commonly interpreted as
within normal limits are in fact indicative of invalid performance.
They concluded that EVI cutoffs that reach into the range of
functioning traditionally considered intact provide valuable infor-
mation about the credibility of the response set and therefore,
should not be automatically discounted.
Within the D-KEFS Stroop, derivative validity indicators be-
haved differently both relative to reference PVTs and to single-
trial validity cutoffs. First, the derivative validity indicators had
consistently lower BR_Fail, suggesting that pattern violations are
less common manifestations of noncredible responding than ab-
normally slow completion time. This finding is congruent with
earlier reports that an inverted or absent Stroop effect does not
occur in credible examinees; therefore, it is highly specific to
invalid performance (Osimani et al., 1997). As a direct conse-
quence of this, derivative validity indicators were generally less
sensitive, which may also reflect the inconsistency in the literature
regarding the inverted Stroop effect as an index of performance
validity. While some studies found that noncredible performers
perform better on more difficult trials, this pattern of performance
failed to demonstrate adequate classification accuracy (Arentsen et
al., 2013; Egeland & Langfjaeran, 2007).
Despite the variability in sample characteristics, methodology,
version of Stroop task, reference PVTs, and BR_Fail, our findings
Table 7
D-KEFS Stroop Age-Corrected Scaled Scores Across the Four Trials for the Entire Sample (N = 132)

Statistic | Trial 1 (Color naming) | Trial 2 (Word reading) | Trial 3 (Inhibition) | Trial 4 (Inhibition/switching)
M | 8.6 | 9.2 | 9.0 | 9.1
SD | 3.4 | 3.5 | 3.8 | 3.5
Median | 10 | 10 | 9.5 | 10
Skew | .57 | .66 | .53 | .63
Kurtosis | .33 | .33 | .27 | .12
Range | 1–15 | 1–15 | 1–15 | 1–15

Note. D-KEFS = Delis-Kaplan Executive Function System.
Table 8
Classification Accuracy of Validity Indicators Embedded in the D-KEFS Stroop Task Against Reference PVTs

Reference PVTs: RMT (n = 132; BR_Fail = 31.8), EI-11 (n = 114; BR_Fail = 33.3), EI-7_PSP (n = 103; BR_Fail = 20.4).

D-KEFS Stroop indicator | Cutoff | BR_Fail | RMT SENS | RMT SPEC | EI-11 SENS | EI-11 SPEC | EI-7_PSP SENS | EI-7_PSP SPEC
Trial 1 | ≤7 | 34.1 | .55 | .76 | .61 | .82 | .76 | .88
 | ≤6 | 23.5 | .43 | .86 | .53 | .92 | .71 | .94
 | ≤5 | 21.2 | .43 | .89 | .50 | .93 | .62 | .94
 | ≤4 | 13.6 | .29 | .93 | .37 | .96 | .43 | .99
Trial 2 | ≤7 | 26.5 | .48 | .83 | .52 | .87 | .62 | .88
 | ≤6 | 23.5 | .45 | .87 | .45 | .88 | .62 | .91
 | ≤5 | 17.4 | .38 | .92 | .44 | .95 | .57 | .96
 | ≤4 | 13.6 | .33 | .96 | .39 | .98 | .48 | .99
Trial 3 | ≤7 | 31.1 | .45 | .76 | .55 | .88 | .71 | .85
 | ≤6 | 22.0 | .31 | .82 | .42 | .91 | .62 | .93
 | ≤5 | 17.4 | .26 | .87 | .37 | .93 | .62 | .99
 | ≤4 | 12.1 | .26 | .94 | .32 | .97 | .52 | 1.00
Trial 4 | ≤7 | 26.5 | .33 | .77 | .47 | .83 | .67 | .88
 | ≤6 | 19.7 | .29 | .84 | .34 | .87 | .48 | .91
 | ≤5 | 16.7 | .24 | .88 | .29 | .90 | .48 | .95
 | ≤4 | 12.1 | .21 | .92 | .21 | .92 | .33 | .96
Trials 4/3 raw score ratio | ≤.90 | 14.3 | .14 | .86 | .21 | .90 | .24 | .85
 | ≤.85 | 6.8 | .10 | .96 | .13 | .97 | .14 | .95
 | ≤.80 | 5.3 | .07 | .97 | .11 | .99 | .14 | .98
 | ≤.75 | 3.8 | .07 | .98 | .11 | .99 | .14 | .99
Trials (1 + 2)/(3 + 4) ACSS ratio | ≤.80 | 22.3 | .29 | .80 | .32 | .82 | .29 | .81
 | ≤.75 | 16.7 | .26 | .88 | .27 | .88 | .29 | .89
 | ≤.70 | 11.4 | .21 | .93 | .18 | .92 | .24 | .94
 | ≤.65 | 7.6 | .14 | .96 | .16 | .96 | .24 | .99
Cumulative failures | ≥2 | 31.1 | .50 | .78 | .61 | .86 | .81 | .87
 | ≥3 | 22.0 | .36 | .84 | .45 | .92 | .57 | .93
 | ≥4 | 14.4 | .29 | .92 | .32 | .93 | .52 | .99
 | 5 | 3.0 | .10 | 1.00 | .11 | 1.00 | .14 | 1.00

Note. D-KEFS = Delis-Kaplan Executive Function System; PVT = performance validity test; RMT = Warrington Recognition Memory Test–Words [Pass: accuracy score >43 and time-to-completion <192 s; Fail: accuracy score ≤43 or time-to-completion ≥192 s (Erdodi, Kirsch, et al., 2014; Erdodi, Tyson, et al., 2017; M. S. Kim et al., 2010)]; EI-11 = Effort Index Eleven [Pass ≤1; Fail ≥3 (Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2017)]; EI-7_PSP = "Erdodi Index Seven" based on processing speed measures [Pass ≤1; Fail ≥4 (Erdodi, Roth, et al., 2014; Erdodi, Tyson, et al., 2016, 2017)]; BR_Fail = base rate of failure (percentage); SENS = sensitivity; SPEC = specificity; Trial 1 = Color Naming age-corrected scaled score (ACSS); Trial 2 = Word Reading ACSS; Trial 3 = Inhibition ACSS (classic Stroop task); Trial 4 = Inhibition/Switching ACSS; Cumulative failures = number of validity indices failed [Trials 1–4 ACSS ≤6; Trials 4/3 raw score ratio ≤.90; Trials (1 + 2)/(3 + 4) ACSS ratio ≤.75].
are broadly consistent with the extant literature in that the inverted
Stroop effect is more common in noncredible examinees, but has
limited discriminant power. Arentsen and colleagues (2013) note
that the interference trial may be associated with poor specificity
because, while most PVTs are designed to appear difficult but are in fact easy, the opposite is true for the Stroop task: the interference trial is actually difficult for most individuals.
As such, the inverted Stroop effect as an EVI follows the reverse
logic compared with classic stand-alone PVTs, as performing well
on a difficult task is meant to expose noncredible responding rather
than performing poorly on an easy task. Although the inverted
Stroop effect seems less effective at separating valid and invalid
response sets, it appears to tap different manifestations of noncred-
ible performance. Therefore, it may provide valuable nonredun-
dant information for the multivariate model of validity assessment
(Boone, 2013; Larrabee, 2003).
An emergent finding of cross-validation analyses is the
modality-specificity of classification accuracy (Leighton et al.,
2014). Of the three reference PVTs, one was a traditional stand-
alone measure based on the forced choice recognition paradigm
(RMT), one was a composite measure based on the number of
independent PVT failures (EI-11), and one was a composite of
validity indicators specifically selected to match the target con-
structs in the Stroop task (EI-7_PSP). The four base trials of the D-KEFS Stroop and the (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio produced the best overall classification accuracy against the EI-7_PSP. The Trials 4/3 raw score ratio was a marginal exception, reiterating the divergence in the psychometric properties of the derivative validity indices. Nevertheless, all newly introduced D-KEFS Stroop-based validity cutoffs had the highest sensitivity against the EI-7_PSP. In several cases, sensitivity values were double those observed against the RMT.
Table 9
Results of Independent t Tests Comparing Scores on PAI Somatization Scales as a Function of Passing or Failing PVTs

PAI scale | Statistic | RMT | EI-11 | EI-7_PSP | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trials 4/3 | Trials (1 + 2)/(3 + 4)
 | n | 58 | 49 | 47 | 58 | 58 | 58 | 58 | 58 | 58
SOM | Pass M | 59.1 | 58.4 | 59.2 | 59.3 | 60.0 | 59.4 | 60.3 | 61.2 | 61.0
 | Pass SD | 11.1 | 11.8 | 11.8 | 11.7 | 11.8 | 11.7 | 12.0 | 13.0 | 12.5
 | Fail M | 67.4 | 65.9 | 65.2 | 70.0 | 69.9 | 69.2 | 70.3 | 61.9 | 63.3
 | Fail SD | 16.1 | 15.7 | 18.5 | 15.7 | 17.8 | 15.9 | 19.3 | 13.9 | 17.2
 | p | <.05 | <.05 | .11 | <.01 | <.05 | <.05 | <.05 | .44 | .33
 | d | .60 | .54 | — | .77 | .66 | .70 | .62 | — | —
SOM-CONV | Pass M | 56.4 | 52.7 | 56.1 | 55.6 | 56.4 | 55.7 | 57.3 | 58.2 | 57.2
 | Pass SD | 11.1 | 9.2 | 12.2 | 11.2 | 12.2 | 11.6 | 12.7 | 14.2 | 13.0
 | Fail M | 64.4 | 65.4 | 64.3 | 71.6 | 72.1 | 71.0 | 70.2 | 60.6 | 65.5
 | Fail SD | 19.5 | 17.8 | 19.0 | 18.1 | 18.8 | 17.9 | 22.1 | 14.6 | 20.1
 | p | <.05 | <.05 | <.05 | <.01 | <.01 | <.01 | <.05 | .31 | .07
 | d | .50 | .90 | .51 | 1.06 | .99 | 1.01 | .72 | — | —
SOM-SOM | Pass M | 57.6 | 59.9 | 58.5 | 58.7 | 58.9 | 58.5 | 59.2 | 59.9 | 60.0
 | Pass SD | 14.2 | 14.6 | 13.6 | 14.0 | 13.9 | 14.1 | 13.8 | 15.0 | 14.1
 | Fail M | 65.2 | 60.6 | 60.2 | 64.1 | 64.6 | 64.9 | 63.8 | 59.0 | 57.6
 | Fail SD | 13.9 | 15.2 | 19.0 | 16.3 | 17.7 | 15.4 | 20.6 | 12.6 | 17.4
 | p | <.05 | .43 | .37 | .14 | .15 | .10 | .23 | .44 | .33
 | d | .54 | — | — | — | — | — | — | — | —
SOM-H-CON | Pass M | 59.2 | 58.9 | 59.1 | 59.7 | 60.3 | 59.9 | 60.1 | 61.0 | 60.9
 | Pass SD | 9.6 | 10.8 | 10.1 | 10.4 | 10.3 | 10.3 | 10.7 | 10.8 | 10.8
 | Fail M | 65.8 | 65.4 | 65.0 | 66.9 | 65.6 | 66.1 | 69.3 | 61.3 | 61.9
 | Fail SD | 13.3 | 11.9 | 14.0 | 12.4 | 15.2 | 13.2 | 11.1 | 12.6 | 13.0
 | p | <.05 | <.05 | .07 | <.05 | .11 | <.05 | <.05 | .47 | .41
 | d | .57 | .57 | — | .63 | — | .52 | .84 | — | —

Note. D-KEFS = Delis-Kaplan Executive Function System; PVT = performance validity test; PAI = Personality Assessment Inventory; SOM = Somatic Concerns scale; SOM-CONV = Conversion subscale; SOM-SOM = Somatization subscale; SOM-H-CON = Health Concerns subscale; RMT = Warrington Recognition Memory Test–Words [Pass: accuracy score >43 and time-to-completion <192 s; Fail: accuracy score ≤43 or time-to-completion ≥192 s (Erdodi, Kirsch, et al., 2014; Erdodi, Tyson, et al., 2017; M. S. Kim et al., 2010)]; EI-11 = Effort Index Eleven [Pass ≤1; Fail ≥3 (Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2017)]; EI-7_PSP = "Erdodi Index Seven" based on processing speed measures [Pass ≤1; Fail ≥4 (Erdodi, Kirsch, et al., 2014; Erdodi, Tyson, et al., 2017; Erdodi, Abeare, et al., 2017)]; Trial 1 = Color Naming age-corrected scaled score (cutoff for failure ≤6); Trial 2 = Word Reading ACSS (cutoff ≤6); Trial 3 = Inhibition ACSS (classic Stroop task; cutoff ≤6); Trial 4 = Inhibition/Switching ACSS (cutoff ≤6); Trials 4/3 = ratio of Trial 4 to Trial 3 raw scores (cutoff ≤.90); Trials (1 + 2)/(3 + 4) = ratio of the sum of Trials 1 and 2 ACSS to the sum of Trials 3 and 4 ACSS (cutoff ≤.75).
These findings resonate with earlier studies (Erdodi, Abeare, et
al., 2017; Erdodi, Tyson, et al., 2017; Lichtenstein et al., 2017),
and serve as a reminder that the choice of criterion measure can
influence the perceived utility of the test being evaluated. In
addition, they illustrate the importance of methodological pluralism in cross-validating PVTs (Boone, 2013; Larrabee, 2014) at the group level, and in determining the veracity of an individual response set (Larrabee, 2003, 2008; Vallabhajosula & van Gorp, 2001), as it can protect against instrumentation artifacts. Knowing
that a new cutoff performs well against several different reference
PVTs increases confidence in the reliability of its signal detection
performance (Erdodi & Roth, 2017).
Combining the newly developed EVIs within the D-KEFS
Stroop improved overall classification accuracy. Cutoffs based on
cumulative failures produced superior signal detection profiles
relative to individual EVIs at comparable BR_Fail, consistent with
previous research (Larrabee, 2003, 2008). Even though the internal
logic behind the practice of aggregating multiple validity indica-
tors prioritizes sensitivity over specificity (Proto et al., 2014), at
the appropriate cutoffs, multivariate models actually reduce false
positive rates (Davis & Millis, 2014; Larrabee, 2014).
Passing or failing the newly developed validity cutoffs within
the D-KEFS Stroop was largely unrelated to depression and anx-
iety, consistent with previous reports investigating the relationship
between depression and PVT failure (Considine et al., 2011; Rees,
Tombaugh, & Boulay, 2001). However, patients who failed the
reference PVTs and the newly introduced validity cutoffs in Trials
1– 4 of the D-KEFS Stroop reported higher levels of somatization
on the PAI, even though no systematic differences were observed
on any of the other clinical scales. This finding is consistent with
previous reports on the relationship between the somatization scale
of the PAI and PVT failures (Whiteside et al., 2010).
In this study, we introduced a range of validity cutoffs for each
of the four base trials of the D-KEFS Stroop, as well as two
derivative validity indices, recognizing the need for flexible,
population-specific cutoff scores (Bigler, 2015). To our knowl-
edge, this is the first attempt to develop EVIs within the D-KEFS
version of the Stroop task. In addition, we examined the relation-
ship between PVT failures and self-reported psychiatric symp-
toms. The signal detection profiles of the new validity indicators
across the engineered differences among the reference PVTs pro-
vided an opportunity to reflect on the instrumentation artifacts as
potential confounds in the cross-validation methodology used to
calibrate new validity indices.
The results of the study should be interpreted in the context of
its limitations. The sample was geographically restricted and un-
usually high functioning for a clinical setting. However, the overall
intellectual functioning in our sample was comparable with previ-
ous research involving patients with neurological disorders from
the Northeastern United States (Blonder, Gur, Gur, Saykin, &
Hurtig, 1989; Erdodi, Pelletier, & Roth, 2016; Saykin et al., 1995).
In addition, the sample was diagnostically heterogeneous. There-
fore, it is unclear if the newly introduced cutoffs will perform
similarly across patients with different neuropsychiatric condi-
tions. Until replicated in different clinical populations, these cut-
offs should only be applied to patients with clinical characteristics
that are similar to the present sample, as they may be associated
with unacceptably high false positive error rates in examinees with
severe neurological conditions.
Further, as indeterminate cases were excluded from the analyses
to maximize the diagnostic purity of the criterion groups, this
practice may have inflated classification accuracy estimates. More-
over, the time or sequence of administration was not available for
the D-KEFS Stroop, even though these factors have been raised as
potential confounds in the clinical interpretation of cognitive tests
in general (Erdodi & Lajiness-O’Neill, 2014), and of PVT failures
specifically (Bigler, 2015). Finally, in the absence of data on
litigation status, the criterion groups (Valid/Invalid) were psycho-
metrically defined. Given that external incentive to appear im-
paired has been previously suggested as a relevant diagnostic
criterion for noncredible neurocognitive performance (Slick, Sher-
man, & Iverson, 1999), the newly introduced cutoffs would benefit
from cross-validation using known-group designs that incorporate
incentive status. As always, future research using different sam-
ples, diagnostic categories, and reference PVTs is needed to
establish the generalizability of these findings.
References
Arentsen, T. J., Boone, K. B., Lo, T. T., Goldberg, H. E., Cottingham,
M. E., Victor, T. L.,...Zeller, M. A. (2013). Effectiveness of the
Comalli Stroop Test as a measure of negative response bias. The Clinical
Neuropsychologist, 27, 1060 –1076. http://dx.doi.org/10.1080/13854046
.2013.803603
Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., & McPher-
son, S. (2005). Sensitivity and specificity of finger tapping test scores for
the detection of suspect effort. The Clinical Neuropsychologist, 19,
105–120. http://dx.doi.org/10.1080/13854040490888567
Axelrod, B. N., Fichtenberg, N. L., Millis, S. R., & Wertheimer, J. C.
(2006). Detecting incomplete effort with Digit Span from the Wechsler
Adult Intelligence Scale-Third Edition. The Clinical Neuropsychologist,
20, 513–523. http://dx.doi.org/10.1080/13854040590967117
Axelrod, B. N., Meyers, J. E., & Davis, J. J. (2014). Finger Tapping Test
performance as a measure of performance validity. The Clinical Neuro-
psychologist, 28, 876 – 888. http://dx.doi.org/10.1080/13854046.2014
.907583
Baker, D. A., Connery, A. K., Kirk, J. W., & Kirkwood, M. W. (2014).
Embedded performance validity indicators within the California Verbal
Learning Test, Children’s Version. The Clinical Neuropsychologist, 28,
116 –127. http://dx.doi.org/10.1080/13854046.2013.858184
Bauer, L., Yantz, C. L., Ryan, L. M., Warden, D. L., & McCaffrey, R. J.
(2005). An examination of the California Verbal Learning Test II to
detect incomplete effort in a traumatic brain-injury sample. Applied
Neuropsychology, 12, 202–207. http://dx.doi.org/10.1207/s15324826
an1204_3
Bigler, E. D. (2012). Symptom validity testing, effort, and neuropsycho-
logical assessment. Journal of the International Neuropsychological
Society, 18, 632– 640. http://dx.doi.org/10.1017/S1355617712000252
Bigler, E. D. (2014). Effort, symptom validity testing, performance validity
testing and traumatic brain injury. Brain Injury, 28, 1623–1638. http://
dx.doi.org/10.3109/02699052.2014.947627
Bigler, E. D. (2015). Neuroimaging as a biomarker in symptom validity
and performance validity testing. Brain Imaging and Behavior, 9, 421–
444. http://dx.doi.org/10.1007/s11682-015-9409-1
Blaskewitz, N., Merten, T., & Brockhaus, R. (2009). Detection of subop-
timal effort with the Rey Complex Figure Test and recognition trial.
Applied Neuropsychology, 16, 54 – 61. http://dx.doi.org/10.1080/
09084280802644227
Blonder, L. X., Gur, R. E., Gur, R. C., Saykin, A. J., & Hurtig, H. I. (1989).
Neuropsychological functioning in hemiparkinsonism. Brain and Cog-
nition, 9, 244 –257. http://dx.doi.org/10.1016/0278-2626(89)90034-1
Boone, K. B. (2009). The need for continuous and comprehensive sam-
pling of effort/response bias during neuropsychological examinations.
The Clinical Neuropsychologist, 23, 729 –741. http://dx.doi.org/10.1080/
13854040802427803
Boone, K. B. (2013). Clinical Practice of Forensic Neuropsychology. New
York, NY: Guilford Press.
Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., & Razani, J. (2002).
The Rey 15-item recognition trial: A technique to enhance sensitivity of
the Rey 15-item memorization test. Journal of Clinical and Experimen-
tal Neuropsychology, 24, 561–573. http://dx.doi.org/10.1076/jcen.24.5
.561.1004
Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler, E.,
Cottingham, M. E.,...Zeller, M. A. (2010). Examination of various
WMS-III logical memory scores in the assessment of response bias. The
Clinical Neuropsychologist, 24, 344 –357. http://dx.doi.org/10.1080/
13854040903307268
Bush, S. S., Heilbronner, R. L., & Ruff, R. (2014). Psychological assess-
ment of symptom and performance validity, response bias, and malin-
gering: Official position of the Association of Psychological Advance-
ment in Psychological Injury and Law. Psychological Injury and Law, 7,
197–205. http://dx.doi.org/10.1007/s12207-014-9198-7
Chafetz, M. D., Williams, M. A., Ben-Porath, Y. S., Bianchini, K. J.,
Boone, K. B., Kirkwood, M. W.,...Ord, J. S. (2015). Official position
of the American Academy of Clinical Neuropsychology Social Security
Administration policy on validity testing: Guidance and recommenda-
tions for change. The Clinical Neuropsychologist, 29, 723–740. http://
dx.doi.org/10.1080/13854046.2015.1099738
Comalli, P. E., Jr., Wapner, S., & Werner, H. (1962). Interference effects
of Stroop color-word test in childhood, adulthood, and aging. The
Journal of Genetic Psychology, 100, 47–53. http://dx.doi.org/10.1080/
00221325.1962.10533572
Considine, C. M., Weisenbach, S. L., Walker, S. J., McFadden, E. M.,
Franti, L. M., Bieliauskas, L. A.,...Langenecker, S. A. (2011).
Auditory memory decrements, without dissimulation, among patients
with major depressive disorder. Archives of Clinical Neuropsychology,
26, 445– 453. http://dx.doi.org/10.1093/arclin/acr041
Curtis, K. L., Thompson, L. K., Greve, K. W., & Bianchini, K. J. (2008).
Verbal fluency indicators of malingering in traumatic brain injury:
Classification accuracy in known groups. The Clinical Neuropsycholo-
gist, 22, 930 –945. http://dx.doi.org/10.1080/13854040701563591
Davis, J. J., & Millis, S. R. (2014). Examination of performance validity
test failure in relation to number of tests administered. The Clinical
Neuropsychologist, 28, 199 –214. http://dx.doi.org/10.1080/13854046
.2014.884633
Delis, D. C., Kaplan, E., & Kramer, J. H. (2001). Delis-Kaplan executive
function system (D-KEFS). San Antonio, TX: Psychological Corpora-
tion.
Egeland, J., & Langfjaeran, T. (2007). Differentiating malingering from
genuine cognitive dysfunction using the Trail Making Test-ratio and
Stroop Interference scores. Applied Neuropsychology, 14, 113–119.
http://dx.doi.org/10.1080/09084280701319953
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski,
B., Zuccato, B. G., & Roth, R. M. (2017). Wechsler Adult Intelligence
Scale-Fourth Edition (WAIS-IV) processing speed scores as measures of
noncredible responding: The third generation of embedded performance
validity indicators. Psychological Assessment, 29, 148 –157. http://dx
.doi.org/10.1037/pas0000319
Erdodi, L. A., Kirsch, N. L., Lajiness-O’Neill, R., Vingilis, E., & Medoff,
B. (2014). Comparing the Recognition Memory Test and the Word
Choice Test in a mixed clinical sample: Are they equivalent? Psycho-
logical Injury and Law, 7, 255–263. http://dx.doi.org/10.1007/s12207-
014-9197-8
Erdodi, L. A., & Lajiness-O’Neill, R. (2014). Time-related changes in
Conners’ CPT-II scores: A replication study. Applied Neuropsychology:
Adult, 21, 43–50. http://dx.doi.org/10.1080/09084282.2012.724036
Erdodi, L. A., & Lichtenstein, J. D. (2017). Invalid before impaired: An
emerging paradox of embedded validity indicators. The Clinical Neuro-
psychologist. Advance online publication. http://dx.doi.org/10.1080/
13854046.2017.1323119
Erdodi, L. A., Pelletier, C. L., & Roth, R. M. (2016). Elevations on select
Conners’ CPT-II scales indicate noncredible responding in adults with
traumatic brain injury. Applied Neuropsychology: Adult, 22, 851– 858.
Erdodi, L. A., & Roth, R. M. (2017). Low scores on BDAE Complex Ideational
Material are associated with invalid performance in adults without
aphasia. Applied Neuropsychology: Adult, 24, 264 –274. http://dx.doi
.org/10.1080/23279095.2016.1154856
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O’Neill, R., & Medoff,
B. (2014). Aggregating validity indicators embedded in Conners’ CPT-II
outperforms individual cutoffs at separating valid from invalid perfor-
mance in adults with traumatic brain injury. Archives of Clinical Neu-
ropsychology, 29, 456 – 466. http://dx.doi.org/10.1093/arclin/acu026
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier,
C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE Complex Ideational
Material—A measure of receptive language or performance validity?
Psychological Injury and Law, 9, 112–120. http://dx.doi.org/10.1007/
s12207-016-9254-6
Erdodi, L. A., Tyson, B. T., Shahein, A. G., Lichtenstein, J. D., Abeare,
C. A., Pelletier, C. L.,...Roth, R. M. (2017). The power of timing:
Adding a time-to-completion cutoff to the Word Choice Test and Rec-
ognition Memory Test improves classification accuracy. Journal of
Clinical and Experimental Neuropsychology, 39, 369 –383. http://dx.doi
.org/10.1080/13803395.2016.1230181
Etherton, J. L., Bianchini, K. J., Heinly, M. T., & Greve, K. W. (2006).
Pain, malingering, and performance on the WAIS-III Processing Speed
Index. Journal of Clinical and Experimental Neuropsychology, 28,
1218 –1237. http://dx.doi.org/10.1080/13803390500346595
Golden, C., & Freshwater, S. (2002). A manual for the adult Stroop Color
and Word Test. Chicago, IL: Stoelting.
Green, P. (2013). Spoiled for choice: Making comparisons between forced-
choice effort tests. In K. B. Boone (Ed.), Clinical practice of forensic
neuropsychology. New York, NY: Guilford Press.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of
malingered amnesia measures with a large clinical sample. Psycholog-
ical Assessment, 6, 218 –224. http://dx.doi.org/10.1037/1040-3590.6.3
.218
Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on
psychometric indicators of negative response bias: A methodological
commentary with recommendations. Archives of Clinical Neuropsychol-
ogy, 19, 533–541. http://dx.doi.org/10.1016/j.acn.2003.08.002
Greve, K. W., Bianchini, K. J., Mathias, C. W., Houston, R. J., & Crouch,
J. A. (2002). Detecting malingered performance with the Wisconsin Card
Sorting Test: A preliminary investigation in traumatic brain injury. The
Clinical Neuropsychologist, 16, 179 –191. http://dx.doi.org/10.1076/clin
.16.2.179.13241
Greve, K. W., Ord, J. S., Bianchini, K. J., & Curtis, K. L. (2009).
Prevalence of malingering in patients with chronic pain referred for
psychologic evaluation in a medico-legal context. Archives of Physical
Medicine and Rehabilitation, 90, 1117–1126. http://dx.doi.org/10.1016/
j.apmr.2009.01.018
Grimes, D. A., & Schulz, K. F. (2005). Refining clinical diagnosis with
likelihood ratios. The Lancet, 365, 1500 –1505. http://dx.doi.org/10
.1016/S0140-6736(05)66422-7
Guise, B. J., Thompson, M. D., Greve, K. W., Bianchini, K. J., & West, L.
(2014). Assessment of performance validity in the Stroop Color and
Word Test in mild traumatic brain injury patients: A criterion-groups
validation design. Journal of Neuropsychology, 8, 20 –33. http://dx.doi
.org/10.1111/jnp.12002
Hayward, L., Hall, W., Hunt, M., & Zubrick, S. R. (1987). Can localised
brain impairment be simulated on neuropsychological test profiles?
Australian and New Zealand Journal of Psychiatry, 21, 87–93. http://
dx.doi.org/10.3109/00048678709160904
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised
comprehensive norms for an expanded Halstead-Reitan battery: Demo-
graphically adjusted neuropsychological norms for African American
and Caucasian adults. Lutz, FL: Psychological Assessment Resources.
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., Millis,
S. R., & Conference Participants. (2009). American Academy of
Clinical Neuropsychology Consensus Conference Statement on the neu-
ropsychological assessment of effort, response bias, and malingering.
The Clinical Neuropsychologist, 23, 1093–1129. http://dx.doi.org/10
.1080/13854040903155063
Heinly, M. T., Greve, K. W., Bianchini, K. J., Love, J. M., & Brennan, A.
(2005). WAIS digit span-based indicators of malingered neurocognitive
dysfunction: Classification accuracy in traumatic brain injury. Assess-
ment, 12, 429 – 444. http://dx.doi.org/10.1177/1073191105281099
Kim, M. S., Boone, K. B., Victor, T., Marion, S. D., Amano, S., Cotting-
ham, M. E.,...Zeller, M. A. (2010). The Warrington Recognition
Memory Test for words as a measure of response bias: Total score and
response time cutoffs developed on “real world” credible and noncred-
ible subjects. Archives of Clinical Neuropsychology, 25, 60 –70. http://
dx.doi.org/10.1093/arclin/acp088
Kim, N., Boone, K. B., Victor, T., Lu, P., Keatinge, C., & Mitchell, C.
(2010). Sensitivity and specificity of a digit symbol recognition trial in
the identification of response bias. Archives of Clinical Neuropsychol-
ogy, 25, 420 – 428. http://dx.doi.org/10.1093/arclin/acq040
Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S.,
Bhagwat, A., & French, L. M. (2013). Clinical utility of the Conners’
Continuous Performance Test-II to detect poor effort in U.S. military
personnel following traumatic brain injury. Psychological Assessment,
25, 339 –352. http://dx.doi.org/10.1037/a0030915
Lansbergen, M. M., Kenemans, J. L., & van Engeland, H. (2007). Stroop
interference and attention-deficit/hyperactivity disorder: A review and
meta-analysis. Neuropsychology, 21, 251–262. http://dx.doi.org/10
.1037/0894-4105.21.2.251
Larrabee, G. J. (2003). Detection of malingering using atypical perfor-
mance patterns on standard neuropsychological tests. The Clinical Neu-
ropsychologist, 17, 410 – 425. http://dx.doi.org/10.1076/clin.17.3.410
.18089
Larrabee, G. J. (2008). Aggregation across multiple indicators improves
the detection of malingering: Relationship to likelihood ratios. The
Clinical Neuropsychologist, 22, 666 – 679. http://dx.doi.org/10.1080/
13854040701494987
Larrabee, G. J. (2014). False-positive rates associated with the use of
multiple performance and symptom validity tests. Archives of Clinical
Neuropsychology, 29, 364 –373. http://dx.doi.org/10.1093/arclin/acu019
Larson, M. J., Kaufman, D. A., Schmalfuss, I. M., & Perlstein, W. M.
(2007). Performance monitoring, error processing, and evaluative con-
trol following severe TBI. Journal of the International Neuropsycholog-
ical Society, 13, 961–971. http://dx.doi.org/10.1017/S135561770
7071305
Leighton, A., Weinborn, M., & Maybery, M. (2014). Bridging the gap
between neurocognitive processing theory and performance validity
assessment among the cognitively impaired: A review and methodolog-
ical approach. Journal of the International Neuropsychological Society,
20, 873– 886. http://dx.doi.org/10.1017/S135561771400085X
Lezak, M. D. (1995). Neuropsychological assessment. New York, NY:
Oxford University Press.
Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2017). Introducing a
forced-choice recognition task to the California Verbal Learning Test—
Children’s Version. Child Neuropsychology, 23, 284 –299.
Lippa, S. M., & Davis, R. N. (2010). Inhibition/switching is not necessarily
harder than inhibition: An analysis of the D-KEFS color-word interfer-
ence test. Archives of Clinical Neuropsychology, 25, 146 –152. http://dx
.doi.org/10.1093/arclin/acq001
Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness
of the Rey-Osterrieth Complex Figure Test and the Meyers and Meyers
recognition trial in the detection of suspect effort. The Clinical Neuro-
psychologist, 17, 426 – 440. http://dx.doi.org/10.1076/clin.17.3.426
.18083
MacLeod, C. M., & MacDonald, P. A. (2000). Interdimensional interfer-
ence in the Stroop effect: Uncovering the cognitive and neural anatomy
of attention. Trends in Cognitive Sciences, 4, 383–391. http://dx.doi.org/
10.1016/S1364-6613(00)01530-8
Ord, J. S., Boettcher, A. C., Greve, K. W., & Bianchini, K. J. (2010).
Detection of malingering in mild traumatic brain injury with the Con-
ners’ Continuous Performance Test-II. Journal of Clinical and Experi-
mental Neuropsychology, 32, 380–387.
Osimani, A., Alon, A., Berger, A., & Abarbanel, J. M. (1997). Use of the
Stroop phenomenon as a diagnostic tool for malingering. Journal of
Neurology, Neurosurgery & Psychiatry, 62, 617– 621. http://dx.doi.org/
10.1136/jnnp.62.6.617
Pearson, N. C. S. (2009). Advanced clinical solutions for WAIS-IV and
WMS-IV: Administration and scoring manual. San Antonio, TX: The
Psychological Corporation.
Proto, D. A., Pastorek, N. J., Miller, B. I., Romesser, J. M., Sim, A. H., &
Linck, J. F. (2014). The dangers of failing one or more performance
validity tests in individuals claiming mild traumatic brain injury-related
postconcussive symptoms. Archives of Clinical Neuropsychology, 29,
614 – 624. http://dx.doi.org/10.1093/arclin/acu044
Reedy, S. D., Boone, K. B., Cottingham, M. E., Glaser, D. F., Lu, P. H.,
Victor, T. L.,...Wright, M. J. (2013). Cross validation of the Lu and
colleagues (2003) Rey-Osterrieth Complex Figure Test effort equation
in a large known-group sample. Archives of Clinical Neuropsychology,
28, 30 –37. http://dx.doi.org/10.1093/arclin/acs106
Rees, L. M., Tombaugh, T. N., & Boulay, L. (2001). Depression and the
test of memory malingering. Archives of Clinical Neuropsychology, 16,
501–506. http://dx.doi.org/10.1093/arclin/16.5.501
Saykin, A. J., Stafiniak, P., Robinson, L. J., Flannery, K. A., Gur, R. C.,
O’Connor, M. J., & Sperling, M. R. (1995). Language before and after
temporal lobectomy: Specificity of acute changes and relation to early
risk factors. Epilepsia, 36, 1071–1077. http://dx.doi.org/10.1111/j.1528-
1157.1995.tb00464.x
Schroeter, M. L., Ettrich, B., Schwier, C., Scheid, R., Guthke, T., & von
Cramon, D. Y. (2007). Diffuse axonal injury due to traumatic brain
injury alters inhibition of imitative response tendencies. Neuropsycho-
logia, 45, 3149 –3156. http://dx.doi.org/10.1016/j.neuropsychologia
.2007.07.004
Schutte, C., Axelrod, B. N., & Montoya, E. (2015). Making sure neuro-
psychological data are meaningful: Use of performance validity testing
in medicolegal and clinical contexts. Psychological Injury and Law, 8,
100 –105. http://dx.doi.org/10.1007/s12207-015-9225-3
Shura, R. D., Miskey, H. M., Rowland, J. A., Yoash-Gantz, R. E., &
Denning, J. H. (2016). Embedded performance validity measures with
postdeployment veterans: Cross-validation and efficiency with multiple
measures. Applied Neuropsychology: Adult, 23, 94 –104. http://dx.doi
.org/10.1080/23279095.2015.1014556
Slick, D. J., Sherman, E. M., & Iverson, G. L. (1999). Diagnostic criteria
for malingered neurocognitive dysfunction: Proposed standards for clin-
ical practice and research. The Clinical Neuropsychologist, 13, 545–561.
http://dx.doi.org/10.1076/1385-4046(199911)13:04;1-Y;FT545
Spencer, R. J., Axelrod, B. N., Drag, L. L., Waldron-Perrine, B., Pangili-
nan, P. H., & Bieliauskas, L. A. (2013). WAIS-IV reliable digit span is
no more accurate than age corrected scaled score as an indicator of
invalid performance in a veteran sample undergoing evaluation for
mTBI. The Clinical Neuropsychologist, 27, 1362–1372. http://dx.doi
.org/10.1080/13854046.2013.845248
Stroop, J. R. (1935). Studies of interference in serial verbal reactions.
Journal of Experimental Psychology, 18, 643– 662. http://dx.doi.org/10
.1037/h0054651
Sugarman, M. A., & Axelrod, B. N. (2015). Embedded measures of
performance validity using verbal fluency tests in a clinical sample.
Applied Neuropsychology: Adult, 22, 141–146. http://dx.doi.org/10.1080/
23279095.2013.873439
Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin Card Sorting Test
in the detection of malingering in student simulator and patient samples.
Journal of Clinical and Experimental Neuropsychology, 21, 701–708.
http://dx.doi.org/10.1076/jcen.21.5.701.868
Trueblood, W. (1994). Qualitative and quantitative characteristics of ma-
lingered and other invalid WAIS-R and clinical memory data. Journal of
Clinical and Experimental Neuropsychology, 16, 597– 607. http://dx.doi
.org/10.1080/01688639408402671
Vallabhajosula, B., & van Gorp, W. G. (2001). Post-Daubert admissibility
of scientific evidence on malingering of cognitive deficits. Journal of the
American Academy of Psychiatry and the Law, 29, 207–215.
Whiteside, D., Clinton, C., Diamonti, C., Stroemel, J., White, C., Zimber-
off, A., & Waters, D. (2010). Relationship between suboptimal cognitive
effort and the clinical scales of the Personality Assessment Inventory.
The Clinical Neuropsychologist, 24, 315–325. http://dx.doi.org/10.1080/
13854040903482822
Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee, G. J., &
Sweet, J. J. (2010). Effort indicators within the California Verbal Learning
Test-II (CVLT-II). The Clinical Neuropsychologist, 24, 153–168.
http://dx.doi.org/10.1080/13854040903107791
Received October 5, 2016
Revision received July 11, 2017
Accepted July 17, 2017