Applied Neuropsychology: Adult
ISSN: 2327-9095 (Print) 2327-9109 (Online) Journal homepage: http://www.tandfonline.com/loi/hapn21
Utility of critical items within the Recognition
Memory Test and Word Choice Test
Laszlo A. Erdodi, Bradley T. Tyson, Christopher A. Abeare, Brandon G.
Zuccato, Jaspreet K. Rai, Kristian R. Seke, Sanya Sagar & Robert M. Roth
To cite this article: Laszlo A. Erdodi, Bradley T. Tyson, Christopher A. Abeare, Brandon G.
Zuccato, Jaspreet K. Rai, Kristian R. Seke, Sanya Sagar & Robert M. Roth (2017): Utility of critical
items within the Recognition Memory Test and Word Choice Test, Applied Neuropsychology: Adult
To link to this article: http://dx.doi.org/10.1080/23279095.2017.1298600
Published online: 17 Mar 2017.
APPLIED NEUROPSYCHOLOGY: ADULT
http://dx.doi.org/10.1080/23279095.2017.1298600
Utility of critical items within the Recognition Memory Test and
Word Choice Test
Laszlo A. Erdodia,b, Bradley T. Tysonc,b, Christopher A. Abearea, Brandon G. Zuccatoa, Jaspreet K. Raia,
Kristian R. Sekea, Sanya Sagara and Robert M. Rothb
aDepartment of Psychology, University of Windsor, Windsor, Ontario, Canada; bDepartment of Psychiatry, Geisel School of Medicine at
Dartmouth, Lebanon, New Hampshire, USA; cWestern Washington Medical Group, Everett, Washington, USA
ABSTRACT
This study was designed to examine the clinical utility of critical items within the Recognition
Memory Test (RMT) and the Word Choice Test (WCT). Archival data were collected from a mixed
clinical sample of 202 patients clinically referred for neuropsychological testing (54.5% male; mean
age = 45.3 years; mean level of education = 13.9 years). The credibility of a given response set was
psychometrically defined using three separate composite measures, each of which was based on
multiple independent performance validity indicators. Critical items improved the classification
accuracy of both tests. They increased sensitivity by correctly identifying an additional 2–17% of
the invalid response sets that passed the traditional cutoffs based on total score. They also
increased specificity by providing additional evidence of noncredible performance in response sets
that failed the total score cutoff. The combination of failing the traditional cutoff, but passing
critical items was associated with increased risk of misclassifying the response set as invalid. Critical
item analysis enhances the diagnostic power of both the RMT and WCT. Given that critical items
require no additional test material or administration time, but help reduce both false positive and
false negative errors, they represent a versatile, valuable, and time- and cost-effective supplement
to performance validity assessment.
KEYWORDS: Critical item analysis; performance validity testing; Recognition Memory Test; Word Choice Test
Introduction
The current climate of health care is characterized by
increasing emphasis on time- and cost-effective service
delivery. As a result, neuropsychologists are under
growing pressure to administer shorter test batteries.
In order to maximize the quantity and quality of
information gleaned from these brief assessments, the
strategic selection of assessment tools has never been
more important. This shift toward a more resource-
conscious model of assessment is reflected in the
development of abbreviated batteries (e.g., Repeatable
Battery for the Assessment of Neuropsychological
Status; Randolph, 1998) and shorter versions of existing
tests (e.g., Boston Naming Test-15: Morris et al., 1989;
California Verbal Learning Test Second Edition
[CVLT-II] Short Form: Delis, Kramer, Kaplan, & Ober,
2000).
In addition to conducting an adequate assessment of
cognitive functioning, however, neuropsychologists
must also assess performance validity. Indeed, the
clinical utility of neuropsychological testing depends
on the examinee’s ability and willingness to demon-
strate their true ability level (Bigler, 2015), and there
is an emerging consensus in the field that an objective
evaluation of performance validity must be an integral
part of the assessment process (Bush, Heilbronner, &
Ruff, 2014; Chafetz et al., 2015). The administration of
multiple, non-redundant performance validity tests
(PVTs) distributed throughout the assessment has been
identified as the best approach for differentiating
credible from non-credible response sets (Boone, 2009;
Larrabee, 2012).
Given that PVTs provide little information about
cognitive functioning, it is becoming increasingly
important for neuropsychologists to glean information
about performance validity without additional test
material or increased administration and scoring time.
Over the past decades, researchers have explored cre-
ative ways of improving the signal detection properties
of existing PVTs, including the development of new
indicators within existing neuropsychological tests
(Arnold et al., 2005; Erdodi, Tyson, Abeare, et al.,
2016; Greiffenstein, Baker, & Gola, 1994).
Boone, Salazar, Lu, Warner-Chacon, and Razani
(2002) appended a recognition trial to the Rey 15-item
test that adds only about 30 seconds in administration
time, but significantly improves the instrument’s
sensitivity while maintaining high specificity. Similarly,
the first trial of the Test of Memory Malingering
(TOMM; Tombaugh, 1996), although initially conceived
as an “inactive” learning trial, has been shown to
effectively discriminate valid from invalid cognitive test
performance. Specifically, Trial 1 has been found to
have adequate sensitivity and specificity against the
standard, full administration of the TOMM (Bauer,
O’Bryant, Lynch, McCaffrey, & Fisher, 2007; Fazio,
Denning, & Denney, 2017; Hilsabeck, Gordon,
Hietpas-Wilson, & Zartman, 2011; Horner, Bedwell, &
Duong, 2006; Wisdom, Brown, Chen, & Collins,
2012), and against other stand-alone PVTs used in isolation
(Denning, 2012) and in combination (Jones, 2013;
Kulas, Axelrod, & Rinaldi, 2014). Based on this
evidence, some researchers suggested that Trial 1 of
the TOMM can function as a stand-alone PVT (Bauer
et al., 2007; Hilsabeck et al., 2011; Horner et al., 2006;
O’Bryant, Engel, Kleiner, Vasterling, & Black, 2007).
Another example of this “after-market enhancement”
of an existing PVT was the introduction of a time-cutoff
to the Recognition Memory Test (RMT; Warrington,
1984), which effectively differentiated valid and invalid
responders independent of the traditional accuracy
score. It also boosted the RMT’s overall sensitivity when
combined with the accuracy score while maintaining
high specificity (M. S. Kim, Boone, Victor, Marion,
et al., 2010). Similarly, it was recently shown that adding
a time-cutoff to the Word Choice Test (WCT; Pearson,
2009) not only enhanced the sensitivity of the accuracy
score, but also functioned as an independent validity
indicator (Erdodi, Tyson, Shahein, et al., 2017).
The present study was designed to explore the
clinical utility of critical items within the RMT and
WCT. Previous research suggests that while the RMT
is more difficult than the WCT at the raw score level,
once the cutoffs are adjusted to account for the
difference, the two instruments have comparable
classification accuracy (Davis, 2014; Erdodi, Kirsch,
Lajiness-O’Neill, Vingilis, & Medoff, 2014). However,
despite the imperfect classification accuracy of
traditional cutoffs based on RMT and WCT total scores,
the discriminant power of item-level data has not been
investigated within these tests.
Cutoffs established by earlier studies (RMT ≤ 39:
Iverson & Franzen, 1994; RMT ≤ 42: M. S. Kim, Boone,
Victor, Marion, et al., 2010; Erdodi, Kirsch, et al., 2014;
WCT ≤ 42: Barhon, Batchelor, Meares, Chekaluk, &
Shores, 2015; WCT ≤ 46: Davis, 2014) and the
technical manual (WCT 32–47; Pearson, 2009) imply
that one can provide a correct answer on the majority of
the items and still fail these PVTs. Moreover, the upper
limit of theoretical chance level responding is 32. In
other words, random responding and a 64% overall
accuracy can coexist, suggesting that a large proportion
of test items have poor negative predictive power.
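To make the chance-level argument concrete: for a 50-item, two-alternative forced-choice test, the largest total score still consistent with random guessing can be computed directly from the binomial distribution. The sketch below is a worked example only; the paper does not state the criterion behind the figure of 32, so a two-sided 95% interval (alpha = .025 per tail) is assumed here.

```python
from math import comb

def upper_tail(k, n=50):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5), i.e. pure guessing
    on n two-alternative forced-choice items."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def chance_upper_limit(n=50, alpha=0.025):
    """Largest score whose upper-tail probability under guessing is
    still >= alpha (one tail of a two-sided 95% interval)."""
    k = n
    while upper_tail(k, n) < alpha:
        k -= 1
    return k

print(chance_upper_limit())  # 32, i.e. 64% overall accuracy
```

With these assumptions, 32 of 50 (64% accuracy) is the highest score still compatible with random responding, matching the figure cited in the text.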
Since test items tend to vary in difficulty level and
hence, in their relative contribution to the diagnostic
accuracy of the overall scale, a critical item analysis has
the potential to increase the clinical utility of the instru-
ment by identifying items that best discriminate between
credible and noncredible response sets. It has long been
recognized in psychometric theory that shorter tests can
be more reliable than longer tests if they are based on
carefully calibrated items (Embretson, 1996). Although
averaging performance across a large number of item
responses with heterogeneous item characteristic curves
is common practice in test development, it can weaken
the measurement model. Conversely, reducing the num-
ber of test items to a select few that have the strongest
relationship with the target construct can preserve
(Bilker, Wierzbicki, Brensinger, Gur, & Gur, 2014) or
even improve (Erdodi, Jongsma, & Issa, 2017) overall
diagnostic power. Therefore, we hypothesized that
critical items would enhance the overall classification
accuracy of the RMT and WCT by increasing either
the sensitivity or the specificity of the total score to
invalid responding.
Cutoff scores based on such critical items can offer
additional information about performance validity that
is non-redundant with results obtained from cutoffs
based on the total score. This “second opinion,” in turn,
can be used to confirm or challenge the outcome based
on traditional cutoffs. The availability of multiple indica-
tors of performance validity within a single PVT is
especially useful in the interpretation of scores that
fall in the indeterminate range (“near passes”; Bigler,
2012, 2015), where the classification of an examinee’s
performance as either Pass or Fail is particularly difficult.
Method
Participants
The sample consisted of 202 patients (54.5% female,
87.1% right-handed) clinically referred for neuropsy-
chological testing at a northeastern academic medical
center. Mean age was 45.3 years (SD = 16.8), while
mean level of education was 13.9 years (SD = 2.7).
The most common diagnostic categories were psychiatric
(44.1%), traumatic brain injury (37.1%), mixed
neurological (15.8%), and general medical (3%) conditions.
Overall, patients reported a mild level of
depression (BDI-II: M = 16.9, SD = 12.0) and anxiety
(BAI: M = 13.1, SD = 10.0).
Materials
A core battery of neuropsychological tests was adminis-
tered to the majority of the sample (Table 1). However,
the exact test list varied based on the unique assessment
needs of individual patients. The main criterion PVT
was a composite of eleven independent validity indica-
tors labeled “Validity Index Eleven” (VI-11). The
VI-11 reflects the traditional approach of counting the
number of PVT failures along dichotomized (Pass/Fail)
cutoffs (Boone et al., 2002; M. S. Kim, Boone, Victor,
Marion et al., 2010; Nelson et al., 2003), a well-
established practice that represents the conceptual
foundation of performance validity assessment (Boone,
2013; Larrabee, 2012).
Some components of the VI-11 had multiple
different indicators (Table 2). Failing any of these was
counted as an overall Fail (= 1). Failing multiple
indicators within the same component did not change
the outcome (= 1). Missing scores were counted as Pass
(= 0). The heterogeneity in stimulus properties, testing
(¼0). The heterogeneity in stimulus properties, testing
paradigm, sensory modality, and number of indicators
contributing to the final outcome (i.e., valid vs. invalid)
in each of the constituent PVTs likely results in a non-
linear combination of the cumulative evidence on the
credibility of the overall neurocognitive profile. How-
ever, such method variance is a ubiquitous feature in
performance validity research, and is generally con-
sidered more of a strength than a weakness (Boone,
2007; Iverson & Binder, 2000; Larrabee, 2003, 2014;
Lichtenstein, Erdodi, & Linnea, 2017).
The total value of the VI-11 was computed by
summing its components. A VI-11 ≤ 1 was considered
Table 1. List of tests administered.
Test name Abbreviation Norms %ADM
Beck Anxiety Inventory BAI 60.9
Beck Depression Inventory, 2nd edition BDI-II 89.1
California Verbal Leaning Test, 2nd edition CVLT-II Manual 99.0
Complex Ideational Material CIM Heaton 44.6
Conners’ Continuous Performance Test, 2nd edition CPT-II Manual 62.3
Letter and Category Fluency Test FAS & animals Heaton 91.1
Finger Tapping Test FTT Heaton 52.5
Recognition Memory Test RMT 100.0
Rey Complex Figure Test RCFT Manual 90.1
Wechsler Adult Intelligence Scale, 4th edition WAIS-IV Manual 98.0
Wechsler Memory Scale, 4th edition WMS-IV Manual 97.0
Wide Range Achievement Test, 4th edition WRAT-4 Manual 70.8
Wisconsin Card Sorting Test WCST Manual 88.1
Word Choice Test WCT 100.0
Note. Heaton: Demographically adjusted norms published by Heaton, Miller, Taylor, and Grant (2004); Manual: Normative data published in the technical
manual; %ADM: Percentage of the sample to which each test was administered.
Table 2. Base rates of failure for VI-11 components, cutoffs, and references for each indicator.
Test BR_Fail Indicator Cutoff Reference
Animals 18.3 T-score 33 Hayward, Hall, Hunt, and Zubrick (1987); Sugarman and Axelrod (2015)
CIM 7.9 Raw score 9 Erdodi and Roth (2016); Erdodi, Tyson, Abeare, et al. (2016)
T-score 29 Erdodi and Roth (2016); Erdodi, Tyson, Abeare, et al. (2016)
CVLT-II 16.8 Hits_Recognition 10 Greve, Curtis, Bianchini, and Ord (2009); Wolfe et al. (2010)
FCR 15 Bauer et al. (2007); D. Delis (personal communication, May 2012)
Digit Span 29.2 RDS 7 Greiffenstein et al. (1994); Pearson (2009)
ACSS 6 Axelrod, Fichteberg, Millis, and Wertheimer (2006); Spencer et al. (2013); Trueblood (1994)
LDF 4 Heinly, Greve, Bianchini, Love, and Brennan (2005)
FAS 12.4 T-score 33 Curtis, Thompson, Greve, and Bianchini (2008); Sugarman and Axelrod (2015)
Rey-15 12.4 Recall 9 Lezak (1995); Boone et al. (2002)
RCFT 34.2 Copy raw 26 Lu, Boone, Cozolino, and Mitchell (2003); Reedy et al. (2013)
3-min raw 9.5 Lu et al. (2003); Reedy et al. (2013)
TP_Recognition 6 Lu et al. (2003); Reedy et al. (2013)
Atyp RE 1 Blaskewitz, Merten, and Brockhaus (2009); Lu et al. (2003)
Symbol Search 20.8 ACSS 6 Etherton, Bianchini, Heinly, and Greve (2006); Erdodi, Abeare, et al. (2017)
WCST 17.3 FMS 2 Larrabee (2003); Suhr and Boyer (1999)
LRE >1.9 Greve, Bianchini, Mathias, Houston, and Crouch (2002); Suhr and Boyer (1999)
WMS-IV LM 19.8 I ACSS 3 Bortnik et al. (2010)
II ACSS 4 Bortnik et al. (2010)
Recognition 20 Bortnik et al. (2010); Pearson (2009)
WMS-IV VR 19.3 Recognition 4 Pearson (2009)
Note. BR_Fail: Base rate of failure (% of the sample that failed one or more indicators within the test); CIM: Complex Ideational Material; CVLT-II: California Verbal
Learning Test, 2nd edition; FCR: Forced choice recognition; RDS: Reliable digit span; ACSS: Age-corrected scaled score; LDF: Longest digit span forward; RCFT:
Rey Complex Figure Test; TP_Recognition: Recognition true positives; Atyp RE: Atypical recognition errors; WCST: Wisconsin Card Sorting Test; FMS: Failure to
maintain set; UE: Unique errors; LRE: Logistic regression equation; WMS-IV: Wechsler Memory Scale, 4th edition; LM: Logical Memory; VR: Visual
Reproduction.
a Pass. Given that the most liberal cutoff available was
applied to a relatively high number of constituent
PVTs, the model is optimized for sensitivity by design.
Therefore, to protect against false positive errors, a
higher threshold (≥3) was used to define Fail on the
VI-11. A score of two was considered inconclusive
and hence, excluded from further analyses involving
the VI-11 to preserve the diagnostic purity of the cri-
terion groups (Erdodi & Roth, 2016; Greve & Bianchini,
2004; Lichtenstein, Erdodi, Rai, Mazur-Mosiewicz, &
Flaro, 2016; Sugarman & Axelrod, 2015).
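The VI-11 scoring rules just described can be summarized in a short sketch (a paraphrase of the text with illustrative variable names, not the authors' code):

```python
def score_vi11(component_results):
    """component_results: one entry per constituent test (11 in total);
    each entry is a list of booleans, True meaning that indicator was
    failed. Failing any indicator within a test counts as one overall
    Fail; multiple failures within the same test still count once;
    missing indicators are simply omitted (i.e., treated as Pass)."""
    total = sum(1 for indicators in component_results if any(indicators))
    if total <= 1:
        return total, "Pass"
    if total == 2:
        return total, "Inconclusive"  # excluded from further analyses
    return total, "Fail"              # >= 3

# Two failed indicators within one test plus one failure within
# another test still yield a VI-11 of 2 (Inconclusive):
example = [[True, True], [True], [], [], [], [], [], [], [], [], []]
print(score_vi11(example))  # (2, 'Inconclusive')
```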
As an aggregate measure of several PVTs represent-
ing a wide range of sensory modalities and testing
paradigms, the VI-11 is a representative measure of per-
formance validity that incorporates information from
multiple independent instruments. At the same time,
it is a heterogeneous composite that may introduce a
source of error into the signal detection analyses. To
address that, two new composite measures were
developed, labeled “Erdodi Index.” The first one was
constructed by aggregating five forced-choice recognition
based PVTs (EI-5_REC), and the second one by
aggregating five processing speed based PVTs
(EI-5_PSP), following the methodology described by
Erdodi and colleagues (Erdodi, Pelletier, & Roth, 2016;
Erdodi, Roth, et al., 2014).
The two versions of the EI-5 were designed to mirror
the dual nature of the RMT and WCT: the overall
recognition accuracy score (EI-5_REC) and the time taken to
complete the recognition trial (EI-5_PSP). As such, they
serve as modality-specific criterion measures, providing
a more nuanced analysis of the RMT’s and WCT’s
classification accuracy. Previous research found that
the inherent signal detection properties of the reference
PVT alter the classification accuracy of the instrument
under investigation (Erdodi, Tyson, Shahein, et al.,
2017), arguing for methodological pluralism in calibrat-
ing new tests (Erdodi, Abeare, et al., 2017).
In addition, the EI-5s have the advantage of
capturing the underlying continuity in performance
validity by differentiating between “near passes” (Bigler,
2012, 2015) and extreme forms of failure (Table 3).
Two-thirds of the sample obtained values ≤ 1 on both
versions of the EI-5, placing them in the passing range.
Around 20% of the sample obtained EI-5 values of two
or three, indicating either a single failure at the most
conservative cutoff or multiple failures at more liberal
cutoffs. Regardless of the specific combination, this
range of performance starts to raise doubts about the
credibility of the profile, without providing evidence
that is strong enough to render the entire data set
invalid. Therefore, this range was labeled as Borderline,
and excluded from calculating classification accuracy.
An EI-5 value ≥4, however, indicates either multiple
failures at the most liberal cutoffs, or at least two failures
at more conservative cutoffs. As such, this range of
performance provides sufficient evidence to confidently
classify the profile as invalid.
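Under this scheme, each EI-5 is the sum of five component scores graded from 0 (pass at the most liberal cutoff) to 3 (failure at the most conservative cutoff). A minimal sketch of the resulting classification, assuming that grading (illustrative code, not the authors'):

```python
def classify_ei5(component_scores):
    """component_scores: five integers in 0-3, one per constituent PVT.
    Ranges follow the text: <= 1 Pass, 2-3 Borderline (excluded from
    classification accuracy analyses), >= 4 Fail."""
    total = sum(component_scores)
    if total <= 1:
        return total, "Pass"
    if total <= 3:
        return total, "Borderline"
    return total, "Fail"

print(classify_ei5([0, 1, 0, 0, 0]))  # (1, 'Pass')
print(classify_ei5([0, 1, 1, 0, 0]))  # (2, 'Borderline')
print(classify_ei5([3, 0, 1, 0, 0]))  # (4, 'Fail')
```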
Procedure
All tests were administered and scored by trained staff
psychometricians, pre-doctoral interns or post-doctoral
fellows under the supervision of licensed psychologists
with specialty training in neuropsychology, following
standard instructions. The RMT and WCT were
administered in counterbalanced order, either at the
beginning or the end of the test battery. The study
Table 3. Components of the EI-5s with different levels of cutoff scores and corresponding base rates of failure.
EI-5_REC EI-5_PSP
Components 0 1 2 3 Components 0 1 2 3
FCR_CVLT-II 16 15 14 13 Animals T >33 25–33 21–24 20
BR 85.6 4.6 3.1 6.7 BR 81.7 9.9 4.5 4.0
LM_WMS-IV Recognition >20 18–20 17 16 CPT-II #Fail 0 1 2 3
BR 84.7 8.9 2.5 4.0 BR 71.8 12.4 4.5 11.4
RCFT REC-TP >6 6 4 3 FAS T >33 32–33 28–31 27
BR 86.7 6.1 4.4 2.8 BR 86.1 5.9 4.0 4.0
VPA_WMS-IV Recognition >35 32–35 28–29 27 FTT #Fail 0 1 2
BR 85.1 8.4 4.5 2.0 BR 92.7 5.8 1.5
VR_WMS-IV Recognition >4 4 3 2 WAIS-IV CD >5 5 4 3
BR 83.2 7.4 4.0 5.4 BR 85.8 4.1 6.1 4.1
Note. EI-5_REC: Erdodi Index – Five-variable model based on measures of recognition memory; EI-5_PSP: Erdodi Index – Five-variable model based on measures of
processing speed; LM: Logical Memory (Bortnik et al., 2010; Pearson, 2009); VPA: Verbal Paired Associates (Pearson, 2009); VR Recog: Visual Reproduction
(Pearson, 2009); FCR_CVLT-II: California Verbal Learning Test, 2nd Edition, Forced Choice Recognition (Bauer, Yantz, Ryan, Warned, & McCaffrey, 2005;
D. Delis, personal communication, May 2012; Erdodi, Kirsch, et al., 2014; Erdodi, Roth, et al., 2014); RCFT REC-TP: Rey Complex Figure Test recognition
true positives (Lu et al., 2003; Reedy et al., 2013); FTT Failures: Finger Tapping Test, number of scores at 35/28 dominant hand and 66/58 combined
mean raw scores (Arnold et al., 2005; Axelrod, Meyers, & Davis, 2014); FAS: Letter fluency T-score (Curtis et al., 2008; Sugarman & Axelrod, 2015);
Animals: Category fluency T-score (Sugarman & Axelrod, 2015); CPT-II Failures: Conners’ Continuous Performance Test, 2nd edition; number of T-scores
>70 on Omissions, Hit Reaction Time Standard Error, Variability, and Perseverations (Erdodi, Roth, et al., 2014; Lange et al., 2013; Ord, Boettcher, Greve,
& Bianchini, 2010); WAIS-IV CD: Coding age-corrected scaled score (Etherton et al., 2006; N. Kim, Boone, Victor, Lu, et al., 2010; Trueblood, 1994); BR:
Base rate (%).
was approved by the ethics board of the hospital where
the data were collected and the university where the
project was finalized. Relevant APA ethical guidelines
regulating research with human participants were
followed throughout the study.
Data analysis
Descriptive statistics (mean, standard deviation, base
rates of failure) were reported for relevant variables.
The main inferential analyses were one-way analyses
of variance (ANOVAs), independent t-tests and
Chi-square tests of independence. Effect size estimates
were computed using partial eta squared (η²) and
Cohen’s d. Sensitivity and specificity were calculated
using standard formulas (Grimes & Schultz, 2005).
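Sensitivity and specificity follow the standard definitions; the counts in the example below are hypothetical, chosen only to show the arithmetic (they happen to reproduce a .46/.88 pair, but they are not the study's raw frequencies):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Standard signal-detection formulas:
    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts: 32 of 69 invalid profiles flagged,
# 89 of 101 valid profiles cleared.
sens, spec = sensitivity_specificity(tp=32, fn=37, tn=89, fp=12)
print(round(sens, 2), round(spec, 2))  # 0.46 0.88
```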
Results
Validating the criterion measures
All ANOVAs using the trichotomized VI-11 (Pass-
Borderline-Fail) as the independent variable and the
RMT, WCT, EI-5_REC and EI-5_PSP as dependent variables
were statistically significant. Effect size estimates ranged
from large (η² = .16) to very large (η² = .36). Although
all post hoc contrasts between the Pass and Fail con-
ditions were significant, the Pass vs. Borderline contrast
failed to reach significance on the RMT and WCT accu-
racy scores. Likewise, the Borderline vs. Fail contrast
was not significant on the WCT completion time
(Table 4).
Similarly, all ANOVAs using the trichotomized
EI-5_REC (Pass-Borderline-Fail) as the independent
variable and the RMT, WCT, VI-11 and EI-5_PSP as
dependent variables were statistically significant. Effect
sizes ranged from .07 (medium) to .41 (very large).
All post hoc contrasts were significant, except for the
RMT completion time and EI-5_PSP. On these two
outcome measures, the Borderline vs. Fail contrast did
not reach significance (Table 5).
Finally, although all ANOVAs using the trichotomized
EI-5_PSP (Pass-Borderline-Fail) as the independent
variable, and the RMT, WCT, VI-11 and EI-5_REC as
dependent variables were statistically significant, effect
sizes were noticeably smaller (η² = .06–.26). As before,
post hoc contrasts between Borderline vs. Fail conditions
were non-significant, with the exception of the VI-11 as
the outcome measure (Table 6). These analyses provide
empirical support for the three validity composites, as
well as the exclusion of the Borderline scores. Patients
scoring in this indeterminate range had significantly
more evidence of invalid responding than those in the
Pass condition. At the same time, they did not demon-
strate PVT failures severe enough to be classified as
invalid beyond a reasonable doubt. The spiking
within-group variability in the Borderline group further
substantiates concerns that assigning these patients to
either the Pass or the Fail group would inadvertently
misclassify a large proportion of this subsample.
Classification accuracy of traditional cutoffs
At the ≤ 42 cutoff proposed by M. S. Kim, Boone,
Victor, Marion, et al. (2010), the RMT had .46
sensitivity at .88 specificity against the VI-11. This is
comparable to the .47 sensitivity and .85 specificity
against the EI-5_PSP. Classification accuracy improved
notably against the EI-5_REC (.88 sensitivity at .91
specificity).
Table 4. Results of one-way ANOVAs on RMT, WCT, EI-5_REC, and EI-5_PSP scores across VI-11 classification ranges.

VI-11: PASS = 0–1 (n = 101); BOR = 2 (n = 32); FAIL = ≥3 (n = 69)

Measure | | PASS | BOR | FAIL | F | p | η² | Significant post hocs
RMT_Accuracy | M | 46.9 | 44.1 | 39.9 | 21.0 | <.001 | .17 | PASS vs. FAIL; BOR vs. FAIL
 | SD | 3.8 | 9.0 | 9.1
RMT_Time | M | 132.9 | 154.9 | 194.2 | 17.3 | <.001 | .15 | PASS vs. FAIL; PASS vs. BOR
 | SD | 62.8 | 56.1 | 75.7
WCT_Accuracy | M | 49.0 | 46.5 | 44.0 | 18.7 | <.001 | .16 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 | SD | 1.6 | 7.2 | 7.3
WCT_Time | M | 110.7 | 132.6 | 177.7 | 18.7 | <.001 | .16 | PASS vs. FAIL; BOR vs. FAIL
 | SD | 50.1 | 56.5 | 94.8
EI-5_REC | M | 0.2 | 1.2 | 3.0 | 55.6 | <.001 | .36 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 | SD | 0.6 | 1.2 | 2.7
EI-5_PSP | M | 0.6 | 1.7 | 2.7 | 22.3 | <.001 | .18 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 | SD | 1.1 | 1.9 | 2.9

Note. Post hoc pairwise contrasts were computed using the least significant difference method; VI-11: Validity Index Eleven; BOR: Borderline; η²: Partial eta squared; RMT_Accuracy: Recognition Memory Test – Words (Accuracy score); RMT_Time: Recognition Memory Test – Words (Completion time in seconds); WCT_Accuracy: Word Choice Test (Accuracy score); WCT_Time: Word Choice Test (Completion time in seconds); EI-5_REC: Erdodi Index – Five-variable model based on measures of recognition memory; EI-5_PSP: Erdodi Index – Five-variable model based on measures of processing speed.
Previous research suggests that a WCT cutoff of ≤ 45
corresponds to an RMT ≤ 42 (Davis, 2014; Erdodi,
Kirsch, et al., 2014). This cutoff had .41 sensitivity at
.95 specificity against the VI-11, which is similar to
the .33 sensitivity and .86 specificity observed against
the EI-5_PSP. Sensitivity improved against the EI-5_REC
(.74), while specificity remained essentially the same
(.94).
Identifying a pool of critical items
The failure rate on each RMT and WCT item was
compared between those who passed and those who
failed the criterion PVTs. The items that were retained
met the following inclusion criteria: (1) The proportion
of correct responses was significantly higher in the
valid group compared to the invalid group; (2) The
proportion of correct responses in the valid group was
at least 15% higher compared to the invalid group;
and (3) The item met the first two criteria against all
three criterion PVTs. This last restriction was intro-
duced to minimize the effect of instrumentation
artifacts and therefore, improve the generalizability of
the findings, ensuring that the critical items will per-
form well against a variety of different criterion PVTs.
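The three inclusion criteria can be operationalized as follows. The paper does not name the significance test it used for criterion (1); this sketch substitutes a pooled two-proportion z-test as a stand-in, and all numbers in the example are hypothetical.

```python
from math import erf, sqrt

def two_prop_p(x1, n1, x2, n2):
    """One-sided p-value for H1: p1 > p2, using a pooled
    two-proportion z-test (a stand-in for the paper's unspecified test)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 0.5 * (1 - erf(z / sqrt(2)))  # 1 - Phi(z)

def is_critical(correct_valid, n_valid, correct_invalid, n_invalid,
                alpha=0.05, min_gap=0.15):
    """Criteria (1) and (2): significantly higher accuracy in the valid
    group, and an accuracy advantage of at least 15 percentage points.
    Criterion (3) is applied by requiring this to hold against all
    three criterion PVTs (call once per criterion measure)."""
    gap = correct_valid / n_valid - correct_invalid / n_invalid
    return (two_prop_p(correct_valid, n_valid,
                       correct_invalid, n_invalid) < alpha
            and gap >= min_gap)

# Hypothetical item: 97% correct among valid, 70% among invalid.
print(is_critical(97, 100, 35, 50))  # True
```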
Establishing groups of critical items
The seven best items meeting all three criteria were
selected for further analyses (“critical seven” or CR-7)
within both tests. Next, a smaller group of five critical
items (CR-5) was created by dropping the two CR-7
items with the least discriminant power. Finally, the
number of critical items was further reduced to three
Table 5. Results of one-way ANOVAs on RMT, WCT, VI-11, and EI-5_PSP scores across EI-5_REC classification ranges.

EI-5_REC: PASS = 0–1 (n = 138); BOR = 2–3 (n = 41); FAIL = ≥4 (n = 23)

Measure | | PASS | BOR | FAIL | F | p | η² | Significant post hocs
RMT_Accuracy | M | 46.8 | 42.1 | 32.4 | 56.50 | <.001 | .36 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 | SD | 4.1 | 9.5 | 8.3
RMT_Time | M | 141.0 | 184.1 | 209.8 | 13.80 | <.001 | .12 | PASS vs. BOR; PASS vs. FAIL
 | SD | 67.5 | 62.5 | 76.6
WCT_Accuracy | M | 48.9 | 45.5 | 37.4 | 68.40 | <.001 | .41 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 | SD | 1.8 | 6.8 | 8.6
WCT_Time | M | 114.4 | 165.6 | 225.3 | 31.20 | <.001 | .24 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 | SD | 50.5 | 67.4 | 125.3
VI-11 | M | 1.2 | 3.6 | 4.6 | 73.00 | <.001 | .42 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 | SD | 1.5 | 1.7 | 1.6
EI-5_PSP | M | 1.1 | 2.4 | 2.1 | 7.32 | <.001 | .07 | PASS vs. BOR; PASS vs. FAIL
 | SD | 1.7 | 3.0 | 2.3

Note. Post hoc pairwise contrasts were computed using the least significant difference method; EI-5_REC: Erdodi Index – Five-variable model based on measures of recognition memory; BOR: Borderline; η²: Partial eta squared; RMT_Accuracy: Recognition Memory Test – Words (Accuracy score); RMT_Time: Recognition Memory Test – Words (Completion time in seconds); WCT_Accuracy: Word Choice Test (Accuracy score); WCT_Time: Word Choice Test (Completion time in seconds); VI-11: Validity Index Eleven; EI-5_PSP: Erdodi Index – Five-variable model based on measures of processing speed.
Table 6. Results of one-way ANOVAs on RMT, WCT, VI-11, and EI-5_REC scores across EI-5_PSP classification ranges.

EI-5_PSP: PASS = 0–1 (n = 133); BOR = 2–3 (n = 48); FAIL = ≥4 (n = 21)

Measure | | PASS | BOR | FAIL | F | p | η² | Significant post hocs
RMT_Accuracy | M | 45.8 | 41.5 | 39.8 | 10.10 | <.001 | .09 | PASS vs. BOR; PASS vs. FAIL
 | SD | 6.2 | 8.5 | 10.2
RMT_Time | M | 135.5 | 193.5 | 209.2 | 21.10 | <.001 | .18 | PASS vs. BOR; PASS vs. FAIL
 | SD | 57.9 | 80.1 | 73.3
WCT_Accuracy | M | 47.9 | 45.4 | 44.4 | 6.02 | <.01 | .06 | PASS vs. BOR; PASS vs. FAIL
 | SD | 3.9 | 7.3 | 9.2
WCT_Time | M | 122.5 | 166.0 | 160.7 | 7.35 | <.01 | .07 | PASS vs. BOR; PASS vs. FAIL
 | SD | 70.3 | 86.9 | 54.6
VI-11 | M | 1.4 | 2.8 | 4.5 | 34.20 | <.001 | .26 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 | SD | 1.6 | 1.9 | 2.0
EI-5_REC | M | 1.0 | 1.8 | 2.1 | 3.96 | <.05 | .04 | PASS vs. BOR; PASS vs. FAIL
 | SD | 2.0 | 2.4 | 2.3

Note. Post hoc pairwise contrasts were computed using the least significant difference method; EI-5_PSP: Erdodi Index – Five-variable model based on measures of processing speed; BOR: Borderline; η²: Partial eta squared; RMT_Accuracy: Recognition Memory Test – Words (Accuracy score); RMT_Time: Recognition Memory Test – Words (Completion time in seconds); WCT_Accuracy: Word Choice Test (Accuracy score); WCT_Time: Word Choice Test (Completion time in seconds); VI-11: Validity Index Eleven; EI-5_REC: Erdodi Index – Five-variable model based on measures of recognition memory.
(CR-3), retaining only those with the highest discriminant power. The value of each subset of critical items reflects the number of incorrect responses (i.e., higher values indicate stronger evidence of invalid performance). Having these three combinations of critical items increases the chances of identifying non-credible responding, as it provides alternative detection strategies. The specific combination of critical items is not disclosed within this manuscript to protect test security and to guard the newly developed diagnostic tool from unauthorized use. However, the information will be provided to qualified clinicians. Interested readers should contact the first author.
Signal detection performance of critical items in the RMT

A CR-7_RMT cutoff ≥3 achieved good specificity (.91–.96) and variable sensitivity (.38–.70). Increasing the cutoff to ≥4 produced the predictable trade-off between improved specificity (.97–1.00) and declining sensitivity (.18–.35). Increasing the cutoff to ≥5 reached the point of diminishing returns, with minimal gains in specificity (.98–1.00) but further deterioration in sensitivity (.09–.24).
A CR-5_RMT cutoff ≥2 produced an acceptable combination of sensitivity (.54) and specificity (.86) against the VI-11, but failed to reach the lower threshold for specificity against the EI-5s. Increasing the cutoff to ≥3 resulted in a marked increase in specificity (.95–.99), but a proportional loss in sensitivity (.27–.39). A further increase to ≥4 sacrificed much of the sensitivity (.10–.22) for negligible gains in specificity (.97–1.00).
A CR-3_RMT cutoff ≥1 failed to reach minimum specificity against any of the criterion PVTs. Therefore, it received no further consideration. However, increasing the cutoff to ≥2 produced good combinations of sensitivity (.33–.43) and specificity (.91–.95). Failing all three of the CR-3_RMT items was associated with near-perfect specificity (.97–1.00), but low sensitivity (.10–.19). Further details are displayed in Table 7.
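As an illustration of the signal detection analyses above, the following sketch shows how sensitivity and specificity would be derived for a critical-item cutoff against a criterion PVT. The function and all data are hypothetical (the actual critical items are not disclosed in this article):

```python
# Hypothetical sketch of the sensitivity/specificity computation used to
# evaluate a critical-item cutoff (e.g., CR-7_RMT >= 3) against a criterion
# PVT. The examinee data below are invented for illustration only.

def sens_spec(cr_failures, criterion_invalid, cutoff):
    """Return (sensitivity, specificity) of flagging examinees whose
    critical-item failure count meets the cutoff, scored against the
    criterion PVT's valid/invalid classification."""
    flagged = [f >= cutoff for f in cr_failures]
    tp = sum(fl and inv for fl, inv in zip(flagged, criterion_invalid))
    fn = sum(not fl and inv for fl, inv in zip(flagged, criterion_invalid))
    tn = sum(not fl and not inv for fl, inv in zip(flagged, criterion_invalid))
    fp = sum(fl and not inv for fl, inv in zip(flagged, criterion_invalid))
    return tp / (tp + fn), tn / (tn + fp)

# Ten invented examinees: critical-item failures and criterion status
failures = [0, 1, 3, 4, 0, 2, 5, 0, 3, 1]
invalid = [False, True, True, True, False, False, True, False, True, False]
sens, spec = sens_spec(failures, invalid, cutoff=3)  # sens = .80, spec = 1.00
```

Raising the cutoff shrinks the set of flagged examinees, which is why the tables show specificity rising and sensitivity falling at stricter cutoffs.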
Signal detection performance of critical items in the WCT

A CR-7_WCT cutoff ≥1 produced an acceptable combination of sensitivity (.58) and specificity (.84) against the VI-11, but failed to reach the lower threshold for specificity against the EI-5s. Increasing the cutoff to ≥2 resulted in notable improvement in specificity (.91–.99), with relatively well-preserved, although fluctuating, sensitivity (.34–.70). Further increasing the cutoff to ≥3 achieved minimal gains in specificity (.92–.99), but sacrificed some of the sensitivity (.19–.65).
A CR-5_WCT cutoff ≥1 produced good combinations of sensitivity (.49–.87) and specificity (.84–.95) against all criterion PVTs. Increasing the cutoff to ≥2 produced the predictable trade-off between rising specificity (.95–1.00) and declining sensitivity (.27–.65). Raising the cutoff to ≥3 resulted in consistently high specificity (.95–1.00), but low and fluctuating sensitivity (.14–.48).
Similarly, a CR-3_WCT cutoff ≥1 produced good combinations of sensitivity (.40–.74) and specificity (.88–.95) against all criterion PVTs. Increasing the cutoff to ≥2 sacrificed half of the sensitivity (.22–.57) for small gains in specificity (.95–1.00). As with the RMT, failing all three of the CR-3_WCT items was associated with near-perfect specificity (.98–1.00), but low sensitivity (.05–.13). Further details are displayed in Table 8.
Unique contribution of critical items to the classification accuracy of the RMT and WCT

To objectively evaluate the unique contribution of critical items above and beyond traditional cutoffs, we examined the profiles that failed one cutoff but passed the other in relation to VI-11 scores. Between 1.8% and
Table 7. Sensitivity and specificity of three combinations of critical items within the RMT across different cutoffs.

                           VI-11             EI-5_REC          EI-5_PSP
                           (n = 170; 34.2%)  (n = 161; 14.3%)  (n = 154; 13.6%)
Cutoff        BR_Fail (%)  SENS   SPEC       SENS   SPEC       SENS   SPEC
CR-7_RMT  ≥3  17.6         .39    .96        .70    .92        .38    .91
          ≥4  7.0          .18    1.00       .35    .97        .29    .97
          ≥5  4.0          .09    1.00       .13    .98        .24    .98
CR-5_RMT  ≥2  28.6         .54    .86        .78    .83        .52    .80
          ≥3  11.1         .27    .99        .39    .95        .38    .95
          ≥4  3.5          .10    1.00       .22    .99        .14    .97
CR-3_RMT  ≥1  40.2         .67    .72        .78    .69        .76    .69
          ≥2  15.6         .33    .95        .43    .91        .33    .92
          =3  4.5          .10    1.00       .17    .99        .19    .97

Note. VI-11: Validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); EI-5_REC: Recognition memory based validity composite (Pass ≤1, Fail ≥4); EI-5_PSP: Processing speed based validity composite (Pass ≤1, Fail ≥4); SENS: Sensitivity; SPEC: Specificity; BR_Fail: Base rate of failure (% of sample that scored below the given cutoff); RMT: Recognition Memory Test – Words; CR: Critical items.
5.0% of patients who passed the traditional cutoff on the RMT (>42) failed select CR_RMT cutoffs (Table 9). Within the WCT, between 5.6% and 17.5% of those who passed the traditional cutoff (>45) failed select CR_WCT cutoffs (Table 10). These results suggest that critical items increase the sensitivity of both tests while maintaining high specificity.
Failing critical items increases the confidence in the decision to classify the response set as invalid, even in examinees who already failed the traditional cutoff. Among the subset of patients who failed the traditional cutoff on the RMT (≤42), those who also failed CR_RMT had higher VI-11 scores, providing stronger evidence of non-credible presentation. As with the RMT, higher VI-11 scores were observed among patients who failed the CR_WCT cutoffs even within the subsample that failed the traditional WCT cutoff (≤45).
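The effect sizes (d) reported in Tables 9 and 10 compare mean VI-11 scores between those who passed and those who failed the critical items. As a hedged sketch (the article does not state its exact formula; a pooled-SD convention is assumed here), Cohen's d can be recovered from the reported group statistics:

```python
import math

def cohens_d(n1, m1, sd1, n2, m2, sd2):
    """Cohen's d with a pooled standard deviation (assumed convention;
    the article does not specify how its d values were computed)."""
    pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                       / (n1 + n2 - 2))
    return (m2 - m1) / pooled

# Group statistics from Table 9, CR-7_RMT rows within RMT Fail (<=42):
# passed critical items (n=20, M=2.50, SD=2.24) vs.
# failed critical items (n=29, M=3.97, SD=1.80)
d = cohens_d(20, 2.50, 2.24, 29, 3.97, 1.80)
# d comes out near .74, close to the reported .72; small differences may
# reflect rounding or a different pooling convention
```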
Conversely, patients who failed the traditional cutoff but provided correct answers on the critical items were examined separately. Everybody who scored ≤42 on the RMT failed at least one critical item. Eight of them produced CR-7_RMT = 1. Three of these patients were clear false positive errors based on their PVT profile: they passed all three reference PVTs (VI-11, EI-5_REC, and EI-5_PSP), the most liberal cutoff on the WCT (≤47), and other free-standing PVTs. The remaining five patients will be discussed below.
One of these could be considered a false positive based on the combination of clinical history and neurocognitive profile: a retired physician in his 70s diagnosed with amnestic Mild Cognitive Impairment. He scored in the Borderline range on the VI-11 and EI-5_REC, had a WCT score of 47, and 11 unique errors on the WCST. His FSIQ was 118, with a Coding ACSS of 15. His performance on the acquisition trials of the CVLT-II was high average (43/80), but his long-delay free recall was borderline (3/16). He obtained a perfect score on the Rey-15 and the first trial of the Test of Memory Malingering. Thus, his performance is broadly consistent with his diagnosis.
The fifth patient was a 19-year-old woman who scored in the Borderline range on the VI-11, EI-5_REC, and EI-5_PSP. She failed the WCT (40) and the Test of Memory Malingering (35–36–36), produced eight unique errors on the WCST, a Vocabulary Digit Span ACSS of 6, and a CVLT-II logistic regression equation score (.72) in the failing range (Wolfe et al., 2010). In addition, she produced several combinations of scores that are internally inconsistent. Given the identifiable external incentive to appear impaired (on athletic scholarship, struggling in her classes, seeking an ADHD diagnosis, stimulant medication, and academic accommodations), she met criteria for Malingered
Table 8. Sensitivity and specificity of three combinations of critical items within the WCT across different cutoffs.

                           VI-11             EI-5_REC          EI-5_PSP
                           (n = 170; 34.2%)  (n = 161; 14.3%)  (n = 154; 13.6%)
Cutoff        BR_Fail (%)  SENS   SPEC       SENS   SPEC       SENS   SPEC
CR-7_WCT  ≥1  31.8         .58    .84        .91    .83        .52    .74
          ≥2  14.6         .34    .99        .70    .97        .38    .91
          ≥3  10.6         .25    .99        .65    .98        .19    .92
CR-5_WCT  ≥1  22.2         .49    .95        .87    .93        .52    .84
          ≥2  10.6         .27    1.00       .65    .99        .29    .95
          ≥3  6.6          .16    1.00       .48    1.00       .14    .95
CR-3_WCT  ≥1  19.2         .40    .95        .74    .93        .43    .88
          ≥2  9.1          .22    1.00       .57    .99        .24    .95
          =3  2.0          .05    1.00       .13    1.00       .05    .98

Note. VI-11: Validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); EI-5_REC: Recognition memory based validity composite (Pass ≤1, Fail ≥4); EI-5_PSP: Processing speed based validity composite (Pass ≤1, Fail ≥4); SENS: Sensitivity; SPEC: Specificity; BR_Fail: Base rate of failure (% of sample that scored below the given cutoff); WCT: Word Choice Test; CR: Critical items.
Table 9. VI-11 scores as a function of passing or failing the traditional RMT cutoff or select cutoffs on the critical items.

                  RMT Pass (>42)                 RMT Fail (≤42)
                  n    M     SD    p     d       n    M     SD    p     d
CR-7_RMT  <3      144  1.59  1.71  <.05  .78     20   2.50  2.24  <.05  .72
          ≥3      6    3.33  2.66               29   3.97  1.80
CR-5_RMT  <3      147  1.62  1.73  <.05  .79     30   2.97  2.30  <.05  .52
          ≥3      3    3.67  3.22               19   4.00  1.60
CR-3_RMT  <2      142  1.61  1.76  .06   .55     26   3.00  2.28  .10   .24
          ≥2      8    2.63  1.92               23   3.78  1.83

Note. RMT: Recognition Memory Test (Words); VI-11: Validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); CR: Critical items.
Neurocognitive Dysfunction introduced by Slick, Sherman, and Iverson (1999).

The sixth patient was a 47-year-old woman with a history of childhood abuse who scored in the Borderline range on the EI-5_REC, and failed the VI-11 (3), the WCT (40), and the Dot Counting Test (28.3), in addition to the validity cutoffs in Coding (ACSS = 5) and Symbol Search (ACSS = 4). Her CVLT-II profile was internally inconsistent, with a low average acquisition score of 42/80 and a Forced Choice Recognition score of 11/16 (invalid beyond reasonable doubt). Overall, her profile can be considered invalid and, hence, a true positive.
The last two patients are more difficult to classify. One of them was a 59-year-old woman with a history of incestuous sexual trauma. She passed the EI-5_PSP, but failed the WCT (43), the Test of Memory Malingering (34–47–49), the EI-5_REC (8), and the VI-11 (7). Despite ample evidence of invalid performance, all her PVT failures were limited to memory tests, which was consistent with her self-reported decline of attention and memory. In addition, she performed in the high average range on Coding, letter fluency, and the Stroop test.
The last patient had an identifiable external incentive to appear impaired in combination with a documented history of stroke and brain tumor treated with radiation. She passed the EI-5_REC, EI-5_PSP, and VI-11, but failed the WCT (47) and the logistic regression equation (.74) by Wolfe et al. (2010). In addition, she performed in the average to high average range on Coding, Logical Memory, animal fluency, and the Trail Making Test.
On the WCT, three patients failed the traditional cutoff (≤45) but provided correct answers on all of the CR-7_WCT items. Two of them were clear false positive errors: a 29-year-old man with 17 years of education referred for a mild TBI and a 57-year-old man with 18 years of education referred for Parkinson's disease. The former passed the EI-5_REC, EI-5_PSP, VI-11, the RMT (49), as well as the Test of Memory Malingering (47–50–50), and produced a largely intact neurocognitive profile. The latter scored in the Borderline range on the EI-5_REC and VI-11, passed the RMT (43) as well as the Test of Memory Malingering (48–50–50), and produced a profile consistent with his diagnosis.
The third patient was a 39-year-old woman diagnosed with Personality Disorder NOS with Borderline Features and fibromyalgia. She passed the EI-5_REC, scored in the Borderline range on the EI-5_PSP, but failed the VI-11, the RMT (30), the Test of Memory Malingering (33–42–39), and the Rey-15 (7). As such, her neurocognitive profile can be confidently classified as invalid. Nevertheless, she demonstrated average to high average performance on tests of memory and processing speed.
Discussion
This study examined the potential of critical items within the RMT and WCT to enhance the diagnostic power of the traditional cutoffs based on total scores in a large sample of patients clinically referred for neuropsychological assessment. Our hypothesis that critical items would improve classification accuracy was supported. Critical items increased the sensitivity of both tests by correctly identifying 2–17% of response sets that passed the traditional cutoffs as invalid. They also increased specificity by providing additional empirical evidence that response sets identified as invalid by the traditional cutoffs were correctly classified as such. In addition, a total score in the failing range combined with correct responses on the critical items was associated with an increased risk of a false positive error, indicating the need for further analysis of the PVT profile. Although a simultaneous increase in both sensitivity and specificity seems paradoxical at face value, this bidirectional improvement in classification accuracy follows the inner logic behind critical item analysis.
The heuristic underlying this method is that the total number of items failed and the specific combination of items failed contribute nonredundant information about performance validity (Bortnik et al., 2010; Killgore & DellaPietra, 2000). If an examinee passed the cutoff based on total score, but failed the cutoff
Table 10. VI-11 scores as a function of passing or failing the traditional WCT cutoff or select cutoffs on the critical items.

                  WCT Pass (>45)                 WCT Fail (≤45)
                  n    M     SD    p     d       n    M     SD    p     d
CR-7_WCT  <2      132  1.54  1.71  <.05  .48     3    2.00  1.00  .06   1.18
          ≥2      28   2.43  2.01               35   3.89  2.03
CR-5_WCT  =0      148  1.53  1.68  <.05  1.14    6    2.33  1.21  <.05  .99
          ≥1      12   3.67  2.06               32   4.00  2.05
CR-3_WCT  =0      151  1.59  1.71  <.05  .91     9    3.00  1.80  .11   .38
          ≥1      9    3.44  2.30               29   3.97  2.06

Note. WCT: Word Choice Test; VI-11: Validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); CR: Critical items.
based on critical items, the new method increased sensitivity by correctly detecting a non-credible examinee that was missed by the traditional method. Conversely, if an examinee just barely failed the most liberal cutoff based on total score, some would interpret this performance as a "near-pass" (Bigler, 2012), and argue that it actually represents a false positive error (i.e., the instrument has unacceptably low specificity). However, if the examinee in question also failed a certain combination of critical items associated with higher specificity, that pattern of performance would strengthen the evidence that the overall profile is indeed invalid. As such, critical item analysis could effectively increase the specificity of a given diagnostic decision.
Critical items could also serve as a safeguard against false positive errors. If an examinee failed the traditional cutoff based on total score, but provided correct responses on the critical items, the profile may warrant a more in-depth analysis. Our data suggest that at least half of such cases were incorrectly classified as invalid (i.e., they are false positives). In addition, patients with complex psychiatric histories were overrepresented among those who were eventually determined to be true positives. If this finding replicates in larger samples, the discrepancy between the total score and critical item cutoffs on the RMT and WCT could prove useful in subtyping invalid performance. Since malingering (Slick et al., 1999), a "cry for help" (Berry et al., 1996), and psychogenic interference (Erdodi, Tyson, Abeare, et al., 2016) differ in etiology, distinguishing among them would enhance diagnostic accuracy as well as improve the clinical management of patients currently lumped together on the failing side of PVTs.
The three levels of critical items (CR-7, CR-5, and CR-3) employ different detection strategies. Essentially, they trade the size of the item pool for the number of item failures required to deem a response set invalid. For example, within the RMT, ≥3 failures on the CR-7 or CR-5 have comparable classification accuracy to ≥2 failures on the CR-3. Likewise, within the WCT, ≥2 failures on the CR-7 or CR-5 have similar classification accuracy to ≥1 failure on the CR-3.
However, critical items in both tests capitalize on the inherent differences among test items in their ability to differentiate between credible and non-credible responding. They rescale the original test by eliminating inactive items, retaining only the ones with the highest discriminant power (Embretson, 1996). As such, they provide evaluators with an alternative method of assessing performance validity: instead of counting how many of the total items the examinee failed, this method focuses on which items were failed.
Therefore, critical item analyses enhance the interpretation of a given score on the RMT or WCT above and beyond the total score by demonstrating that, even within the subset of patients who passed the old cutoff based on all 50 items, there was a significant difference between those who passed and those who failed the cutoff based on critical items. For example, those who scored >42 on the RMT and also passed the CR-7_RMT had statistically and clinically significantly lower VI-11 scores (i.e., were more likely to be valid) than those who scored >42 on the RMT but failed the CR-7_RMT.
Depending on the specific combination of item-level scores, this approach can promote superior overall classification accuracy. For example, while an RMT total score of 48 is considered a clear Pass (Iverson & Franzen, 1994; M. S. Kim, Boone, Victor, Marion, et al., 2010), if the two incorrect responses were from CR-3_RMT, that score provides strong evidence (.91–.95 specificity) of non-credible responding. Even if the total score is already below the cutoff, critical items can still enhance the confidence in classifying it as invalid. For example, a WCT total score of 46 is already considered a Fail, with specificity between .87 (Davis, 2014) and .92 (Erdodi, Kirsch, et al., 2014). However, if at least two of the incorrect responses were from the CR-5_WCT, specificity increases to .95–1.00.
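The interpretive logic described above, where the total-score cutoff and a critical-item cutoff jointly shape the classification, can be sketched as a simple decision rule. The thresholds follow the WCT values discussed in the text (traditional cutoff ≤45; CR-5_WCT ≥2 failures), but the function and its labels are hypothetical, not a validated algorithm:

```python
def classify_wct(total_score, cr5_failures):
    """Hypothetical two-stage decision sketch combining the WCT
    total-score cutoff (<= 45 = Fail) with a CR-5_WCT cutoff
    (>= 2 critical-item failures). Illustrative labels only."""
    failed_total = total_score <= 45
    failed_cr = cr5_failures >= 2
    if failed_total and failed_cr:
        # Converging evidence strengthens the invalid classification
        return "invalid (converging evidence)"
    if failed_total and not failed_cr:
        # Discrepancy flags a possible false positive
        return "possible false positive: review full PVT profile"
    if not failed_total and failed_cr:
        # Critical items catch a profile the total score missed
        return "invalid (missed by total score)"
    return "valid"
```

The four branches map onto the four outcomes discussed in the text: converging failures, the safeguard against false positives, the sensitivity gain over the total score, and an unremarkable pass.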
When compared to the RMT, critical items appear more useful within the WCT. They expand sensitivity further (6–18%) compared to the RMT (2–5%). Also, among patients who failed the traditional cutoff based on total score, passing or failing the critical items had a stronger relationship with the number of PVT failures within the WCT (d: .38–1.18) relative to the RMT (d: .24–.72). This finding adds to the accumulating empirical evidence supporting the clinical utility of the WCT (Barhon et al., 2015; Davis, 2014; Erdodi, Kirsch, et al., 2014; Miller et al., 2011).
This study developed three different sets of critical items within the RMT and WCT that enhance the classification accuracy of both instruments in a clinically meaningful way. Critical item analysis requires no additional test material or administration time, yet provides a time- and cost-effective alternative for evaluating performance validity independent of traditional total score cutoffs. In a sense, critical items re-examine the data and provide a "second opinion" regarding the clinical classification of a given RMT or WCT response set. Additionally, they provide objective, data-driven information to address the contested issue of "near passes" (Bigler, 2012, 2015), with clear clinical and forensic implications.
The findings should be interpreted within the context
of the study's limitations. The sample was geographically restricted and diagnostically heterogeneous.
Future research would benefit from replication using different, more homogeneous samples and different reference PVTs. Finally, given that the clinical utility of critical item analysis ultimately depends on the number of incorrect answers that occur on these specific items, the model is still vulnerable to chance. Therefore, the generalizability of the results can only be determined by independent replications of the present study.
Acknowledgments
This project received no financial support from outside funding agencies. Relevant ethical guidelines regulating research involving human participants were followed throughout the project. All data collection, storage, and processing were done in compliance with the Helsinki Declaration. The authors have no disclosures to make that could be interpreted as a conflict of interest.
References
Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., &
McPhearson, S. (2005). Sensitivity and specificity of finger
tapping test scores for the detection of suspect effort. The
Clinical Neuropsychologist, 19(1), 105–120. doi:10.1080/
13854040490888567
Axelrod, B. N., Fichtenberg, N. L., Millis, S. R., & Wertheimer,
J. C. (2006). Detecting incomplete effort with digit span
from the Wechsler Adult Intelligence Scale – Third Edition.
The Clinical Neuropsychologist, 10, 513–523. doi:10.1080/
13854040590967117
Axelrod, B. N., Meyers, J. E., & Davis, J. J. (2014). Finger
tapping test performance as a measure of performance
validity. The Clinical Neuropsychologist, 28(5), 876–888.
doi:10.1080/13854046.2014.907583
Barhon, L. I., Batchelor, J., Meares, S., Chekaluk, E., & Shores,
E. A. (2015). A comparison of the degree of effort involved
in the TOMM and the ACS word choice test using a dual-
task paradigm. Applied Neuropsychology: Adult, 22(2),
114–123.
Bauer, L., O’Bryant, S. E., Lynch, J. K., McCaffrey, R. J., &
Fisher, J. M. (2007). Examining the test of memory malin-
gering trial 1 and word memory test immediate recognition
as screening tools for insufficient effort. Assessment, 14(3),
215–222. doi:10.1177/1073191106297617
Bauer, L., Yantz, C. L., Ryan, L. M., Warden, D. L., &
McCaffrey, R. J. (2005). An examination of the California
verbal learning test II to detect incomplete effort in a trau-
matic brain injury sample. Applied Neuropsychology, 12(4),
202–207. doi:10.1207/s15324826an1204_3
Berry, D. T. R., Adams, J. J., Clark, C. D., Thacker, S. R.,
Burger, T. L., Wetter, M. W., Baer, R. A., & Borden,
J. W. (1996). Detection of a cry for help on the MMPI-2:
An analog investigation. Journal of Personality Assessment,
67(1), 26–36.
Bigler, E. D. (2012). Symptom validity testing, effort and
neuropsychological assessment. Journal of the International
Neuropsychological Society, 18, 632–642. doi:10.1017/
s1355617712000252
Bigler, E. D. (2015). Neuroimaging as a biomarker in
symptom validity and performance validity testing. Brain
Imaging and Behavior, 9(3), 421–444. doi:10.1007/s11682-
015-9409-1
Bilker, W. B., Wierzbicki, M. R., Brensinger, C. M., Gur, R. E.,
& Gur, R. C. (2014). Development of abbreviated eight-
item form of the Penn verbal reasoning test. Assessment,
21, 669–678. doi:10.1177/1073191114524270
Blaskewitz, N., Merten, T., & Brockhaus, R. (2009). Detection
of suboptimal effort with the Rey complex figure test and
recognition trial. Applied Neuropsychology, 16, 54–61.
doi:10.1080/09084280802644227
Boone, K. B. (2007). Assessment of feigned cognitive impair-
ment. A neuropsychological perspective. New York, NY:
Guilford.
Boone, K. B. (2009). The need for continuous and
comprehensive sampling of effort/response bias during
neuropsychological examination. The Clinical Neuropsy-
chologist, 23(4), 729–741. doi:10.1080/13854040802427803
Boone, K. B. (2013). Clinical practice of forensic neuropsychology.
New York, NY: Guilford.
Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., &
Razani, J. (2002). The Rey 15-item recognition trial: A
technique to enhance sensitivity of the Rey 15-item
memorization test. Journal of Clinical and Experimental
Neuropsychology, 24(5), 561–573.
Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler,
E., Victor, T. L., & Zeller, M. A. (2010). Examination of
various WMS-III logical memory scores in the assessment
of response bias. The Clinical Neuropsychologist, 24(2),
344–357. doi:10.1080/13854040903307268
Bush, S. S., Heilbronner, R. L., & Ruff, R. M. (2014). Psycho-
logical assessment of symptom and performance validity,
response bias, and malingering: Official position of the
association for scientific advancement in psychological
injury and law. Psychological Injury and Law, 7(3),
197–205. doi:10.1007/s12207-014-9198-7
Chafetz, M. D., Williams, M. A., Ben-Porath, Y. S., Bianchini,
K. J., Boone, K. B., Kirkwood, M. W., … Ord, J. S. (2015).
Official position of the American academy of clinical
neuropsychology social security administration policy on
validity testing: Guidance and recommendations for
change. The Clinical Neuropsychologist, 29(6), 723–740.
doi:10.1080/13854046.2015.1099738
Curtis, K. L., Thompson, L. K., Greve, K. W., & Bianchini,
K. J. (2008). Verbal fluency indicators of malingering in
traumatic brain injury: Classification accuracy in known
groups. The Clinical Neuropsychologist, 22, 930–945.
doi:10.1080/13854040701563591
Davis, J. J. (2014). Further consideration of advanced clinical
solutions word choice: Comparison to the recognition
memory test Words and classification accuracy on a
clinical sample. The Clinical Neuropsychologist, 28(8),
1278–1294. doi:10.1080/13854046.2014.975844
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (2000).
California Verbal Learning Test – Second Edition, Adult
Version, manual. San Antonio, TX: Psychological
Corporation.
Denning, J. H. (2012). The efficiency and accuracy of the test
of memory malingering trial 1, errors on the first 10 items
of the test of memory malingering, and five embedded
measures in predicting invalid test performance. Archives
APPLIED NEUROPSYCHOLOGY: ADULT 11
of Clinical Neuropsychology, 27(4), 417–432. doi:10.1093/
arclin/acs044
Embretson, S. E. (1996). The new rules of measurement.
Psychological Assessment, 8(4), 341.
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T.,
Kucharski, B., Zuccato, B. G., & Roth, R. M. (2017).
WAIS-IV processing speed scores as measures of non-
credible responding–The third generation of embedded
performance validity indicators. Psychological Assessment,
29(2), 148–157. doi:10.1037/pas0000319
Erdodi, L. A., Jongsma, K. A., & Issa, M. (2017). The 15-item
version of the Boston naming test as an index of English
proficiency. The Clinical Neuropsychologist, 31(1), 168–178.
Erdodi, L. A., Kirsch, N. L., Lajiness-O’Neill, R., Vingilis, E., &
Medoff, B. (2014). Comparing the recognition memory test
and the word choice test in a mixed clinical sample: Are
they equivalent? Psychological Injury and Law, 7(3),
255–263. doi:10.1007/s12207-014-9197-8
Erdodi, L. A., Pelletier, C. L., & Roth, R. M. (2016). Elevations
on select Conners’ CPT-II scales indicate noncredible
responding in adults with traumatic brain injury. Applied
Neuropsychology: Adult, 1–10. doi:10.1080/23279095.2016.
1232262 [Advance online publication]
Erdodi, L. A., & Roth, R. M. (2016). Low scores on BDAE
complex ideational material are associated with invalid
performance in adults without aphasia. Applied Neuropsy-
chology: Adult, 1–11. doi:10.1080/23279095.2016.1154856
[Advance online publication]
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O’Neill, R.,
& Medoff, B. (2014). Aggregating validity indicators
embedded in Conners’ CPT-II outperforms individual cut-
offs at separating valid from invalid performance in adults
with traumatic brain injury. Archives of Clinical Neuropsy-
chology, 29(5), 456–466. doi:10.1093/arclin/acu026
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D.,
Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE
complex ideational material A measure of receptive
language or performance validity? Psychological Injury
and Law, 9, 112–120. doi:10.1007/s12207-016-9254-6
Erdodi, L. A., Tyson, B. T., Shahein, A., Lichtenstein, J. D.,
Abeare, C. A., Pelletiere, C. L., … Roth, R. M. (2017). The
power of timing: Adding a time-to-completion cutoff to
the Word Choice Test and Recognition Memory Test
improves classification accuracy. Journal of Clinical and
Experimental Neuropsychology, 39(4), 369–383. doi:10.1080/
13803395.2016.1230181 [Advance online publication]
Etherton, J. L., Bianchini, K. J., Heinly, M. T., & Greve, K. W.
(2006). Pain, malingering, and performance on the
WAIS-III processing speed index. Journal of Clinical and
Experimental Neuropsychology, 28, 1218–1237. doi:10.1080/
13803390500346595
Fazio, R. L., Denning, J. H., & Denney, R. L. (2017). TOMM
Trial 1 as a performance validity indicator in a criminal
forensic sample. The Clinical Neuropsychologist, 31(1):
251–267.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994).
Validation of malingered amnesia measures with a large
clinical sample. Psychological Assessment, 6, 218–224.
doi:10.1037//1040-3590.6.3.218
Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-
offs on psychometric indicators of negative response bias:
A methodological commentary with recommendation.
Archives of Clinical Neuropsychology, 19, 533–541.
Greve, K. W., Bianchini, K. J., Mathias, C. W., Houston, R. J.,
& Crouch, J. A. (2002). Detecting malingered neurocogni-
tive dysfunction with the Wisconsin card sorting test:
A preliminary investigation in traumatic brain injury. The
Clinical Neuropsychologist, 16(2), 179–191.
Greve, K. W., Curtis, K. L., Bianchini, K. J., & Ord, J. S.
(2009). Are the original and second edition of the
California verbal learning test equally accurate in detecting
malingering? Assessment, 16(3), 237–248.
Grimes, D. A., & Schulz, K. F. (2005). Refining clinical
diagnosis with likelihood ratios. The Lancet, 365(9469),
1500–1505. doi:10.1016/s0140-6736(05)66422-7
Hayward, L., Hall, W., Hunt, M., & Zubrick, S. R. (1987). Can
localized brain impairment be simulated on neuropsycho-
logical test profiles? Australian and New Zealand Journal
of Psychiatry, 21, 87–93. doi:10.3109/00048678709160904
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004).
Revised comprehensive norms for an expanded Halstead-
Reitan battery: Demographically adjusted neuropsychologi-
cal norms for African American and Caucasian adults. Lutz,
FL: Psychological Assessment Resources.
Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., &
Brennan, A. (2005). WAIS digit-span-based indicators of
malingered neurocognitive dysfunction: Classification
accuracy in traumatic brain injury. Assessment, 12(4),
429–444.
Hilsabeck, R. C., Gordon, S. N., Hietpas-Wilson, T., &
Zartman, A. L. (2011). Use of trial 1 of the test of memory
malingering (TOMM) as a screening measure of effort: Sug-
gested discontinuation rules. The Clinical Neuropsychologist,
25(7), 1228–1238. doi:10.1080/13854046.2011.589409
Horner, M. D., Bedwell, J. S., & Duong, A. (2006). Abbrevi-
ated form of the test of memory malingering. International
Journal of Neuroscience, 116, 1181–1186. doi:10.1080/
00207450500514029
Iverson, G. L., & Binder, L. M. (2000). Detecting exaggeration
and malingering in neuropsychological assessment. Journal
of Head Trauma Rehabilitation, 15(2), 829–858.
doi:10.1097/00001199-200004000-00006
Iverson, G. L., & Franzen, M. D. (1994). The recognition
memory test, digit span, and Knox cube test as markers
of malingered memory impairment. Assessment, 1(4),
323–334.
Jones, A. (2013). Test of memory malingering: Cutoff scores
for psychometrically defined malingering groups in a
military sample. The Clinical Neuropsychologist, 27(6),
1043–1059. doi:10.1080/13854046.2013.804949
Killgore, W. D., & DellaPietra, L. (2000). Using the WMS-III
to detect malingering: Empirical validation of the rarely
missed index (RMI). Journal of Clinical and Experimental
Neuropsychology, 22(6), 761–771. doi:10.1076/jcen.22.6.
761.960
Kim, M. S., Boone, K. B., Victor, T., Marion, S. D., Amano, S.,
Cottingham, M. E., … Zeller, M. A. (2010). The Warrington
recognition memory test for words as a measure of
response bias: Total score and response time cutoffs
developed on “real world” credible and noncredible sub-
jects. Archives of Clinical Neuropsychology, 25, 60–70.
doi:10.1093/arclin/acp088
12 L. A. ERDODI ET AL.
Kim, N., Boone, K. B., Victor, T., Lu, P., Keatinge, C., &
Mitchell, C. (2010). Sensitivity and specificity of a
digit symbol recognition trial in the identification of
response bias. Archives of Clinical Neuropsychology, 25,
420–428.
Kulas, J. F., Axelrod, B. N., & Rinaldi, A. R. (2014). Cross-
validation of supplemental test of memory malingering
scores as performance validity measures. Psychological
Injury and Law, 7(3), 236–244. doi:10.1007/s12207-014-
9200-4
Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T.,
Pancholi, S., Bhagwat, A., & French, L. M. (2013). Clinical
utility of the Conners’ continuous performance test-II to
detect poor effort in U.S. military personnel following
traumatic brain injury. Psychological Assessment, 25(2),
339–352. doi:10.1037/a0030915
Larrabee, G. J. (2003). Detection of malingering using atypical
performance patterns on standard neuropsychological
tests. The Clinical Neuropsychologist, 17(3), 410–425.
doi:10.1076/clin.17.3.410.18089
Larrabee, G. J. (2012). Assessment of malingering. In G. J.
Larrabee (Ed.), Forensic neuropsychology: A scientific
approach (2nd ed., pp. 117–159). New York, NY: Oxford
University Press.
Larrabee, G. J. (2014). False-positive rates associated with the
use of multiple performance and symptom validity
tests. Archives of Clinical Neuropsychology, 29, 364–373.
doi:10.1093/arclin/acu019
Lezak, M. D. (1995). Neuropsychological assessment. New
York, NY: Oxford University Press.
Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2017).
Introducing a forced-choice recognition task to the
California Verbal Learning Test–Children’s Version. Child
Neuropsychology, 23(3): 284–299. doi:10.1080/09297049.
2015.1135422
Lichtenstein, J. D., Erdodi, L. A., Rai, J. K., Mazur-Mosiewicz,
A., & Flaro, L. (2016). Wisconsin card sorting test embed-
ded validity indicators developed for adults can be extended
to children. Child Neuropsychology, 1–14. doi:10.1080/
09297049.2016.1259402 [Advance online publication]
Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003).
Effectiveness of the Rey-Osterrieth complex figure test
and the Meyers and Meyers recognition trial in the detec-
tion of suspect effort. The Clinical Neuropsychologist,
17(3), 426–440.
Miller, J. B., Millis, S. R., Rapport, L. J., Bashem, J. R., Hanks,
R. A., & Axelrod, B. N. (2011). Detection of insufficient
effort using the advanced clinical solutions for the
Wechsler Memory scale. The Clinical Neuropsychologist,
25(1), 160–172. doi:10.1080/13854046.2010.533197
Morris, J. C., Heyman, A., Mohs, R. C., Hughes, J. P., van
Belle, G., Fillenbaum, G., … Clark, C. (1989). The
consortium to establish a registry for Alzheimer’s disease
(CERAD). Part I. Clinical and neuropsychological
assessment of Alzheimer’s disease. Neurology, 39(9),
1159–1165.
Nelson, N. W., Boone, K., Dueck, A., Wagener, L., Lu, P., &
Grills, C. (2003). The relationship between eight measures
of suspect effort. The Clinical Neuropsychologist, 17(2),
263–272. doi:10.1076/clin.17.2.263.16511
O’Bryant, S. E., Engel, L. R., Kleiner, J. S., Vasterling, J. J., &
Black, F. W. (2007). Test of memory malingering (TOMM)
trial 1 as a screening measure for insufficient effort. The
Clinical Neuropsychologist, 21, 511–521.
Ord, J. S., Boettcher, A. C., Greve, K. J., & Bianchini, K. J.
(2010). Detection of malingering in mild traumatic brain
injury with the Conners’ Continuous performance test-II.
Journal of Clinical and Experimental Neuropsychology,
32(4), 380–387. doi:10.1080/13803390903066881
Pearson. (2009). Advanced clinical solutions for the WAIS-IV
and WMS-IV: Technical manual. San Antonio, TX:
Author.
Randolph, C. (1998). Repeatable Battery for the Assessment of
Neuropsychological Status (RBANS): Manual. San Antonio,
TX: Psychological Corporation.
Reedy, S. D., Boone, K. B., Cottingham, M. E., Glaser, D. F.,
Lu, P. H., Victor, T. L., … Wright, M. J. (2013). Cross
validation of the Lu and colleagues (2003) Rey-Osterrieth
complex figure test effort equation in a large known-group
sample. Archives of Clinical Neuropsychology, 28, 30–37.
doi:10.1093/arclin/acs106
Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999).
Diagnostic criteria for malingered neurocognitive
dysfunction: Proposed standards for clinical practice and
research. The Clinical Neuropsychologist, 13(4), 545–561.
doi:10.1076/1385-4046(199911)13:04;1-y;ft545
Spencer, R. J., Axelrod, B. N., Drag, L. L., Waldron-Perrine,
B., Pangilinan, P. H., & Bieliauskas, L. A. (2013).
WAIS-IV reliable digit span is no more accurate than age
corrected scaled score as an indicator of invalid perfor-
mance in a veteran sample undergoing evaluation for
mTBI. The Clinical Neuropsychologist, 27(8), 1362–1372.
doi:10.1080/13854046.2013.845248
Sugarman, M. A., & Axelrod, B. N. (2015). Embedded
measures of performance validity using verbal fluency tests
in a clinical sample. Applied Neuropsychology: Adult, 22(2),
141–146.
Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin card
sorting test in the detection of malingering in student
simulator and patient samples. Journal of Clinical and
Experimental Neuropsychology, 21(5), 701–708.
Tombaugh, T. N. (1996). Test of Memory Malingering.
New York, NY: Multi-Health Systems.
Trueblood, W. (1994). Qualitative and quantitative
characteristics of malingered and other invalid WAIS-R
and clinical memory data. Journal of Clinical and Experimental
Neuropsychology, 16(4), 597–607. doi:10.1080/
01688639408402671
Warrington, E. K. (1984). Recognition Memory Test manual.
Berkshire, UK: NFER-Nelson.
Wisdom, N. M., Brown, W. L., Chen, D. K., & Collins, R. L.
(2012). The use of all three tests of memory malingering
trials in establishing the level of effort. Archives of
Clinical Neuropsychology, 27, 208–212. doi:10.1093/arclin/
acr107
Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee,
G. J., & Sweet, J. J. (2010). Effort indicators within the
California verbal learning test-II (CVLT-II). The Clinical
Neuropsychologist, 24(1), 153–168. doi:10.1080/
13854040903107791
APPLIED NEUROPSYCHOLOGY: ADULT 13
... A potential solution for improving an instrument's precision without increasing the related costs (administration and scoring time, complex interpretive algorithms, logistic regression equations, etc.) is critical item analysis. The assumption underlying critical item analysis is that predictive power is not equally distributed along a scale: certain items contribute more information about the target construct than others (Dunn et al., 2021; Erdodi, Tyson, et al., 2018). Identifying a small subset of items with comparable, better, or differential predictive power can abbreviate the testing process, improve classification accuracy or, ideally, both (Denning, 2012). ...
... This engineered method variance in criterion grouping was developed to protect against instrumentation artifacts (Campbell & Fiske, 1959), and provide an empirical estimate of the generalizability of findings. Based on previous research (Dunn et al., 2021; Erdodi, Tyson, et al., 2018), we hypothesized that critical items would improve both the sensitivity (by identifying invalid response sets missed by traditional cutoffs) and specificity (by identifying potentially valid response sets that failed the traditional cutoffs) of the WCT. ...
... The measures in this study are available to qualified users through the test publishers. Critical item analysis on the WCT was performed based on the string of items (CR-7, CR-5, and CR-3) identified by the original study (Erdodi, Tyson, et al., 2018). The actual item numbers are not disclosed here to protect test security. ...
Article
Full-text available
Objective: This study was designed to replicate previous research on critical item analysis within the Word Choice Test (WCT). Method: Archival data were collected from a mixed clinical sample of 119 consecutively referred adults (Mage = 51.7, Meducation = 14.7). The classification accuracy of the WCT was calculated against psychometrically defined criterion groups. Results: Critical item analysis identified an additional 2–5% of the sample that passed traditional cutoffs as noncredible. Passing critical items after failing traditional cutoffs was associated with weaker independent evidence of invalid performance, alerting the assessor to the elevated risk for false positives. Failing critical items in addition to failing select traditional cutoffs increased overall specificity. Non-White patients were 2.5 to 3.5 times more likely to Fail traditional WCT cutoffs, but select critical item cutoffs limited the risk to 1.5-2. Conclusions: Results confirmed the clinical utility of critical item analysis. Although the improvement in sensitivity was modest, critical items were effective at containing false positive errors in general, and especially in racially diverse patients. Critical item analysis appears to be a cost-effective and equitable method to improve an instrument's classification accuracy. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
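The critical-item logic summarized in this abstract (and in the excerpts above) can be sketched in a few lines of code. This is an illustrative mockup, not the WCT's actual scoring: the `classify` helper, its cutoffs, and the mini-sample below are all invented, since the real critical items are withheld to protect test security.

```python
# Sketch of critical item analysis layered on a traditional total-score cutoff.
# All cutoffs and data are hypothetical; they are not the WCT's actual norms.

def classify(total_score, critical_errors, total_cutoff=47, critical_cutoff=2):
    """Flag a response set if the total score is at/below the cutoff OR
    too many critical items are missed (logical OR of the two channels)."""
    fail_total = total_score <= total_cutoff
    fail_critical = critical_errors >= critical_cutoff
    return fail_total or fail_critical

def sensitivity_specificity(cases):
    """cases: list of (total_score, critical_errors, is_invalid) tuples."""
    tp = fn = tn = fp = 0
    for total, crit, invalid in cases:
        flagged = classify(total, crit)
        if invalid:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), tn / (tn + fp)

# Invented mini-sample: the last two invalid cases pass the total-score
# cutoff and are caught only by critical items (the sensitivity gain).
sample = [
    (50, 0, False), (49, 1, False), (48, 0, False),  # credible: pass both channels
    (45, 3, True), (40, 4, True),                    # invalid: caught by total score
    (49, 2, True), (48, 3, True),                    # invalid: caught by critical items only
]
sens, spec = sensitivity_specificity(sample)
```

Because the decision rule is a logical OR of the two channels, critical items can only add detections on top of the total-score cutoff; whether that gain is worth having is then settled by checking the resulting false positive rate in the credible group.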
... Results provided tentative support for the PIH, by confirming an association between self-reported emotional distress and underperformance on cognitive tests. Specifically, patients with elevated somatic or depressive symptoms on the MMPI-2 were 2-2.5 times more likely to fail PVTs, consistent with previous reports (Erdodi & Roth, 2017; Jak et al., 2019; Miskey et al., 2020; Qureshi et al., 2011; Rock et al., 2014; Rowland et al., 2017). At the same time, given the overlap between symptom and performance validity observed within this study and in previous research (Gaasedelen et al., 2019; Gervais et al., 2007; Haggerty et al., 2007; Larrabee et al., 2017; Mathias et al., 2002; Merten et al., 2016; Whiteside et al., 2009), the credibility of clinical elevations on the MMPI-2 scales cannot be taken at face value. ...
Article
Full-text available
This study was designed to examine the relative contribution of symptom (SVT) and performance validity tests (PVTs) to the evaluation of the credibility of neuropsychological profiles in mild traumatic brain injury (mTBI). An archival sample of 326 patients with mTBI was divided into four psychometrically defined criterion groups: pass both SVT and PVT; pass one, but fail the other; and fail both. Scores on performance-based tests of neurocognitive ability and self-reported symptom inventories were compared across the groups. As expected, PVT failure was associated with lower scores on ability tests (ηp² = .042–.184; d = 0.56–1.00; medium-large effects), and SVT failure was associated with higher levels of symptom report (ηp² = .039–.312; d = 0.32–1.58; small-very large effects). However, SVT failure also had a marginal deleterious effect on performance-based measures (ηp² = .017–.023; d = 0.23–0.46; small-medium effects) and elevations on self-report inventories were observed in the context of PVT failure (ηp² = .026; d = 0.23–0.57; small-medium effects). SVT failure was associated with not only inflated symptom reports but also distorted configural patterns of psychopathology. Patients with clinically elevated somatic and depressive symptoms were twice as likely to fail PVTs. Consistent with previous research, SVTs and PVTs provide overlapping, but non-redundant information about the credibility of neuropsychological profiles associated with mTBI. Therefore, they should be used in combination to afford a comprehensive evaluation of cognitive and emotional functioning. The heuristic value of validity tests has both clinical and forensic relevance.
Article
Full-text available
This study was designed to determine the clinical utility of embedded performance validity indicators (EVIs) in adults with intellectual disability (ID) during neuropsychological assessment. Based on previous research, unacceptably high (>16%) base rates of failure (BRFail) were predicted on EVIs using the method of threshold, but not on EVIs based on alternative detection methods. A comprehensive battery of neuropsychological tests was administered to 23 adults with ID (MAge = 37.7 years, MFSIQ = 64.9). BRFail were computed at two levels of cut-offs for 32 EVIs. Patients produced very high BRFail on 22 EVIs (18.2%-100%), indicating unacceptable levels of false positive errors. However, on the remaining ten EVIs BRFail was <16%. Moreover, six of the EVIs had a zero BRFail, indicating perfect specificity. Consistent with previous research, individuals with ID failed the majority of EVIs at high BRFail. However, they produced BRFail similar to cognitively higher functioning patients on select EVIs based on recognition memory and unusual patterns of performance, suggesting that the high BRFail reported in the literature may reflect instrumentation artefacts. The implications of these findings for clinical and forensic assessment are discussed.
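The BRFail screen applied in this abstract reduces to a simple proportion check: in a group presumed credible (here, adults with ID and no incentive to underperform), the proportion failing an EVI estimates its false positive rate. A minimal sketch of that logic follows; the helper names are invented, while the 16% tolerance mirrors the threshold discussed above.

```python
# Base-rate-of-failure screen for an embedded validity indicator (EVI),
# evaluated in a presumed-credible group so every failure is a false positive.

def brfail(failures, n):
    """Base rate of failure: proportion of the credible group failing the EVI."""
    return failures / n

def acceptable(failures, n, tolerance=0.16):
    """Retain an EVI only if its BRFail stays at or below the tolerance,
    i.e. its specificity in this population is at least 1 - tolerance."""
    return brfail(failures, n) <= tolerance

# With n = 23 (the sample size in the abstract): 2 failures is tolerable,
# while 10 failures marks the indicator as unusable in this population.
keep = acceptable(2, 23)
reject = acceptable(10, 23)
```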
Article
Full-text available
Base rates of failure (BRFail) on performance validity tests (PVTs) were examined in university students with limited English proficiency (LEP). BRFail was calculated for several free-standing and embedded PVTs. All free-standing PVTs and certain embedded indicators were robust to LEP. However, LEP was associated with unacceptably high BRFail (20–50%) on several embedded PVTs with high levels of verbal mediation (even multivariate models of PVTs could not contain BRFail). In conclusion, failing free-standing/dedicated PVTs cannot be attributed to LEP. However, the elevated BRFail on several embedded PVTs in university students suggests an unacceptably high overall risk of false positives associated with LEP.
Article
Full-text available
Objective: The study was designed to expand on the results of previous investigations on the D-KEFS Stroop as a performance validity test (PVT), which produced diverging conclusions. Method: The classification accuracy of previously proposed validity cutoffs on the D-KEFS Stroop was computed against four different criterion PVTs in two independent samples: patients with uncomplicated mild TBI (n = 68) and disability benefit applicants (n = 49). Results: Age-corrected scaled scores (ACSSs) ≤6 on individual subtests often fell short of specificity standards. Making the cutoffs more conservative improved specificity, but at a significant cost to sensitivity. In contrast, multivariate models (≥3 failures at ACSS ≤6 or ≥2 failures at ACSS ≤5 on the four subtests) produced good combinations of sensitivity (.39-.79) and specificity (.85-1.00), correctly classifying 74.6-90.6% of the sample. A novel validity scale, the D-KEFS Stroop Index, correctly classified between 78.7% and 93.3% of the sample. Conclusions: A multivariate approach to performance validity assessment provides a methodological safeguard against sample- and instrument-specific fluctuations in classification accuracy, strikes a reasonable balance between sensitivity and specificity, and mitigates the invalid before impaired paradox.
Article
Full-text available
Objective: Replicate previous research on Logical Memory Recognition (LMRecog) and perform a critical item analysis. Method: Performance validity was psychometrically operationalized in a mixed clinical sample of 213 adults. Classification accuracy of the LMRecog and nine critical items (CR-9) was computed. Results: LMRecog ≤20 produced a good combination of sensitivity (.30-.35) and specificity (.89-.90). CR-9 ≥5 and ≥6 had comparable classification accuracy. CR-9 ≥5 increased sensitivity by 4% over LMRecog ≤20; CR-9 ≥6 increased specificity by 6–8% over LMRecog ≤20; CR-9 ≥7 increased specificity by 8–15%. Conclusions: Critical item analysis enhances the classification accuracy of the optimal LMRecog cutoff (≤20).
Article
Full-text available
Objective: This project was designed to cross-validate existing performance validity cutoffs embedded within measures of verbal fluency (FAS and animals) and develop new ones for the Emotion Word Fluency Test (EWFT), a novel measure of category fluency. Method: The classification accuracy of the verbal fluency tests was examined in two samples (70 cognitively healthy university students and 52 clinical patients) against psychometrically defined criterion measures. Results: A demographically adjusted T-score of ≤31 on the FAS was specific (.88–.97) to noncredible responding in both samples. Animals T ≤ 29 achieved high specificity (.90–.93) among students at .27–.38 sensitivity. A more conservative cutoff (T ≤ 27) was needed in the patient sample for a similar combination of sensitivity (.24–.45) and specificity (.87–.93). An EWFT raw score ≤5 was highly specific (.94–.97) but insensitive (.10–.18) to invalid performance. Failing multiple cutoffs improved specificity (.90–1.00) at variable sensitivity (.19–.45). Conclusions: Results help resolve the inconsistency in previous reports, and confirm the overall utility of existing verbal fluency tests as embedded validity indicators. Multivariate models of performance validity assessment are superior to single indicators. The clinical utility and limitations of the EWFT as a novel measure are discussed.
Article
Full-text available
In this study we attempted to replicate the classification accuracy of the newly introduced Forced Choice Recognition trial (FCR) of the Rey Complex Figure Test (RCFT) in a clinical sample. We administered the RCFT FCR and the earlier Yes/No Recognition trial from the RCFT to 52 clinically referred patients as part of a comprehensive neuropsychological test battery and incentivized a separate control group of 83 university students to perform well on these measures. We then computed the classification accuracies of both measures against criterion performance validity tests (PVTs) and compared results between the two samples. At previously published validity cutoffs (≤16 & ≤17), the RCFT FCR remained specific (.84-1.00) to psychometrically defined non-credible responding. Simultaneously, the RCFT FCR was more sensitive to examinees' natural variability in visual-perceptual and verbal memory skills than the Yes/No Recognition trial. Even after being reduced to a seven-point scale (18-24) by the validity cutoffs, both RCFT recognition scores continued to provide clinically useful information on visual memory. This is the first study to validate the RCFT FCR as a PVT in a clinical sample. Our data also support its use for measuring cognitive ability. Replication studies with more diverse samples and different criterion measures are still needed before large-scale clinical application of this scale.
Article
Full-text available
This study was designed to investigate the potential of extreme scores on the Behavioral Rating Inventory of Executive Function-Adult Self-Report Version (BRIEF-A-SR) to serve as validity indicators. The BRIEF-A-SR was administered to 73 university students and 50 clinically referred adults. In the student sample, symptom validity was operationalized as the outcome on the Inventory of Problems (IOP-29). In the patient sample, performance validity was operationalized as the outcome on a combination of free-standing and embedded indicators. The BRIEF-A-SR had better classification accuracy in the student sample (.13–.56 sensitivity at .88–.95 specificity) compared with the patient sample (.22–.44 sensitivity at .85–.97 specificity). Combining individual cutoffs into a multivariate model improved specificity (.93) and stabilized sensitivity (.33) in the clinical sample. Failing the newly introduced cutoffs (T ≥ 65/T ≥ 80 in the student sample and T ≥ 80/T ≥ 90 in the clinical sample) was associated with failure on performance validity tests and elevations on other symptom inventories. Results provide preliminary support for an alternative method for establishing the credibility of symptom reports both within the BRIEF-A-SR and other inventories. Pending replication by future research, the newly proposed cutoffs could provide a much needed psychometric safeguard against over-diagnosing neuropsychiatric disorders due to undetected symptom exaggeration.
Article
Full-text available
Objective: This study was designed to evaluate the classification accuracy of the recently introduced forced-choice recognition trial of the Hopkins Verbal Learning Test - Revised (FCRHVLT-R) as a performance validity test (PVT) in a clinical sample. Time-to-completion (T2C) for the FCRHVLT-R was also examined. Method: Forty-three students were assigned to either the control or the experimental malingering (expMAL) condition. Archival data were collected from 52 adults clinically referred for neuropsychological assessment. Invalid performance was defined using expMAL status, two free-standing PVTs, and two validity composites. Results: Among students, FCRHVLT-R ≤11 or T2C ≥45 seconds was specific (0.86-0.93) to invalid performance. Among patients, FCRHVLT-R ≤11 was specific (0.94-1.00), but relatively insensitive (0.38-0.60) to non-credible responding. T2C ≥35 s produced notably higher sensitivity (0.71-0.89), but variable specificity (0.83-0.96). The T2C achieved superior overall correct classification (81-86%) compared to the accuracy score (68-77%). The FCRHVLT-R provided incremental utility in performance validity assessment compared to previously introduced validity cutoffs on Recognition Discrimination. Conclusions: Combined with T2C, the FCRHVLT-R has the potential to function as a quick, inexpensive, and effective embedded PVT. The time cutoff effectively attenuated the low ceiling of the accuracy scores, increasing sensitivity by 19%. Replication in larger and more geographically and demographically diverse samples is needed before the FCRHVLT-R can be endorsed for routine clinical application.
Article
Full-text available
Past studies have examined the ability of the Wisconsin Card Sorting Test (WCST) to discriminate valid from invalid performance in adults using both individual embedded validity indicators (EVIs) and multivariate approaches. This study was designed to investigate whether the two most stable of these indicators—failures to maintain set (FMS) and the logistic regression equation S-BLRE—can be extended to pediatric populations. The classification accuracy for FMS and S-BLRE was examined in a mixed clinical sample of 226 children aged 7 to 17 years (64.6% male, MAge = 13.6 years) against a combination of established performance validity tests (PVTs). The results show that at adult cutoffs, FMS and S-BLRE produce an unacceptably high failure rate (33.2% and 45.6%) and low specificity (.55–.72), but an upward adjustment in cutoffs significantly improves classification accuracy. Defining Pass as <2 and Fail as ≥4 on FMS results in consistently good specificity (.89–.92) but low and variable sensitivity (.00–.33). Similarly, cutting the S-BLRE distribution at 3.68 produces good specificity (.90–.92) but variable sensitivity (.06–.38). Passing or failing FMS or S-BLRE is unrelated to age, gender and IQ. The data from this study suggest that in a pediatric sample, adjusted cutoffs on the FMS and S-BLRE ensure good specificity, but with low or variable sensitivity. Thus, they should not be used in isolation to determine the credibility of a response set. At the same time, they can make valuable contributions to pediatric neuropsychology by providing empirically-supported, expedient and cost-effective indicators to enhance performance validity assessment.
Article
Full-text available
Elevations on certain Conners' CPT-II scales are known to be associated with invalid responding. However, scales and cutoffs vary across studies. In addition, the methodology behind developing performance validity tests (PVTs) has been challenged for mistaking true impairment for noncredible presentation. Using ability-based tests as a PVT makes clinicians especially vulnerable to this criticism. The present study examined the ability of CPT-II to dissociate effort from impairment in 47 adults clinically referred for neuropsychological assessment. CPT-II scales previously identified as PVTs (Omissions, Commissions, Hit Reaction Time SE, Variability, and Perseverations) produced classification accuracies hovering around .50 sensitivity at .90 specificity. The subsample that failed these PVTs performed within normal range on other tests of working memory, processing speed, visual attention, and executive function. Results suggest that the select CPT-II based PVTs are sensitive to invalid responding, and are associated with depression and anxiety, but are unrelated to cognitive functioning.
Article
Full-text available
Introduction: The Recognition Memory Test (RMT) and Word Choice Test (WCT) are structurally similar, but psychometrically different. Previous research demonstrated that adding a time-to-completion cutoff improved the classification accuracy of the RMT. However, the contribution of WCT time-cutoffs to improve the detection of invalid responding has not been investigated. The present study was designed to evaluate the classification accuracy of time-to-completion on the WCT compared to the accuracy score and the RMT. Method: Both tests were administered to 202 adults (Mage = 45.3 years, SD = 16.8; 54.5% female) clinically referred for neuropsychological assessment in counterbalanced order as part of a larger battery of cognitive tests. Results: Participants obtained lower and more variable scores on the RMT (M = 44.1, SD = 7.6) than on the WCT (M = 46.9, SD = 5.7). Similarly, they took longer to complete the recognition trial on the RMT (M = 157.2 s, SD = 71.8) than the WCT (M = 137.2 s, SD = 75.7). The optimal cutoff on the RMT (≤43) produced .60 sensitivity at .87 specificity. The optimal cutoff on the WCT (≤47) produced .57 sensitivity at .87 specificity. Time-cutoffs produced comparable classification accuracies for both RMT (≥192 s; .48 sensitivity at .88 specificity) and WCT (≥171 s; .49 sensitivity at .91 specificity). They also identified an additional 6-10% of the invalid profiles missed by accuracy score cutoffs, while maintaining good specificity (.93-.95). Functional equivalence was reached at accuracy scores ≤43 (RMT) and ≤47 (WCT) or time-to-completion ≥192 s (RMT) and ≥171 s (WCT). Conclusions: Time-to-completion cutoffs are valuable additions to both tests. They can function as independent validity indicators or enhance the sensitivity of accuracy scores without requiring additional measures or extending standard administration time.
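The decision rule described in this abstract combines two independent channels per test: an accuracy-score cutoff and a time-to-completion cutoff. A minimal sketch follows, using the cutoffs reported above (RMT ≤43 or ≥192 s; WCT ≤47 or ≥171 s); the helper functions and the example examinee are invented for illustration.

```python
# Two-channel validity decision: a response set fails if EITHER the accuracy
# score falls at/below the score cutoff OR the recognition trial takes at
# least the time cutoff. Cutoff values are taken from the abstract.

CUTOFFS = {
    "RMT": {"score": 43, "seconds": 192},
    "WCT": {"score": 47, "seconds": 171},
}

def flags(test, score, seconds):
    """Which validity channels does a response set fail on the given test?"""
    c = CUTOFFS[test]
    return {"score": score <= c["score"], "time": seconds >= c["seconds"]}

def fails(test, score, seconds):
    """Overall decision: failing either channel flags the response set."""
    return any(flags(test, score, seconds).values())

# An intact accuracy score paired with a very slow recognition trial is
# caught by the time channel alone -- the incremental 6-10% of invalid
# profiles the abstract reports.
result = flags("WCT", score=49, seconds=200)
```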
Article
Full-text available
Objective: The present study was designed to examine the potential of the Boston Naming Test - Short Form (BNT-15) to provide an objective estimate of English proficiency. A secondary goal was to examine the effect of limited English proficiency (LEP) on neuropsychological test performance. Method: A brief battery of neuropsychological tests was administered to 79 bilingual participants (40.5% male, MAge = 26.9, MEducation = 14.2). The majority (n = 56) were English dominant (EN), and the rest were Arabic dominant (AR). The BNT-15 was further reduced to 10 items that best discriminated between EN and AR (BNT-10). Participants were divided into low, intermediate, and high English proficiency subsamples based on BNT-10 scores (≤6, 7-8, and ≥9). Performance across groups was compared on neuropsychological tests with high and low verbal mediation. Results: The BNT-15 and BNT-10 respectively correctly identified 89 and 90% of EN and AR participants. Level of English proficiency had a large effect (partial η² = .12-.34; Cohen's d = .67-1.59) on tests with high verbal mediation (animal fluency, sentence comprehension, word reading), but no effect on tests with low verbal mediation (auditory consonant trigrams, clock drawing, digit-symbol substitution). Conclusions: The BNT-15 and BNT-10 can function as indices of English proficiency and predict the deleterious effect of LEP on neuropsychological tests with high verbal mediation. Interpreting low scores on such measures as evidence of impairment in examinees with LEP would likely overestimate deficits.
Article
Full-text available
Scores on the Complex Ideational Material (CIM) were examined in reference to various performance validity tests (PVTs) in 106 adults clinically referred for neuropsychological assessment. The main diagnostic categories, reflecting a continuum between neurological and psychiatric disorders, were epilepsy, psychiatric disorders, postconcussive disorder, and psychogenic non-epileptic seizures. Cross-validation analyses suggest that in the absence of bona fide aphasia, a raw score ≤9 or T score ≤29 on the CIM is more likely to reflect non-credible presentation than impaired receptive language skills. However, these cutoffs may be associated with unacceptably high false positive rates in patients with longstanding, documented neurological deficits. Therefore, more conservative cutoffs (≤8/23) are recommended in such populations. Contrary to the widely accepted assumption that psychiatric disorders are unrelated to performance validity, results were consistent with the psychogenic interference hypothesis, suggesting that emotional distress increases the likelihood of PVT failures even in the absence of apparent external incentives to underperform on cognitive testing.
Article
Full-text available
Research suggests that select processing speed measures can also serve as embedded validity indicators (EVIs). The present study examined the diagnostic utility of Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) subtests as EVIs in a mixed clinical sample of 205 patients medically referred for neuropsychological assessment (53.3% female, mean age = 45.1). Classification accuracy was calculated against 3 composite measures of performance validity as criterion variables. A PSI ≤79 produced a good combination of sensitivity (.23-.56) and specificity (.92-.98). A Coding scaled score ≤5 resulted in good specificity (.94-1.00), but low and variable sensitivity (.04-.28). A Symbol Search scaled score ≤6 achieved a good balance between sensitivity (.38-.64) and specificity (.88-.93). A Coding-Symbol Search scaled score difference ≥5 produced adequate specificity (.89-.91) but consistently low sensitivity (.08-.12). A 2-tailed cutoff on the Coding/Symbol Search raw score ratio (≤1.41 or ≥3.57) produced acceptable specificity (.87-.93), but low sensitivity (.15-.24). Failing ≥2 of these EVIs produced variable specificity (.81-.93) and sensitivity (.31-.59). Failing ≥3 of these EVIs stabilized specificity (.89-.94) at a small cost to sensitivity (.23-.53). Results suggest that processing speed based EVIs have the potential to provide a cost-effective and expedient method for evaluating the validity of cognitive data. Given their generally low and variable sensitivity, however, they should not be used in isolation to determine the credibility of a given response set. They also produced unacceptably high rates of false positive errors in patients with moderate-to-severe head injury. Combining evidence from multiple EVIs has the potential to improve overall classification accuracy.
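The multivariate aggregation step this abstract describes (failing ≥2 or ≥3 of the EVIs) can be sketched as a failure count compared against a threshold. The three univariate cutoffs below are among those reported in the abstract (only three of the five indicators are included for brevity); the dictionary field names and the example profile are invented.

```python
# Multivariate EVI aggregation: score each embedded validity indicator
# pass/fail against its cutoff, then compare the failure count to a
# threshold. Field names are illustrative, not WAIS-IV variable names.

def evi_failures(scores):
    """Count embedded validity indicator (EVI) failures for one profile."""
    checks = [
        scores["psi"] <= 79,              # Processing Speed Index cutoff
        scores["coding_ss"] <= 5,         # Coding scaled score cutoff
        scores["symbol_search_ss"] <= 6,  # Symbol Search scaled score cutoff
    ]
    return sum(checks)

def multivariate_flag(scores, threshold=3):
    """Flag a profile when the EVI failure count reaches the threshold
    (>=2 is the more sensitive rule, >=3 the more specific one)."""
    return evi_failures(scores) >= threshold

# A profile failing all three indicators is flagged under either rule;
# a single isolated failure is not.
flagged = multivariate_flag({"psi": 75, "coding_ss": 5, "symbol_search_ss": 6})
```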
Article
Full-text available
Complex Ideational Material (CIM) is a sentence comprehension task designed to detect pathognomonic errors in receptive language. Nevertheless, patients with apparently intact language functioning occasionally score in the impaired range. If these instances reflect poor test taking effort, CIM has potential as a performance validity test (PVT). Indeed, in 68 adults medically referred for neuropsychological assessment, CIM was a reliable marker of psychometrically defined invalid responding. A raw score ≤9 or T-score ≤29 achieved acceptable combinations of sensitivity (.34-.40) and specificity (.82-.90) against two reference PVTs, and produced a zero overall false positive rate when scores on all available PVTs were considered. More conservative cutoffs (≤8/ ≤ 23) with higher specificity (.95-1.00) but lower sensitivity (.14-.17) may be warranted in patients with longstanding, documented neurological deficits. Overall, results indicate that in the absence of overt aphasia, poor performance on CIM is more likely to reflect invalid responding than true language impairment. The implications of the clinical interpretation of CIM are discussed.
Article
Objective: To determine the effectiveness of the Test of Memory Malingering Trial 1 (TOMM1) as a freestanding Performance Validity Test (PVT) as compared to the full TOMM in a criminal forensic sample. Method: Participants included 119 evaluees in a Midwestern forensic hospital. Criterion groups were formed based on passing/failing scores on other freestanding PVTs. This resulted in three groups: +MND (Malingered Neurocognitive Dysfunction), who failed two or more freestanding PVTs; possible MND (pMND), who failed one freestanding PVT; and -MND, who failed no other freestanding PVTs. All three groups were compared initially, but only +MND and -MND groups were retained for final analyses. TOMM1 performance was compared to standard TOMM performance using Receiver Operating Characteristic (ROC) analyses. Results: TOMM1 was highly predictive of the standard TOMM decision rules (AUC = .92). Overall accuracy rate for TOMM1 predicting failure on 2 PVTs was quite robust as well (AUC = .80), and TOMM1 ≤ 39 provided acceptable diagnostic statistics (Sensitivity = .68, Specificity = .89). These results were essentially no different from the standard TOMM accuracy statistics. In addition, by adjusting for those strongly suspected of being inaccurately placed into the -MND group (e.g. false negatives), TOMM1 diagnostics slightly improved (AUC = .84) at a TOMM1 ≤ 40 (sensitivity = .71, specificity = .94). Conclusions: Results support use of TOMM1 in a criminal forensic setting where accuracy, shorter evaluation times, and more efficient use of resources are often critical in informing legal decision-making.