APPLIED NEUROPSYCHOLOGY: ADULT
http://dx.doi.org/10.1080/23279095.2017.1298600
Utility of critical items within the Recognition Memory Test and
Word Choice Test
Laszlo A. Erdodi (a,b), Bradley T. Tyson (c,b), Christopher A. Abeare (a), Brandon G. Zuccato (a), Jaspreet K. Rai (a), Kristian R. Seke (a), Sanya Sagar (a), and Robert M. Roth (b)
(a) Department of Psychology, University of Windsor, Windsor, Ontario, Canada; (b) Department of Psychiatry, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA; (c) Western Washington Medical Group, Everett, Washington, USA
ABSTRACT
This study was designed to examine the clinical utility of critical items within the Recognition
Memory Test (RMT) and the Word Choice Test (WCT). Archival data were collected from a mixed
clinical sample of 202 patients clinically referred for neuropsychological testing (54.5% male; mean
age = 45.3 years; mean level of education = 13.9 years). The credibility of a given response set was
psychometrically defined using three separate composite measures, each of which was based on
multiple independent performance validity indicators. Critical items improved the classification
accuracy of both tests. They increased sensitivity by correctly identifying an additional 2–17% of
the invalid response sets that passed the traditional cutoffs based on total score. They also
increased specificity by providing additional evidence of noncredible performance in response sets
that failed the total score cutoff. The combination of failing the traditional cutoff, but passing
critical items was associated with increased risk of misclassifying the response set as invalid. Critical
item analysis enhances the diagnostic power of both the RMT and WCT. Given that critical items
require no additional test material or administration time, but help reduce both false positive and
false negative errors, they represent a versatile, valuable, and time- and cost-effective supplement
to performance validity assessment.
KEYWORDS: Critical item analysis; performance validity testing; Recognition Memory Test; Word Choice Test
Introduction
The current climate of health care is characterized by
increasing emphasis on time- and cost-effective service
delivery. As a result, neuropsychologists are under
growing pressure to administer shorter test batteries.
In order to maximize the quantity and quality of
information gleaned from these brief assessments, the
strategic selection of assessment tools has never been
more important. This shift toward a more resource-
conscious model of assessment is reflected in the
development of abbreviated batteries (e.g., Repeatable
Battery for the Assessment of Neuropsychological
Status; Randolph, 1998) and shorter versions of existing
tests (e.g., Boston Naming Test-15: Morris et al., 1989;
California Verbal Learning Test Second Edition
[CVLT-II] Short Form: Delis, Kramer, Kaplan, & Ober,
2000).
In addition to conducting an adequate assessment of
cognitive functioning, however, neuropsychologists
must also assess performance validity. Indeed, the
clinical utility of neuropsychological testing depends
on the examinee’s ability and willingness to demon-
strate their true ability level (Bigler, 2015), and there
is an emerging consensus in the field that an objective
evaluation of performance validity must be an integral
part of the assessment process (Bush, Heilbronner, &
Ruff, 2014; Chafetz et al., 2015). The administration of
multiple, non-redundant performance validity tests
(PVTs) distributed throughout the assessment has been
identified as the best approach for differentiating
credible from non-credible response sets (Boone, 2009;
Larrabee, 2012).
Given that PVTs provide little information about
cognitive functioning, it is becoming increasingly
important for neuropsychologists to glean information
about performance validity without additional test
material or increased administration and scoring time.
Over the past decades, researchers have explored cre-
ative ways of improving the signal detection properties
of existing PVTs, including the development of new
indicators within existing neuropsychological tests
(Arnold et al., 2005; Erdodi, Tyson, Abeare, et al.,
2016; Greiffenstein, Baker, & Gola, 1994).
Boone, Salazar, Lu, Warner-Chacon, and Razani
(2002) developed a recognition trial for the Rey 15-item
test that adds only about 30 seconds in administration
time, but significantly improves the instrument’s
sensitivity while maintaining high specificity. Similarly,
the first trial of the Test of Memory Malingering
(TOMM; Tombaugh, 1996), although initially conceived
as an “inactive” learning trial, has been shown to
effectively discriminate valid from invalid cognitive test
performance. Specifically, Trial 1 has been found to
have adequate sensitivity and specificity against the
standard, full administration of the TOMM (Bauer,
O’Bryant, Lynch, McCaffrey, & Fisher, 2007; Fazio,
Denning, & Denney, 2017; Hilsabeck, Gordon,
Hietpas-Wilson, & Zartman, 2011; Horner, Bedwell, &
Duong, 2006; Wisdom, Brown, Chen, & Collins,
2012), and other stand-alone PVTs used in isolation
(Denning, 2012) and in combination (Jones, 2013;
Kulas, Axelrod, & Rinaldi, 2014). Based on this
evidence, some researchers suggested that Trial 1 of
the TOMM can function as a stand-alone PVT (Bauer
et al., 2007; Hilsabeck et al., 2011; Horner et al., 2006;
O’Bryant, Engel, Kleiner, Vasterling, & Black, 2007).
Another example of this “after-market enhancement”
of an existing PVT was the introduction of a time-cutoff
to the Recognition Memory Test (RMT; Warrington,
1984), which effectively differentiated valid and invalid
responders independent of the traditional accuracy
score. It also boosted the RMT’s overall sensitivity when
combined with the accuracy score while maintaining high specificity (M. S. Kim, Boone, Victor, Marion, et al., 2010). Similarly, it was recently shown that adding
a time-cutoff to the Word Choice Test (WCT; Pearson,
2009) not only enhanced the sensitivity of the accuracy
score, but also functioned as an independent validity
indicator (Erdodi, Tyson, Shahein, et al., 2017).
The present study was designed to explore the
clinical utility of critical items within the RMT and
WCT. Previous research suggests that while the RMT
is more difficult than the WCT at the raw score level,
once the cutoffs are adjusted to account for the
difference, the two instruments have comparable
classification accuracy (Davis, 2014; Erdodi, Kirsch,
Lajiness-O’Neill, Vingilis, & Medoff, 2014). However,
despite the imperfect classification accuracy of
traditional cutoffs based on RMT and WCT total scores,
the discriminant power of item-level data has not been
investigated within these tests.
Cutoffs established by earlier studies (RMT ≤39: Iverson & Franzen, 1994; RMT ≤42: M. S. Kim, Boone, Victor, Marion, et al., 2010; Erdodi, Kirsch, et al., 2014; WCT ≤42: Barhon, Batchelor, Meares, Chekaluk, & Shores, 2015; WCT ≤46: Davis, 2014) and the technical manual (WCT ≤32–47; Pearson, 2009) imply
that one can provide a correct answer on the majority of
the items and still fail these PVTs. Moreover, the upper
limit of theoretical chance level responding is 32. In
other words, random responding and a 64% overall
accuracy can coexist, suggesting that a large proportion
of test items have poor negative predictive power.
Since test items tend to vary in difficulty level and
hence, in their relative contribution to the diagnostic
accuracy of the overall scale, a critical item analysis has
the potential to increase the clinical utility of the instru-
ment by identifying items that best discriminate between
credible and noncredible response sets. It has long been
recognized in psychometric theory that shorter tests can
be more reliable than longer tests if they are based on
carefully calibrated items (Embretson, 1996). Although
averaging performance across a large number of item
responses with heterogeneous item characteristic curves
is common practice in test development, it can weaken
the measurement model. Conversely, reducing the num-
ber of test items to a select few that have the strongest
relationship with the target construct can preserve
(Bilker, Wierzbicki, Brensinger, Gur, & Gur, 2014) or
even improve (Erdodi, Jongsma, & Issa, 2017) overall
diagnostic power. Therefore, we hypothesized that
critical items would enhance the overall classification
accuracy of the RMT and WCT by increasing either
the sensitivity or the specificity of the total score to
invalid responding.
Cutoff scores based on such critical items can offer
additional information about performance validity that
is non-redundant with results obtained from cutoffs
based on the total score. This “second opinion,” in turn,
can be used to confirm or challenge the outcome based
on traditional cutoffs. The availability of multiple indica-
tors of performance validity within a single PVT is
especially useful in the interpretation of scores that
fall in the indeterminate range (“near passes”; Bigler,
2012, 2015), where the classification of an examinee’s
performance as either Pass or Fail is particularly difficult.
Method
Participants
The sample consisted of 202 patients (54.5% female,
87.1% right-handed) clinically referred for neuropsy-
chological testing at a northeastern academic medical
center. Mean age was 45.3 years (SD = 16.8), while mean level of education was 13.9 years (SD = 2.7). The most common diagnostic categories were psychiatric (44.1%), traumatic brain injury (37.1%), mixed neurological (15.8%), or general medical (3%) conditions. Overall, patients reported a mild level of depression (BDI-II: M = 16.9, SD = 12.0) and anxiety (BAI: M = 13.1, SD = 10.0).
Materials
A core battery of neuropsychological tests was adminis-
tered to the majority of the sample (Table 1). However,
the exact test list varied based on the unique assessment
needs of individual patients. The main criterion PVT
was a composite of eleven independent validity indica-
tors labeled “Validity Index Eleven” (VI-11). The
VI-11 reflects the traditional approach of counting the
number of PVT failures along dichotomized (Pass/Fail)
cutoffs (Boone et al., 2002; M. S. Kim, Boone, Victor,
Marion et al., 2010; Nelson et al., 2003), a well-
established practice that represents the conceptual
foundation of performance validity assessment (Boone,
2013; Larrabee, 2012).
Some components of the VI-11 had multiple
different indicators (Table 2). Failing any of these was counted as an overall Fail (= 1). Failing multiple indicators within the same component did not change the outcome (still = 1). Missing scores were counted as Pass (= 0). The heterogeneity in stimulus properties, testing
paradigm, sensory modality, and number of indicators
contributing to the final outcome (i.e., valid vs. invalid)
in each of the constituent PVTs likely results in a non-
linear combination of the cumulative evidence on the
credibility of the overall neurocognitive profile. How-
ever, such method variance is a ubiquitous feature in
performance validity research, and is generally con-
sidered more of a strength than a weakness (Boone,
2007; Iverson & Binder, 2000; Larrabee, 2003, 2014;
Lichtenstein, Erdodi, & Linnea, 2017).
The total value of the VI-11 was computed by summing its components. A VI-11 ≤1 was considered a Pass. Given that the most liberal cutoff available was applied to a relatively high number of constituent PVTs, the model is optimized for sensitivity by design. Therefore, to protect against false positive errors, a higher threshold (≥3) was used to define Fail on the VI-11. A score of two was considered inconclusive and hence, excluded from further analyses involving the VI-11 to preserve the diagnostic purity of the criterion groups (Erdodi & Roth, 2016; Greve & Bianchini, 2004; Lichtenstein, Erdodi, Rai, Mazur-Mosiewicz, & Flaro, 2016; Sugarman & Axelrod, 2015).

Table 1. List of tests administered.

Test name | Abbreviation | Norms | %ADM
Beck Anxiety Inventory | BAI | — | 60.9
Beck Depression Inventory, 2nd edition | BDI-II | — | 89.1
California Verbal Learning Test, 2nd edition | CVLT-II | Manual | 99.0
Complex Ideational Material | CIM | Heaton | 44.6
Conners' Continuous Performance Test, 2nd edition | CPT-II | Manual | 62.3
Letter and Category Fluency Test | FAS & animals | Heaton | 91.1
Finger Tapping Test | FTT | Heaton | 52.5
Recognition Memory Test | RMT | — | 100.0
Rey Complex Figure Test | RCFT | Manual | 90.1
Wechsler Adult Intelligence Scale, 4th edition | WAIS-IV | Manual | 98.0
Wechsler Memory Scale, 4th edition | WMS-IV | Manual | 97.0
Wide Range Achievement Test, 4th edition | WRAT-4 | Manual | 70.8
Wisconsin Card Sorting Test | WCST | Manual | 88.1
Word Choice Test | WCT | — | 100.0

Note. Heaton: demographically adjusted norms published by Heaton, Miller, Taylor, and Grant (2004); Manual: normative data published in the technical manual; %ADM: percentage of the sample to which each test was administered.

Table 2. Base rates of failure for VI-11 components, cutoffs, and references for each indicator.

Test | BR_Fail | Indicator | Cutoff | Reference
Animals | 18.3 | T-score | ≤33 | Hayward, Hall, Hunt, and Zubrick (1987); Sugarman and Axelrod (2015)
CIM | 7.9 | Raw score | ≤9 | Erdodi and Roth (2016); Erdodi, Tyson, Abeare, et al. (2016)
 | | T-score | ≤29 | Erdodi and Roth (2016); Erdodi, Tyson, Abeare, et al. (2016)
CVLT-II | 16.8 | Recognition hits | ≤10 | Greve, Curtis, Bianchini, and Ord (2009); Wolfe et al. (2010)
 | | FCR | ≤15 | Bauer et al. (2007); D. Delis (personal communication, May 2012)
Digit Span | 29.2 | RDS | ≤7 | Greiffenstein et al. (1994); Pearson (2009)
 | | ACSS | ≤6 | Axelrod, Fichteberg, Millis, and Wertheimer (2006); Spencer et al. (2013); Trueblood (1994)
 | | LDF | ≤4 | Heinly, Greve, Bianchini, Love, and Brennan (2005)
FAS | 12.4 | T-score | ≤33 | Curtis, Thompson, Greve, and Bianchini (2008); Sugarman and Axelrod (2015)
Rey-15 | 12.4 | Recall | ≤9 | Lezak (1995); Boone et al. (2002)
RCFT | 34.2 | Copy raw | ≤26 | Lu, Boone, Cozolino, and Mitchell (2003); Reedy et al. (2013)
 | | 3-min raw | ≤9.5 | Lu et al. (2003); Reedy et al. (2013)
 | | TP_Recognition | ≤6 | Lu et al. (2003); Reedy et al. (2013)
 | | Atyp RE | ≥1 | Blaskewitz, Merten, and Brockhaus (2009); Lu et al. (2003)
Symbol Search | 20.8 | ACSS | ≤6 | Etherton, Bianchini, Heinly, and Greve (2006); Erdodi, Abeare, et al. (2017)
WCST | 17.3 | FMS | ≥2 | Larrabee (2003); Suhr and Boyer (1999)
 | | LRE | >1.9 | Greve, Bianchini, Mathias, Houston, and Crouch (2002); Suhr and Boyer (1999)
WMS-IV LM | 19.8 | I ACSS | ≤3 | Bortnik et al. (2010)
 | | II ACSS | ≤4 | Bortnik et al. (2010)
 | | Recognition | ≤20 | Bortnik et al. (2010); Pearson (2009)
WMS-IV VR | 19.3 | Recognition | ≤4 | Pearson (2009)

Note. BR_Fail: base rate of failure (% of the sample that failed one or more indicators within the test); CIM: Complex Ideational Material; CVLT-II: California Verbal Learning Test, 2nd edition; FCR: forced-choice recognition; RDS: reliable digit span; ACSS: age-corrected scaled score; LDF: longest digit span forward; RCFT: Rey Complex Figure Test; TP_Recognition: recognition true positives; Atyp RE: atypical recognition errors; WCST: Wisconsin Card Sorting Test; FMS: failure to maintain set; UE: unique errors; LRE: logistic regression equation; WMS-IV: Wechsler Memory Scale, 4th edition; LM: Logical Memory; VR: Visual Reproduction.
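The aggregation rule can be summarized in a few lines of code. The sketch below is illustrative only (the dictionary-based input format and the component names are our assumptions, not the authors' scoring software), but it implements the dichotomize-and-sum logic exactly as described above:

```python
# Minimal sketch of the VI-11 aggregation rule, assuming per-component
# indicator outcomes are available as booleans (True = indicator failed).
def score_vi11(component_failures):
    """component_failures: dict mapping each of the 11 VI-11 components
    to a list of indicator outcomes. A component contributes 1 point if
    ANY of its indicators is failed (multiple failures still count once);
    absent components contribute 0 (missing scores are treated as Pass)."""
    total = sum(1 for indicators in component_failures.values() if any(indicators))
    if total <= 1:
        return total, "Pass"
    if total == 2:
        return total, "Borderline (excluded from criterion groups)"
    return total, "Fail"  # total >= 3

# Hypothetical example: one Digit Span and one RCFT indicator failed
profile = {"Digit Span": [True, False, False],
           "RCFT": [False, True, False, False],
           "FAS": [False], "Rey-15": [False]}
print(score_vi11(profile))  # -> (2, 'Borderline (excluded from criterion groups)')
```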
As an aggregate measure of several PVTs represent-
ing a wide range of sensory modalities and testing
paradigms, the VI-11 is a representative measure of per-
formance validity that incorporates information from
multiple independent instruments. At the same time,
it is a heterogeneous composite that may introduce a
source of error into the signal detection analyses. To
address that, two new composite measures were
developed, labeled the "Erdodi Index." The first one was constructed by aggregating five forced-choice recognition based PVTs (EI-5_REC), and the second one by aggregating five processing speed based PVTs (EI-5_PSP), following the methodology described by Erdodi and colleagues (Erdodi, Pelletier, & Roth, 2016; Erdodi, Roth, et al., 2014).
The two versions of the EI-5 were designed to mirror the dual nature of the RMT and WCT: the overall recognition accuracy score (EI-5_REC) and the time taken to complete the recognition trial (EI-5_PSP). As such, they serve as modality-specific criterion measures, providing a more nuanced analysis of the RMT's and WCT's
classification accuracy. Previous research found that
the inherent signal detection properties of the reference
PVT alter the classification accuracy of the instrument
under investigation (Erdodi, Tyson, Shahein, et al.,
2017), arguing for methodological pluralism in calibrat-
ing new tests (Erdodi, Abeare, et al., 2017).
In addition, the EI-5s have the advantage of
capturing the underlying continuity in performance
validity by differentiating between “near passes” (Bigler,
2012, 2015) and extreme forms of failure (Table 3).
Two-thirds of the sample obtained values ≤1 on both versions of the EI-5, placing them in the passing range.
Around 20% of the sample obtained EI-5 values of two
or three, indicating either a single failure at the most
conservative cutoff or multiple failures at more liberal
cutoffs. Regardless of the specific combination, this
range of performance starts to raise doubts about the
credibility of the profile, without providing evidence
that is strong enough to render the entire data set
invalid. Therefore, this range was labeled as Borderline,
and excluded from calculating classification accuracy.
An EI-5 value ≥4, however, indicates either multiple
failures at the most liberal cutoffs, or at least two failures
at more conservative cutoffs. As such, this range of
performance provides sufficient evidence to confidently
classify the profile as invalid.
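In code, the EI-5 logic differs from the VI-11 mainly in its graded (0–3) component scores. A sketch follows, using the Animals T-score bands from Table 3 as the worked example; the encoding of the bands is ours:

```python
# Sketch of EI-5 scoring: each of five components earns 0-3 points based
# on increasingly conservative cutoffs (Table 3); the sum is classified.
def band_score(value, bands):
    """bands: list of (predicate, points) pairs ordered from most to
    least severe; returns the points of the first matching band, else 0."""
    for in_band, points in bands:
        if in_band(value):
            return points
    return 0

def classify_ei5(component_points):
    total = sum(component_points)
    if total <= 1:
        return total, "Pass"
    if total <= 3:
        return total, "Borderline (excluded)"
    return total, "Fail"  # total >= 4

# Animals T-score bands from Table 3: <=20 -> 3, 21-24 -> 2, 25-33 -> 1, >33 -> 0
animals = [(lambda t: t <= 20, 3), (lambda t: t <= 24, 2), (lambda t: t <= 33, 1)]
print(band_score(26, animals))        # -> 1 (a "near pass" on this component)
print(classify_ei5([1, 0, 2, 1, 0]))  # -> (4, 'Fail')
```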
Procedure
All tests were administered and scored by trained staff
psychometricians, pre-doctoral interns or post-doctoral
fellows under the supervision of licensed psychologists
with specialty training in neuropsychology, following
standard instructions. The RMT and WCT were
administered in counterbalanced order, either at the
beginning or the end of the test battery.

Table 3. Components of the EI-5s with different levels of cutoff scores and corresponding base rates of failure.

EI-5_REC components (score bands: 0 | 1 | 2 | 3; BR in %):
FCR_CVLT-II: 16 | 15 | 14 | ≤13; BR: 85.6 | 4.6 | 3.1 | 6.7
LM_WMS-IV Recognition: >20 | 18–20 | 17 | ≤16; BR: 84.7 | 8.9 | 2.5 | 4.0
RCFT REC-TP: >6 | 6 | 4 | ≤3; BR: 86.7 | 6.1 | 4.4 | 2.8
VPA_WMS-IV Recognition: >35 | 32–35 | 28–29 | ≤27; BR: 85.1 | 8.4 | 4.5 | 2.0
VR_WMS-IV Recognition: >4 | 4 | 3 | ≤2; BR: 83.2 | 7.4 | 4.0 | 5.4

EI-5_PSP components (score bands: 0 | 1 | 2 | 3; BR in %):
Animals T-score: >33 | 25–33 | 21–24 | ≤20; BR: 81.7 | 9.9 | 4.5 | 4.0
CPT-II #Fail: 0 | 1 | 2 | ≥3; BR: 71.8 | 12.4 | 4.5 | 11.4
FAS T-score: >33 | 32–33 | 28–31 | ≤27; BR: 86.1 | 5.9 | 4.0 | 4.0
FTT #Fail: 0 | 1 | ≥2; BR: 92.7 | 5.8 | 1.5
WAIS-IV CD ACSS: >5 | 5 | 4 | ≤3; BR: 85.8 | 4.1 | 6.1 | 4.1

Note. EI-5_REC: Erdodi Index, five-variable model based on measures of recognition memory; EI-5_PSP: Erdodi Index, five-variable model based on measures of processing speed; LM: Logical Memory (Bortnik et al., 2010; Pearson, 2009); VPA: Verbal Paired Associates (Pearson, 2009); VR: Visual Reproduction (Pearson, 2009); FCR_CVLT-II: California Verbal Learning Test, 2nd edition, Forced Choice Recognition (Bauer, Yantz, Ryan, Warned, & McCaffrey, 2005; D. Delis, personal communication, May 2012; Erdodi, Kirsch, et al., 2014; Erdodi, Roth, et al., 2014); RCFT REC-TP: Rey Complex Figure Test recognition true positives (Lu et al., 2003; Reedy et al., 2013); FTT #Fail: Finger Tapping Test, number of scores at ≤35/28 (dominant hand) and ≤66/58 (combined) mean raw scores (Arnold et al., 2005; Axelrod, Meyers, & Davis, 2014); FAS: letter fluency T-score (Curtis et al., 2008; Sugarman & Axelrod, 2015); Animals: category fluency T-score (Sugarman & Axelrod, 2015); CPT-II #Fail: Conners' Continuous Performance Test, 2nd edition, number of T-scores >70 on Omissions, Hit Reaction Time Standard Error, Variability, and Perseverations (Erdodi, Roth, et al., 2014; Lange et al., 2013; Ord, Boettcher, Greve, & Bianchini, 2010); WAIS-IV CD: Coding age-corrected scaled score (Etherton et al., 2006; N. Kim, Boone, Victor, Lu, et al., 2010; Trueblood, 1994); BR: base rate (%).

The study
was approved by the ethics board of the hospital where
the data were collected and the university where the
project was finalized. Relevant APA ethical guidelines
regulating research with human participants were
followed throughout the study.
Data analysis
Descriptive statistics (mean, standard deviation, base
rates of failure) were reported for relevant variables.
The main inferential analyses were one-way analyses
of variance (ANOVAs), independent t-tests and
Chi-square tests of independence. Effect size estimates were computed using partial eta squared (η²) and Cohen's d. Sensitivity and specificity were calculated using standard formulas (Grimes & Schultz, 2005).
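For reference, with invalid performance (per the criterion composite) treated as the positive condition, these standard formulas are:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

$$\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}, \qquad d = \frac{M_1 - M_2}{SD_{\text{pooled}}},$$

where TP and FN count invalid response sets that fail and pass the cutoff under study, respectively, and TN and FP count valid response sets that pass and fail it.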
Results
Validating the criterion measures
All ANOVAs using the trichotomized VI-11 (Pass-Borderline-Fail) as the independent variable and the RMT, WCT, EI-5_REC, and EI-5_PSP scores as dependent variables were statistically significant. Effect size estimates ranged from large (η² = .16) to very large (η² = .36). Although all post hoc contrasts between the Pass and Fail conditions were significant, the Pass vs. Borderline contrast failed to reach significance on the RMT and WCT accuracy scores. Likewise, the Borderline vs. Fail contrast was not significant on the WCT completion time (Table 4).

Similarly, all ANOVAs using the trichotomized EI-5_REC (Pass-Borderline-Fail) as the independent variable and the RMT, WCT, VI-11, and EI-5_PSP scores as dependent variables were statistically significant. Effect sizes ranged from .07 (medium) to .41 (very large). All post hoc contrasts were significant, excepting the RMT completion time and the EI-5_PSP. On these two outcome measures, the Borderline vs. Fail contrast did not reach significance (Table 5).
Finally, although all ANOVAs using the trichotomized EI-5_PSP (Pass-Borderline-Fail) as the independent variable and the RMT, WCT, VI-11, and EI-5_REC scores as dependent variables were statistically significant, effect sizes were noticeably smaller (η² = .06–.26). As before, post hoc contrasts between the Borderline and Fail conditions were non-significant, with the exception of the VI-11 as the outcome measure (Table 6). These analyses provide
empirical support for the three validity composites, as
well as the exclusion of the Borderline scores. Patients
scoring in this indeterminate range had significantly
more evidence of invalid responding than those in the
Pass condition. At the same time, they did not demon-
strate PVT failures severe enough to be classified as
invalid beyond a reasonable doubt. The sharp increase in within-group variability in the Borderline group further
substantiates concerns that assigning these patients to
either the Pass or the Fail group would inadvertently
misclassify a large proportion of this subsample.
Classification accuracy of traditional cutoffs
At the ≤42 cutoff proposed by M. S. Kim, Boone, Victor, Marion, et al. (2010), the RMT had .46 sensitivity at .88 specificity against the VI-11. This is comparable to the .47 sensitivity and .85 specificity against the EI-5_PSP. Classification accuracy improved notably against the EI-5_REC (.88 sensitivity at .91 specificity).
Table 4. Results of one-way ANOVAs on RMT, WCT, EI-5_REC, and EI-5_PSP scores across VI-11 classification ranges.

VI-11 groups: PASS = 0–1 (n = 101); BOR = 2 (n = 32); FAIL = ≥3 (n = 69). Values are M (SD).

Measure | PASS | BOR | FAIL | F | p | η² | Significant post hocs
RMT_Accuracy | 46.9 (3.8) | 44.1 (9.0) | 39.9 (9.1) | 21.0 | <.001 | .17 | PASS vs. FAIL; BOR vs. FAIL
RMT_Time | 132.9 (62.8) | 154.9 (56.1) | 194.2 (75.7) | 17.3 | <.001 | .15 | PASS vs. FAIL; PASS vs. BOR
WCT_Accuracy | 49.0 (1.6) | 46.5 (7.2) | 44.0 (7.3) | 18.7 | <.001 | .16 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
WCT_Time | 110.7 (50.1) | 132.6 (56.5) | 177.7 (94.8) | 18.7 | <.001 | .16 | PASS vs. FAIL; BOR vs. FAIL
EI-5_REC | 0.2 (0.6) | 1.2 (1.2) | 3.0 (2.7) | 55.6 | <.001 | .36 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
EI-5_PSP | 0.6 (1.1) | 1.7 (1.9) | 2.7 (2.9) | 22.3 | <.001 | .18 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL

Note. Post hoc pairwise contrasts were computed using the least significant difference method; VI-11: Validity Index Eleven; BOR: Borderline; η²: partial eta squared; RMT_Accuracy: Recognition Memory Test – Words (accuracy score); RMT_Time: Recognition Memory Test – Words (completion time in seconds); WCT_Accuracy: Word Choice Test (accuracy score); WCT_Time: Word Choice Test (completion time in seconds); EI-5_REC: Erdodi Index, five-variable model based on measures of recognition memory; EI-5_PSP: Erdodi Index, five-variable model based on measures of processing speed.
Previous research suggests that a WCT cutoff of ≤45 corresponds to an RMT cutoff of ≤42 (Davis, 2014; Erdodi, Kirsch, et al., 2014). This cutoff had .41 sensitivity at .95 specificity against the VI-11, which is similar to the .33 sensitivity and .86 specificity observed against the EI-5_PSP. Sensitivity improved against the EI-5_REC (.74), while specificity remained essentially the same (.94).
Identifying a pool of critical items
The failure rate on each RMT and WCT item was
compared between those who passed and those who
failed the criterion PVTs. The items that were retained
met the following inclusion criteria: (1) The proportion
of correct responses was significantly higher in the
valid group compared to the invalid group; (2) The
proportion of correct responses in the valid group was
at least 15% higher compared to the invalid group;
and (3) The item met the first two criteria against all
three criterion PVTs. This last restriction was intro-
duced to minimize the effect of instrumentation
artifacts and therefore, improve the generalizability of
the findings, ensuring that the critical items will per-
form well against a variety of different criterion PVTs.
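The three inclusion criteria translate directly into an item-screening loop. The sketch below is one plausible implementation: the paper does not specify which significance test was used for criterion 1, so the two-proportion z-test here is our assumption.

```python
# Screens candidate items: retain an item as "critical" only if, against
# ALL THREE criterion PVTs, the valid group answers it correctly (1)
# significantly more often and (2) at least 15 percentage points more often.
from statsmodels.stats.proportion import proportions_ztest

def is_critical(item_correct, criterion_labels, alpha=0.05, min_gap=0.15):
    """item_correct: per-patient 0/1 accuracy on one item.
    criterion_labels: three parallel lists of 'valid'/'invalid' labels,
    one per criterion PVT (VI-11, EI-5_REC, EI-5_PSP)."""
    for labels in criterion_labels:
        valid = [c for c, g in zip(item_correct, labels) if g == "valid"]
        invalid = [c for c, g in zip(item_correct, labels) if g == "invalid"]
        gap = sum(valid) / len(valid) - sum(invalid) / len(invalid)
        _, p = proportions_ztest([sum(valid), sum(invalid)],
                                 [len(valid), len(invalid)],
                                 alternative="larger")  # valid > invalid
        if p >= alpha or gap < min_gap:
            return False  # must hold against every criterion PVT
    return True
```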
Establishing groups of critical items
The seven best items meeting all three criteria were
selected for further analyses (“critical seven” or CR-7)
within both tests. Next, a smaller group of five critical
items (CR-5) was created by dropping the two CR-7
items with the least discriminant power. Finally, the
number of critical items was further reduced to three
(CR-3), retaining only those with the highest discriminant power.

Table 5. Results of one-way ANOVAs on RMT, WCT, VI-11, and EI-5_PSP scores across EI-5_REC classification ranges.

EI-5_REC groups: PASS = 0–1 (n = 138); BOR = 2–3 (n = 41); FAIL = ≥4 (n = 23). Values are M (SD).

Measure | PASS | BOR | FAIL | F | p | η² | Significant post hocs
RMT_Accuracy | 46.8 (4.1) | 42.1 (9.5) | 32.4 (8.3) | 56.50 | <.001 | .36 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
RMT_Time | 141.0 (67.5) | 184.1 (62.5) | 209.8 (76.6) | 13.80 | <.001 | .12 | PASS vs. BOR; PASS vs. FAIL
WCT_Accuracy | 48.9 (1.8) | 45.5 (6.8) | 37.4 (8.6) | 68.40 | <.001 | .41 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
WCT_Time | 114.4 (50.5) | 165.6 (67.4) | 225.3 (125.3) | 31.20 | <.001 | .24 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
VI-11 | 1.2 (1.5) | 3.6 (1.7) | 4.6 (1.6) | 73.00 | <.001 | .42 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
EI-5_PSP | 1.1 (1.7) | 2.4 (3.0) | 2.1 (2.3) | 7.32 | <.001 | .07 | PASS vs. BOR; PASS vs. FAIL

Note. Post hoc pairwise contrasts were computed using the least significant difference method; BOR: Borderline; η²: partial eta squared; other abbreviations as in Table 4.

Table 6. Results of one-way ANOVAs on RMT, WCT, VI-11, and EI-5_REC scores across EI-5_PSP classification ranges.

EI-5_PSP groups: PASS = 0–1 (n = 133); BOR = 2–3 (n = 48); FAIL = ≥4 (n = 21). Values are M (SD).

Measure | PASS | BOR | FAIL | F | p | η² | Significant post hocs
RMT_Accuracy | 45.8 (6.2) | 41.5 (8.5) | 39.8 (10.2) | 10.10 | <.001 | .09 | PASS vs. BOR; PASS vs. FAIL
RMT_Time | 135.5 (57.9) | 193.5 (80.1) | 209.2 (73.3) | 21.10 | <.001 | .18 | PASS vs. BOR; PASS vs. FAIL
WCT_Accuracy | 47.9 (3.9) | 45.4 (7.3) | 44.4 (9.2) | 6.02 | <.01 | .06 | PASS vs. BOR; PASS vs. FAIL
WCT_Time | 122.5 (70.3) | 166.0 (86.9) | 160.7 (54.6) | 7.35 | <.01 | .07 | PASS vs. BOR; PASS vs. FAIL
VI-11 | 1.4 (1.6) | 2.8 (1.9) | 4.5 (2.0) | 34.20 | <.001 | .26 | PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
EI-5_REC | 1.0 (2.0) | 1.8 (2.4) | 2.1 (2.3) | 3.96 | <.05 | .04 | PASS vs. BOR; PASS vs. FAIL

Note. Post hoc pairwise contrasts were computed using the least significant difference method; BOR: Borderline; η²: partial eta squared; other abbreviations as in Table 4.

The value of each subset of critical items
reflects the number of incorrect responses (i.e., higher
values indicate stronger evidence of invalid perfor-
mance). Having these three combinations of critical
items increases the chances of identifying non-credible
responding as it provides alternative detection strate-
gies. The specific combination of critical items is not
disclosed within this manuscript to protect test security
and to guard the newly developed diagnostic tool from
unauthorized use. However, the information will be
provided to qualified clinicians. Interested readers
should contact the first author.
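Once the item lists are obtained from the authors, scoring the critical items is trivial. In the sketch below, the item numbers are deliberately fictitious placeholders, since the real critical items are withheld for test security as noted above:

```python
# Scoring sketch for the critical-item indices. CR-7, CR-5, and CR-3 are
# nested item sets; the score is simply the number of critical items
# answered incorrectly (higher = stronger evidence of invalid performance).
CR7_RMT = [2, 9, 16, 23, 30, 37, 44]  # placeholder indices, NOT the real items
CR5_RMT = CR7_RMT[:5]  # assumes items are ordered by discriminant power
CR3_RMT = CR7_RMT[:3]

def cr_score(responses, critical_items):
    """responses: dict mapping item number -> True if answered correctly."""
    return sum(1 for i in critical_items if not responses[i])

# A response set is then flagged when cr_score(...) meets the chosen
# cutoff, e.g., >= 3 on the CR-7 (examined in the analyses below).
```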
Signal detection performance of critical items in
the RMT
A CR-7_RMT cutoff ≥3 achieved good specificity (.91–.96) and variable sensitivity (.38–.70). Increasing the cutoff to ≥4 produced the predictable trade-off between improved specificity (.97–1.00) and declining sensitivity (.18–.35). Increasing the cutoff to ≥5 reached the point of diminishing returns, with minimal gains in specificity (.98–1.00) but further deterioration in sensitivity (.09–.24).

A CR-5_RMT cutoff ≥2 produced an acceptable combination of sensitivity (.54) and specificity (.86) against the VI-11, but failed to reach the lower threshold for specificity against the EI-5s. Increasing the cutoff to ≥3 resulted in a marked increase in specificity (.95–.99), but a proportional loss in sensitivity (.27–.39). A further increase to ≥4 sacrificed much of the sensitivity (.10–.22) for negligible gains in specificity (.97–1.00).

A CR-3_RMT cutoff ≥1 failed to reach minimum specificity against any of the criterion PVTs. Therefore, it received no further consideration. However, increasing the cutoff to ≥2 produced good combinations of sensitivity (.33–.43) and specificity (.91–.95). Failing all three of the CR-3_RMT items was associated with near-perfect specificity (.97–1.00), but low sensitivity (.10–.19).
Further details are displayed in Table 7.
Signal detection performance of critical items in
the WCT
A CR-7_WCT cutoff ≥1 produced an acceptable combination of sensitivity (.58) and specificity (.84) against the VI-11, but failed to reach the lower threshold for specificity against the EI-5s. Increasing the cutoff to ≥2 resulted in notable improvement in specificity (.91–.99), with relatively well-preserved, although fluctuating, sensitivity (.34–.70). Further increasing the cutoff to ≥3 achieved minimal gains in specificity (.92–.99), but sacrificed some of the sensitivity (.19–.65).

A CR-5_WCT cutoff ≥1 produced good combinations of sensitivity (.49–.87) and specificity (.84–.95) against all criterion PVTs. Increasing the cutoff to ≥2 produced the predictable trade-off between rising specificity (.95–1.00) and declining sensitivity (.27–.65). Raising the cutoff to ≥3 resulted in consistently high specificity (.95–1.00), but low and fluctuating sensitivity (.14–.48).

Similarly, a CR-3_WCT cutoff ≥1 produced good combinations of sensitivity (.40–.74) and specificity (.88–.95) against all criterion PVTs. Increasing the cutoff to ≥2 sacrificed half of the sensitivity (.22–.57) for small gains in specificity (.95–1.00). As with the RMT, failing all three of the CR-3_WCT items was associated with near-perfect specificity (.98–1.00), but low sensitivity (.05–.13). Further details are displayed in Table 8.
Unique contribution of critical items to the
classification accuracy of the RMT and WCT
To objectively evaluate the unique contribution of
critical items above and beyond traditional cutoffs, we
examined the profiles that failed one cutoff, but passed
the other in relation to VI-11 scores.

Table 7. Sensitivity and specificity of three combinations of critical items within the RMT across different cutoffs.

Criterion measures: VI-11 (n = 170; BR = 34.2%), EI-5_REC (n = 161; BR = 14.3%), EI-5_PSP (n = 154; BR = 13.6%).

Cutoff | BR_Fail (%) | VI-11 SENS/SPEC | EI-5_REC SENS/SPEC | EI-5_PSP SENS/SPEC
CR-7_RMT ≥3 | 17.6 | .39/.96 | .70/.92 | .38/.91
CR-7_RMT ≥4 | 7.0 | .18/1.00 | .35/.97 | .29/.97
CR-7_RMT ≥5 | 4.0 | .09/1.00 | .13/.98 | .24/.98
CR-5_RMT ≥2 | 28.6 | .54/.86 | .78/.83 | .52/.80
CR-5_RMT ≥3 | 11.1 | .27/.99 | .39/.95 | .38/.95
CR-5_RMT ≥4 | 3.5 | .10/1.00 | .22/.99 | .14/.97
CR-3_RMT ≥1 | 40.2 | .67/.72 | .78/.69 | .76/.69
CR-3_RMT ≥2 | 15.6 | .33/.95 | .43/.91 | .33/.92
CR-3_RMT = 3 | 4.5 | .10/1.00 | .17/.99 | .19/.97

Note. VI-11: validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); EI-5_REC: recognition memory based validity composite (Pass ≤1, Fail ≥4); EI-5_PSP: processing speed based validity composite (Pass ≤1, Fail ≥4); SENS: sensitivity; SPEC: specificity; BR_Fail: base rate of failure (% of the sample that failed the given cutoff); RMT: Recognition Memory Test – Words; CR: critical items.

Between 1.8% and
5.0% of patients who passed the traditional cutoff on the RMT (>42) failed select CR_RMT cutoffs (Table 9). Within the WCT, between 5.6% and 17.5% of those who passed the traditional cutoff (>45) failed select CR_WCT cutoffs (Table 10). These results suggest that critical items increase the sensitivity of both tests, while maintaining high specificity.
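These incremental-detection percentages come from a simple cross-tabulation of the two decisions. A sketch of the computation follows; the function and variable names are ours:

```python
# Among response sets that PASS the traditional total-score cutoff, the
# percentage failing a critical-item cutoff is the added sensitivity
# reported above (2-5% for the RMT, 5.6-17.5% for the WCT).
def incremental_flag_rate(totals, cr_errors, pass_total, fail_cr):
    passed = [c for t, c in zip(totals, cr_errors) if pass_total(t)]
    return 100.0 * sum(fail_cr(c) for c in passed) / len(passed)

# e.g., RMT: traditional Pass is a total score > 42; CR-7 Fail is >= 3 errors
# rate = incremental_flag_rate(rmt_totals, cr7_errors,
#                              lambda t: t > 42, lambda c: c >= 3)
```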
Failing critical items increases the confidence in the decision to classify the response set as invalid even in examinees who already failed the traditional cutoff. Among the subset of patients who failed the traditional cutoff on the RMT (≤42), those who also failed the CR_RMT cutoffs had higher VI-11 scores, providing stronger evidence of non-credible presentation. As with the RMT, higher VI-11 scores were observed among patients who failed the CR_WCT cutoffs even within the subsample that failed the traditional WCT cutoff (≤45).
Conversely, patients who failed the traditional cutoff, but provided correct answers on the critical items were examined separately. Everybody who scored ≤42 on the RMT failed at least one critical item. Eight of them produced CR-7_RMT = 1. Three of these patients were clear false positive errors based on their PVT profile: they passed all three reference PVTs (VI-11, EI-5_REC, and EI-5_PSP), the most liberal cutoff on the WCT (≤47), and other free-standing PVTs. The remaining five patients are discussed below.
One of these could be considered a false positive
based on the combination of clinical history and
neurocognitive profile: a retired physician in his 70s
diagnosed with amnestic Mild Cognitive Impairment.
He scored in the Borderline range on the VI-11 and EI-5_REC, had a WCT score of 47, and 11 unique errors
on the WCST. His FSIQ was 118, with a Coding ACSS
of 15. His performance on the acquisition trials of the
CVLT-II was high average (43/80), but his long-delay
free recall was borderline (3/16). He obtained a perfect
score on the Rey-15 and the first trial of the Test of
Memory Malingering. Thus, his performance is broadly
consistent with his diagnosis.
The fifth patient was a 19-year-old woman who scored in the Borderline range on the VI-11, EI-5_REC, and EI-5_PSP. She failed the WCT (40) and the Test of Memory Malingering (35–36–36), produced eight unique errors on the WCST, a Vocabulary − Digit Span ACSS difference of 6, and a CVLT-II logistic regression equation score (.72) in the failing range (Wolfe et al., 2010). In addition, she produced several combinations of scores that are internally inconsistent. Given the identifiable external incentive to appear impaired (on athletic scholarship, struggling in her classes, seeking an ADHD diagnosis, stimulant medication, and academic accommodations), she met criteria for Malingered Neurocognitive Dysfunction introduced by Slick, Sherman, and Iverson (1999).

Table 8. Sensitivity and specificity of three combinations of critical items within the WCT across different cutoffs.

Criterion measures: VI-11 (n = 170; BR = 34.2%), EI-5_REC (n = 161; BR = 14.3%), EI-5_PSP (n = 154; BR = 13.6%).

Cutoff | BR_Fail (%) | VI-11 SENS/SPEC | EI-5_REC SENS/SPEC | EI-5_PSP SENS/SPEC
CR-7_WCT ≥1 | 31.8 | .58/.84 | .91/.83 | .52/.74
CR-7_WCT ≥2 | 14.6 | .34/.99 | .70/.97 | .38/.91
CR-7_WCT ≥3 | 10.6 | .25/.99 | .65/.98 | .19/.92
CR-5_WCT ≥1 | 22.2 | .49/.95 | .87/.93 | .52/.84
CR-5_WCT ≥2 | 10.6 | .27/1.00 | .65/.99 | .29/.95
CR-5_WCT ≥3 | 6.6 | .16/1.00 | .48/1.00 | .14/.95
CR-3_WCT ≥1 | 19.2 | .40/.95 | .74/.93 | .43/.88
CR-3_WCT ≥2 | 9.1 | .22/1.00 | .57/.99 | .24/.95
CR-3_WCT = 3 | 2.0 | .05/1.00 | .13/1.00 | .05/.98

Note. VI-11: validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); EI-5_REC: recognition memory based validity composite (Pass ≤1, Fail ≥4); EI-5_PSP: processing speed based validity composite (Pass ≤1, Fail ≥4); SENS: sensitivity; SPEC: specificity; BR_Fail: base rate of failure (% of the sample that failed the given cutoff); WCT: Word Choice Test; CR: critical items.

Table 9. VI-11 scores as a function of passing or failing the traditional RMT cutoff or select cutoffs on the critical items.

Within RMT Pass (>42):
CR-7_RMT: <3: n = 144, M = 1.59 (SD = 1.71); ≥3: n = 6, M = 3.33 (SD = 2.66); p < .05, d = .78
CR-5_RMT: <3: n = 147, M = 1.62 (SD = 1.73); ≥3: n = 3, M = 3.67 (SD = 3.22); p < .05, d = .79
CR-3_RMT: <2: n = 142, M = 1.61 (SD = 1.76); ≥2: n = 8, M = 2.63 (SD = 1.92); p = .06, d = .55

Within RMT Fail (≤42):
CR-7_RMT: <3: n = 20, M = 2.50 (SD = 2.24); ≥3: n = 29, M = 3.97 (SD = 1.80); p < .05, d = .72
CR-5_RMT: <3: n = 30, M = 2.97 (SD = 2.30); ≥3: n = 19, M = 4.00 (SD = 1.60); p < .05, d = .52
CR-3_RMT: <2: n = 26, M = 3.00 (SD = 2.28); ≥2: n = 23, M = 3.78 (SD = 1.83); p = .10, d = .24

Note. Values are VI-11 means (SD); RMT: Recognition Memory Test (Words); VI-11: validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); CR: critical items.
The sixth patient was a 47-year-old woman with a history of childhood abuse who scored in the Borderline range on the EI-5_REC, failed the VI-11 (3), the WCT (40), and the Dot Counting Test (28.3), in addition to the validity cutoffs in Coding (ACSS = 5) and Symbol Search (ACSS = 4). Her CVLT-II profile was internally inconsistent, with a low average acquisition score of 42/80 and a Forced Choice Recognition score of 11/16 (invalid beyond reasonable doubt). Overall, her profile can be considered invalid, and hence, a true positive.
The last two patients are more difficult to classify. One of them was a 59-year-old woman with a history of incestuous sexual trauma. She passed the EI-5_PSP, but failed the WCT (43), the Test of Memory Malingering (34–47–49), the EI-5_REC (8), and the VI-11 (7). Despite ample evidence of invalid performance, all her PVT failures were limited to memory tests, which was consistent with her self-reported decline in attention and memory. In addition, she performed in the high average range on Coding, letter fluency, and the Stroop test.
The last patient had an identifiable external incentive
to appear impaired in combination with a documented
history of stroke and brain tumor treated with radiation.
She passed the EI-5_REC, EI-5_PSP, and VI-11, but failed the WCT (47) and the logistic regression equation (.74) by Wolfe et al. (2010). In addition, she performed in the average to high average range on Coding, Logical Memory, animal fluency, and the Trail Making Test.
On the WCT, three patients failed the traditional cutoff (≤45) but provided correct answers on all of the CR-7_WCT items. Two of them were clear false positive errors: a 29-year-old man with 17 years of education referred for a mild TBI and a 57-year-old man with 18 years of education referred for Parkinson's disease. The former passed the EI-5_REC, EI-5_PSP, VI-11, the RMT (49), as well as the Test of Memory Malingering (47–50–50), and produced a largely intact neurocognitive profile. The latter scored in the Borderline range on the EI-5_REC and VI-11, passed the RMT (43) as well as the Test of Memory Malingering (48–50–50), and produced a profile consistent with his diagnosis.
The third patient was a 39-year-old woman diagnosed with Personality Disorder NOS with Borderline Features and fibromyalgia. She passed the EI-5_REC, scored in the Borderline range on the EI-5_PSP, but failed the VI-11, the RMT (30), the Test of Memory Malingering (33–42–39), and the Rey-15 (7). As such, her neurocognitive profile can be confidently classified as invalid. Nevertheless, she demonstrated average to high average performance on tests of memory and processing speed.
Discussion
This study examined the potential of critical items
within the RMT and WCT to enhance the diagnostic
power of the traditional cutoffs based on total scores
in a large sample of patients clinically referred for neu-
ropsychological assessment. Our hypothesis that critical
items would improve classification accuracy was sup-
ported. Critical items increased the sensitivity for both
tests by correctly identifying 2–17% of response sets that
passed the traditional cutoffs as invalid. They also
increased specificity by providing additional empirical
evidence that response sets identified as invalid by the
traditional cutoffs were correctly classified as such. In
addition, a total score in the failing range combined
with correct responses on the critical items was associa-
ted with an increased risk of a false positive error,
indicating the need for further analysis of the PVT
profile. Although a simultaneous increase in both
sensitivity and specificity seems paradoxical at face
value, this bidirectional improvement in classification
accuracy follows the inner logic behind critical item
analysis.
Table 10. VI-11 scores as a function of passing or failing the traditional WCT cutoff or select cutoffs on the critical items.

Within WCT Pass (>45):
CR-7_WCT: <2: n = 132, M = 1.54 (SD = 1.71); ≥2: n = 28, M = 2.43 (SD = 2.01); p < .05, d = .48
CR-5_WCT: = 0: n = 148, M = 1.53 (SD = 1.68); ≥1: n = 12, M = 3.67 (SD = 2.06); p < .05, d = 1.14
CR-3_WCT: = 0: n = 151, M = 1.59 (SD = 1.71); ≥1: n = 9, M = 3.44 (SD = 2.30); p < .05, d = .91

Within WCT Fail (≤45):
CR-7_WCT: <2: n = 3, M = 2.00 (SD = 1.00); ≥2: n = 35, M = 3.89 (SD = 2.03); p = .06, d = 1.18
CR-5_WCT: = 0: n = 6, M = 2.33 (SD = 1.21); ≥1: n = 32, M = 4.00 (SD = 2.05); p < .05, d = .99
CR-3_WCT: = 0: n = 9, M = 3.00 (SD = 1.80); ≥1: n = 29, M = 3.97 (SD = 2.06); p = .11, d = .38

Note. Values are VI-11 means (SD); WCT: Word Choice Test; VI-11: validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); CR: critical items.

The heuristic underlying this method is that the total number of items failed and the specific combination of items failed contribute nonredundant information about performance validity (Bortnik et al., 2010; Killgore & DellaPietra, 2000). If an examinee passed the cutoff based on total score, but failed the cutoff based on critical items, the new method increased
sensitivity by correctly detecting a non-credible
examinee that was missed by the traditional method.
Conversely, if an examinee just barely failed the most
liberal cutoff based on total score, some would interpret
this performance as a “near-pass” (Bigler, 2012), and
argue that it actually represents a false positive error
(i.e., the instrument has unacceptably low specificity).
However, if the examinee in question also failed a
certain combination of critical items associated with
higher specificity, that pattern of performance would
strengthen the evidence that the overall profile is indeed
invalid. As such, critical item analysis could effectively
increase the specificity of a given diagnostic decision.
Critical items could also serve as a safeguard against
false positive errors. If an examinee failed the traditional
cutoff based on total score, but provided correct
responses on the critical items, the profile may warrant
a more in-depth analysis. Our data suggest that at least
half of such cases were incorrectly classified as invalid
(i.e., they are false positives). In addition, patients with
complex psychiatric history were overrepresented
among those who were eventually determined to be true
positives. If this finding replicates in larger samples, the discrepancy between the total score and critical item cutoffs on the RMT and WCT could prove useful in subtyping invalid performance. Since malingering (Slick
et al., 1999), a “cry for help” (Berry et al., 1996), and
psychogenic interference (Erdodi, Tyson, Abeare,
et al., 2016) differ in etiology, distinguishing among
them would enhance diagnostic accuracy as well as
improve the clinical management of patients currently
lumped together on the failing side of PVTs.
The three levels of critical items (CR-7, CR-5, and
CR-3) employ different detection strategies. Essentially,
they trade the size of the item pool for the number of
item failures required to deem a response set invalid.
For example, within the RMT, ≥3 failures on the CR-7 or CR-5 have comparable classification accuracy to ≥2 failures on the CR-3. Likewise, within the WCT, ≥2 failures on the CR-7 or CR-5 have similar classification accuracy to ≥1 failure on the CR-3.
However, critical items in both tests capitalize on the
inherent differences among test items regarding their
ability to differentiate between credible and non-
credible responding. They rescale the original test by
eliminating inactive items, and only retain the ones with
the highest discriminant power (Embretson, 1996). As
such, they provide evaluators with an alternative
method of assessing performance validity. Instead of counting how many of the total items the examinee failed, they focus on which items were failed.
Therefore, critical item analyses enhance the
interpretation of a given score on the RMT or WCT
above and beyond the total score by demonstrating that
even within the subset of patients who passed the old
cutoff based on all 50 items, there was a significant dif-
ference between those who passed and those who failed
the cutoff based on critical items. For example, those who scored >42 on the RMT and also passed the CR-7_RMT had statistically and clinically significantly lower VI-11 scores (i.e., were more likely to be valid) than those who scored >42 on the RMT, but failed the CR-7_RMT.
Depending on the specific combination of item-level scores, this approach can promote superior overall classification accuracy. For example, while an RMT total score of 48 is considered a clear Pass (Iverson & Franzen, 1994; M. S. Kim, Boone, Victor, Marion, et al., 2010), if the two incorrect responses were from the CR-3_RMT, that score provides strong evidence (.91–.95 specificity) of non-credible responding. Even if the total score is already below the cutoff, critical items can still enhance the confidence in classifying it as invalid. For example, a WCT total score of 46 is already considered a Fail, with specificity between .87 (Davis, 2014) and .92 (Erdodi, Kirsch, et al., 2014). However, if at least two of the incorrect responses were from the CR-5_WCT, specificity increases to .95–1.00.
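The combined interpretive logic amounts to a simple two-by-two decision rule. The sketch below encodes it for the RMT, using the total-score cutoff (≤42) and the CR-3 cutoff (≥2) discussed above; the verbal labels are ours:

```python
# Two-stage interpretation: the total score gives a provisional decision,
# and the critical items either corroborate it or flag the case for review.
def interpret_rmt(total_score, cr3_errors):
    total_fail = total_score <= 42  # traditional total-score cutoff
    cr_fail = cr3_errors >= 2       # CR-3 critical-item cutoff
    if total_fail and cr_fail:
        return "Fail, corroborated by critical items"
    if total_fail:
        return "Fail on total score only: elevated false positive risk, review profile"
    if cr_fail:
        return "Pass on total score, but critical items suggest invalid responding"
    return "Pass"

# The worked example above: a total score of 48 is a clear Pass, yet two
# errors concentrated on the CR-3 items still flag the response set.
print(interpret_rmt(48, 2))
```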
When compared to the RMT, critical items appear more useful within the WCT. They expand sensitivity further (by 6–18%, vs. 2–5% for the RMT). Also, among patients who failed the traditional cutoff based on total score, passing or failing the critical items had a stronger relationship with the number of PVT failures within the WCT (d: .38–1.18) relative to the RMT (d: .24–.72). This finding adds to the accumulating
empirical evidence supporting the clinical utility of the
WCT (Barhon et al., 2015; Davis, 2014; Erdodi, Kirsch,
et al., 2014; Miller et al., 2011).
This study developed three different sets of critical
items within the RMT and WCT that enhance the
classification accuracy of both instruments in a clinically
meaningful way. Critical item analysis requires no additional test material or administration time, yet provides a time- and cost-effective alternative for evaluating performance validity independent of traditional total score cutoffs. In a sense, critical items re-examine the data and
provide a “second opinion” regarding clinical classi-
fication of a given RMT or WCT response set.
Additionally, they provide objective, data-driven infor-
mation to address the contested issue of “near passes”
(Bigler, 2012, 2015), with clear clinical and forensic
implications.
The findings should be interpreted within the context
of the study’s limitations. The sample was geographi-
cally restricted, and diagnostically heterogeneous.
Future research would benefit from replication using different, more homogeneous samples and different reference PVTs. Finally, given that the clinical utility
of critical item analysis ultimately depends on the
number of incorrect answers that occur on these
specific items, the model is still vulnerable to chance.
Therefore, the generalizability of the results can only
be determined by independent replications of the
present study.
Acknowledgments
This project received no financial support from outside funding
agencies. Relevant ethical guidelines regulating research
involving human participants were followed throughout the
project. All data collection, storage, and processing were done in compliance with the Helsinki Declaration. The authors have no disclosures to make that could be interpreted as conflicts of interest.
References
Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., &
McPhearson, S. (2005). Sensitivity and specificity of finger
tapping test scores for the detection of suspect effort. The
Clinical Neuropsychologist, 19(1), 105–120. doi:10.1080/
13854040490888567
Axelrod, B. N., Fichteberg, N. L., Millis, S. R., & Wertheimer,
J. C. (2006). Detecting incomplete effort with digit span
from the Wechsler Adult Intelligence Scale – Third Edition.
The Clinical Neuropsychologist, 10, 513–523. doi:10.1080/
13854040590967117
Axelrod, B. N., Meyers, J. E., & Davis, J. J. (2014). Finger
tapping test performance as a measure of performance
validity. The Clinical Neuropsychologist, 28(5), 876–888.
doi:10.1080/13854046.2014.907583
Barhon, L. I., Batchelor, J., Meares, S., Chekaluk, E., & Shores,
E. A. (2015). A comparison of the degree of effort involved
in the TOMM and the ACS word choice test using a dual-
task paradigm. Applied Neuropsychology: Adult, 22(2),
114–123.
Bauer, L., O’Bryant, S. E., Lynch, J. K., McCaffrey, R. J., &
Fisher, J. M. (2007). Examining the test of memory malin-
gering trial 1 and word memory test immediate recognition
as screening tools for insufficient effort. Assessment, 14(3),
215–222. doi:10.1177/1073191106297617
Bauer, L., Yantz, C. L., Ryan, L. M., Warned, D. L., &
McCaffrey, R. J. (2005). An examination of the California
verbal learning test II to detect incomplete effort in a trau-
matic brain injury sample. Applied Neuropsychology, 12(4),
202–207. doi:10.1207/s15324826an1204_3
Berry, D. T. R., Adams, J. J., Clark, C. D., Thacker, S. R.,
Burger, T. L., Wetter, M. W., Baer, R. A., & Borden,
J. W. (1996). Detection of a cry for help on the MMPI-2:
An analog investigation. Journal of Personality Assessment,
67(1), 26–36.
Bigler, E. D. (2012). Symptom validity testing, effort and
neuropsychological assessment. Journal of the International
Neuropsychological Society, 18, 632–642. doi:10.1017/
s1355617712000252
Bigler, E. D. (2015). Neuroimaging as a biomarker in
symptom validity and performance validity testing. Brain
Imaging and Behavior, 9(3), 421–444. doi:10.1007/s11682-
015-9409-1
Bilker, W. B., Wierzbicki, M. R., Brensinger, C. M., Gur, R. E.,
& Gur, R. C. (2014). Development of abbreviated eight-
item form of the Penn verbal reasoning test. Assessment,
21, 669–678. doi:10.1177/1073191114524270
Blaskewitz, N., Merten, T., & Brockhaus, R. (2009). Detection
of suboptimal effort with the Rey complex figure test and
recognition trial. Applied Neuropsychology, 16, 54–61.
doi:10.1080/09084280802644227
Boone, K. B. (2007). Assessment of feigned cognitive impair-
ment. A neuropsychological perspective. New York, NY:
Guilford.
Boone, K. B. (2009). The need for continuous and
comprehensive sampling of effort/response bias during
neuropsychological examination. The Clinical Neuropsy-
chologist, 23(4), 729–741. doi:10.1080/13854040802427803
Boone, K. B. (2013). Clinical practice of forensic neuropsychology.
New York, NY: Guilford.
Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., &
Razani, J. (2002). The Rey 15-item recognition trial: A
technique to enhance sensitivity of the Rey 15-item
memorization test. Journal of Clinical and Experimental
Neuropsychology, 24(5), 561–573.
Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler,
E., Victor, T. L., & Zeller, M. A. (2010). Examination of
various WMS-III logical memory scores in the assessment
of response bias. The Clinical Neuropsychologist, 24(2),
344–357. doi:10.1080/13854040903307268
Bush, S. S., Heilbronner, R. L., & Ruff, R. M. (2014). Psycho-
logical assessment of symptom and performance validity,
response bias, and malingering: Official position of the
association for scientific advancement in psychological
injury and law. Psychological Injury and Law, 7(3),
197–205. doi:10.1007/s12207-014-9198-7
Chafetz, M. D., Williams, M. A., Ben-Porath, Y. S., Bianchini,
K. J., Boone, K. B., Kirkwood, M. W., … Ord, J. S. (2015).
Official position of the American academy of clinical
neuropsychology social security administration policy on
validity testing: Guidance and recommendations for
change. The Clinical Neuropsychologist, 29(6), 723–740.
doi:10.1080/13854046.2015.1099738
Curtis, K. L., Thompson, L. K., Greve, K. W., & Bianchini,
K. J. (2008). Verbal fluency indicators of malingering in
traumatic brain injury: Classification accuracy in known
groups. The Clinical Neuropsychologist, 22, 930–945.
doi:10.1080/13854040701563591
Davis, J. J. (2014). Further consideration of advanced clinical
solutions word choice: Comparison to the recognition
memory test Words and classification accuracy on a
clinical sample. The Clinical Neuropsychologist, 28(8),
1278–1294. doi:10.1080/13854046.2014.975844
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (2000).
California Verbal Learning Test – Second Edition, Adult
Version, manual. San Antonio, TX: Psychological
Corporation.
Denning, J. H. (2012). The efficiency and accuracy of the Test of Memory Malingering Trial 1, errors on the first 10 items of the Test of Memory Malingering, and five embedded measures in predicting invalid test performance. Archives
of Clinical Neuropsychology, 27(4), 417–432. doi:10.1093/
arclin/acs044
Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8(4), 341–349.
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski, B., Zuccato, B. G., & Roth, R. M. (2017). WAIS-IV processing speed scores as measures of noncredible responding: The third generation of embedded performance validity indicators. Psychological Assessment, 29(2), 148–157. doi:10.1037/pas0000319
Erdodi, L. A., Jongsma, K. A., & Issa, M. (2017). The 15-item version of the Boston Naming Test as an index of English proficiency. The Clinical Neuropsychologist, 31(1), 168–178.
Erdodi, L. A., Kirsch, N. L., Lajiness-O’Neill, R., Vingilis, E., & Medoff, B. (2014). Comparing the Recognition Memory Test and the Word Choice Test in a mixed clinical sample: Are they equivalent? Psychological Injury and Law, 7(3), 255–263. doi:10.1007/s12207-014-9197-8
Erdodi, L. A., Pelletier, C. L., & Roth, R. M. (2016). Elevations
on select Conners’ CPT-II scales indicate noncredible
responding in adults with traumatic brain injury. Applied
Neuropsychology: Adult, 1–10. doi:10.1080/23279095.2016.
1232262 [Advance online publication]
Erdodi, L. A., & Roth, R. M. (2016). Low scores on BDAE
complex ideational material are associated with invalid
performance in adults without aphasia. Applied Neuropsy-
chology: Adult, 1–11. doi:10.1080/23279095.2016.1154856
[Advance online publication]
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O’Neill, R.,
& Medoff, B. (2014). Aggregating validity indicators
embedded in Conners’ CPT-II outperforms individual cut-
offs at separating valid from invalid performance in adults
with traumatic brain injury. Archives of Clinical Neuropsy-
chology, 29(5), 456–466. doi:10.1093/arclin/acu026
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE Complex Ideational Material: A measure of receptive language or performance validity? Psychological Injury and Law, 9, 112–120. doi:10.1007/s12207-016-9254-6
Erdodi, L. A., Tyson, B. T., Shahein, A., Lichtenstein, J. D., Abeare, C. A., Pelletier, C. L., … Roth, R. M. (2017). The power of timing: Adding a time-to-completion cutoff to the Word Choice Test and Recognition Memory Test improves classification accuracy. Journal of Clinical and Experimental Neuropsychology, 39(4), 369–383. doi:10.1080/13803395.2016.1230181
Etherton, J. L., Bianchini, K. J., Heinly, M. T., & Greve, K. W.
(2006). Pain, malingering, and performance on the
WAIS-III processing speed index. Journal of Clinical and
Experimental Neuropsychology, 28, 1218–1237. doi:10.1080/
13803390500346595
Fazio, R. L., Denning, J. H., & Denney, R. L. (2017). TOMM Trial 1 as a performance validity indicator in a criminal forensic sample. The Clinical Neuropsychologist, 31(1), 251–267.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224. doi:10.1037/1040-3590.6.3.218
Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology, 19, 533–541.
Greve, K. W., Bianchini, K. J., Mathias, C. W., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction with the Wisconsin Card Sorting Test: A preliminary investigation in traumatic brain injury. The Clinical Neuropsychologist, 16(2), 179–191.
Greve, K. W., Curtis, K. L., Bianchini, K. J., & Ord, J. S. (2009). Are the original and second edition of the California Verbal Learning Test equally accurate in detecting malingering? Assessment, 16(3), 237–248.
Grimes, D. A., & Schulz, K. F. (2005). Refining clinical
diagnosis with likelihood ratios. The Lancet, 365(9469),
1500–1505. doi:10.1016/s0140-6736(05)66422-7
Hayward, L., Hall, W., Hunt, M., & Zubrick, S. R. (1987). Can
localized brain impairment be simulated on neuropsycho-
logical test profiles? Australian and New Zealand Journal
of Psychiatry, 21, 87–93. doi:10.3109/00048678709160904
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan Battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz, FL: Psychological Assessment Resources.
Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., &
Brennan, A. (2005). WAIS digit-span-based indicators of
malingered neurocognitive dysfunction: Classification
accuracy in traumatic brain injury. Assessment, 12(4),
429–444.
Hilsabeck, R. C., Gordon, S. N., Hietpas-Wilson, T., & Zartman, A. L. (2011). Use of Trial 1 of the Test of Memory Malingering (TOMM) as a screening measure of effort: Suggested discontinuation rules. The Clinical Neuropsychologist, 25(7), 1228–1238. doi:10.1080/13854046.2011.589409
Horner, M. D., Bedwell, J. S., & Duong, A. (2006). Abbreviated form of the Test of Memory Malingering. International Journal of Neuroscience, 116, 1181–1186. doi:10.1080/00207450500514029
Iverson, G. L., & Binder, L. M. (2000). Detecting exaggeration and malingering in neuropsychological assessment. Journal of Head Trauma Rehabilitation, 15(2), 829–858. doi:10.1097/00001199-200004000-00006
Iverson, G. L., & Franzen, M. D. (1994). The Recognition Memory Test, Digit Span, and Knox Cube Test as markers of malingered memory impairment. Assessment, 1(4), 323–334.
Jones, A. (2013). Test of Memory Malingering: Cutoff scores for psychometrically defined malingering groups in a military sample. The Clinical Neuropsychologist, 27(6), 1043–1059. doi:10.1080/13854046.2013.804949
Killgore, W. D., & DellaPietra, L. (2000). Using the WMS-III to detect malingering: Empirical validation of the Rarely Missed Index (RMI). Journal of Clinical and Experimental Neuropsychology, 22(6), 761–771. doi:10.1076/jcen.22.6.761.960
Kim, M. S., Boone, K. B., Victor, T., Marion, S. D., Amano, S., Cottingham, M. E., … Zeller, M. A. (2010). The Warrington Recognition Memory Test for Words as a measure of response bias: Total score and response time cutoffs developed on “real world” credible and noncredible subjects. Archives of Clinical Neuropsychology, 25, 60–70. doi:10.1093/arclin/acp088
Kim, N., Boone, K. B., Victor, T., Lu, P., Keatinge, C., &
Mitchell, C. (2010). Sensitivity and specificity of a
digit symbol recognition trial in the identification of
response bias. Archives of Clinical Neuropsychology, 25,
420–428.
Kulas, J. F., Axelrod, B. N., & Rinaldi, A. R. (2014). Cross-
validation of supplemental test of memory malingering
scores as performance validity measures. Psychological
Injury and Law, 7(3), 236–244. doi:10.1007/s12207-014-
9200-4
Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S., Bhagwat, A., & French, L. M. (2013). Clinical utility of the Conners’ Continuous Performance Test-II to detect poor effort in U.S. military personnel following traumatic brain injury. Psychological Assessment, 25(2), 339–352. doi:10.1037/a0030915
Larrabee, G. J. (2003). Detection of malingering using atypical
performance patterns on standard neuropsychological
tests. The Clinical Neuropsychologist, 17(3), 410–425.
doi:10.1076/clin.17.3.410.18089
Larrabee, G. J. (2012). Assessment of malingering. In G. J. Larrabee (Ed.), Forensic neuropsychology: A scientific approach (2nd ed., pp. 117–159). New York, NY: Oxford University Press.
Larrabee, G. J. (2014). False-positive rates associated with the
use of multiple performance and symptom validity
tests. Archives of Clinical Neuropsychology, 29, 364–373.
doi:10.1093/arclin/acu019
Lezak, M. D. (1995). Neuropsychological assessment. New
York, NY: Oxford University Press.
Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2017). Introducing a forced-choice recognition task to the California Verbal Learning Test–Children’s Version. Child Neuropsychology, 23(3), 284–299. doi:10.1080/09297049.2015.1135422
Lichtenstein, J. D., Erdodi, L. A., Rai, J. K., Mazur-Mosiewicz, A., & Flaro, L. (2016). Wisconsin Card Sorting Test embedded validity indicators developed for adults can be extended to children. Child Neuropsychology, 1–14. doi:10.1080/09297049.2016.1259402 [Advance online publication]
Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness of the Rey-Osterrieth Complex Figure Test and the Meyers and Meyers recognition trial in the detection of suspect effort. The Clinical Neuropsychologist, 17(3), 426–440.
Miller, J. B., Millis, S. R., Rapport, L. J., Bashem, J. R., Hanks, R. A., & Axelrod, B. N. (2011). Detection of insufficient effort using the Advanced Clinical Solutions for the Wechsler Memory Scale. The Clinical Neuropsychologist, 25(1), 160–172. doi:10.1080/13854046.2010.533197
Morris, J. C., Heyman, A., Mohs, R. C., Hughes, J. P., van Belle, G., Fillenbaum, G., … Clark, C. (1989). The Consortium to Establish a Registry for Alzheimer’s Disease (CERAD). Part I. Clinical and neuropsychological assessment of Alzheimer’s disease. Neurology, 39(9), 1159–1165.
Nelson, N. W., Boone, K., Dueck, A., Wagener, L., Lu, P., &
Grills, C. (2003). The relationship between eight measures
of suspect effort. The Clinical Neuropsychologist, 17(2),
263–272. doi:10.1076/clin.17.2.263.16511
O’Bryant, S. E., Engel, L. R., Kleiner, J. S., Vasterling, J. J., & Black, F. W. (2007). Test of Memory Malingering (TOMM) Trial 1 as a screening measure for insufficient effort. The Clinical Neuropsychologist, 21, 511–521.
Ord, J. S., Boettcher, A. C., Greve, K. W., & Bianchini, K. J. (2010). Detection of malingering in mild traumatic brain injury with the Conners’ Continuous Performance Test-II. Journal of Clinical and Experimental Neuropsychology, 32(4), 380–387. doi:10.1080/13803390903066881
Pearson. (2009). Advanced Clinical Solutions for the WAIS-IV and WMS-IV: Technical manual. San Antonio, TX: Author.
Randolph, C. (1998). Repeatable Battery for the Assessment of
Neuropsychological Status (RBANS): Manual. San Antonio,
TX: Psychological Corporation.
Reedy, S. D., Boone, K. B., Cottingham, M. E., Glaser, D. F., Lu, P. H., Victor, T. L., … Wright, M. J. (2013). Cross validation of the Lu and colleagues (2003) Rey-Osterrieth Complex Figure Test effort equation in a large known-group sample. Archives of Clinical Neuropsychology, 28, 30–37. doi:10.1093/arclin/acs106
Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999).
Diagnostic criteria for malingered neurocognitive
dysfunction: Proposed standards for clinical practice and
research. The Clinical Neuropsychologist, 13(4), 545–561.
doi:10.1076/1385-4046(199911)13:04;1-y;ft545
Spencer, R. J., Axelrod, B. N., Drag, L. L., Waldron-Perrine, B., Pangilinan, P. H., & Bieliauskas, L. A. (2013). WAIS-IV Reliable Digit Span is no more accurate than age-corrected scaled score as an indicator of invalid performance in a veteran sample undergoing evaluation for mTBI. The Clinical Neuropsychologist, 27(8), 1362–1372. doi:10.1080/13854046.2013.845248
Sugarman, M. A., & Axelrod, B. N. (2015). Embedded
measures of performance validity using verbal fluency tests
in a clinical sample. Applied Neuropsychology: Adult, 22(2),
141–146.
Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin Card Sorting Test in the detection of malingering in student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology, 21(5), 701–708.
Tombaugh, T. N. (1996). Test of Memory Malingering.
New York, NY: Multi-Health Systems.
Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 16(4), 597–607. doi:10.1080/01688639408402671
Warrington, E. K. (1984). Recognition Memory Test manual. Berkshire, UK: NFER-Nelson.
Wisdom, N. M., Brown, W. L., Chen, D. K., & Collins, R. L. (2012). The use of all three Test of Memory Malingering trials in establishing the level of effort. Archives of Clinical Neuropsychology, 27, 208–212. doi:10.1093/arclin/acr107
Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee, G. J., & Sweet, J. J. (2010). Effort indicators within the California Verbal Learning Test-II (CVLT-II). The Clinical Neuropsychologist, 24(1), 153–168. doi:10.1080/13854040903107791
Objective: To determine the effectiveness of the Test of Memory Malingering Trial 1 (TOMM1) as a freestanding Performance Validity Test (PVT) as compared to the full TOMM in a criminal forensic sample. Method: Participants included 119 evaluees in a Midwestern forensic hospital. Criterion groups were formed based on passing/failing scores on other freestanding PVTs. This resulted in three groups: +MND (Malingered Neurocognitive Dysfunction), who failed two or more freestanding PVTs; possible MND (pMND), who failed one freestanding PVT; and -MND, who failed no other freestanding PVTs. All three groups were compared initially, but only +MND and -MND groups were retained for final analyses. TOMM1 performance was compared to standard TOMM performance using Receiver Operating Characteristic (ROC) analyses. Results: TOMM1 was highly predictive of the standard TOMM decision rules (AUC = .92). Overall accuracy rate for TOMM1 predicting failure on 2 PVTs was quite robust as well (AUC = .80), and TOMM1 ≤ 39 provided acceptable diagnostic statistics (Sensitivity = .68, Specificity = .89). These results were essentially no different from the standard TOMM accuracy statistics. In addition, by adjusting for those strongly suspected of being inaccurately placed into the -MND group (e.g. false negatives), TOMM1 diagnostics slightly improved (AUC = .84) at a TOMM1 ≤ 40 (sensitivity = .71, specificity = .94). Conclusions: Results support use of TOMM1 in a criminal forensic setting where accuracy, shorter evaluation times, and more efficient use of resources are often critical in informing legal decision-making.