They are not destined to fail: a systematic examination of scores on embedded performance validity indicators in patients with intellectual disability

Isabelle Messa (a), Matthew Holcomb (b), Jonathan D Lichtenstein (c), Brad T Tyson (d), Robert M Roth (c) and Laszlo A Erdodi (a)

(a) Department of Psychology, University of Windsor, Windsor, ON, Canada; (b) Jefferson Neurobehavioral Group, New Orleans, LA, USA; (c) Department of Psychiatry, Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA; (d) Neuropsychological Service, EvergreenHealth Medical Center, Kirkland, WA, USA
ABSTRACT
This study was designed to determine the clinical utility of embedded performance validity indicators (EVIs) in adults with intellectual disability (ID) during neuropsychological assessment. Based on previous research, unacceptably high (>16%) base rates of failure (BR_Fail) were predicted on EVIs based on the method of threshold, but not on EVIs based on alternative detection methods. A comprehensive battery of neuropsychological tests was administered to 23 adults with ID (M_Age = 37.7 years, M_FSIQ = 64.9). BR_Fail were computed at two levels of cut-offs for 32 EVIs. Patients produced very high BR_Fail on 22 EVIs (18.2%-100%), indicating unacceptable levels of false positive errors. However, on the remaining ten EVIs BR_Fail was <16%. Moreover, six of the EVIs had a zero BR_Fail, indicating perfect specificity. Consistent with previous research, individuals with ID failed the majority of EVIs at high BR_Fail. However, they produced BR_Fail similar to cognitively higher functioning patients on select EVIs based on recognition memory and unusual patterns of performance, suggesting that the high BR_Fail reported in the literature may reflect instrumentation artefacts. The implications of these findings for clinical and forensic assessment are discussed.

ARTICLE HISTORY
Received 29 October 2020; Accepted 25 November 2020

KEYWORDS
Intellectual disability; performance validity; embedded validity indicators; false positive rate; forensic assessment
Introduction
The prevalence of intellectual disability (ID) is difficult to determine due to differences in epidemiological and diagnostic practices around the world. However, it is thought to affect approximately 1% of the global population, with rates in low- and middle-income countries nearly double those in high-income ones [1]. Calculating the prevalence of ID within the United States is complicated by the inclusion of mild ID (traditionally defined as an IQ falling between 50 and 70), resulting in estimates ranging from 8.7 to 36.8 per 1,000 births [2-4]. The most recent edition of the Diagnostic and Statistical Manual of Mental Disorders [DSM-5; 5] defines ID as a disorder that impacts an individual's general intellectual functioning, produces impairments in several domains of adaptive functioning, and has its onset in the developmental period.
CONTACT Laszlo A Erdodi lerdodi@gmail.com
Historically, an IQ of 70 (i.e., two or more standard deviations below the normative mean) was a proverbial 'line in the sand' for diagnosing ID. In contrast, the DSM-5 allows for a 'floating' criterion, where the individual's adaptive behavioural functioning is taken into account. This distinction is important for several reasons. First, a flexible cut-off recognizes the measurement error inherent in the IQ test. Second, the floating criteria for diagnosing ID have been accepted by the United States Supreme Court (Hall v Florida).
In addition, the spectrum of ID is divided into severity ranges. The present study was restricted to individuals with mild ID (i.e., IQ between 50 and 70), for several reasons. First, given the focus on neurocognitive profiles, the scope of this investigation is restricted to 'testable' patients (i.e., individuals who are capable of completing a comprehensive battery of neurocognitive tests). Performance validity in examinees who cannot participate in psychometric testing due to severe sensory-motor deficits is a moot point. Second, mild ID is the most prevalent severity range within the United States (85% of ID cases). Third, within the paediatric neuropsychological literature there is a lack of consensus about the developmental abilities needed to understand engagement, effort, and deception [6,7]. Individuals within the mild range of ID are thought to function at a mental age between 9 and 12 years, which coincides with the lower age limit for meaningful performance validity testing [8-12].
Nevertheless, a DSM-5 diagnosis of ID requires both a clinical assessment and standardized intelligence testing. As such, the validity of the diagnosis depends on the integrity of the psychometric data. To make an accurate diagnosis, clinicians must be confident that test scores provide a true reflection of an individual's intellectual and adaptive functioning. The assessment of adaptive functioning incorporates several independent sources of information such as self- and informant-reports, objective measures of activities of daily living, and the individual's medical, educational and social history, including current life circumstances. The validity of the FSIQ, however, is difficult to assess through behavioural observations or clinical judgement [13,14]. As such, assessors typically rely on psychometric measures to determine the validity of an individual's performance.
Performance validity is the extent to which scores on cognitive tests accurately reflect an examinee's true ability level [15]. Bigler [16] refers to it as the 'Achilles heel' of cognitive testing, as it exposes a serious weakness in an otherwise robust system. In clinical neuropsychology, the assessment of performance validity has become a standard component of evaluations, and its routine use is recommended by various professional organizations [17,18-20] and PVT researchers [21-25]. Moreover, best practice guidelines prescribe the use of multiple performance validity tests (PVTs) throughout the assessment, tapping varying cognitive domains [20].
By design, there are two types of PVTs: free-standing instruments and embedded validity indicators (EVIs). Free-standing PVTs were developed specifically to assess the credibility of a given neurocognitive profile, whereas EVIs are nested within standard neuropsychological tests. As such, EVIs provide a second function without requiring additional resources, enabling clinicians to simultaneously assess both cognitive function and credibility of performance. Although free-standing PVTs have long been considered the gold standard, and generally possess superior signal detection properties relative to their embedded counterparts [26], they are more cumbersome in that they require additional test material, take up valuable assessment time, and contribute to patient fatigue while failing to provide information regarding an individual's cognitive functioning.
EVIs, on the other hand, are more efficient and cost-effective, as they use data already being collected for clinical purposes [27]. They are also less susceptible to coaching, as they are less readily identifiable as validity indicators [28]. Though EVIs in isolation tend to have inferior signal detection relative to free-standing PVTs, aggregating several EVIs across the testing battery not only provides for the inconspicuous measurement of performance validity at several points, but has also been shown to improve classification accuracy to levels comparable with stand-alone measures [23]. A fundamental limitation of EVIs, however, is their potential to confound invalid performance with genuine deficits in cognitive ability [29]. One context in which this confound may be especially consequential is in the assessment of ID.
Indeed, a primary concern of performance validity assessment in individuals with ID is whether the classification accuracy of commonly used PVTs generalizes to this population [30]. Genuine and severe cognitive deficits have been shown to result in unacceptably high false positive rates on many PVTs. Given that ability and effort are especially intertwined in EVIs, at face value they are increasingly susceptible to this confound. To further complicate matters, differentiating bona fide and feigned ID is extremely challenging in the presence of external incentive to appear impaired [30-32]. Chafetz [30] aptly described this as a 'chicken-egg problem' (i.e., is a FSIQ <70 an extenuating circumstance for PVT failures, or an invalid result because of them?), leaving clinicians sensitized to a significant diagnostic issue without offering concrete, practical solutions.
Although feigning ID to qualify for disability payments or other social services seems a plausible scenario [30], a more extreme and compelling incentive to malinger emerged at the turn of the century, when the US Supreme Court determined that individuals with ID should be exempt from the death penalty in the case of Atkins v. Virginia [33]. At the time, Justice Scalia voiced his dissent, citing concerns that this would incentivize malingered ID among defendants trying to evade the death penalty [34]. Given the potentially extreme consequences of a false positive error (erroneously labelling an individual with genuine ID as a malingerer) in these cases, there has since been increased emphasis on the development of PVTs that are appropriate for use in determining the presence or absence of ID [34-36].
Though some measures have been identified as potentially useful in this population [34,35], most of the evidence suggests that the PVTs applied to the assessment of ID have an unacceptably high [>16%; 37] false positive rate [36,38-43]. Progress on the issue of high false positive rates on PVTs among patients with ID is hindered by the common practice of excluding individuals with FSIQ <75 from cross-validation studies as a methodological safeguard against contaminating criterion grouping [44-52].
Despite these discouraging findings, there may be a psychometric solution to the 'chicken-egg problem' [30]. Rather than relying on the method of threshold (i.e., identifying a cut-off on the ability scale below which patients with genuine cognitive impairment rarely score), assessors could explore PVTs based on alternative detection mechanisms. On rational grounds, PVTs designed to identify internal inconsistencies and neurologically implausible patterns of performance may provide an accurate measure of performance validity even in the presence of credible severe cognitive deficits. EVIs could be particularly well-suited to this task, effectively transforming their primary weakness (i.e., that the measurement of cognitive ability and performance validity is inextricably intertwined within a task) into a strength. For example, Erdodi et al. [53] found that although two EVIs nested within measures of visuomotor processing speed had adequate specificity overall, they were associated with unacceptably high false positive rates in patients with severe traumatic brain injury. However, the absolute value of the difference between the two scores maintained high specificity. Their findings were subsequently replicated in a forensic sample [54,55-59], reinforcing the potential of derivative EVIs to detect non-credible responding even in examinees with genuine and severe cognitive impairment.
The current study was designed to systematically evaluate a large number of EVIs representing a variety of cognitive domains, difficulty levels, and detection mechanisms to determine their suitability for clinical and forensic use in patients with ID. Based on previous research, we predicted a high overall failure rate (>50%). At the same time, we hypothesized that patients would pass EVIs designed to detect neurologically implausible patterns of performance, because these indices monitor the internal consistency of test-taking behaviour and do not penalize examinees for genuine impairment.
Method

Participants
A consecutive case sequence of 23 adults with ID was selected from an archival data set of patients referred for neuropsychological testing at a tertiary care hospital in the Northeastern US. Inclusion criteria were: 1. FSIQ <75 (to account for the standard error of measurement around the point estimate of 70); 2. A documented history of developmental delay and deficits in adaptive functioning; and 3. A full administration of the California Verbal Learning Test – Second Edition (CVLT-II). Having data on the CVLT-II (an extensive measure of verbal memory) ensured that patients were able to complete a comprehensive battery of tests and thus provide the opportunity to examine their full EVI profile. The majority of the sample was male (12 or 55%) and right-handed (15 or 68%). Mean age was 37.7 years (SD = 14.1). Mean FSIQ was 64.9 (SD = 5.1; range: 57-74).
Materials and procedure
A comprehensive battery of neuropsychological tests was administered to all patients, encompassing five core domains: attention/processing speed, memory, language, executive function and manual dexterity. Psychometric testing was administered and scored by Masters- or doctoral-level psychometrists working under the supervision of licenced clinical neuropsychologists. The study was approved by the hospital's research ethics board. APA ethical guidelines regulating research with human participants were followed throughout the process.
Data analysis
Given the scope of the study, base rate of failure (BR_Fail) was the main outcome of interest. BR_Fail is a descriptive statistic, representing the percentage of people in the sample who failed a given cut-off. As ID is considered an exempt category from PVTs, the performance of the entire sample is considered credible on neuropsychological testing. As such, BR_Fail is conceptually equivalent to false positive rate (i.e., 100 – specificity). In a clinical and forensic setting, the highest acceptable level of false positive rate is 16% [37], although ≤10% is the emerging threshold.
Where possible, BR_Fail was calculated at two different levels: liberal and conservative. Liberal cut-offs prioritize sensitivity over specificity. Consequently, they are more likely to detect non-credible responding, but are prone to false positive errors. In contrast, conservative cut-offs prioritize specificity over sensitivity. As such, they are less likely to detect non-credible responding, but when they do, they are more likely to be correct. To help visually identify EVIs and cut-offs that met specificity standards, BR_Fail ≤16% is marked with an asterisk (*) in the data tables; instances of zero BR_Fail (i.e., perfect specificity) are marked with a double asterisk (**).
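As a concrete illustration of this computation (a minimal sketch, not the authors' analysis code; the scores below are hypothetical), BR_Fail at a given cut-off is simply the percentage of valid scores falling at or beyond the threshold:

```python
import numpy as np

def base_rate_of_failure(scores, cutoff, fail_low=True):
    """BR_Fail: % of the sample classified as non-credible at a given cut-off.

    fail_low=True treats scores at or below the cut-off as failures (typical
    for ability scores); fail_low=False treats scores at or above the cut-off
    as failures (e.g., difference-score EVIs).
    """
    s = np.asarray(scores, dtype=float)
    s = s[~np.isnan(s)]  # patients who were not administered the test are excluded
    failed = (s <= cutoff) if fail_low else (s >= cutoff)
    return 100.0 * failed.mean()

# Hypothetical Reliable Digit Span scores for a small credible sample
rds = [5, 7, 6, 8, 4, 7, 9, 6, 5, 7]
print(base_rate_of_failure(rds, cutoff=7))  # liberal cut-off (RDS <= 7): 80.0
print(base_rate_of_failure(rds, cutoff=6))  # conservative cut-off (RDS <= 6): 50.0
```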
Results
BR_Fail on EVIs within tests of attention and processing speed
The majority of the sample (mean BR_Fail = 75.7%; SD = 30.3) failed the liberal cut-offs, with one notable exception: the absolute value of the difference score between age-corrected scaled scores on the Coding and Symbol Search subtests of the WAIS-IV, which had a BR_Fail of zero. The logic behind this EVI is that the normative difference between the scores on these two tests measuring similar constructs (i.e., visuomotor speed) is small [60]. A large discrepancy reveals that the lower score of the pair likely underestimates the examinee's true ability level. The present results suggest that patients demonstrated an even performance across both tests, producing a valid profile. At conservative cut-offs, BR_Fail declined (M = 51.3%; SD = 24.1), but remained well above the lowest acceptable threshold (42.9%-85.7%). Further details are provided in Table 1.
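The scoring logic of this difference-based EVI can be sketched as follows (an illustrative reconstruction from the description above, not the authors' code; the scaled scores are hypothetical):

```python
def cd_ss_difference_evi(coding_acss: int, symbol_search_acss: int,
                         cutoff: int = 3) -> bool:
    """Flag a profile as non-credible when the absolute difference between the
    Coding and Symbol Search age-corrected scaled scores meets the cut-off
    (|CD - SS| >= 3, the liberal cut-off listed in Table 1).

    Both subtests measure visuomotor speed, so credible examinees - even those
    with uniformly low ability - tend to produce similar scores on the pair.
    """
    return abs(coding_acss - symbol_search_acss) >= cutoff

print(cd_ss_difference_evi(3, 4))  # False: evenly low profile, passes
print(cd_ss_difference_evi(2, 7))  # True: implausibly uneven profile, fails
```

Note that a uniformly impaired examinee cannot fail this index by being impaired; only an uneven score pair triggers it.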
Table 1. EVIs within tests of attention and processing speed.

                     LIB                  CON
EVI                  Cut-Off   BR_Fail    Cut-Off   BR_Fail    References
RDS                  ≤7        86.4       ≤6        50.0       61,62,63,53
DS_WAIS-IV           ≤6        100.0      ≤4        42.9       29,63
CD_WAIS-IV           ≤5        93.3       ≤4        85.7       65,66,54,67
SS_WAIS-IV           ≤6        94.7       ≤4        68.4       68,69,69
|CD – SS|            ≥3        0.0**      -         -          70,71
Trails_D-KEFS 1      ≤5        64.3       ≤4        42.9       127,72
Trails_D-KEFS 2      ≤5        78.6       ≤4        64.3       72,73
Trails_D-KEFS 3      ≤5        85.7       ≤4        64.3       74,75
Trails_D-KEFS 5      ≤8        78.6       ≤5        42.9       76,77

EVI: Embedded validity indicator; RDS: Reliable Digit Span; DS: Digit Span age-corrected scaled score (ACSS; M = 10; SD = 3); WAIS-IV: Wechsler Adult Intelligence Scale – Fourth Edition; CD: Coding ACSS; SS: Symbol Search ACSS; Trails_D-KEFS: Trails of the Delis-Kaplan Executive Function System (ACSS; M = 10; SD = 3); LIB: Liberal (optimized for sensitivity); CON: Conservative (optimized for specificity); BR_Fail: Base rate of failure (% of the sample classified as non-credible at a given cut-off). ** zero BR_Fail (perfect specificity).

BR_Fail on EVIs within memory tests
Across nine EVIs at liberal cut-offs, mean BR_Fail was 42.7% (SD = 28.2). Minimum acceptable specificity was achieved on the CVLT-II Forced Choice Recognition (FCR), RCFT Yes/No Recognition and True Positives. Predictably, mean BR_Fail at conservative cut-offs was notably lower (19.1%; SD = 19.5). More important, only 9.1% of the sample failed the FCR_CVLT-II cut-off, achieving >.90 specificity. Remarkably, no one failed the RCFT recognition cut-offs (Table 2). Likewise, all three trials of Logical Memory (Immediate and Delayed Free Recall; Yes/No Recognition) produced BR_Fail <16%. However, BR_Fail remained high on the validity cut-offs embedded within the CVLT-II acquisition trials (63.6%) and the RCFT copy trial (33.3%).
BR_Fail on EVIs within language tests
At liberal cut-offs, BR_Fail on all five EVIs within language tests was above the minimum acceptable threshold (mean = 58.1%; SD = 26.8). However, at conservative cut-offs, mean BR_Fail was lower (38.4%; SD = 26.4). In addition, one derivative EVI based on a discrepancy score (Vocabulary minus Digit Span) had a zero BR_Fail even at the most liberal cut-off available (Table 3).
BR_Fail on EVIs within tests of executive function
Mean BR_Fail was 45.1% (SD = 35.8) at liberal cut-offs. One derivative EVI based on the ratio between two raw scores achieved .90 specificity. At conservative cut-offs, BR_Fail was slightly lower (M = 34.1%; SD = 34.6). Although two of the EVIs had a zero BR_Fail, the remaining three had unacceptably high failure rates (35.7-78.6%). Table 4 provides additional details.
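Assuming the raw scores behind the Trails_D-KEFS 4/2 index in Table 4 are completion times (our reading of the D-KEFS raw score metric; the timings below are hypothetical), the ratio check can be sketched as follows:

```python
def trails_ratio_evi(trail4_seconds: float, trail2_seconds: float,
                     cutoff: float = 1.50) -> bool:
    """Flag the profile when number-letter switching (Trail 4) takes
    implausibly little extra time relative to number sequencing (Trail 2).

    Credible examinees need substantially longer on the harder switching
    condition, so the 4/2 ratio normally sits well above the cut-off
    (conservative cut-off of <= 1.50 taken from Table 4).
    """
    return (trail4_seconds / trail2_seconds) <= cutoff

print(trails_ratio_evi(110.0, 45.0))  # False: plausible difficulty gradient
print(trails_ratio_evi(50.0, 48.0))   # True: switching barely slower, fails
```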
BR_Fail on EVIs within tests of manual dexterity
On motor measures, BR_Fail remained high across levels of cut-off (liberal: M = 68.8%, SD = 9.7; conservative: M = 45.1%, SD = 18.2), tests (GPB and FTT), and whether the dominant or non-dominant hand was used (Table 5).
Table 2. EVIs within memory tests.

                     LIB                  CON
EVI                  Cut-Off   BR_Fail    Cut-Off   BR_Fail    References
T 1-5_CVLT-II        ≤37       72.7       ≤31       63.6       78,79
RH_CVLT-II           ≤11       27.3       ≤10       18.2       80,81
FCR_CVLT-II          ≤15       13.6*      ≤14       9.1*       82,83,84,85,86
RCFT Copy            ≤26       88.9       ≤23       33.3       87,127,129
RCFT REC             ≤16       11.1*      ≤13       0.0**      88
RCFT TP              ≤6        12.5*      ≤4        0.0**      88,89,130
LM I                 ≤3        47.4       ≤1        15.8*      90
LM II                ≤4        57.9       ≤2        15.8*      91
LM REC               ≤20       52.6       ≤16       15.8*      76,92

EVI: Embedded validity indicator; T 1-5_CVLT-II: Acquisition trials (sum of scores on Trials 1 through 5) on the California Verbal Learning Test – Second Edition (raw scores); RH_CVLT-II: Yes/No Recognition hits on the California Verbal Learning Test – Second Edition (raw scores); FCR_CVLT-II: Forced Choice Recognition trial on the California Verbal Learning Test – Second Edition (raw scores); RCFT: Rey Complex Figure Test (raw scores); REC: Yes/No Recognition (raw scores); TP: Yes/No Recognition true positives (raw scores); LM: Logical Memory; I: Immediate Recall age-corrected scaled score (ACSS; M = 10; SD = 3); II: Delayed Recall ACSS; REC: Recognition (raw scores); LIB: Liberal (optimized for sensitivity); CON: Conservative (optimized for specificity); BR_Fail: Base rate of failure (% of the sample classified as non-credible at a given cut-off). * BR_Fail ≤16%; ** zero BR_Fail (perfect specificity).
Table 3. EVIs within language tests.

                     LIB                  CON
EVI                  Cut-Off   BR_Fail    Cut-Off   BR_Fail    References
Animals              ≤33       75.0       ≤29       56.3       93,94
BNT                  ≤37       78.9       ≤33       47.4       62,63,94,95
CIM_BDAE             ≤9        41.2       ≤8        23.5       69,96,97,98
FAS                  ≤33       76.5       ≤29       64.7       69,99,100
VC – DS_WAIS-IV      ≥3        19.0       ≥5        0.0**      101,102,103,131

EVI: Embedded validity indicator; Animals: Category fluency test T-scores (M = 50; SD = 10) based on norms by Heaton et al. (2004) [134]; BNT: Boston Naming Test T-scores (M = 50; SD = 10) based on norms by Heaton et al. (2004) [134]; CIM_BDAE: Complex Ideational Material subtest of the Boston Diagnostic Aphasia Examination (raw scores); FAS: Letter fluency test T-scores (M = 50; SD = 10) based on norms by Heaton et al. (2004) [134]; VC – DS: Vocabulary minus Digit Span age-corrected scaled score (M = 10; SD = 3); WAIS-IV: Wechsler Adult Intelligence Scale – Fourth Edition; LIB: Liberal (optimized for sensitivity); CON: Conservative (optimized for specificity); BR_Fail: Base rate of failure (% of the sample classified as non-credible at a given cut-off). ** zero BR_Fail (perfect specificity).

Table 4. EVIs within tests of executive function.

                     LIB                  CON
EVI                  Cut-Off   BR_Fail    Cut-Off   BR_Fail    References
FMS_WCST             ≥2        15.4*      ≥3        0.0**      64,104,105,106
UE_WCST              ≥1        66.7       ≥4        41.7       64,65,107
PMM_WCST             -         -          ≥1        55.6       108
Trails_D-KEFS 4      ≤4        84.6       ≤1        76.9       109
Trails_D-KEFS 4/2    ≤1.60     15.4*      ≤1.50     0.0**      110,111,112,113,114

EVI: Embedded validity indicator; FMS_WCST: Failures to maintain set on the Wisconsin Card Sorting Test (raw score); UE_WCST: Unique errors ('Other' responses) on the Wisconsin Card Sorting Test (raw score); PMM_WCST: Perfect matches missed on the Wisconsin Card Sorting Test (raw score); Trails_D-KEFS 4: Trails 4 of the Delis-Kaplan Executive Function System (ACSS; M = 10; SD = 3); Trails_D-KEFS 4/2: Raw score ratio of Trails 4/Trails 2 of the Delis-Kaplan Executive Function System; LIB: Liberal (optimized for sensitivity); CON: Conservative (optimized for specificity); BR_Fail: Base rate of failure (% of the sample classified as non-credible at a given cut-off). * BR_Fail ≤16%; ** zero BR_Fail (perfect specificity).

Table 5. EVIs within tests of manual dexterity.

                     LIB                  CON
EVI                  Cut-Off   BR_Fail    Cut-Off   BR_Fail    References
GPB DH               ≤31       73.3       ≤27       66.7       115
GPB ND               ≤31       80.0       ≤27       53.3       116
FTT DH               ≤33       58.3       ≤25       33.0       117
FTT ND               ≤33       63.6       ≤25       27.3       117

EVI: Embedded validity indicator; GPB: Grooved Pegboard Test T-scores (M = 50; SD = 10) based on norms by Heaton et al. (2004) [134]; DH: Dominant hand; ND: Non-dominant hand; FTT: Finger Tapping Test T-scores (M = 50; SD = 10) based on norms by Heaton et al. (2004) [134]; LIB: Liberal (optimized for sensitivity); CON: Conservative (optimized for specificity); BR_Fail: Base rate of failure (% of the sample classified as non-credible at a given cut-off).

Discussion

This study was designed to empirically evaluate the doctrine within clinical neuropsychology that patients with ID should be exempt from performance validity testing because of the genuine and severe cognitive impairment inherent in the diagnosis. We predicted that on EVIs nested within tests that are globally sensitive to diffuse neurological deficits, BR_Fail would be high, well above the acceptable threshold. However, on derivative indices designed to measure neurologically implausible fluctuations in cognitive ability, we expected to find significantly lower BR_Fail.
Results support both hypotheses. Overall BR_Fail was unacceptably high across all five neurocognitive domains, consistent with previous research [43]. Although, predictably, conservative cut-offs were more efficient at containing BR_Fail overall (19.1%-51.3%) compared to liberal cut-offs (42.7%-75.7%), they failed to provide a uniformly effective safeguard against false positive errors.
In contrast, on ten of the 32 EVIs examined in this study, BR_Fail was below 16%. Among these, six EVIs had a zero false positive rate, converging with the promising results of earlier investigations [35]. This is an important finding, as zero BR_Fail is rare even in cognitively high functioning samples. As a reference, An et al. [118] reported BR_Fail on EVIs within Digit Span and verbal fluency ranging from 5.8% to 10.9% in healthy university students who volunteered for academic research and were instructed to perform to the best of their abilities. Their findings were subsequently replicated [55,119]. Therefore, having identified several EVIs with perfect specificity in patients with ID is a remarkable (although not unprecedented) finding.
Clinical implications

Detection methods matter
Feigning symptoms and disability is a ubiquitous phenomenon in the general population [28,120] and relatively common in clinical settings [121,122]. The most common approach to differentiating credible from non-credible responding is the method of threshold: identifying a cut-off score that most (≥90%) individuals with genuine impairment can pass. However, the cumulative evidence (including the present study) suggests that cognitive impairment in patients with ID is so severe and pervasive that traditional PVTs routinely misclassify them as non-credible [41,43].
Some EVIs remain effective in patients with ID
Our findings revealed that alternative detection methods based on patterns of performance provide a promising psychometric solution to the dilemma, consistent with previous research [35,54]. Three of the six EVIs with zero BR_Fail are derivative indices: they combine information from two test scores, and evaluate the credibility of the profile based on the relationship between them, not absolute performance on individual tasks. As such, patients with ID who demonstrate their true ability on testing will produce consistently low scores and thus easily pass the validity cut-offs. In contrast, individuals attempting to feign low cognitive ability in general, or ID specifically, may be prone to 'slip-ups' (i.e., accidentally performing well on one of the tests and thus failing the derivative EVI).
Another detection method for PVTs is the violation of the inherent difficulty gradient in cognitive tasks (i.e., doing well on difficult tests, while doing poorly on easy ones). An example of this would be higher performance on the genuinely difficult acquisition trials and delayed free recall during a memory test compared to the significantly easier recognition testing. Results show that as a group, the patients within the sample demonstrated the normative pattern of performance (high BR_Fail on EVIs based on acquisition trials, with a marked improvement on recognition trials), indicating credible responding.
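In schematic form, this detection mechanism reduces to a comparison between a hard score and an easy score on a common metric (a hypothetical helper illustrating the logic, not a published index; the values are invented):

```python
def violates_difficulty_gradient(hard_score: float, easy_score: float,
                                 margin: float = 0.0) -> bool:
    """Flag neurologically implausible profiles in which performance on a
    genuinely hard task (e.g., delayed free recall) exceeds performance on a
    much easier one (e.g., yes/no recognition of the same material).

    Both scores must be on the same metric (e.g., percent correct) for the
    comparison to be meaningful.
    """
    return hard_score > easy_score + margin

print(violates_difficulty_gradient(hard_score=40.0, easy_score=85.0))  # False: normative pattern
print(violates_difficulty_gradient(hard_score=70.0, easy_score=50.0))  # True: implausible, fails
```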
Overall, results suggest that a handful of EVIs continue to be useful in patients with ID, as they protect against false positive errors. Equally important, findings consolidate the signal detection profile of a number of EVIs, providing another source of empirical evidence for their specificity and refuting reflexive claims that failing a validity cut-off is likely a false positive error even in cognitively high functioning examinees.
Forensic implications
In the presence of strong external incentives to appear impaired [5], it is incumbent on the assessor to verify the veracity of test scores suggesting extremely low ability [17,18,19,20,22,25], as failing to detect non-credible deficits comes at a high societal cost [123,124]. In the context of criminal justice, establishing or ruling out ID can play a central role in high-stakes legal arguments such as competency to stand trial or carrying out the death penalty [35,36,125]. Given the 'exempt status' from performance validity assessment in ID, assessors are faced with the 'chicken-egg problem' [30]: are PVT failures the natural consequence of an ID diagnosis, or evidence that the examinee is feigning the condition?
The present study provides a list of PVTs with perfect specificity in patients with bona fide ID. Consequently, if examinees with external incentive to feign ID and insufficient evidence of a developmental history of adaptive deficits fail these EVIs, assessors can make the case for non-credible presentation with reasonable certainty. Naturally, the evidence presented in this study is far from conclusive. However, it provides forensic experts with a compelling example that not all PVTs are contaminated by the genuine and severe cognitive deficits characteristic of ID. Therefore, a mere claim of ID would not automatically render a carefully calibrated PVT arsenal null and void.
Limitations
The findings should be interpreted in the light of the study's limitations. The most obvious one is the small sample size, although it is larger than in some previously published studies [40]. To some extent, the small sample is an inevitable trade-off for having data on a comprehensive battery of commonly used neuropsychological tests. In contrast, many assessors evaluating patients with ID administer abbreviated batteries with lower cognitive load, and rely on informant report of their adaptive functioning. Second, this is a relatively high functioning sample (reading level was broadly within normal range), which is likely an artefact of a full CVLT-II administration being one of the inclusion criteria. While this was a necessary condition to obtain a sample with sufficient data to answer the main research question, it also means that the study was restricted to 'testable' patients (i.e., with high enough cognitive functioning and mental stamina to complete several hours' worth of psychometric testing). Thus, results may not generalize to patients with significantly lower overall cognitive functioning. Finally, BR_Fail was reported only on individual EVIs. While this is a necessary step in the early stages of an investigation, combining the evidence using composite measures of performance validity has been shown to improve classification accuracy in general [26,126,127] and in examinees with ID specifically [42]. Future investigations would benefit from exploring multivariate models of EVIs, as sketched below.
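One simple form such a multivariate model can take is a tally of failures at conservative cut-offs across whichever EVIs were administered, with the composite count, rather than any single indicator, interpreted against a validity threshold (a sketch under our own assumptions, not the specific models used in the studies cited; cut-offs taken from Tables 1-2):

```python
from typing import Callable, Dict, Optional

# Each EVI maps a score to True (fail) or False (pass); None = not administered
EVIBattery = Dict[str, Callable[[float], bool]]

def composite_failures(scores: Dict[str, Optional[float]],
                       battery: EVIBattery) -> int:
    """Count conservative-cut-off failures across all administered EVIs."""
    return sum(
        battery[name](score)
        for name, score in scores.items()
        if name in battery and score is not None
    )

battery: EVIBattery = {
    "RDS": lambda s: s <= 6,           # Reliable Digit Span, conservative cut-off
    "FCR_CVLT-II": lambda s: s <= 14,  # Forced Choice Recognition
    "RCFT_REC": lambda s: s <= 13,     # RCFT Yes/No Recognition
}
patient = {"RDS": 5, "FCR_CVLT-II": 16, "RCFT_REC": 18}
print(composite_failures(patient, battery))  # 1 failure across three EVIs
```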
Strengths
The study also has several strengths. Patients were administered a comprehensive battery of neuropsychological tests that contained 32 EVIs, providing an opportunity to simultaneously examine, compare and contrast a large number of validity indices tapping various cognitive domains and using different detection mechanisms. The present investigation extended and consolidated previous sporadic reports that carefully chosen PVTs can maintain appropriate levels of specificity. As such, it provides clinical and forensic assessors with actionable practical knowledge on test selection and interpretation.
Conclusions
Assessors should continue to exercise great caution when they encounter EVI failures in patients with ID, as this population is indeed vulnerable to high false positive rates. At the same time, assuming that all examinees with ID will fail all EVIs by virtue of their diagnosis is an overgeneralization that is inconsistent with empirical evidence. Rather, the present study identified six EVIs with a zero false positive rate – lower than what was observed in higher functioning samples in previous research [55,128-131]. The fact that a minority of EVIs performed surprisingly well underscores the importance of psychometric diversity (in cognitive domain, difficulty level, detection mechanism) in PVT development and deployment, a frequently echoed conclusion of the research literature [21,23,132].
Although our results suggest that the unacceptably high BR_Fail on most EVIs cannot be contained by simply applying more conservative cut-offs, we uncovered a number of EVIs that demonstrated high specificity even at commonly used cut-offs. If replicated by future investigations, these instruments can provide effective psychometric tools for monitoring the credibility of neurocognitive profiles in this patient population. Since non-credible responding can coexist with any type of neurological disorder, it is important to have EVIs that can carry out their mission even in the unusually challenging signal detection environment created by the presence of genuine and severe cognitive deficits. Moreover, they may be used to detect feigned ID in forensic settings, filling an existing void in the current assessment methods [36,38,43,125,133].
Acknowledgments
Relevant ethical guidelines were followed throughout the project. All data collection, storage and
processing was done with the approval of relevant institutional authorities regulating research
involving human participants, in compliance with the 1964 Helsinki Declaration and its subsequent
amendments or comparable ethical standards.
Disclosure statement
No potential conict of interest was reported by the author(s).
ORCID
Brad T Tyson http://orcid.org/0000-0002-5113-790X
AUSTRALIAN JOURNAL OF FORENSIC SCIENCES 673
Laszlo A Erdodi http://orcid.org/0000-0003-0575-9991
References
1. Maulik PK, Mascarenhas MN, Mathers CD, Dua T, Saxena S. Prevalence of intellectual disability: a meta-analysis of population-based studies. Res Dev Disabil. 2011;32(2):419–436.
2. Bhasin TK, Brocksen S, Avchen RN, Braun KVN. Prevalence of four developmental disabilities among children aged 8 years: metropolitan Atlanta developmental disabilities surveillance program, 1996 and 2000. Morbidity Mortality Week Rep. 2006;55(SS01):1–9.
3. Boyle CA, Lary JM. Prevalence of selected developmental disabilities in children 3-10 years of age: the Metropolitan Atlanta developmental disabilities surveillance program, 1991. Morbidity Mortality Week Rep. 1996;45(SS02):1–14.
4. Camp BW, Broman SH, Nichols PL, Le M. Maternal and neonatal risk factors for mental retardation: defining the "at-risk" child. Early Hum Dev. 1998;50(2):159–173.
5. American Psychiatric Association. Diagnostic and statistical manual of mental disorders (5th ed.). Washington (DC): Author; 2013.
6. Kirkwood MW, Kirk JW. The base rate of suboptimal effort in a pediatric mild TBI sample: performance on the medical symptom validity test. Clin Neuropsychol. 2010;24(5):860–872.
7. Kirkwood MW, Kirk JW, Blaha RZ, Wilson P. Noncredible effort during pediatric neuropsychological exam: a case series and literature review. Child Neuropsychol. 2010;16(6):604–618.
8. Abeare CA, Messa I, Zuccato BG, Merker B, Erdodi LA. Prevalence of invalid performance on baseline testing for sport-related concussion by age and validity indicator. JAMA Neurol. 2018;75(6):697–703. doi:10.1001/jamaneurol.2018.0031.
9. Blaskewitz N, Merten T, Kathmann N. Performance of children on symptom validity tests: TOMM, MSVT, and FIT. Arch Clin Neuropsychol. 2008;23(4):379–391.
10. Constantinou M, McCaffrey RJ. Using the TOMM for evaluating children's effort to perform optimally on neuropsychological measures. Child Neuropsychol. 2003;9(2):81–90.
11. Holcomb MJ. Pediatric performance validity testing: state of the field and current research. J Pedia Neuropsychol. 2018;4:83–85. doi:10.1007/s40817-018-00062-y.
12. Lichtenstein JD, Erdodi LA, Linnea KS. Introducing a forced-choice recognition task to the California verbal learning test – children's version. Child Neuropsychol. 2017;23(3):284–299. doi:10.1080/09297049.2015.1135422.
13. Dandachi-FitzGerald B, Ponds RW, Merten T. Symptom validity and neuropsychological assessment: a survey of practices and beliefs of neuropsychologists in six European countries. Arch Clin Neuropsychol. 2013;28(8):771–783.
14. Heaton RK, Smith HH, Lehman RAW, Vogt AT. Prospects for faking believable deficits on neuropsychological testing. J Consult Clin Psychol. 1978;46(5):892–900. doi:10.1037/0022-006X.46.5.892.
15. Lezak MD, Howieson DB, Bigler ED, Tranel D. Neuropsychological assessment. New York: Oxford University Press; 2012.
16. Bigler ED. Neuroimaging as a biomarker in symptom validity and performance validity testing. Brain Imaging Behav. 2015;9(3):421–444. doi:10.1007/s11682-015-9409-1.
17. Bush SS, Heilbronner RL, Ruff RM. Psychological assessment of symptom and performance validity, response bias, and malingering: official position of the association for scientific advancement in psychological injury and law. Psychol Inj Law. 2014;7(3):197–205.
18. Bush SS, Ruff RM, Troster AI, Barth JT, Koffler SP, Pliskin NH, . . . Silver CH. Symptom validity assessment: practice issues and medical necessity (NAN Policy and Planning Committees). Arch Clin Neuropsychol. 2005;20:419–426.
19. Chafetz MD, Williams MA, Ben-Porath YS, Bianchini KJ, Boone KB, Kirkwood MW, Larrabee GJ, Ord JS. Official position of the American academy of clinical neuropsychology social security administration policy on validity testing: guidance and recommendations for change. Clin Neuropsychol. 2015;29(6):723–740.
20. Heilbronner RL, Sweet JJ, Morgan JE, Larrabee GJ, Millis SR, Conference Participants. American academy of clinical neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. Clin Neuropsychol. 2009;23:1093–1129. doi:10.1080/13854040903155063.
21. Boone KB. The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examination. Clin Neuropsychol. 2009;23(4):729–741. doi:10.1080/13854040802427803.
22. Chafetz M. Reducing the probability of false positives in malingering detection of social security disability claimants. Clin Neuropsychol. 2011;25(7):1239–1252. doi:10.1080/13854046.2011.586785.
23. Larrabee GJ. Assessment of malingering. In: Larrabee GJ, editor. Forensic neuropsychology: a scientific approach. Second ed. New York: Oxford University Press; 2012. p. 116–159.
24. Millis SR. What clinicians really need to know about symptom exaggeration, insufficient effort, and malingering: statistical and measurement matters. In: Morgan JE, Sweet JJ, editors. American academy of clinical neuropsychology/psychology press continuing education series. Neuropsychology of malingering casebook. Psychology Press; 2009. p. 21–37.
25. Schutte C, Axelrod BN, Montoya E. Making sure neuropsychological data are meaningful: use of performance validity testing in medicolegal and clinical contexts. Psychol Inj Law. 2015;8(2):100–105.
26. Larrabee GJ. Aggregation across multiple indicators improves the detection of malingering: relationship to likelihood ratios. Clin Neuropsychol. 2008;22:410–425. doi:10.1080/13854040701494987.
27. Miele AS, Gunner JH, Lynch JK, McCaffrey RJ. Are embedded validity indices equivalent to free-standing symptom validity tests? Arch Clin Neuropsychol. 2012;27(1):10–22.
28. Boone KB. Clinical practice of forensic neuropsychology. New York (NY): Guilford; 2013.
29. Erdodi LA, Lichtenstein JD. Invalid before impaired: an emerging paradox of embedded validity indicators. Clin Neuropsychol. 2017;31(6–7):1029–1046. doi:10.1080/13854046.2017.1323119.
30. Chafetz MD. Symptom validity issues in the psychological consultative examination for social security disability. Clin Neuropsychol. 2010;24(6):1045–1063.
31. Chafetz MD, Abrahams JP, Kohlmaier J. Malingering on the Social Security disability consultative exam: a new rating scale. Arch Clin Neuropsychol. 2007;22(1):1–14.
32. Chafetz MD, Prentkowski E, Rao A. To work or not to work: motivation (not low IQ) determines symptom validity test findings. Arch Clin Neuropsychol. 2011;26(4):306–313.
33. Atkins v. Virginia, 536 U.S. 304 (2002).
34. Chafetz MD, Biondolillo A. Validity issues in Atkins death cases. Clin Neuropsychol. 2012;26(8):1358–1376.
35. Barker A, Musso MW, Jones GN, Roid G, Gouvier D. Unreliable block span reveals simulated intellectual disability on the Stanford-Binet intelligence scales. Appl Neuropsychol Adult. 2014;21(1):51–59.
36. Salekin KL, Doane BM. Malingering intellectual disability: the value of available measures and methods. Appl Neuropsychol. 2009;16(2):105–113.
37. Larrabee GJ. Detection of malingering using atypical performance patterns on standard neuropsychological tests. Clin Neuropsychol. 2003;17(3):410–425. doi:10.1076/clin.17.3.410.18089.
38. Graue LO, Berry DT, Clark JA, Sollman MJ, Cardi M, Hopkins J, Werline D. Identification of feigned mental retardation using the new generation of malingering detection instruments: preliminary findings. Clin Neuropsychol. 2007;21(6):929–942.
39. Hurley KE, Deal WP. Assessment instruments measuring malingering used with individuals who have mental retardation: potential problems and issues. Ment Retard. 2006;44(2):112–119.
40. Love CM, Glassmire DM, Zanolini SJ, Wolf A. Specificity and false positive rates of the test of memory malingering, Rey 15-item test, and Rey word recognition test among forensic inpatients with intellectual disabilities. Assessment. 2014;21(5):618–627.
41. Marshall P, Happe M. The performance of individuals with mental retardation on cognitive tests assessing effort and motivation. Clin Neuropsychol. 2007;21(5):826–840.
42. Smith K, Boone K, Victor T, Miora D, Cottingham M, Ziegler E, Zeller M, Wright M. Comparison of credible patients of very low intelligence and non-credible patients on neurocognitive performance validity indicators. Clin Neuropsychol. 2014;28(6):1048–1070.
43. Victor TL, Boone KB. Identification of feigned mental retardation. In: Boone KB, editor. Assessment of feigned cognitive impairment: a neuropsychological perspective. New York (NY): The Guilford Press; 2007. p. 310–345.
44. Arentsen TJ, Boone KB, Lo TT, Goldberg HE, Cottingham ME, Victor TL, . . . Zeller MA. Effectiveness of the Comalli Stroop Test as a measure of negative response bias. Clin Neuropsychol. 2013;27(6):1060–1076.
45. Bell-Sprinkel TL, Boone KB, Miora D, Cottingham M, Victor T, Ziegler E, Zeller M, Wright M. Re-examination of the Rey word recognition test. Clin Neuropsychol. 2013;27(3):516–527.
46. Boone KB, Salazar X, Lu P, Warner-Chacon K, Razani J. The Rey 15-item recognition trial: a technique to enhance sensitivity of the Rey 15-item memorization test. J Clin Exp Neuropsychol. 2002;24(5):561–573. doi:10.1076/jcen.24.5.561.1004.
47. Lichtenstein JD, Erdodi LA, Rai JK, Mazur-Mosiewicz A, Flaro L. Wisconsin card sorting test embedded validity indicators developed for adults can be extended to children. Child Neuropsychol. 2018;24(2):247–260. doi:10.1080/09297049.2016.1259402.
48. Lichtenstein JD, Greenacre MK, Cutler L, Abeare K, Baker SD, Kent K, Ali J, Erdodi LA. Geographic variation and instrumentation artifacts: in search of confounds in performance validity assessment in adults with mild TBI. Psychol Inj Law. 2019;12(2):127–145. doi:10.1007/s12207-019-0935.
49. Marshall P, Schroeder R, O'Brien J, Fischer R, Ries A, Blesi B, Barker J. Effectiveness of symptom validity measures in identifying cognitive and behavioral symptom exaggeration in adult attention deficit hyperactivity disorder. Clin Neuropsychol. 2010;24:1204–1237. doi:10.1080/13854046.2010.514290.
50. Morse CL, Douglas-Newman K, Mandel S, Swirsky-Sacchetti T. Utility of the Rey-15 recognition trial to detect invalid performance in a forensic neuropsychological sample. Clin Neuropsychol. 2013;1–13. doi:10.1080/13854046.2013.832385.
51. Nitch S, Boone KB, Wen J, Arnold G, Alfano K. The utility of the Rey word recognition test in the detection of suspect effort. Clin Neuropsychol. 2006;20(4):873–887.
52. Poynter K, Boone KB, Ermshar A, Miora D, Cottingham M, Victor TL, Ziegler E, Zeller MA, Wright M. Wait, there's a baby in this bath water! Update on quantitative and qualitative cut-offs for Rey 15-item recall and recognition. Arch Clin Neuropsychol. 2019;34(8):1367–1380. doi:10.1093/arclin/acy087.
53. Erdodi LA, Abeare CA, Lichtenstein JD, Tyson BT, Kucharski B, Zuccato BG, Roth RM. WAIS-IV processing speed scores as measures of non-credible responding: the third generation of embedded performance validity indicators. Psychol Assess. 2017;29(2):148–157. doi:10.1037/pas0000319.
54. Glassmire DM, Wood ME, Ta MT, Kinney DI, Nitch SR. Examining false-positive rates of Wechsler Adult Intelligence Scale (WAIS-IV) processing speed based embedded validity indicators among individuals with schizophrenia spectrum disorders. Psychol Assess. 2019;31(1):120–125. doi:10.1037/pas0000650.
55. Abeare C, Messa I, Whitfield C, Zuccato B, Casey J, Rykulski N, Erdodi L. Performance validity in collegiate football athletes at baseline neurocognitive testing. J Head Trauma Rehabil. 2019;34(4):E20–E31.
56. Chafetz MD, Biondolillo AM. Feigning a severe impairment profile. Arch Clin Neuropsychol. 2013;28(3):205–212.
57. Erdodi LA, Pelletier CL, Roth RM. Elevations on select Conners' CPT-II scales indicate non-credible responding in adults with traumatic brain injury. Appl Neuropsychol Adult. 2018;25(1):19–28. doi:10.1080/23279095.2016.1232262.
58. Erdodi LA, Tyson BT, Shahein A, Lichtenstein JD, Abeare CA, Pelletier CL, Zuccato BG, Kucharski B, Roth RM. The power of timing: adding a time-to-completion cutoff to the word choice test and recognition memory test improves classification accuracy. J Clin Exp Neuropsychol. 2017;39(4):369–383. doi:10.1080/13803395.2016.1230181.
59. Greve KW, Curtis KL, Bianchini KJ, Ord JS. Are the original and second edition of the California Verbal Learning Test equally accurate in detecting malingering? Assessment. 2009;16(3):237–248.
60. Wechsler D. Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV). San Antonio (TX): Pearson; 2008.
61. Babikian T, Boone KB, Lu P, Arnold G. Sensitivity and specificity of various digit span scores in the detection of suspect effort. Clin Neuropsychol. 2006;20(1):145–159.
62. Erdodi LA. Aggregating validity indicators: the salience of domain specificity and the indeterminate range in multivariate models of performance validity assessment. Appl Neuropsychol Adult. 2019;26(2):155–172. doi:10.1080/23279095.2017.1384925.
63. Erdodi LA, Abeare CA. Stronger together: the Wechsler adult intelligence scale – Fourth Edition as a multivariate performance validity test in patients with traumatic brain injury. Arch Clin Neuropsychol. 2020;35(2):188–204. doi:10.1093/arclin/acz032.
64. Erdodi LA, Hurtubise JL, Charron C, Dunn A, Enache A, McDermott A, Hirst R. The D-KEFS Trails as performance validity tests. Psychol Assess. 2018;30(8):1081–1095.
65. Erdodi LA, Lichtenstein JD. Information processing speed tests as PVTs. In: Boone KB, editor. Assessment of feigned cognitive impairment. A neuropsychological perspective. New York (NY): Guilford; 2020. p. 218–247.
66. Etherton JL, Bianchini KJ, Heinly MT, Greve KW. Pain, malingering, and performance on the WAIS-III processing speed index. J Clin Exp Neuropsychol. 2006;28(7):1218–1237. doi:10.1080/13803390500346595.
67. Greiffenstein MF, Baker WJ, Gola T. Validation of malingered amnesia measures with a large clinical sample. Psychol Assess. 1994;6(3):218–224. doi:10.1037/1040-3590.6.3.218.
68. Heinly MT, Greve KW, Bianchini K, Love JM, Brennan A. WAIS Digit-Span-based indicators of malingered neurocognitive dysfunction: classification accuracy in traumatic brain injury. Assessment. 2005;12(4):429–444.
69. Hurtubise J, Baher T, Messa I, Cutler L, Shahein A, Hastings M, Carignan-Querqui M, Erdodi L. Verbal fluency and digit span variables as performance validity indicators in experimentally induced malingering and real world patients with TBI. Appl Neuropsychol Child. 2020;1–18. doi:10.1080/21622965.2020.1719409.
70. Jasinski LJ, Berry DT, Shandera AL, Clark JA. Use of the Wechsler adult intelligence scale digit span subtest for malingering detection: a meta-analytic review. J Clin Exp Neuropsychol. 2011;33(3):300–314.
71. Mathias CW, Greve KW, Bianchini KJ, Houston RJ, Crouch JA. Detecting malingered neurocognitive dysfunction using the reliable digit span in traumatic brain injury. Assessment. 2002;9(3):301–308.
72. Reese CS, Suhr JA, Riddle TL. Exploration of malingering indices in the Wechsler adult intelligence scale – fourth edition digit span subtest. Arch Clin Neuropsychol. 2012;27:176–181.
73. Schroeder RW, Twumasi-Ankrah P, Baade LE, Marshall PS. Reliable digit span: a systematic review and cross-validation study. Assessment. 2012;19(1):21–30.
74. Shura RD, Martindale SL, Taber KH, Higgins AM, Rowland JA. Digit Span embedded validity indicators in neurologically-intact veterans. Clin Neuropsychol. 2020;34(5):1025–1037.
75. Spencer RJ, Axelrod BN, Drag LL, Waldron-Perrine B, Pangilinan PH, Bieliauskas LA. WAIS-IV reliable digit span is no more accurate than age corrected scaled score as an indicator of invalid performance in a veteran sample undergoing evaluation for mTBI. Clin Neuropsychol. 2013;27(8):1362–1372.
76. Trueblood W. Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. J Clin Exp Neuropsychol. 1994;14(4):697–707. doi:10.1080/01688639408402671.
77. Webber TA, Soble JR. Utility of various WAIS-IV digit span indices for identifying noncredible performance validity among cognitively impaired and unimpaired examinees. Clin Neuropsychol. 2018;32(4):657–670.
78. Ashendorf L. Rey auditory verbal learning test and Rey–Osterrieth complex figure test performance validity indices in a VA polytrauma sample. Clin Neuropsychol. 2019;33(8):1388–1402.
79. Axelrod BN, Schutte C. Concurrent validity of three forced-choice measures of symptom validity. Appl Neuropsychol. 2011;18(1):27–33.
80. Bauer L, Yantz CL, Ryan LM, Warden DL, McCaffrey RJ. An examination of the California verbal learning Test II to detect incomplete effort in a traumatic brain injury sample. Appl Neuropsychol. 2005;12(4):202–207. doi:10.1207/s15324826an1204_3.
81. Blaskewitz N, Merten T, Brockhaus R. Detection of suboptimal effort with the Rey complex figure test and recognition trial. Appl Neuropsychol. 2009;16:54–61.
82. Bortnik KE, Boone KB, Marion SD, Amano S, Ziegler E, Victor TL, Zeller MA. Examination of various WMS-III logical memory scores in the assessment of response bias. Clin Neuropsychol. 2010;24(2):344–357. doi:10.1080/13854040903307268.
83. Curtis KL, Greve KW, Bianchini KJ, Brennan A. California verbal learning test indicators of malingered neurocognitive dysfunction: sensitivity and specificity in traumatic brain injury. Assessment. 2006;13(1):46–61.
84. Donders J, Strong CH. Embedded effort indicators on the California verbal learning test – second edition (CVLT-II): an attempted cross-validation. Clin Neuropsychol. 2011;25:173–184.
85. Erdodi LA, Abeare CA, Medoff B, Seke KR, Sagar S, Kirsch NL. A single error is one too many: the forced choice recognition trial on the CVLT-II as a measure of performance validity in adults with TBI. Arch Clin Neuropsychol. 2018;33(7):845–860. doi:10.1093/arclin/acx110.
86. Langeluddecke PM, Lucas SK. Quantitative measures of memory malingering on the Wechsler memory scale – third edition in mild head injury litigants. Arch Clin Neuropsychol. 2003;18(2):181–197.
87. Moore BA, Donders J. Predictors of invalid neuropsychological performance after traumatic brain injury. Brain Inj. 2004;18(10):975–984. doi:10.1080/02699050410001672350.
88. Rai J, An KY, Charles J, Ali S, Erdodi LA. Introducing a forced choice recognition trial to the Rey Complex Figure Test. Psychol Neurosci. 2019;12(4):451–472. doi:10.1037/pne0000175.
89. Resch ZJ, Pham AT, Abramson DA, White DJ, DeDios-Stern S, Soble JR. Examining independent and combined accuracy of embedded performance validity tests in the California verbal learning Test-II and brief visuospatial memory test – revised for detecting invalid performance. Appl Neuropsychol Adult. 2020. Advance online publication.
90. Root JC, Robbins RN, Chang L, Van Gorp WG. Detection of inadequate effort on the California verbal learning Test – second edition: forced choice recognition and critical item analysis. J Int Neuropsychol Soc. 2006;12:688–696. doi:10.1017/S1355617706060838.
91. Schwartz ES, Erdodi L, Rodriguez N, Jyotsna JG, Curtain JR, Flashman LA, Roth RM. CVLT-II forced choice recognition trial as an embedded validity indicator: a systematic review of the evidence. J Int Neuropsychol Soc. 2016;22(8):851–858. doi:10.1017/S1355617716000746.
92. Shura RD, Miskey HM, Rowland JA, Yoash-Gatz RE, Denning JH. Embedded performance validity measures with postdeployment veterans: cross-validation and efficiency with multiple measures. Appl Neuropsychol Adult. 2016;23:94–104. doi:10.1080/23279095.2015.1014556.
93. Abramson DA, Resch ZJ, Ovsiew GP, White DJ, Bernstein MT, Basurto KS, Soble JR. Impaired or invalid? Limitations of assessing performance validity using the Boston naming test. Appl Neuropsychol Adult. 2020. Advance online publication. doi:10.1080/23279095.2020.1774378.
94. An KY, Charles J, Ali S, Enache A, Dhuga J, Erdodi LA. Re-examining performance validity cutoffs within the complex ideational material and the Boston naming test – short form using an experimental malingering paradigm. J Clin Exp Neuropsychol. 2019;41(1):15–25. doi:10.1080/13803395.2018.1483488.
95. Curtis KL, Thompson LK, Greve KW, Bianchini KJ. Verbal fluency indicators of malingering in traumatic brain injury: classification accuracy in known groups. Clin Neuropsychol. 2008;22:930–945. doi:10.1080/13854040701563591.
96. Erdodi LA, Dunn AG, Seke KR, Charron C, McDermott A, Enache A, Maytham C, Hurtubise J.
The Boston naming test as a measure of performance validity. Psychol Inj Law. 2018;11:1–8.
doi:10.1007/s12207-017-9309-3.
97. Erdodi LA, Roth RM. Low scores on BDAE complex ideational material are associated with
invalid performance in adults without aphasia. Appl Neuropsychol Adult. 2017;24(3):264–274.
doi:10.1080/23279095.2017.1298600.
98. Erdodi LA, Tyson BT, Abeare CA, Lichtenstein JD, Pelletier CL, Rai JK, Roth RM. The BDAE
complex ideational material a measure of receptive language or performance validity?
Psychol Inj Law. 2016;9:112–120. doi:10.1007/s12207-016-9254-6.
99. Iverson GI, Binder LM. Detecting exaggeration and malingering in neuropsychological
assessment. J Head Trauma Rehabil. 2000;15(2):829–858.
100. Millis SR, Ross SR, Ricker JH. Detection of incomplete eort of the Wechsler adult intelligence
scale-revised: a cross-validation. J Clin Exp Neuropsychol. 1998;20(2):167–173.
101. Mittenberg W, Theroux-Fichera S, Zielinski RE, Heilbronner RL. Identification of malingered head injury on the Wechsler adult intelligence scale-revised. Prof Psychol Res Pr. 1995;26(5):491–498.
102. Mittenberg W, Theroux S, Aguila-Puentes G, Bianchini K, Greve K, Rayls K. Identification of malingered head injury on the Wechsler adult intelligence scale-3rd Edition. Clin Neuropsychol. 2001;15(4):440–445.
103. Sugarman MA, Axelrod BN. Embedded measures of performance validity using verbal fluency tests in a clinical sample. Appl Neuropsychol Adult. 2015;22(2):141–146.
104. Abeare C, Sabelli A, Taylor B, Holcomb M, Dumitrescu C, Kirsch N, Erdodi L. The importance of demographically adjusted cutoffs: age and education bias in raw score cutoffs within the Trail Making Test. Psychol Inj Law. 2019;12(2):170–182. doi:10.1007/s12207-019-09353.
105. Ashendorf L, O’Bryant SE, McCaffrey RJ. Specificity of malingering detection strategies in older adults using the CVLT and WCST. Clin Neuropsychol. 2003;17(2):255–262.
106. DenBoer JW, Hall S. Neuropsychological test performance of successful brain injury
simulators. Clin Neuropsychol. 2007;21(6):943–955.
107. Gligorović M, Buha N. Conceptual abilities of children with mild intellectual disability: analysis of Wisconsin card sorting test performance. J Intellect Dev Disabil. 2013;38(2):134–140. doi:10.3109/13668250.2013.772956.
108. Greve KW, Bianchini KJ, Mathias CW, Houston RJ, Crouch JA. Detecting malingered neurocognitive dysfunction with the Wisconsin card sorting test: a preliminary investigation in traumatic brain injury. Clin Neuropsychol. 2002;16(2):179–191.
109. Greve KW, Heinly MT, Bianchini KJ, Love JM. Malingering detection with the Wisconsin card
sorting test in mild traumatic brain injury. Clin Neuropsychol. 2009;23:343–362.
110. Iverson GL, Lange RT, Green P, Franzen MD. Detecting exaggeration and malingering with the trail
making test. Clin Neuropsychol. 2002;16(3):398–406. doi:10.1076/clin.16.3.398.13861.
111. Jodzio K, Biechowska D. Wisconsin card sorting test as a measure of executive function
impairments in stroke patients. Appl Neuropsychol. 2010;17(4):267–277.
112. Merten T, Bossink L, Schmand B. On the limits of effort testing: symptom validity tests and severity of neurocognitive symptoms in nonlitigant patients. J Clin Exp Neuropsychol. 2007;29(3):308–318.
113. Merten T, Green P, Henry M, Blaskewitz N, Brockhaus R. Analog validation of German-language symptom validity tests and the influence of coaching. Arch Clin Neuropsychol. 2005;20:719–726.
114. Suhr JA, Boyer D. Use of the Wisconsin Card Sorting Test in the detection of malingering in student simulator and patient samples. J Clin Exp Neuropsychol. 1999;21(5):701–708. doi:10.1076/jcen.21.5.701.868.
115. Erdodi LA, Kirsch NL, Sabelli AG, Abeare CA. The Grooved Pegboard Test as a validity indicator: a study on psychogenic interference as a confound in performance validity research. Psychol Inj Law. 2018;11(4):307–324. doi:10.1007/s12207-018-9337-7.
116. Erdodi LA, Seke KR, Shahein A, Tyson BT, Sagar S, Roth RM. Low scores on the Grooved
Pegboard Test are associated with invalid responding and psychiatric symptoms. Psychol
Neurosci. 2017;10(3):325–344. doi:10.1037/pne0000103.
117. Erdodi LA, Taylor B, Sabelli A, Malleck M, Kirsch NL, Abeare CA. Demographically adjusted validity cutoffs in the Finger tapping test are superior to raw score cutoffs. Psychol Inj Law. 2019;12(2):113–126. doi:10.1007/s12207-019-09352-y.
118. An KY, Kaploun K, Erdodi LA, Abeare CA. Performance validity in undergraduate research participants: a comparison of failure rates across tests and cutoffs. Clin Neuropsychol. 2017;31(1):193–206.
119. Roye S, Calamia M, Bernstein JP, De Vito AN, Hill BD. A multi-study examination of performance validity in undergraduate research participants. Clin Neuropsychol. 2019;33
(6):1138–1155.
120. Dandachi-FitzGerald B, Duits AA, Leentjens AF, Verhey FR, Ponds RW. Performance and
symptom validity assessment in patients with apathy and cognitive impairment.
J Int Neuropsychol Soc. 2020;26(3):314–321.
121. Larrabee GJ, Millis SR, Meyers JE. 40 plus or minus 10, a new magical number: reply to Russell. Clin Neuropsychol. 2009;23:841–849.
122. Martin PK, Schroeder RW. Base rates of invalid test performance across clinical non-forensic
contexts and settings. Arch Clin Neuropsychol. 2020;35(6):717–725. doi:10.1093/arclin/
acaa017.
123. Chafetz M, Underhill J. Estimated costs of malingered disability. Arch Clin Neuropsychol.
2013;28(7):633–639.
124. Denning JH, Shura RD. Cost of malingering mild traumatic brain injury-related cognitive deficits during compensation and pension evaluations in the veterans benefits administration. Appl Neuropsychol Adult. 2019;26(1):1–16.
125. Grossi LM, Green D, Einzig S, Belfi B. Evaluation of the response bias scale and improbable failure scale in assessing feigned cognitive impairment. Psychol Assess. 2017;29(5):531–541. doi:10.1037/pas0000364.
126. Odland AP, Lammy AB, Martin PK, Grote CL, Mittenberg W. Advanced administration and
interpretation of multiple validity tests. Psychol Inj Law. 2015;8(1):46–63.
127. Pearson. Advanced clinical solutions for the WAIS-IV and WMS-IV technical manual. San
Antonio (TX): Author; 2009.
128. Erdodi LA, Roth RM, Kirsch NL, Lajiness-O’Neill R, Medoff B. Aggregating validity indicators embedded in Conners’ CPT-II outperforms individual cutoffs at separating valid from invalid performance in adults with traumatic brain injury. Arch Clin Neuropsychol. 2014;29(5):456–466. doi:10.1093/arclin/acu026.
129. Lu PH, Boone KB, Cozolino L, Mitchell C. Effectiveness of the Rey-Osterrieth complex figure test and the Meyers and Meyers recognition trial in the detection of suspect effort. Clin Neuropsychol. 2003;17(3):426–440. doi:10.1076/clin.17.3.426.18083.
130. Reedy SD, Boone KB, Cottingham ME, Glaser DF, Lu PH, Victor TL, Ziegler EA, Zeller MA, Wright MJ. Cross validation of the Lu and colleagues (2003) Rey-Osterrieth complex figure test effort equation in a large known-group sample. Arch Clin Neuropsychol. 2013;28:30–37. doi:10.1093/arclin/acs106.
131. Tyson BT, Baker S, Greenacre M, Kent KJ, Lichtenstein JD, Sabelli A, Erdodi LA. Differentiating epilepsy from psychogenic nonepileptic seizures using neuropsychological test data. Epilepsy Behav. 2018;87:39–45.
132. Lace JW, Grant AF, Kosky KM, Teague CL, Lowell KT, Gfeller JD. Identifying novel embedded
performance validity test formulas within the repeatable battery for the assessment of
neuropsychological status: a simulation study. Psychol Inj Law. 2020;13:303–315.
133. Johnstone L, Cooke DJ. Feigned intellectual deficits on the Wechsler adult intelligence scale-revised. Br J Clin Psychol. 2003;42(3):303–318.
134. Heaton RK, Miller SW, Taylor MJ, Grant I. Revised comprehensive norms for an expanded Halstead-Reitan battery: demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz (FL): Psychological Assessment Resources; 2004.