The BDAE Complex Ideational Material: A Measure of Receptive Language or Performance Validity?
Laszlo A. Erdodi¹,² · Bradley T. Tyson¹,³ · Christopher A. Abeare¹ · Jonathan D. Lichtenstein² · Chantalle L. Pelletier² · Jaspreet K. Rai¹ · Robert M. Roth²

¹ Department of Psychology, University of Windsor, 168 Chrysler Hall South, 401 Sunset Avenue, Windsor, ON N9B 3P4, Canada
² Department of Psychiatry, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
³ Western Washington Medical Group, Everett, WA, USA

Correspondence: Laszlo A. Erdodi, lerdodi@gmail.com
Received: 9 February 2016 / Accepted: 19 April 2016
© Springer Science+Business Media New York 2016
Psychological Injury and Law, DOI 10.1007/s12207-016-9254-6
Abstract  Scores on the Complex Ideational Material (CIM) were examined in reference to various performance validity tests (PVTs) in 106 adults clinically referred for neuropsychological assessment. The main diagnostic categories, reflecting a continuum between neurological and psychiatric disorders, were epilepsy, psychiatric disorders, postconcussive disorder, and psychogenic non-epileptic seizures. Cross-validation analyses suggest that in the absence of bona fide aphasia, a raw score ≤ 9 or T score ≤ 29 on the CIM is more likely to reflect non-credible presentation than impaired receptive language skills. However, these cutoffs may be associated with unacceptably high false positive rates in patients with longstanding, documented neurological deficits. Therefore, more conservative cutoffs (≤ 8/≤ 23) are recommended in such populations. Contrary to the widely accepted assumption that psychiatric disorders are unrelated to performance validity, results were consistent with the psychogenic interference hypothesis, suggesting that emotional distress increases the likelihood of PVT failures even in the absence of apparent external incentives to underperform on cognitive testing.
Keywords  Complex Ideational Material · Performance validity assessment · Embedded validity indicators · Psychogenic interference
The Complex Ideational Material (CIM) subtest of the Boston Diagnostic Aphasia Examination (BDAE) is a sentence comprehension task developed to measure receptive language skills. It consists of Yes/No questions that gradually increase in difficulty, from recalling common facts to details contained in short stories read aloud by the examiner. Scores range from 0 to 12. Eight was the lower limit of performance in the normative data (M = 11.2, SD = 1.1; n = 147; Borod, Goodglass, & Kaplan, 1980). Therefore, it was designated as the cutoff for impairment.
In a selective review of the literature based on 1078 individuals across 34 samples, healthy controls produced a weighted mean of 11.3, while patients with stroke, dementia, and epilepsy demonstrated somewhat lower performance, with weighted means between 10.2 and 10.6 (Erdodi & Roth, 2016). The aphasia sample was the most impaired (weighted mean = 7.5). These findings suggest that healthy controls (N = 538) obtain near perfect scores on the CIM, with all clinical groups performing below them and the aphasic groups (N = 188) producing a mean below the suggested clinical cutoff (<8). In other words, the CIM is broadly sensitive to neurological deficits, and very low scores may be specific to aphasia.
As these studies did not formally evaluate the credibility of the cognitive data, Erdodi and Roth (2016) also examined the link between performance validity tests (PVTs) and performance on the CIM in a new sample of 68 patients clinically referred for neuropsychological assessment. They found that patients who failed PVTs had lower scores on the CIM, calling into question the validity of some of the previous findings. The present study
was designed to replicate these findings using a larger sample and different instruments, to extend the investigation to the relationship among diagnostic categories, PVTs, and the CIM, and to evaluate the use of the CIM as a PVT itself.
Although the CIM was developed as a receptive language test, primarily to diagnose and subtype aphasia, some examinees may perceive it as an attention and memory task, two cognitive domains within which self-reported deficits are common. Thus, the CIM may elicit a response pattern that underestimates true ability levels. However, as the CIM is in fact an easy task for most native speakers with intact language skills, it may be effective at differentiating valid and invalid response sets. Since most PVTs are based on recognition memory, the CIM could provide an index of performance validity using a different paradigm. As such, it could enhance the overall measurement model by contributing non-redundant information to validity assessment (Boone, 2013; Larrabee, 2003).
We hypothesized that low CIM scores would be related to
invalid responding in the absence of bona fide aphasia. Our
second hypothesis was that a positive psychiatric history
would be related to both high base rates of PVT failure and
unexpectedly poor CIM performance. If non-credible
responding and psychiatric symptoms are related to CIM
scores in the invalid range, this would allow for better-
informed clinical interpretation of both CIM performance
and the entire neurocognitive profile.
To investigate the link between emotional functioning and
CIM scores, a specific combination of clinical groups was
selected to represent the neurological-psychological continu-
um (Boone, 2007) underlying subjective cognitive com-
plaints. The epilepsy (EPI) sample consisted of patients with
a documented seizure disorder to serve as clinical controls
with a medically verified neurological condition. The
postconcussive disorder (PCD) group was composed of indi-
viduals whose cognitive complaints persisted well beyond the
3-month window of normative recovery (APA, 2013). The
psychiatric (PSY) sample included patients who reported cog-
nitive decline in the absence of an identifiable physical etiol-
ogy and whose symptomology was judged to be the result of
emotional distress. Finally, the psychogenic non-epileptic sei-
zure (PNES) sample consisted of individuals who presented
with seizure-like symptoms that were determined to be non-
epileptic by their neurologist. Although the condition itself is
quite rare (estimated 2 to 33 per 100,000; Benbadis & Hauser,
2000), a history of complex trauma is virtually ubiquitous
within this population (Myers, 2014). The last three groups
share the common feature of unexplained variance in their
presentation that is unlikely to be attributable to a known
neurological disorder. While the "neurological vs. psychiatric" dichotomy is an artificial one, these diagnostic groups
may represent clinically distinct categories in terms of the
relative contribution of neurological and psychiatric factors.
The complex relationship between emotional and cognitive
functioning has long been recognized (Boone, 2007; Suhr,
Tranel, Wefel, & Barrash, 1997). However, the evidence is
mixed. Despite an apparent consensus that depression is un-
related to PVT failure (Considine et al., 2011; Egeland et al.,
2005; Rees, Tombaugh, & Boulay, 2001), the relative contri-
bution of depressive symptoms to memory performance re-
mains an evolving controversy. Some studies found that de-
pression was unrelated to memory performance (Egeland
et al., 2005; Langenecker et al., 2005; Rohling, Green,
Allen, & Iverson, 2002), whereas others reported a strong link
between the two (Christensen, Griffiths, MacKinnon, &
Jacomb, 1997; Considine et al., 2011).
Given such divergent findings, additional research exam-
ining the influence of psychiatric problems on cognitive per-
formance is warranted. In addition to investigating the CIMs
potential toserve as a PVT, the design of the present study also
allows for a systematic analysis of the interaction between
instrumentation issues and diagnostic categories. As such, it
can extend the psychogenic interference hypothesis (the no-
tion that emotional distress can negatively impact cognitive
performance) to PVTs.
Method
Participants
The sample consisted of 106 patients clinically referred for
neuropsychological assessment at a Northeastern academic
medical center in the context of self-reported cognitive de-
cline. All patients were native speakers of English and classi-
fied into one of the four diagnostic categories outlined above:
EPI (n = 29), PCD (n = 23), PNES (n = 20), and PSY (n = 34). The majority of the sample was right-handed (80 %) and female (63 %). Mean age was 40.6 years (SD = 11.9). Mean level of education was 12.6 years (SD = 2.0). Mean FSIQ was 88.5 (SD = 14.9). Patients did not differ in age or level of education as a function of diagnosis. None of the patients reported any symptoms of aphasia, nor were any noted in the medical records by the referring or assessing clinicians.
All PCD patients met ICD-10 diagnostic criteria (World
Health Organization, 2014). The mean time since injury was
40.6 months. None had positive neuroradiological findings or
documented loss of consciousness. The majority (52 %) either
reported being in litigation at the time of the assessment or
became injured under circumstances in which a reasonable
argument for external culpability could be made. The most
frequent mechanism of injury was motor vehicle accidents
(43 %), although one of the collisions was admittedly a failed
suicide attempt. Comorbid psychiatric problems were com-
mon: depression (52 %), history of substance abuse (52 %),
posttraumatic stress disorder (35 %), anxiety (30 %), history
of past suicide attempts (22 %), persistent fatigue (17 %), and history of childhood abuse (13 %).
The EPI and PNES patients were diagnosed through con-
sensus by a multidisciplinary team of epilepsy specialists
based on their clinical history, seizure semiology, and video-
EEG findings. The majority of the EPI patients had comorbid
psychiatric diagnoses [depression (55 %) and anxiety (28 %)].
Psychiatric comorbidity was also common within the PNES
sample [depression (55 %), anxiety, history of childhood
abuse (20 % each), panic disorder (15 %), dissociative symp-
toms (15 %), substance abuse (15 %), past suicide attempts
(10 %), posttraumatic stress disorder (10 %)].
PSY patients were diagnosed by the assessing neuropsy-
chologists based on a combination of clinical interviews, rat-
ing scales, and review of medical records. The most frequent
psychiatric diagnosis was depression (85 %) followed by anx-
iety (53 %), posttraumatic stress disorder (32 %), panic disor-
der (24 %), substance abuse (24 %), borderline personality
disorders (21 %), schizophrenia spectrum disorders (15 %),
and bipolar disorder (12 %). A history of psychiatric hospital-
izations was present in 26 % of the sample, and 12 % reported
previous suicide attempts. The majority of patients (76 %) had
≥ 2 psychiatric conditions; 68 % had ≥ 3, while 47 % had ≥ 4.
Materials
The core battery of neuropsychological assessment included
the Wechsler Adult Intelligence and Memory Scales
(WAIS/WMS 3rd and 4th editions), California Verbal
Learning Test, 2nd edition (CVLT-II), verbal fluency tasks
(FAS & animals), the Boston Naming Test (BNT), the single
word reading task of the Wide Range Achievement Test
(WRAT; 3rd and 4th editions), the CIM, the Wisconsin Card
Sorting Test (WCST), the Beck Depression Inventory, 2nd
edition (BDI-II) and the Test of Memory Malingering
(TOMM).
To complement the TOMM, the only free-standing PVT
consistently administered, a composite measure of perfor-
mance validity was developed using embedded effort mea-
sures following the methodology described by Erdodi et al.
(2014, 2016). Five language-based embedded validity indicators (EVIs) were selected to create a composite measure of performance validity, labeled the "Effort Index Five" (EI-5).
To capture the underlying continuity in cognitive effort, each
of the five constituent EVIs was recoded into a four-point
scale (0 to 3). The level of performance that passed the most
liberal cutoff (i.e., low probability of invalid responding) was
assigned the value of zero. The most liberal cutoff available in
the literature was assigned the value of one, the next available
cutoff was assigned a value of two, and finally, failing the
most conservative cutoff was assigned a value of three. The
re-scaling of the five EVIs reflects a gradient of increasing
probability of invalid performance. The value of the EI-5 is
the sum of the re-scaled component EVIs (Table 1).
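To make this aggregation concrete, the following is a minimal sketch in Python; the cutpoints are transcribed from Table 1, while the function and variable names are illustrative rather than the authors' implementation.

```python
# Hypothetical sketch of the EI-5 aggregation described above.
# For each EVI, scores passing the most liberal cutoff receive 0;
# progressively worse scores receive 1, 2, or 3 (cutpoints from Table 1).
EI5_CUTOFFS = {
    # EVI: (threshold for 1, threshold for 2, threshold for 3); lower = worse
    "RDS":          (7, 6, 5),
    "DS_ACSS":      (6, 5, 4),
    "LM_RECOG":     (20, 19, 18),
    "CVLT_REC_HIT": (10, 9, 8),
    "CVLT_FCR":     (15, 14, 13),
}

def recode_evi(score, thresholds):
    """Recode a raw EVI score onto the 0-3 gradient of invalid performance."""
    t1, t2, t3 = thresholds
    if score <= t3:
        return 3  # fails the most conservative published cutoff
    if score <= t2:
        return 2  # fails the next available cutoff
    if score <= t1:
        return 1  # fails only the most liberal published cutoff
    return 0      # passes all cutoffs

def ei5(scores):
    """Sum the five recoded EVIs into the EI-5 composite (range 0-15)."""
    return sum(recode_evi(scores[evi], cuts) for evi, cuts in EI5_CUTOFFS.items())
```

For example, a patient with RDS = 6, Digit Span ACSS = 7, Logical Memory recognition = 21, CVLT-II recognition hits = 9, and CVLT-II FCR = 16 would receive 2 + 0 + 0 + 2 + 0 = 4 on the EI-5.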
The EI-5 provides a robust psychometric model for perfor-
mance validity assessment, as it takes into account the examinee's score on a wider range of measures (Larrabee, 2003)
and thus, is better at approximating the ultimate goal of ongo-
ing monitoring of cognitive effort throughout the assessment
(Boone, 2009). Further, it makes an attempt to equalize the
inherent differences in the signal detection profiles of various
EVIs by translating them to a common metric. It also captures
the underlying continuity in performance validity by generat-
ing a scale on which each unit of measurement represents a
clinically interpretable shift in the likelihood of invalid
responding. Finally, its components were specifically selected
to represent cognitive domains that conceptually overlap with
the CIM: auditory attention, story memory, and verbal learn-
ing. Therefore, the EI-5 provides an ecologically valid refer-
ence PVT.
Although conceptually, cognitive effort is a continuous var-
iable, clinical practice demands a categorical approach to va-
lidity assessment. Therefore, the first two levels of the EI-5 (≤ 1 EVI failures) were considered a Pass. The next two levels start
to become problematic, as they suggest either multiple inde-
pendent EVI failures or failure of a single EVI at the most
conservative cutoff. Given the ambiguity, this level of perfor-
mance was labeled Borderline. Above this, all EI-5 scores
were considered a Fail, with each additional value providing
incremental evidence for the accuracy of the classification. To
maximize the purity of the reference groups used to calibrate
the target PVT, Borderline cases were excluded from cross-validation analyses, following recommendations by Greve and Bianchini (2004).
Table 1  The components of the Effort Index Five (EI-5), base rates (%) for failing each cutoff, and the percent of patients passing the most liberal cutoff

                            EI-5 value:     0       1       2       3
RDS                         score:        ≥ 8       7       6     ≤ 5
                            base rate:   54.3    22.9    15.2     7.6
DS-ACSS                     score:        ≥ 7       6       5     ≤ 4
                            base rate:   62.5    13.5    16.3     7.7
LM recognition              score:       ≥ 21      20      19    ≤ 18
                            base rate:   76.5    11.8     3.9     7.8
CVLT-II recognition hits    score:       ≥ 11      10       9     ≤ 8
                            base rate:   73.6     7.5     5.7    13.2
CVLT-II FCR                 score:       ≥ 16      15      14    ≤ 13
                            base rate:   56.2    16.2    10.5    17.1

Note: RDS = Reliable Digit Span (Greiffenstein, Baker, & Gola, 1994; Larrabee, 2003; Heinly et al., 2005; Pearson, 2009); DS-ACSS = Digit Span age-corrected scaled score (Heinly et al., 2005; Spencer et al., 2013); LM = Logical Memory recognition (Bortnik et al., 2010; Pearson, 2009); CVLT-II recognition hits (Wolfe et al., 2010); FCR = Forced Choice Recognition (Moore & Donders, 2004; Bauer et al., 2005).
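In code form, this categorical decision rule can be sketched as follows (a hypothetical illustration of the scheme described above, not the authors' implementation):

```python
def classify_ei5(total):
    """Categorize an EI-5 composite score (0-15) per the scheme described above."""
    if total <= 1:
        return "Pass"        # 0-1: at most a single liberal-cutoff failure
    if total <= 3:
        return "Borderline"  # 2-3: ambiguous; excluded from cross-validation
    return "Fail"            # >= 4: incremental evidence of invalid responding
```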
Finally, the logistic regression equations developed by Suhr and Boyer (1999) and King et al. (2002) were used as alternative reference PVTs. They were selected because they are based on WCST variables that are neither verbally mediated (like the EI-5) nor recognition based (like the TOMM). As such, they provide independent indicators of performance validity, which is an essential feature of multivariate models of validity assessment (Boone, 2013; Larrabee, 2003).
Procedure
Tests were administered in an outpatient setting by staff psy-
chometricians and postdoctoral fellows under the supervision
of licensed clinical psychologists with specialty training in
neuropsychology. Data were collected as part of a retrospec-
tive chart review using the clinical archives of the neuropsy-
chology service at a Northeastern academic medical center.
The final sample consisted of consecutive referrals that fit a
priori inclusion criteria: (1) The file contained sufficient infor-
mation to establish a diagnosis; (2) one of the four conditions
was the main diagnosis and was active at the time of testing;
(3) data were available on intellectual functioning as well as a
complete CVLT-II and CIM administration. The project was
approved by the Institutional Review Board. Relevant ethical
guidelines were followed throughout the data collection
process.
Data Analysis
Descriptive statistics were computed for the variables of interest. Inferential statistics included ANOVAs, independent-samples t tests, and χ² tests. Post hoc pairwise contrasts were computed using standard uncorrected t tests. The F test was used for pairwise comparisons of sample variances. Effect size estimates were expressed as Cohen's d, partial η², and Φ². Area under the curve (AUC) was calculated using SPSS 22.0. The remaining classification accuracy parameters (sensitivity, specificity, likelihood ratios) were computed using standard formulas.
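For reference, these standard formulas, with positive and negative predictive power (PPP/NPP) derived from Bayes' theorem at a hypothetical base rate p of invalid performance (as tabulated in Table 3), are:

```latex
\mathrm{SENS} = \frac{TP}{TP + FN}, \qquad
\mathrm{SPEC} = \frac{TN}{TN + FP}, \qquad
LR^{+} = \frac{\mathrm{SENS}}{1 - \mathrm{SPEC}}, \qquad
LR^{-} = \frac{1 - \mathrm{SENS}}{\mathrm{SPEC}}

\mathrm{PPP} = \frac{p \cdot \mathrm{SENS}}{p \cdot \mathrm{SENS} + (1 - p)(1 - \mathrm{SPEC})}, \qquad
\mathrm{NPP} = \frac{(1 - p)\,\mathrm{SPEC}}{(1 - p)\,\mathrm{SPEC} + p\,(1 - \mathrm{SENS})}
```

For example, at a base rate of .10, a cutoff with .39 sensitivity and .85 specificity yields PPP = (.10)(.39)/[(.10)(.39) + (.90)(.15)] ≈ .22, matching the first row of Table 3.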
Results
The mean CIM raw score in the present sample was 10.2 (SD = 1.8; range 3–12). This was significantly lower than the mean of the normative sample (Borod et al., 1980): t(105) = 5.82, p < .001, d = .67. Low scores were rare: only 7.5 % of the sample scored <8, and only 3.8 % scored <7 (Table 2). The mean CIM T score was 39.3 (SD = 14.3; range 5–63), which was significantly lower than the nominal population mean of 50: t(105) = 7.70, p < .001, d = .87.
Next, a series of t tests was performed using the dichotomized (Pass/Fail) reference PVTs as the independent variables and the CIM as the dependent variable. All contrasts were significant. Effect sizes ranged from .58 to .86. As the mean raw score differences between the valid and invalid groups were small (0.9–1.3) and clustered around 10, two cutoffs (a liberal ≤ 9 and a conservative ≤ 8) were chosen for further testing. These raw scores were matched, based on percentile rank, to T scores of ≤ 29 and ≤ 23, respectively.
Against the EI-5, all CIM cutoffs cleared the lower threshold for specificity (≥ .84; Larrabee, 2003). The ≤ 9 raw score cutoff produced a sensitivity of .39. Lowering it to ≤ 8 decreased sensitivity to .18. A T score ≤ 29 had .37 sensitivity, which decreased to .18 when the cutoff was lowered to ≤ 23. Against the TOMM, the liberal cutoffs narrowly missed the lower threshold for specificity (.82). The ≤ 9 raw score cutoff produced .40 sensitivity. Lowering the raw score cutoff to ≤ 8 decreased sensitivity to .17, but improved specificity to .95. A T score ≤ 29 had .37 sensitivity. Lowering the T score cutoff to ≤ 23 decreased sensitivity to .20, but increased specificity to .92.

Against the Suhr and Boyer (1999) equation, all CIM cutoffs cleared the lower threshold for specificity. The ≤ 9 raw score cutoff had .36 sensitivity. Lowering the raw score cutoff to ≤ 8 decreased sensitivity to .17, but improved specificity to .98. A T score ≤ 29 had .34 sensitivity. Lowering the T score cutoff to ≤ 23 decreased sensitivity to .22, but improved specificity to .98. Against the King et al. (2002) equation, the liberal cutoffs failed to reach the lower threshold for specificity, but the conservative cutoffs achieved specificity ≥ .90 at .33 sensitivity. Further details are displayed in Table 3.
Dual Cutoffs: Raw Scores vs. T Scores
Having two sets of cutoffs (raw and Tscore) creates the pos-
sibility of an internal discrepancy. To provide empirically
based guidelines for resolving such a divergence, patients
who failed one cutoff, but not the other, were examined to
determine the most appropriate clinical classification. One pa-
tient who scored 8 (raw), but T >23 also failed five PVTs. As
such, this profile can be confidently classified as invalid. On
the other side, three patients scored 23 (T), but >8 (raw).
They all had a raw score of 9 and failed 2 PVTs. As such,
their test data should be classified as invalid.
Pattern Analysis as a Validity Check
in Neuropsychological Testing
Another method for differentiating effort from impairment is
examining the pattern of scores across cognitive domains.
Certain combinations of high and low scores that are incon-
sistent with known patterns of neuropsychological impair-
ment provide further evidence for non-credible performance
(Boone, 2013). Since the present study focused on the CIM,
we examined other tests that target similar constructs (Digit
Span, Similarities, Vocabulary, Arithmetic, Letter-Number
Sequencing, Logical Memory Immediate Recall, verbal fluen-
cy, Boston Naming Test, longest Digit Span forward). These
tests are generally more challenging than the CIM. Therefore,
if a patient obtains a low score on the CIM, low scores would also be expected on these other tests. Intact performance on these instruments was defined as no more than 1 SD below the mean (scaled score ≥ 7, T score ≥ 40) or a longest digit span forward ≥ 5. Contrary to expectation, patients who failed the liberal CIM cutoffs produced several scores in the intact range on these language-based measures. More than half had ≥ 4 intact scores, while more than a quarter of the sample had ≥ 6 intact scores. Likewise, more than half of those who failed the conservative cutoffs had ≥ 3 intact scores, while a quarter had ≥ 4 intact scores.
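As an illustration, the intact-score count used in this analysis could be computed as follows; the thresholds are taken from the text above, while the function and argument names are hypothetical.

```python
def count_intact_scores(scaled_scores, t_scores, longest_span_forward):
    """Count scores in the intact range, defined as no more than 1 SD below
    the normative mean (scaled score >= 7, T score >= 40) or a longest
    digit span forward of >= 5."""
    intact = sum(score >= 7 for score in scaled_scores)   # e.g., Digit Span, Similarities
    intact += sum(score >= 40 for score in t_scores)      # e.g., verbal fluency
    intact += int(longest_span_forward >= 5)              # longest digits forward
    return intact
```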
Performance Validity and Depression
Finally, the four diagnostic groups were compared on refer-
ence PVTs, CIM, and BDI-II scores. There was a large effect (partial η² = .25) on the EI-5, mainly driven by the high PCD mean and the low EPI mean. The overall model for the BDI-II approached significance (p = .09), with a medium effect (partial η² = .07). This negative finding may be due to large within-group variability in the PCD, PSY, and PNES groups. Otherwise, the group means form a clinically interpretable pattern, with the PSY sample being the most depressed and the EPI sample the least depressed.
No significant difference emerged on CIM scores. This
finding is unlikely to be due to small sample size or unusually
large within-group variability. All four diagnostic categories
produced raw scores within 0.9 units. While there was more
variability among group means in T scores, the within-group
variability was also higher, washing out the main effect.
When the analyses were refocused on base rates of failure,
significant group differences emerged with very large effects
on the reference PVTs and small-medium effects on the CIM
(Table 4). EPI consistently had the lowest base rate of failure (9–17 %), while PCD and PSY had very high rates (61–93 %). Base rates of PVT failure in PNES hovered around 20 %. PCD and PSY had the highest and most variable failure rates on the CIM (9–41 %), while PNES had consistently low rates (5 %), with EPI in between (7–21 %).
Of the 27 patients who failed the liberal cutoffs on the CIM,
the majority (52 %) were PSY, followed by PCD and EPI
(22 %), and PNES (4 %). Of the six EPI patients, five had a
history of psychiatric comorbidities (depression, anxiety, sui-
cidal ideation). The only PNES patient who failed the CIM
cutoffs had a complex medical history (jaundice at birth, TBI
at age two, and muscular dystrophy). Of the six PCD patients,
five had a psychiatric history predating the head injury (de-
pression, anxiety, somatic symptoms, alcohol and cocaine
abuse), and one of them had a history of driving-while-
intoxicated and physical violence with police involvement.
Finally, to examine the effect of potential secondary gain
within the PCD sample, the failure rates on reference PVTs
were compared between patients who reported being in litiga-
tion at the time of the assessment and those who did not. No
significant relationship was found between litigation status
and failure rates on any of the reference PVTs or the CIM (raw or T score based, liberal or conservative cutoff).
Discussion
The present study examined the relationship between CIM
scores, performance validity, and clinical diagnosis in a sam-
ple of 106 adults clinically referred for neuropsychological
assessment. Results are largely consistent with the previous
investigation of the link between the CIM and performance
Table 2  CIM raw and T score distributions for the whole sample (N = 106)

CIM raw score    f    %     Cum. %   |  CIM T score    f    %     Cum. %
3                1    .9      .9     |  ≤ 20          12   11.3   11.3
4                1    .9     1.9     |  21–25          8    7.5   18.9
5                1    .9     2.8     |  26–30          6    5.6   24.5
6                1    .9     3.8     |  31–35         20   18.8   43.4
7                4   3.8     7.5     |  36–40         13   12.3   55.7
8                5   4.7    12.3     |  41–45         11   10.4   66.0
9               13  12.3    24.5     |  46–50          8    7.6   73.6
10              32  30.2    54.7     |  51–55          6    5.6   79.2
11              18  17.0    71.7     |  56–60         19   17.9   97.2
12              30  28.3   100.0     |  61–65          3    2.8  100.0

Note: T scores are demographically adjusted scores based on norms published by Heaton, Miller, Taylor, & Grant (2004). CIM = Complex Ideational Material subtest of the Boston Diagnostic Aphasia Examination.
validity (Erdodi & Roth, 2016). Our first hypothesis that low
CIM scores would be associated with higher rates of failure on
PVTs was supported. CIM cutoffs were effective at identify-
ing psychometrically defined invalid responding.
Our second hypothesis that psychiatric history would be
related to increased failure rates on both PVTs and CIM pro-
duced mixed results. Consistent with our prediction, PSY had
the highest base rate of failure on CIM and comparable rates to
PCD on reference PVTs. These two groups also reported the
highest level of depression on the BDI-II and had the highest
rates of psychiatric symptoms. However, no clear pattern
emerged in the validity profiles of PNES and EPI patients.
While the inability of PVTs to reliably differentiate these
two diagnostic categories is disappointing, this finding is
Table 3  Classification accuracy of CIM raw and T score cutoffs against reference PVTs

                                                           PPP/NPP at hypothetical base rates
CIM cutoff   AUC (95 % CI)   SENS  SPEC  +LR  −LR          .10   .20   .30   .40   .50

EI-5
Raw ≤ 9      .72 (.60–.83)   .39   .85   2.6  .72    PPP   .22   .39   .52   .63   .72
                                                     NPP   .93   .85   .76   .67   .58
Raw ≤ 8                      .18   .92   2.4  .88    PPP   .20   .36   .49   .60   .69
                                                     NPP   .91   .82   .72   .63   .53
T ≤ 29       .66 (.54–.78)   .37   .85   2.4  .75    PPP   .22   .38   .51   .62   .71
                                                     NPP   .92   .84   .76   .67   .57
T ≤ 23                       .18   .90   1.8  .91    PPP   .17   .31   .44   .55   .64
                                                     NPP   .91   .81   .72   .62   .52
TOMM
Raw ≤ 9      .66 (.53–.76)   .40   .82   2.2  .74    PPP   .20   .36   .49   .60   .69
                                                     NPP   .93   .85   .76   .67   .58
Raw ≤ 8                      .17   .95   3.2  .88    PPP   .27   .46   .59   .69   .77
                                                     NPP   .91   .82   .73   .63   .53
T ≤ 29       .62 (.60–.82)   .37   .82   2.0  .78    PPP   .19   .34   .47   .58   .67
                                                     NPP   .92   .84   .75   .66   .57
T ≤ 23                       .20   .92   2.5  .87    PPP   .22   .39   .52   .63   .71
                                                     NPP   .91   .82   .73   .63   .54
Suhr & Boyer (1999) equation
Raw ≤ 9      .71 (.61–.83)   .36   .86   2.6  .74    PPP   .22   .39   .52   .63   .72
                                                     NPP   .92   .84   .76   .67   .57
Raw ≤ 8                      .17   .98   7.5  .85    PPP   .49   .68   .79   .85   .90
                                                     NPP   .91   .83   .73   .64   .54
T ≤ 29       .72 (.61–.83)   .34   .89   3.0  .74    PPP   .26   .44   .57   .67   .76
                                                     NPP   .92   .84   .76   .67   .57
T ≤ 23                       .22   .98   9.7  .80    PPP   .55   .73   .83   .88   .92
                                                     NPP   .92   .83   .75   .65   .56
King et al. (2002) equation
Raw ≤ 9      .74 (.58–.90)   .33   .80   1.7  .83    PPP   .16   .29   .41   .52   .62
                                                     NPP   .92   .83   .74   .64   .54
Raw ≤ 8                      .33   .93   5.0  .71    PPP   .34   .54   .67   .76   .83
                                                     NPP   .93   .85   .76   .68   .58
T ≤ 29       .78 (.65–.91)   .50   .83   2.9  .60    PPP   .25   .42   .56   .66   .75
                                                     NPP   .94   .87   .80   .71   .62
T ≤ 23                       .33   .91   3.7  .74    PPP   .29   .48   .61   .71   .79
                                                     NPP   .92   .85   .76   .67   .58

Note: Failure on the EI-5 was defined as ≥ 4; failure on the TOMM was defined as one or more of the following scores: Trial 1 ≤ 39, Trial 2 ≤ 48, or Retention ≤ 48. AUC = area under the curve; +LR/−LR = positive/negative likelihood ratio; PPP = positive predictive power; NPP = negative predictive power.
consistent with the results of previous studies (Cragar, Berry,
Fakhoury, Cibula, & Schmitt, 2006; Dodrill, 2008).
The data analysis can be reduced to a number of clinically
relevant findings: (1) CIM scores were consistently lower in patients who failed reference PVTs. (2) The liberal set of CIM cutoffs (raw ≤ 9/T ≤ 29) produced specificity values hovering around the lower threshold (.84), while the conservative cutoffs (raw ≤ 8/T ≤ 23) had specificity ≥ .90. (3) Regardless of scale (raw/T) or threshold (liberal/conservative), CIM cutoffs produced remarkably consistent classification accuracy against a variety of reference PVTs. (4) Individuals with EPI may be at risk for an elevated false positive rate and may warrant the use of the more conservative cutoffs. (5) Within each pair (liberal and conservative), the raw and T score cutoffs (≤ 9/T ≤ 29 and ≤ 8/T ≤ 23) produce equivalent cognitive profiles. Thus, failing either one of them provides sufficient evidence of non-credible performance. (6) Across diagnostic groups, PCD and PSY had the highest PVT failure rates, followed by PNES. Generally, EPI had the lowest base rate of failure. Also, PNES seemed remarkably immune to failing CIM cutoffs, regardless of scale or threshold. (7) Despite ample evidence that those who scored ≤ 9/≤ 29 on the CIM produced an overall invalid profile, the majority of them still performed within normal limits on several other, more difficult language-based tests. This finding further supports the argument that low scores are more likely to indicate poor effort than aphasia.
These findings converge in a couple of overarching con-
clusions. First, scoring ≤ 9/≤ 29 on the CIM is a more reliable
indicator of invalid responding than credible language im-
pairment in patients without bona fide aphasia. Therefore,
while the CIM may not be a very effective general aphasia
screener, it shows promise as a PVT. Second, if the four-
fold difference in PVT failure rates between EPI and PNES
replicates in larger samples, it could serve as a valuable
tool in separating these two groups with different etiology,
but very similar presentation.
In a more general sense, non-credible responding in this
sample appears to be related to psychiatric functioning, albeit
in a complex way. The PCD and PSY groups demonstrated
complex psychiatric history and acute emotional distress along
with unusually high rates of PVT failure (61–93 %) that
Table 4  Performance on PVTs and the BDI-II across diagnostic groups

                       EI-5      TOMM         BDI-II   CIM raw          CIM T
Cutoff                 ≥ 4       ≤ 39/≤ 48             ≤ 9     ≤ 8     ≤ 29    ≤ 23

PCD    M               5.6                    20.4     10.2 a          43.3 a
       SD              3.5 a                  14.1     1.95            12.9
       BR Fail (%)     93.3      75.0                  26.1    13.0    21.7    8.7

PSY    M               4.2                    24.1     9.7 a           34.6 a
       SD              3.8 a                  13.6     2.1 a           16.2
       BR Fail (%)     60.7      73.3                  41.2    20.6    41.2    29.4

PNES   M               2.1                    18.3     10.6 a          40.4 a
       SD              2.8 a                  13.6     1.2             12.6
       BR Fail (%)     27.3      18.2                  5.0     5.0     5.0     5.0

EPI    M               1.1                    15.7     10.6 a          40.8 a
       SD              1.6                    10.3     1.3             13.4
       BR Fail (%)     17.4      9.1                   20.7    6.9     20.7    6.9

ANOVA  p               <.001                  .09      .16             .12
       partial η²      .25                    .07      .05             .06
Sig. post hocs         1-2-3-4                3        3               5
χ²     p               <.001     <.01                  <.05    .27     <.05    <.05
       Φ²              .32       .40                   .09     .04     .09     .09

Note: Failure on the TOMM was defined as one or more of the following scores: Trial 1 ≤ 39, Trial 2 ≤ 48, or Retention ≤ 48 (Greve, Bianchini, & Doane, 2006; Jones, 2013; Stenclik, Miele, Silk-Eglit, Lynch, & McCaffrey, 2013). Significant post hoc contrasts: 1 = PCD vs. PNES; 2 = PCD vs. EPI; 3 = PSY vs. EPI; 4 = PSY vs. PNES; 5 = PCD vs. PSY. BR Fail = base rate of failure (% of the sample that failed a given cutoff). a = p < .05 on a one-sample t test against the mean of 11.2 produced by 147 neurologically normal, right-handed, English-speaking adult males (Borod et al., 1980) or against the mean of 50 (T distribution), and on an F test between two variances using the SD of the EPI group as the reference.
exceeded the estimated base rates (30–50 %) in forensic set-
tings (Martin et al., 2015). However, patients with PNES are
also known to have complicated psychiatric histories, yet the
predicted elevated base rate of PVT failure was not observed. It
is possible that although psychiatric problems lead to increased
failure rates for some diagnostic groups, there may be moder-
ating factors that result in diverging PVT profiles within
broader clinical categories that share a common etiology.
Results suggest that previous research may have
underestimated the link between emotional distress and PVT
failures. A possible explanation for such divergent results may
be that earlier studies that reported no relationship between
psychiatric factors and performance validity tended to focus
on patients with depression as the main presenting problem
(Considine et al., 2011; Egeland et al., 2005; Rees et al., 2001), whereas our sample consisted of a heterogeneous
group of individuals, many of whom had a combination of
severe mental illness, complex emotional trauma, and somatic
complaints.
The results should be interpreted in the context of the study's limitations. First, the sample size is relatively small
and geographically restricted, so the findings may not gen-
eralize to other regions with different demographic charac-
teristics and medico-legal systems. Further, data on exter-
nal incentive status and remote trauma history were not
available outside the PCD group. Finally, a sample of pa-
tients with bona fide aphasia would have served as a valu-
able clinical control.
The study also has several strengths. It utilized stan-
dardized, well-validated, and widely used neuropsycholog-
ical tests in clinical groups that were strategically selected
so that effort and impairment could be dissociated by de-
sign. To complement the cross-validation analyses, scores
on other language-based measures were also examined in
patients who failed the newly developed CIM validity cut-
offs. Intact performance on more difficult verbal tests pro-
vided further evidence of within-profile inconsistency,
which is considered an emergent marker of non-credible
presentation (Boone, 2013).
In summary, the combination of repeated PVT failures,
lack of a verifiable neurological impairment, and absence of
bona fide aphasia renders a low score on CIM an unreliable
indicator of receptive language deficits. At the same time, a
CIM score ≤ 9/≤ 29 appears to be a promising new indicator of
invalid responding. The observed pattern of between-group
differences is interesting in light of the psychiatric, psychoso-
matic, and motivational differences across the four diagnostic
categories. Future research should examine the interactions among psychiatric factors (such as depression, somatic concerns, and trauma history) and potential moderating variables (such as attribution style or a tendency for rumination) in order to better understand the potential psychogenic etiology behind PVT failure.
Compliance with Ethical Standards
Conflict of Interest None.
Ethical Approval All procedures performed in studies involving hu-
man participants were in accordance with the ethical standards of the
institutional and/or national research committee and with the 1964
Helsinki Declaration and its later amendments or comparable ethical
standards.
Informed Consent Given that the data were collected as part of a
retrospective archival study, the requirement for informed consent was
waived by the Committee for the Protection of Human Subjects at the
institution where the data were collected. The project was also approved
by the Research Ethics Board of the institution where the study was
completed.
References
American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.

Bauer, L., Yantz, C. L., Ryan, L. M., Warden, D. L., & McCaffrey, R. J. (2005). An examination of the California Verbal Learning Test II to detect incomplete effort in a traumatic brain injury sample. Applied Neuropsychology, 12(4), 202–207.

Benbadis, S. R., & Hauser, A. (2000). An estimate of the prevalence of psychogenic non-epileptic seizures. Seizure, 9(4), 280–281.

Boone, K. B. (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford.

Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23(4), 729–741.

Boone, K. B. (2013). Clinical practice of forensic neuropsychology. New York: Guilford.

Borod, J. C., Goodglass, H., & Kaplan, E. (1980). Normative data on the Boston Diagnostic Aphasia Examination, Parietal Lobe Battery, and the Boston Naming Test. Journal of Clinical Neuropsychology, 2, 209–215.

Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler, E., Victor, T. L., & Zeller, M. A. (2010). Examination of various WMS-III logical memory scores in the assessment of response bias. The Clinical Neuropsychologist, 24(2), 344–357.

Christensen, H., Griffiths, K., MacKinnon, A., & Jacomb, P. (1997). A quantitative review of cognitive deficits in depression and Alzheimer-type dementia. Journal of the International Neuropsychological Society, 3, 631–651.

Considine, C., Weisenbach, S. L., Walker, S. J., McFadden, E. M., Franti, L. M., Bieliauskas, L. A., … Langenecker, S. A. (2011). Auditory memory decrements, without dissimulation, among patients with major depressive disorder. Archives of Clinical Neuropsychology, 26, 445–453.

Cragar, D. E., Berry, D. T., Fakhoury, T. A., Cibula, J. E., & Schmitt, F. A. (2006). Performance of patients with epilepsy or psychogenic non-epileptic seizures on four measures of effort. The Clinical Neuropsychologist, 20(3), 552–566.

Dodrill, C. B. (2008). Do patients with psychogenic nonepileptic seizures produce trustworthy findings on neuropsychological tests? Epilepsia, 49(4), 691–695.

Egeland, J., Lund, A., Landro, N. I., Rund, B. R., Sundet, K., Asbjornsen, A., … Stordal, K. I. (2005). Cortisol level predicts executive and memory function in depression, symptom level predicts psychomotor speed. Acta Psychiatrica Scandinavica, 112, 434–441.

Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski, B., Zuccato, B. G., & Roth, R. M. (2016). WAIS-IV processing speed scores as measures of non-credible responding: The third generation of embedded performance validity indicators. Psychological Assessment.

Erdodi, L. A., & Roth, R. M. (2016). Low scores on BDAE Complex Ideational Material are associated with invalid performance in adults without aphasia. Applied Neuropsychology: Adult.

Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O'Neill, R., & Medoff, B. (2014). Aggregating validity indicators embedded in Conners' CPT-II outperforms individual cutoffs at separating valid from invalid performance in adults with traumatic brain injury. Archives of Clinical Neuropsychology, 29(5), 456–466.

Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224.

Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology, 19, 533–541.

Greve, K. W., Bianchini, K. J., & Doane, B. M. (2006). Classification accuracy of the Test of Memory Malingering in traumatic brain injury: Results of a known-group analysis. Journal of Clinical and Experimental Neuropsychology, 28(7), 1176–1190.

Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan Battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz, FL: PAR.

Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., & Brennan, A. (2005). WAIS digit-span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12(4), 429–444.

Jones, A. (2013). Test of Memory Malingering: Cutoff scores for psychometrically defined malingering groups in a military sample. The Clinical Neuropsychologist, 27(6), 1043–1059.

King, J. H., Sweet, J. J., Sherer, M., Curtiss, G., & Vanderploeg, R. D. (2002). Validity indicators within the Wisconsin Card Sorting Test: Application of new and previously researched multivariate procedures in multiple traumatic brain injury samples. The Clinical Neuropsychologist, 16(4), 506–523.

Langenecker, S. A., Bieliauskas, L. A., Rapport, L. J., Zubieta, J. K., Wilde, E. A., & Berent, S. (2005). Face emotion perception and executive functioning deficits in depression. Journal of Clinical and Experimental Neuropsychology, 27, 320–333.

Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17(3), 410–425.

Martin, P. K., Schroeder, R. W., & Odland, A. P. (2015). Neuropsychologists' validity testing beliefs and practices: A survey of North American professionals. The Clinical Neuropsychologist, 29(6), 741–746.

Moore, B. A., & Donders, J. (2004). Predictors of invalid neuropsychological performance after traumatic brain injury. Brain Injury, 18(10), 975–984.

Myers, L. (2014). Psychogenic non-epileptic seizures: A guide. North Charleston: CreateSpace.

Pearson (2009). Advanced Clinical Solutions for the WAIS-IV and WMS-IV: Technical manual. San Antonio, TX: Author.

Rees, L. M., Tombaugh, T. N., & Boulay, L. (2001). Depression and the Test of Memory Malingering. Archives of Clinical Neuropsychology, 16, 501–506.

Rohling, M. L., Green, P., Allen, L. M., & Iverson, G. L. (2002). Depressive symptoms and neurocognitive test scores in patients passing symptom validity tests. Archives of Clinical Neuropsychology, 17, 205–222.

Spencer, R. J., Axelrod, B. N., Drag, L. L., Waldron-Perrine, B., Pangilinan, P. H., & Bieliauskas, L. A. (2013). WAIS-IV reliable digit span is no more accurate than age-corrected scaled score as an indicator of invalid performance in a veteran sample undergoing evaluation for mild TBI. The Clinical Neuropsychologist, 27(8), 1362–1372.

Stenclik, J. H., Miele, A. S., Silk-Eglit, G., Lynch, J. K., & McCaffrey, R. J. (2013). Can the sensitivity and specificity of the TOMM be increased with different cut-off scores? Applied Neuropsychology: Adult, 20(4), 243–248.

Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin Card Sorting Test in the detection of malingering in student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology, 21(5), 701–708.

Suhr, J., Tranel, D., Wefel, J., & Barrash, J. (1997). Memory performance after head injury: Contributions of malingering, litigation status, psychological factors, and medication use. Journal of Clinical and Experimental Neuropsychology, 19(4), 500–514.

Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee, G. J., & Sweet, J. J. (2010). Effort indicators within the California Verbal Learning Test-II (CVLT-II). The Clinical Neuropsychologist, 24(1), 153–168.

World Health Organization (2014). ICD-10: International statistical classification of diseases and related health problems, 10th revision (2nd ed.). Geneva, Switzerland: Author.
... Recent research evaluated the potential of the CIM as an embedded performance validity test (PVT). In a mixed clinical sample of 106 patients, Erdodi et al. (2016) reported that a raw score of ≤ 9 or demographically adjusted T-score of ≤ 29 is associated with psychometrically defined invalid performance. However, specificity fluctuated across criterion PVTs (.80-.89). ...
... To summarize, a CIM raw score of 9 corresponds to a demographically adjusted T-score of 27 (impaired). However, these scores have also been shown to be specific to invalid performance (Erdodi, 2019;Erdodi et al., 2016;. The next raw score (10) corresponds to a demographically adjusted T-score of 36 (borderline). ...
... This case study illustrates the limits of the CIM as an embedded PVT. In the context of the patient's neurodevelopmental history and pattern of performance on extensive cognitive testing, failing the previously proposed validity cutoff (raw ≤9/T ≤ 29; Erdodi et al., 2016) likely reflects a false positive. Considering potential extenuating factors [congenital perceptual deficits (color blindness), SLD, orthopedic injuries, etc.), the aggregate versus individual data points (Medici, 2013;Odland et al., 2015;A. ...
Article
Full-text available
Objective: This paper describes a clinical case illustrating the limitations of the Complex Ideational Material (CIM) as an embedded performance validity test (PVT). Method: A comprehensive neuropsychological battery was administered to a 19-year-old male to evaluate the residual effects of a motor vehicle collision. Results: The patient passed all free-standing PVTs and the majority of embedded validity indicators. Failing the CIM (≤9) in the context of long-standing, documented deficits in semantic processing and following the difficulty gradient inherent in the task (i.e. all errors occurred on later items) likely represents a false positive. Conclusions: Future research on CIM as PVT should include an item-level analysis in addition to the overall score to reduce the risk of misidentifying bona fide deficits as non-credible responding. More broadly, genuine impairment and invalid performance may be psychometrically indistinguishable in individual embedded PVTs. Failures on embedded validity cutoffs should be interpreted in the context of the patient's clinical history. Routinely administering a comprehensive battery of neuropsychological tests can aid the interpretation of isolated atypical scores.
... A number of studies have also found that FPRs are unacceptably high when using certain EVIs in populations with specific diagnoses or syndromes. For example, patients with genuine language difficulties (e.g., aphasia, dyslexia) have higher FPRs on EVIs derived from language or reading ability tests (Erdodi et al., 2016;Hurtubise et al., 2017;Whiteside et al., 2015). Patients with frontal intracranial damage following traumatic brain injury also perform poorly on EVIs within tests of executive functioning (Nelson et al., 2021). ...
... This may suggest that EVIs within tests assessing certain prototypical ADHD deficits have an increased propensity to conflate true cognitive difficulties with invalid performance in individuals with bona fide ADHD. Some studies have shown that FPRs may be more pronounced when using EVIs within tests that assess cognitive deficits associated with certain mental disorders (Erdodi et al., 2016;Hurtubise et al., 2017;Whiteside et al., 2015); but these findings are based on a few studies, none of which specifically focus on ADHD. By using EVIs that are based on "improbably" low cut scores, genuine impairments may be conflated with invalid performance during testing. ...
Article
Objective This study investigated why certain embedded performance validity indicators (EVIs) are prone to higher false-positive rates (FPRs) in attention-deficit/hyperactivity disorder (ADHD) evaluations. The first aim was to establish the relationship between FPRs and 15 EVIs derived from six cognitive tests when used independently and together among adults with ADHD who have valid test performance. The second aim was to determine which specific EVIs increase the FPRs in this population. Method Participants were 517 adult ADHD referrals with valid neurocognitive test performance as determined by multiple performance validity tests and established empirical criteria. FPRs were defined by the proportion of participants who scored below an empirically established EVI cutoff with ≥0.90 specificity. Results EVIs derived from two of the six tests exhibited unacceptably high FPRs (>10%) when used independently, but the total FPR decreased to 8.1% when the EVIs were aggregated. Several EVIs within a sustained attention test were associated with FPRs around 11%. EVIs that did not include demographically adjusted cutoffs, specifically for race, were associated with higher FPRs around 14%. Conversely, FPRs did not significantly differ based on whether EVIs included timed versus untimed, verbal versus nonverbal, or graphomotor versus non-graphomotor components, nor whether they had raw versus standardized cut scores. Conclusions Findings suggest that practitioners should consider both the type of test from which an EVI is derived and the aggregate number of EVIs employed to minimize the FPRs in ADHD evaluations. Findings also indicate that more nuanced approaches to validity test selection and development are needed.
... For example, separate cutoffs are needed for men and women on the Finger Tapping Test (Arnold et al., 2005). The signal detection profile of raw and T-score cutoffs on the Complex Ideational Material differs based on examinee demographics Erdodi et al., 2016). Raw score-based validity cutoffs on the Trail Making Test result in higher failure rate in older and less educated individuals, whereas cutoffs based on demographically adjusted T-scores neutralize this effect (Abeare et al., 2019b). ...
Chapter
Full-text available
This chapter reviews factors that are rarely considered as the focus of empirical research, yet they likely impact the outcome of neuropsychological assessments and are often recognized as relevant clinical variables by practitioners. Carefully considering their impact on the results of the assessment can enhance the overall case conceptualization and the validity of the clinical interpretations. In addition to a review of relevant literature, the chapter includes case studies and new data to illustrate key concepts. Complicating factors are grouped into three categories: (1) Examinee-related variables, (2) Contextual variables, and (3) Assessment artifacts. Tentative solutions to managing these confounding variables are discussed.
... Psychogenic interference (Tarachow, 1947) is a term used to describe the phenomenon when acute emotional distress suppresses an individual's ability to perform at their optimal ability level (Bigler, 2012;Erdodi et al., 2016). One of the psychometric markers of psychogenic interference is internally inconsistent cognitive profiles that are typically interpreted as evidence of non-credible responding, which is sometimes referred to as compelling inconsistency Erdodi et al., 2018a, b, c, d;Sherman et al., 2020). ...
Article
Full-text available
Little is known about the neuropsychological profiles associated with sexual abuse (SA). Using a serial case study design, we examined the cognitive profile from 11 patients medically referred for neuropsychological evaluation who endorsed a remote history of SA. The data can be best summarized as the co-existence of strong psychometric evidence of non-credible responding [operationalized as multiple failures on performance validity tests (PVTs)] and intact (often better than expected) performance on difficult tests designed to measure cognitive ability across various domains. Five (45.5%) of the patients had strong evidence for noncredible responding and another three (27.3%) had indeterminate profiles (i.e., neither clearly valid nor clearly invalid). We propose that the paradoxical co-occurrence of multiple PVT failures and intact cognitive abilities may be a psychometric marker of complex trauma history. Naturally, this hypothesis needs extensive independent replication before it can be incorporated into routine clinical interpretation of neuropsychological data. Given the small sample size and the variability in demographic and clinical characteristics, results are considered preliminary. The effects of developmental timing and recurrence of SA, the survivor-perpetrator relationship, protective factors, and post-traumatic growth on cognitive functioning in people who have experienced SA are discussed. Future research is needed to better understand the long-term effects of SA on the neurocognitive profile of adult survivors. In the meantime, assessors may want to consider complex trauma history as a potential alternative explanation to feigning for multiple PVT failures and unexplained internal inconsistencies within the cognitive profile.
... a Specifically computed to convert traditional scores into a binomial experiment; 1-3, Acquisition trials; Animals, Category fluency (Curtis et al., 2008;Hurtubise et al., 2020;Sugarman & Axelrod, 2015); BC, Below chance level (below the 95% confidence interval around the mean); BIN, Cutoff based on the binomial distribution; BNT-15, Boston Naming Test-Short Form Deloria et al., 2021;); C, At chance level (within the 95% confidence interval around the mean); CD WAIS-IV , Coding (Ashendorf et al., 2017;; CIM, Complex Ideational Material (Erdodi, 2019;Erdodi et al., 2016;; COL, Color Naming; CPT-3, Conners' Continuous Performance Test-Third Edition (T = 90 is the highest score possible; Ord et al., 2020;Robinson et al., 2022); COM, Combination score (FR + true positives-false positives); DCT, Dot Counting Test (Boone et al., 2002;Hansen et al., 2022); Dem ADJ , Demographically adjusted score; DH, Dominant hand; D-KEFS, Delis Kaplan Executive System Eglit et al., 2020;; DR, Delayed recall; DS WAIS-IV , Digit Span subtest of the Wechsler Adult Intelligence Scale-Fourth Edition Whitney et al., 2009); EMP, Empirically derived cutoffs; EWFT, Emotion Word Fluency Test (Abeare, Hurtubise, et al., 2021); FAS, Letter fluency (Curtis et al., 2008;Deloria et al., 2021;Hurtubise et al., 2020); FCR, Forced choice recognition; FR, Free recall; FTT, Finger Tapping Test (Arnold et al., 2005;Axelrod et al., 2014;Erdodi, Taylor, et al., 2019); GPB, Grooved Pegboard Test Link et al., 2021); HVLT-R, Hopkins Verbal Learning Test-Revised (Cutler et al., 2021;Sawyer et al., 2017); IOP-M, Inventory of Problems-29 memory module (Giromini et al., 2020a(Giromini et al., , 2020bHolcomb, Pyne, et al., 2022 Greve et al., 2006;Heinly et al., 2005), PVT failures had a much stronger relationship with FCR CVLT-II or TOMM-2 scores than external incentive status, suggesting that the objective psychometric evidence of invalid performance is a better predictor of the credibility of a given clinical presentation than contextual variables (e.g., being in litigation). Finally, the remarkably consistent findings across the two samples and two sets of criterion PVTs reinforce previous claims that a single error on these two PVTs provides sufficient evidence of invalid performance. ...
Article
Full-text available
This study was designed to empirically evaluate the classification accuracy of various definitions of invalid performance in two forced-choice recognition performance validity tests (PVTs; FCR CVLT-II and Test of Memory Malingering [TOMM-2]). The proportion of at and below chance level responding defined by the binomial theory and making any errors was computed across two mixed clinical samples from the United States and Canada (N = 470) and two sets of criterion PVTs. There was virtually no overlap between the bino-mial and empirical distributions. Over 95% of patients who passed all PVTs obtained a perfect score. At chance level responding was limited to patients who failed ≥2 PVTs (91% of them failed 3 PVTs). No one scored below chance level on FCR CVLT-II or TOMM-2. All 40 patients with dementia scored above chance. Although at or below chance level performance provides very strong evidence of non-credible responding, scores above chance level have no negative predictive value. Even at chance level scores on PVTs provide compelling evidence for non-credible presentation. A single error on the FCR CVLT-II or TOMM-2 is highly specific (0.95) to psychometrically defined invalid performance. Defining non-credible responding as below chance level This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
... Note. EI-7: Erdodi Index Seven; Animals: Category fluency (demographically adjusted T-score based on norms by Heaton et al., 2004;Hayward et al., 1987;Hurtubise et al., 2020;Sugarman & Axelrod, 2015); BNT-15: Boston Naming Test -Short Form ; T2C: Time to completion (seconds); CDWAIS-III : Coding subtest of the Wechsler Adult Intelligence Scale -Third Edition (age-corrected scaled score; Ashendorf et al., 2017;Etherton et al., 2006;Inman & Berry, 2002;Kim et al., 2010;Trueblood, 1994); CIM: Raw score on the Complex Ideational Material subtest of the Boston Diagnostic Aphasia Battery Erdodi, 2019;Erdodi et al., 2016;; FAS: Letter fluency (demographically adjusted T-score based on norms by Heaton et al., 2004;Curtis et al., 2008;Hurtubise et al., 2020;Sugarman & Axelrod, 2014;Whiteside et al., 2015); GPBDominant : Grooved Pegboard Test dominant hand (demographically adjusted T-score based on norms by Heaton et al., 2004;Erdodi, Kirsch, et al., 2018;Erdodi, Seke et al., 2017); RDS: Reliable Digit Span (Greiffenstein et al., 1994;Pearson, 2009;Reese, et al., 2012;Schroeder et al., 2012;Webber & Soble, 2018). ...
Article
Full-text available
This study sought to provide a direct comparison between old and new versions of the Trail Making Test (TMT). Eighty-five undergraduate student volunteers were administered the old and new TMT. A third of them were instructed to feign neuropsychiatric deficits. The classification accuracy of the TMTs was evaluated against experimentally induced and psychometrically defined invalid performance. Results showed that the old TMT demonstrated superior psychometric properties, both as a measure of cognitive ability and performance validity. In conclusion, newer and more sophisticated versions of a test are not necessarily better than older, established instruments. Replications in clinical samples are needed to verify the findings.
Article
Full-text available
There are growing concerns that increasing the number of performance validity tests (PVTs) may inflate the false positive rate. Indeed, the number of available embedded PVTs has increased exponentially over recent decades. However, the standard for declaring a neurocognitive profile invalid (≥2 PVT failures) has not been adjusted to reflect this change. Data were collected from 100 clinically referred patients with traumatic brain injury. Two distinct aggregation methods were used to combine multiple (5, 7, 9, 11, and 13) embedded PVTs into a single-number summary of performance validity, using two established free-standing PVTs as criteria. Multivariate cutoffs had to be adjusted to contain false positives: ≥2 failures out of nine or more dichotomized (Pass/Fail) PVTs had unexpectedly low multivariate specificity (.76-.79). However, ≥4 failures resulted in high specificity (.90-.96), even out of 13 embedded PVTs. Multivariate models of embedded PVTs correctly classified between 92 and 96% of the sample at ≥.90 specificity. Alternative aggregation methods produced similar results. Findings support the notion of the elasticity of multivariate cutoffs: as the number of PVTs interpreted increases, more stringent cutoffs are required to deem the profile invalid, at least until a certain level of evidence for non-credible responding accumulates. A desirable byproduct of increasing the number of PVTs was improved sensitivity (.85–1.00). There is no such thing as too many PVTs, only insufficiently conservative multivariate cutoffs.
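To see why multivariate cutoffs must become more stringent as PVTs accumulate, the following sketch models the chance of spurious failures under a simplifying independence assumption, with each PVT taken to have a nominal .90 specificity. These are toy numbers for intuition, not the study's clinical data.

```python
from math import comb

def p_at_least_k_failures(n_pvts: int, k: int, fp_rate: float = 0.10) -> float:
    """P(>= k false-positive failures out of n independent PVTs),
    assuming each PVT alone has a false positive rate of fp_rate."""
    return sum(
        comb(n_pvts, j) * fp_rate**j * (1 - fp_rate) ** (n_pvts - j)
        for j in range(k, n_pvts + 1)
    )

for n in (5, 9, 13):
    print(f"{n:>2} PVTs: "
          f"P(>=2 failures) = {p_at_least_k_failures(n, 2):.2f}, "
          f"P(>=4 failures) = {p_at_least_k_failures(n, 4):.3f}")
```

Even this crude model reproduces the pattern in the abstract: a fixed ≥2-failure rule leaks false positives as the battery grows, whereas a ≥4-failure rule keeps the family-wise false positive rate low even with 13 PVTs.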
Article
Full-text available
This study was designed to evaluate the utility of the Atypical Responses (ATR) scale of the Trauma Symptom Inventory-Second Edition (TSI-2) as a symptom validity test (SVT) in a medicolegal sample. Archival data were collected from a consecutive case sequence of 99 patients referred for neuropsychological evaluation following a motor vehicle collision. The ATR's classification accuracy was computed against criterion measures consisting of composite indices based on SVTs and performance validity tests (PVTs). An ATR cutoff of ≥9 emerged as optimal, producing a good combination of sensitivity (.35-.53) and specificity (.92-.95) against the criterion SVT and correctly classifying 71–79% of the sample. Predictably, classification accuracy was lower against PVTs as criterion measures (.26-.37 sensitivity at .90-.93 specificity, correctly classifying 66–69% of the sample). The originally proposed ATR cutoff (≥15) was prohibitively conservative, resulting in a 90–95% false negative rate. In contrast, although the more liberal alternative (≥8) fell short of the specificity standard (.89), it was associated with notably higher sensitivity (.43-.68) and the highest overall classification accuracy (71–82% of the sample). Non-credible symptom report was a stronger confound on the posttraumatic stress scale of the TSI-2 than on that of the Personality Assessment Inventory. The ATR demonstrated its clinical utility in identifying non-credible symptom report (and, to a lesser extent, invalid performance) in a medicolegal setting, as well as its potential to serve as a quick, potentially stand-alone screener for the overall credibility of neuropsychological deficits. More research is needed in patients with different clinical characteristics assessed in different settings to establish the generalizability of the findings.
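For readers unfamiliar with the accuracy metrics quoted throughout these abstracts, a minimal sketch of how sensitivity, specificity, and overall classification accuracy are derived from a 2x2 table follows. The counts are hypothetical, chosen only to land near the reported ranges.

```python
def classification_stats(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity, specificity, and overall correct classification
    from a 2x2 table of validity-test outcomes."""
    sensitivity = tp / (tp + fn)            # non-credible cases correctly flagged
    specificity = tn / (tn + fp)            # credible cases correctly passed
    overall = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, overall

# Hypothetical counts for an "ATR >= 9 fails" style cutoff (illustration only).
sens, spec, acc = classification_stats(tp=18, fn=16, tn=62, fp=3)
print(f"sensitivity={sens:.2f}  specificity={spec:.2f}  accuracy={acc:.2f}")
```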
Article
Full-text available
This editorial article introduces the second special issue of Psychology & Neuroscience devoted to performance and symptom validity testing. The reason for including the second special issue is that we received an unusually large number of high-quality submissions that could not fit into a single volume. The articles included in this second part offer practical, immediately actionable knowledge to assessors while simultaneously advancing the methodology for calibrating instruments designed to evaluate the credibility of a given clinical presentation. In this introduction, we briefly summarize each article and reflect on an emerging epistemological question about the interpretation of noncredible results in the context of a clinical research study: If a relatively large proportion of clinical patients fail a validity test without any apparent external incentives to appear impaired, should this be interpreted as a possible vulnerability of that validity test to false-positive classifications or as evidence that noncredible responding is relatively common outside of medicolegal/forensic assessments? The methodological implications of symptom and performance validity research are discussed.
Article
Full-text available
This study was designed to expand on a recent meta-analysis that identified ≤42 as the optimal cutoff on the Word Choice Test (WCT). We examined the base rate of failure and the classification accuracy of various WCT cutoffs in four independent clinical samples (N = 252) against various psychometrically defined criterion groups. WCT ≤ 47 achieved acceptable combinations of specificity (.86–.89) at .49 to .54 sensitivity. Lowering the cutoff to ≤45 improved specificity (.91–.98) at a reasonable cost to sensitivity (.39–.50). Making the cutoff even more conservative (≤42) disproportionately sacrificed sensitivity (.30–.38) for specificity (.98–1.00), while still classifying 26.7% of patients with genuine and severe deficits as non-credible. Critical item (.23–.45 sensitivity at .89–1.00 specificity) and time-to-completion cutoffs (.48–.71 sensitivity at .87–.96 specificity) were effective alternative/complementary detection methods. Although WCT ≤ 45 produced the best overall classification accuracy, scores in the 43 to 47 range provide comparable objective psychometric evidence of non-credible responding. Results question the need for designating a single cutoff as “optimal,” given the heterogeneity of signal detection environments in which individual assessors operate. As meta-analyses often fail to replicate, ongoing research is needed on the classification accuracy of various WCT cutoffs.
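A cutoff sweep of the kind described in this abstract can be illustrated as follows. The simulated score distributions, group sizes, and random seed are assumptions for demonstration only; they do not reproduce the WCT data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical WCT-like raw scores (0-50): credible vs. non-credible groups.
credible = np.clip(rng.normal(48, 2, 200).round(), 0, 50)
noncred = np.clip(rng.normal(40, 6, 60).round(), 0, 50)

# Sweep candidate "score <= cutoff fails" rules and report the tradeoff.
for cutoff in range(42, 48):
    sens = (noncred <= cutoff).mean()    # non-credible cases flagged
    spec = (credible > cutoff).mean()    # credible cases passed
    print(f"<= {cutoff}: sensitivity={sens:.2f}  specificity={spec:.2f}")
```

The output makes the abstract's central point concrete: moving the cutoff by a point or two trades sensitivity against specificity continuously, so adjacent cutoffs can provide comparable evidential value rather than one being uniquely "optimal."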
Article
Full-text available
Research suggests that select processing speed measures can also serve as embedded validity indicators (EVIs). The present study examined the diagnostic utility of Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) subtests as EVIs in a mixed clinical sample of 205 patients medically referred for neuropsychological assessment (53.3% female, mean age = 45.1). Classification accuracy was calculated against 3 composite measures of performance validity as criterion variables. A PSI ≤79 produced a good combination of sensitivity (.23-.56) and specificity (.92-.98). A Coding scaled score ≤5 resulted in good specificity (.94-1.00), but low and variable sensitivity (.04-.28). A Symbol Search scaled score ≤6 achieved a good balance between sensitivity (.38-.64) and specificity (.88-.93). A Coding-Symbol Search scaled score difference ≥5 produced adequate specificity (.89-.91) but consistently low sensitivity (.08-.12). A 2-tailed cutoff on the Coding/Symbol Search raw score ratio (≤1.41 or ≥3.57) produced acceptable specificity (.87-.93), but low sensitivity (.15-.24). Failing ≥2 of these EVIs produced variable specificity (.81-.93) and sensitivity (.31-.59). Failing ≥3 of these EVIs stabilized specificity (.89-.94) at a small cost to sensitivity (.23-.53). Results suggest that processing-speed-based EVIs have the potential to provide a cost-effective and expedient method for evaluating the validity of cognitive data. Given their generally low and variable sensitivity, however, they should not be used in isolation to determine the credibility of a given response set. They also produced unacceptably high rates of false positive errors in patients with moderate-to-severe head injury. Combining evidence from multiple EVIs has the potential to improve overall classification accuracy.
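The decision logic implied by these cutoffs can be summarized in a few lines of code. Note that the direction of the Coding-Symbol Search difference flag is not fully specified in the abstract, so the two-sided version below is an assumption, and the function name and interface are invented for illustration.

```python
def count_evi_failures(psi, cd_ss, ss_ss, cd_raw, ss_raw):
    """Count failed processing-speed EVIs using the cutoffs reported
    above (the CD-SS difference flag is assumed to be two-sided)."""
    flags = [
        psi <= 79,                            # Processing Speed Index
        cd_ss <= 5,                           # Coding scaled score
        ss_ss <= 6,                           # Symbol Search scaled score
        abs(cd_ss - ss_ss) >= 5,              # CD-SS scaled score difference
        not (1.41 < cd_raw / ss_raw < 3.57),  # two-tailed raw score ratio
    ]
    return sum(flags)

n_fail = count_evi_failures(psi=76, cd_ss=5, ss_ss=7, cd_raw=40, ss_raw=30)
print("EVI failures:", n_fail,
      "-> likely invalid" if n_fail >= 3 else "-> indeterminate")
```

Consistent with the abstract, the sketch treats ≥3 failures as the more specificity-stable multivariate threshold, rather than relying on any single flag.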
Article
Full-text available
Complex Ideational Material (CIM) is a sentence comprehension task designed to detect pathognomonic errors in receptive language. Nevertheless, patients with apparently intact language functioning occasionally score in the impaired range. If these instances reflect poor test-taking effort, CIM has potential as a performance validity test (PVT). Indeed, in 68 adults medically referred for neuropsychological assessment, CIM was a reliable marker of psychometrically defined invalid responding. A raw score ≤9 or T-score ≤29 achieved acceptable combinations of sensitivity (.34-.40) and specificity (.82-.90) against two reference PVTs, and produced a zero overall false positive rate when scores on all available PVTs were considered. More conservative cutoffs (≤8/≤23) with higher specificity (.95-1.00) but lower sensitivity (.14-.17) may be warranted in patients with longstanding, documented neurological deficits. Overall, results indicate that in the absence of overt aphasia, poor performance on CIM is more likely to reflect invalid responding than true language impairment. Implications for the clinical interpretation of CIM are discussed.
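A minimal sketch of the two-tier decision rule described in this abstract follows. The function name and interface are invented for illustration, and, as the abstract stresses, the rule applies only in the absence of bona fide aphasia.

```python
def cim_validity_flag(raw: int, t_score: int, neuro_deficits: bool) -> bool:
    """Apply the CIM validity cutoffs reported above: raw <= 9 or
    T <= 29 in general, but the more conservative raw <= 8 / T <= 23
    for patients with longstanding, documented neurological deficits."""
    if neuro_deficits:
        return raw <= 8 or t_score <= 23
    return raw <= 9 or t_score <= 29

print(cim_validity_flag(raw=9, t_score=31, neuro_deficits=False))  # True
print(cim_validity_flag(raw=9, t_score=31, neuro_deficits=True))   # False
```

The contrast between the two calls shows the practical effect of the conservative cutoffs: the same score that is flagged in a general clinical sample is treated as indeterminate in a patient with documented neurological deficits.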
Article
Full-text available
Objective: The current study investigated changes in neuropsychologists' validity testing beliefs and practices since publication of the last North American survey targeting these issues in 2007 and explored emerging issues in validity testing that had not been previously addressed in the professional survey literature. Methods: Licensed North American neuropsychologists (n = 316), who primarily evaluate adults, were surveyed in regard to the following topics: (1) comparison of objective validity testing, qualitative data, and clinical judgment; (2) approaches to validity test administration; (3) formal communication in cases of suspected malingering; (4) reporting of validity test results; (5) suspected causes of invalidity; (6) integration of stand-alone, embedded, and symptom-report validity measures; (7) multiple performance validity test interpretation; (8) research practices; and (9) popularity of specific validity instruments. Results: Overall, findings from the current survey indicated that all but a small minority of respondents routinely utilize validity testing in their examinations. Furthermore, nearly all neuropsychologists surveyed believed formal validity testing to be mandatory in forensic evaluations and at least desirable in clinical evaluations. While results indicated general agreement among neuropsychologists across many aspects of validity testing, responses regarding some facets of validity test implementation, interpretation, and reporting were more variable. Validity testing utilization generally did not differ according to level of forensic involvement but did vary in respect to respondent literature consumption. Conclusions: Study findings differ significantly from past professional surveys and indicate an increased utilization of validity testing, suggesting a pronounced paradigm shift in neuropsychology validity testing beliefs and practices.
Article
Meta-analysis was used to examine the performance of depressed and Alzheimer-type dementia (DAT) patients on standard and experimental clinical tests of cognitive function. Deficits were found for depression on almost every psychological test. Relative to nondepressed controls, the average deficit was 0.63 of a standard deviation, but the magnitude of the effect varied with the type of test. DAT patients performed worse than depressed patients, with an average effect size of 1.21 standard deviations, but the size of the effect depended on the clinical test. Effect sizes for the comparison between depressives and controls were significantly affected by age, treatment setting, ECT use, severity of depression, and the source of diagnostic criteria, but not by the type of depression. Effect sizes in the comparison of depressives to DAT patients were influenced by age, the severity of depression, and ECT. Depressives performed proportionately worse than controls on tasks with pleasant or neutral, compared with unpleasant content, on speeded compared with nonspeeded tasks, and on vigilance tasks. However, there were no differences in the magnitude of effect size for tests using recall compared with recognition, using categorical compared with noncategorical word lists, on story compared with word comprehension, and using verbal compared with visual material. Relative to DAT patients, depressives performed no better on recall compared to recognition tasks, or verbal compared to visual material. The findings of the review are not consistent with the hypothesis that depression is associated with deficits in effortful processing. A model of psychological deficit in depression as a deficit in speed or attention has more promise. (JINS, 1997, 3, 631–651.)
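Since the meta-analytic results above are expressed as standardized mean differences, a brief sketch of the pooled-SD effect size computation may help. The score vectors are hypothetical toy data, not values from the review.

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference (pooled-SD Cohen's d), the effect
    size metric summarized in the meta-analysis above."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical test scores: nondepressed controls vs. depressed patients.
controls = [52, 55, 49, 58, 51, 54]
depressed = [47, 50, 44, 49, 46, 48]
print(f"d = {cohens_d(controls, depressed):.2f}")
```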
Book
You’ve just been told that those seizures you’ve been having for months or years and that have been making a mess of your life aren’t actually due to epilepsy and that instead they are caused by psychological stress. Maybe you’re discharged from the hospital by your neurologist with a name and a number of a mental health professional who can start treating you, but too often you leave the hospital with nothing other than the name of your disorder: psychogenic non-epileptic seizures. When you search for information and professional help, it’s terribly hard to find. This book is directed at patients, loved ones and mental health professionals and is an invaluable resource that explains the elements that make up PNES, provides the necessary tools to begin achieving real changes in behavior, thoughts and emotions, and guidelines to live a life that is healthy, safe and good in quality. Psychogenic non-Epileptic Seizures: A Guide will equip you with essential knowledge about this condition and provide you with tools that will help you take charge of your PNES.
Article
A large sample of chronic postconcussive patients with and without overt malingering signs was compared with objectively brain-injured patients on common episodic memory and malingered amnesia measures. Probable malingerers and traumatically brain-injured participants were not differentiated on popular episodic recall tests. In contrast, probable malingerers performed poorly on the Rey 15-Item Test, Rey Word Recognition List, Reliable Digit Span, Portland Digit Recognition Test, and Rey Auditory Verbal Learning Test recognition trial. These findings validated both commonly cited malingering measures and newly introduced methods of classifying malingering in real-world clinical samples. The base rate for malingering in chronically complaining mild head injury patients may be much larger than previously assumed.
Article
This study examined the effect of depression on neurocognitive performance in patients who passed symptom validity testing. The Beck Depression Inventory (BDI) was used to assess depression in 420 patients with heterogeneous referral diagnoses (more than half of these cases were head injury or neurological disease). All patients had demonstrated satisfactory effort by passing two symptom validity tests. No differences were found on objective cognitive and psychomotor measures in groups sorted based on their self-reported depression. In contrast, on the self-report measures [i.e., Minnesota Multiphasic Personality Inventory-2 (MMPI-2), Symptom Checklist-90-Revised (SCL-90-R), and Memory Complaints Inventory (MCI)], differences were found indicating that patients with depression report more emotional, somatic, and cognitive problems. Contrary to expectation, these data suggest that depression has no impact on objective neurocognitive functioning.