The BDAE Complex Ideational Material—a Measure of Receptive Language or Performance Validity?

Laszlo A. Erdodi¹,² & Bradley T. Tyson¹,³ & Christopher A. Abeare¹ & Jonathan D. Lichtenstein² & Chantalle L. Pelletier² & Jaspreet K. Rai¹ & Robert M. Roth²

Received: 9 February 2016 / Accepted: 19 April 2016
© Springer Science+Business Media New York 2016
Abstract Scores on the Complex Ideational Material (CIM) were examined in reference to various performance validity tests (PVTs) in 106 adults clinically referred for neuropsychological assessment. The main diagnostic categories, reflecting a continuum between neurological and psychiatric disorders, were epilepsy, psychiatric disorders, postconcussive disorder, and psychogenic non-epileptic seizures. Cross-validation analyses suggest that in the absence of bona fide aphasia, a raw score ≤9 or T score ≤29 on the CIM is more likely to reflect non-credible presentation than impaired receptive language skills. However, these cutoffs may be associated with unacceptably high false positive rates in patients with longstanding, documented neurological deficits. Therefore, more conservative cutoffs (≤8/23) are recommended in such populations. Contrary to the widely accepted assumption that psychiatric disorders are unrelated to performance validity, results were consistent with the psychogenic interference hypothesis, suggesting that emotional distress increases the likelihood of PVT failures even in the absence of apparent external incentives to underperform on cognitive testing.
Keywords Complex Ideational Material · Performance validity assessment · Embedded validity indicators · Psychogenic interference
The Complex Ideational Material (CIM) subtest of the Boston
Diagnostic Aphasia Battery is a sentence comprehension task
developed to measure receptive language skills. It consists of
Yes/No questions that gradually increase in difficulty from
recalling common facts to details contained in short stories
read aloud by the examiner. Scores range from 0 to 12.
A raw score of eight was the lower limit of performance in the normative data (M = 11.2, SD = 1.1; n = 147; Borod, Goodglass, & Kaplan, 1980) and was therefore designated as the cutoff for impairment.
In a selective review of the literature based on 1078 indi-
viduals across 34 samples, healthy controls produced a
weighted mean of 11.3, while patients with stroke, dementia,
and epilepsy demonstrated somewhat lower performance,
with weighted means between 10.2 and 10.6 (Erdodi &
Roth, 2016). The aphasia sample was the most impaired
(weighted mean = 7.5). These findings suggest that healthy
controls (N = 538) obtain near-perfect scores on the CIM, with
all clinical groups performing below them and the aphasic
groups (N = 188) producing a mean below the suggested clinical
cutoff (<8). In other words, the CIM is broadly sensitive to
neurological deficits and very low scores may be specific to
aphasia.
As the studies reviewed did not formally evaluate the credibility of
the cognitive data, Erdodi and Roth (2016) also examined the link between
performance validity tests (PVTs) and performance on the CIM in
a new sample of 68 patients clinically referred for neuropsychological
assessment. They found that patients who failed
PVTs had lower scores on the CIM, calling into question the
validity of some of the previous findings. The present study
* Laszlo A. Erdodi
lerdodi@gmail.com
1 Department of Psychology, University of Windsor, 168 Chrysler Hall South, 401 Sunset Avenue, Windsor, ON N9B 3P4, Canada
2 Department of Psychiatry, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
3 Western Washington Medical Group, Everett, WA, USA

Psychol. Inj. and Law
DOI 10.1007/s12207-016-9254-6
was designed to replicate these findings using a larger sample
and different instruments, to extend the investigation to the
relationship among diagnostic categories, PVTs and CIM,
and to evaluate the use of the CIM as a PVT itself.
Although the CIM was developed as a receptive language
test primarily to diagnose and subtype aphasia, some exam-
inees may perceive it as an attention and memory task—two
cognitive domains within which self-reported deficits are
common. Thus, the CIM may elicit a response pattern that
underestimates true ability levels. However, as the CIM is in
fact an easy task for most native speakers with intact language
skills, it may be effective at differentiating valid and invalid
response sets. Since most PVTs are based on recognition
memory, the CIM could provide an index of performance
validity using a different paradigm. As such, it could enhance
the overall measurement model by contributing non-
redundant information to validity assessment (Boone, 2013;
Larrabee, 2003).
We hypothesized that low CIM scores would be related to
invalid responding in the absence of bona fide aphasia. Our
second hypothesis was that a positive psychiatric history
would be related to both high base rates of PVT failure and
unexpectedly poor CIM performance. If non-credible
responding and psychiatric symptoms are related to CIM
scores in the invalid range, this would allow for better-
informed clinical interpretation of both CIM performance
and the entire neurocognitive profile.
To investigate the link between emotional functioning and
CIM scores, a specific combination of clinical groups was
selected to represent the neurological-psychological continu-
um (Boone, 2007) underlying subjective cognitive com-
plaints. The epilepsy (EPI) sample consisted of patients with
a documented seizure disorder to serve as clinical controls
with a medically verified neurological condition. The
postconcussive disorder (PCD) group was composed of indi-
viduals whose cognitive complaints persisted well beyond the
3-month window of normative recovery (APA, 2013). The
psychiatric (PSY) sample included patients who reported cog-
nitive decline in the absence of an identifiable physical etiol-
ogy and whose symptomology was judged to be the result of
emotional distress. Finally, the psychogenic non-epileptic sei-
zure (PNES) sample consisted of individuals who presented
with seizure-like symptoms that were determined to be non-
epileptic by their neurologist. Although the condition itself is
quite rare (estimated 2 to 33/100 000; Benbadis & Hauser,
2000), a history of complex trauma is virtually ubiquitous
within this population (Myers, 2014). The last three groups
share the common feature of unexplained variance in their
presentation that is unlikely to be attributable to a known
neurological disorder. While the “neurological vs. psychiat-
ric”dichotomy is an artificial one, these diagnostic groups
may represent clinically distinct categories in terms of the
relative contribution of neurological and psychiatric factors.
The complex relationship between emotional and cognitive
functioning has long been recognized (Boone, 2007; Suhr,
Tranel, Wefel, & Barrash, 1997). However, the evidence is
mixed. Despite an apparent consensus that depression is un-
related to PVT failure (Considine et al., 2011; Egeland et al.,
2005; Rees, Tombaugh, & Boulay, 2001), the relative contri-
bution of depressive symptoms to memory performance re-
mains an evolving controversy. Some studies found that de-
pression was unrelated to memory performance (Egeland
et al., 2005; Langenecker et al., 2005; Rohling, Green,
Allen, & Iverson, 2002), whereas others reported a strong link
between the two (Christensen, Griffiths, MacKinnon, &
Jacomb, 1997; Considine et al., 2011).
Given such divergent findings, additional research exam-
ining the influence of psychiatric problems on cognitive per-
formance is warranted. In addition to investigating the CIM’s
potential to serve as a PVT, the design of the present study also
allows for a systematic analysis of the interaction between
instrumentation issues and diagnostic categories. As such, it
can extend the psychogenic interference hypothesis (the no-
tion that emotional distress can negatively impact cognitive
performance) to PVTs.
Method
Participants
The sample consisted of 106 patients clinically referred for
neuropsychological assessment at a Northeastern academic
medical center in the context of self-reported cognitive de-
cline. All patients were native speakers of English and classi-
fied into one of the four diagnostic categories outlined above:
EPI (29), PCD (23), PNES (20), and PSY (34). The majority
of the sample was right handed (80 %) and female (63 %).
Mean age was 40.6 years (SD = 11.9). Mean level of education
was 12.6 years (SD = 2.0). Mean FSIQ was 88.5 (SD = 14.9).
Patients did not differ in age and level of education as a func-
tion of diagnosis. None of the patients reported any symptoms
of aphasia nor were any noted in the medical records by the
referring or assessing clinicians.
All PCD patients met ICD-10 diagnostic criteria (World
Health Organization, 2014). The mean time since injury was
40.6 months. None had positive neuroradiological findings or
documented loss of consciousness. The majority (52 %) either
reported being in litigation at the time of the assessment or
became injured under circumstances in which a reasonable
argument for external culpability could be made. The most
frequent mechanism of injury was motor vehicle accidents
(43 %), although one of the collisions was admittedly a failed
suicide attempt. Comorbid psychiatric problems were com-
mon: depression (52 %), history of substance abuse (52 %),
posttraumatic stress disorder (35 %), anxiety (30 %), history
of past suicide attempts (22 %), persistent fatigue (17 %),
history of childhood abuse (13 %).
The EPI and PNES patients were diagnosed through con-
sensus by a multidisciplinary team of epilepsy specialists
based on their clinical history, seizure semiology, and video-
EEG findings. The majority of the EPI patients had comorbid
psychiatric diagnoses [depression (55 %) and anxiety (28 %)].
Psychiatric comorbidity was also common within the PNES
sample [depression (55 %), anxiety, history of childhood
abuse (20 % each), panic disorder (15 %), dissociative symp-
toms (15 %), substance abuse (15 %), past suicide attempts
(10 %), posttraumatic stress disorder (10 %)].
PSY patients were diagnosed by the assessing neuropsy-
chologists based on a combination of clinical interviews, rat-
ing scales, and review of medical records. The most frequent
psychiatric diagnosis was depression (85 %) followed by anx-
iety (53 %), posttraumatic stress disorder (32 %), panic disor-
der (24 %), substance abuse (24 %), borderline personality
disorders (21 %), schizophrenia spectrum disorders (15 %),
and bipolar disorder (12 %). A history of psychiatric hospital-
izations was present in 26 % of the sample, and 12 % reported
previous suicide attempts. The majority of patients (76 %) had
≥2 psychiatric conditions; 68 % had ≥3, while 47 % had ≥4.
Materials
The core battery of neuropsychological assessment included
the Wechsler Adult Intelligence and Memory Scales
(WAIS/WMS 3rd and 4th editions), California Verbal
Learning Test, 2nd edition (CVLT-II), verbal fluency tasks
(FAS & animals), the Boston Naming Test (BNT), the single
word reading task of the Wide Range Achievement Test
(WRAT; 3rd and 4th editions), the CIM, the Wisconsin Card
Sorting Test (WCST), the Beck Depression Inventory, 2nd
edition (BDI-II) and the Test of Memory Malingering
(TOMM).
To complement the TOMM, the only free-standing PVT
consistently administered, a composite measure of perfor-
mance validity was developed using embedded effort mea-
sures following the methodology described by Erdodi et al.
(2014, 2016). Five language-based embedded validity indica-
tors (EVIs) were selected to create a composite measure of
performance validity, labeled the “Effort Index Five”(EI-5).
To capture the underlying continuity in cognitive effort, each
of the five constituent EVIs was recoded into a four-point
scale (0 to 3). The level of performance that passed the most
liberal cutoff (i.e., low probability of invalid responding) was
assigned the value of zero. The most liberal cutoff available in
the literature was assigned the value of one, the next available
cutoff was assigned a value of two, and finally, failing the
most conservative cutoff was assigned a value of three. The
re-scaling of the five EVIs reflects a gradient of increasing
probability of invalid performance. The value of the EI-5 is
the sum of the re-scaled component EVIs (Table 1).
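The recoding-and-summing scheme described above can be illustrated with a short sketch. The cutoff bands follow Table 1, but the variable names, dictionary structure, and function signatures are illustrative assumptions, not part of the original study:

```python
# Illustrative sketch of the EI-5 scoring logic (not the authors' code).
# For each embedded validity indicator (EVI), thresholds are listed from
# the most liberal pass (value 0) down; a raw score at or above the first
# threshold earns 0, and each successively lower band earns 1, 2, or 3.
EI5_BANDS = {
    "RDS":          [8, 7, 6],    # reliable digit span: >=8 -> 0, 7 -> 1, 6 -> 2, <=5 -> 3
    "DS_ACSS":      [7, 6, 5],    # Digit Span age-corrected scaled score
    "LM_recog":     [21, 20, 19], # Logical Memory recognition
    "CVLT_RecHits": [11, 10, 9],  # CVLT-II recognition hits
    "CVLT_FCR":     [16, 15, 14], # CVLT-II forced choice recognition
}

def recode_evi(evi: str, raw: int) -> int:
    """Recode a raw EVI score onto the 0-3 validity scale."""
    for points, threshold in enumerate(EI5_BANDS[evi]):
        if raw >= threshold:
            return points
    return 3  # below the most conservative cutoff

def ei5(scores: dict) -> int:
    """EI-5 = sum of the five recoded component EVIs (range 0-15)."""
    return sum(recode_evi(evi, raw) for evi, raw in scores.items())

# A patient passing every component at the most liberal level scores 0:
print(ei5({"RDS": 10, "DS_ACSS": 9, "LM_recog": 23,
           "CVLT_RecHits": 12, "CVLT_FCR": 16}))  # -> 0
```

Each unit on the resulting scale thus represents one step along the gradient of increasing probability of invalid performance, which is what makes the composite clinically interpretable.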
The EI-5 provides a robust psychometric model for perfor-
mance validity assessment, as it takes into account the exam-
inee’s score on a wider range of measures (Larrabee, 2003)
and thus, is better at approximating the ultimate goal of ongo-
ing monitoring of cognitive effort throughout the assessment
(Boone, 2009). Further, it makes an attempt to equalize the
inherent differences in the signal detection profiles of various
EVIs by translating them to a common metric. It also captures
the underlying continuity in performance validity by generat-
ing a scale on which each unit of measurement represents a
clinically interpretable shift in the likelihood of invalid
responding. Finally, its components were specifically selected
to represent cognitive domains that conceptually overlap with
the CIM: auditory attention, story memory, and verbal learn-
ing. Therefore, the EI-5 provides an ecologically valid refer-
ence PVT.
Although conceptually, cognitive effort is a continuous var-
iable, clinical practice demands a categorical approach to va-
lidity assessment. Therefore, the first two levels of EI-5 (≤1
EVI failures) were considered a Pass. The next two levels start
to become problematic, as they suggest either multiple inde-
pendent EVI failures or failure of a single EVI at the most
conservative cutoff. Given the ambiguity, this level of perfor-
mance was labeled Borderline. Above this, all EI-5 scores
were considered a Fail, with each additional value providing
incremental evidence for the accuracy of the classification. To
maximize the purity of the reference groups used to calibrate
the target PVT, Borderline cases were excluded from
Table 1  The components of the effort index (EI-5), base rates for failing each cutoff with cumulative failure rates, and percent of patients passing the most liberal cutoff

                          EI-5 values
EI-5 components       0      1      2      3
RDS                  ≥8      7      6     ≤5
  Base rate        54.3   22.9   15.2    7.6
DS_ACSS              ≥7      6      5     ≤4
  Base rate        62.5   13.5   16.3    7.7
LM recognition      ≥21     20     19    ≤18
  Base rate        76.5   11.8    3.9    7.8
CVLT-II Rec Hits    ≥11     10      9     ≤8
  Base rate        73.6    7.5    5.7   13.2
CVLT-II FCR          16     15     14    ≤13
  Base rate        56.2   16.2   10.5   17.1

RDS reliable digit span (Greiffenstein, Baker, & Gola, 1994; Larrabee, 2003; Heinly et al., 2005; Pearson, 2009); DS_ACSS Digit Span age-corrected scaled score (Heinly et al., 2005; Spencer et al., 2013); LM Logical Memory (Bortnik et al., 2010; Pearson, 2009); CVLT-II recognition hits (RH_CVLT-II; Wolfe et al., 2010); Forced Choice Recognition (FCR_CVLT-II; Moore & Donders, 2004; Bauer et al., 2005)
cross-validation analyses, following recommendations by
Greve and Bianchini (2004).
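The Pass/Borderline/Fail categorization described above can be sketched as a small helper. This is an illustrative Python function, not the authors' code; the Fail boundary of ≥4 follows the definition given with Table 3:

```python
def classify_ei5(ei5_total: int) -> str:
    """Map an EI-5 total (0-15) onto the categorical validity levels
    described in the text: 0-1 Pass, 2-3 Borderline, >=4 Fail."""
    if ei5_total <= 1:
        return "Pass"        # at most one failure at the most liberal cutoff
    if ei5_total <= 3:
        return "Borderline"  # ambiguous; excluded from cross-validation
    return "Fail"            # increasing evidence of invalid responding
```

Excluding the Borderline band is what keeps the reference groups "pure" when calibrating the target PVT.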
Finally, the logistic regression equations developed by
Suhr and Boyer (1999) and King et al. (2002) were used as
alternative reference PVTs. They were selected because they
are based on WCST variables that are not verbally mediated
(like the EI-5) or recognition based (like the TOMM). As
such, they provide independent indicators of performance va-
lidity, which is an essential feature of multivariate models of
validity assessment (Boone, 2013; Larrabee, 2003).
Procedure
Tests were administered in an outpatient setting by staff psy-
chometricians and postdoctoral fellows under the supervision
of licensed clinical psychologists with specialty training in
neuropsychology. Data were collected as part of a retrospec-
tive chart review using the clinical archives of the neuropsy-
chology service at a Northeastern academic medical center.
The final sample consisted of consecutive referrals that fit a
priori inclusion criteria: (1) The file contained sufficient infor-
mation to establish a diagnosis; (2) one of the four conditions
was the main diagnosis and was active at the time of testing;
(3) data were available on intellectual functioning as well as a
complete CVLT-II and CIM administration. The project was
approved by the Institutional Review Board. Relevant ethical
guidelines were followed throughout the data collection
process.
Data Analysis
Descriptive statistics were computed for the variables of interest. Inferential statistics included ANOVAs, independent samples t tests, and χ² tests. Post hoc pair-wise contrasts were computed using standard uncorrected t tests. The F test was used for pair-wise comparison of sample variances. Effect size estimates were expressed in Cohen's d, partial η², and Φ². Area under the curve (AUC) was calculated using SPSS 22.0. The rest of the classification accuracy parameters (sensitivity, specificity, likelihood ratios) were computed using standard formulas.
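As a sketch of those standard formulas (assumed from common usage, not taken from the study's analysis code), the likelihood ratios and the base-rate-dependent predictive values reported in Table 3 can be computed as:

```python
# Illustrative implementations of the standard classification accuracy
# formulas; function names are ours, not from the original study.

def likelihood_ratios(sens: float, spec: float) -> tuple:
    """+LR = sens / (1 - spec); -LR = (1 - sens) / spec."""
    return sens / (1 - spec), (1 - sens) / spec

def predictive_power(sens: float, spec: float, base_rate: float) -> tuple:
    """Positive and negative predictive power at a given base rate,
    via Bayes' theorem on the four cells of the confusion matrix."""
    tp = sens * base_rate               # true positives
    fp = (1 - spec) * (1 - base_rate)   # false positives
    tn = spec * (1 - base_rate)         # true negatives
    fn = (1 - sens) * base_rate         # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Example: CIM raw score <=9 against the EI-5 (sens = .39, spec = .85):
for br in (.10, .20, .30, .40, .50):
    ppp, npp = predictive_power(.39, .85, br)
    print(f"base rate {br:.2f}: PPP = {ppp:.2f}, NPP = {npp:.2f}")
```

Because predictive power depends on the prevalence of invalid performance, Table 3 reports PPP and NPP across hypothetical base rates from .10 to .50 rather than a single value.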
Results
The mean CIM raw score in the present sample was 10.2
(SD = 1.8; range 3–12). This was significantly lower than the
mean of the normative sample (Borod et al., 1980):
t(105) = 5.82, p < .001, d = .67. Low scores were rare: only
7.5 % of the sample scored <8, and only 3.8 % scored <7
(Table 2). The mean CIM T score was 39.3 (SD = 14.3; range 5–
63), which was significantly lower than the nominal popula-
tion mean of 50: t(105) = 7.70, p < .001, d = .87.
Next, a series of t tests were performed using the dichoto-
mized (Pass/Fail) reference PVTs as the independent vari-
ables and the CIM as the dependent variable. All contrasts
were significant. Effect sizes ranged from .58 to .86. As the
mean raw score differences between the valid and invalid
groups were small (0.9–1.3) and clustered around 10, two
cutoffs (a liberal ≤9 and a conservative ≤8) were chosen for
further testing. These raw scores were matched, based on per-
centile rank, to T scores of ≤29 and ≤23, respectively.
Against the EI-5, all CIM cutoffs cleared the lower thresh-
old for specificity (.84; Larrabee, 2003). The ≤9 raw score
cutoff produced a sensitivity of .39. Lowering it to ≤8 de-
creased sensitivity to .18. A T score ≤29 had .37 sensitivity,
which decreased to .18 when the cutoff was lowered to ≤23.
Against the TOMM, the liberal cutoffs narrowly missed the
lower threshold for specificity (.82). The ≤9 raw score cutoff
produced .40 sensitivity. Lowering the raw score cutoff to ≤8
decreased sensitivity to .17, but improved specificity to .95. A
T score ≤29 had .37 sensitivity. Lowering the T score to ≤23
decreased sensitivity to .20, but increased specificity to .92.
Against the Suhr and Boyer (1999) equation, all CIM cut-
offs cleared the lower threshold for specificity. The ≤9 raw
score cutoff had .36 sensitivity. Lowering the raw score cutoff
to ≤8 decreased sensitivity to .17, but improved specificity to
.98. A T score ≤29 had .34 sensitivity. Lowering the T score
cutoff to ≤23 decreased sensitivity to .22, but improved spec-
ificity to .98. Against the King et al. (2002) equation, the
liberal cutoffs failed to reach the lower threshold for specific-
ity, but the conservative cutoffs achieved specificity ≥.90 at
.33 sensitivity. Further details are displayed in Table 3.
Dual Cutoffs—Raw Scores vs. T Scores
Having two sets of cutoffs (raw and Tscore) creates the pos-
sibility of an internal discrepancy. To provide empirically
based guidelines for resolving such a divergence, patients
who failed one cutoff, but not the other, were examined to
determine the most appropriate clinical classification. One pa-
tient who scored ≤8 (raw), but T >23 also failed five PVTs. As
such, this profile can be confidently classified as invalid. On
the other side, three patients scored ≤23 (T), but >8 (raw).
They all had a raw score of 9 and failed ≥2 PVTs. As such,
their test data should be classified as invalid.
Pattern Analysis as a Validity Check
in Neuropsychological Testing
Another method for differentiating effort from impairment is
examining the pattern of scores across cognitive domains.
Certain combinations of high and low scores that are incon-
sistent with known patterns of neuropsychological impair-
ment provide further evidence for non-credible performance
(Boone, 2013). Since the present study focused on the CIM,
we examined other tests that target similar constructs (Digit
Span, Similarities, Vocabulary, Arithmetic, Letter-Number
Sequencing, Logical Memory Immediate Recall, verbal fluen-
cy, Boston Naming Test, longest Digit Span forward). These
tests are generally more challenging than the CIM. Therefore,
if a patient obtains a low score on the CIM, low score would
also be expected on these other tests. Intact performance on
these instruments was defined as ≤1SDbelowthemean
(scaled score ≥7, Tscore ≥40) or longest digit span forward
≥5. Contrary to expectation, patients who failed the liberal
CIM cutoffs produced several scores in the intact range on
these language-based measures. More than half had ≥4intact
scores, while more than a quarter of the sample had ≥6intact
scores. Likewise, more than half of those who failed the con-
servative cutoffs had ≥3 intact scores, while a quarter had ≥4
intact scores.
Performance Validity and Depression
Finally, the four diagnostic groups were compared on refer-
ence PVTs, CIM, and BDI-II scores. There was a large effect
(partial η² = .25) on the EI-5, mainly driven by the high PCD
mean and low EPI mean. The overall model for BDI-II
approached significance (p = .09), with a medium effect (partial
η² = .07). This negative finding may be due to large within-
group variability in PCD, PSY, and PNES. Otherwise, the
group means form a clinically interpretable pattern, with the
PSY sample being the most depressed and the EPI sample
being the least depressed.
No significant difference emerged on CIM scores. This
finding is unlikely to be due to small sample size or unusually
large within-group variability. All four diagnostic categories
produced raw scores within 0.9 units. While there was more
variability among group means in Tscores, the within-group
variability was also higher, washing out the main effect.
When the analyses were refocused on base rates of failure,
significant group differences emerged with very large effects
on the reference PVTs and small-medium effects on the CIM
(Table 4). EPI consistently had the lowest base rate of failure
(9–17 %), while PCD and PSY had very high rates (61–93 %).
Base rates of PVT failure on PNES hovered around 20 %.
PCD and PSY had the highest and most variable failure rates
on the CIM (9–41 %), while PNES had consistently low rates
(5 %), with EPI in between (7–21 %).
Of the 27 patients who failed the liberal cutoffs on the CIM,
the majority (52 %) were PSY, followed by PCD and EPI
(22 %), and PNES (4 %). Of the six EPI patients, five had a
history of psychiatric comorbidities (depression, anxiety, sui-
cidal ideation). The only PNES patient who failed the CIM
cutoffs had a complex medical history (jaundice at birth, TBI
at age two, and muscular dystrophy). Of the six PCD patients,
five had a psychiatric history predating the head injury (de-
pression, anxiety, somatic symptoms, alcohol and cocaine
abuse), and one of them had a history of driving-while-
intoxicated and physical violence with police involvement.
Finally, to examine the effect of potential secondary gain
within the PCD sample, the failure rates on reference PVTs
were compared between patients who reported being in litiga-
tion at the time of the assessment and those who did not. No
significant relationship was found between litigation status
and failure rates on any of the reference PVTs or CIM (raw
or Tscore based, liberal or conservative cutoff).
Discussion
The present study examined the relationship between CIM
scores, performance validity, and clinical diagnosis in a sam-
ple of 106 adults clinically referred for neuropsychological
assessment. Results are largely consistent with the previous
investigation of the link between the CIM and performance
Table 2  CIM raw and T score distributions for the whole sample (N = 106)

CIM raw score    f   Percent   Cumulative %     CIM T score    f   Percent   Cumulative %
 3               1      .9          .9           ≤20          12    11.3       11.3
 4               1      .9         1.9           21–25         8     7.5       18.9
 5               1      .9         2.8           26–30         6     5.6       24.5
 6               1      .9         3.8           31–35        20    18.8       43.4
 7               4     3.8         7.5           36–40        13    12.3       55.7
 8               5     4.7        12.3           41–45        11    10.4       66.0
 9              13    12.3        24.5           46–50         8     7.6       73.6
10              32    30.2        54.7           51–55         6     5.6       79.2
11              18    17.0        71.7           56–60        19    17.9       97.2
12              30    28.3       100.0           61–65         3     2.8      100.0

T scores are demographically adjusted scores based on norms published by Heaton, Miller, Taylor, & Grant (2004)
CIM Complex Ideational Material subtest of the Boston Diagnostic Aphasia Examination
validity (Erdodi & Roth, 2016). Our first hypothesis that low
CIM scores would be associated with higher rates of failure on
PVTs was supported. CIM cutoffs were effective at identify-
ing psychometrically defined invalid responding.
Our second hypothesis that psychiatric history would be
related to increased failure rates on both PVTs and CIM pro-
duced mixed results. Consistent with our prediction, PSY had
the highest base rate of failure on CIM and comparable rates to
PCD on reference PVTs. These two groups also reported the
highest level of depression on the BDI-II and had the highest
rates of psychiatric symptoms. However, no clear pattern
emerged in the validity profiles of PNES and EPI patients.
While the inability of PVTs to reliably differentiate these
two diagnostic categories is disappointing, this finding is
Table 3  Classification accuracy of CIM raw and T score cutoffs against reference PVTs

                                                               Hypothetical base rates
CIM              AUC (95 % CI)   SENS  SPEC  +LR  −LR          .10   .20   .30   .40   .50
EI-5
Raw score ≤9     .72 (.60–.83)   .39   .85   2.6  .72    PPP   .22   .39   .52   .63   .72
                                                         NPP   .93   .85   .76   .67   .58
Raw score ≤8                     .18   .92   2.4  .88    PPP   .20   .36   .49   .60   .69
                                                         NPP   .91   .82   .72   .63   .53
T score ≤29      .66 (.54–.78)   .37   .85   2.4  .75    PPP   .22   .38   .51   .62   .71
                                                         NPP   .92   .84   .76   .67   .57
T score ≤23                      .18   .90   1.8  .91    PPP   .17   .31   .44   .55   .64
                                                         NPP   .91   .81   .72   .62   .52
TOMM
Raw score ≤9     .66 (.53–.76)   .40   .82   2.2  .74    PPP   .20   .36   .49   .60   .69
                                                         NPP   .93   .85   .76   .67   .58
Raw score ≤8                     .17   .95   3.2  .88    PPP   .27   .46   .59   .69   .77
                                                         NPP   .91   .82   .73   .63   .53
T score ≤29      .62 (.60–.82)   .37   .82   2.0  .78    PPP   .19   .34   .47   .58   .67
                                                         NPP   .92   .84   .75   .66   .57
T score ≤23                      .20   .92   2.5  .87    PPP   .22   .39   .52   .63   .71
                                                         NPP   .91   .82   .73   .63   .54
Suhr & Boyer (1999) equation
Raw score ≤9     .71 (.61–.83)   .36   .86   2.6  .74    PPP   .22   .39   .52   .63   .72
                                                         NPP   .92   .84   .76   .67   .57
Raw score ≤8                     .17   .98   7.5  .85    PPP   .49   .68   .79   .85   .90
                                                         NPP   .91   .83   .73   .64   .54
T score ≤29      .72 (.61–.83)   .34   .89   3.0  .74    PPP   .26   .44   .57   .67   .76
                                                         NPP   .92   .84   .76   .67   .57
T score ≤23                      .22   .98   9.7  .80    PPP   .55   .73   .83   .88   .92
                                                         NPP   .92   .83   .75   .65   .56
King et al. (2002) equation
Raw score ≤9     .74 (.58–.90)   .33   .80   1.7  .83    PPP   .16   .29   .41   .52   .62
                                                         NPP   .92   .83   .74   .64   .54
Raw score ≤8                     .33   .93   5.0  .71    PPP   .34   .54   .67   .76   .83
                                                         NPP   .93   .85   .76   .68   .58
T score ≤29      .78 (.65–.91)   .50   .83   2.9  .60    PPP   .25   .42   .56   .66   .75
                                                         NPP   .94   .87   .80   .71   .62
T score ≤23                      .33   .91   3.7  .74    PPP   .29   .48   .61   .71   .79
                                                         NPP   .92   .85   .76   .67   .58

Failure on EI-5 defined as ≥4; failure on TOMM defined as one or more of the following scores: trial 1 ≤39, trial 2 and retention ≤48
AUC area under the curve, LR likelihood ratio, PPP positive predictive power, NPP negative predictive power
consistent with the results of previous studies (Cragar, Berry,
Fakhoury, Cibula, & Schmitt, 2006;Dodrill,2008).
The data analysis can be reduced to a number of clinically
relevant findings: (1) CIM scores were consistently lower in
patients who failed reference PVTs. (2) The liberal set of CIM
cutoffs (raw ≤9/T ≤29) produced specificity values hovering
around the lower threshold (.84), while the conservative cut-
offs (raw ≤8/T ≤23) had specificity ≥.90. (3) Regardless of
scale (raw/T) or threshold (liberal/conservative), CIM cutoffs
produced remarkably consistent classification accuracy
against a variety of reference PVTs. (4) Individuals with EPI
may be at risk for an elevated false positive rate and may
warrant the use of the more conservative cutoffs. (5) Within
each pair (liberal and conservative), raw and T score cutoffs
(≤9/T ≤29 and ≤8/T ≤23) produce equivalent cognitive pro-
files. Thus, failing either one of them provides sufficient evi-
dence of non-credible performance. (6) Across diagnostic
groups, PCD and PSY had the highest PVT failure rate,
followed by PNES. Generally, EPI had the lowest base rate
of failure. Also, PNES seemed remarkably immune to failing
CIM cutoffs, regardless of scale or threshold. (7) Despite
ample evidence that those who scored ≤9/29 on the CIM pro-
duced an overall invalid profile, the majority of them still
performed within normal limits on several other, more diffi-
cult language-based tests. This finding further supports the
argument that low scores are more likely to indicate poor
effort than aphasia.
These findings converge in a couple of overarching con-
clusions. First, scoring ≤9/29 on the CIM is a more reliable
indicator of invalid responding than credible language im-
pairment in patients without bona fide aphasia. Therefore,
while the CIM may not be a very effective general aphasia
screener, it shows promise as a PVT. Second, if the four-
fold difference in PVT failure rates between EPI and PNES
replicates in larger samples, it could serve as a valuable
tool in separating these two groups with different etiology,
but very similar presentation.
In a more general sense, non-credible responding in this
sample appears to be related to psychiatric functioning, albeit
in a complex way. The PCD and PSY groups demonstrated
complex psychiatric history and acute emotional distress along
with unusually high rates of PVT failure (61–93 %) that
Table 4  Performance on PVTs and the BDI-II across diagnostic groups

                     EI-5     TOMM      BDI-II     CIM raw            CIM T
Cutoff               ≥4       ≤39/48    –          ≤9      ≤8         ≤29     ≤23
PCD      M           5.6      –         20.4       10.2ᵃ              43.3ᵃ
         SD          3.5ᵃ     –         14.1       1.95               12.9
         BR Fail     93.3     75.0      –          26.1    13.0       21.7    8.7
PSY      M           4.2      –         24.1       9.7ᵃ               34.6ᵃ
         SD          3.8ᵃ     –         13.6       2.1ᵃ               16.2
         BR Fail     60.7     73.3      –          41.2    20.6       41.2    29.4
PNES     M           2.1      –         18.3       10.6ᵃ              40.4ᵃ
         SD          2.8ᵃ     –         13.6       1.2                12.6
         BR Fail     27.3     18.2      –          5.0     5.0        5.0     5.0
EPI      M           1.1      –         15.7       10.6ᵃ              40.8ᵃ
         SD          1.6      –         10.3       1.3                13.4
         BR Fail     17.4     9.1       –          20.7    6.9        20.7    6.9
ANOVA    p           <.001    –         .09        .16                .12
         η²          .25      –         .07        .05                .06
Sig. post hocs       1-2-3-4  –         3          3                  5
χ²       p           <.001    <.01      –          <.05    .27        <.05    <.05
         Φ²          .32      .40       –          .09     .04        .09     .09

Failure on TOMM defined as one or more of the following scores: trial 1 ≤39, trial 2 and retention ≤48 (Greve, Bianchini, & Doane, 2006; Jones, 2013; Stenclik, Miele, Silk-Eglit, Lynch, & McCaffrey, 2013). Significant post hoc contrasts: 1. PCD vs. PNES, 2. PCD-EPI, 3. PSY-EPI, 4. PSY-PNES, 5. PCD-PSY
Values in bold face reflect the percent of the sample that failed a given cutoff
BR Fail: base rate of failure (% of the sample that failed a given cutoff)
ᵃ p < .05, one-sample t test against the mean of 11.2 produced by 147 neurologically normal, right-handed, English-speaking adult males (Borod et al., 1980) or the mean of 50 (T distribution), and F test between two variances using the SD of the EPI group as a reference
exceeded the estimated base rates (30–50 %) in forensic set-
tings (Martin et al., 2015). However, patients with PNES are
also known to have complicated psychiatric histories, yet the
predicted elevated base rate of PVT failure was not observed. It
is possible that although psychiatric problems lead to increased
failure rates for some diagnostic groups, there may be moder-
ating factors that result in diverging PVT profiles within
broader clinical categories that share a common etiology.
Results suggest that previous research may have
underestimated the link between emotional distress and PVT
failures. A possible explanation for such divergent results may
be that earlier studies that reported no relationship between
psychiatric factors and performance validity tended to focus
on patients with depression as the main presenting problem
(Considine et al., 2011; Egeland et al., 2005; Rees et al.,
2001), whereas our sample consisted of a heterogeneous
group of individuals, many of whom had a combination of
severe mental illness, complex emotional trauma, and somatic
complaints.
The results should be interpreted in the context of the
study’s limitations. First, the sample size is relatively small
and geographically restricted, so the findings may not gen-
eralize to other regions with different demographic charac-
teristics and medico-legal systems. Further, data on exter-
nal incentive status and remote trauma history were not
available outside the PCD group. Finally, a sample of pa-
tients with bona fide aphasia would have served as a valu-
able clinical control.
The study also has several strengths. It utilized stan-
dardized, well-validated, and widely used neuropsycholog-
ical tests in clinical groups that were strategically selected
so that effort and impairment could be dissociated by de-
sign. To complement the cross-validation analyses, scores
on other language-based measures were also examined in
patients who failed the newly developed CIM validity cut-
offs. Intact performance on more difficult verbal tests pro-
vided further evidence of within-profile inconsistency,
which is considered an emergent marker of non-credible
presentation (Boone, 2013).
In summary, the combination of repeated PVT failures,
lack of a verifiable neurological impairment, and absence of
bona fide aphasia renders a low score on CIM an unreliable
indicator of receptive language deficits. At the same time, a
CIM score ≤9/29 appears to be a promising new indicator of
invalid responding. The observed pattern of between-group
differences is interesting in light of the psychiatric, psychoso-
matic, and motivational differences across the four diagnostic
categories. Future research should examine the interaction
between psychiatric factors (such as depression, somatic concerns,
and trauma history) and potential moderating variables (such as
attribution style or a tendency toward rumination) in order to better
understand the psychogenic etiology behind PVT failure.
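The two-tiered cutoffs discussed above can be expressed as a simple decision rule: the liberal cutoffs (raw ≤9 or T ≤29) apply in the absence of documented neurological deficits, while the conservative cutoffs (raw ≤8 or T ≤23) limit false positives in neurological populations. The sketch below is an illustration of that rule only, not a clinical instrument; the function and parameter names are ours:

```python
def cim_flag(raw_score, t_score, neurological_population=False):
    """Flag a CIM score as potentially non-credible under the cutoffs
    discussed above. Patients with longstanding, documented neurological
    deficits get the more conservative cutoffs (raw <= 8, T <= 23);
    otherwise raw <= 9 or T <= 29 is flagged."""
    raw_cut, t_cut = (8, 23) if neurological_population else (9, 29)
    return raw_score <= raw_cut or t_score <= t_cut

print(cim_flag(9, 35))                                # prints True
print(cim_flag(9, 35, neurological_population=True))  # prints False
```

As the two calls show, the same score pattern can cross the liberal cutoff yet pass the conservative one, which is why population context matters when interpreting a low CIM score.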
Compliance with Ethical Standards
Conflict of Interest None.
Ethical Approval All procedures performed in studies involving hu-
man participants were in accordance with the ethical standards of the
institutional and/or national research committee and with the 1964
Helsinki Declaration and its later amendments or comparable ethical
standards.
Informed Consent Given that the data were collected as part of a
retrospective archival study, the requirement for informed consent was
waived by the Committee for the Protection of Human Subjects at the
institution where the data were collected. The project was also approved
by the Research Ethics Board of the institution where the study was
completed.
References
American Psychiatric Association (2013). Diagnostic and statistical
manual of mental disorders. (5th ed.). Washington, DC: Author.
Bauer, L., Yantz, C. L., Ryan, L. M., Warden, D. L., & McCaffrey, R. J.
(2005). An examination of the California Verbal Learning Test II to
detect incomplete effort in a traumatic brain injury sample. Applied
Neuropsychology, 12(4), 202–207.
Benbadis, S. R., & Hauser, A. (2000). An estimate of the prevalence of
psychogenic non-epileptic seizures. Seizure, 9(4), 280–281.
Boone, K. B. (2007). Assessment of feigned cognitive impairment: A
neuropsychological perspective. New York: Guilford.
Boone, K. B. (2009). The need for continuous and comprehensive
sampling of effort/response bias during neuropsychological exami-
nations. The Clinical Neuropsychologist, 23(4), 729–741.
Boone, K. B. (2013). Clinical practice of forensic neuropsychology. New
York: Guilford.
Borod, J. C., Goodglass, H., & Kaplan, E. (1980). Normative data on the
Boston diagnostic aphasia examination, parietal lobe battery, and the
Boston naming test. Journal of Clinical Neuropsychology, 2,
209–215.
Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler, E.,
Victor, T. L., & Zeller, M. A. (2010). Examination of various
WMS-III logical memory scores in the assessment of response bias.
The Clinical Neuropsychologist, 24(2), 344–357.
Christensen, H., Griffiths, K., MacKinnon, A., & Jacomb, P. (1997). A
quantitative review of cognitive deficits in depression and
Alzheimer-type dementia. Journal of the International
Neuropsychological Society, 3, 631–651.
Considine, C., Weisenbach, S. L., Walker, S. J., McFadden, E. M., Franti,
L. M., Bieliauskas, L. A., …Langenecker, S. A. (2011). Auditory
memory decrements, without dissimulation, among patients with
major depressive disorder. Archives of Clinical Neuropsychology,
26, 445–453.
Cragar, D. E., Berry, D. T., Fakhoury, T. A., Cibula, J. E., & Schmitt, F. A.
(2006). Performance of patients with epilepsy or psychogenic non-
epileptic seizures on four measures of effort. The Clinical
Neuropsychologist, 20(3), 552–566.
Dodrill, C. B. (2008). Do patients with psychogenic nonepileptic seizures
produce trustworthy findings on neuropsychological tests?
Epilepsia, 49(4), 691–695.
Egeland, J., Lund, A., Landro, N. I., Rund, B. R., Sundet, K., Asbjornsen,
A., … Stordal, K. I. (2005). Cortisol level predicts executive and
memory function in depression, symptom level predicts psychomo-
tor speed. Acta Psychiatrica Scandinavica, 112, 434–441.
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski,
B., Zuccato, B. G., & Roth, R. M. (2016). WAIS-IV processing
speed scores as measures of non-credible responding: The third
generation of embedded performance validity indicators.
Psychological Assessment.
Erdodi, L. A., & Roth, R. M. (2016). Low scores on BDAE Complex
Ideational Material are associated with invalid performance in adults
without aphasia. Applied Neuropsychology: Adult.
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O’Neill, R., & Medoff,
B. (2014). Aggregating validity indicators embedded in Conners’
CPT-II outperforms individual cutoffs at separating valid from inva-
lid performance in adults with traumatic brain injury. Archives of
Clinical Neuropsychology, 29(5), 456–466.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of ma-
lingered amnesia measures with a large clinical sample.
Psychological Assessment, 6, 218–224.
Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on
psychometric indicators of negative response bias: a methodological
commentary with recommendations. Archives of Clinical
Neuropsychology, 19, 533–541.
Greve, K. W., Bianchini, K. J., & Doane, B. M. (2006). Classification
accuracy of the test of memory malingering in traumatic brain inju-
ry: results of a known-group analysis. Journal of Clinical and
Experimental Neuropsychology, 28(7), 1176–1190.
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised
comprehensive norms for an expanded Halstead-Reitan battery:
Demographically adjusted neuropsychological norms for African
American and Caucasian adults. Lutz: PAR.
Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., & Brennan, A.
(2005). WAIS digit-span-based indicators of malingered
neurocognitive dysfunction: classification accuracy in traumatic
brain injury. Assessment, 12(4), 429–444.
Jones, A. (2013). Test of memory malingering: cutoff scores for psycho-
metrically defined malingering groups in a military sample. The
Clinical Neuropsychologist, 27(6), 1043–1059.
King, J. H., Sweet, J. J., Sherer, M., Curtiss, G., & Vanderploeg, R. D.
(2002). Validity indicators within the Wisconsin card sorting test:
application of new and previously researched multivariate proce-
dures in multiple traumatic brain injury samples. The Clinical
Neuropsychologist, 16(4), 506–523.
Langenecker, S. A., Bieliauskas, L. A., Rapport, L. J., Zubieta, J. K.,
Wilde, E. A., & Berent, S. (2005). Face emotion perception and
executive functioning deficits in depression. Journal of Clinical
and Experimental Neuropsychology, 27, 320–333.
Larrabee, G. J. (2003). Detection of malingering using atypical
performance patterns on standard neuropsychological tests. The
Clinical Neuropsychologist, 17(3), 410–425.
Martin, P. K., Schroeder, R. W., & Odland, A. P. (2015).
Neuropsychologists’validity testing beliefs and practices: a survey
of North American professionals. The Clinical Neuropsychologist,
29(6), 741–746.
Moore, B. A., & Donders, J. (2004). Predictors of invalid neuropsycho-
logical performance after traumatic brain injury. Brain Injury,
18(10), 975–984.
Myers, L. (2014). Psychogenic non-epileptic seizures: a guide. North
Charleston: CreateSpace.
Pearson (2009). Advanced Clinical Solutions for the WAIS-IV and
WMS-IV: Technical manual. San Antonio, TX: Author.
Rees, L. M., Tombaugh, T. N., & Boulay, L. (2001). Depression and the
test of memory malingering. Archives of Clinical Neuropsychology,
16, 501–506.
Rohling, M. L., Green, P., Allen, L. M., & Iverson, G. L. (2002).
Depressive symptoms and neurocognitive test scores in patients
passing symptom validity tests. Archives of Clinical Neuropsychology,
17, 205–222.
Spencer, R. J., Axelrod, B. N., Drag, L. L., Waldron-Perrine, B.,
Pangilinan, P. H., & Bieliauskas, L. A. (2013). WAIS-IV reliable
digit span is no more accurate than age corrected scaled score as
an indicator of invalid performance in a veteran sample undergoing
evaluation for mild TBI. The Clinical Neuropsychologist, 27(8),
1362–1372.
Stenclik, J. H., Miele, A. S., Silk-Eglit, G., Lynch, J. K., & McCaffrey, R.
J. (2013). Can the sensitivity and specificity of the TOMM be in-
creased with different cut-off scores? Applied Neuropsychology:
Adult, 20(4), 243–248.
Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin card sorting test in
the detection of malingering in student simulator and patient
samples. Journal of Clinical and Experimental Neuropsychology,
21(5), 701–708.
Suhr, J., Tranel, D., Wefel, J., & Barrash, J. (1997). Memory performance
after head injury: contributions of malingering, litigation status, psy-
chological factors, and medication use. Journal of Clinical and
Experimental Neuropsychology, 19(4), 500–514.
Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee, G. J., &
Sweet, J. J. (2010). Effort indicators within the California Verbal
Learning Test-II (CVLT-II). The Clinical Neuropsychologist,
24(1), 153–168.
World Health Organization (2014). ICD-10: International statistical
classification of diseases and related health problems: 10th revision
(2nd ed.). Geneva, Switzerland: Author.