
A Single Error Is One Too Many: The Forced Choice Recognition Trial of the CVLT-II as a Measure of Performance Validity in Adults with TBI


Abstract

Objective: The Forced Choice Recognition (FCR) trial of the California Verbal Learning Test, Second Edition (CVLT-II) was designed to serve as a performance validity test (PVT). The present study compared the classification accuracy of a more liberal alternative cutoff (≤15) with that of the de facto FCR cutoff (≤14).
Method: The classification accuracy of the two cutoffs was computed against psychometrically defined invalid performance, across various criterion measures, in a sample of 104 adults with TBI clinically referred for neuropsychological assessment.
Results: The FCR was highly predictive (AUC: .71–.83) of Pass/Fail status on reference PVTs, but unrelated to performance on measures known to be sensitive to TBI. On average, FCR ≤15 correctly identified an additional 6% of invalid response sets compared to FCR ≤14, while maintaining .92 specificity. Patients who failed the FCR reported higher levels of emotional distress.
Conclusions: Results suggest that even a single error on the FCR is a reliable indicator of invalid responding. Further research is needed to investigate the clinical significance of the relationship between failing the FCR and the level of self-reported psychiatric symptoms.
Laszlo A. Erdodi*, Christopher A. Abeare, Brent Medoff, Kristian R. Seke, Sanya Sagar, Ned L. Kirsch
Department of Psychology, University of Windsor, 168 Chrysler Hall South, Windsor, Canada ON N9B3P4
Department of Psychology, University of Windsor, 170 Chrysler Hall South, Windsor, Canada ON N9B3P4
The Commonwealth Medical College, 525 Pine St, Scranton, PA 18509, USA
University of Windsor, Brain-Cognition-Neuroscience Program, G105 Chrysler Hall North, Windsor, Canada ON N9B3P4
Department of Psychology, University of Windsor, 109 Chrysler Hall North, Windsor, Canada ON N9B3P4
Department of Physical Medicine and Rehabilitation, University of Michigan, Briarwood Circle #4 Ann Arbor, MI 48108, USA
*Corresponding author at: Department of Psychology, University of Windsor, 168 Chrysler Hall South, 401 Sunset Avenue, Windsor, Canada ON N9B 3P4.
Tel.: +519 253 3000x2202. E-mail address: (Laszlo Erdodi)
Editorial Decision 5 October 2017; Accepted 20 October 2017
Keywords: CVLT-II; Forced choice recognition; Traumatic brain injury; Performance validity assessment; Alternative cutoffs
The interpretation of neuropsychological tests rests on the assumption that examinees are able and willing to engage fully with the tasks presented to them and therefore demonstrate their maximal ability level (Delis, Kramer, Kaplan, & Ober, 2000). There is a growing consensus within the field of neuropsychology that valid performance cannot be assumed by default, but should be objectively evaluated (Boone, 2009; Chafetz et al., 2015; Heilbronner, Sweet, Morgan, Larrabee, & Millis, 2009). Some even consider assessments that omit formal measures of test-taking effort incomplete (Iverson, 2003). Along with free-standing performance validity tests [PVTs; Test of Memory Malingering (TOMM; Tombaugh, 1996); Word Memory Test (WMT; Green, 2003); Validity Indicator Profile (VIP; Frederick, 2003)], which represent the traditional approach to performance validity assessment, a growing array of embedded validity indicators (EVIs) has also been developed to help clinicians determine the credibility of a given response set (Arnold et al., 2005; Erdodi, Sagar, et al., 2017; Greiffenstein, Baker, & Gola, 1994; Larrabee, 2003).
The Forced Choice Recognition (FCR) task of the California Verbal Learning Test, Second Edition (CVLT-II; Delis et al., 2000) falls somewhere between these two categories of validity measures. It was introduced as an optional module with the explicit purpose of evaluating test-taking effort, and it is administered 10 min after the original recall and recognition trials are completed. These features are consistent with a free-standing PVT. However, FCR depends on the prior administration of the rest of the CVLT-II, which makes it an EVI.
© The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail:
The technical manual references a study by Connor, Drake, Bondi and Delis (1997) on an early experimental version of FCR administered in conjunction with the original CVLT. On this instrument, a cutoff of ≤13 produced impressive classification accuracy (.80 sensitivity at .97 specificity) in separating credible from simulated memory deficits. Although the authors refrained from endorsing this or any other cutoff, they reported that over 90% of the CVLT-II normative sample obtained a perfect score on FCR (16/16), with 1% scoring ≤14. Nobody scored ≤13. They suggested that its pronounced ceiling effect in neurologically healthy individuals makes FCR a viable instrument for detecting non-credible responding in unsophisticated examinees who blatantly exaggerate their memory problems.
Early studies on FCR in clinical samples provided indirect support for this claim. Baldo, Delis, Kramer and Shimamura (2002) reported that all 11 patients with neuroradiologically confirmed focal frontal lesions and notable impairment in acquisition, recall and recognition on the CVLT-II obtained perfect scores on FCR. Demonstrating that performance on FCR is unrelated to brain lesions or to credibly poor performance on the rest of the CVLT-II was an essential first step toward its acceptance as a validity indicator.
The other requirement for validation was to examine whether FCR can correctly identify individuals who fail other established PVTs. Moore and Donders (2004) conducted the first large-scale concordance rate analysis against the TOMM in 132 clinically referred adults with TBI. Most of the sample was male (62.1%) and classified as mild TBI (54.5%). Mean age was 35.8 (SD = 14.2) and mean level of education was 12.3 (SD = 2.6). The majority (81.1%) obtained a perfect score on FCR; 6.8% scored 15 and 11.4% scored ≤14. The authors turned the base rate argument advanced by the test developers into an explicit diagnostic threshold, defining failure on FCR as ≤14. The reference PVT was TOMM Trial 2 ≤44, resulting in an 8.3% base rate of failure (BR). FCR ≤14 produced a respectable combination of sensitivity (.55) and specificity (.93) against this criterion. No alternative cutoff was considered. The authors expressed concerns that both the TOMM and FCR might be too transparent as PVTs and thus highly specific, but not very sensitive, to invalid responding.
Bauer, Yantz, Ryan, Warden and McCaffrey (2005) examined the relationship between FCR and the WMT in a military sample of 120 patients with TBI. Mean age was 28.4 (SD = 9.2) and mean level of education was 13.4 (SD = 2.3). The BR, defined by the WMT at standard cutoffs, was 24.2%. Although the authors did not provide enough detail to compute classification accuracy, the mean FCR value in the "invalid" group (14.9) was significantly lower than in the "valid" group (15.7). Also, there was a positive linear relationship between WMT performance as a continuous variable (average of the IR, DR and CNS subtests) and FCR: those with a mean WMT score ≥91% produced a mean FCR score of 15.8, while those with a mean WMT score of 61–70% produced a mean FCR score of 14.2. The authors concluded that while FCR was effective at discriminating between those who passed and those who failed the WMT, the mean difference was smaller (0.8) than that observed on Yes/No recognition hits raw scores (2.0). They attributed this to the inherently lower cognitive demands of the FCR paradigm compared to the Yes/No recognition trial, which has a 3:1 distractor-to-target ratio. They also emphasized that FCR has high specificity, but low sensitivity, to invalid performance.
Root, Robbins, Chang and van Gorp (2006) investigated the relationship among FCR scores, memory impairment and performance validity across three groups: a mixed clinical sample (n = 25), a forensic sample with adequate effort (n = 27) and a forensic sample with inadequate effort (n = 25). Performance validity was operationalized as passing or failing the TOMM and/or the VIP among forensic patients, resulting in an overall BR of 48%. Performance validity was not formally assessed in the clinically referred patients; instead, they were assumed to have valid neuropsychological profiles based on the lack of apparent secondary gain. Given the emerging evidence that even university students with no incentive to appear impaired fail PVTs (An, Zakzanis, & Joordens, 2012; An, Kaploun, Erdodi, & Abeare, 2017; Ross et al., 2016; Santos, Kazakov, Reamer, Park, & Osmon, 2014), this logic (lack of apparent motivation to perform poorly = evidence of valid performance) seems flawed by current methodological standards, which emphasize the importance of objective evidence from multiple independent sources in establishing the credibility of a given neurocognitive profile (Boone, 2009, 2013; Larrabee, 2012).
Nevertheless, Root et al. (2006) found that FCR scores were unrelated to delayed free recall performance. An FCR cutoff of ≤15 produced .60 sensitivity at .81 specificity. Lowering the cutoff to ≤14 improved specificity (.93), but decreased sensitivity (.44). The authors endorsed FCR as a "brief screen of effort" given its quick and easy administration and strong positive predictive power. At the same time, they cautioned against using a passing score on FCR as evidence of credible performance. They also acknowledged modality specificity as a potential confound in establishing the optimal cutoff on FCR: the TOMM is a visually based PVT, while the VIP is not memory based. As such, they may not be ideal reference PVTs for cross-validating FCR.
Once FCR's ability to separate valid and invalid response sets had been established, later studies used it as an EVI. Some of these reports provide indirect evidence that further consolidates the evidence base supporting its diagnostic utility.
L.A. Erdodi et al. / Archives of Clinical Neuropsychology (2017); 1–16
For example, the investigation by Donders and Strong (2011), based on 100 clinically referred adults with TBI, found that the majority (85%) of the patients obtained a perfect score on FCR, 6% scored 15 and 9% scored ≤14. Although concordance rates were not provided, 24% of the sample failed the WMT. The authors noted that FCR and WMT were unrelated to injury severity, while other CVLT-II measures (Trials 1–5, LD-FR, d, Total Recall Discriminability) covaried with duration of coma.
Another method for assessing FCR's ability to differentiate invalid responding from credible impairment is to examine its distribution in clinical populations with severe, credible neurological deficits. Extremely low intellectual functioning (FSIQ < 70) and dementia are two conditions that have been identified as being at risk for high false positive rates on PVTs (Boone, 2009, 2013). Marshall and Happe (2007) examined the BR on several PVTs in a sample of 100 adults with intellectual disability (52% male; mean age = 36.6; mean IQ = 63). Mean FCR score was 15.1 (SD = 1.9). A frequency distribution for a subset of 71 participants for whom FCR data were available revealed that the majority (66.2%) obtained a perfect score, 18.3% of the sample scored 15 and 15.5% scored ≤14.
Clark and colleagues (2012) demonstrated that FCR performance is a useful clinical marker of anterograde amnesia in the later stages of Alzheimer's disease (AD). As such, in conjunction with other CVLT-II measures, it can aid the subtyping of late-life memory disorders and track disease progression in individuals diagnosed with a neurodegenerative disorder. Mean FCR was 13.9 in the AD sample (n = 37), 15.8 in the amnestic MCI sample (n = 18), 15.7 in the non-amnestic MCI sample (n = 19) and near-perfect in the control sample (n = 35).
Research on FCR appears to converge in a few areas. First, the exceptionally good signal detection profile of the ≤13 cutoff in the original experimental version has not been replicated. Second, the ≤14 cutoff performed well against established PVTs, with classification accuracy hovering around the Larrabee limit: .50 sensitivity at .90 specificity (Lichtenstein, Erdodi, & Linnea, 2017). Third, no alternative cutoff has been systematically evaluated, despite accumulating evidence that the vast majority of credible individuals produce perfect scores on FCR, which makes ≤15 a logical candidate for a more liberal cutoff. A recent systematic review of the literature on FCR's classification accuracy found that the ≤14 cutoff tended to sacrifice sensitivity for specificity, and identified the more liberal alternative cutoff (≤15) as a direction for future research (Schwartz et al., 2016).
The implication of these findings is that a score of 15 on FCR is more likely to be a Fail than a Pass. Even if it might not constitute strong enough evidence to render the whole profile invalid, FCR = 15 should be considered at least a red flag in the evaluation of performance validity (D. Delis, personal communication, 10 May 2012). In fact, some researchers have started treating an FCR score of 15 as the first level of invalid performance (Erdodi, Kirsch, Lajiness-O'Neill, Vingilis, & Medoff, 2014; Erdodi et al., 2016). Similarly, the authors of the FCR trial recently introduced to the child version of the CVLT suggested that even one incorrect response is indicative of suboptimal effort (Lichtenstein et al., 2017).
The present study was designed to investigate two psychometric issues involving FCR. First, we proposed to compare the de facto FCR cutoff of ≤14 to its more liberal alternative (≤15) in a sample of clinically referred adults with TBI. We hypothesized that raising the cutoff to ≤15 would improve the sensitivity of FCR while maintaining acceptable specificity, as reported for the child version of the CVLT.
Second, based on earlier reports that active psychiatric conditions increase the BR on PVTs (Moore & Donders, 2004), we also hypothesized that performance on FCR would be related to self-ratings of emotional distress. Although previous research suggests that failing one type of validity indicator is predictive of failing other types of validity measures (Constantinou, Bauer, Ashendorf, Fisher, & McCaffrey, 2005), most clinicians seem to agree that the credibility of symptom report and performance on cognitive tests are conceptually distinct and hence should be assessed separately (van Dyke, Millis, Axelrod, & Hanks, 2013). Overall, the link between performance validity and psychiatric functioning remains controversial. Some investigators found that PVT failure was unrelated to depression (Considine et al., 2011; Egeland et al., 2005; Rees, Tombaugh, & Boulay, 2001), while others reported an increased BR in patients with psychiatric disorders (Erdodi et al., 2016). Recent research suggests that while PVT failure is unrelated to self-reported depression and anxiety, it has a strong relationship with somatic symptoms (Erdodi, Sagar et al., 2017).
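The first hypothesis rests on a simple property of nested cutoffs: every score flagged at ≤14 is also flagged at ≤15, so moving to the more liberal cutoff can only add positives. Sensitivity therefore cannot fall and specificity cannot rise; the empirical question is how much specificity is lost. The sketch below illustrates this with hypothetical FCR scores and criterion labels; the data and the function name `sens_spec_at_cutoff` are illustrative assumptions, not taken from the study.

```python
def sens_spec_at_cutoff(scores, invalid, cutoff):
    """Sensitivity and specificity of the rule 'FCR score <= cutoff means Fail'.

    scores  -- FCR raw scores (0-16)
    invalid -- parallel list of booleans: True if the criterion PVT was failed
    """
    pairs = list(zip(scores, invalid))
    tp = sum(1 for s, bad in pairs if bad and s <= cutoff)
    fn = sum(1 for s, bad in pairs if bad and s > cutoff)
    fp = sum(1 for s, bad in pairs if not bad and s <= cutoff)
    tn = sum(1 for s, bad in pairs if not bad and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data for illustration only.
fcr = [16, 16, 15, 14, 16, 15, 13, 16]
failed_criterion = [False, False, False, False, True, True, True, False]

for cutoff in (14, 15):
    sens, spec = sens_spec_at_cutoff(fcr, failed_criterion, cutoff)
    print(f"FCR <= {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up data, sensitivity rises from .33 to .67 while specificity drops from .80 to .60; in real samples the size of each shift depends on how many credible examinees score exactly 15.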
The sample consisted of 104 outpatients medically referred for neuropsychological assessment after TBI. The majority were male (55.8%) and right-handed (90.4%). Mean age was 38.8 years (SD = 16.7) and mean level of education was 13.7 years (SD = 2.6). Mean FSIQ was 92.6 (SD = 15.9). Based on injury severity indices gathered through clinical interview and review of medical records (presence/length of loss of consciousness, neuroradiological findings, peri-traumatic amnesia, GCS score), 75.0% of patients were classified as having mild TBI (mTBI); the rest were classified as moderate-to-severe. All patients were in the post-acute stage of recovery (>3 months post mTBI and >1 year post moderate-to-severe TBI). As the assessments were conducted for clinical purposes, no data were available on litigation status.
A fixed battery of commonly used, standardized measures of general intelligence, learning, memory, attention, executive functions, language, visuoperceptual and motor skills was administered to all patients (Table 1). Emotional functioning was assessed with self-report inventories. Performance validity was evaluated using a combination of stand-alone and embedded PVTs. As a free-standing PVT based on multiple trials separated by a time delay, the WMT represented the traditional approach to performance validity assessment.
To address concerns about instrumentation artifacts as a threat to the internal validity of signal detection analyses (Root et al., 2006), we developed two composites using five independent validity measures to complement the WMT. The first consists of PVTs based on recognition memory, labeled Erdodi Index Five–Recognition (EI-5 Recognition). The second consists of EVIs that are not based on recognition memory, labeled Erdodi Index Five–Non-Recognition (EI-5 Non-Recognition). Each component of the EI-5s was recoded onto a four-point scale: performance reflecting an incontrovertible Pass was assigned a value of zero, while the most extreme level of failure was assigned a value of three, with intermediate levels of failure coded as one and two, following the methodology described by Erdodi (2017). Table 2 provides the details of the rescaling process and references to the cutoffs used.
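The recoding step can be sketched as a pair of threshold functions: one for indicators where lower raw scores signal failure and one where higher scores do. The bounds below are our reading of the extracted Table 2 (e.g., WCT: >47 → 0, 44–47 → 1, 40–43 → 2, ≤39 → 3; CPT-II Omissions T-score: ≤65 → 0, 66–80 → 1, 81–100 → 2, >100 → 3). Integer raw scores are assumed, and the function and dictionary names are hypothetical.

```python
def recode_lower_is_worse(score, bounds):
    """Map a raw score to a 0-3 EI-5 level when low scores indicate failure.

    bounds = (b0, b1, b2): score >= b0 -> 0, score >= b1 -> 1,
    score >= b2 -> 2, anything lower -> 3.
    """
    b0, b1, b2 = bounds
    if score >= b0:
        return 0
    if score >= b1:
        return 1
    if score >= b2:
        return 2
    return 3


def recode_higher_is_worse(score, bounds):
    """Map a score to a 0-3 EI-5 level when high scores indicate failure.

    bounds = (u0, u1, u2): score <= u0 -> 0, score <= u1 -> 1,
    score <= u2 -> 2, anything higher -> 3.
    """
    u0, u1, u2 = bounds
    if score <= u0:
        return 0
    if score <= u1:
        return 1
    if score <= u2:
        return 2
    return 3


# Hypothetical encoding of a few Table 2 rows (integer raw scores assumed).
LOWER_IS_WORSE = {
    "WCT": (48, 44, 40),              # >47, 44-47, 40-43, <=39
    "VPA Recognition": (36, 32, 28),  # >35, 32-35, 28-31, <=27
    "Animals": (14, 12, 10),          # >13, 12-13, 10-11, <=9
}
HIGHER_IS_WORSE = {
    "CPT-II OMI": (65, 80, 100),      # <=65, 66-80, 81-100, >100
}

print(recode_lower_is_worse(45, LOWER_IS_WORSE["WCT"]))           # 1
print(recode_higher_is_worse(101, HIGHER_IS_WORSE["CPT-II OMI"]))  # 3
```

Keeping the cutoffs in data rather than in branching code makes it straightforward to audit each row against the published cutoffs cited in Table 2.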
In addition to aggregating multiple independent validity indicators and thus increasing the overall diagnostic power of the measurement model (Larrabee, 2003), an essential feature of the EI-5s is that they recapture the underlying continuity in performance validity, distinguishing between near-passes and clear failures. An EI-5 score summarizes both the "number" and the "extent" of PVT failures. Since the practical demands of validity assessment require a dichotomous outcome, the first two levels (0–1) were considered a Pass, and values ≥4 were considered a Fail. EI-5 values of 2–3 were considered Borderline (Table 3) and were excluded from further analyses involving the EI-5 to ensure the purity of the criterion groups (Pass/Fail), a methodological standard in calibrating new PVTs (Erdodi, Tyson, Abeare et al., 2017; Greve & Bianchini, 2004; Sugarman & Axelrod, 2015).
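The aggregation and classification rules just described can be sketched as follows. Each composite sums five component levels (0–3 each, so totals range from 0 to 15). Totals of 0–1 are a Pass, 2–3 are Borderline and ≥4 are a Fail, and Borderline cases are dropped from the criterion groups. The function names and example patients are hypothetical.

```python
def ei5_total(levels):
    """Sum five component levels, each already recoded to 0-3 (range 0-15)."""
    if len(levels) != 5 or any(level not in (0, 1, 2, 3) for level in levels):
        raise ValueError("expected five component levels coded 0-3")
    return sum(levels)


def ei5_classification(total):
    """0-1 = Pass, 2-3 = Borderline, >=4 = Fail."""
    if total <= 1:
        return "Pass"
    if total <= 3:
        return "Borderline"
    return "Fail"


def criterion_groups(totals):
    """Keep only clear Pass/Fail cases; Borderline totals are excluded."""
    labelled = [(t, ei5_classification(t)) for t in totals]
    return [(t, label) for t, label in labelled if label != "Borderline"]


patients = [[0, 1, 0, 0, 0], [0, 2, 0, 1, 0], [3, 1, 0, 0, 0]]  # hypothetical
totals = [ei5_total(p) for p in patients]  # [1, 3, 4]
print(criterion_groups(totals))            # [(1, 'Pass'), (4, 'Fail')]
```

Note that a single extreme failure (level 3) plus one mild failure is enough to reach the Fail range. This is how the composite captures the "extent" of failure as well as the count.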
To complement the WMT and the EI-5s, several other validity measures were used as reference PVTs to provide a more representative sample of sensory modalities, testing paradigms and sensitivity to invalid responding. Including a variety of independent PVTs is essential to keep multicollinearity at a minimum and thus maximize the predictive power of the multivariate model of performance validity assessment (Boone, 2013; Larrabee, 2012).
Table 1. List of Tests Administered: Abbreviations, Scales and Norms
Test Name | Abbreviation | Norms
Beck Depression Inventory, 2nd edition | BDI-II |
Booklet Category Test | BCT | Heaton
California Verbal Learning Test, 2nd edition | CVLT-II | Manual
Conners' Continuous Performance Test, 2nd edition (a) | CPT-II | Manual
Letter and Category Fluency Test | FAS & Animals | Heaton
Finger Tapping Test | FTT | Heaton
Green's Word Memory Test (a) | WMT | Manual
Peabody Picture Vocabulary Test, 4th edition | PPVT-4 | Manual
Symptom Checklist-90-Revised (a) | SCL-90-R | Manual
Tactual Performance Test | TPT | Heaton
Trail Making Test | TMT | Heaton
Wechsler Adult Intelligence Scale, 4th edition | WAIS-IV | Manual
Wechsler Memory Scale, 4th edition | WMS-IV | Manual
Wide Range Achievement Test, 4th edition | WRAT-4 | Manual
Wisconsin Card Sorting Test | WCST | Manual
Word Choice Test | WCT | Manual
Note: Heaton: Demographically adjusted norms published by Heaton, Miller, Taylor, & Grant (2004); Manual: Normative data published in the technical manual.
(a) Administered and scored on a computer.

The Word Choice Test (WCT) is a single-trial free-standing PVT based on the FCR paradigm (Pearson, 2009). The number of hits on the Yes/No recognition trial of the CVLT-II (RH) was selected because it is nested within the same test as FCR and the two tasks have been compared in previous studies. The logistic regression equation developed by Wolfe and colleagues (2010; LRE) was the alternative CVLT-II-based reference PVT. Given reports of high false positive rates associated with the original cutoff (.50), the more conservative alternative (.625) was used in cross-validation analyses (Donders & Strong, 2011). The WAIS-IV Digit Span age-corrected scaled score (DS) is a measure of auditory attention and working memory that has been shown to be effective at detecting invalid responding (Axelrod, Fichtenberg, Millis, & Wertheimer, 2006; Reese, Suhr, & Rife, 2012; Spencer et al., 2013).
Self-reported emotional functioning was assessed using the Beck Depression Inventory, Second Edition (BDI-II) and the Symptom Checklist-90-Revised (SCL-90-R). The BDI-II was designed to measure depressive symptoms consistent with the DSM-IV (APA, 1996) diagnostic criteria (Beck, Steer, & Brown, 1996). Its brevity (21 items rated on a 4-point ordinal scale [0–3]), combined with excellent psychometric properties and discriminant validity in both healthy controls and psychiatric patients, makes the BDI-II a popular screening tool for depression (Sprinkle et al., 2002; Storch, Roberti, & Roth, 2004).
Table 2. The Components of the EI-5s and Base Rates of Failure at Given Cutoffs
EI-5 Recognition component | EI-5 value 0 | 1 | 2 | 3
WMT | 0 | 1 | 2 | 3
Base rate (%) | 60.6 | 4.8 | 11.5 | 23.1
WCT | >47 | 44–47 | 40–43 | ≤39
Base rate (%) | 74.0 | 10.6 | 8.7 | 6.7
VR Recognition | >4 | 4 | 3 | ≤2
Base rate (%) | 68.3 | 16.3 | 8.7 | 6.7
LM Recognition | >20 | 20–21 | 18–19 | ≤17
Base rate (%) | 80.8 | 13.5 | 1.0 | 4.8
VPA Recognition | >35 | 32–35 | 28–31 | ≤27
Base rate (%) | 71.2 | 16.3 | 6.7 | 5.8
EI-5 Non-Recognition component | EI-5 value 0 | 1 | 2 | 3
RDS | ≥8 | 7 | 6 | ≤5
Base rate (%) | 76.0 | 13.5 | 4.8 | 5.8
WCST FMS | ≤1 | 2 | 3 | ≥4
Base rate (%) | 87.5 | 7.7 | 1.9 | 2.9
FTT (cutoffs failed) | None | One | Both
Base rate (%) | 85.6 | 8.7 | 5.8
Animals | >13 | 12–13 | 10–11 | ≤9
Base rate (%) | 86.5 | 6.7 | 2.9 | 3.8
CPT-II OMI | ≤65 | 66–80 | 81–100 | >100
Base rate (%) | 74.0 | 4.8 | 6.7 | 14.4
Note: WMT (IR, DR & CNS): Word Memory Test, number of failures on the Immediate Recall, Delayed Recall & Consistency trials at standard cutoffs; WCT: Word Choice Test (Pearson, 2009); VR: WMS-IV Visual Reproduction (Pearson, 2009); LM: WMS-IV Logical Memory (Bortnik et al., 2010; Pearson, 2009); VPA: WMS-IV Verbal Paired Associates (Pearson, 2009); RDS: Reliable Digit Span (Greiffenstein et al., 1994; Pearson, 2009); WCST FMS: Wisconsin Card Sorting Test Failures to Maintain Set (Greve, Bianchini, Mathias, Houston & Crouch, 2002; Larrabee, 2003; Suhr & Boyer, 1999); FTT: Finger Tapping Test, number of cutoffs failed of dominant hand raw score ≤28/35 and combined raw scores ≤58/66 (Arnold et al., 2005); for FTT, three levels are reported; Animals: Animal fluency raw score (Sugarman & Axelrod, 2015); CPT-II OMI: Conners' Continuous Performance Test, 2nd edition Omissions T-scores (Erdodi et al., 2014; Lange et al., 2013; Ord, Boettcher, Greve, & Bianchini, 2010). Base rate values represent the percent of the sample that scored within a given range of cutoffs.

Table 3. Frequency, Cumulative Frequency and Classification Range for the First Eight Levels of the EI-5s
EI-5 value | EI-5 Recognition f | Cumulative % | EI-5 Non-Recognition f | Cumulative % | Classification
0 | 42 | 40.4 | 49 | 47.1 | PASS
1 | 12 | 51.9 | 15 | 61.5 | Pass
2 | 12 | 63.5 | 10 | 71.2 | Borderline
3 | 6 | 69.2 | 16 | 86.5 | Borderline
4 | 2 | 71.2 | 4 | 90.4 | Fail
5 | 7 | 77.9 | 2 | 92.3 | Fail
6 | 6 | 83.7 | 2 | 94.2 | FAIL
7 | 7 | 90.4 | 1 | 95.2 | FAIL
8 | 1 | 91.3 | 3 | 98.1 | FAIL
Note: EI-5 Recognition: Erdodi Index Five, recognition memory based; EI-5 Non-Recognition: Erdodi Index Five, non-recognition memory based.

The SCL-90-R is a widely used screening tool for psychiatric symptoms in clinical populations of diverse etiology (Derogatis, 1994) and in patients with TBI specifically (Hoofien, Barak, Vakil, & Gilboa, 2005). As the name indicates, it contains 90 items self-rated on a 5-point ordinal scale [0–4] that converge into nine scales: somatization (SOM), obsessive-compulsive symptoms (O-C), interpersonal sensitivity (I-S), depression (DEP), anxiety (ANX), hostility (HOS), phobic anxiety (PHO), paranoid ideation (PAR) and psychotic symptoms (PSY). In addition, a Global Severity Index (GSI) is computed to reflect the mean of all items. The GSI has been found to be the most sensitive of the SCL-90-R indicators to disruptions in emotional and social functioning post TBI (Baker, Schmidt, Heinemann, Langley & Miranti, 1998; Marschark, Richtsmeier, Richardson, Crovitz, & Henry, 2000; Westcott & Alfano, 2005). Clinical elevations (T ≥ 63; Derogatis, 1994) were also commonly observed on the O-C, I-S, DEP and PHO scales (Baker et al., 1998; Marschark et al., 2000; Palav, Ortega, & McCaffrey, 2001; Westcott & Alfano, 2005).
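At the raw-score level, the two self-report measures reduce to a sum (BDI-II: 21 items rated 0–3) and a mean (SCL-90-R GSI: 90 items rated 0–4). The minimal sketch below only restates those rules. The function names are hypothetical, omitted items are not handled, and converting the GSI to a T-score (e.g., to apply T ≥ 63) requires the normative tables in the manual, which are not reproduced here.

```python
def bdi_ii_total(items):
    """BDI-II total: sum of 21 items, each rated 0-3 (range 0-63)."""
    if len(items) != 21 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("expected 21 ratings in the range 0-3")
    return sum(items)


def scl90r_gsi_raw(items):
    """SCL-90-R GSI raw score: mean of the 90 item ratings (each 0-4)."""
    if len(items) != 90 or any(not 0 <= x <= 4 for x in items):
        raise ValueError("expected 90 ratings in the range 0-4")
    return sum(items) / len(items)


print(bdi_ii_total([1] * 21))                  # 21
print(scl90r_gsi_raw([0] * 45 + [2] * 45))     # 1.0
```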
Participants were assessed in two half-day appointments through the neurorehabilitation service of a Midwestern academic medical center. Psychometric testing was completed in an outpatient setting by trained psychometricians. A staff neuropsychologist conducted the clinical interview and review of medical records, wrote the integrative report and provided feedback to patients. Data were collected through an archival retrospective chart review of a consecutive series of TBI referrals. The study was approved by the Institutional Review Board. Ethical guidelines regulating research with human participants were followed throughout the project.
Data Analysis
Descriptive statistics (frequency, percentage and cumulative percentage; mean, standard deviation) were computed for the key variables. Significance testing was performed using F- and t-tests as well as χ². ANOVAs were followed up with uncorrected t-tests. Since all post hoc contrasts were a priori planned comparisons, no statistical correction was applied (Rothman, 1990; Perneger, 1998). In addition, the tension between statistical and clinical significance was resolved by consistently reporting effect size estimates associated with each relevant contrast: partial eta squared (ηp²), Cohen's d and Φ.
Receiver operating characteristics (ROC) analyses [area under the curve (AUC) with 95% CI] were performed using SPSS 22.0. The remaining classification accuracy parameters [sensitivity, specificity, and positive and negative likelihood ratios (+LR and −LR)] were computed using standard formulas.
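The standard formulas referred to above can be written out as a short sketch: sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), +LR = sensitivity/(1 − specificity) and −LR = (1 − sensitivity)/specificity. As a worked example, it plugs in the values reported later for FCR ≤14 against the WMT (.40 sensitivity at .95 specificity). Because those inputs are rounded, the resulting likelihood ratios are approximate. Function names are illustrative.

```python
import math


def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def likelihood_ratios(sensitivity, specificity):
    """+LR = sens / (1 - spec); -LR = (1 - sens) / spec."""
    lr_pos = math.inf if specificity == 1 else sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Reported values for FCR <= 14 against the WMT (.40 sensitivity, .95 specificity)
lr_pos, lr_neg = likelihood_ratios(0.40, 0.95)
print(f"+LR = {lr_pos:.1f}, -LR = {lr_neg:.2f}")  # +LR = 8.0, -LR = 0.63
```

A +LR of about 8 means that a Fail on FCR ≤14 raises the odds of invalid performance roughly eightfold. The −LR near .63 shows that a Pass lowers those odds only modestly, which echoes the concern that FCR is specific but not very sensitive.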
Mean scores on tests of cognitive ability ranged from Low Average to Average (Table 4). The mean FCR score in the sample was 15.4 (SD = 1.4; range: 9–16). The median value was 16. The distribution was negatively skewed (−2.48) and had a strong positive kurtosis (+6.47). The majority of the sample (75.0%) obtained a perfect score on FCR; 6.7% scored 15 and 18.3% scored ≤14.
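The skewness and kurtosis values above were presumably produced by SPSS 22.0. Assuming SPSS's bias-adjusted estimators G1 and G2 (the same quantities returned by Excel's SKEW and KURT), the computation looks like the sketch below. Individual FCR scores are not reported, so this cannot reproduce −2.48 and +6.47; the example only shows how a ceiling at 16 with a few low scores yields negative skew. The function name and data are hypothetical.

```python
import math


def adjusted_skewness_kurtosis(xs):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2); needs n >= 4."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # central moments, divisor n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return skew, kurt


# Hypothetical scores with a ceiling at 16: mostly perfect, a few low values.
skew, kurt = adjusted_skewness_kurtosis([16] * 8 + [15, 12])
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```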
The Effect of Age, Education, Cognitive Functioning and Injury Severity on FCR
As the study focused on comparing the discriminant power of two cutoffs (FCR ≤14 and ≤15) against a perfect score, the sample was divided into three groups: FCR = 16, FCR = 15 and FCR ≤14. This trichotomy was used as the independent variable (IV) for a series of ANOVAs with age, education and cognitive functioning as dependent variables (DVs).
There was no difference in age among groups. However, there was a significant overall effect on level of education, driven by the higher mean of the FCR = 16 subsample. A medium effect was observed on word knowledge, picture vocabulary and single word reading performance. ANOVAs were not significant for BCT (Total Errors), TPT (Total Time) and TMT-B T-scores (Table 5).
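Table 5 does not state which variant of Cohen's d was used. Dividing the mean difference by the root mean square of the two group SDs, d = (M1 − M2)/√((SD1² + SD2²)/2), reproduces every reported d from the group means and SDs. The sketch below therefore assumes that variant. A pooled SD weighted by group size would give different values (about .74 rather than .92 for education, because the FCR = 15 group has only seven patients).

```python
import math


def cohens_d_rms(m1, sd1, m2, sd2):
    """Mean difference divided by the root mean square of the two SDs."""
    return (m1 - m2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


# Group means and SDs from Table 5; group codes as in the table
# (0: FCR = 16, 1: FCR = 15, 2: FCR <= 14).
CONTRASTS = {
    "ED, 0 vs 1": (14.0, 2.8, 12.0, 1.3),
    "VC, 0 vs 2": (9.8, 3.0, 7.9, 2.2),
    "PPVT, 0 vs 2": (100.1, 14.1, 90.1, 9.4),
    "WRAT-4 Reading, 0 vs 2": (95.7, 12.0, 87.5, 12.4),
    "TMT-B, 0 vs 2": (44.8, 13.3, 37.7, 17.0),
}

for label, (m1, sd1, m2, sd2) in CONTRASTS.items():
    print(f"{label}: d = {cohens_d_rms(m1, sd1, m2, sd2):.2f}")
# Prints d = 0.92, 0.72, 0.83, 0.67 and 0.47, matching Table 5.
```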
Likewise, the three groups did not differ in TBI severity (percentage of mTBI patients and of those with positive neuroradiological findings). In addition, the mTBI subsample was almost three times as likely to fail the old FCR cutoff (≤14; BR = 21.8%) as the subsample with moderate-to-severe TBI (BR = 7.7%). Similarly, patients with mTBI were twice as likely to fail the alternative FCR cutoff (≤15; BR = 28.2%) as the subsample with moderate-to-severe TBI.
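Both claims can be checked against Table 5 with a short χ² sketch. It relies on an assumption: the cell counts are reconstructed by multiplying each group's n by its % mTBI (71.8% of 78 = 56, 71.4% of 7 = 5, 89.5% of 19 = 17), not taken from raw data. Under that assumption, the 3 × 2 Pearson χ² reproduces the p = .27 reported for % mTBI, and the ≤14 base rates reproduce 21.8% vs 7.7% (a ratio of about 2.8). Two expected counts fall below 5, so the χ² approximation is rough. Function names are illustrative.

```python
import math

# Rows: FCR = 16 (n = 78), FCR = 15 (n = 7), FCR <= 14 (n = 19).
# Columns: (mild TBI, moderate-to-severe TBI). Counts are reconstructed from
# the Table 5 percentages (71.8%, 71.4%, 89.5% mTBI), not taken from raw data.
TABLE = [(56, 22), (5, 2), (17, 2)]


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


def chi2_p_value_df2(stat):
    """Upper-tail p-value; the closed form exp(-x / 2) holds only for df = 2."""
    return math.exp(-stat / 2)


stat, df = pearson_chi_square(TABLE)
p = chi2_p_value_df2(stat) if df == 2 else float("nan")

# Base rate of FCR <= 14 within each severity group and their ratio.
fail_mild, fail_modsev = TABLE[2]
br_mild = fail_mild / sum(row[0] for row in TABLE)      # 17 / 78
br_modsev = fail_modsev / sum(row[1] for row in TABLE)  # 2 / 26
print(f"chi2({df}) = {stat:.2f}, p = {p:.2f}")
print(f"BR mTBI = {br_mild:.1%}, BR moderate-to-severe = {br_modsev:.1%}, "
      f"ratio = {br_mild / br_modsev:.2f}")
```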
The Classification Accuracy of FCR Against Reference PVTs
All ROC models evaluating the level of agreement between FCR and reference PVTs were statistically significant (p < .01). AUC values ranged from .71 (DS) to .83 (RH). The most stable AUC estimates were obtained against the WMT (95% CI: .65–.85), while the least stable estimates were observed on one of the EI-5 composites (95% CI: .61–.93).
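The AUC values above come from SPSS. For an independent check, the nonparametric AUC equals the probability that a randomly chosen criterion-invalid case scores lower on FCR than a randomly chosen criterion-valid case, with ties counted as one half. This is the Mann–Whitney U statistic divided by the number of valid–invalid pairs, and it coincides with the trapezoidal area under the empirical ROC curve. The minimal sketch below uses hypothetical scores and does not compute confidence intervals.

```python
def auc_low_score_positive(valid_scores, invalid_scores):
    """Probability that a randomly chosen invalid case scores lower than a
    randomly chosen valid case, counting ties as one half."""
    if not valid_scores or not invalid_scores:
        raise ValueError("both groups must be non-empty")
    credit = 0.0
    for v in valid_scores:
        for i in invalid_scores:
            if i < v:
                credit += 1.0
            elif i == v:
                credit += 0.5
    return credit / (len(valid_scores) * len(invalid_scores))


# Hypothetical FCR scores for criterion-valid and criterion-invalid cases.
valid = [16, 16, 16, 15]
invalid = [14, 16, 13]
print(round(auc_low_score_positive(valid, invalid), 3))  # 0.792
```

Because most examinees score at the ceiling of 16, ties are frequent, and the half-credit rule has a large influence on FCR's AUC.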
ROC analyses were followed up with direct comparisons between the classication accuracy of the old FCR cutoff (14)
and the proposed alternative (15) against the reference PVTs. All cross-validation analyses met the minimum standard of
6L.A. Erdodi et al. / Archives of Clinical Neuropsychology (2017); 116
Downloaded from
by University of Windsor Paul Martin Law Library,
on 27 December 2017
specicity (.84; Larrabee, 2003), with values ranging from .85 to .98. Sensitivity was more variable, uctuating between .40
and .72. The BR
in reference PVTs ranged from 10.6% (TMT-A) to 38.5% (WMT).
FCR 14 produced a sensitivity of .40 against the WMT, at .95 specicity. The switch to 15 increased sensitivity to .47,
while preserving the same specicity. Classication accuracy was comparable between the two cutoffs against WCT (.48.50
sensitivity at .93 specicity). The new cutoff outperformed the old one against EI-5
in sensitivity (.52/.44) while both
Table 4. Group-Level Performance on the Tests Administered

| Test | Measure | M | SD | Descriptive Range |
|---|---|---|---|---|
| Animals | T-score | 42.6 | 12.0 | Low Average |
| BDI-II | Total Raw Score | 15.3 | 11.5 | Mild Depression |
| BCT | Total Errors T-score | 41.7 | 13.8 | Low Average |
| CVLT-II | Trials 1–5 T-score | 45.1 | 14.5 | Average |
| CVLT-II | LD-FR z-score | −1.03 | 1.54 | Low Average |
| CPT-II | Omissions T-score | 73.4 | 61.5 | Elevated |
| CPT-II | Commissions T-score | 52.2 | 11.2 | Within Normal Limits |
| CPT-II | Hit RT T-score | 53.7 | 14.6 | Within Normal Limits |
| FTT | Dominant Hand T-score | 45.4 | 12.5 | Average |
| WMT | % Fail | 38.5 | N/A | |
| PPVT-4 | Standard Score | 98.1 | 13.8 | Average |
| SCL-90-R | GSI T-score | 62.5 | 12.7 | Within Normal Limits |
| TPT | Total Time T-score | 45.0 | 13.7 | Average |
| TMT | Trails A T-score | 43.0 | 13.5 | Low Average |
| TMT | Trails B T-score | 43.3 | 13.9 | Low Average |
| WAIS-IV | VCI Standard Score | 95.1 | 15.3 | Average |
| WAIS-IV | PRI Standard Score | 96.2 | 16.4 | Average |
| WAIS-IV | WMI Standard Score | 92.7 | 15.6 | Average |
| WAIS-IV | PSI Standard Score | 89.4 | 16.4 | Low Average |
| WMS-IV | LM I Age-Corrected Scaled Score | 8.1 | 3.5 | Low Average |
| WMS-IV | LM II Age-Corrected Scaled Score | 7.6 | 3.4 | Low Average |
| WMS-IV | VPA I Age-Corrected Scaled Score | 8.5 | 3.5 | Low Average |
| WMS-IV | VPA II Age-Corrected Scaled Score | 8.6 | 3.7 | Average |
| WMS-IV | VR I Age-Corrected Scaled Score | 8.6 | 3.7 | Average |
| WMS-IV | VR II Age-Corrected Scaled Score | 7.9 | 3.1 | Low Average |
| WRAT-4 | Word Reading Standard Score | 93.9 | 12.5 | Average |
| WCST | Perseverative Errors T-score | 46.9 | 11.1 | Average |
| WCT | Total Accuracy Raw Score | 47.6 | 3.7 | Pass |

Note: LD-FR: Long-delay free recall; RT: Reaction Time; GSI: Global Severity Index; VCI: Verbal Comprehension Index; PRI: Perceptual Reasoning Index; WMI: Working Memory Index; PSI: Processing Speed Index; LM: Logical Memory; I: Immediate Recall; II: Delayed Recall; VPA: Verbal Paired Associates; VR: Visual Reproduction.
Table 5. Age, Education, Injury Severity and Performance on Select Neuropsychological Tests as a Function of Trichotomized FCR Scores

| FCR | Statistic | Age | ED | VC | PPVT | WRAT-4 Reading | BCT Total | TPT Total | TMT-B | % mTBI | % +NR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | M | 37.5 | 14.0 | 9.8 | 100.1 | 95.7 | 42.6 | 46.1 | 44.8 | 71.8 | 67.7 |
| | SD | 15.6 | 2.8 | 3.0 | 14.1 | 12.0 | 13.9 | 14.1 | 13.3 | | |
| 15 | M | 47.0 | 12.0 | 9.0 | 96.4 | 91.4 | 43.2 | 44.7 | 41.1 | 71.4 | 66.7 |
| | SD | 12.1 | 1.3 | 3.1 | 14.1 | 13.8 | 13.2 | 8.2 | 7.9 | | |
| ≤14 | M | 41.4 | 12.8 | 7.9 | 90.1 | 87.5 | 38.1 | 39.2 | 37.7 | 89.5 | 71.4 |
| | SD | 10.2 | 1.9 | 2.2 | 9.4 | 12.4 | 13.6 | 12.2 | 17.0 | | |
| | p | .18 | .05 | <.05 | <.05 | <.05 | .43 | .28 | .12 | .27 | .96 |
| | η² / Φ² | .03 | .06 | .06 | .08 | .07 | .02 | .03 | .04 | .03 | .00 |
| | Sig. post hocs | None | 0–1 | 0–2 | 0–2 | 0–2 | None | None | 0–2 | | |
| | d | | .92 | .72 | .83 | .67 | | | .47 | | |

Note. FCR: CVLT-II Forced Choice Recognition trial raw score; ED: years of formal education; VC: WAIS-IV Vocabulary age-corrected scaled score; PPVT: Peabody Picture Vocabulary Test, 4th edition; WRAT-4 Reading: Wide Range Achievement Test, 4th edition, Reading subtest standard score; BCT: Booklet Category Test Total Errors T-score; TPT: Tactual Performance Test Total Time T-score; TMT-B: Trail Making Test B T-score; % mTBI: % patients with mild traumatic brain injury (vs. those with moderate-to-severe TBI); % +NR: positive neuroradiological findings; effect sizes are η² for continuous variables and Φ² for % mTBI and % +NR; Sig. post hocs: pairwise comparisons with p < .05; 0: FCR = 16 (n = 78); 1: FCR = 15 (n = 7); 2: FCR ≤14 (n = 19).
maintained very high specificity (.98). A similar pattern of increased sensitivity (.50 to .58) and steady specificity (.91 to .90) emerged against the non-recognition-based EI-5 as the analyses shifted from the old to the new cutoff. Sensitivity peaked against RH with both cutoffs (.65 and .72), against a backdrop of good specificity (.93 and .92). Again, the new cutoff outperformed the old one against DS in sensitivity (.45 to .53) while producing the same specificity (.89). Overall, the new cutoff increased sensitivity from .50 to .56 compared to the old one, while preserving essentially the same specificity (.92).
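The overall figures in the preceding paragraph correspond to averages of the seven per-criterion entries in Table 6. The following minimal Python sketch assumes an unweighted mean across reference PVTs (the weighting is not stated in the text) and uses only the rounded values printed in the table:

```python
from statistics import mean

# Values transcribed from Table 6; column order: WMT, WCT, EI-5 (rec.),
# EI-5 (non-rec.), RH, LRE, DS
SENS = {
    "<=14": [0.40, 0.48, 0.44, 0.50, 0.65, 0.56, 0.45],
    "<=15": [0.47, 0.50, 0.52, 0.58, 0.72, 0.59, 0.53],
}
SPEC = {
    "<=14": [0.95, 0.93, 0.98, 0.91, 0.93, 0.89, 0.89],
    "<=15": [0.95, 0.93, 0.98, 0.90, 0.92, 0.88, 0.89],
}


def summarize(cutoff: str) -> tuple[float, float]:
    """Unweighted mean sensitivity and specificity across the seven reference PVTs."""
    return mean(SENS[cutoff]), mean(SPEC[cutoff])


for cutoff in SENS:
    sens, spec = summarize(cutoff)
    print(f"FCR {cutoff}: mean sensitivity {sens:.3f}, mean specificity {spec:.3f}")
```

Averaging the rounded entries gives .497 vs. .559 for sensitivity and .926 vs. .921 for specificity, in line with the summary figures reported above; small discrepancies are expected because the table values are themselves rounded.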
This pattern of consistently higher sensitivity and essentially unchanged specificity associated with the new cutoff was also observed at the level of likelihood ratios (LRs; Table 6). With the exception of the WCT (and, by a negligible margin, the LRE: 5.03 vs. 5.06), FCR ≤15 produced higher +LRs than FCR ≤14 against the reference PVTs. The new cutoff also had consistently lower −LRs than the old cutoff against all reference PVTs, suggesting superior classification accuracy.
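For reference, the likelihood ratios follow from sensitivity and specificity as +LR = sensitivity / (1 − specificity) and −LR = (1 − sensitivity) / specificity; a +LR above 1 raises, and a −LR below 1 lowers, the post-test odds of invalid performance. A minimal sketch (the function name is illustrative):

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (+LR, -LR) for a dichotomous validity indicator."""
    if not 0 < specificity < 1:
        raise ValueError("specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1 - specificity)  # odds multiplier after a failure
    negative_lr = (1 - sensitivity) / specificity  # odds multiplier after a pass
    return positive_lr, negative_lr


# FCR <=15 versus the WMT, using the rounded values printed in Table 6
plr, nlr = likelihood_ratios(0.47, 0.95)
print(f"+LR = {plr:.2f}, -LR = {nlr:.2f}")
```

Applied to the rounded Table 6 values for FCR ≤15 against the WMT, this yields a −LR of about .56, matching the table, but a +LR of about 9.4 rather than the reported 9.88. Because +LR is very sensitive to rounding when specificity is close to 1, the published values were presumably computed from unrounded proportions.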
The Relationship Between FCR and Emotional Functioning
The majority of the sample (54.1%) scored in the non-clinical range on the SCL-90-R when a GSI T-score ≥63 was used as the cutoff. However, only 38.5% had fewer than two elevations (T ≥63) on the nine clinical scales, the other criterion for establishing the presence of clinically significant distress (Derogatis, 1994). The number of clinical elevations (M = 3.6, SD = 3.3) produced a bimodal distribution with two distinct clusters: patients with either zero (25.0%) or nine (14.6%) scores ≥63.
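The two SCL-90-R criteria described above can be written as a simple decision rule. The sketch below assumes, as stated in the paragraph, that a profile is flagged as clinically significant when the GSI T-score is ≥63 or at least two of the nine clinical scales are ≥63. The scale keys follow the abbreviations used in Table 7, and the group-mean profiles serve purely as illustrative input, since the rule is meant for individual profiles:

```python
CLINICAL_T = 63  # T-score threshold for a clinical elevation
SCALES = ("SOM", "O-C", "I-S", "DEP", "ANX", "HOS", "PHOB", "PAR", "PSY")


def count_elevations(profile: dict[str, float]) -> int:
    """Number of the nine clinical scales with T >= 63."""
    return sum(profile[scale] >= CLINICAL_T for scale in SCALES)


def is_clinical(profile: dict[str, float], gsi_t: float) -> bool:
    """Clinically significant distress: GSI T >= 63 or two or more scale elevations."""
    return gsi_t >= CLINICAL_T or count_elevations(profile) >= 2


# Group-mean profiles from Table 7, used only as illustrative input
FCR16_MEANS = dict(zip(SCALES, (58.3, 63.6, 54.3, 60.2, 55.5, 54.3, 55.5, 53.0, 56.4)))
FCR15_MEANS = dict(zip(SCALES, (67.7, 78.9, 70.4, 73.3, 63.5, 65.1, 63.4, 66.3, 72.3)))

print(count_elevations(FCR16_MEANS), is_clinical(FCR16_MEANS, gsi_t=59.8))
print(count_elevations(FCR15_MEANS), is_clinical(FCR15_MEANS, gsi_t=75.0))
```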
ANOVAs using the trichotomized FCR (16, 15 and ≤14) as the independent variable and the SCL-90-R scales as dependent variables produced significant main effects for all SCL-90-R scales except ANX and PHOB. Effect sizes (η²) ranged from .08 (medium) on HOS to .18 (large) on PSY. All post hoc contrasts between the FCR = 16 and FCR = 15 subsamples were significant except ANX and PHOB, with effect sizes (d) ranging from .87 (large) on SOM to 1.67 (very large) on O-C. All post hoc contrasts between the FCR = 16 and FCR ≤14 subsamples were significant except HOS, with effect sizes (d) ranging from .62 (medium) on PHOB to .83 (large) on PSY.
When SCL-90-R scores were dichotomized at T ≥63 into "clinical" versus "non-clinical," non-parametric contrasts produced essentially the same results (Table 7). One comparison (PAR) became non-significant. All effect size estimates (Φ²) were within .02 of the η² values produced by the ANOVAs, with the exception of the GSI.
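Because each dichotomized comparison involves a 3 (FCR group) × 2 (clinical vs. non-clinical) table, Φ² can be computed as χ²/N, which equals Cramér's V² when one dimension has two levels. In the minimal sketch below, the cell counts are not reported in the article: they are back-calculated, as an assumption, from the BDI-II percentages in Table 7 (19.2% ≈ 14/73, 57.1% = 4/7, 61.1% = 11/18), which would imply that fewer patients contributed BDI-II data than the full group sizes.

```python
def chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def phi_squared(table: list[list[int]]) -> float:
    """Effect size chi^2 / N (equal to Cramer's V^2 when one dimension has two levels)."""
    n = sum(sum(row) for row in table)
    return chi_square(table) / n


# Rows: FCR = 16, FCR = 15, FCR <= 14; columns: [clinical, non-clinical]
# Hypothetical counts back-calculated from the BDI-II percentages in Table 7
bdi_counts = [[14, 59], [4, 3], [11, 7]]
print(f"phi^2 = {phi_squared(bdi_counts):.2f}")
```

Under these assumed counts the result (about .15) agrees with the Φ² reported for the BDI-II in Table 7. With expected counts below 5 in the FCR = 15 row, the chi-square approximation is rough, which reflects the small subsample rather than the arithmetic.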
All three groups produced saw-tooth profiles, with distinct spikes on O-C, DEP and PSY (Fig. 1). The FCR = 16 subsample had only one mean T ≥63 (on O-C) and on average had 2.9 elevations (SD = 3.2). The FCR = 15 subsample produced mean T ≥63 on all scales and on average had 6.9 elevations (SD = 1.9). The FCR ≤14 subsample produced mean T ≥63 on SOM, O-C, DEP, PSY and the GSI, and on average had 5.4 elevations (SD = 3.1).
The ANOVA was repeated on the BDI-II, producing a large effect (η² = .17) driven by the non-clinical-range score of the FCR = 16 group (M = 12.5, SD = 10.5). The FCR = 15 group (M = 24.9, SD = 6.9) did not differ from the FCR ≤14 group (M = 22.9, SD = 11.8). Both of these means were in the range of moderate clinical depression and significantly higher than the FCR = 16 mean (d = 1.40 and .93, respectively; large effects).
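Both effect sizes can be approximated from the reported summary statistics. The sketch below assumes group sizes of 78, 7 and 19 (taken from the table notes; the actual BDI-II analysis may have included fewer cases) and standardizes d by the unweighted root mean square of the two SDs. That choice of standardizer is an assumption not stated in the article, but it reproduces the reported BDI-II values (η² ≈ .17; d ≈ 1.40 and .93).

```python
from collections.abc import Sequence
from math import sqrt


def eta_squared(ns: Sequence[int], means: Sequence[float], sds: Sequence[float]) -> float:
    """One-way ANOVA eta^2 = SS_between / (SS_between + SS_within) from summary statistics."""
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    return ss_between / (ss_between + ss_within)


def cohens_d(m1: float, s1: float, m2: float, s2: float) -> float:
    """Mean difference standardized by the root mean square of the two SDs (unweighted)."""
    return (m2 - m1) / sqrt((s1 ** 2 + s2 ** 2) / 2)


# BDI-II: FCR = 16, FCR = 15, FCR <= 14 (group sizes assumed from the table notes)
ns, means, sds = (78, 7, 19), (12.5, 24.9, 22.9), (10.5, 6.9, 11.8)
print(f"eta^2 = {eta_squared(ns, means, sds):.2f}")
print(f"d (16 vs 15) = {cohens_d(12.5, 10.5, 24.9, 6.9):.2f}")
print(f"d (16 vs <=14) = {cohens_d(12.5, 10.5, 22.9, 11.8):.2f}")
```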
Table 6. A Direct Comparison between the Classification Accuracy of the Two FCR Cutoffs against Reference PVTs

| FCR cutoff | BR of FCR (%) | Statistic | WMT | WCT | EI-5 (rec.) | EI-5 (non-rec.) | RH | LRE | DS |
|---|---|---|---|---|---|---|---|---|---|
| | | Reference PVT cutoff | Standard | ≤47 | ≥4 | ≥4 | ≤10 | ≥.625 | ≤6 |
| | | BR (%) | 38.5 | 27.0 | 37.2 | 17.9 | 19.2 | 18.0 | 21.2 |
| | | AUC | .75 | .72 | .78 | .77 | .83 | .74 | .71 |
| | | p | <.001 | <.01 | <.001 | <.001 | <.001 | <.01 | <.01 |
| | | 95% CI | .65–.85 | .59–.85 | .67–.90 | .61–.93 | .71–.95 | .60–.89 | .59–.85 |
| ≤15 | 25.0 | SENS | .47 | .50 | .52 | .58 | .72 | .59 | .53 |
| | | SPEC | .95 | .93 | .98 | .90 | .92 | .88 | .89 |
| | | +LR | 9.88 | 6.70 | 27.5 | 6.13 | 9.51 | 5.03 | 4.56 |
| | | −LR | 0.56 | 0.54 | 0.49 | 0.46 | 0.30 | 0.47 | 0.54 |
| ≤14 | 18.3 | SENS | .40 | .48 | .44 | .50 | .65 | .56 | .45 |
| | | SPEC | .95 | .93 | .98 | .91 | .93 | .89 | .89 |
| | | +LR | 8.53 | 7.03 | 23.6 | 5.33 | 9.10 | 5.06 | 4.14 |
| | | −LR | 0.63 | 0.57 | 0.57 | 0.55 | 0.38 | 0.50 | 0.61 |

Note: WMT: Word Memory Test (Green, 2003); WCT: Word Choice Test (Pearson, 2009); EI-5 (rec.): Erdodi Index Five, recognition-based; EI-5 (non-rec.): Erdodi Index Five, non-recognition-based; RH: CVLT-II Yes/No Recognition hits raw score (Wolfe et al., 2010); LRE: Logistic regression equation based on a combination of three CVLT-II scores: long-delay free recall raw score, total recall discriminability z-score and draw score (Donders & Strong, 2011; Wolfe et al., 2010); DS: Digit Span age-corrected scaled score (Axelrod et al., 2006; Spencer et al., 2013); BR: Base rate of failure (%); AUC: Area under the curve; FCR: CVLT-II Forced Choice Recognition; SENS: Sensitivity; SPEC: Specificity; +LR: Positive likelihood ratio; −LR: Negative likelihood ratio. Number of participants with FCR ≤15 is 26; number of participants with FCR ≤14 is 19.
To investigate whether these findings would generalize to other PVTs, a series of independent t-tests compared SCL-90-R and BDI-II scores between patients who passed and those who failed the WMT. All contrasts were significant, with the Fail group reporting higher levels of symptoms. Effect size estimates (d) ranged from .46 (medium) to 1.01 (large). The analyses were repeated using a series of ANOVAs with the recognition-based EI-5 as a trichotomous independent variable (Pass/Borderline/Fail) and the SCL-90-R and BDI-II scores as dependent variables. All ANOVAs were significant (η²: .06–.12; medium to large effects) with the exception of the SOM scale (Table 8). The only post hoc contrast that consistently reached significance was between the Pass and Fail groups, with effect sizes ranging from .43 (medium) to .87 (large). Unlike with the FCR, there was a linear relationship between level of PVT failure and self-reported emotional distress: the Pass group reported the least and the Fail group the most emotional distress, with the Borderline group in the middle (Fig. 2).
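The EI-5 grouping used in these analyses (Pass 0–1, Borderline 2–3, Fail ≥4, as labeled in Fig. 2) is a three-way banding of the index value. A minimal sketch, assuming the EI-5 value has already been computed as a non-negative integer; how the index is aggregated from its component indicators is not shown here:

```python
def classify_ei5(value: int) -> str:
    """Band an EI-5 value into Pass (0-1), Borderline (2-3) or Fail (>=4)."""
    if value < 0:
        raise ValueError("EI-5 values are non-negative")
    if value <= 1:
        return "Pass"
    if value <= 3:
        return "Borderline"
    return "Fail"


print([classify_ei5(v) for v in range(6)])
```

Keeping the Borderline band separate, rather than folding it into Pass or Fail, is what allows the graded relationship with self-reported distress described above to be examined.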
Table 7. SCL-90-R Scores as a Function of FCR Performance

| FCR | Statistic | SOM | O-C | I-S | DEP | ANX | HOS | PHOB | PAR | PSY | GSI | Σ≥63 | BDI-II |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | M | 58.3 | 63.6 | 54.3 | 60.2 | 55.5 | 54.3 | 55.5 | 53.0 | 56.4 | 59.8 | 2.9 | 12.5 |
| | SD | 12.4 | 11.8 | 13.2 | 11.5 | 11.9 | 11.3 | 11.9 | 12.8 | 11.7 | 12.2 | 3.2 | 10.5 |
| | % clinical | 34.2 | 47.9 | 29.2 | 39.7 | 31.5 | 23.3 | 31.9 | 21.9 | 35.6 | 32.9 | 49.3 | 19.2 |
| 15 | M | 67.7 | 78.9 | 70.4 | 73.3 | 63.5 | 65.1 | 63.4 | 66.3 | 72.3 | 75.0 | 6.9 | 24.9 |
| | SD | 9.0 | 3.5 | 9.7 | 5.6 | 15.6 | 7.9 | 15.6 | 11.6 | 7.1 | 6.6 | 1.9 | 6.9 |
| | % clinical | 71.4 | 100 | 85.7 | 100 | 57.1 | 57.1 | 57.1 | 57.1 | 100 | 100 | 100 | 57.1 |
| ≤14 | M | 66.4 | 71.6 | 63.6 | 67.9 | 62.1 | 58.8 | 62.1 | 61.2 | 65.9 | 68.9 | 5.4 | 22.9 |
| | SD | 12.5 | 12.3 | 12.7 | 10.2 | 12.7 | 10.5 | 13.4 | 10.6 | 11.1 | 11.7 | 3.1 | 11.8 |
| | % clinical | 72.2 | 83.3 | 55.6 | 72.2 | 50.0 | 44.4 | 50.0 | 38.9 | 72.2 | 77.8 | 94.4 | 61.1 |
| ANOVA | p | <.05 | <.01 | <.01 | <.01 | NS | <.05 | NS | <.01 | <.01 | <.01 | <.01 | <.01 |
| | η² | .09 | .15 | .14 | .13 | | .08 | | .11 | .18 | .15 | .16 | .17 |
| | Sig. post hocs | 0–1 | 0–1 | 0–1 | 0–1 | NS | 0–1 | NS | 0–1 | 0–1 | 0–1 | 0–1 | 0–1 |
| | Sig. post hocs | 0–2 | 0–2 | 0–2 | 0–2 | 0–2 | NS | 0–2 | 0–2 | 0–2 | 0–2 | 0–2 | 0–2 |
| | d for 0–1 | .87 | 1.67 | 1.39 | 1.45 | | 1.11 | | 1.09 | 1.64 | 1.55 | 1.55 | 1.40 |
| | d for 0–2 | .65 | .66 | .74 | .71 | .61 | | .52 | .63 | .83 | .76 | .79 | .93 |
| Dichotomized | p | <.01 | <.01 | <.01 | <.01 | NS | .05 | NS | NS | <.01 | <.01 | <.01 | <.01 |
| | Φ² | .11 | .13 | .12 | .14 | | .06 | | | .17 | .21 | .18 | .15 |

Note. All SCL-90-R scales are in T-scores (M = 50, SD = 10); FCR: CVLT-II Forced Choice Recognition trial raw score; SCL-90-R: Symptom Checklist-90-Revised; SOM: somatic distress; O-C: obsessive-compulsive symptoms; I-S: interpersonal sensitivity; DEP: depression; ANX: anxiety; HOS: hostility; PHOB: phobic anxiety; PAR: paranoia; PSY: psychotic symptoms; GSI: Global Severity Index; Σ≥63: number of SCL-90-R clinical scales with T ≥63; BDI-II: Beck Depression Inventory–Second Edition; % clinical: percent of the subsample scoring T ≥63 on the SCL-90-R scales, percent of the subsample with two or more scores T ≥63 (Σ≥63 column), and percent of the subsample with a BDI-II raw score ≥20 (cutoff for Moderate Depression); Sig. post hocs: pairwise comparisons with p < .05; 0: FCR = 16 (n = 78); 1: FCR = 15 (n = 7); 2: FCR ≤14 (n = 19); Dichotomized: non-parametric contrasts on scores dichotomized at T ≥63 (clinical vs. non-clinical), with Φ² as the effect size.
[Figure 1: SCL-90-R profiles by FCR group; x-axis: SCL-90-R Scale; series: FCR = 16, FCR = 15, FCR ≤14]
Fig. 1. SCL-90-R profiles associated with three levels of FCR performance; number of participants with a perfect score on the FCR is 78; number of participants with FCR = 15 is 7; number of participants with FCR ≤14 is 19.
The present study was designed to compare the de facto FCR cutoff (≤14) to a more liberal alternative (≤15) in a sample of clinically referred patients with TBI. Both cutoffs performed around the "Larrabee limit" (.50 sensitivity at .90 specificity). The hypothesis that raising the cutoff would improve sensitivity while maintaining specificity was supported by the data. On average, FCR ≤15 correctly classified an additional 6% of the invalid response sets, while maintaining a false positive rate of <10%. Likewise, the alternative cutoff produced comparable or better classification accuracy at the level of likelihood ratios. This pattern of findings was remarkably consistent across a wide range of reference PVTs: auditory and visual, univariate and multivariate, free-standing and embedded, and indicators based on the FCR paradigm as well as those derived from tests of attention. The replicable superiority of the new cutoff against a variety of criterion measures addresses previous concerns about modality specificity (Erdodi, Tyson, Abeare et al., 2017; Erdodi, Tyson, Shahein et al., 2017), and
Table 8. SCL-90-R and BDI-II Scores as a Function of Passing or Failing the WMT and the EI-5

| Criterion | Group | Statistic | SOM | O-C | I-S | DEP | ANX | HOS | PHOB | PAR | PSY | GSI | Σ≥63 | BDI-II |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WMT | Pass | M | 57.1 | 62.0 | 54.1 | 60.0 | 54.2 | 53.4 | 53.7 | 53.2 | 56.2 | 58.8 | 2.6 | 12.3 |
| | | SD | 12.5 | 11.8 | 13.1 | 11.2 | 13.5 | 11.2 | 10.9 | 12.6 | 11.9 | 12.2 | 3.0 | 10.4 |
| | Fail | M | 65.3 | 73.0 | 62.4 | 66.7 | 62.2 | 60.0 | 63.1 | 59.1 | 64.2 | 68.7 | 5.3 | 20.3 |
| | | SD | 11.6 | 10.0 | 13.4 | 11.3 | 13.6 | 10.3 | 13.4 | 12.9 | 11.7 | 11.0 | 3.2 | 11.7 |
| | | p | <.01 | <.01 | <.01 | <.01 | <.01 | <.01 | <.01 | <.05 | <.01 | <.01 | <.01 | <.01 |
| | | d | .68 | 1.01 | .63 | .60 | .59 | .61 | .77 | .46 | .68 | .85 | .87 | .72 |
| EI-5 | Pass (n = 51) | M | 58.8 | 62.3 | 53.8 | 59.9 | 53.4 | 52.2 | 54.1 | 52.8 | 56.1 | 58.7 | 2.6 | 12.3 |
| | | SD | 12.7 | 11.4 | 13.3 | 11.5 | 13.9 | 11.2 | 11.1 | 13.0 | 12.1 | 12.7 | 3.0 | 10.7 |
| | Borderline (n = 18) | M | 61.1 | 67.6 | 56.9 | 63.8 | 60.4 | 58.1 | 58.6 | 56.7 | 59.0 | 63.7 | 4.4 | 17.9 |
| | | SD | 11.1 | 13.3 | 13.1 | 10.5 | 10.4 | 11.7 | 12.3 | 11.9 | 10.6 | 11.0 | 3.2 | 12.3 |
| | Fail (n = 29) | M | 64.8 | 72.1 | 63.3 | 66.3 | 62.0 | 59.3 | 62.0 | 59.3 | 64.9 | 68.6 | 5.0 | 19.0 |
| | | SD | 12.7 | 11.0 | 13.3 | 11.8 | 14.5 | 10.3 | 14.2 | 13.0 | 12.4 | 11.6 | 3.4 | 11.2 |
| | | p | .06 | <.01 | <.01 | .06 | <.05 | <.05 | <.05 | .09 | <.01 | <.01 | <.01 | <.05 |
| | | η² | .06 | .13 | .09 | .06 | .09 | .06 | .08 | .05 | .10 | .12 | .12 | .08 |
| | | d (Pass vs. Fail) | .43 | .87 | .72 | .55 | .61 | .66 | .62 | .50 | .72 | .81 | .75 | .61 |

Note: All SCL-90-R scales are in T-scores (M = 50, SD = 10); SCL-90-R: Symptom Checklist-90-Revised; SOM: somatic distress; O-C: obsessive-compulsive symptoms; I-S: interpersonal sensitivity; DEP: depression; ANX: anxiety; HOS: hostility; PHOB: phobic anxiety; PAR: paranoia; PSY: psychotic symptoms; GSI: Global Severity Index; Σ≥63: number of SCL-90-R clinical scales with T ≥63; BDI-II: Beck Depression Inventory–Second Edition; WMT: Word Memory Test; EI-5: Erdodi Index Five (recognition-based); d (Pass vs. Fail): Cohen's d for the Pass vs. Fail post hoc contrast.
[Figure 2: SCL-90-R profiles by EI-5 group; x-axis: SCL-90-R Scale; series: Pass (0–1), Borderline (2–3), Fail (≥4)]
Fig. 2. SCL-90-R profiles associated with the three levels of EI-5 performance; number of participants in the Pass range (0–1) is 51; number of participants in the Borderline range (2–3) is 18; number of participants in the Fail range (≥4) is 29.
provides empirical support for earlier predictions that even a single error on the FCR may indicate invalid responding (D. Delis, personal communication, 10 May 2012). Our results are also consistent with research on the child version of the FCR (Lichtenstein et al., 2017). In addition, the consistently high specificity and +LRs of the new cutoff against multiple reference PVTs suggest that the more liberal FCR cutoff does not inflate false positive rates.
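The "Larrabee limit" (.50 sensitivity at .90 specificity) can also be checked criterion by criterion against the values in Table 6. A minimal sketch, treating both thresholds as inclusive minimums (an assumption about how the benchmark is operationalized); the shortened criterion labels are illustrative:

```python
LARRABEE_SENS, LARRABEE_SPEC = 0.50, 0.90
CRITERIA = ("WMT", "WCT", "EI-5 (rec.)", "EI-5 (non-rec.)", "RH", "LRE", "DS")

# (sensitivity, specificity) per reference PVT, transcribed from Table 6
TABLE_6 = {
    "<=15": [(0.47, 0.95), (0.50, 0.93), (0.52, 0.98), (0.58, 0.90),
             (0.72, 0.92), (0.59, 0.88), (0.53, 0.89)],
    "<=14": [(0.40, 0.95), (0.48, 0.93), (0.44, 0.98), (0.50, 0.91),
             (0.65, 0.93), (0.56, 0.89), (0.45, 0.89)],
}


def meets_benchmark(sens: float, spec: float) -> bool:
    """Both minimums are treated as inclusive (an assumption)."""
    return sens >= LARRABEE_SENS and spec >= LARRABEE_SPEC


def criteria_meeting(cutoff: str) -> list[str]:
    """Reference PVTs against which the given FCR cutoff meets the benchmark."""
    return [name for name, (sens, spec) in zip(CRITERIA, TABLE_6[cutoff])
            if meets_benchmark(sens, spec)]


for cutoff in TABLE_6:
    hits = criteria_meeting(cutoff)
    print(f"FCR {cutoff}: {len(hits)}/{len(CRITERIA)} -> {hits}")
```

Under this reading, FCR ≤15 meets the benchmark against four of the seven reference PVTs (WCT, both EI-5 variants and RH) and FCR ≤14 against two (the non-recognition-based EI-5 and RH), which fits the description of both cutoffs performing around the limit.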
Equally importantly, the subsamples with FCR scores of 16, 15 and ≤14 did not differ from each other in injury severity, neuroradiological findings, or on the measures known to be sensitive to TBI (Booklet Category Test, Tactual Performance Test and Trails B; Grant & Adams, 1996). These findings suggest that the FCR is independent of objective measures of impairment, consistent with previous reports (Baldo et al., 2002; Donders & Strong, 2011). The fact that, paradoxically, a significant difference emerged on "hold" tests (Boone, 2013) known to be resistant to the deleterious effects of TBI (i.e., word knowledge, picture vocabulary and single word reading) provides further evidence that the FCR is unrelated to cognitive impairment subsequent to TBI. In fact, internally inconsistent patterns of test scores have been identified as emergent markers of invalid responding (Boone, 2013; Larrabee, 2012; Slick, Sherman, & Iverson, 1999).
Furthermore, there was a "reverse injury severity effect" on the FCR: patients with mTBI were two to three times more likely to fail FCR cutoffs than patients with moderate-to-severe TBI. Although counterintuitive, this phenomenon is well replicated in the research literature (Carone, 2008; Erdodi & Rai, 2017; Green, Iverson, & Allen, 1999; Green, Flaro, & Courtney, 2009; Sweet, Goldman, & Guidotti Breting, 2013). In the broader context of this well-established, apparently paradoxical elevation of failure base rates in mTBI, the current results should alleviate concerns that FCR failures represent false positive errors due to genuine neurological impairment.
The second hypothesis, that performance on the FCR would be related to self-reported emotional distress, was also supported. Patients who obtained a perfect score on the FCR had the lowest levels of depression on the SCL-90-R and BDI-II, both as continuous scores and as the percentage in the clinical range. Those who made any error on the FCR reported more severe psychiatric symptoms globally, with large to very large effect sizes. These findings are consistent with some of the existing literature documenting a link between psychiatric history and invalid performance on neurocognitive testing (Martens, Donders, & Millis, 2001; Moore & Donders, 2004), but contradict other reports that anxiety and depression are unrelated to PVT failure (Ashendorf, Constantinou, & McCaffrey, 2004; Considine et al., 2011; Egeland et al., 2005; Rees et al., 2001).
The divergence between our study and some previous investigations of PVTs and psychological distress may be driven by two main factors. First, many of them operationalized performance validity using a single criterion measure, such as the TOMM or the Rey 15-item test at traditional cutoffs (Trial 2 <45 and free recall <9, respectively), which are known to have limited sensitivity to invalid responding (Green, 2007; Reznek, 2005). Therefore, those negative findings may reflect undetected invalid profiles. Second, those studies focused on psychiatric disorders, whereas our sample comprised patients with TBI, some of whom also reported emotional problems. As such, our positive findings could be due to the additive effect of neuropsychological deficits subsequent to TBI, pre-existing or emerging deficits in emotional regulation, or other contextual factors uniquely related to TBI and post-TBI depression and anxiety.
While the evidence linking depression and memory deficits is mixed both within and between studies (Bearden et al., 2006; Ilsley, Moffoot, & Carroll, 1995; Keiski, Shore, & Hamilton, 2007; Kessels, Ruis, & Kappelle, 2007; Langenecker et al., 2005; Raskin, Mateer, & Tweeten, 1998), there is growing evidence that memory tests are affected more by invalid responding than by psychiatric disorders (Boone, 2013; Coleman, Rapport, Millis, Ricker, & Farchione, 1998; Larrabee, 2012; Suhr, Tranel, Wefel, & Barrash, 1997; Trueblood, 1994). In fact, Rohling, Green, Allen and Iverson (2002) argue that a meaningful investigation of the interaction between depression and cognitive functioning must exclude individuals who fail PVTs. Our findings are congruent with this line of research on co-existing TBI, self-reported emotional problems and PVT failures.
As FCR performance correlates with scores on both PVTs and psychiatric symptom inventories, interpreting a failure on this validity indicator is a clinical challenge. The group-level pattern of scores observed in this sample fits several criteria of "Cogniform Disorder" introduced by Delis and Wetter (2007): internally inconsistent neurocognitive profiles, combinations of test scores that are rare in patients with genuine neurological impairment, and objective evidence of poor effort. Although the observational data presented in this study do not allow for causal attributions, they raise some important questions. Does genuine emotional distress increase vulnerability to PVT failures? Do patients with a non-credible presentation exaggerate both emotional distress and cognitive deficits? Are certain PVTs more sensitive than others to both forms of invalid responding?
Although there is an emerging consensus that symptom validity and performance validity are distinct constructs and should therefore be evaluated separately (van Dyke et al., 2013), it is plausible that they share part of their etiology. If the link between the FCR and psychiatric symptoms is replicated in future studies, failing the FCR might become a marker not only of invalid performance, but perhaps also of "psychogenic interference": a failure to demonstrate one's true ability level on cognitive testing due to acute psychiatric symptoms. It is interesting that the FCR = 15 group reported more severe psychiatric symptomatology than the FCR ≤14 group. Also, the FCR = 16 group produced a pattern of performance that is consistent with the bona fide cognitive sequelae of TBI (i.e., intact performance on "hold" tests and mild deficits on measures known to be
sensitive to head injury). In contrast, the FCR ≤14 group demonstrated uniformly low performance across both types of tests, with the FCR = 15 group in between.
It is possible that there are group-level differences in the etiology of PVT failures, with predominantly psychogenic influences having a milder impact than other factors known to have strong effects on PVT performance, such as external incentives to appear impaired (Boone, 2013; Larrabee, 2012). However, this cannot be determined with the current sample, given the absence of data on litigation status. While previous research found that certain PVTs appear to be uniquely sensitive to emotional distress (Erdodi et al., 2016), it did not disentangle the relative contributions of psychogenic interference and volitional suppression of performance on cognitive testing.
The cumulative clinical evidence suggests that the etiology of invalid performance is likely multifactorial. A PVT failure can be the expression of several contributing and potentially interacting factors and hence does not automatically imply deliberate suppression of cognitive ability (i.e., malingering). A full consideration of alternative explanations for a non-credible presentation is instrumental in providing an accurate, nuanced and clinically helpful interpretation of neurocognitive profiles (Bigler, 2012, 2015). Developing a conceptually sound and empirically supported model for subtyping non-credible responding has important forensic and clinical applications.
For example, in personal injury litigation, multiple unequivocal PVT failures raise the possibility of malingering and thus have obvious implications for the legitimacy of the lawsuit. In contrast, a neuropsychologist's conclusion that a plaintiff failed to put forth adequate effort, but not deliberately so, may shift the focus to other plausible clinical issues that may or may not be related to the accident (depression, unresolved developmental trauma, exacerbation of a pre-existing psychological vulnerability, righteous anger towards the perpetrator of the injury, etc.). In those cases, the assessor's responsibility is to (a) determine whether the data are consistent with an alternative accident-related etiology; (b) render an opinion that, even if psychogenic factors are operative, they cannot account for the level of impairment demonstrated during testing; or (c) conclude that, regardless of the reason behind unexpectedly low scores, they cannot be attributed to accident-related factors.
Even in clinical settings and in the absence of apparent external incentives to appear impaired, assessors often face the complex task of interpreting co-existing PVT failures and medically verified neurological problems (Erdodi et al., 2016). In such cases, it is the neuropsychologist's responsibility to determine whether (a) low scores are a manifestation of a legitimate disease process; (b) even in the context of documented severe impairments, the low scores are still not credible; or (c) independent of neurological manifestations, ancillary issues are contributing to low performance, such as when living with a debilitating neurological impairment for many years has resulted in unremitting dependence or chronic resignation in the face of cognitive
These considerations are important for optimizing the clinical management of the patient. If an evaluation is deemed valid (i.e., PVT failures are attributable to despondent resignation that deflated scores throughout testing), certain aspects of the patient's impairment might be reversible. In such cases, referral for psychotherapy or cognitive rehabilitation has the potential to restore some cognitive functioning.
For example, in the present sample, elevations on the SCL-90-R were related to errors on the FCR and failures on other PVTs. If self-reported psychopathology is causally related to invalid responding, treating the psychiatric symptoms could conceivably improve cognitive performance. Although speculation about the reasons behind poor effort is epistemologically risky, providing a sound, albeit tentative, explanation could be important, as the clinical outcome hinges on the correct interpretation of a non-credible presentation. Beyond the simple "valid/invalid" dichotomy, the assessor carries the responsibility of determining whether a meaningful intervention is feasible. Erring on either side can be costly. Dismissing a patient as non-credible may deprive the individual of the opportunity to recover lost function. Recommending therapy for a malingerer may allocate limited health care services to an individual who is invested in appearing impaired and thus is unlikely to benefit from the
In conclusion, FCR scores should be interpreted in the larger context of injury severity, clinical and psychosocial history, and incentive status, as well as the rest of the neurocognitive profile. Marginal failures (FCR = 15) likely have a different clinical meaning in patients with medically verified severe pathology than in those with mild or questionable TBI. In the former group, a single error may be a direct manifestation of the injury. Conversely, in the latter group, it may raise concerns about non-neurological factors contributing to the presentation.
The present study has several strengths. It provided a direct comparison of the classification accuracy of two FCR cutoffs across a wide range of reference PVTs in a clinically referred sample with mild and moderate-to-severe TBI. We also examined the link between FCR failures and self-reported emotional functioning. Inevitably, the study also has a number of limitations: the sample is relatively small and geographically restricted. More importantly, the FCR = 15 subsample was too small to draw definite conclusions about the neurocognitive profile of patients who failed only the liberal FCR cutoff.
In addition, as psychiatric symptoms were assessed using face-valid self-report inventories without built-in validity scales,
the veracity of these data is unknown, which is a considerable limitation of our measurement model. However, given the
limited research on the link between emotional functioning and performance validity, documenting a systematic difference in the level of self-reported psychiatric symptoms as a function of passing or failing PVTs is a meaningful initial step towards a better understanding of this complex relationship. The fact that previous research that controlled for response bias in self-report produced similar results (Erdodi, Sagar et al., 2017; Erdodi, Seke et al., 2017) suggests that the shared variance between elevated symptom report and PVT failure cannot be attributed to a "common malingering factor" (i.e., the same people fabricating or exaggerating both psychiatric problems and cognitive deficits). More importantly, the nature of the data (archival/observational) precludes causal modeling of the main effects. Prospective experimental and longitudinal studies that can separate invalid performance from psychiatric history by design are needed to determine the clinical meaning of FCR failures: evidence of non-credible responding, emotional distress, or both?
Even a single error on the FCR is a reliable marker of invalid responding. Based on its superior classification accuracy, ≤15 should replace ≤14 as the de facto FCR cutoff. Failing the FCR was associated with elevated self-reported psychiatric symptoms. Given that the link between invalid performance and emotional distress is poorly understood, further research is needed to explore the underlying causal mechanisms.
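Because the maximum FCR raw score is 16, the recommended ≤15 cutoff flags any single error, whereas ≤14 requires at least two. The minimal sketch below scores a raw FCR value under either cutoff and assigns the 16 / 15 / ≤14 grouping used in Tables 5 and 7. The function names are hypothetical, and the output is only one input to the contextual interpretation recommended above, not a stand-alone decision rule:

```python
FCR_MAX = 16  # perfect score on the FCR trial


def fcr_fails(raw_score: int, cutoff: int = 15) -> bool:
    """True if the raw score is at or below the cutoff (default: the liberal <=15)."""
    if not 0 <= raw_score <= FCR_MAX:
        raise ValueError(f"FCR raw score must be between 0 and {FCR_MAX}")
    return raw_score <= cutoff


def fcr_group(raw_score: int) -> str:
    """Trichotomize as in Tables 5 and 7: '16', '15' or '<=14'."""
    if fcr_fails(raw_score, cutoff=14):
        return "<=14"
    return "15" if raw_score == 15 else "16"


for score in (16, 15, 14):
    print(score, fcr_group(score), fcr_fails(score), fcr_fails(score, cutoff=14))
```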
Conflict of Interest
None declared.
The authors would like to thank Drs. Donders and Marshall for providing additional data on the clinical samples used in
their studies that were not included in the original publications.
American Psychiatric Association. (1996). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
An, K. Y., Kaploun, K., Erdodi, L. A., & Abeare, C. A. (2017). Performance validity in undergraduate research participants: A comparison of failure rates across tests and cutoffs. The Clinical Neuropsychologist, 31, 193–206. doi:10.1080/13854046.2016.1217046
An, K. Y., Zakzanis, K. K., & Joordens, S. (2012). Conducting research with non-clinical healthy undergraduates: Does effort play a role in neuropsychological test performance? Archives of Clinical Neuropsychology, 27, 849–857.
Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., et al. (2005). Sensitivity and specificity of finger tapping test scores for the detection of suspect effort. The Clinical Neuropsychologist, 19, 105–120.
Ashendorf, L., Constantinou, M., & McCaffrey, R. J. (2004). The effect of depression and anxiety on the TOMM in community-dwelling older adults. Archives of Clinical Neuropsychology, 19, 125–130.
Axelrod, B. N., Fichtenberg, N. L., Millis, S. R., & Wertheimer, J. C. (2006). Detecting incomplete effort with Digit Span from the Wechsler Adult Intelligence Scale–Third Edition. The Clinical Neuropsychologist, 20, 513–523.
Baker, K. A., Schmidt, M. F., Heinemann, A. W., Langley, M., & Miranti, S. V. (1998). The validity of the Katz Adjustment Scale among people with traumatic brain injury. Rehabilitation Psychology, 43, 30–40.
Baldo, J. V., Delis, D., Kramer, J., & Shimamura, A. (2002). Memory performance on the California Verbal Learning Test-II: Findings from patients with focal frontal lesions. Journal of the International Neuropsychological Society, 8, 539–546.
Bauer, L., Yantz, C. L., Ryan, L. M., Warden, D. L., & McCaffrey, R. J. (2005). An examination of the California Verbal Learning Test II to detect incomplete effort in a traumatic brain injury sample. Applied Neuropsychology, 12, 202–207.
Bearden, C. E., Glahn, D. C., Monkul, E. S., Barrett, J., Najt, P., Villarreal, V., et al. (2006). Patterns of memory impairment in bipolar disorder and unipolar major depression. Psychiatry Research, 142, 139–150.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Beck Depression Inventory (2nd ed.). San Antonio, TX: Psychological Corporation.
Bigler, E. D. (2012). Symptom validity testing, effort and neuropsychological assessment. Journal of the International Neuropsychological Society, 18,
Bigler, E. D. (2015). Neuroimaging as a biomarker in symptom validity and performance validity testing. Brain Imaging and Behavior, 9, 421–444.
Boone, K. B. (2013). Clinical practice of forensic neuropsychology. New York: Guilford.
Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23, 729–741.
Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler, E., Victor, T. L., et al. (2010). Examination of various WMS-III logical memory scores in the assessment of response bias. The Clinical Neuropsychologist, 24, 344–357.
Carone, D. A. (2008). Children with moderate/severe brain damage/dysfunction outperform adults with mild-to-no brain damage on the Medical Symptom Validity Test. Brain Injury, 22, 960–971.
Chafetz, M. D., Williams, M. A., Ben-Porath, Y. S., Bianchini, K. J., Boone, K. B., Kirkwood, M. W., et al. (2015). Official position of the American Academy of Clinical Neuropsychology Social Security Administration policy on validity testing: Guidance and recommendations for change. The Clinical Neuropsychologist, 29, 723–740.
Connor, D. J., Drake, A. I., Bondi, M. W., & Delis, D. C. (1997). Detection of feigned cognitive impairments in patients with a history of mild to severe closed head injury. Paper presented at the American Academy of Neurology, Boston.
Clark, L. R., Stricker, N. H., Libon, D. J., Delano-Wood, L., Salmon, D. P., Delis, D. C., et al. (2012). Yes/No versus forced-choice recognition memory in mild cognitive impairment and Alzheimer's disease: Patterns of impairment and associations with dementia severity. The Clinical Neuropsychologist, 16,
Coleman, R. D., Rapport, L. J., Millis, S. R., Ricker, J. H., & Farchione, T. J. (1998). Effects of coaching on the California Verbal Learning Test. Journal of Clinical and Experimental Neuropsychology, 20, 201–210.
Considine, C., Weisenbach, S. L., Walker, S. J., McFadden, E. M., Franti, L. M., Bieliauskas, L. A., et al. (2011). Auditory memory decrements, without dissimulation, among patients with major depressive disorder. Archives of Clinical Neuropsychology, 26, 445–453.
Constantinou, M., Bauer, L., Ashendorf, L., Fisher, J. M., & McCaffrey, R. J. (2005). Is poor performance on recognition memory effort measures indicative of generalized poor performance on neuropsychological tests? Archives of Clinical Neuropsychology, 20, 191–198.
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (2000). California Verbal Learning Test–Second Edition. San Antonio, TX: Psychological Corporation.
Delis, D., & Wetter, S. R. (2007). Cogniform disorder and cogniform condition: Proposed diagnoses for excessive cognitive symptoms. Archives of Clinical Neuropsychology, 22, 589–604.
Derogatis, L. R. (1994). SCL-90-R: Administration, scoring, and procedures manual (3rd ed.). Minneapolis, MN: National Computer Systems.
Donders, J., & Strong, C. A. (2011). Embedded effort indicators on the California Verbal Learning Test–Second Edition: An attempted cross-validation. The Clinical Neuropsychologist, 25, 173–184.
Egeland, J., Lund, A., Landro, N. I., Rund, B. R., Sundet, K., Asbjornsen, A., et al. (2005). Cortisol level predicts executive and memory function in depression, symptom level predicts psychomotor speed. Acta Psychiatrica Scandinavica, 112, 434–441.
Erdodi, L. A. (2017). Aggregating validity indicators: The salience of domain specificity and the indeterminate range in multivariate models of performance validity assessment. Applied Neuropsychology: Adult. Advance online publication. doi:10.1080/23279095.2017.1384925
Erdodi, L. A., & Rai, J. K. (2017). A single error is one too many: Examining alternative cutoffs on Trial 2 on the TOMM. Brain Injury, 31, 1362–1368. doi:10.1080/02699052.2017.1332386
Erdodi, L. A., Kirsch, N. L., Lajiness-O'Neill, R., Vingilis, E., & Medoff, B. (2014). Comparing the Recognition Memory Test and the Word Choice Test in a mixed clinical sample: Are they equivalent? Psychological Injury and Law, 7, 255–263. doi:10.1007/s12207-014-9197-8
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O'Neill, R., & Medoff, B. (2014). Aggregating validity indicators embedded in Conners' CPT-II outperforms individual cutoffs at separating valid from invalid performance in adults with traumatic brain injury. Archives of Clinical Neuropsychology, 29,
Erdodi, L. A., Sagar, S., Seke, K., Zuccato, B. G., Schwartz, E. S., & Roth, R. M. (2017). The Stroop Test as a measure of performance validity in adults clinically referred for neuropsychological assessment. Psychological Assessment. doi:10.1037/pas0000525
Erdodi, L. A., Seke, K. R., Shahein, A., Tyson, B. T., Sagar, S., & Roth, R. M. (2017). Low scores on the Grooved Pegboard Test are associated with invalid responding and psychiatric symptoms. Psychology and Neuroscience, 10, 325–344. doi:10.1037/pne0000103
Erdodi, L. A., Tyson, B., Abeare, T., Lichtenstein, C. A., Pelletier, J. D., Rai, C. L., et al. (2016). The BDAE Complex Ideational Material–A measure of receptive language or performance validity? Psychological Injury and Law, 9, 112–120. doi:10.1007/s12207-016-9254-6
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Zuccato, B. G., Rai, J. K., Seke, K. R., et al. (2017). Utility of critical items within the Recognition Memory Test and Word Choice Test. Applied Neuropsychology: Adult. Advance online publication. doi:10.1080/23279095.2017.1298600
Erdodi, L. A., Tyson, B. T., Shahein, A., Lichtenstein, J. D., Abeare, C. A., Pelletiere, C. L., et al. (2017). The power of timing: Adding a time-to-completion cutoff to the Word Choice Test and Recognition Memory Test improves classification accuracy. Journal of Clinical and Experimental Neuropsychology, 39, 369–383. doi:10.1080/13803395.2016.1230181
Frederick, R. I. (2003). VIP: Validity indicator profile. Manual (2nd ed.). Minneapolis, MN: NCS Pearson.
Grant, I., & Adams, K. M. (Eds.) (1996). Neuropsychological assessment of neuropsychiatric disorders. New York: Oxford University Press.
Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology, 19, 533–541.
Greve, K. W., Bianchini, K. J., Mathias, C. W., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction with the Wisconsin Card Sorting Test: A preliminary investigation in traumatic brain injury. The Clinical Neuropsychologist, 16, 179–191.
Green, P. (2003). Green's Word Memory Test. Edmonton, Canada: Green's Publishing.
Green, P. (2007). Spoiled for choice: Making comparisons between forced-choice effort tests. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment (pp. 50–77). New York, NY: Guilford.
Green, P., Iverson, G., & Allen, L. (1999). Detecting malingering in head injury litigation with the Word Memory Test. Brain Injury, 13, 813–819.
Green, P., Flaro, L., & Courtney, J. (2009). Examining false positives on the word memory test in adults with mild traumatic brain injury. Brain Injury, 23,
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesic measures with a large clinical sample. Psychological Assessment, 6,
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz, FL: PAR.
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., & Millis, S. R. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129.
Hoofien, D., Barak, O., Vakil, E., & Gilboa, A. (2005). Symptom Checklist-90 Revised scores in persons with traumatic brain injury: Affective reactions or neurobehavioral outcomes of the injury? Applied Neuropsychology, 12, 30–39.
Ilsley, J. E., Moffoot, A. P. R., & O'Carroll, R. E. (1995). An analysis of memory dysfunction in major depression. Journal of Affective Disorders, 35(1–2), 1–9.
Iverson, G. L. (2003). Detecting malingering in civil forensic evaluations. In A. M. Horton & L. C. Hartlage (Eds.), Handbook of forensic neuropsychology (pp. 137–177). New York: Springer Publishing Company.
Keiski, M. A., Shore, D. L., & Hamilton, J. M. (2007). The role of depression in verbal memory following traumatic brain injury. The Clinical Neuropsychologist, 21, 744–761.
Kessels, R. P. C., Ruis, C., & Kappelle, L. J. (2007). The impact of self-reported depressive symptoms on memory function in neurological outpatients. Clinical Neurology and Neurosurgery, 109, 323–326.
Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S., Bhagwat, A., et al. (2013). Clinical utility of the Conners' Continuous Performance Test-II to detect poor effort in U.S. military personnel following traumatic brain injury. Psychological Assessment, 25, 339–352.
Langenecker, S. A., Bieliauskas, L. A., Rapport, L. J., Zubieta, J. K., Wilde, E. A., & Berent, S. (2005). Face emotion perception and executive functioning deficits in depression. Journal of Clinical and Experimental Neuropsychology, 27, 320–333.
Larrabee, G. J. (2003). Detection of malingering using atypical performance on standard neuropsychological tests. The Clinical Neuropsychologist, 17,
Larrabee, G. J. (2012). Assessment of malingering. In G. J. Larrabee (Ed.), Forensic neuropsychology: A scientific approach. New York: Oxford University Press.
Leighton, A., Weinborn, M., & Maybery, M. (2014). Bridging the gap between neurocognitive processing theory and performance validity assessment among the cognitively impaired: A review and methodological approach. Journal of the International Neuropsychological Society, 20, 873–886. doi:10.1017/
Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2017). Introducing a forced-choice recognition task to the California Verbal Learning Test–Children's Version. Child Neuropsychology, 23, 284–299. doi:10.1080/09297049.2015.1135422
Marschark, M., Richtsmeier, L. M., Richardson, J. T. E., Crovitz, H. F., & Henry, J. (2000). Intellectual and emotional functioning in college students following mild traumatic brain injury in childhood and adolescence. Journal of Head Trauma Rehabilitation, 15, 1227–1245.
Marshall, P., & Happe, M. (2007). The performance of individuals with mental retardation on cognitive tests assessing effort and motivation. The Clinical Neuropsychologist, 21, 826–840.
Martens, M., Donders, J., & Millis, S. R. (2001). Evaluation of invalid response sets after traumatic head injury. Journal of Forensic Neuropsychology, 2(1),
Moore, B. A., & Donders, J. (2004). Predictors of invalid neuropsychological test performance after traumatic brain injury. Brain Injury, 18, 975–984.
Ord, J. S., Boettcher, A. C., Greve, K. J., & Bianchini, K. J. (2010). Detection of malingering in mild traumatic brain injury with the Conners' Continuous Performance Test-II. Journal of Clinical and Experimental Neuropsychology, 32, 380–387.
Palav, A., Ortega, A., & McCaffrey, R. J. (2001). Incremental validity of the MMPI-2 content scales: A preliminary study with brain-injured patients. Journal of Head Trauma Rehabilitation, 16, 275–283.
Pearson (2009). Advanced clinical solutions for the WAIS-IV and WMS-IV: Technical manual. San Antonio, TX: Author.
Perneger, T. V. (1998). What's wrong with Bonferroni adjustments. BMJ (Clinical research ed.), 316, 1236–1238.
Raskin, S. A., Mateer, C. A., & Tweeten, R. (1998). Neuropsychological assessment of individuals with mild traumatic brain injury. The Clinical
Rees, L. M., Tombaugh, T. N., & Boulay, L. (2001). Depression and the Test of Memory Malingering. Archives of Clinical Neuropsychology, 16, 501–506.
Reese, C. S., Suhr, J. A., & Riddle, T. L. (2012). Exploration of malingering indices in the Wechsler Adult Intelligence Scale–Fourth Edition Digit Span subtest. Archives of Clinical Neuropsychology, 27, 176–181.
Reznek, L. (2005). The Rey 15-item memory test for malingering: A meta-analysis. Brain Injury, 19, 539–543. doi:10.1080/02699050400005242
Rohling, M. L., Green, P., Allen, L. M., & Iverson, G. L. (2002). Depressive symptoms and neurocognitive test scores in patients passing symptom validity tests. Archives of Clinical Neuropsychology, 17, 205–222.
Root, J. C., Robbins, R. N., Chang, L., & van Gorp, W. (2006). Detection of inadequate effort on the California Verbal Learning Test–Second Edition: Forced choice recognition and critical item analysis. Journal of the International Neuropsychological Society, 12, 688–696.
Ross, T. P., Poston, A. M., Rein, P. A., Salvatore, A. N., Wills, N. L., & York, T. M. (2016). Performance invalidity base rates among healthy undergraduate research participants. Archives of Clinical Neuropsychology, 31, 97–104.
Rothman, K. J. (1990). No adjustments are needed for multiple comparisons. Epidemiology (Cambridge, Mass.), 1, 43–46.
Santos, O. A., Kazakov, D., Reamer, M. K., Park, S. E., & Osmon, D. C. (2014). Effort in college undergraduate is sufficient on the Word Memory Test. Archives of Clinical Neuropsychology, 29, 609–613.
Schwartz, E. S., Erdodi, L., Rodriguez, N., Jyotsna, J. G., Curtain, J. R., Flashman, L. A., et al. (2016). CVLT-II forced choice recognition trial as an embedded validity indicator: A systematic review of the evidence. Journal of the International Neuropsychological Society, 22, 851–858. doi:10.1017/
Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 545–561.
Spencer, R. J., Axelrod, B. N., Drag, L. L., Waldron-Perrine, B., Pangilinan, P. H., & Bieliauskas, L. A. (2013). WAIS-IV reliable digit span is no more accurate than age corrected scaled score as an indicator of invalid performance in a veteran sample undergoing evaluation for mTBI. The Clinical Neuropsychologist, 27, 1362–1372.
Sprinkle, S. D., Lurie, D., Insko, S. L., Atkinson, G., Jones, G. L., Logan, A. R., et al. (2002). Criterion validity, severity cut scores, and test-retest reliability of the Beck Depression Inventory-II in a university counseling center sample. Journal of Counseling Psychology, 49, 381.
Storch, E. A., Roberti, J. W., & Roth, D. A. (2004). Factor structure, concurrent validity, and internal consistency of the Beck Depression Inventory–Second Edition in a sample of college students. Depression and Anxiety, 19, 187–189.
Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin Card Sorting Test in the detection of malingering in student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology, 21, 701–708. doi:10.1076/jcen.21.5.701.868
Suhr, J., Tranel, D., Wefel, J., & Barrash, J. (1997). Memory performance after head injury: Contributions of malingering, litigation status, psychological factors, and medication use. Journal of Clinical and Experimental Neuropsychology, 19, 500–514.
Sugarman, M. A., & Axelrod, B. N. (2014). Embedded measures of performance validity using verbal fluency tests in a clinical sample. Applied Neuropsychology: Adult. doi:10.1080/23279095.2013.873439
Sugarman, M. A., & Axelrod, B. N. (2015). Embedded measures of performance validity using verbal fluency tests in a clinical sample. Applied Neuropsychology: Adult, 22, 141–146.
Sweet, J. J., Goldman, D. J., & Guidotti Breting, L. M. (2013). Traumatic brain injury: Guidance in a forensic context from outcome, dose-response, and response bias research. Behavioral Sciences and the Law, 31, 756–778.
Tombaugh, T. N. (1996). Test of Memory Malingering. New York: Multi-Health Systems.
Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 14, 597–607.
van Dyke, S. A., Millis, S. R., Axelrod, B. N., & Hanks, R. A. (2013). Assessing effort: Differentiating performance and symptom validity. The Clinical Neuropsychologist, 27, 1234–1246.
Westcott, M. C., & Alfano, D. P. (2005). The Symptom Checklist-90-Revised and mild traumatic brain injury. Brain Injury, 19, 1261–1267.
Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee, G. J., & Sweet, J. J. (2010). Effort indicators within the California Verbal Learning Test-II (CVLT-II). The Clinical Neuropsychologist, 24, 153–168.
... FCR has been deemed highly specific and moderately sensitive to invalid responses in nonimpaired individuals (Schwartz et al., 2016), and has reasonable concurrent validity with other measures, such as the Test of Memory Malingering. It has also been employed in a brain-injured sample, with results suggesting that even a single error on FCR is sufficient to identify invalid responses (Erdodi et al., 2018). The only study identified on the CVLT-II-SF FCR in dementia reaches a similar conclusion. ...
... In the literature, there have been several reports of appropriate cut scores for performance validity determination using FCR. For example, Erdodi et al. (2018) reported a potentially useful cutoff of one error on the CVLT-II FCR in a TBI sample, finding it unrelated to TBI-sensitive measures. A similar finding was reported by Fogel et al. (2013) in their dementia sample. ...
... They reported a potential cutoff of 8/9 for the CVLT-II-SF FCR as a measure of performance validity in a dementia sample, while acknowledging the significant relationship between performance validity test performance and cognition that has been replicated here and elsewhere (e.g., Burton et al., 2015; Dean et al., 2009). Given that the present mean CVLT-II-SF FCR value for those diagnosed with AD was 7.96 (SD = 0.12), it may not be appropriate to ascertain poor performance validity based on existing suggestions for single-error cutoff scores (Erdodi et al., 2018; Fogel et al., 2013) in dementia. It is still possible that the proposed literature-based cutoff of one error is adequate for discerning performance validity (aligning with Schwartz et al., 2016) when combined with additional measures of performance validity and clinical judgment. ...
Performance validity tests are susceptible to false positives from genuine cognitive impairment (e.g., dementia); this has not been explored with the short form of the California Verbal Learning Test II (CVLT-II-SF). In a memory clinic sample, we examined whether CVLT-II-SF Forced Choice Recognition (FCR) scores differed across diagnostic groups, and how the severity of impairment [Clinical Dementia Rating Sum of Boxes (CDR-SOB) or Mini-Mental State Examination (MMSE)] modulated test performance. Three diagnostic groups were identified: subjective cognitive impairment (SCI; n = 85), amnestic mild cognitive impairment (a-MCI; n = 17), and dementia due to Alzheimer's Disease (AD; n = 50). Significant group differences in FCR were observed using one-way ANOVA; post-hoc analysis indicated the AD group performed significantly worse than the other groups. Using multiple regression, FCR performance was modeled as a function of the diagnostic group, severity (MMSE or CDR-SOB), and their interaction. Results yielded significant main effects for MMSE and diagnostic group, with a significant interaction. CDR-SOB analyses were non-significant. Increases in impairment disproportionately impacted FCR performance for persons with AD, adding caution to research-based cutoffs for performance validity in dementia. Caution is warranted when assessing performance validity in dementia populations. Future research should examine whether CVLT-II-SF-FCR is appropriately specific for best-practice testing batteries for dementia.
... The cognitive impairment group obtained a mean score of 43.9 (5.3) for trial 1 and a mean score of 48.6 (3.1) for trial 2, suggesting that performance on the TOMM is very resistant to different types of severe cognitive impairment (Tombaugh, 1996). While it is traditional to use the cutoff score of <45 on trial 2 as suggested in the manual, this study also examined alternative cutoffs for trial 2 based on existing literature (e.g., Erdodi et al., 2018; Martin et al., 2020) in addition to examining trial 1 data as part of secondary analyses. ...
... Table 5 contains the number of individuals who performed at various cutoffs on the CVLT-FC at three time points. Cutoff scores were determined using existing literature (e.g., Erdodi et al., 2018; Schwartz et al., 2016). One cutoff was based on a systematic review conducted by Schwartz and colleagues (2016) in which they applied a cutoff of 14 on the forced choice trial (sensitivity 50% and specificity 93%). ...
... Another, more stringent, cutoff was based on a study conducted with a group of mixed TBI individuals (75% mTBI) (Erdodi et al., 2018). In that study, they used a cutoff of 15 on the forced choice trial (sensitivity 56% and specificity 92%). ...
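The trade-off between the two FCR cutoffs discussed above (≤14 versus ≤15, where ≤15 flags any single error on the 16-item trial) can be illustrated with a short sketch. It is not the analysis used in any of the cited studies. The scores and criterion labels are invented toy data, and the function names (`fails_fcr`, `sensitivity_specificity`) are hypothetical. A score at or below the cutoff is treated as a Fail and compared against a criterion label of invalid performance.

```python
FCR_MAX = 16  # the CVLT-II FCR trial yields a raw score from 0 to 16


def fails_fcr(score: int, cutoff: int) -> bool:
    """Return True when an FCR raw score is at or below the cutoff (a Fail)."""
    if not 0 <= score <= FCR_MAX:
        raise ValueError(f"FCR score must be between 0 and {FCR_MAX}, got {score}")
    return score <= cutoff


def sensitivity_specificity(scores, invalid, cutoff):
    """Sensitivity and specificity of FCR <= cutoff against criterion labels.

    invalid[i] is True when examinee i was classified as invalid by the
    reference measures (hypothetical labels in this sketch).
    """
    if len(scores) != len(invalid):
        raise ValueError("scores and invalid must have the same length")
    tp = fn = tn = fp = 0
    for score, is_invalid in zip(scores, invalid):
        flagged = fails_fcr(score, cutoff)
        if is_invalid and flagged:
            tp += 1  # invalid profile correctly flagged
        elif is_invalid:
            fn += 1  # invalid profile missed
        elif flagged:
            fp += 1  # valid profile wrongly flagged
        else:
            tn += 1  # valid profile correctly passed
    return tp / (tp + fn), tn / (tn + fp)


# Toy data invented for illustration only; not from any study.
scores = [16, 16, 15, 16, 14, 12, 15, 16, 13, 16]
invalid = [False, False, False, False, True, True, True, True, True, False]

for cutoff in (14, 15):
    sens, spec = sensitivity_specificity(scores, invalid, cutoff)
    print(f"FCR <= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy set, raising the cutoff from 14 to 15 catches one more invalid profile (sensitivity .60 to .80). It also flags one valid profile (specificity 1.00 to .80). In real samples, the size of each shift is an empirical question.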
Objective: Assessing performance validity is imperative in both clinical and research contexts as data interpretation presupposes adequate participation from examinees. Performance validity tests (PVTs) are utilized to identify instances in which results cannot be interpreted at face value. This study explored the hit rates for two frequently used PVTs in a research sample of individuals with and without histories of bipolar disorder (BD). Method: As part of an ongoing longitudinal study of individuals with BD, we examined the performance of 736 individuals with BD and 255 individuals with no history of mental health disorder on the Test of Memory Malingering (TOMM) and the California Verbal Learning Test forced choice trial (CVLT-FC) at three time points. Results: Undiagnosed individuals demonstrated 100% pass rate on PVTs and individuals with BD passed over 98% of the time. A mixed effects model adjusting for relevant demographic variables revealed no significant difference in TOMM scores between the groups, a = .07, SE = .07, p = .31. On the CVLT-FC, no clinically significant differences were observed (ps < .001). Conclusions: Perfect PVT scores were obtained by the majority of individuals, with no differences in failure rates between groups. The tests have approximately >98% specificity in BD and 100% specificity among non-diagnosed individuals. Further, nearly 90% of individuals with BD obtained perfect scores on both measures, a trend observed at each time point.
... During the past few decades, many embedded and free-standing PVTs have been developed, validated, researched, and adapted for use in the forensic context (Boone, 2013; Rogers & Bender, 2018). Well-known examples of PVTs are the Test of Memory Malingering (TOMM; Tombaugh, 1996) and Word Memory Test (WMT; Green et al., 1996), among free-standing measures, and the Reliable Digit Span (RDS) of the Wechsler Adult Intelligence Scale (e.g., Axelrod et al., 2006; Babikian et al., 2006; Erdodi & Abeare, 2020; Greiffenstein et al., 1994; Reese et al., 2012) and Forced Choice Recognition Trial of the California Verbal Learning Test (Erdodi et al., 2018; Greve et al., 2009; Slick et al., 2000; Wolfe et al., 2010), among embedded measures. ...
... Moreover, from a practical perspective, assessing symptom and performance validity is notably different from assessing other constructs in medicine and neuropsychology (Chafetz, 2020). For example, NRB levels are likely to vary across different measures of NRB, because examinees often deliberately choose to restrict their NRB to a very limited number of domains of psychological functioning and do well in other domains (e.g., to appear cognitively impaired, someone might deliberately try to pretend to be unable to perform mathematical calculations but perform well on memory tasks) (Cottingham et al., 2014; Erdodi et al., 2018). From this standpoint, using SVTs that provide information about the same evaluee from different angles, e.g., by relying on different detection strategies or by focusing on multiple domains (such as somatic, cognitive, and psychiatric) might, therefore, be more beneficial than using SVTs that use the same detection strategy or focus on the same one symptom domain. ...
... That said, they should not be dismissed when there are significantly elevated results. These findings would need to be considered carefully for all possible interpretations, with the best one based on the overall pattern of information, data inconsistencies, and data gathered throughout the evaluation (Erdodi et al., 2018; Merten & Merckelbach, 2013; Young, 2019, 2021). ...
In psychological injury and related forensic evaluations, two types of tests are commonly used to assess Negative Response Bias (NRB): Symptom Validity Tests (SVTs) and Performance Validity Tests (PVTs). SVTs assess the credibility of self-reported symptoms, whereas PVTs assess the credibility of observed performance on cognitive tasks. Compared to the large and ever-growing number of published PVTs, there are still relatively few validated self-report SVTs available to professionals for assessing symptom validity. In addition, while several studies have examined how to combine and integrate the results of multiple independent PVTs, there are few studies to date that have addressed the combination and integration of information obtained from multiple self-report SVTs. The Special Issue of Psychological Injury and Law introduced in this article aims to help fill these gaps in the literature by providing readers with detailed information about the convergent and incremental validity, strengths and weaknesses, and applicability of a number of selected measures of NRB under different conditions and in different assessment contexts. Each of the articles in this Special Issue focuses on a particular self-report SVT or set of SVTs and summarizes their conditions of use, strengths, weaknesses, and possible cut scores and relative hit rates. Here, we review the psychometric properties of the 19 selected SVTs and discuss their advantages and disadvantages. In addition, we make tentative proposals for the field to consider regarding the number of SVTs to be used in an assessment, the number of SVT failures required to invalidate test results, and the issue of redundancy when selecting multiple SVTs for an assessment.
... This is not standard practice, as typically studies will utilize another robust PVT as a means of calculating sensitivity and specificity values for another PVT such as the TOMM. While an embedded PVT is not as robust as a stand-alone measure, the CVLT-II is still a reasonable embedded test to use when calculating sensitivity and specificity, as its features are consistent with stand-alone PVTs (Erdodi et al., 2018). ...
... Therefore, only an embedded PVT was available for use. An embedded PVT is not as robust as a stand-alone measure; however, the CVLT-II is still a reasonable embedded test to use when calculating sensitivity and specificity, as its features are consistent with stand-alone PVTs and it was developed with the primary purpose of evaluating response validity (Bauer et al., 2005; Erdodi et al., 2018; Schwartz et al., 2016). ...
The accuracy of neuropsychological assessments relies on participants exhibiting their true abilities during administration. The Test of Memory Malingering (TOMM) is a popular performance validity test used to determine whether an individual is providing honest answers. While the TOMM has proven to be highly sensitive to those who are deliberately exaggerating their symptoms, there is a limited explanation regarding the significance of using 45 as a cutoff score. The present study aims to further investigate this question by examining TOMM scores obtained in a large sample of active-duty military personnel (N = 859, M = 26 years, SD = 6.14, 97.31% males, 72.44% white). Results indicated that no notable discrepancies existed between the frequency of participants who scored a 45 and those who scored slightly below a 45 on the TOMM. The sensitivity and specificity of the TOMM were derived using the forced-choice recognition (FCR) scores obtained by participants on the California Verbal Learning Test, Second Edition (CVLT-II). The sensitivity for each trial of the TOMM was 0.84, 0.55, and 0.63, respectively; the specificity for each trial of the TOMM was 0.69, 0.93, and 0.92, respectively. Because sensitivity and specificity rates are both of importance in this study, balanced accuracy scores were also reported. Results suggested that various alternative cutoff scores produced a more accurate classification compared to the traditional cutoff of 45. Further analyses using Fisher's exact test also indicated that there were no significant performance differences on the FCR of the CVLT-II between individuals who received a 44 and individuals who received a 45 on the TOMM. The current study provides evidence on why the traditional cutoff may not be the most effective score. Future research should consider employing alternative methods which do not rely on a single score.
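The abstract above says that balanced accuracy was reported but does not give the values. Balanced accuracy is conventionally defined as the mean of sensitivity and specificity. The sketch below applies that definition to the three sensitivity/specificity pairs quoted in the abstract. The trial labels assume the standard TOMM order (Trial 1, Trial 2, Retention). Because the inputs are rounded abstract figures, the results may not match the values in the full paper.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity (both given as proportions)."""
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"expected a proportion in [0, 1], got {value}")
    return (sensitivity + specificity) / 2


# (sensitivity, specificity) pairs as quoted in the abstract; the trial labels
# are an assumption based on the standard TOMM structure.
reported = {
    "Trial 1": (0.84, 0.69),
    "Trial 2": (0.55, 0.93),
    "Retention": (0.63, 0.92),
}

for trial, (sens, spec) in reported.items():
    print(f"{trial}: balanced accuracy = {balanced_accuracy(sens, spec):.3f}")
```

The three values come out at roughly .765, .740, and .775. Weighting sensitivity and specificity equally narrows the apparent gap between a high-sensitivity trial and a high-specificity trial.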
... In contrast, malingering is often associated with indiscriminate symptom exaggeration and fabrication (American Psychiatric Association, 2013). Distinguishing between these two presentations is a challenging (Boone, 2017; Sullivan & King, 2010) yet consequential differential diagnosis that has a significant impact on patient outcome and resource allocation (Chafetz & Underhill, 2013; Erdodi, Abeare et al., 2018). Therefore, the combination of the two samples provides a valuable comparison of the V-5's classification accuracy in two different signal detection environments: experimentally induced gross symptom exaggeration in the absence of genuine psychopathology versus no incentive to over-report in the context of possible genuine psychopathology. ...
To examine the potential of the Five-Variable Psychiatric Screener (V-5) to serve as an embedded symptom validity test (SVT). In Study 1, 43 undergraduate students were randomly assigned to a control or an experimental malingering condition. In Study 2, 150 undergraduate students were recruited to examine the cognitive and emotional sequelae of self-reported trauma history. The classification accuracy of the V-5 was computed against the Inventory of Problems (IOP-29), a free-standing SVT. In Study 1, the V-5 was a poor predictor of experimental malingering status, but produced a high overall classification against the IOP-29. In Study 2, the V-5 was a stronger predictor of IOP-29 than self-reported trauma history. Results provide preliminary support for the utility of the V-5 as an embedded SVT. Given the combination of growing awareness of the need to determine the credibility of subjective symptom report using objective empirical methods and systemic pressures to abbreviate assessment, research on SVTs within rapid assessment instruments can provide practical psychometric solutions to this dilemma.
This study was designed to determine the clinical utility of embedded performance validity indicators (EVIs) in adults with intellectual disability (ID) during neuropsychological assessment. Based on previous research, unacceptably high (>16%) base rates of failure (BRFail) were predicted on EVIs using the method of threshold, but not on EVIs based on alternative detection methods. A comprehensive battery of neuropsychological tests was administered to 23 adults with ID (mean age = 37.7 years, mean FSIQ = 64.9). BRFail were computed at two levels of cut-offs for 32 EVIs. Patients produced very high BRFail on 22 EVIs (18.2%–100%), indicating unacceptable levels of false positive errors. However, on the remaining ten EVIs BRFail was <16%. Moreover, six of the EVIs had a zero BRFail, indicating perfect specificity. Consistent with previous research, individuals with ID failed the majority of EVIs at high BRFail. However, they produced BRFail similar to cognitively higher functioning patients on select EVIs based on recognition memory and unusual patterns of performance, suggesting that the high BRFail reported in the literature may reflect instrumentation artefacts. The implications of these findings for clinical and forensic assessment are discussed.
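With only 23 patients, each examinee moves BRFail by about 4.3 percentage points. The >16% criterion in the abstract above therefore reduces to a simple count. The minimal sketch below uses exact fractions so that a rate sitting exactly on the threshold is not misclassified by floating-point rounding. The function names are illustrative and do not come from the study.

```python
from fractions import Fraction

# Threshold stated in the abstract: BRFail above 16% counts as unacceptably high.
MAX_ACCEPTABLE_BRFAIL = Fraction(16, 100)


def base_rate_of_failure(failures: int, n: int) -> Fraction:
    """Proportion of the sample failing an indicator, kept as an exact fraction."""
    if n <= 0 or not 0 <= failures <= n:
        raise ValueError("need n > 0 and 0 <= failures <= n")
    return Fraction(failures, n)


def is_unacceptable(failures: int, n: int) -> bool:
    """True if BRFail strictly exceeds the 16% criterion."""
    return base_rate_of_failure(failures, n) > MAX_ACCEPTABLE_BRFAIL


def min_failures_exceeding(n: int) -> int:
    """Smallest failure count whose BRFail is strictly above 16% for a sample of n."""
    # floor(0.16 * n) failures can at most equal the threshold, so one more exceeds it.
    return int(MAX_ACCEPTABLE_BRFAIL * n) + 1


n = 23  # sample size reported in the abstract
k = min_failures_exceeding(n)
print(f"n = {n}: {k} or more failures ({float(base_rate_of_failure(k, n)):.1%}) exceed 16%")
```

For n = 23, three failures (13.0%) stay within the criterion, and four (17.4%) exceed it. In a sample this small, a single additional failure can move an indicator across the threshold.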
This study was designed to compare the validity of the Inventory of Problems (IOP-29) and its newly developed memory module (IOP-M) in 150 patients clinically referred for neuropsychological assessment. Criterion groups were psychometrically derived based on established performance and symptom validity tests (PVTs and SVTs). The criterion-related validity of the IOP-29 was compared to that of the Negative Impression Management scale of the Personality Assessment Inventory (NIM-PAI) and the criterion-related validity of the IOP-M was compared to that of Trial-1 on the Test of Memory Malingering (TOMM-1). The IOP-29 correlated significantly more strongly (z = 2.50, p = .01) with criterion PVTs than the NIM-PAI (r(IOP-29) = .34; r(NIM-PAI) = .06), generating similar overall correct classification values (OCC(IOP-29): 79–81%; OCC(NIM-PAI): 71–79%). Similarly, the IOP-M correlated significantly more strongly (z = 2.26, p = .02) with criterion PVTs than the TOMM-1 (r(IOP-M) = .79; r(TOMM-1) = .59), generating similar overall correct classification values (OCC(IOP-M): 89–91%; OCC(TOMM-1): 84–86%). Findings converge with the cumulative evidence that the IOP-29 and IOP-M are valuable additions to comprehensive neuropsychological batteries. Results also confirm that symptom and performance validity are distinct clinical constructs, and domain specificity should be considered while calibrating instruments.
Objective: This study was designed to replicate previous research on critical item analysis within the Word Choice Test (WCT). Method: Archival data were collected from a mixed clinical sample of 119 consecutively referred adults (Mage = 51.7, Meducation = 14.7). The classification accuracy of the WCT was calculated against psychometrically defined criterion groups. Results: Critical item analysis identified an additional 2%-5% of the sample that passed traditional cutoffs as noncredible. Passing critical items after failing traditional cutoffs was associated with weaker independent evidence of invalid performance, alerting the assessor to the elevated risk for false positives. Failing critical items in addition to failing select traditional cutoffs increased overall specificity. Non-White patients were 2.5 to 3.5 times more likely to Fail traditional WCT cutoffs, but select critical item cutoffs limited the risk to 1.5 to 2 times. Conclusions: Results confirmed the clinical utility of critical item analysis. Although the improvement in sensitivity was modest, critical items were effective at containing false positive errors in general, and especially in racially diverse patients. Critical item analysis appears to be a cost-effective and equitable method to improve an instrument's classification accuracy. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Objective: We aimed to assess the utility of a novel and easy-to-administer performance validity test (PVT), the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) Forced Choice recognition trial (RFC), compared with previously developed RBANS PVTs. Method: We compared the RFC with the RBANS Effort Index (EI) and Effort Scale (ES) in a sample of 62 non-litigating older adults (mean age = 74 years, 52% female) with varying levels of cognitive impairment. Results: A significantly greater proportion of the sample met EI criteria for non-credible performance (EI > 2; 31%) compared with RFC criteria (RFC < 9; 15%). Among participants with Major Neurocognitive Disorder, 60% met EI criteria for non-credible performance, 32% met ES criteria (ES > 12), and 24% met RFC criteria. Conclusions: The RFC may have greater specificity than other RBANS PVTs among individuals with more severe cognitive impairment. Further research is needed to establish the classification accuracy of the RFC for assessing performance validity.
Objective: We sought to determine the utility of a new performance validity index that was recently proposed. In particular, we wanted to determine if this index would be associated with a specificity of at least .90, a sensitivity of at least .40, and an Area Under the Curve of at least .70 in a traumatic brain injury (TBI) sample. Method: We used logistic regression to investigate how well this new index could distinguish persons with TBI (n = 148) who were evaluated within 1-36 months after injury. All participants had been classified on the basis of at least two independent performance validity tests as having provided valid performance (n = 128) or invalid performance (n = 20). Results: The new performance validity index had acceptable specificity (.96) but had suboptimal sensitivity (.35) and Area Under the Curve (.66). It was concerning that almost half (5/12) of the cases that were identified by this index as providing invalid effort were false positives. Although a slightly more liberal cut-off improved sensitivity, the problem with poor positive predictive power remained. The conventional Forced Choice index had relatively better classification accuracy. Conclusion: Differences in base rates between the original sample of Martin et al. and the current one most likely affected the positive predictive power of the new index. Although the new index has excellent specificity, the current results do not support its application in the clinical evaluation of patients with traumatic brain injury when base rates of invalid performance differ markedly from those in the original study.
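The weak positive predictive power can be reproduced with Bayes' theorem. The sketch below plugs in the sensitivity (.35), specificity (.96), and group sizes (20 invalid, 128 valid) reported in the abstract; the expected counts come out at about 7 true positives and 5 false positives, which matches the 5/12 false-positive figure. The 0.40 base rate in the last call is hypothetical and serves only to show how PPV shifts when invalid performance is more common.

```python
def expected_counts(sensitivity: float, specificity: float,
                    n_invalid: int, n_valid: int) -> tuple[float, float]:
    """Expected true positives and false positives for a given sample split."""
    true_pos = sensitivity * n_invalid
    false_pos = (1.0 - specificity) * n_valid
    return true_pos, false_pos


def positive_predictive_value(sensitivity: float, specificity: float,
                              base_rate: float) -> float:
    """P(invalid | index flags the case), via Bayes' theorem."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Values reported in the abstract: sensitivity .35, specificity .96, 20 invalid vs 128 valid.
tp, fp = expected_counts(0.35, 0.96, n_invalid=20, n_valid=128)
print(f"expected TP = {tp:.1f}, FP = {fp:.1f}")
print(f"PPV at 20/148 = {positive_predictive_value(0.35, 0.96, 20 / 148):.2f}")

# Hypothetical higher base rate, for comparison only.
print(f"PPV at 0.40 = {positive_predictive_value(0.35, 0.96, 0.40):.2f}")
```

With sensitivity and specificity held fixed, PPV depends only on the base rate. This is why an index calibrated in one sample can yield many false positives in another.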
This study was designed to develop performance validity indicators embedded within the Delis-Kaplan Executive Function System (D-KEFS) version of the Stroop task. Archival data from a mixed clinical sample of 132 patients (50% male; MAge= 43.4; MEducation= 14.1) clinically referred for neuropsychological assessment were analyzed. Criterion measures included the Warrington Recognition Memory Test-Words and 2 composites based on several independent validity indicators. An age-corrected scaled score ≤6 on any of the 4 trials reliably differentiated psychometrically defined credible and noncredible response sets with high specificity (.87-.94) and variable sensitivity (.34-.71). An inverted Stroop effect was less sensitive (.14-.29), but comparably specific (.85-.90) to invalid performance. Aggregating the newly developed D-KEFS Stroop validity indicators further improved classification accuracy. Failing the validity cutoffs was unrelated to self-reported depression or anxiety. However, it was associated with elevated somatic symptom report. In addition to processing speed and executive function, the D-KEFS version of the Stroop task can function as a measure of performance validity. A multivariate approach to performance validity assessment is generally superior to univariate models.
This study was designed to examine the “domain specificity” hypothesis in performance validity tests (PVTs) and the epistemological status of an “indeterminate range” when evaluating the credibility of a neuropsychological profile using a multivariate model of performance validity assessment. While previous research suggests that aggregating PVTs produces superior classification accuracy compared to individual instruments, the effect of the congruence between the criterion and predictor variable on signal detection and the issue of classifying borderline cases remain understudied. Data from a mixed clinical sample of 234 adults referred for cognitive evaluation (MAge = 46.6; MEducation = 13.5) were collected. Two validity composites were created: one based on five verbal PVTs (EI-5VER) and one based on five nonverbal PVTs (EI-5NV) and compared against several other PVTs. Overall, language-based tests of cognitive ability were more sensitive to elevations on the EI-5VER than visual-perceptual tests were, whereas the opposite was observed with the EI-5NV. However, the match between predictor and criterion variable had a more complex relationship with classification accuracy, suggesting the confluence of multiple factors (sensory modality, cognitive domain, testing paradigm). An “indeterminate range” of performance validity emerged that was distinctly different from both the Pass and the Fail group. Trichotomized criterion PVTs (Pass-Borderline-Fail) had a negative linear relationship with performance on tests of cognitive ability, providing further support for an “in-between” category separating the unequivocal Pass and unequivocal Fail classification range. The choice of criterion variable can influence classification accuracy in PVT research. Establishing a Borderline range between Pass and Fail more accurately reflected the distribution of scores on multiple PVTs.
The traditional binary classification system imposes an artificial dichotomy on PVTs that was not fully supported by the data. Accepting “indeterminate” as a legitimate third outcome of performance validity assessment has the potential to improve the clinical utility of PVTs and defuse debates regarding “near-Passes” and “soft Fails.”
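The trichotomized (Pass-Borderline-Fail) use of a validity composite can be sketched as a sum of component scores followed by two thresholds. This sketch assumes that each component contributes a non-negative integer (0 = clear pass on that indicator). The thresholds (≤1 Pass, 2-3 Borderline, ≥4 Fail) are hypothetical placeholders because the abstract does not give the EI-5 scoring rules, so substitute the published ones.

```python
from typing import Sequence

# Hypothetical thresholds; replace with the cutoffs defined for the composite in use.
PASS_MAX = 1
BORDERLINE_MAX = 3


def composite_score(component_scores: Sequence[int]) -> int:
    """Sum of component validity scores (0 = clear pass on that indicator)."""
    if any(s < 0 for s in component_scores):
        raise ValueError("component scores must be non-negative")
    return sum(component_scores)


def classify(total: int) -> str:
    """Map a composite total to Pass, Borderline, or Fail."""
    if total <= PASS_MAX:
        return "Pass"
    if total <= BORDERLINE_MAX:
        return "Borderline"
    return "Fail"


# Hypothetical five-component profiles.
for profile in ([0, 0, 1, 0, 0], [1, 1, 0, 0, 1], [2, 1, 1, 0, 0]):
    print(profile, classify(composite_score(profile)))
```

Treating Borderline as a separate outcome keeps near-passes and soft fails out of both criterion groups. The abstract argues that this better reflects the distribution of scores.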
A link between noncredible responding and low scores on the Grooved Pegboard Test (GPB) is well documented in the clinical literature. However, no specific validity cutoffs have emerged in previous research. This study was designed to examine the classification accuracy of various demographically adjusted cutoffs on the GPB against established measures of performance validity. Analyses were based on a mixed clinical sample of 190 patients (52.1% female) medically referred for neuropsychological assessment. Mean age of participants was 44.1 years, with a mean education of 13.9 years. Criterion measures were the Recognition Memory Test and 3 composites based on several embedded validity indicators. A T score ≤29 for either hand or ≤31 on both hands were reliable markers of invalid performance (sensitivity = .29-.63; specificity = .85-.91). Ipsative analyses revealed that these T score-based cutoffs have zero false positive rates. Failing these cutoffs had no consistent relationship with overall cognitive functioning. A moderate relationship between GPB failure and self-reported anxiety and depression emerged on face-valid screening measures. There was also a moderate relationship between GPB failure and Personality Assessment Inventory scales measuring somatic complaints, borderline traits, antisocial features, and substance use. The newly introduced GPB validity cutoffs were effective at separating credible and noncredible performance on neuropsychological testing. The complex relationship between failing the GPB and emotional problems is consistent with the psychogenic interference hypothesis. It may provide insight into the mechanism behind invalid responding and thus warrants further investigation.
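The two GPB cutoffs above combine into a single decision rule: fail if either hand is at or below T = 29, or if both hands are at or below T = 31. The sketch below assumes that demographically adjusted T scores are already available for each hand. The function name is hypothetical.

```python
def gpb_invalid(dominant_t: float, nondominant_t: float) -> bool:
    """Apply the GPB validity rule to demographically adjusted T scores.

    Fails if either hand is at or below T = 29, or both hands are at or below T = 31.
    """
    either_hand = min(dominant_t, nondominant_t) <= 29
    both_hands = dominant_t <= 31 and nondominant_t <= 31
    return either_hand or both_hands


# Hypothetical T scores.
print(gpb_invalid(29, 45))  # one hand at the single-hand cutoff
print(gpb_invalid(31, 31))  # both hands at the two-hand cutoff
print(gpb_invalid(32, 30))  # neither condition met
```

The two-hand condition catches profiles with mildly low scores on both sides that would each pass the stricter single-hand cutoff.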
Objective: This study investigated the potential of alternative, more liberal cutoffs on Trial 2 of the Test of Memory Malingering (TOMM) to improve classification accuracy relative to the standard cutoffs (≤44). Method: The sample consisted of 152 patients (49.3% male) with psychiatric conditions (PSY) and traumatic brain injury (TBI) referred for neuropsychological assessment in a medico-legal setting (MAge = 44.4, MEducation = 11.9 years). Classification accuracy for various TOMM Trial 2 cutoffs was computed against three criterion measures. Results: Patients with TBI failed TOMM Trial 2 cutoffs at higher rates than patients with PSY. Trial 2 ≤49 achieved acceptable combinations of sensitivity (0.38–0.67) and specificity (0.89–0.96) in all but one comparison group. Trial 2 ≤48 improved specificity (0.94–0.98) with minimal loss in sensitivity. The standard cutoff (≤44) disproportionately traded sensitivity (0.15–0.50) for specificity (0.96–1.00). Conclusions: One error on TOMM Trial 2 constitutes sufficient evidence to question the credibility of a response set. However, the confidence in classifying a score as invalid continues to increase with each additional error. Even at the most liberal conceivable cutoff (≤49), the TOMM detected only about half of the patients who failed other criterion measures. Therefore, it should never be used in isolation to determine performance validity.
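The trade-off between sensitivity and specificity across cutoffs can be made concrete with a short sweep. The sketch below computes both values for Trial 2 cutoffs of ≤49, ≤48, and ≤44 against a binary criterion (True = failed the criterion PVTs). The scores and labels are invented for illustration and are not the study's data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[int], criterion_invalid: Sequence[bool],
                            cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity of 'score <= cutoff' against a binary criterion."""
    if len(scores) != len(criterion_invalid):
        raise ValueError("scores and criterion labels must have the same length")
    tp = fn = tn = fp = 0
    for score, invalid in zip(scores, criterion_invalid):
        flagged = score <= cutoff
        if invalid and flagged:
            tp += 1
        elif invalid:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical Trial 2 scores (maximum 50) and criterion labels; not data from the study.
scores = [50, 50, 50, 49, 50, 50, 49, 47, 44, 40]
invalid = [False] * 5 + [True] * 5

for cutoff in (49, 48, 44):
    sens, spec = sensitivity_specificity(scores, invalid, cutoff)
    print(f"<= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy sample, tightening the cutoff from ≤49 to ≤44 removes the single false positive but also halves the number of invalid cases detected. This mirrors the direction of the trade-off described in the abstract.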
This study was designed to examine the clinical utility of critical items within the Recognition Memory Test (RMT) and the Word Choice Test (WCT). Archival data were collected from a mixed clinical sample of 202 patients clinically referred for neuropsychological testing (54.5% male; mean age = 45.3 years; mean level of education = 13.9 years). The credibility of a given response set was psychometrically defined using three separate composite measures, each of which was based on multiple independent performance validity indicators. Critical items improved the classification accuracy of both tests. They increased sensitivity by correctly identifying an additional 2–17% of the invalid response sets that passed the traditional cutoffs based on total score. They also increased specificity by providing additional evidence of noncredible performance in response sets that failed the total score cutoff. The combination of failing the traditional cutoff, but passing critical items was associated with increased risk of misclassifying the response set as invalid. Critical item analysis enhances the diagnostic power of both the RMT and WCT. Given that critical items require no additional test material or administration time, but help reduce both false positive and false negative errors, they represent a versatile, valuable, and time- and cost-effective supplement to performance validity assessment.
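The four combinations of total-score and critical-item outcomes described above can be laid out as a simple decision table. In the sketch below, each outcome is passed in as a boolean. The abstract does not specify how the critical items are selected or scored, so that step is left out. The interpretive labels paraphrase the abstract and are not a formal scoring rule.

```python
# Keys are (total_score_fail, critical_items_fail); labels paraphrase the abstract.
INTERPRETATION = {
    (True, True): "invalid: total score and critical items both fail (additional evidence)",
    (True, False): "invalid by total score only: elevated risk of a false positive",
    (False, True): "possibly invalid: critical items fail despite a passing total score",
    (False, False): "no evidence of invalid responding on this test",
}


def interpret(total_score_fail: bool, critical_items_fail: bool) -> str:
    """Look up the interpretation for one combination of outcomes."""
    return INTERPRETATION[(total_score_fail, critical_items_fail)]


for combo in INTERPRETATION:
    print(combo, "->", interpret(*combo))
```

Encoding the table explicitly makes clear that critical items act in both directions. They can flag cases missed by the total score and temper confidence in cases that fail only the total score.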
The Stroop paradigm has many variants. Due to its potential to function as an embedded validity indicator, this study was designed to develop performance validity indicators embedded within the Delis-Kaplan Executive Function System (D-KEFS) version of the Stroop task.
Introduction: The Recognition Memory Test (RMT) and Word Choice Test (WCT) are structurally similar, but psychometrically different. Previous research demonstrated that adding a time-to-completion cutoff improved the classification accuracy of the RMT. However, the contribution of WCT time-cutoffs to improve the detection of invalid responding has not been investigated. The present study was designed to evaluate the classification accuracy of time-to-completion on the WCT compared to the accuracy score and the RMT. Method: Both tests were administered in counterbalanced order, as part of a larger battery of cognitive tests, to 202 adults (Mage = 45.3 years, SD = 16.8; 54.5% female) clinically referred for neuropsychological assessment. Results: Participants obtained lower and more variable scores on the RMT (M = 44.1, SD = 7.6) than on the WCT (M = 46.9, SD = 5.7). Similarly, they took longer to complete the recognition trial on the RMT (M = 157.2 s, SD = 71.8) than the WCT (M = 137.2 s, SD = 75.7). The optimal cutoff on the RMT (≤43) produced .60 sensitivity at .87 specificity. The optimal cutoff on the WCT (≤47) produced .57 sensitivity at .87 specificity. Time-cutoffs produced comparable classification accuracies for both RMT (≥192 s; .48 sensitivity at .88 specificity) and WCT (≥171 s; .49 sensitivity at .91 specificity). They also identified an additional 6-10% of the invalid profiles missed by accuracy score cutoffs, while maintaining good specificity (.93-.95). Functional equivalence was reached at accuracy scores ≤43 (RMT) and ≤47 (WCT) or time-to-completion ≥192 s (RMT) and ≥171 s (WCT). Conclusions: Time-to-completion cutoffs are valuable additions to both tests. They can function as independent validity indicators or enhance the sensitivity of accuracy scores without requiring additional measures or extending standard administration time.
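The cutoffs reported above (accuracy ≤43 or time ≥192 s for the RMT; accuracy ≤47 or time ≥171 s for the WCT) can be stored per test and applied together. The sketch assumes that failing either criterion counts as a fail. That reading follows from the time cutoffs catching profiles missed by accuracy alone, but the exact combination rule is an assumption here, as are the names used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    max_accuracy: int   # accuracy score at or below this value fails
    min_seconds: float  # completion time at or above this value fails


# Values from the abstract; the dictionary layout is illustrative.
CUTOFFS = {"RMT": Cutoffs(43, 192.0), "WCT": Cutoffs(47, 171.0)}


def fails(test: str, accuracy: int, seconds: float) -> bool:
    """Assumed OR rule: fail on low accuracy or on slow completion."""
    c = CUTOFFS[test]
    return accuracy <= c.max_accuracy or seconds >= c.min_seconds


# Hypothetical protocols.
print(fails("RMT", accuracy=44, seconds=191))  # passes both criteria
print(fails("WCT", accuracy=48, seconds=171))  # fails on time alone
```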
Objectives: The Forced Choice Recognition (FCR) trial of the California Verbal Learning Test, 2nd edition, was designed as an embedded performance validity test (PVT). To our knowledge, this is the first systematic review of classification accuracy against reference PVTs. Methods: Results from peer-reviewed studies with FCR data published since 2002 encompassing a variety of clinical, research, and forensic samples were summarized, including 37 studies with FCR failure rates (N=7575) and 17 with concordance rates with established PVTs (N=4432). Results: All healthy controls scored >14 on FCR. On average, 16.9% of the entire sample scored ≤14, while 25.9% failed reference PVTs. Presence or absence of external incentives to appear impaired (as identified by researchers) resulted in different failure rates (13.6% vs. 3.5%), as did failing or passing reference PVTs (49.0% vs. 6.4%). FCR ≤14 produced an overall classification accuracy of 72%, demonstrating higher specificity (.93) than sensitivity (.50) to invalid performance. Failure rates increased with the severity of cognitive impairment. Conclusions: In the absence of serious neurocognitive disorder, FCR ≤14 is highly specific, but only moderately sensitive to invalid responding. Passing FCR does not rule out a non-credible presentation, but failing FCR rules it in with high accuracy. The heterogeneity in sample characteristics and reference PVTs, as well as the quality of the criterion measure across studies, is a major limitation of this review and the basic methodology of PVT research in general. (JINS, 2016, 22, 851-858).
Objective: This study compared failure rates on performance validity tests (PVTs) across liberal and conservative cutoffs in a sample of undergraduate students participating in academic research. Method: Participants (n = 120) were administered four free-standing PVTs (Test of Memory Malingering, Word Memory Test, Rey 15-Item Test, Hiscock Forced-Choice Procedure) and three embedded PVTs (Digit Span, letter and category fluency). Participants also reported their perceived level of effort during testing. Results: At liberal cutoffs, 36.7% of the sample failed ≥1 PVTs, 6.7% failed ≥2, and .8% failed 3. At conservative cutoffs, 18.3% of the sample failed ≥1 PVTs, 2.5% failed ≥2, and .8% failed 3. Participants were 3 to 5 times more likely to fail embedded (15.8-30.8%) compared to free-standing PVTs (3.3-10.0%). There was no significant difference in failure rates between native and non-native English speaking participants at either liberal or conservative cutoffs. Additionally, there was no relation between self-reported effort and PVT failure rates. Conclusions: Although PVT failure rates varied as a function of PVTs and cutoffs, between a third and a fifth of the sample failed ≥1 PVTs, consistent with high initial estimates of invalid performance in this population. Embedded PVTs had notably higher failure rates than free-standing PVTs. Assuming optimal effort in research using students as participants without a formal assessment of performance validity introduces a potentially significant confound in the study design.
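Rates such as "failed ≥1 PVTs" and "failed ≥2" are computed by counting failures per participant and then taking the proportion at or above each threshold. The sketch below assumes a participant-by-PVT matrix of pass/fail booleans. The matrix is hypothetical and much smaller than the study's sample.

```python
from typing import Sequence


def proportion_failing_at_least(fail_matrix: Sequence[Sequence[bool]], k: int) -> float:
    """Share of participants (rows) who failed at least k PVTs (columns)."""
    if not fail_matrix:
        raise ValueError("fail_matrix must not be empty")
    counts = [sum(row) for row in fail_matrix]  # True counts as 1
    return sum(1 for c in counts if c >= k) / len(counts)


# Hypothetical results: 5 participants x 4 PVTs (True = failed that PVT).
fail_matrix = [
    [False, False, False, False],
    [True, False, False, False],
    [True, True, False, False],
    [False, False, False, False],
    [True, True, True, False],
]

for k in (1, 2, 3):
    print(f"failed >= {k}: {proportion_failing_at_least(fail_matrix, k):.0%}")
```

Running the same count twice, with the matrix built from liberal and then conservative cutoffs, reproduces the kind of comparison reported in the abstract.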
Scores on the Complex Ideational Material (CIM) were examined in reference to various performance validity tests (PVTs) in 106 adults clinically referred for neuropsychological assessment. The main diagnostic categories, reflecting a continuum between neurological and psychiatric disorders, were epilepsy, psychiatric disorders, postconcussive disorder, and psychogenic non-epileptic seizures. Cross-validation analyses suggest that in the absence of bona fide aphasia, a raw score ≤9 or T score ≤29 on the CIM is more likely to reflect non-credible presentation than impaired receptive language skills. However, these cutoffs may be associated with unacceptably high false positive rates in patients with longstanding, documented neurological deficits. Therefore, more conservative cutoffs (raw score ≤8 or T score ≤23) are recommended in such populations. Contrary to the widely accepted assumption that psychiatric disorders are unrelated to performance validity, results were consistent with the psychogenic interference hypothesis, suggesting that emotional distress increases the likelihood of PVT failures even in the absence of apparent external incentives to underperform on cognitive testing.
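The population-dependent CIM cutoffs can be expressed as a rule whose thresholds switch on whether the patient has documented neurological deficits. The sketch reads the abstract's conservative cutoff "≤8/23" as raw ≤8 or T ≤23. The function name and the aphasia guard are hypothetical additions reflecting the abstract's caveat that the cutoffs do not apply to bona fide aphasia.

```python
def cim_suggests_noncredible(raw: int, t_score: float,
                             documented_neuro_deficit: bool,
                             aphasia: bool = False) -> bool:
    """Apply liberal (<=9 / T<=29) or conservative (<=8 / T<=23) CIM cutoffs."""
    if aphasia:
        # Cutoffs were proposed only for patients without bona fide aphasia.
        raise ValueError("CIM validity cutoffs do not apply in aphasia")
    raw_cut, t_cut = (8, 23) if documented_neuro_deficit else (9, 29)
    return raw <= raw_cut or t_score <= t_cut


# Hypothetical cases: the same score can pass under the conservative cutoffs.
print(cim_suggests_noncredible(9, 30, documented_neuro_deficit=False))
print(cim_suggests_noncredible(9, 30, documented_neuro_deficit=True))
```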