Limited English Proficiency Increases Failure Rates
on Performance Validity Tests with High Verbal Mediation
Laszlo A. Erdodi & Shayna Nussbaum & Sanya Sagar & Christopher A. Abeare & Eben S. Schwartz

Received: 11 October 2016 / Accepted: 18 January 2017
© Springer Science+Business Media New York 2017
Abstract This study was designed to examine the effect of
language proficiency and level of verbal mediation on failure
rates on performance validity tests (PVTs). PVTs with high
and low verbal mediation were administered to 80 healthy
community-dwelling English-Arabic bilinguals. Digit Span
and Animal Fluency were administered in both English and
Arabic, in counterbalanced order, as part of a brief battery of
neuropsychological tests. Participants with Arabic as their
dominant language were 2 to 16 times more likely to fail
PVTs with high verbal mediation compared to native speakers
of English. When Digit Span and Animal Fluency were ad-
ministered in the nondominant language, participants were 2
to 18 times more likely to fail the validity cutoffs. Language
dominance explained between 6 and 31% of variance in di-
chotomized outcomes (pass/fail) on PVTs with high verbal
mediation. None of the participants failed any PVTs with
low verbal mediation. Limited language proficiency may re-
sult in dramatic increases in false positive rates on PVTs with
high verbal mediation. Failure on a PVT administered in
English to an examinee with a different linguistic background
should be interpreted with great caution.
Keywords Cross-cultural neuropsychology · Performance validity testing · Bilingualism · Word Choice Test · Complex Ideational Material
Consistently demonstrating one's highest, or at least typical, level of functioning is a basic assumption underlying neuropsychological assessment. Bigler (2015) called this assumption the "Achilles heel" of cognitive testing, which is a fitting metaphor that acknowledges a significant vulnerability in an otherwise strong design. This potential
threat to the clinical utility of test data has become a source
of ongoing controversy that has polarized the profession.
On the one hand, there is a growing consensus that the
credibility of test scores cannot be assumed but must be
evaluated using objective, empirical measures (Bush et al.
2014; Heilbronner et al. 2009). On the other hand, con-
cerns about the clinical and forensic interpretation of per-
formance validity tests (PVTs) have been accumulating.
These concerns include the high cost of false positive
errors, lack of a gold standard measure, unclear clinical
interpretation of scores in the failing range (i.e., below-
cutoff performance may indicate malingering, low moti-
vation or the expression of a disease process), poorly un-
derstood relationship between PVTs and neuroanatomy/
neurophysiology, and genuine, severe neurocognitive im-
pairment as a confound (Bigler 2012,2015; Boone 2013).
Demographic variables like age (Lichtenstein et al. 2017;
Lichtenstein et al. 2014) and education (Pearson 2009)
have also been reported to influence base rates of failure
on PVTs. In addition, Leighton et al. (2014) pointed out
that the effect of the variability in testing paradigm, sen-
sory modality, or other stimulus properties of PVTs on the
probability of failure has not been studied systematically.
Native level English proficiency is another, less commonly
examined assumption in neuropsychology. Most tests were de-
veloped for and normed on native speakers of English (NSE).
The extent to which these tests provide a valid measure of
cognitive ability in individuals with limited English proficiency (LEP) was largely unknown until recent investigations. On purely rational grounds, LEP is expected to deflate performance primarily on tests with high verbal mediation (i.e., tasks for which being NSE is a fundamental requirement for examinees to demonstrate their true ability level on cognitive tests).

* Laszlo A. Erdodi
Department of Psychology, University of Windsor, 168 Chrysler Hall South, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada

Waukesha Memorial Hospital, Waukesha, WI, USA

Psychol. Inj. and Law
DOI 10.1007/s12207-017-9282-x
Surprisingly, some early studies were inconsistent with
this hypothesis. Coffey et al. (2005) found that level of
acculturation was significantly related to scores on the
Wisconsin Card Sorting Test, an instrument that, at face
value, appears to have low verbal mediation. Although
acculturation is not synonymous with level of English pro-
ficiency, the two constructs are highly related. Their com-
munity sample of Mexican Americans performed poorly
on all nine key variables examined (medium to very large
effects) compared to the English norms. At the same time,
no differences were found compared to the Spanish norms.
Therefore, the authors concluded that the Wisconsin Card
Sorting Test was not a culture-free measure.
Subsequent studies, however, found that the deleterious
effect of LEP was limited to tests with high verbal media-
tion. Razani et al. (2007) reported a significant difference
with large effect between their NSE and LEP samples on
verbal IQ, but no difference on performance IQ or even full
scale IQ. Likewise, the results of the study by Boone et al.
(2007) were broadly consistent with the hypothesis that the
level of verbal mediation drives the performance differ-
ences between NSE and LEP. Most of the significant con-
trasts within a battery of neuropsychological tests were ob-
served on measures with high verbal mediation: Digit Span,
Letter Fluency, and picture naming (medium to large ef-
fects). Surprisingly, on a measure of visual-constructional
ability (the copy trial of the Rey Complex Figure Test), the
LEP group outperformed NSE (medium effect), suggesting
that the lower scores on the other tests are not due to inher-
ent differences in global cognitive functioning.
Findings by Razani et al. (2007) provide further evi-
dence that the level of verbal mediation accounts for a sig-
nificant portion of variability in the neuropsychological
profile of individuals with LEP. A large effect was observed
on Digit Span between NSE and LEP, but no difference on
Digit-Symbol Coding. In addition, between-group differ-
ences were more likely to emerge on the more difficult trials
of the Trail Making Test, Stroop, and Auditory Consonant
Trigrams. However, more recent investigations failed to
replicate this with the Auditory Consonant Trigrams
(Erdodi et al. 2016), raising the possibility that the effect
of English proficiency on test performance varies not only
across instruments, but also across samples. Overall, the
evidence suggests that while LEP does not affect perfor-
mance on nonverbal processing speed tasks, its deleterious
effects are more likely to become apparent as the task de-
mands or the level of verbal mediation increases.
The issue of performance validity and English proficiency
is compounded in the interpretation of PVTs developed in
North America and administered to individuals with LEP.
Salazar et al. (2007) were among the first to examine the
confluence of these two factors. They found that NSE
outperformed the LEP group on the Reliable Digit Span, while
the opposite was the case for the Rey Complex Figure Test
effort equation (Lu et al. 2003). There was no difference be-
tween the groups on the Dot Counting Test (DCT; Boone et al.
2002) and the Rey Fifteen-Item Test (Rey-15; Rey 1964), two
free-standing PVTs specifically designed to evaluate the cred-
ibility of a response set.
Burton et al. (2012) compared the performance of Spanish-
speaking examinees across settings (clinical, criminal, and
civil forensic) and instruments. The Test of Memory
Malingering and the Rey-15 at standard cutoffs were effective
at differentiating the groups, while the DCT was not. These
results further emphasize the complicated interaction between
language proficiency, referral source, level of verbal media-
tion, and idiosyncratic stimulus properties of individual PVTs,
consistent with earlier investigations that concluded that PVTs
with low verbal mediation administered in Spanish are capa-
ble of distinguishing between credible and noncredible indi-
viduals (Vilar-Lopez et al. 2008).
The literature reviewed above converges in a number of
tentative conclusions. First, LEP reliably deflates performance
on tests with high verbal mediation. Second, the deleterious
effects of LEP extend to some tests with low verbal mediation
and tend to become more pronounced as task complexity in-
creases. Third, the persistent negative findings on certain low
verbal mediation tests when comparing NSE and LEP, in com-
bination with the occasional superiority of the LEP groups,
suggest that the observed differences are unlikely to be simply
driven by lower language proficiency or overall cognitive
functioning of individuals with LEP. Instead, they may reflect
cultural differences in cognitive processing or approaches to
testing. Finally, this pattern of findings results in a predictable
increase in the base rates of failure, but this effect is limited to
PVTs with high verbal mediation. Given the potentially high
stakes of performance validity assessment (i.e., determining
the credibility of an individual's overall presentation, and resultant assessment interpretation and clinical recommendations), further investigation of the topic seems warranted.
One of the notable limitations of existing research is the
lack of direct comparison between performance in English
and the participantsother (dominant) language. The pres-
ent study was designed to address that. In addition to rep-
licating elements of earlier investigations (a strategic mix-
ture of PVTs with high and low verbal mediation) in a
different, non-Spanish speaking bilingual sample, two of
the tests were administered in both English and Arabic.
Therefore, the concepts of "dominant language" and NSE could be conceptually separated and studied
independently. Based on the research literature reviewed
above, we hypothesized that limited proficiency in the lan-
guage of test administration will be associated with higher
failure rate on PVTs with high verbal mediation. No dif-
ference was predicted on PVTs with low verbal mediation.
Method

Eighty healthy English-Arabic bilinguals were recruited for an
academic research project through a research participant pool
of a mid-sized Canadian university and the surrounding com-
munity. The study was approved by the institutional review
board. Relevant ethical guidelines regulating research with
human participants were followed throughout the project.
Mean age of the sample was 26.8 years (SD = 16.0). Mean level of education was 14.2 years (SD = 1.7). The majority of
the participants were female (60%) and English-dominant
(71.2%). Language dominance was determined based on a
combination of self-reported relative language proficiency,
language use pattern, and immigration history. Participants
were asked to rate their proficiency in English and Arabic
relative to each other, in percentage, so that the two ratings
add up to 100, following the methodology described by
Erdodi and Lajiness-O'Neill (2012).
For example, stable bilinguals would rate themselves as 50/
50, indicating that they are equally proficient in both lan-
guages. Participants who immigrated to Canada as adults
would rate themselves 40/60 or 30/70, indicating that they
speak Arabic better than English. These individuals were clas-
sified as Arabic-dominant, and thus, as having LEP.
Conversely, participants who were born in Canada, grew up
as NSE, and had limited proficiency in Arabic would rate
themselves as 60/40 or 70/30, and were classified as English-dominant.

Measures

Two PVTs with high verbal mediation (the Word Choice Test
and Complex Ideational Material) and two with low verbal
mediation (Rey-15 and Digit-Symbol Coding) were adminis-
tered in English only. The Word Choice Test (Pearson 2009) is
a free-standing PVT based on the forced choice recognition
paradigm. The examinee is presented with 50 words, one at a
time, at the rate of 3 s per word. The words are printed on a
card and simultaneously read aloud by the examiner. After the
learning trial, the examinee is presented with a card containing
50 word pairs (a target and a foil) and asked to identify the
word that was part of the original list.
Given the high imageability and concreteness in combina-
tion with low word frequency (Davis 2014), discriminating
between targets and foils during the recognition trial of the
WCT is very easy. Even in clinical settings, credible patients
tend to perform near the ceiling, with means ranging from
49.1 (Davis 2014) to 49.4 (Erdodi et al. 2014), which is com-
parable to the performance of university students in research
settings (49.4; Barhon et al. 2015).
Complex Ideational Material is part of the Boston
Diagnostic Aphasia Battery (Goodglass et al. 2001). It is a
sentence comprehension task originally designed to aid in
diagnosing and subtyping aphasia. The examiner asks a series
of yes/no questions of increasing difficulty to evaluate the
examinee's receptive language skills. Raw scores range from
0 to 12. The average performance in the normative sample
was close to the ceiling (M= 11.2, SD = 1.1; Borod et al.
1980). Recent investigations revealed that in individuals with-
out bona fide aphasia, a low score on Complex Ideational
Material is a reliable indicator of invalid performance
(Erdodi and Roth 2016; Erdodi et al. 2016).
The Rey-15 is one of the oldest stand-alone PVTs (Rey
1964). The examinee is presented with a card with five rows,
each having three sequentially organized symbols for 10 s.
The task is to reproduce as many of the original items as
possible. Given the simplicity of the task, healthy controls
produce near-perfect scores. Although performance is not im-
mune to genuine neurological impairment, the Rey-15 is gen-
erally robust to the deleterious effects of brain injury (Lezak
et al. 2012), making it suitable as a PVT (Boone 2013; Morse et al. 2013; O'Bryant et al. 2003). However, the low sensitivity of the Rey-15 has been repeatedly identified as a liability (Reznek 2005; Rüsseler et al. 2008).
The Digit-Symbol Coding subtest of the Wechsler Adult
Intelligence Scales is a timed symbol substitution task mea-
suring attention, visual scanning, and psychomotor processing
speed. Although sensitive to the effect of diffuse neuropsychi-
atric deficits (Lezak et al. 2012), below certain cutoffs perfor-
mance on Coding is confounded by invalid responding.
Therefore, this test can function as an effective embedded
validity indicator (Erdodi et al. 2016; Trueblood 1994).
All tests were administered according to the standard proce-
dures outlined in the technical manual by a trained research
assistant who was fluent in both English and Arabic.
Participants were instructed to perform to the best of their
ability. However, they were not warned about the presence of PVTs, following recommendations based on previous empirical research on the negative effects of sensitizing examinees to the issue of performance validity (Boone 2007; Youngjohn et al. 1999).
Digit Span and Animal Fluency were administered in both
languages, in counterbalanced order, once at the beginning
and once at the end of the test battery. In addition to measuring
auditory attention, working memory, language skills, and pro-
cessing speed, both tasks are well-established embedded va-
lidity indicators (Boone 2013; Sugarman and Axelrod 2015).
Data Analysis
The main descriptive statistics were failure rate (percent
of the sample that scored below the validity cutoffs) and
relative risk. Statistical significance was determined using t tests or χ². Effect size estimates were expressed in Φ². Relative risk ratios were computed to provide a single-number comparison of failure rates between the English- and Arabic-dominant groups.
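The quantities described above (relative risk and Φ²) are simple functions of a 2 × 2 group-by-outcome table. As a minimal sketch, with hypothetical counts and function names of our own choosing (none of these values are from the study's data):

```python
# Relative risk and the phi coefficient for a 2x2 (group x pass/fail) table.
# The counts below are hypothetical, for illustration only.
from math import sqrt

def relative_risk(fail_a, n_a, fail_b, n_b):
    """Failure rate of group A divided by the failure rate of group B."""
    return (fail_a / n_a) / (fail_b / n_b)

def phi_coefficient(fail_a, n_a, fail_b, n_b):
    """Phi for the 2x2 table; phi**2 estimates the proportion of variance
    in the dichotomized outcome (pass/fail) explained by group membership."""
    a, b = fail_a, n_a - fail_a  # group A: fail, pass
    c, d = fail_b, n_b - fail_b  # group B: fail, pass
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical example: 10/40 failures in one group vs. 2/40 in the other,
# i.e., a 5-fold relative risk of failure.
rr = relative_risk(10, 40, 2, 40)
phi = phi_coefficient(10, 40, 2, 40)
```

Squaring the phi coefficient gives the variance-explained metric (Φ²) reported in the Results.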
Results

As a group, English-dominant participants were significantly younger (M = 20.2 years, SD = 2.5) than Arabic-dominant participants (M = 42.9 years, SD = 22.9): t(78) = 7.44, p < .001.
The Arabic-dominant sample had significantly higher levels of education (M = 15.0 years, SD = 1.5) compared to their English-dominant counterparts (M = 13.8 years, SD = 1.7): t(78) = 3.00, p < .01. There was no significant difference in the gender ratio between the two groups (36.8 vs. 47.8% male): χ²(1) = 0.82, p = .36.
The English/Arabic Animal Fluency raw score ratio was significantly higher and more variable (M = 2.62, SD = 1.55) in the English-dominant sample as compared to the Arabic-dominant sample (M = 0.90, SD = 0.29): t(78) = 5.25, p < .001, d = 1.54 (very large effect). In other words, English-dominant participants generated, on average, 2.6 times more animal names in English than in Arabic. Conversely, the output of Arabic-dominant participants on the English version was around 90% of their performance in Arabic. Likewise, there was a pronounced difference on the Boston Naming Test Short Form between the English-dominant (M = 11.8, SD = 2.2) and Arabic-dominant (M = 7.1, SD = 3.1) subsamples: t(78) = 7.69, p < .001, d = 1.75 (very large effect). Performance on this test has been identified as a reliable indicator of language proficiency (Erdodi et al. 2016; Moreno and Kutas 2005). These findings provide empirical support for
classifying participants into the two groups based on language
dominance (i.e., English vs. Arabic) in addition to the self-
rated language proficiency.
The Arabic-dominant sample was 16 times more likely to fail the Word Choice Test accuracy cutoff and two to three times more likely to fail Complex Ideational Material compared to the English-dominant sample. Since only Arabic-dominant participants failed the Word Choice Test time cutoff,
a risk ratio could not be computed. All contrasts comparing
the PVT failure rates as a function of language dominance
(defined as English or Arabic) were statistically significant.
No participant failed the Rey-15 or Digit-Symbol Coding in
the entire sample (Table 1).
Participants were almost five times more likely to fail the age-corrected scaled score cutoff when the Digit Span task was administered in their nondominant language. This contrast was associated with a large effect size (Φ² = .14). Compared to the dominant language, the risk of failing Reliable Digit Span doubled during the nondominant language administration (medium effect size: Φ² = .06). The relative risk was highest on longest digits forward: nondominant language administration carried an almost 18-fold risk of failure (medium-large effect size: Φ² = .10).
On Animal Fluency, when the task was administered in the nondominant language, participants were three to four times more likely to score in the failing range. All contrasts were statistically significant (Table 2). The effect of language dominance was more pronounced on demographically adjusted T-scores (Φ² = .31) as compared to raw scores (Φ² = .22). However, both of these effect size estimates fall in the very large range.
Discussion

The present study was designed to examine failure rate on
PVTs with high and low verbal mediation in an English-
Arabic bilingual sample. Consistent with previous reports
(Boone et al. 2007; Razani et al. 2007) and our initial hypoth-
esis, when PVTs with high verbal mediation were adminis-
tered to participants with LEP, failure rates were two to 16
times higher as compared to NSE. As in earlier studies, no
difference was observed on PVTs with low verbal mediation
(Salazar et al. 2007), suggesting that the difference in relative
risk for PVT failure between the LEP and NSE samples rep-
resents false positive errors.
When Digit Span and Animal Fluency were administered in the participants' nondominant language, they were two to 18 times more likely to fail validity cutoffs as compared to when these tests were administered in their dominant language. These within-individual contrasts provide a conceptual control condition that enhances the interpretation of the data by redefining the independent variable from "English vs. Arabic" to "dominant vs. nondominant language." Hence,
they examine the effect of language proficiency as presented
with a specific cognitive task in a given language, regardless
of the individual's language dominance. As such, English-
dominant participants presented with a task in English are
grouped together with Arabic-dominant participants presented
with a task in Arabic and compared to English-dominant par-
ticipants presented with a task in Arabic grouped together with
Arabic-dominant participants presented with a task in English.
These comparisons model the effect of the cognitive vulnera-
bility stemming from limited language proficiency that
manifests as poor performance during neuropsychological as-
sessment. If the test in question is a PVT, the end result is a
substantially higher risk of failing the cutoff and, therefore,
being erroneously labeled as "invalid" or "noncredible."
Our results suggest that the inner logic of performance validity assessment (i.e., the task is so easy that below-threshold performance can be considered evidence of noncredible responding [Boone 2013; Larrabee 2012]) may not apply to
PVTs with high verbal mediation administered to examinees
with LEP, as predicted by Bigler (2015). In such cases, scores
in the failing range are more likely to reflect limited proficiency
in the language of test administration rather than invalid performance. Based on the existing evidence, PVTs with low verbal mediation appear to be appropriate to use in LEP populations (Razani et al. 2007; Salazar et al. 2007), although more
research is clearly needed to better understand the interactions
between LEP, performance validity, level of education and ac-
culturation, task complexity, and level of verbal mediation
(Boone et al. 2007; Coffey et al. 2005; Razani et al. 2007).
A potential weakness of the present design is the inherent
differences in the signal detection profiles of the PVTs used.
One could argue that the negative results on PVTs with low
verbal mediation reflect the inability of these instruments to
detect invalid response sets rather than convincing evidence of
credible performance. Indeed, both the Rey-15 (Reznek 2005;
Rüsseler et al. 2008) and the Digit-Symbol Coding (Erdodi
et al. 2016) have been criticized for low sensitivity. A careful
comparison of failure rate on these two PVTs relative to others
partially substantiates these concerns.
In the Spanish speaking forensic sample of Burton and
colleagues (2012), the base rate of failure on the Rey-15 was
47% (vs. 33% on the Test of Memory Malingering), indicating
that the instrument is capable of detecting invalid responding.
Therefore, in contrast with these studies, the 0% failure rate
observed in the present sample may in fact reflect true negatives. In the study by Erdodi and Roth (2016), the failure rate on Digit-Symbol Coding was lower (21.2%) compared to some of the Digit Span-based PVTs (38.7–52.2%), but comparable to validity indicators embedded in Animal Fluency (18.5–32.8%). Likewise, the failure rate on Rey-15 (33.3%) was similar to that on Complex Ideational Material (26.5%).
Table 1 Failure rates on performance validity tests administered in English only as a function of language dominance

Dominant language          WCT                 CIM                    Rey-15   CD
                           Accuracy   Time     Raw score   T-score    FR       ACSS
English (n = 57)     n     1          0        9           13         0        0
                     %     1.8        0.0      15.8        22.8       0.0      0.0
Arabic (n = 23)      n     6          2        11          12         0        0
                     %     26.1       8.7      47.8        52.2       0.0      0.0
χ²                         12.2       5.08     8.97        6.58       –        –
p                          <.01       <.05     <.01        <.01       –        –
RR                         16.3       –        3.03        2.29       –        –

Raw score cutoffs were not listed in order to protect test security.
WCT Word Choice Test (Pearson 2009); Accuracy: number of words correctly recognized out of 50 (cutoff associated with a base rate of 25% in the overall clinical sample in the normative data; Erdodi et al. 2016; Pearson 2009); Time: completion time for the recognition trial in seconds (cutoff suggested by Erdodi et al. 2016); CIM: Complex Ideational Material subtest of the Boston Diagnostic Aphasia Examination; Raw score: number of correct responses out of 12 (the liberal cutoff suggested by Erdodi et al. (2016) and Erdodi and Roth (2016) was used); T-score: demographically adjusted score based on the norms by Heaton et al. (2004), cutoff 29 (Erdodi et al. 2016; Erdodi and Roth 2016); Rey-15 FR: Rey Fifteen-Item Test free recall trial (traditional cutoff suggested by Lezak 1995); CD ACSS: age-corrected scaled score on the Digit-Symbol Coding subtest of the Wechsler Adult Intelligence Scale, Third Edition (<6; Erdodi et al. 2016; Trueblood 1994); RR: relative risk
Table 2 Failure rates on performance validity tests administered in both English and Arabic as a function of language dominance

                    Digit Span                     Animal Fluency
Language            ACSS     RDS      LDF      Raw score   T-score
Dominant            8.7%     18.7%    1.2%     16.3%       23.8%
Nondominant         41.3%    39.9%    21.2%    61.2%       78.9%
χ²                  11.0     4.48     8.12     17.2        24.4
p                   <.01     <.05     <.05     <.05        <.01
Φ²                  .14      .06      .10      .22         .31
RR                  4.75     2.13     17.7     3.76        3.32

Raw score cutoffs were not listed in order to protect test security.
WAIS-III: Wechsler Adult Intelligence Scale, Third Edition; ACSS: age-corrected scaled score (cutoff 6; Babikian et al. 2006; Hayward et al. 1987; Heinly et al. 2005; Trueblood 1994); RDS: Reliable Digit Span (cutoff associated with a base rate of 25% in the overall clinical sample in the normative data; Babikian et al. 2006; Heinly et al. 2005; Pearson 2009); LDF: longest digit span forward (cutoff suggested by Lezak et al. 2012); Raw score: number of animals generated in 1 min (liberal cutoff suggested by Sugarman and Axelrod 2015; Hayward et al. 1987); T-score: demographically adjusted score based on the norms by Heaton et al. (2004), cutoff 33 (Sugarman and Axelrod 2015); RR: relative risk
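The RR row of Table 2 can be cross-checked directly from the failure rates reported in the same table (relative risk = nondominant rate ÷ dominant rate). A quick sketch; the dictionary labels are our shorthand, not the paper's:

```python
# Failure rates (%) from Table 2: (dominant, nondominant) language.
failure_rates = {
    "Digit Span ACSS":    (8.7, 41.3),
    "RDS":                (18.7, 39.9),
    "LDF":                (1.2, 21.2),
    "Animal Fluency raw": (16.3, 61.2),
    "Animal Fluency T":   (23.8, 78.9),
}

# Relative risk = nondominant failure rate / dominant failure rate.
rr = {label: nondom / dom for label, (dom, nondom) in failure_rates.items()}
# These agree with the published RR row to rounding; the authors presumably
# computed RR from raw counts, so the last digit can differ slightly.
```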
Although the failure rates on Rey-15 (12.4%) and Coding (14.2%) were also some of the lowest in the larger scale study by Erdodi et al. (2016), they were broadly consistent with values observed on other, more robust PVTs such as the Word Choice Test (27.0%), Digit Span (7.1–24.6%), Complex Ideational Material (13.1–19.4%), and Animal Fluency (11.7–21.6%). Furthermore, other empirical investigations found the Rey-15 useful at differentiating valid from invalid response sets in NSE (Morse et al. 2013; O'Bryant et al. 2003).
In addition, a more recent study by An et al. (2016) found that fluency tasks had a consistently higher failure rate than well-established, robust stand-alone PVTs in a sample of Canadian university students who volunteered to participate in academic research projects. Therefore, while the Rey-15 and Digit-Symbol Coding might still underestimate the true rate of noncredible responding in the current sample, the markedly different failure rates between PVTs with high and low verbal mediation cannot be solely attributed to instrumentation artifacts.
An additional possible confound in the present study
design is the difference in age and education between the
two criterion groups: the Arabic-dominant sample was
older and better educated than the English-dominant sam-
ple. While this likely reflects true population level differ-
ences (i.e., those with LEP likely immigrated to Canada
as adults, and therefore, they tended to be older; in turn,
older individuals had more time to advance their educa-
tion), both of these demographic variables have been
shown to influence performance on cognitive tests
(Heaton et al. 2004; Mitrushina et al. 1999). However,
our findings indicate that there was no systematic, clini-
cally relevant difference in failure rates as a function of
demographically corrected (Digit Span age-corrected
scaled score, Animal Fluency T-score) vs. raw score
(Reliable Digit Span, Longest Digits Forward, number
of animal names generated during Animal Fluency) based
cutoffs (Table 2).
There was considerable fluctuation in effect sizes associat-
ed with language dominance across instruments and cutoffs.
Within Digit Span, language proficiency had the strongest
relationship with the age-corrected scaled score (large effect).
Conversely, the Reliable Digit Span was the least affected by
language dominance (medium effect). Even though partici-
pants who were administered the test in their nondominant
language were twice as likely to fail this PVT compared to
when it was administered in their dominant language, the
Reliable Digit Span was nevertheless the most robust validity
indicator with high verbal mediation of the ones examined in
the present study. However, the effect of language dominance
on the likelihood of failing the Animal Fluency cutoffs was
very large. As such, the use of these cutoffs in examinees with
LEP is difficult to justify.
In the context of the landmark paper by Green et al. (2001) demonstrating that noncredible responding explained between 49 and 54% of variance in neuropsychological test scores, our data indicate that language proficiency accounts for 6–31% of the variance in failure rate on PVTs with high verbal mediation. Given that the base rate of failure on PVTs with low verbal mediation was zero regardless of language proficiency, most of the failures on PVTs with high verbal mediation are likely false positive errors. The implication of this finding for clinical practice is that PVTs
with high verbal mediation are unreliable indicators of
noncredible performance in examinees with LEP. At the
same time, our data support the continued use of PVTs
with low verbal mediation, provided that the examinee
was able to comprehend the test instructions.
Results should be interpreted in the context of the study's limitations. As discussed above, future studies
would benefit from using multiple, more sensitive PVTs,
especially in the low verbal mediation category. In addi-
tion, the sample is restricted to a single geographic area
and English-Arabic bilinguals. Replications based on par-
ticipants with diverse ethnic and linguistic backgrounds,
and using different instruments are crucial to establish the
generalizability of our findings.
Despite the common variability in results across stud-
ies (Leighton et al. 2014), the cumulative evidence con-
verges in one main conclusion: Cultural influences on
neuropsychological testing are significant, vary across
measures, and can significantly alter the clinical interpretation of the data. Depending on the context, the language in which the material is presented can have subtle (Erdodi and Lajiness-O'Neill 2012) or unexpectedly strong (Coffey et al. 2005) effects. Our finding that LEP can dramatically increase the failure rate on PVTs with high verbal mediation has far-reaching clinical and forensic implications, substantiates Bigler's (2012, 2015) concerns about inflated false positive rates in vulnerable populations, and warrants further investigation. In the
meantime, given the high cost of misclassifying an indi-
vidual as noncredible in both clinical and forensic assess-
ments, the use of PVTs with high verbal mediation in
individuals with LEP should either be avoided altogether,
or interpreted with caution.
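The cost of such misclassification can be made concrete with Bayes' rule. A brief sketch, using entirely hypothetical values for illustration: if a PVT with .60 sensitivity loses specificity under LEP (here, from .90 to .70), the probability that a given failure is truly noncredible drops sharply at a fixed base rate.

```python
def ppv(sensitivity, specificity, base_rate):
    """Positive predictive value: probability that a PVT failure
    reflects genuinely noncredible performance."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# Hypothetical: .60 sensitivity, 20% base rate of noncredible responding.
print(round(ppv(0.60, 0.90, 0.20), 2))  # 0.6  with intact specificity
print(round(ppv(0.60, 0.70, 0.20), 2))  # 0.33 when LEP erodes specificity
```

Under these illustrative assumptions, two of every three PVT failures in the LEP condition would be false positives, which is the core of the interpretive caution urged above.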
Compliance with Ethical Standards

Conflict of Interest This project received no financial support from outside funding agencies. The authors have no disclosures to make that could be interpreted as a conflict of interest.

Human and Animal Rights and Informed Consent Relevant ethical guidelines regulating research involving human participants were followed throughout the project. All data collection, storage, and processing were done in compliance with the Helsinki Declaration.
Psychol. Inj. and Law

References
An, K. Y., Kaploun, K., Erdodi, L. A., & Abeare, C. A. (2016).
Performance validity in undergraduate research participants: a com-
parison of failure rates across tests and cutoffs. The Clinical
Neuropsychologist. doi:10.1080/13854046.2016.1217046.
Advance online publication.
Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various Digit Span scores in the detection of suspect effort. The Clinical Neuropsychologist, 20, 145–159.
Barhon, L. I., Batchelor, J., Meares, S., Chekaluk, E., & Shores, E. A. (2015). A comparison of the degree of effort involved in the TOMM and the ACS Word Choice Test using a dual-task paradigm. Applied Neuropsychology: Adult, 22, 114–123.
Bigler, E. D. (2012). Symptom validity testing, effort and neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 632–642.
Bigler, E. D. (2015). Neuroimaging as a biomarker in symptom validity and performance validity testing. Brain Imaging and Behavior, 9,
Boone, K. B. (2007). Assessment of feigned cognitive impairment. A
neuropsychological perspective. New York, NY: Guilford.
Boone, K. B. (2013). Clinical practice of forensic neuropsychology.New
York, NY: Guilford.
Boone, K., Lu, P., & Herzberg, D. (2002). The dot counting test.Los
Angeles: Western Psychological Services.
Boone, K. B., Victor, T. L., Wen, J., Razani, J., & Ponton, P. (2007). The association between neuropsychological scores and ethnicity, language, and acculturation variables in a large patient population. Archives of Clinical Neuropsychology, 22, 355–365.
Borod, J. C., Goodglass, H., & Kaplan, E. (1980). Normative data on the Boston Diagnostic Aphasia Examination, Parietal Lobe Battery, and the Boston Naming Test. Journal of Clinical Neuropsychology, 2, 209–215.
Burton, V., Vilar-Lopez, R., & Puente, A. E. (2012). Measuring effort in neuropsychological evaluations of forensic cases of Spanish speakers. Archives of Clinical Neuropsychology, 27(3), 262–267.
Bush, S. S., Heilbronner, R. L., & Ruff, R. M. (2014). Psychological assessment of symptom and performance validity, response bias, and malingering: Official position of the Association for Scientific Advancement in Psychological Injury and Law. Psychological Injury and Law, 7, 197–205.
Coffey, D., Marmol, L., Schock, L., & Adams, W. (2005). The influence of acculturation on the Wisconsin Card Sorting Test by Mexican Americans. Archives of Clinical Neuropsychology, 20, 795–803.
Davis, J. J. (2014). Further consideration of Advanced Clinical Solutions Word Choice: Comparison to the Recognition Memory Test – Words and classification accuracy in a clinical sample. The Clinical Neuropsychologist, 28(8), 1278–1294.
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski, B., Zuccato, B. G., & Roth, R. M. (2016). Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV) processing speed scores as measures of non-credible responding: The third generation of embedded performance validity indicators. Psychological Assessment. Advance online publication. doi:10.1037/pas0000319
Erdodi, L. A., Jongsma, K. A., & Issa, M. (2016). The 15-item version of the Boston Naming Test as an index of English proficiency. The Clinical Neuropsychologist. Advance online publication. doi:10.1080/13854046.2016.1224392
Erdodi, L. A., Kirsch, N. L., Lajiness-O'Neill, R., Vingilis, E., & Medoff, B. (2014). Comparing the Recognition Memory Test and the Word Choice Test in a mixed clinical sample: Are they equivalent? Psychological Injury and Law, 7(3), 255–263.
Erdodi, L., & Lajiness-O'Neill, R. (2012). Humor perception in bilinguals: Is language more than a code? International Journal of Humor Research, 25(4), 459–468. doi:10.1515/humor-2012-0024
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE Complex Ideational Material – A measure of receptive language or performance validity? Psychological Injury and Law, 9, 112–120.
Erdodi, L. A., Tyson, B. T., Shahein, A., Lichtenstein, J. D., Abeare, C. A., Pelletiere, C. L., & Roth, R. M. (2016). The power of timing: Adding a time-to-completion cutoff to the Word Choice Test and Recognition Memory Test improves classification accuracy. Journal of Clinical and Experimental Neuropsychology. Advance online publication. doi:10.1080/13803395.2016.1230181
Erdodi, L. A., & Roth, R. M. (2016). Low scores on BDAE Complex Ideational Material are associated with invalid performance in adults without aphasia. Applied Neuropsychology: Adult. Advance online publication. doi:10.1080/23279095.2016.1154856
Goodglass, H., Kaplan, E., & Barresi, B. (2001). Boston Diagnostic Aphasia Examination (3rd ed.). Philadelphia: Lippincott Williams & Wilkins.
Green, P., Rohling, M. L., Lees-Haley, P. R., & Allen, L. M. (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15(12), 1045–1060.
Hayward, L., Hall, W., Hunt, M., & Zubrick, S. R. (1987). Can localized brain impairment be simulated on neuropsychological test profiles? Australian and New Zealand Journal of Psychiatry, 21, 87–93.
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised
comprehensive norms for an expanded Halstead-Reitan battery:
Demographically adjusted neuropsychological norms for African
American and Caucasian adults. Lutz, FL: Psychological
Assessment Resources.
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., & Millis, S. R. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23(7), 1093–1129.
Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., & Brennan, A. (2005). WAIS Digit-Span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12(4), 429–444.
Larrabee, G. J. (2012). Forensic neuropsychology: A scientific approach.
New York: Oxford University Press.
Leighton, A., Weinborn, M., & Maybery, M. (2014). Bridging the gap
between neurocognitive processing theory and performance validity
assessment among the cognitively impaired: A review and method-
ological approach. Journal of the International Neuropsychological
Society, 20,8
Lezak, M. D. (1995). Neuropsychological assessment. New York: Oxford
University Press.
Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012). Neuropsychological assessment. New York: Oxford University Press.
Lichtenstein, J. D., Erdodi, L. A., Rai, J. K., Mazur-Mosiewicz, A., & Flaro, L. (2017). Wisconsin Card Sorting Test embedded validity indicators developed for adults can be extended to children. Child Neuropsychology. Advance online publication.
Lichtenstein, J. D., Moser, R. S., & Schatz, P. (2014). Age and test setting affect the prevalence of invalid baseline scores on neurocognitive tests. American Journal of Sports Medicine, 42(2), 479–484.
Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness of the Rey-Osterrieth Complex Figure Test and the Meyers and Meyers recognition trial in the detection of suspect effort. The Clinical Neuropsychologist, 17(3), 426–440.
Mitrushina, M. N., Boone, K. B., & D'Elia, L. F. (1999). Handbook of normative data for neuropsychological assessment. New York, NY: Oxford University Press.
Moreno, E. M., & Kutas, M. (2005). Processing semantic anomalies in two languages: An electrophysiological exploration in both languages of Spanish-English bilinguals. Cognitive Brain Research,
Morse, C. L., Douglas-Newman, K., Mandel, S., & Swirsky-Sacchetti, T. (2013). Utility of the Rey-15 recognition trial to detect invalid performance in a forensic neuropsychological sample. The Clinical Neuropsychologist, 27(8), 1395–1407.
O'Bryant, S. E., Hilsabeck, R. C., Fisher, J. M., & McCaffrey, R. J. (2003). Utility of the Trail Making Test in the assessment of malingering in a sample of mild traumatic brain injury litigants. The Clinical Neuropsychologist, 17(1), 69–74.
Pearson (2009). Advanced Clinical Solutions for the WAIS-IV and WMS-IV: Technical manual. San Antonio, TX: Author.
Razani, J., Murcia, G., Tabares, J., & Wong, J. (2007). The effects of culture on the WASI test performance in ethnically diverse individuals. The Clinical Neuropsychologist, 21, 776–788.
Razani, J., Burciaga, J., Madore, M., & Wong, J. (2007). Effects of ac-
culturation on tests of attention and information processing in an
ethnically diverse group. Archives of Clinical Neuropsychology,
Rey, A. (1964). L'examen clinique en psychologie. Paris: Presses Universitaires de France.
Reznek, L. (2005). The Rey 15-item memory test for malingering: A meta-analysis. Brain Injury, 19(7), 539–543.
Rüsseler, J., Brett, A., Klaue, U., Sailer, M., & Münte, T. F. (2008). The effect of coaching on the simulated malingering of memory impairment. BMC Neurology, 8(37), 1–14.
effort tests in ethnic minorities and in non-English speaking and English as a second language populations. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp. 405–427). New York, NY: Guilford.
Sugarman, M. A., & Axelrod, B. N. (2015). Embedded measures of
performance validity using verbal fluency tests in a clinical sample.
Applied Neuropsychology: Adult, 22(2), 141–146.
Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 14(4), 597–607.
Vilar-Lopez, R., Gomez-Rio, M., Caracuel-Romero, A., Llamas-Elvira, J., & Perez-Garcia, M. (2008). Use of specific malingering measures in a Spanish sample. Journal of Clinical and Experimental Neuropsychology, 30(6), 710–722.
Youngjohn, J. R., Lees-Haley, P. R., & Binder, L. M. (1999). Comment: Warning malingerers produces more sophisticated malingering. Archives of Clinical Neuropsychology, 14, 511–515.
Psychol. Inj. and Law
... This observed pattern has been consistently reported in previous studies (Charchat-Fichman et al., 2011;Riva et al., 2000) and may be suggestive of a transition in the development of executive function and language skills (Anderson, 2002). Language proficiency has been shown to have a large effect on VF measures, so that, individuals with low language proficiency or performing tasks in a non-dominant language were at higher risk of failing the required verbal task (Brantuo et al., 2022;Erdodi, Jongsma, et al., 2017;Erdodi, Nussbaum, et al., 2017). ...
This study assessed the quantitative and qualitative performance of Lebanese-speaking children on verbal fluency (VF) tasks and investigated the effects of sociodemographic characteristics. This study included 219 Lebanese children aged between 5 and 12 years and 11 months, whose native language is Lebanese-Arabic. Semantic and letter VF tasks were assessed using a range of categories and letters. Switching and clustering strategies were analyzed for 177 Lebanese children. The number of words produced presented a significant increase with age (p < .004) in semantic (SVF), while in letter (LVF), the differences were significant between extreme age groups. Females generated more words in the clothes (p = .003) and household items (p = .002) categories. The total number of switches and clusters showed a significantly increasing pattern with age (p < .05). The number of switches was higher for participants with high maternal (p < .001) and paternal (p < .013) educational levels. Regression analyses showed that the total number of switches and clusters, and the mean cluster size had a significant effect on SVF performance (p < .001). The current study generated preliminary norms for VF tasks for Lebanese-speaking children. The results of the current study have an important contribution to neuropsychology research and clinical practice.
... The majority of the sample was White, constraining the generalizability of the results to racially more diverse populations-especially given the differential failure rates observed within this study. Relatedly, findings may not generalize to examinees with limited English proficiency, who tend to have a significant disadvantage on verbal memory tests in general Finlay et al., 2010) and are also at a higher risk of failing the WCT (Erdodi, Nussbaum, et al., 2017). Finally, time-to-completion for the WCT was not recorded. ...
Full-text available
Objective: This study was designed to replicate previous research on critical item analysis within the Word Choice Test (WCT). Method: Archival data were collected from a mixed clinical sample of 119 consecutively referred adults (Mage = 51.7, Meducation = 14.7). The classification accuracy of the WCT was calculated against psychometrically defined criterion groups. Results: Critical item analysis identified an additional 2%-5% of the sample that passed traditional cutoffs as noncredible. Passing critical items after failing traditional cutoffs was associated with weaker independent evidence of invalid performance, alerting the assessor to the elevated risk for false positives. Failing critical items in addition to failing select traditional cutoffs increased overall specificity. Non-White patients were 2.5 to 3.5 times more likely to Fail traditional WCT cutoffs, but select critical item cutoffs limited the risk to 1.5-2. Conclusions: Results confirmed the clinical utility of critical item analysis. Although the improvement in sensitivity was modest, critical items were effective at containing false positive errors in general, and especially in racially diverse patients. Critical item analysis appears to be a cost-effective and equitable method to improve an instrument's classification accuracy. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
... Previous research has indicated that language proficiency has an impact on participants' understanding of the task and questions (Nijdam-Jones & Rosenfeld, 2017). Similar was shown in other work focused on Performance Validity Tests (PVTs; Erdodi et al., 2017), hence, if the language influences PVT outcomes, one can assume that such effects are even stronger when using self-report SVTs. Yet, studies including asylum seekers provided evidence that SVT outcomes are significantly more impacted by the incentives behind the assessment rather than evaluees' language skills (van der Heide et al, 2017;van der Heide & Merckelbach, 2016). ...
Full-text available
Feigning causes personal and societal consequences, in both civil and criminal context. We investigated whether presenting the consequences of feigning can diminish symptom endorsement in feigned Posttraumatic Stress Disorder (PTSD). We randomly allocated non-native English speaking undergraduates (N = 145) to five conditions: 1) Truth tellers (n = 31), 2) Civil context feigners (n = 27), 3) Civil context warned feigners (n = 26), 4) Criminal context feigners (n = 29), and 5) Criminal context warned feigners (n = 32). All feigning groups received a vignette depicting a situation in which claiming PTSD would be beneficial. One vignette referred to the personal injury claim, whereas the second was about the aggravated assault charges. Additionally, one feigning group from each setting received information about the consequences of feigning (i.e., warned feigners). After receiving the instructions, all participants were administered the Self-Report Symptom Inventory (SRSI), a measure of symptom endorsement. Truth tellers endorsed fewer symptoms than all feigning groups, which mostly did not differ. Yet, criminal warned feigners (59%) were significantly less frequently detected on the SRSI as overreporters than other feigning groups (86.2%–89%). Hence, emphasizing the negative consequences of overreporting may diminish symptom endorsement, but only in high-stake situations. The implications and limitations (e.g., online measure administration) of this work are discussed.
... However, if the BNT is already routinely administered as an ability test for clinical reasons, the cumulative empirical evidence suggests that, at sufficiently high (and population-specific) cutoffs, it also provides strong evidence of invalid performance. Finally, given that the BNT is highly sensitive to limited English proficiency (Ali et al., 2020;Erdodi, Nussbaum, Sagar, Abeare, & Schwartz, 2017;Erdodi, Jongsma & Issa, 2017;Stalhammar et al., 2020), validity cutoffs should not be applied to examinees who are not native speakers of English to protect against unacceptably high false positive rates (Lippa, 2018). ...
Full-text available
This study was designed to examine alternative validity cutoffs on the Boston Naming Test (BNT). Archival data were collected from 206 adults assessed in a medicolegal setting following a motor vehicle collision. Classification accuracy was evaluated against three criterion PVTs. The first cutoff to achieve minimum specificity (.87-.88) was T ≤ 35, at .33-.45 sensitivity. T ≤ 33 improved specificity (.92-.93) at .24-.34 sensitivity. BNT validity cutoffs correctly classified 67–85% of the sample. Failing the BNT was unrelated to self-reported emotional distress. Although constrained by its low sensitivity, the BNT remains a useful embedded PVT.
... Alternative validity scales based on different detection methods (response consistency, rarely endorsed symptoms, logistic regression equations) should also be considered. Finally, in an increasingly diverse world, studies examining the cross-cultural validity of the TSI-2 would greatly expand the scope of the instrument (Ali et al., 2020;Erdodi et al., 2017). ...
Full-text available
This systematic review was performed to summarize existing research on the symptom validity scales within the Trauma Symptom Inventory–Second Edition (TSI-2), a relatively new self-report measure designed to assess the psychological sequelae of trauma. The TSI-2 has built-in symptom validity scales to monitor response bias and alert the assessor of non-credible symptom profiles. The Atypical Response scale (ATR) was designed to identify symptom exaggeration or fabrication. Proposed cutoffs on the ATR vary from ≥ 7 to ≥ 15, depending on the assessment context. The limited evidence available suggests that ATR has the potential to serve as measure of symptom validity, although its classification accuracy is generally inferior compared to well-established scales. While the ATR seems sufficiently sensitive to symptom over-reporting, significant concerns about its specificity persist. Therefore, it is proposed that the TSI-2 should not be used in isolation to determine the validity of the symptom presentation. More research is needed for development of evidence-based guidelines about the interpretation of ATR scores.
... Notably, racial diversity has been addressed minimally in EPVT research and as such, differences in EPVT performance and false positive rates among racially diverse examinees are largely unknown. However, some authors have reported findings raising concern for potentially higher false positive rates using certain EPVTs with African Americans and examinees for whom English is the nondominant language (e.g., Erdodi et al., 2017;Victor et al., 2009). Therefore, understanding the role of race and ethnicity in EPVT performance appears to be critical, and intentional effort to include these populations in future EPVT research is clearly warranted. ...
We tested the usefulness of six embedded performance validity tests (EPVTs) in identifying performance invalidity in a mixed clinical sample. Using a retrospective design, 181 adults were classified as valid (n=146) or invalid (n=35) performance based upon their performance on one of three standalone PVTs (Test of Memory Malingering, Victoria Symptom Validity Test, Dot Counting Test). Multiple cutoffs were identified corresponding to predetermined false positive rates of 0, 5, 10, and 15% for each of six EPVTs. EPVT cutoffs corresponding to the predetermined false positive benchmarks were generally more conservative than currently established scores. Sensitivity was low (.0%–42.9%) for individual EPVTs across these cutoffs and was moderately improved by the combination of multiple EPVT failures. The optimal number of EPVT failures using the 10% false positive rate was 2. Although the overall classification accuracy of 80.7% and specificity of 89.0% were comparable to prior research, the sensitivity of 45.7% was more modest than previous estimates. Low sensitivities indicate that this combination of EPVTs failed to detect a majority of invalid performers.
Full-text available
Base rates of failure (BRFail) on performance validity tests (PVTs) were examined in university students with limited English proficiency (LEP). BRFail was calculated for several free-standing and embedded PVTs. All free-standing PVTs and certain embedded indicators were robust to LEP. However, LEP was associated with unacceptably high BRFail (20–50%) on several embedded PVTs with high levels of verbal mediation (even multivariate models of PVT could not contain BRFail). In conclusion, failing free-standing/dedicated PVTs cannot be attributed to LEP. However, the elevated BRFail on several embedded PVTs in university students suggest an unacceptably high overall risk of false positives associated with LEP.
Full-text available
Objective The objective of the present study was to examine the neurocognitive profiles associated with limited English proficiency (LEP). Method A brief neuropsychological battery including measures with high (HVM) and low verbal mediation (LVM) was administered to 80 university students: 40 native speakers of English (NSEs) and 40 with LEP. Results Consistent with previous research, individuals with LEP performed more poorly on HVM measures and equivalent to NSEs on LVM measures—with some notable exceptions. Conclusions Low scores on HVM tests should not be interpreted as evidence of acquired cognitive impairment in individuals with LEP, because these measures may systematically underestimate cognitive ability in this population. These findings have important clinical and educational implications.
Full-text available
This study was designed to examine the effect of limited English proficiency (LEP) on the Hopkins Verbal Learning Test-Revised (HVLT-R). The HVLT-R was administered to 28 undergraduate student volunteers. Half were native speakers of English (NSE), half had LEP. The LEP sample performed significantly below NSE on individual acquisition trials and delayed free recall (large effects). In addition, participants with LEP scored 1.5-2 SDs below the normative mean. There was no difference in performance during recognition testing. LEP status was associated with a clinically significant deficit on the HVLT-R in a sample of cognitively healthy university students. Results suggest that low scores on auditory verbal learning tests in individuals with LEP should not be automatically interpreted as evidence of memory impairment or learning disability. LEP should be considered as grounds for academic accommodations. The generalizability of the findings is constrained by the small sample size.
The accuracy of performance validity tests (PVTs) with culturally diverse populations has increasingly been questioned. High false positive rates have been found in some PVTs in culturally and linguistically diverse individuals within the U.S. and internationally. No study to date has investigated the accuracy of PVTs with Chinese-speaking immigrants (CSI) in the U.S. The current study aimed to evaluate two PVTs, the Test of Memory Malingering (TOMM) and Dot Counting Test (DCT), to determine their accuracy in a community sample of CSI with limited English proficiency. These two measures were used in a simulation design, contrasting 52 participants who were instructed to respond honestly to 22 participants instructed to feign incompetency to stand trial. Results demonstrated the scores of TOMM Trial 1 and Trial 2 were effective in classifying honest responders from simulators, whereas the DCT E-score did not differentiate the groups better than chance. However, false positive rates for the TOMM Trial 1, Trial 2, and the DCT E-score were relatively low. Only one honest responder (1.9%) was classified as exerting insufficient effort in TOMM Trial 1 and DCT E-score, and the TOMM Trial 2 did not misclassify any honest responders. Implications and cautionary statements are provided and discussed.
Full-text available
Past studies have examined the ability of the Wisconsin Card Sorting Test (WCST) to discriminate valid from invalid performance in adults using both individual embedded validity indicators (EVIs) and multivariate approaches. This study is designed to investigate whether the two most stable of these indicators—failures to maintain set (FMS) and the logistical regression equation S-BLRE—can be extended to pediatric populations. The classification accuracy for FMS and S-BLRE was examined in a mixed clinical sample of 226 children aged 7 to 17 years (64.6% male, MAge = 13.6 years) against a combination of established performance validity tests (PVTs). The results show that at adult cutoffs, FMS and S-BLRE produce an unacceptably high failure rate (33.2% and 45.6%) and low specificity (.55–.72), but an upward adjustment in cutoffs significantly improves classification accuracy. Defining Pass as <2 and Fail as ≥4 on FMS results in consistently good specificity (.89–.92) but low and variable sensitivity (.00–.33). Similarly, cutting the S-BLRE distribution at 3.68 produces good specificity (.90–.92) but variable sensitivity (.06–.38). Passing or failing FMS or S-BLRE is unrelated to age, gender and IQ. The data from this study suggest that in a pediatric sample, adjusted cutoffs on the FMS and S-BLRE ensure good specificity, but with low or variable sensitivity. Thus, they should not be used in isolation to determine the credibility of a response set. At the same time, they can make valuable contributions to pediatric neuropsychology by providing empirically-supported, expedient and cost-effective indicators to enhance performance validity assessment.
Full-text available
Introduction: The Recognition Memory Test (RMT) and Word Choice Test (WCT) are structurally similar, but psychometrically different. Previous research demonstrated that adding a time-to-completion cutoff improved the classification accuracy of the RMT. However, the contribution of WCT time-cutoffs to improve the detection of invalid responding has not been investigated. The present study was designed to evaluate the classification accuracy of time-to-completion on the WCT compared to the accuracy score and the RMT. Method: Both tests were administered to 202 adults (Mage = 45.3 years, SD = 16.8; 54.5% female) clinically referred for neuropsychological assessment in counterbalanced order as part of a larger battery of cognitive tests. Results: Participants obtained lower and more variable scores on the RMT (M = 44.1, SD = 7.6) than on the WCT (M = 46.9, SD = 5.7). Similarly, they took longer to complete the recognition trial on the RMT (M = 157.2 s,SD = 71.8) than the WCT (M = 137.2 s, SD = 75.7). The optimal cutoff on the RMT (≤43) produced .60 sensitivity at .87 specificity. The optimal cutoff on the WCT (≤47) produced .57 sensitivity at .87 specificity. Time-cutoffs produced comparable classification accuracies for both RMT (≥192 s; .48 sensitivity at .88 specificity) and WCT (≥171 s; .49 sensitivity at .91 specificity). They also identified an additional 6-10% of the invalid profiles missed by accuracy score cutoffs, while maintaining good specificity (.93-.95). Functional equivalence was reached at accuracy scores ≤43 (RMT) and ≤47 (WCT) or time-to-completion ≥192 s (RMT) and ≥171 s (WCT). Conclusions: Time-to-completion cutoffs are valuable additions to both tests. They can function as independent validity indicators or enhance the sensitivity of accuracy scores without requiring additional measures or extending standard administration time.
Full-text available
Objective: The present study was designed to examine the potential of the Boston Naming Test - Short Form (BNT-15) to provide an objective estimate of English proficiency. A secondary goal was to examine the effect of limited English proficiency (LEP) on neuropsychological test performance. Method: A brief battery of neuropsychological tests was administered to 79 bilingual participants (40.5% male, MAge = 26.9, MEducation = 14.2). The majority (n = 56) were English dominant (EN), and the rest were Arabic dominant (AR). The BNT-15 was further reduced to 10 items that best discriminated between EN and AR (BNT-10). Participants were divided into low, intermediate, and high English proficiency subsamples based on BNT-10 scores (≤6, 7-8, and ≥9). Performance across groups was compared on neuropsychological tests with high and low verbal mediation. Results: The BNT-15 and BNT-10 respectively correctly identified 89 and 90% of EN and AR participants. Level of English proficiency had a large effect (partial η(2) = .12-.34; Cohen's d = .67-1.59) on tests with high verbal mediation (animal fluency, sentence comprehension, word reading), but no effect on tests with low verbal mediation (auditory consonant trigrams, clock drawing, digit-symbol substitution). Conclusions: The BNT-15 and BNT-10 can function as indices of English proficiency and predict the deleterious effect of LEP on neuropsychological tests with high verbal mediation. Interpreting low scores on such measures as evidence of impairment in examinees with LEP would likely overestimate deficits.
Full-text available
Objective: This study compared failure rates on performance validity tests (PVTs) across liberal and conservative cutoffs in a sample of undergraduate students participating in academic research. Method: Participants (n = 120) were administered four free-standing PVTs (Test of Memory Malingering, Word Memory Test, Rey 15-Item Test, Hiscock Forced-Choice Procedure) and three embedded PVTs (Digit Span, letter and category fluency). Participants also reported their perceived level of effort during testing. Results: At liberal cutoffs, 36.7% of the sample failed ≥1 PVTs, 6.7% failed ≥2, and .8% failed 3. At conservative cutoffs, 18.3% of the sample failed ≥1 PVTs, 2.5% failed ≥2, and .8% failed 3. Participants were 3 to 5 times more likely to fail embedded (15.8-30.8%) compared to free-standing PVTs (3.3-10.0%). There was no significant difference in failure rates between native and non-native English speaking participants at either liberal or conservative cutoffs. Additionally, there was no relation between self-reported effort and PVT failure rates. Conclusions: Although PVT failure rates varied as a function of PVTs and cutoffs, between a third and a fifth of the sample failed ≥1 PVTs, consistent with high initial estimates of invalid performance in this population. Embedded PVTs had notably higher failure rates than free-standing PVTs. Assuming optimal effort in research using students as participants without a formal assessment of performance validity introduces a potentially significant confound in the study design.
Scores on the Complex Ideational Material (CIM) were examined in reference to various performance validity tests (PVTs) in 106 adults clinically referred for neuropsychological assessment. The main diagnostic categories, reflecting a continuum between neurological and psychiatric disorders, were epilepsy, psychiatric disorders, postconcussive disorder, and psychogenic non-epileptic seizures. Cross-validation analyses suggest that in the absence of bona fide aphasia, a raw score ≤9 or T score ≤29 on the CIM is more likely to reflect non-credible presentation than impaired receptive language skills. However, these cutoffs may be associated with unacceptably high false positive rates in patients with longstanding, documented neurological deficits. Therefore, more conservative cutoffs (≤8/23) are recommended in such populations. Contrary to the widely accepted assumption that psychiatric disorders are unrelated to performance validity, results were consistent with the psychogenic interference hypothesis, suggesting that emotional distress increases the likelihood of PVT failures even in the absence of apparent external incentives to underperform on cognitive testing.
Research suggests that select processing speed measures can also serve as embedded validity indicators (EVIs). The present study examined the diagnostic utility of Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) subtests as EVIs in a mixed clinical sample of 205 patients medically referred for neuropsychological assessment (53.3% female, mean age = 45.1). Classification accuracy was calculated against 3 composite measures of performance validity as criterion variables. A PSI ≤79 produced a good combination of sensitivity (.23-.56) and specificity (.92-.98). A Coding scaled score ≤5 resulted in good specificity (.94-1.00), but low and variable sensitivity (.04-.28). A Symbol Search scaled score ≤6 achieved a good balance between sensitivity (.38-.64) and specificity (.88-.93). A Coding-Symbol Search scaled score difference ≥5 produced adequate specificity (.89-.91) but consistently low sensitivity (.08-.12). A 2-tailed cutoff on the Coding/Symbol Search raw score ratio (≤1.41 or ≥3.57) produced acceptable specificity (.87-.93), but low sensitivity (.15-.24). Failing ≥2 of these EVIs produced variable specificity (.81-.93) and sensitivity (.31-.59). Failing ≥3 of these EVIs stabilized specificity (.89-.94) at a small cost to sensitivity (.23-.53). Results suggest that processing-speed-based EVIs have the potential to provide a cost-effective and expedient method for evaluating the validity of cognitive data. Given their generally low and variable sensitivity, however, they should not be used in isolation to determine the credibility of a given response set. They also produced unacceptably high rates of false positive errors in patients with moderate-to-severe head injury. Combining evidence from multiple EVIs has the potential to improve overall classification accuracy.
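The sensitivity and specificity figures reported in these abstracts follow the standard confusion-matrix definitions for a "fail if score ≤ cutoff" rule. A minimal sketch of that arithmetic, using made-up score distributions (illustrative values only, not data from any of the studies above):

```python
def classification_stats(scores_invalid, scores_valid, cutoff):
    """Sensitivity/specificity of a 'score <= cutoff' failure rule.

    scores_invalid: scores from the non-credible (criterion-positive) group
    scores_valid:   scores from the credible (criterion-negative) group
    """
    true_pos = sum(s <= cutoff for s in scores_invalid)  # invalid correctly flagged
    true_neg = sum(s > cutoff for s in scores_valid)     # valid correctly passed
    sensitivity = true_pos / len(scores_invalid)
    specificity = true_neg / len(scores_valid)
    return sensitivity, specificity

# Hypothetical score distributions for the two criterion groups
invalid = [3, 4, 5, 5, 6, 7, 8, 9]
valid = [5, 6, 7, 7, 8, 8, 9, 9, 10, 10]
sens, spec = classification_stats(invalid, valid, cutoff=5)
# With these values: sensitivity = 4/8 = .50, specificity = 9/10 = .90
```

Lowering the cutoff trades sensitivity for specificity, which is why the abstracts report more conservative cutoffs (with higher specificity but lower sensitivity) for populations where false positives are costlier.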
Complex Ideational Material (CIM) is a sentence comprehension task designed to detect pathognomonic errors in receptive language. Nevertheless, patients with apparently intact language functioning occasionally score in the impaired range. If these instances reflect poor test-taking effort, CIM has potential as a performance validity test (PVT). Indeed, in 68 adults medically referred for neuropsychological assessment, CIM was a reliable marker of psychometrically defined invalid responding. A raw score ≤9 or T-score ≤29 achieved acceptable combinations of sensitivity (.34-.40) and specificity (.82-.90) against two reference PVTs, and produced a zero overall false positive rate when scores on all available PVTs were considered. More conservative cutoffs (≤8/≤23) with higher specificity (.95-1.00) but lower sensitivity (.14-.17) may be warranted in patients with longstanding, documented neurological deficits. Overall, results indicate that in the absence of overt aphasia, poor performance on CIM is more likely to reflect invalid responding than true language impairment. The implications for the clinical interpretation of CIM are discussed.
How neuropsychological assessment findings are deemed valid has been the topic of numerous articles, but few have addressed the role that neuroimaging studies could play. High rates of failure on measures of symptom validity testing (SVT) and/or performance validity testing (PVT) have been reported in military and various clinical samples of individuals undergoing neuropsychological evaluations. When "failure" is defined as performance below a predetermined cut-score on an SVT/PVT measure, are such failures always indicative of invalid test findings, or are there other explanations, particularly in light of informative neuroimaging findings? This review starts with the premise that even though the SVT/PVT task is designed to be simple and easy to perform, it nonetheless requires intact frontoparietal attention, working memory, and task engagement (motivation) networks. If there is damage or pathology within any aspect of these networks, as demonstrated by neuroimaging findings, the patient may perform below the cut-point as a result of the underlying damage or pathophysiology. The argument is made that neuroimaging findings should be considered when SVT/PVT cut-points are established, and that there should be much greater flexibility in interpreting SVT/PVT measures based on other personal, demographic, and neuroimaging information. Several case studies are used to demonstrate these points.