Limited English Proficiency Increases Failure Rates
on Performance Validity Tests with High Verbal Mediation
Laszlo A. Erdodi
& Christopher A. Abeare
Eben S. Schwartz
Received: 11 October 2016 /Accepted: 18 January 2017
© Springer Science+Business Media New York 2017
Abstract This study was designed to examine the effect of
language proficiency and level of verbal mediation on failure
rates on performance validity tests (PVTs). PVTs with high
and low verbal mediation were administered to 80 healthy
community-dwelling English-Arabic bilinguals. Digit Span
and Animal Fluency were administered in both English and
Arabic, in counterbalanced order, as part of a brief battery of
neuropsychological tests. Participants with Arabic as their
dominant language were 2 to 16 times more likely to fail
PVTs with high verbal mediation compared to native speakers
of English. When Digit Span and Animal Fluency were ad-
ministered in the nondominant language, participants were 2
to 18 times more likely to fail the validity cutoffs. Language
dominance explained between 6 and 31% of variance in di-
chotomized outcomes (pass/fail) on PVTs with high verbal
mediation. None of the participants failed any PVTs with
low verbal mediation. Limited language proficiency may re-
sult in dramatic increases in false positive rates on PVTs with
high verbal mediation. Failure on a PVT administered in
English to an examinee with a different linguistic background
should be interpreted with great caution.
Keywords Cross-cultural neuropsychology . Performance
validity testing . Bilingualism . Word Choice Test . Complex
Consistently demonstrating one’s highest, or at least typi-
cal, level of functioning is a basic assumption underlying
neuropsychological assessment. Bigler (2015) called this
assumption the “Achilles heel” of cognitive testing, which
is a fitting metaphor that acknowledges a significant vul-
nerability in an otherwise strong design. This potential
threat to the clinical utility of test data has become a source
of ongoing controversy that has polarized the profession.
On the one hand, there is a growing consensus that the
credibility of test scores cannot be assumed but must be
evaluated using objective, empirical measures (Bush et al.
2014; Heilbronner et al. 2009). On the other hand, con-
cerns about the clinical and forensic interpretation of per-
formance validity tests (PVTs) have been accumulating.
These concerns include the high cost of false positive
errors, lack of a gold standard measure, unclear clinical
interpretation of scores in the failing range (i.e., below-
cutoff performance may indicate malingering, low moti-
vation or the expression of a disease process), poorly un-
derstood relationship between PVTs and neuroanatomy/
neurophysiology, and genuine, severe neurocognitive
impairment as a confound (Bigler 2012, 2015; Boone 2013).
Demographic variables like age (Lichtenstein et al. 2017;
Lichtenstein et al. 2014) and education (Pearson 2009)
have also been reported to influence base rates of failure
on PVTs. In addition, Leighton et al. (2014) pointed out
that the effect of the variability in testing paradigm, sen-
sory modality, or other stimulus properties of PVTs on the
probability of failure has not been studied systematically.
Native level English proficiency is another, less commonly
examined assumption in neuropsychology. Most tests were de-
veloped for and normed on native speakers of English (NSE).
The extent to which these tests provide a valid measure of
*Laszlo A. Erdodi
Department of Psychology, University of Windsor, 168 Chrysler Hall
South, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
Waukesha Memorial Hospital, Waukesha, WI, USA
Psychol. Inj. and Law
cognitive ability in individuals with limited English proficiency
(LEP) was largely unknown until recent investigations. On
purely rational grounds, LEP is expected to deflate perfor-
mance primarily on tests with high verbal mediation (i.e., tasks
for which being NSE is a fundamental requirement for exam-
inees to demonstrate their true ability level on cognitive tests).
Surprisingly, some early studies were inconsistent with
this hypothesis. Coffey et al. (2005) found that level of
acculturation was significantly related to scores on the
Wisconsin Card Sorting Test—an instrument that, at face
value, appears to have low verbal mediation. Although
acculturation is not synonymous with level of English pro-
ficiency, the two constructs are highly related. Their com-
munity sample of Mexican Americans performed poorly
on all nine key variables examined (medium to very large
effects) compared to the English norms. At the same time,
no differences were found compared to the Spanish norms.
Therefore, the authors concluded that the Wisconsin Card
Sorting Test was not a culture-free measure.
Subsequent studies, however, found that the deleterious
effect of LEP was limited to tests with high verbal media-
tion. Razani et al. (2007) reported a significant difference
with large effect between their NSE and LEP samples on
verbal IQ, but no difference on performance IQ or even full
scale IQ. Likewise, the results of the study by Boone et al.
(2007) were broadly consistent with the hypothesis that the
level of verbal mediation drives the performance differ-
ences between NSE and LEP. Most of the significant con-
trasts within a battery of neuropsychological tests were ob-
served on measures with high verbal mediation: Digit Span,
Letter Fluency, and picture naming (medium to large ef-
fects). Surprisingly, on a measure of visual-constructional
ability (the copy trial of the Rey Complex Figure Test), the
LEP group outperformed NSE (medium effect), suggesting
that the lower scores on the other tests are not due to inher-
ent differences in global cognitive functioning.
Findings by Razani et al. (2007) provide further evi-
dence that the level of verbal mediation accounts for a sig-
nificant portion of variability in the neuropsychological
profile of individuals with LEP. A large effect was observed
on Digit Span between NSE and LEP, but no difference on
Digit-Symbol Coding. In addition, between-group differ-
ences were more likely to emerge on the more difficult trials
of the Trail Making Test, Stroop, and Auditory Consonant
Trigrams. However, more recent investigations failed to
replicate this with the Auditory Consonant Trigrams
(Erdodi et al. 2016), raising the possibility that the effect
of English proficiency on test performance varies not only
across instruments, but also across samples. Overall, the
evidence suggests that while LEP does not affect perfor-
mance on nonverbal processing speed tasks, its deleterious
effects are more likely to become apparent as the task de-
mands or the level of verbal mediation increases.
The issue of performance validity and English proficiency
is compounded in the interpretation of PVTs developed in
North America and administered to individuals with LEP.
Salazar et al. (2007) were among the first to examine the
confluence of these two factors. They found that NSE
outperformed the LEP group on the Reliable Digit Span, while
the opposite was the case for the Rey Complex Figure Test
effort equation (Lu et al. 2003). There was no difference be-
tween the groups on the Dot Counting Test (DCT; Boone et al.
free-standing PVTs specifically designed to evaluate the cred-
ibility of a response set.
Burton et al. (2012) compared the performance of Spanish-
speaking examinees across settings (clinical, criminal, and
civil forensic) and instruments. The Test of Memory
Malingering and the Rey-15 at standard cutoffs were effective
at differentiating the groups, while the DCT was not. These
results further emphasize the complicated interaction between
language proficiency, referral source, level of verbal media-
tion, and idiosyncratic stimulus properties of individual PVTs,
consistent with earlier investigations that concluded that PVTs
with low verbal mediation administered in Spanish are capa-
ble of distinguishing between credible and noncredible indi-
viduals (Vilar-Lopez et al. 2008).
The literature reviewed above converges in a number of
tentative conclusions. First, LEP reliably deflates performance
on tests with high verbal mediation. Second, the deleterious
effects of LEP extend to some tests with low verbal mediation
and tend to become more pronounced as task complexity in-
creases. Third, the persistent negative findings on certain low
verbal mediation tests when comparing NSE and LEP, in com-
bination with the occasional superiority of the LEP groups,
suggest that the observed differences are unlikely to be simply
driven by lower language proficiency or overall cognitive
functioning of individuals with LEP. Instead, they may reflect
cultural differences in cognitive processing or approaches to
testing. Finally, this pattern of findings results in a predictable
increase in the base rates of failure, but this effect is limited to
PVTs with high verbal mediation. Given the potentially high
stakes of performance validity assessment (i.e., determining
the credibility of an individual’s overall presentation, and
resultant assessment interpretation and clinical
recommendations), further investigation of the topic seems warranted.
One of the notable limitations of existing research is the
lack of direct comparison between performance in English
and the participants’ other (dominant) language. The present
study was designed to address that. In addition to replicating
elements of earlier investigations (a strategic mixture
of PVTs with high and low verbal mediation) in a
different, non-Spanish speaking bilingual sample, two of
the tests were administered in both English and Arabic.
Therefore, the concepts of “dominant language” and NSE
could be conceptually separated and studied
independently. Based on the research literature reviewed
above, we hypothesized that limited proficiency in the lan-
guage of test administration will be associated with higher
failure rate on PVTs with high verbal mediation. No dif-
ference was predicted on PVTs with low verbal mediation.
Eighty healthy English-Arabic bilinguals were recruited for an
academic research project through a research participant pool
of a mid-sized Canadian university and the surrounding com-
munity. The study was approved by the institutional review
board. Relevant ethical guidelines regulating research with
human participants were followed throughout the project.
Mean age of the sample was 26.8 years (SD = 16.0). Mean
level of education was 14.2 years (SD = 1.7). The majority of
the participants were female (60%) and English-dominant
(71.2%). Language dominance was determined based on a
combination of self-reported relative language proficiency,
language use pattern, and immigration history. Participants
were asked to rate their proficiency in English and Arabic
relative to each other, in percentage, so that the two ratings
add up to 100, following the methodology described by
Erdodi and Lajiness-O’Neill (2012).
For example, stable bilinguals would rate themselves as 50/
50, indicating that they are equally proficient in both lan-
guages. Participants who immigrated to Canada as adults
would rate themselves 40/60 or 30/70, indicating that they
speak Arabic better than English. These individuals were clas-
sified as Arabic-dominant, and thus, as having LEP.
Conversely, participants who were born in Canada, grew up
as NSE, and had limited proficiency in Arabic would rate
themselves as 60/40 or 70/30, and were classified as
English-dominant.
Two PVTs with high verbal mediation (the Word Choice Test
and Complex Ideational Material) and two with low verbal
mediation (Rey-15 and Digit-Symbol Coding) were administered
in English only. The Word Choice Test (Pearson 2009) is
a free-standing PVT based on the forced choice recognition
paradigm. The examinee is presented with 50 words, one at a
time, at the rate of 3 s per word. The words are printed on a
card and simultaneously read aloud by the examiner. After the
learning trial, the examinee is presented with a card containing
50 word pairs (a target and a foil) and asked to identify the
word that was part of the original list.
Given the high imageability and concreteness in combina-
tion with low word frequency (Davis 2014), discriminating
between targets and foils during the recognition trial of the
WCT is very easy. Even in clinical settings, credible patients
tend to perform near the ceiling, with means ranging from
49.1 (Davis 2014) to 49.4 (Erdodi et al. 2014), which is com-
parable to the performance of university students in research
settings (49.4; Barhon et al. 2015).
Complex Ideational Material is part of the Boston
Diagnostic Aphasia Battery (Goodglass et al. 2001). It is a
sentence comprehension task originally designed to aid in
diagnosing and subtyping aphasia. The examiner asks a series
of yes/no questions of increasing difficulty to evaluate the
examinee’s receptive language skills. Raw scores range from
0 to 12. The average performance in the normative sample
was close to the ceiling (M = 11.2, SD = 1.1; Borod et al.
1980). Recent investigations revealed that in individuals with-
out bona fide aphasia, a low score on Complex Ideational
Material is a reliable indicator of invalid performance
(Erdodi and Roth 2016; Erdodi et al. 2016).
The Rey-15 is one of the oldest stand-alone PVTs (Rey
1964). The examinee is presented with a card with five rows,
each having three sequentially organized symbols for 10 s.
The task is to reproduce as many of the original items as
possible. Given the simplicity of the task, healthy controls
produce near-perfect scores. Although performance is not im-
mune to genuine neurological impairment, the Rey-15 is gen-
erally robust to the deleterious effects of brain injury (Lezak
et al. 2012), making it suitable as a PVT (Boone 2013; Morse
et al. 2013; O’Bryant et al. 2003). However, the low
sensitivity of the Rey-15 has been repeatedly identified as a liability
(Reznek 2005; Rüsseler et al. 2008).
The Digit-Symbol Coding subtest of the Wechsler Adult
Intelligence Scales is a timed symbol substitution task mea-
suring attention, visual scanning, and psychomotor processing
speed. Although sensitive to the effect of diffuse neuropsychi-
atric deficits (Lezak et al. 2012), below certain cutoffs perfor-
mance on Coding is confounded by invalid responding.
Therefore, this test can function as an effective embedded
validity indicator (Erdodi et al. 2016; Trueblood 1994).
All tests were administered according to the standard proce-
dures outlined in the technical manual by a trained research
assistant who was fluent in both English and Arabic.
Participants were instructed to perform to the best of their
ability. However, they were not warned about the presence
of PVTs, following recommendations based on previous em-
pirical research on the negative effects of sensitizing exam-
inees to the issue of performance validity (Boone 2007;
Youngjohn et al. 1999).
Digit Span and Animal Fluency were administered in both
languages, in counterbalanced order, once at the beginning
and once at the end of the test battery. In addition to measuring
auditory attention, working memory, language skills, and pro-
cessing speed, both tasks are well-established embedded va-
lidity indicators (Boone 2013; Sugarman and Axelrod 2015).
The main descriptive statistics were failure rate (percent
of the sample that scored below the validity cutoffs) and
relative risk. Statistical significance was determined using
t tests or χ². Effect size estimates were expressed in Φ².
Relative risk ratios were computed to provide a single-number
comparison of failure rates between the English-
and Arabic-dominant groups.
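The statistics named in this paragraph can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis script: it assumes an uncorrected Pearson χ² on a 2 × 2 pass/fail table and takes Φ² as χ²/N, and the class and function names are hypothetical. The example counts are the Complex Ideational Material raw-score failures from Table 1 (9 of 57 English-dominant participants; 11 of 23 Arabic-dominant participants, the latter reconstructed from the reported 47.8%).

```python
from dataclasses import dataclass


@dataclass
class PassFailTable:
    """Failure counts for a reference group and a comparison group."""
    fail_ref: int   # failures in the reference group (e.g., English-dominant)
    n_ref: int      # size of the reference group
    fail_cmp: int   # failures in the comparison group (e.g., Arabic-dominant)
    n_cmp: int      # size of the comparison group


def relative_risk(t: PassFailTable) -> float:
    """Failure rate of the comparison group divided by that of the reference group."""
    if t.fail_ref == 0:
        # Undefined when nobody in the reference group fails.
        raise ValueError("relative risk is undefined for a 0% reference failure rate")
    return (t.fail_cmp / t.n_cmp) / (t.fail_ref / t.n_ref)


def chi_square(t: PassFailTable) -> float:
    """Pearson chi-square for the 2 x 2 table, without continuity correction."""
    observed = [
        [t.fail_ref, t.n_ref - t.fail_ref],
        [t.fail_cmp, t.n_cmp - t.fail_cmp],
    ]
    total = t.n_ref + t.n_cmp
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    if 0 in col_totals:
        # No failures (or no passes) at all: expected counts would be zero.
        raise ValueError("chi-square is undefined when a column total is zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def phi_squared(t: PassFailTable) -> float:
    """Effect size for a 2 x 2 table: chi-square divided by the total N."""
    return chi_square(t) / (t.n_ref + t.n_cmp)


# Complex Ideational Material, raw-score cutoff (counts as described above).
cim_raw = PassFailTable(fail_ref=9, n_ref=57, fail_cmp=11, n_cmp=23)
print(round(relative_risk(cim_raw), 2))
print(round(chi_square(cim_raw), 2))
print(round(phi_squared(cim_raw), 2))
```

Under these assumptions the sketch gives a relative risk of about 3.03 and a χ² of about 8.97 for this contrast, in line with the corresponding Table 1 entries.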
As a group, English-dominant participants were significantly
younger (M = 20.2 years, SD = 2.5) than Arabic-dominant
participants (M = 42.9 years, SD = 22.9): t(78) = 7.44, p < .001.
The Arabic-dominant sample had significantly higher levels
of education (M = 15.0 years, SD = 1.5) compared to their
English-dominant counterparts (M = 13.8 years, SD = 1.7):
t(78) = 3.00, p < .01. There was no significant difference in
the gender ratio between the two groups (36.8 vs. 47.8%):
χ²(1) = 0.82, p = .36.
The English/Arabic Animal Fluency raw score ratio was
significantly higher and more variable (M = 2.62, SD = 1.55)
in the English-dominant sample as compared to the
Arabic-dominant sample (M = 0.90, SD = 0.29): t(78) = 5.25, p < .001,
d = 1.54 (very large effect). In other words, English-dominant
participants generated, on average, 2.6 times more animal
names in English than Arabic. Conversely, the output of
Arabic-dominant participants on the English version was
around 90% of their performance in Arabic. Likewise, there
was a pronounced difference on the Boston Naming Test –
Short Form between the English-dominant (M = 11.8,
SD = 2.2) and Arabic-dominant (M = 7.1, SD = 3.1)
subsamples: t(78) = 7.69, p < .001, d = 1.75 (very large effect).
Performance on this test has been identified as a reliable
indicator of language proficiency (Erdodi et al. 2016; Moreno and
Kutas 2005). These findings provide empirical support for
classifying participants into the two groups based on language
dominance (i.e., English vs. Arabic) in addition to the self-
rated language proficiency.
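The text does not state which formula was used for d. As a hedged illustration, the sketch below computes Cohen's d from the reported means and standard deviations, standardizing by the square root of the unweighted average of the two group variances. That choice is an assumption, and the function name is hypothetical; it is shown because it reproduces both reported values to two decimals.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Standardized mean difference using the unweighted average of the two variances."""
    averaged_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / averaged_sd


# English/Arabic Animal Fluency ratio: English-dominant vs. Arabic-dominant
print(round(cohens_d(2.62, 1.55, 0.90, 0.29), 2))

# Boston Naming Test - Short Form: English-dominant vs. Arabic-dominant
print(round(cohens_d(11.8, 2.2, 7.1, 3.1), 2))
```

A standardizer weighted by sample size would give a smaller estimate for the fluency ratio, because the larger English-dominant group also has the larger standard deviation.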
The Arabic-dominant sample was 16 times more likely to
fail the Word Choice Test accuracy scores and two to three
times more likely to fail Complex Ideational Material com-
pared to the English-dominant sample. Since only Arabic-
dominant participants failed the Word Choice Test time cutoff,
a risk ratio could not be computed. All contrasts comparing
the PVT failure rates as a function of language dominance
(defined as English or Arabic) were statistically significant.
No participant failed the Rey-15 or Digit-Symbol Coding in
the entire sample (Table 1).
Participants were almost five times more likely to fail the
age-corrected scaled score cutoff when the Digit Span task
was administered in their nondominant language. This
contrast was associated with a large effect size (Φ² = .14).
Compared to the dominant language, risk of failing Reliable
Digit Span doubled during the nondominant language
administration (medium effect size: Φ² = .06). Relative risk was
the highest on longest digits forward: nondominant language
administration carried an almost 18-fold risk of failure
(medium-large effect size: Φ² = .10).
On Animal Fluency, when the task was administered in
the nondominant language, participants were three to four
times more likely to score in the failing range. All con-
trasts were statistically significant (Table 2). The effect of
language dominance was more pronounced on
demographically adjusted T-scores (Φ² = .31) as compared to
raw scores (Φ² = .22). However, both of these effect size
estimates fall in the very large range.
The present study was designed to examine failure rate on
PVTs with high and low verbal mediation in an English-
Arabic bilingual sample. Consistent with previous reports
(Boone et al. 2007; Razani et al. 2007) and our initial hypoth-
esis, when PVTs with high verbal mediation were adminis-
tered to participants with LEP, failure rates were two to 16
times higher as compared to NSE. As in earlier studies, no
difference was observed on PVTs with low verbal mediation
(Salazar et al. 2007), suggesting that the difference in relative
risk for PVT failure between the LEP and NSE samples rep-
resents false positive errors.
When Digit Span and Animal Fluency were administered
in the participant’s nondominant language, they were two to
18 times more likely to fail validity cutoffs as compared to
when these tests were administered in their dominant lan-
guage. These within-individual contrasts provide a conceptual
control condition that enhances the interpretation of the data,
by redefining the independent variable from “English vs.
Arabic” to “dominant vs. nondominant language.” Hence,
they examine the effect of language proficiency as presented
with a specific cognitive task in a given language, regardless
of the individual’s language dominance. As such, English-
dominant participants presented with a task in English are
grouped together with Arabic-dominant participants presented
with a task in Arabic and compared to English-dominant
participants presented with a task in Arabic grouped together with
Arabic-dominant participants presented with a task in English.
These comparisons model the effect of the cognitive vulnera-
bility stemming from limited language proficiency that
manifests as poor performance during neuropsychological as-
sessment. If the test in question is a PVT, the end result is a
substantially higher risk of failing the cutoff and, therefore,
being erroneously labeled as “invalid” or “noncredible.”
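The regrouping described in this paragraph amounts to recoding each test administration by whether the test language matches the examinee's dominant language. The sketch below shows that recoding on a handful of made-up records. The record layout, the names, and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Administration(NamedTuple):
    """One administration of a test to one participant (hypothetical layout)."""
    dominant_language: str  # "English" or "Arabic"
    test_language: str      # language the test was given in
    failed: bool            # True if the score fell below the validity cutoff


def condition(record: Administration) -> str:
    """Recode 'English vs. Arabic' into 'dominant vs. nondominant'."""
    if record.test_language == record.dominant_language:
        return "dominant"
    return "nondominant"


def failure_rates(records: Iterable[Administration]) -> Dict[str, float]:
    """Proportion of failed administrations within each recoded condition."""
    fails: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        key = condition(record)
        totals[key] += 1
        fails[key] += int(record.failed)
    return {key: fails[key] / totals[key] for key in totals}


# Made-up records for illustration only.
toy_records = [
    Administration("English", "English", False),
    Administration("English", "Arabic", True),
    Administration("Arabic", "Arabic", False),
    Administration("Arabic", "English", False),
]
print(failure_rates(toy_records))
```

With this recoding, an English-dominant participant tested in Arabic and an Arabic-dominant participant tested in English fall into the same "nondominant" cell, which is the grouping the within-individual contrasts rely on.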
Our results suggest that the inner logic of performance
validity assessment (i.e., the task is so easy that below-threshold
performance can be considered evidence of noncredible
responding [Boone 2013; Larrabee 2012]) may not apply to
PVTs with high verbal mediation administered to examinees
with LEP, as predicted by Bigler (2015). In such cases, scores
in the failing range are more likely to reflect limited proficiency
in the language of test administration rather than invalid
performance. Based on the existing evidence, PVTs with low
verbal mediation appear to be appropriate to use in LEP
populations (Razani et al. 2007; Salazar et al. 2007), although more
research is clearly needed to better understand the interactions
between LEP, performance validity, level of education and
acculturation, task complexity, and level of verbal mediation
(Boone et al. 2007; Coffey et al. 2005; Razani et al. 2007).
A potential weakness of the present design is the inherent
differences in the signal detection profiles of the PVTs used.
One could argue that the negative results on PVTs with low
verbal mediation reflect the inability of these instruments to
detect invalid response sets rather than convincing evidence of
credible performance. Indeed, both the Rey-15 (Reznek 2005;
Rüsseler et al. 2008) and the Digit-Symbol Coding (Erdodi
et al. 2016) have been criticized for low sensitivity. A careful
comparison of failure rate on these two PVTs relative to others
partially substantiates these concerns.
In the Spanish speaking forensic sample of Burton and
colleagues (2012), the base rate of failure on the Rey-15 was
47% (vs. 33% on the Test of Memory Malingering), indicating
that the instrument is capable of detecting invalid responding.
Therefore, in contrast with these studies, the 0% failure rate
observed in the present sample may in fact reflect true nega-
tives. In the study by Erdodi and Roth (2016), the failure rate
on Digit-Symbol Coding was lower (21.2%) compared to
some of the Digit Span-based PVTs (38.7–52.2%), but com-
parable to validity indicators embedded in Animal Fluency
(18.5–32.8%). Likewise, the failure rate on Rey-15 (33.3%)
was similar to that on Complex Ideational Material (26.5%).
Table 1 Failure rates on performance validity tests
administered in English only as a function of language dominance

Dominant language        WCT                 CIM                   Rey-15   CD (WAIS-III)
                         Accuracy   Time     Raw score   T-score   FR       ACSS
English (n = 57)    n    1          0        9           13        0        0
                    %    1.8        0.0      15.8        22.8      0.0      0.0
Arabic (n = 23)     %    26.1       8.7      47.8        52.2      0.0      0.0
χ²                       12.2       5.08     8.97        6.58      –        –
p                        <.01       <.05     <.01        <.01      –        –
RR                       16.3       –        3.03        2.29      –        –

Raw score cutoffs were not listed in order to protect test security
WCT Word Choice Test (Pearson 2009), Accuracy number of words correctly recognized out of 50 – cutoff
associated with a base rate of ≤25% in the overall clinical sample in the normative data (Erdodi et al. 2016;
Pearson 2009), Time completion time for the recognition trial in seconds (cutoff suggested by Erdodi et al. 2016),
CIM Complex Ideational Material subtest of the Boston Diagnostic Aphasia Examination, Raw score number of
correct responses out of 12 (the liberal cutoff suggested by Erdodi et al. (2016) and Erdodi and Roth (2016) was
used), T-score demographically adjusted score based on the norms by Heaton et al. (2004) – cutoff ≤29 (Erdodi
et al. 2016; Erdodi and Roth 2016), Rey-15 FR Rey Fifteen-Item Test free recall trial (traditional cutoff suggested
by Lezak 1995), CD (WAIS-III) ACSS age-corrected scaled scores on the Digit-Symbol Coding subtest of the
Wechsler Adult Intelligence Scale – Third Edition (<6; Erdodi et al. 2016; Trueblood 1994), RR relative risk
Table 2 Failure rates on performance validity tests administered in
both English and Arabic as a function of language dominance

               Digit Span (WAIS-III)        Animal Fluency
Language       ACSS      RDS       LDF      Raw score   T-score
Dominant       8.7%      18.7%     1.2%     16.3%       23.8%
Nondominant    41.3%     39.9%     21.2%    61.2%       78.9%
χ²             11.0      4.48      8.12     17.2        24.4
p              <.01      <.05      <.05     <.05        <.01
Φ²             .14       .06       .10      .22         .31
RR             4.75      2.13      17.7     3.76        3.32

Raw score cutoffs were not listed in order to protect test security
WAIS-III Wechsler Adult Intelligence Scale – Third Edition, ACSS
age-corrected scaled score (cutoff ≤6; Babikian et al. 2006; Hayward et al.
1987; Heinly et al. 2005; Trueblood 1994), RDS Reliable Digit Span
(cutoff associated with a base rate of ≤25% in the overall clinical sample
in the normative data; Babikian et al. 2006; Heinly et al. 2005; Pearson
2009), LDF longest digit span forward (cutoff suggested by Lezak et al.
2012), Raw score number of animals generated in 1 min (liberal cutoff
suggested by Sugarman and Axelrod 2015; Hayward et al. 1987), T-score
demographically adjusted score based on the norms by Heaton et al.
(2004) – cutoff ≤33 (Sugarman and Axelrod 2015), RR relative risk
Although the failure rates on the Rey-15 (12.4%) and Coding
(14.2%) were also some of the lowest in the larger scale study
by Erdodi et al. (2016), they were broadly consistent with
values observed on other, more robust PVTs such as Word
Choice Test (27.0%), Digit Span (7.1–24.6%), Complex
Ideational Material (13.1–19.4%), and Animal Fluency (11.7–
21.6%). Furthermore, other empirical investigations found the
Rey-15 useful at differentiating valid from invalid response sets
in NSE (Morse et al. 2013; O’Bryant et al. 2003).
In addition, a more recent study by An et al. (2016) found that
fluency tasks had a consistently higher failure rate than
well-established, robust stand-alone PVTs in a sample of
Canadian university students who volunteered to partici-
pate in academic research projects. Therefore, while Rey-
15 and Digit-Symbol Coding might still underestimate the
true rate of noncredible responding in the current sample,
the markedly different failure rates between PVTs with
high and low verbal mediation cannot be solely attributed
to instrumentation artifacts.
An additional possible confound in the present study
design is the difference in age and education between the
two criterion groups: the Arabic-dominant sample was
older and better educated than the English-dominant sam-
ple. While this likely reflects true population level differ-
ences (i.e., those with LEP likely immigrated to Canada
as adults, and therefore, they tended to be older; in turn,
older individuals had more time to advance their educa-
tion), both of these demographic variables have been
shown to influence performance on cognitive tests
(Heaton et al. 2004; Mitrushina et al. 1999). However,
our findings indicate that there was no systematic, clini-
cally relevant difference in failure rates as a function of
demographically corrected (Digit Span age-corrected
scaled score, Animal Fluency T-score) vs. raw score
(Reliable Digit Span, Longest Digits Forward, number
of animal names generated during Animal Fluency) based
cutoffs (Table 2).
There was considerable fluctuation in effect sizes associat-
ed with language dominance across instruments and cutoffs.
Within Digit Span, language proficiency had the strongest
relationship with the age-corrected scaled score (large effect).
Conversely, the Reliable Digit Span was the least affected by
language dominance (medium effect). Even though partici-
pants who were administered the test in their nondominant
language were twice as likely to fail this PVT compared to
when it was administered in their dominant language, the
Reliable Digit Span was nevertheless the most robust validity
indicator with high verbal mediation of the ones examined in
the present study. However, the effect of language dominance
on the likelihood of failing the Animal Fluency cutoffs was
very large. As such, the use of these cutoffs in examinees with
LEP is difficult to justify.
In the context of the landmark paper by Green et al.
(2001) demonstrating that noncredible responding
explained between 49 and 54% of variance in
neuropsychological test scores, our data indicate that language
proficiency accounts for 6–31% of the variance in pass/fail outcomes on PVTs with
high verbal mediation. Given that the base rate of failure
on PVTs with low verbal mediation was zero regardless of
language proficiency, most of the failures on PVTs with
high verbal mediation are likely false positive errors. The
implication of this finding to clinical practice is that PVTs
with high verbal mediation are unreliable indicators of
noncredible performance in examinees with LEP. At the
same time, our data support the continued use of PVTs
with low verbal mediation, provided that the examinee
was able to comprehend the test instructions.
Results should be interpreted in the context of the
study’s limitations. As discussed above, future studies
would benefit from using multiple, more sensitive PVTs,
especially in the low verbal mediation category. In addi-
tion, the sample is restricted to a single geographic area
and English-Arabic bilinguals. Replications based on par-
ticipants with diverse ethnic and linguistic backgrounds,
and using different instruments are crucial to establish the
generalizability of our findings.
Despite the common variability in results across stud-
ies (Leighton et al. 2014), the cumulative evidence con-
verges in one main conclusion: Cultural influences on
neuropsychological testing are significant, vary across
measures, and can significantly alter the clinical interpre-
tation of the data. Depending on the context, the lan-
guage in which the material is presented can have subtle
(Erdodi and Lajiness-O’Neill 2012) or unexpectedly
strong (Coffey et al. 2005) effects. Our finding that
LEP can dramatically increase the failure rate on PVTs
with high verbal mediation has far-reaching clinical and
forensic implications, substantiates Bigler’s (2012, 2015)
concerns about inflated false positive rates in vulnerable
populations, and warrants further investigation. In the
meantime, given the high cost of misclassifying an indi-
vidual as noncredible in both clinical and forensic assess-
ments, the use of PVTs with high verbal mediation in
individuals with LEP should either be avoided altogether,
or interpreted with caution.
Compliance with Ethical Standards
Conflict of Interest This project received no financial support from
outside funding agencies. The authors have no disclosures to make that
could be interpreted as conflict of interests.
Human and Animal Rights and Informed Consent Relevant ethical
guidelines regulating research involving human participants were follow-
ed throughout the project. All data collection, storage, and processing was
done in compliance with the Helsinki Declaration.
An, K. Y., Kaploun, K., Erdodi, L. A., & Abeare, C. A. (2016).
Performance validity in undergraduate research participants: a com-
parison of failure rates across tests and cutoffs. The Clinical
Advance online publication.
Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and
specificity of various Digit Span scores in the detection of suspect
effort. The Clinical Neuropsychologist, 20, 145–159.
Barhon, L. I., Batchelor, J., Meares, S., Chekaluk, E., & Shores, E. A.
(2015). A comparison of the degree of effort involved in the TOMM
and the ACS Word Choice Test using a dual-task paradigm. Applied
Neuropsychology: Adult, 22, 114–123.
Bigler, E. D. (2015). Neuroimaging as a biomarker in symptom validity
and performance validity testing. Brain Imaging and Behavior, 9,
Bigler, E. D. (2012). Symptom validity testing, effort and neuropsycho-
logical assessment. Journal of the International Neuropsychological
Boone, K. B. (2007). Assessment of feigned cognitive impairment. A
neuropsychological perspective. New York, NY: Guilford.
Boone, K. B. (2013). Clinical practice of forensic neuropsychology. New
York, NY: Guilford.
Boone, K., Lu, P., & Herzberg, D. (2002). The dot counting test. Los
Angeles: Western Psychological Services.
Boone, K. B., Victor, T. L., Wen, J., Razani, J., & Ponton, P. (2007). The
association between neuropsychological scores and ethnicity, lan-
guage, and acculturation variables in a large patient population.
Archives of Clinical Neuropsychology, 22, 355–365.
Borod, J. C., Goodglass, H., & Kaplan, E. (1980). Normative data on the
Boston diagnostic aphasia examination, parietal lobe battery, and the
Boston naming test. Journal of Clinical Neuropsychology, 2, 209–215.
Burton, V., Vilar-Lopez, R., & Puente, A. E. (2012). Measuring effort in
neuropsychological evaluations of forensic cases of Spanish
speakers. Archives of Clinical Neuropsychology, 27(3), 262–267.
Bush, S. S., Heilbronner, R. L., & Ruff, R. M. (2014). Psychological
assessment of symptom and performance validity, response bias,
and malingering: official position of the association of psychological
advancement in psychological injury and law. Psychological Injury
and Law, 7, 197–205.
Coffey, D., Marmol, L., Schock, L., & Adams, W. (2005). The influence
of acculturation on the Wisconsin card sorting test by Mexican
Americans. Archives of Clinical Neuropsychology, 20, 795–803.
Davis, J. J. (2014). Further consideration of advanced clinical solutions
word choice: comparison to the recognition memory test—words
and classification accuracy on a clinical sample. The Clinical
Neuropsychologist, 28(8), 1278–1294.
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski,
B., Zuccato, B. G., & Roth, R. M. (2016). Wechsler Adult
Intelligence Scale – Fourth Edition (WAIS-IV) processing speed
scores as measures of non-credible responding: The third generation
of embedded performance validity indicators. Advance online
publication. Psychological Assessment. doi:10.1037/pas0000319
Erdodi, L. A., Jongsma, K. A., & Issa, M. (2016). The 15-item version of
the Boston Naming test as an index of English proficiency. The
Advance online publication.
Erdodi, L. A., Kirsch, N. L., Lajiness-O’Neill, R., Vingilis, E., & Medoff,
B. (2014). Comparing the recognition memory test and the word
choice test in a mixed clinical sample: Are they equivalent?
Psychological Injury and Law, 7(3), 255–263.
Erdodi, L., & Lajiness-O’Neill, R. (2012). Humor perception in bilin-
guals: Is language more than a code? International Journal of
Humor Research, 25(4), 459–468. doi:10.1515/humor-2012-0024
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier, C.
L., Rai, J. K., & Roth, R. M. (2016). The BDAE Complex Ideational
Material – A measure of receptive language or performance
validity? Psychological Injury and Law, 9, 112–120.
Erdodi, L. A., Tyson, B. T., Shahein, A., Lichtenstein, J. D., Abeare, C.
A., Pelletiere, C. L., … Roth, R. M. (2016). The power of timing:
Adding a time-to-completion cutoff to the Word Choice Test and
Recognition Memory Test improves classification accuracy. Journal
of Clinical and Experimental Neuropsychology. Advance online
Erdodi, L. A., & Roth, R. M. (2016). Low scores on BDAE Complex
Ideational Material are associated with invalid performance in adults
without aphasia. Advance online publication. Applied
Green, P., Rohling, M. L., Lees-Haley, P. R., & Allen, L. M. (2001). Effort
has a greater effect on test scores than severe brain injury in com-
pensation claimants. Brain Injury, 15(12), 1045–1060.
Goodglass, H., Kaplan, E., & Barresi, B. (2001). Boston Diagnostic
Aphasia Examination (3rd ed.). Philadelphia: Lippincott Williams
Hayward, L., Hall, W., Hunt, M., & Zubrick, S. R. (1987). Can localized
brain impairment be simulated on neuropsychological test profiles?
Australian and New Zealand Journal of Psychiatry, 21, 87–93.
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised
comprehensive norms for an expanded Halstead-Reitan battery:
Demographically adjusted neuropsychological norms for African
American and Caucasian adults. Lutz, FL: Psychological
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., & Millis, S.
R. (2009). American academy of neuropsychology consensus con-
ference statement on the neuropsychological assessment of effort,
response bias, and malingering. The Clinical Neuropsychologist,
Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., & Brennan, A.
(2005). WAIS Digit-Span-based indicators of malingered
neurocognitive dysfunction: Classification accuracy in traumatic
brain injury. Assessment, 12(4), 429–444.
Larrabee, G. J. (2012). Forensic neuropsychology: A scientific approach.
New York: Oxford University Press.
Leighton, A., Weinborn, M., & Maybery, M. (2014). Bridging the gap
between neurocognitive processing theory and performance validity
assessment among the cognitively impaired: A review and method-
ological approach. Journal of the International Neuropsychological
Lezak, M. D. (1995). Neuropsychological assessment. New York: Oxford
Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012).
Neuropsychological assessment. New York: Oxford University
Lichtenstein, J. D., Erdodi, L. A., Rai, J. K., Mazur-Mosiewicz, A., &
Flaro, L. (2017). Wisconsin Card Sorting Test embedded validity
indicators developed for adults can be extended to children. Child
Lichtenstein, J. D., Moser, R. S., & Schatz, P. (2014). Age and test setting
affect the prevalence of invalid baseline scores on neurocognitive
tests. American Journal of Sports Medicine, 42(2), 479–484.
Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003).
Effectiveness of the Rey-Osterrieth complex figure test and the
Meyers and Meyers recognition trial in the detection of suspect
effort. The Clinical Neuropsychologist, 17(3), 426–440.
Mitrushina, M. N., Boone, K. B., & D’Elia, L. F. (1999). Handbook of
normative data for neuropsychological assessment. New York, NY:
Oxford University Press.
Moreno, E. M., & Kutas, M. (2005). Processing semantic anomalies in
two languages: An electrophysiological exploration in both
languages of Spanish-English bilinguals. Cognitive Brain Research,
Morse, C. L., Douglas-Newman, K., Mandel, S., & Swirsky-Sacchetti, T.
(2013). Utility of the Rey-15 recognition trial to detect invalid per-
formance in a forensic neuropsychological sample. The Clinical
Neuropsychologist, 27(8), 1395–1407.
O’Bryant, S. E., Hilsabeck, R. C., Fisher, J. M., & McCaffrey, R. J.
(2003). Utility of the Trail Making Test in the assessment of malin-
gering in a sample of mild traumatic brain injury litigants. The
Clinical Neuropsychologist, 17(1), 69–74.
Pearson (2009). Advanced Clinical Solutions for the WAIS-IV and WMS-
IV – Technical Manual. San Antonio, TX: Author.
Razani, J., Murcia, G., Tabares, J., & Wong, J. (2007). The effects of
culture on the WASI test performance in ethnically diverse
individuals. The Clinical Neuropsychologist, 21, 776–788.
Razani, J., Burciaga, J., Madore, M., & Wong, J. (2007). Effects of ac-
culturation on tests of attention and information processing in an
ethnically diverse group. Archives of Clinical Neuropsychology,
Rey, A. (1964). L’examen clinique en psychologie. Paris: Presses
Universitaires de France.
Reznek, L. (2005). The Rey 15-item memory test for malingering: A
meta-analysis. Brain Injury, 19(7), 539–543.
Rüsseler, J., Brett, A., Klaue, U., Sailer, M., & Münte, T. F. (2008). The
effect of coaching on the simulated malingering of memory impair-
ment. BMC Neurology, 8(37), 1–14.
effort tests in ethnic minorities and in non-English speaking
and English as a second language populations. In K. B.
Boone (Ed.), Assessment of feigned cognitive impairment: A
neuropsychological perspective (pp. 405–427). New York:
Sugarman, M. A., & Axelrod, B. N. (2015). Embedded measures of
performance validity using verbal fluency tests in a clinical sample.
Applied Neuropsychology: Adult, 22(2), 141–146.
Trueblood, W. (1994). Qualitative and quantitative characteristics of
malingered and other invalid WAIS-R and clinical memory
data. Journal of Clinical and Experimental Neuropsychology,
Vilar-Lopez, R., Gomez-Rio, M., Caracuel-Romero, A., Llamas-Elvira,
J., & Perez-Garcia, M. (2008). Use of specific malingering measures
in a Spanish sample. Journal of Clinical and Experimental
Neuropsychology, 30(6), 710–722.
Youngjohn, J. R., Lees-Haley, P. R., & Binder, L. M. (1999). Comment:
Warning malingerers produces more sophisticated malingering.
Archives of Clinical Neuropsychology, 14, 511–515.