The effects of coaching on the verbal and nonverbal medical symptom validity tests.
ABSTRACT Evaluation of resistance to coaching is an important step in the validation of symptom validity tests (SVTs) for clinical use in neuropsychological evaluations. In the present study coaching effects were evaluated for two recently developed SVTs, the Medical Symptom Validity Test (MSVT) and Nonverbal Medical Symptom Validity Test (NVMSVT) as compared with a well-validated existing SVT, the Test of Memory Malingering (TOMM). This study used a simulation design that included 103 healthy younger study volunteers who were randomly assigned into one of four conditions: Symptom Coaching, Test Coaching, Combined Coaching, or Best Effort Control. Specificity for all SVTs was excellent (96-100%). Test Coaching, either alone or combined with Symptom Coaching, was more effective than Symptom Coaching alone in producing raw scores suggestive of "better" effort for all SVTs. However, there were only modest declines in the obtained sensitivity, which remained above 80% for all SVTs. These results provide empirical support for the classification accuracy of the MSVT and NVMSVT, even when challenged with combined coaching interventions. However, further validation using known-groups designs and clinical samples is needed.
This article was downloaded by: [Edith Cowan University]
On: 06 May 2013, At: 02:07
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered
office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
The Clinical Neuropsychologist
Publication details, including instructions for authors and
subscription information:
http://www.tandfonline.com/loi/ntcn20
The Effects of Coaching on the Verbal
and Nonverbal Medical Symptom
Validity Tests
Michael Weinborn a, Steven Paul Woods a b, Claire Nulsen a &
Angela Leighton a
a School of Psychology, University of Western Australia, Crawley,
Western Australia, Australia
b Department of Psychiatry, University of California, San Diego,
CA, USA
Published online: 31 May 2012.
To cite this article: Michael Weinborn, Steven Paul Woods, Claire Nulsen & Angela Leighton (2012):
The Effects of Coaching on the Verbal and Nonverbal Medical Symptom Validity Tests, The Clinical
Neuropsychologist, 26:5, 832-849
To link to this article: http://dx.doi.org/10.1080/13854046.2012.686630
The Clinical Neuropsychologist, 2012, 26 (5), 832–849
http://www.psypress.com/tcn
ISSN: 1385-4046 print/1744-4144 online
http://dx.doi.org/10.1080/13854046.2012.686630
The Effects of Coaching on the Verbal and Nonverbal
Medical Symptom Validity Tests
Michael Weinborn1, Steven Paul Woods1,2, Claire Nulsen1,
and Angela Leighton1
1School of Psychology, University of Western Australia, Crawley,
Western Australia, Australia
2Department of Psychiatry, University of California, San Diego, CA, USA
Keywords: Neuropsychological assessment; Symptom validity assessment; Forensic psychology; Coaching.
INTRODUCTION
The need for objective evaluation of symptom validity in neuropsychological
assessment, particularly when there is potential for secondary gain, has been
persuasively argued by many authors (e.g., Bush et al., 2005; Heilbronner, Sweet,
Morgan, Larrabee, & Millis, 2010; Larrabee, 2007) and there is evidence that formal
assessment of symptom validity is becoming the norm in clinical practice
(e.g., Sharland & Gfeller, 2007). With the growing interest in SVTs, concerns have
arisen that awareness of their use has increased among attorneys and litigants, and
that more specific and sophisticated information may therefore be available to
individuals attempting to feign neurocognitive impairment. Coaching, or provision
of information to individuals being evaluated that is designed to help them
successfully feign impairment, is thought to be relatively commonplace in forensic
evaluations (Suhr & Gunstad, 2007). The importance of resistance to coaching as a
desirable characteristic of SVTs has been underlined by multiple authors (e.g.,
Hartman, 2002; Rogers, 2008).
Address correspondence to: Michael Weinborn, School of Psychology, University of Western
Australia, 35 Stirling Highway, Crawley, Western Australia 6009. E-mail: michael.weinborn@uwa.edu.au
Accepted for publication: March 30, 2012. First published online: May 31, 2012.
© 2012 Psychology Press, an imprint of the Taylor & Francis group, an Informa business
A number of important methodological considerations have been raised in
coaching research. In their review of the SVT coaching literature Suhr and Gunstad
(2007) underline the importance of the content of coaching in determining whether it
is successful in helping participants avoid detection. Coaching about traumatic
brain injury (TBI) symptoms alone has been ineffective in raising SVT scores in
some studies (e.g., Powell, Gfeller, Hendricks, & Sharland, 2004), but effective in
others (Erdal, 2004; Suhr & Gunstad, 2000). Coaching specific to test-taking
strategies has also produced inconsistent results, with some studies finding
improved SVT scores (e.g., Cato, Brewster, Ryan, & Guiliano, 2002; Powell
et al., 2004), but others finding no difference (e.g., Frederick & Foster, 1991; Inman
et al., 1998). Suhr and Gunstad (2007) conclude that it is the combination of
symptom and test specific coaching that may lead to more effective malingering and
decreased detection, with only 1 of 10 SVT measures showing resistance to such
coaching in the five studies they reviewed.
Of note, Ben-Porath (1994) suggested a potential cyclical pattern whereby
SVTs are developed, followed by knowledge about the SVTs becoming widespread,
and coaching about that specific test or method becoming common. Therefore there
is a growing need for the development of new SVTs and the evaluation of new and
existing SVTs with regard to their resistance to coaching. Two recently developed
SVTs for which construct validity data are only beginning to emerge are the Medical
Symptom Validity Test (MSVT; Green, 2004) and its visual counterpart, the
Nonverbal Medical Symptom Validity Test (NVMSVT; Green, 2007).
The Medical Symptom Validity Test (MSVT)
The MSVT (Green, 2004) is a computer-administered forced-choice recognition
SVT comprising 10 word pairs (20 words in total), in which the two words
in each pair represent a single object (e.g., “Ballpoint – Pen”). In addition to the
forced-choice SVT trials the MSVT also includes two traditional retrospective
memory measures, Paired Associates Recall and Free Recall. The construct validity
of the MSVT has been evaluated in a few clinical and compensation-seeking
samples, including children and adults with traumatic brain injury (Armistead-
Jehle, 2010; Carone, 2008; Kirkwood & Kirk, 2010), as well as disability claimants
(Chafetz, 2008; Gervais, Wygant, Sellbom, & Ben-Porath, 2011; Richman et al.,
2006; Stevens, Friedel, Mehren, & Merten, 2007). Of note, the design of the MSVT
allows for an additional level of analysis beyond conventional cut-off scores to
address potential false-positive errors amongst individuals with severe cognitive
impairment. The utility of analyses of pattern of performance across subtests with
varying degrees of difficulty (e.g., Forced-Choice vs. Paired Associate or Free
Recall) has been explored with the MSVT for this purpose. For example, Howe and
colleagues (Howe, Anderson, Kaufman, Sachs, & Loring, 2007; Howe & Loring,
2009) found high false positive errors for individuals with dementia using cut-offs
provided in the test manual, but identified distinctive MSVT performance patterns
consistent with dementia (the ‘‘Dementia Profile’’) versus invalid effort in a sample
of patients referred to a memory disorder clinic. When decision rules based on these
profiles were implemented, false positive errors for the MSVT fell to approximately
5% in both studies.
The vulnerability of the MSVT to coaching has not been fully determined.
Indeed, it appears that only two studies have evaluated the MSVT with a coached
simulation design. Using a German adaptation of the task, Merten and colleagues
found that the MSVT displayed excellent sensitivity (100%) and specificity (94%),
with 97% total correct classification using a combined general test and symptom
coaching simulation approach in a small sample of healthy young adults (Merten,
Green, Henry, Blaskewitz, & Brockhaus, 2005). In a similar study, Gorny and
Merten (2005) reported that more specific test identification coaching resulted in
reduced sensitivity of the German MSVT to simulated malingering.
The Nonverbal Medical Symptom Validity Test (NVMSVT)
The NVMSVT was designed as a forced-choice recognition SVT in a visual
modality as an analogue to the verbally based MSVT, with 10 paired objects
(e.g., a cartoon drawing of a baseball resting in a catcher’s mitt) presented on a
computer screen (20 objects in total).
The NVMSVT test manual includes information from multiple unpublished
studies, including data from simulators gathered by independent clinicians from
multiple countries. These clinicians were asked to contribute one participant each,
instructing the participant to simulate severe memory impairment; participants were
apparently not provided with any symptom or test coaching. The manual reports that the
NVMSVT had 72.5% sensitivity in the simulators, 95% specificity in dementia
patients, and 100% specificity in volunteers providing good effort (Green, 2007).
More recently, some initial studies evaluating the NVMSVT in clinical and litigant
samples have emerged (e.g., Green, Flaro, Brockhaus, & Montijo, 2010; Henry,
Merten, Wolf, & Harth, 2010). Singhal, Green, Ashaye, Shankar, and Gill (2009)
evaluated the NVMSVT and found 90% sensitivity for 10 coached simulators and
100% specificity for the individuals with advanced dementia when the ‘‘dementia
profile’’ was applied. However, the coaching paradigm was relatively rudimentary,
as simulators were only advised that individuals with dementia exhibit ‘‘certain
patterns’’ of performance, but were not provided with any additional information
about the nature of that performance. They were also instructed to simulate ‘‘early’’
dementia in order to encourage a subtle approach (p. 723).
Objectives of the present study
While these early data provide some support for the robustness of the MSVT
and NVMSVT to coaching, the interpretation and clinical application of these
studies is limited by small sample sizes, nonspecific coaching instructions, and non-
English language test adaptations. Therefore the primary objective of this study was
to systematically evaluate the MSVT and NVMSVT measures with regard to their
vulnerability to a variety of coaching interventions compared with an existing well-
validated measure, the TOMM. The present study will employ symptom (Symptom
Coached Group, SCG) and test coaching (Test Coached Group, TCG), as well as
combined symptom and test coaching conditions (Symptom plus Test Coached
Group, STCG), which will be compared with best effort (Best Effort Comparison
Group, BECG). Based on previous research it is expected that raw scores on the
MSVT and NVMSVT trials meant to measure symptom validity, namely Immediate
Recognition (IR), Delayed Recognition (DR), and Consistency (CNS), will be highest
(suggestive of maximal effort) for the BECG, followed by the combined STCG,
then the TCG and SCG. Despite such differences in raw scores it is expected that
classification accuracy rates produced by the MSVT and NVMSVT will be resistant
to coaching based on symptom information and test taking strategies in isolation
due to high cut-off scores. However, STCG and TCG conditions may result in more
simulators going undetected.
METHOD
Participants
Participants were a convenience sample of 123 volunteers who were either
students at the University of Western Australia (92%) or recruited through knowing
one of the student researchers involved in the study (8%). Participants were
screened for history of psychiatric disorders (e.g., depression, schizophrenia-
spectrum disorders) or risk factors for neuropsychological impairment (e.g., TBI,
stroke, seizure disorder). Compliance and understanding of the experimental
instructions were assessed by questionnaire at the conclusion of the procedure.
Participants were excluded if they were unable to identify the test instructions via a
multiple-choice format (e.g., ‘‘I was asked to simulate memory impairment but
avoid detection using the test taking tips provided to make the simulation more
convincing.’’) or if they did not acknowledge doing their best to comply with the
instructions provided.
In total 20 participants were excluded from the final analyses due to medical/psychiatric
history (n = 5), inadequate compliance (n = 10), or examiner/procedural
errors (n = 5). For the 103 participants remaining in the final analyses the mean age
was 21.08 (SD = 2.41). There were 33 males (32%) and 70 females (68%); 60
participants (58.3%) identified as Caucasian, 42 participants (40.8%) identified
their ethnicity as Asian, and 1 participant (1%) identified as “other”. Participants
were not compensated for participation.
Procedure and measures
Participants were randomly assigned to the Symptom Coached Group (SCG,
n = 27), Test Coached Group (TCG, n = 26), Combined Symptom and Test
Coached Group (STCG, n = 24), or the Best Effort Comparison Group (BECG,
n = 26). After informed consent was obtained, participants completed a brief
demographics questionnaire. Demographic information for each group is presented
in Table 1. Groups did not differ in age, gender, ethnic composition, or student
status (all ps > .10).
Participants were provided with a test-taking scenario in which they were
asked to imagine that they had been in a car accident. Scenarios and coaching
instructions were similar to those used by Powell et al. (2004); see the Appendix for
detailed coaching instructions. Identical group-specific information was then
provided to all participants in each group.
Symptom Coached Group. Participants in the SCG were asked to simulate
memory impairment as part of an effort to obtain financial compensation for the
car accident. They were provided with information about common sequelae of TBI
(e.g., memory and attention problems) and asked to present as neurocognitively
impaired, while at the same time remaining credible and avoiding being identified as
feigning or exaggerating by the evaluator.
Test Coached Group. Participants in the TCG were given similar instructions
to those in the SCG, but instead of information about TBI, they were provided
with some basic test-taking strategies to avoid detection (e.g., ‘‘answer the easy
items correctly’’).
Combined Symptom and Test Coached Group. Participants in this group
were provided with both sets of information provided to the other coached groups
and instructed to use this information to avoid detection.
Best Effort Comparison Group. Those in the BECG were told that they
had not been injured and had no lasting effects of the accident. They were instructed
to complete all tests to the best of their ability.
After reading the scenario, participants were escorted to a researcher who had
been trained in test administration by the first author, and who was blind to the
participant’s assigned group. All participants were administered the MSVT and
NVMSVT, and the Test of Memory Malingering (TOMM) as part of a larger
neuropsychological battery.
Table 1. Demographic information by group

Demographic       Best Effort     Symptom        Test           Symptom and     p-value
Characteristics   Control Group   Coached        Coached        Test Coached
                  (n = 26)        (n = 27)       (n = 26)       (n = 24)
Age: M (SD)       21.00 (1.33)    20.30 (1.07)   21.92 (3.02)   21.13 (3.33)    0.106
Gender                                                                          0.782
  Male            30.80%          33.30%         38.50%         25%
  Female          69.20%          66.70%         61.50%         75%
Ethnicity                                                                       0.571
  Caucasian       61.50%          63%            61.50%         45.80%
  Asian           38.50%          37%            38.50%         50.00%
  Other           0%              0%             0%             4.20%
Student Status                                                                  0.600
  Student         96.20%          92.60%         96.20%         100.00%
  Non-student     3.80%           7.40%          3.80%          0.00%

M = Mean; SD = Standard Deviation.

The MSVT consists of 10 word pairs (20 words in total), with each pair
representing a single object (e.g., “Ballpoint – Pen”). The words are presented one at
a time on a computer screen, with 6 seconds allotted per word pair. There are two
learning phases, after which the participant selects the target word from a foil
(e.g., ‘‘Ballpoint’’ or ‘‘Stone’’) using the computer mouse (the Immediate
Recognition trial, IR). Feedback is provided on correctness. The Delayed
Recognition trial (DR) occurs after an approximately 10-minute delay and in the
same manner as the IR trial. The Consistency index (CNS) is determined by the
number of times the participant chooses the same response on IR and DR,
regardless of accuracy. That is, if the participant selected ‘‘Ballpoint’’ for both IR
and DR, OR if they chose ‘‘Stone’’ in both IR and DR, they would obtain a score of
1. Only if their responses were INCONSISTENT (Ballpoint in IR and Stone in DR,
or vice versa) would the participant obtain a score of 0 for that trial. Cut-off scores
for valid vs. invalid performance on each measure (IR, DR, and CNS) are provided
in the test manual.
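The Consistency scoring rule described above can be illustrated with a minimal Python sketch. The function names, the conversion to a percentage, and all item labels other than “Ballpoint” and “Stone” are hypothetical and are not taken from the test manual; the sketch only restates the rule that an item scores 1 when the same option is chosen on IR and DR, regardless of accuracy.

```python
from typing import List, Sequence


def consistency_items(ir_choices: Sequence[str], dr_choices: Sequence[str]) -> List[int]:
    """Score each item 1 if the same option was chosen on IR and DR, else 0.

    Accuracy is deliberately ignored: choosing the foil on both trials scores 1.
    """
    if len(ir_choices) != len(dr_choices):
        raise ValueError("IR and DR must contain the same number of items")
    return [1 if ir == dr else 0 for ir, dr in zip(ir_choices, dr_choices)]


def consistency_percent(ir_choices: Sequence[str], dr_choices: Sequence[str]) -> float:
    """Express the summed item scores as a percentage (an assumed scaling)."""
    items = consistency_items(ir_choices, dr_choices)
    if not items:
        raise ValueError("at least one item is required")
    return 100.0 * sum(items) / len(items)


# Hypothetical responses to four items (labels other than Ballpoint/Stone are invented).
ir = ["Ballpoint", "Stone", "item3_target", "item4_target"]
dr = ["Ballpoint", "Stone", "item3_foil", "item4_target"]

print(consistency_items(ir, dr))    # [1, 1, 0, 1]
print(consistency_percent(ir, dr))  # 75.0
```

In this example the second item is answered incorrectly on both trials (“Stone”) yet still counts as consistent, whereas the third item is inconsistent and scores 0.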
The NVMSVT consists of 10 drawings of object pairs (e.g., a baseball resting
in a catcher’s mitt), resulting in 20 objects in total. The presentation is similar to the
MSVT, with two learning trials and a 10-minute delayed trial, producing IR, DR,
and CNS scores. Additionally, during the DR subtest, two similar forced-choice (FC)
recognition trials are interpolated with the standard target–foil trials. In the first, the
Delayed Recognition-Variations (DRV) trial, the foil is another image
of the target in a slightly altered version (e.g., a baseball with seams of a different
color). In the second, the Delayed Recognition-Archetypes (DRA) trial, the
participant must distinguish the target from a foil that is an “archetype.” The
archetypal foil image is purported by the author to represent objects of “primordial
significance” (e.g., a snake coiled to strike; Green, 2007). Such an image should be
easier for the person to remember had it been encountered earlier in the test,
which should make the foil easier to recognize as such. Paired Associates and Free Recall memory
subtests are also included in the NVMSVT. Cut-off scores for valid and invalid
performance based on performance on NVMSVT subtests have been provided by
the test publisher, but formalized decision rules based on performance patterns
similar to those utilized on the MSVT are also presented (Green, 2007).
In addition to the IR, DR, and CNS measures (and in the case of the
NVMSVT, the DRA and DRV subtests), the MSVT and NVMSVT both include a
Paired Associates subtest (in which the first word/image in each of the 10 pairs is
provided and the person is asked to recall the second word/image in the pair) and a
Free Recall subtest in which the person is simply asked to spontaneously recall as
many items as they can with no other prompts. These subtests are meant to be tests
of actual memory ability as opposed to IR, DR, and CNS, which are designed as
indicators of test-taking effort alone (Green, 2004).
The TOMM is a 50-item individually administered forced-choice recognition
SVT in which participants are shown line drawings of common objects for 3 seconds
each over two learning trials. Participants then select the target object from a foil,
and are provided verbal feedback regarding correctness. After an approximately
15-minute delay, the optional retention trial was administered. Cut-off scores based
on Trial 2 and the Retention Trial performance as recommended in the manual
(Tombaugh, 1996) were used to classify test scores as valid or invalid.
Suhr and Gunstad (2007) advise administering the SVTs being evaluated
within a battery that includes other neuropsychological measures to more closely
simulate an actual neuropsychological evaluation. Therefore the present study also
included a brief battery of measures of attention, executive function, and
verbal fluency. These included: (1) The Digit Span subtest of the WAIS-III
(Wechsler, 1997); (2) Trailmaking tests, form A & B (Reitan, 1958); (3) The
Controlled Oral Word Association Test (COWA; Benton, Hamsher, & Sivan, 1994);
(4) the Animal Naming test; and (5) the Action Fluency Test (Piatt, Fields, Paolo, &
Tröster, 1999).
The order of test administration was the same for all participants and was as
follows: (1) TOMM Trials 1 & 2; (2) Action Fluency Test; (3) MSVT Learning trials
and IR; (4) Digit Span; (5) TOMM Retention; (6) MSVT DR, Paired Associates,
and Free Recall; (7) NVMSVT Learning and IR; (8) Trailmaking tests; (9) COWA;
(10) Animal Naming Test; (11) NVMSVT DR, Paired Associates, and Free Recall.
Following completion of the testing, the brief compliance measure was
administered.
Data analyses
Data for all SVT measures were not normally distributed (all Kolmogorov-Smirnov
p values < .05), with the exception of the Delayed Recognition-Archetypes
subtest of the NVMSVT (p > .10). Therefore Kruskal-Wallis tests were completed
to evaluate overall group differences in performance for all relevant SVT measures,
with follow-up Mann-Whitney U tests to characterize specific group differences.
Descriptive classification accuracy statistics (e.g., sensitivity, specificity, hit rate)
were then calculated for each measure and McNemar’s chi-square test was used to
determine if the classification accuracy statistics associated with each SVT differed.
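As a minimal sketch of the descriptive classification statistics named above, the following Python functions compute sensitivity, specificity, and hit rate from counts. The function names are hypothetical, and the definition of hit rate as the proportion of all participants correctly classified is an assumption, since the formula is not stated in the text. The counts are the TOMM figures reported in the Results (27 of 27 SCG and 20 of 24 STCG simulators falling below the cut-off, and 26 of 26 BECG participants scoring above it).

```python
def proportion(count: int, total: int) -> float:
    """Return count / total after basic sanity checks."""
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("count must lie between 0 and a positive total")
    return count / total


def sensitivity(detected_simulators: int, n_simulators: int) -> float:
    """Proportion of simulators scoring below the cut-off (correctly flagged)."""
    return proportion(detected_simulators, n_simulators)


def specificity(passed_controls: int, n_controls: int) -> float:
    """Proportion of best-effort controls scoring above the cut-off."""
    return proportion(passed_controls, n_controls)


def hit_rate(detected_simulators: int, n_simulators: int,
             passed_controls: int, n_controls: int) -> float:
    """Assumed definition: all correct classifications over all participants."""
    return proportion(detected_simulators + passed_controls,
                      n_simulators + n_controls)


# TOMM counts as reported in the Results section of this article.
print(round(100 * specificity(26, 26)))       # 100
print(round(100 * sensitivity(27, 27)))       # 100 (SCG)
print(round(100 * sensitivity(20, 24)))       # 83  (STCG)
print(round(100 * hit_rate(27, 27, 26, 26)))  # 100 (SCG vs. BECG)
```

Each simulation group is compared against the same best-effort group, so specificity is shared across comparisons while sensitivity and hit rate vary with the coaching condition.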
RESULTS
Group differences: Level of performance
As seen in Table 2, Kruskal-Wallis analyses revealed overall group differences
for the TOMM (Trials 1, 2 and Retention, all ps < .001), MSVT (IR, DR and CNS,
all ps < .001), and NVMSVT (IR, DR, CNS, DRA, DRV, all ps < .001). In
addition group differences were found for the PA and FR subtests of the MSVT and
NVMSVT. Mann-Whitney U tests revealed that all simulation groups had
significantly lower scores (indicating lower levels of effort) than the BECG for all
trials of the TOMM, MSVT, and NVMSVT. Results of pairwise comparisons with
effect sizes are presented in Table 3.
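The article does not state how the pairwise effect sizes were computed. One common choice for Mann-Whitney U comparisons is r = Z / sqrt(N), and the sketch below illustrates that convention only as an assumption. The function names are hypothetical, the normal approximation with a tie correction is used, and the scores are invented for illustration; they are not study data. In this sketch the sign of r is negative when the first group tends to score lower, which may differ from the sign convention of the software used by the authors.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_r(group_a: Sequence[float],
                   group_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U for group_a, Z from the normal approximation, r = Z / sqrt(N))."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups need at least one score")
    n = n_a + n_b
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    # Tie correction: sum of (t^3 - t) over each set of t tied scores.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all scores are tied; Z is undefined")
    z = (u_a - mean_u) / math.sqrt(var_u)
    return u_a, z, z / math.sqrt(n)


# Invented scores for illustration only (not data from this study).
simulators = [1, 2, 3]
best_effort = [4, 5, 6]
u, z, r = mann_whitney_r(simulators, best_effort)
print(u, round(z, 3), round(r, 3))  # 0.0 -1.964 -0.802
```

With complete separation between two small groups, as in this toy example, U is 0 and r approaches its largest magnitude for the given sample size.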
Further analysis of group differences specific to the TOMM indicated that the
SCG produced scores indicative of poorer effort in comparison with the TCG and
STCG for all trials (all ps < .05). Of note, however, the STCG and TCG
did not differ from one another (all ps > .10), suggesting the addition of
symptom information did not add to the benefit of test-specific coaching for
the TOMM.
For the MSVT, differences between the Test Coached and Symptom Coached
Groups did not reach the level of statistical significance for SVT subtests
(all ps > .10), but the TCG produced higher scores on the PA subtest. There was
some evidence of improved effort when symptom and test coaching were combined
for the MSVT, as the STCG performed significantly better on the IR, DR, PA, and
FR trials (ps < .05) when compared with the SCG. Further, the STCG performed
better on the IR and FR trials compared with test coaching alone (p < .05),
suggesting that combined coaching may be of further benefit for the MSVT.
For the NVMSVT, test coaching alone did not produce scores indicative of
better effort than the SCG (all ps > .10), with the exception of the DRV (p < .01)
and FR (p < .05) subtests. However, the combined STCG obtained better effort
scores on the IR, DR, DRA, and DRV trials (ps < .05), but not the CNS index
(p > .10), when compared to the SCG. Combined symptom and test coaching did
not appear to be associated with any improvement in effort scores over test
coaching alone (all ps > .10), with the exception of the DRA trial (p < .05).
In summary, consistent with the first prediction, the Best Effort Comparison
Group produced scores indicative of the highest levels of effort for all three SVT
measures, followed by the Symptom and Test Coached Group, then the Test
Coached Group, with scores suggestive of the lowest levels of effort produced by the
Symptom Coached Group. However, the differences between the STCG and TCG
were only significant for the IR trial of the MSVT and the DRA trial of the
NVMSVT, while test coaching, either alone or in combination with symptom
coaching, appeared to result in better effort scores for nearly all measures in
comparison with symptom coaching alone.
Table 2. Performance on Symptom Validity Test by Group

                                    Best Effort        Symptom            Test               Symptom Plus
                                    Comparison         Coached            Coached            Test Coached
                                    Group              Group              Group              Group              p-value
TOMM: Raw Score (SD)
  TOMM Trial 1                      46.65 (3.22)abc    25.07 (5.29)de     30.58 (7.22)       32.00 (7.86)       <.001
  TOMM Trial 2                      49.81 (0.49)abc    26.04 (7.77)de     32.19 (7.60)       34.08 (9.34)       <.001
  TOMM Retention Trial              49.88 (0.43)abc    26.19 (8.25)de     31.62 (7.63)       33.25 (11.20)      <.001
MSVT: % Score (SD)
  Immediate Recall                  99.04 (2.01)abc    59.26 (18.06)e     63.46 (15.73)f     74.58 (17.69)      <.001
  Delayed Recall                    99.23 (1.84)abc    53.52 (18.60)e     62.12 (17.67)      69.17 (20.25)      <.001
  Consistency                       98.27 (2.43)abc    61.30 (13.49)      62.88 (15.82)      71.25 (16.43)      <.001
  Paired Associates                 98.85 (3.26)abc    47.78 (22.76)de    60.38 (19.69)      70.00 (22.26)      <.001
  Free Recall                       84.81 (13.07)abc   37.96 (17.11)e     44.81 (12.61)f     56.88 (17.68)      <.001
NV-MSVT: % Score (SD)
  Immediate Recall                  99.81 (.98)abc     71.11 (22.88)e     80.38 (15.03)      85.00 (13.99)      <.001
  Delayed Recall                    97.5 (4.06)abc     55.74 (19.60)e     62.31 (17.28)      71.04 (18.99)      <.001
  Consistency                       97.31 (4.06)abc    60.92 (15.38)      61.15 (17.96)      68.33 (21.14)      <.001
  Delayed Recognition – Archetypes  88.85 (12.02)abc   52.04 (20.90)e     58.85 (12.02)      69.37 (14.69)      <.001
  Delayed Recognition – Variations  97.69 (4.30)abc    50.00 (24.96)de    68.46 (18.91)      69.17 (29.18)      <.001
  Paired Associates                 99.61 (1.96)abc    71.48 (26.12)      83.85 (14.72)      85.00 (14.45)      <.001
  Free Recall                       78.46 (12.94)abc   40.00 (22.01)d     54.23 (16.47)      51.04 (16.22)      <.001

a BECG vs. SCG p < .05; b BECG vs. TCG p < .05; c BECG vs. STCG p < .05; d SCG vs. TCG p < .05;
e SCG vs. STCG p < .05; f TCG vs. STCG p < .05.
BECG = Best Effort Comparison Group, SCG = Symptom Coached Group, TCG = Test Coached Group,
STCG = Symptom and Test Coached Group. TOMM = Test of Memory Malingering, MSVT = Medical
Symptom Validity Test, NVMSVT = Non-Verbal Medical Symptom Validity Test.
Group differences: Classification accuracy
Figure 1 presents the proportion of each group performing below the cut-off,
suggestive of invalid performance for the TOMM, MSVT, and NVMSVT.
The TOMM and MSVT obtained a specificity of 100%, with none of the
BECG falling below the cut-offs indicative of invalid performance. Interestingly,
however, a single BECG member fell just below the cut-off for the NVMSVT,
resulting in a specificity of 96%. This individual primarily fell below the cut-off due
to a low (70%) score on the DRA, while performing at 90% or above on IR, DR,
CNS, DRV, and PA. This pattern of a poorer performance for one of the easiest
subtests while scoring better on a more difficult subtest (DRA) was consistent with
variable and suspect effort rather than true cognitive impairment according to
decision rules as described by the manual and Henry et al. (2010).
Table 3. Effect sizes comparing performance on the sub-tests of the symptom validity tests between
groups

            BECG vs. SCG   BECG vs. TCG   BECG vs. STCG  SCG vs. TCG    SCG vs. STCG   TCG vs. STCG
TOMM
  TOMM T1   −0.9 a         −0.8 a         −0.8 a         −0.4 b         −0.4 b         −0.1
  TOMM T2   −0.9 a         −0.9 a         −0.8 a         −0.3 c         −0.4 b         −0.1
  TOMM Ret  −0.9 a         −0.9 a         −0.8 a         −0.3 c         −0.3 c         −0.1
MSVT
  IR        −0.9 a         −0.8 a         −0.8 a         −0.1           −0.4 b         −0.3 c
  DR        −0.9 a         −0.8 a         −0.7 a         −0.2           −0.4 c         −0.2
  CNS       −0.9 a         −0.8 a         −0.7 a          0.0           −0.3           −0.2
  PA        −0.9 a         −0.8 a         −0.7 a         −0.3 c         −0.4 b         −0.2
  FR        −0.8 a         −0.8 a         −0.7 a         −0.2           −0.5 a         −0.4 c
NVMSVT
  IR        −0.8 a         −0.7 a         −0.7 a         −0.2           −0.3 c         −0.1
  DR        −0.8 a         −0.8 a         −0.8 a         −0.2           −0.4 b         −0.2
  CNS       −0.8 a         −0.8 a         −0.7 a          0.0           −0.2           −0.2
  DR-A      −0.7 a         −0.8 a         −0.6 a         −0.2           −0.4 b         −0.4 c
  DR-V      −0.8 a         −0.8 a         −0.6 a         −0.4 b         −0.3 c         −0.1
  PA        −0.7 a         −0.6 a         −0.7 a         −0.2           −0.2            0.0
  FR        −0.7 a         −0.6 a         −0.7 a         −0.3 c         −0.3            0.0

a p < .001, b p < .01, c p < .05.
Effect sizes presented as r = z/√N, with values of small = 0.1, medium = 0.3, and large = 0.5 (Cohen,
1988). BECG = Best Effort Comparison Group, SCG = Symptom Coached Group, TCG = Test Coached
Group, STCG = Symptom and Test Coached Group. TOMM = Test of Memory Malingering,
MSVT = Medical Symptom Validity Test, NVMSVT = Non-Verbal Medical Symptom Validity Test,
IR = Immediate Recall, DR = Delayed Recall, CNS = Consistency Score, PA = Paired Associates,
FR = Free Recall, DR-A = Delayed Recall–Archetypes, DR-V = Delayed Recall–Variations.

The TOMM’s sensitivity was 100% for the SCG, as all participants in this
group scored below the cut-off, 89% for the TCG with 23 of 26 group members
correctly identified, and 83% for the STCG with 20 of 24 simulated malingerers
correctly classified. Overall hit rate for the TOMM was 100% based on the SCG,
94% based on the TCG, and 92% for the STCG. While more simulators in the TCG
and STCG escaped detection, this difference did not reach statistical significance,
χ²(2, N = 77) = 4.55, p = .10.
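The sensitivity and hit-rate figures reported for the TOMM follow from simple group counts. The Python sketch below is illustrative only and is not the authors' analysis code. The `GroupResult` class and the function names are hypothetical, and the group sizes for the BECG (26) and SCG (27) are assumptions inferred from the reported totals (103 participants, 77 simulators) rather than values stated in this section.

```python
from dataclasses import dataclass


@dataclass
class GroupResult:
    n: int        # participants in the group
    flagged: int  # participants scoring below the SVT cut-off


def sensitivity(simulators: GroupResult) -> float:
    """Proportion of simulators correctly flagged as invalid."""
    return simulators.flagged / simulators.n


def specificity(controls: GroupResult) -> float:
    """Proportion of best-effort controls correctly NOT flagged."""
    return (controls.n - controls.flagged) / controls.n


def hit_rate(controls: GroupResult, simulators: GroupResult) -> float:
    """Overall proportion of correct classifications across both groups."""
    correct = (controls.n - controls.flagged) + simulators.flagged
    return correct / (controls.n + simulators.n)


# TOMM counts: TCG and STCG counts are from the text; n = 26 (BECG) and
# n = 27 (SCG) are ASSUMED from the reported totals.
becg = GroupResult(n=26, flagged=0)
tomm = {
    "SCG": GroupResult(n=27, flagged=27),
    "TCG": GroupResult(n=26, flagged=23),
    "STCG": GroupResult(n=24, flagged=20),
}

for name, group in tomm.items():
    print(f"{name}: sensitivity={sensitivity(group):.3f}, "
          f"hit rate={hit_rate(becg, group):.3f}")
print(f"BECG specificity={specificity(becg):.3f}")
```

Under these assumed counts the hit rates round to 100%, 94%, and 92%, in line with the reported values. Note that 23 of 26 works out to 88.5%, which the text reports as 89%.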
Similarly, the MSVT produced sensitivity values of 100% for the SCG, 92%
for the TCG, and 83% for the combined STCG, with hit rates of 100%, 96%, and
92% respectively. Group differences in sensitivity for the MSVT similarly were not
statistically significant, χ²(2, N = 77) = 4.91, p = .09.
Sensitivity for the NVMSVT was 96% for the SCG, 92% for the TCG, and
88% for the STCG, producing overall hit rates of 96% based on the SCG, 94% for
the TCG, and 92% for the STCG. Differences in sensitivity across coaching
conditions were not significant, χ²(2, N = 77) = 1.37, p = .50.
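The three chi-square tests of sensitivity across coaching conditions can be approximately reproduced from a 3 × 2 table of detected versus undetected simulators per group. The following Python sketch is an illustration under stated assumptions, not the authors' procedure. Only the TOMM counts for the TCG and STCG are given in the text; the remaining counts are back-calculated from the reported percentages with assumed group sizes of 27 (SCG), 26 (TCG), and 24 (STCG). The closed-form p-value used here is valid only for 2 degrees of freedom.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value of a chi-square variable with exactly 2 df."""
    return math.exp(-stat / 2)


# Rows: SCG, TCG, STCG. Columns: (detected, not detected).
# ASSUMED counts, inferred from the reported sensitivities.
observed = {
    "TOMM": [[27, 0], [23, 3], [20, 4]],
    "MSVT": [[27, 0], [24, 2], [20, 4]],
    "NVMSVT": [[26, 1], [24, 2], [21, 3]],
}

for test_name, table in observed.items():
    stat, df = chi_square_statistic(table)
    print(f"{test_name}: chi2({df}) = {stat:.3f}, p = {p_value_df2(stat):.3f}")
```

With these assumed counts the statistics come out at approximately 4.555, 4.913, and 1.369, which agree with the reported 4.55, 4.91, and 1.37 to within rounding. Several expected cell counts fall below 5, so the chi-square approximation should be read with some caution.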
Finally, a series of McNemar’s chi square analyses revealed that the
classification accuracy rates produced by the TOMM, MSVT, and NV-MSVT
did not significantly differ from one another (all ps > .10).
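Table 3 expresses effect sizes as r, the z statistic from a two-group comparison divided by the square root of the combined sample size, and interprets them against Cohen's (1988) benchmarks. A minimal Python sketch of that conversion follows. The function names, the "negligible" label, and the example z value are hypothetical, and the combined N of 53 assumes group sizes of 26 (BECG) and 27 (SCG), which are inferred rather than stated here.

```python
import math


def effect_size_r(z: float, n_total: int) -> float:
    """Convert a z statistic to the effect size r = z / sqrt(N)."""
    return z / math.sqrt(n_total)


def cohen_label(r: float) -> str:
    """Label |r| using the small/medium/large benchmarks of 0.1/0.3/0.5."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # label below 0.1 is an assumption, not from the paper


# HYPOTHETICAL z value for a two-group comparison with N = 26 + 27 = 53.
z_example = -6.2
r_example = effect_size_r(z_example, 53)
print(f"r = {r_example:.2f} ({cohen_label(r_example)})")
```

The sign of r only reflects the order in which the two groups are entered; the benchmarks apply to its absolute value.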
DISCUSSION
SVTs based on the forced-choice recognition paradigm have been considered
the most well validated approach to evaluate suboptimal effort in neuropsychological
assessment (Heilbronner et al., 2010). However, as established measures based
solely on this methodology become well known they become potentially more
vulnerable to coaching, resulting in a need for the development of new SVTs that
include novel methodology. The primary aim of the present study was to evaluate
the vulnerability to coaching of two recently developed SVTs (the Medical
Symptom Validity Test, MSVT, and the Nonverbal Medical Symptom Validity
Test, NVMSVT) in comparison with an already extensively validated SVT (the Test
of Memory Malingering, TOMM) within a coached simulation methodology.
A group asked to complete the measures to the best of their ability (BECG) served
as a comparison group to three groups of simulated malingerers. Specifically, the
effects of symptom coaching alone (the SCG), test coaching alone (the TCG), and
combined symptom and test coaching (the STCG) were evaluated.

Figure 1. Sensitivity (% identified by the symptom validity tests as malingering) of Test of Memory
Malingering (TOMM), Medical Symptom Validity Test (MSVT), and NonVerbal Medical Symptom
Validity Test (NVMSVT) in three coached simulated malingerer groups and one non-simulation group.
In general, the MSVT and NVMSVT identified simulated malingerers, even
when participants were provided coaching specific to both SVT measures and brain
injury symptoms. Test coaching, either alone or in combination with symptom
coaching, was more effective in producing scores suggestive of better effort than
symptom coaching alone for all SVTs. Test coaching alone was sufficient to produce
better effort scores on the TOMM compared with symptom coaching alone,
consistent with the findings of Powell et al. (2004), who used similar methodology.
For the MSVT and NVMSVT, however, combined test and symptom coaching was required.
These results indicate that coaching attenuated the most flagrant forms of
dissimulation.
However, the high cut-off scores used in all the SVTs resulted in high levels of
correct classification of simulators with sensitivity declining, but only modestly,
across coaching conditions (96–100% for the SCG, 89–92% for the TCG, and
83–87% for the combined STCG). Overall classification accuracy remained at or
above 92% for all SVTs in all coaching conditions. These classification accuracy
estimates were consistent with those obtained in other simulation studies using the
TOMM (e.g., Powell et al., 2004; Rees, Tombaugh, Gansler, & Moczynski, 1998)
and the preliminary studies using the MSVT (Gorny & Merten, 2005; Merten et al.,
2005). The classification accuracy of the MSVT and NVMSVT did not differ from
the TOMM.
While not entirely unaffected by coaching, all three SVTs were relatively
resistant to coaching-related misclassification, with sensitivities remaining above
80%. However, the decline in sensitivity remains concerning and underlines the
need for use of multiple SVTs within a battery. While use of multiple SVTs may also
increase the potential for false-positive errors, it is interesting to note that when
failure on ANY of the three SVTs in the current study was used to identify
simulators in the most effectively coached groups (the TCG and combined STCG),
sensitivity rose to 92%, with only 4 of the 50 simulators misclassified.
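The "failure on any SVT" rule described above amounts to a logical OR across the individual test decisions. The Python sketch below illustrates the rule only; the function names and the four participant records are hypothetical and do not come from the study data. The last line simply restates the reported arithmetic (4 of 50 simulators missed).

```python
from typing import Dict, List


def flagged_by_any(results: Dict[str, bool]) -> bool:
    """True if the participant fell below the cut-off on at least one SVT."""
    return any(results.values())


def combined_sensitivity(simulators: List[Dict[str, bool]]) -> float:
    """Share of simulators flagged by at least one SVT."""
    detected = sum(flagged_by_any(person) for person in simulators)
    return detected / len(simulators)


# HYPOTHETICAL records: True means the score fell below that test's cut-off.
example_simulators = [
    {"TOMM": True, "MSVT": True, "NVMSVT": True},
    {"TOMM": False, "MSVT": True, "NVMSVT": False},
    {"TOMM": False, "MSVT": False, "NVMSVT": True},
    {"TOMM": False, "MSVT": False, "NVMSVT": False},
]
print(combined_sensitivity(example_simulators))

# Reported result for the TCG and STCG combined: 4 of 50 simulators missed.
reported_sensitivity = (50 - 4) / 50
print(reported_sensitivity)
```

Because the rule flags a participant whenever any single test is failed, it can only raise sensitivity relative to each test alone, at the possible cost of more false positives, as the text notes.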
Additionally, the present findings underscore the need to continue developing
new methodologies for identification of invalid test performance. Movement
beyond the simple two-choice forced alternative methodology has already begun.
Newly developed SVTs employ methodologies such as expanded multiple-choice
sets (e.g., the Amsterdam Short-Term Memory Test; Schmand & Lindeboom, 2005)
or tasks that engage implicit memory processes (e.g., the Word Completion
Memory Test; Hilsabeck, LeCompte, Marks, & Grafman, 2001). Additionally, the
use of "embedded" SVTs has become an increasing focus of interest (e.g., Victor,
Boone, Serpa, Buehler, & Ziegler, 2009), both because such tests increase efficiency
by serving "double duty" and because they may offer further resistance to coaching
(Rohling & Boone, 2007).
Importantly, the inclusion of forced-choice recognition trials in combination
with traditional memory measures of varying difficulty (Paired Associates Recall
and spontaneous Free Recall) on both the MSVT and NVMSVT allows for another
alternative method of identification of simulators. Howe et al. (2007) compared a
group with suspected motivation to malinger to a group of memory disorder clinic
patients with no such known motivation in a differential prevalence design. Those
memory-impaired patients without motivation to malinger produced a profile
consistent with the gradient of difficulty of the subtests. That is, they performed
better on easier tasks (the forced-choice IR and DR trials) than on the more difficult
(Paired Associates and Free Recall) tasks. Individuals thought to be producing
invalid effort produced profiles with scores on the easy tasks lower, and scores on
the hard tasks higher than those produced by the dementia group. While the very
small number of patients identified as at risk for malingering makes the conclusions
drawn from the study necessarily tentative, such innovative approaches are needed,
and are particularly important in addressing potential false positive misclassifications
when evaluating patients with significant cognitive impairment. The sample
employed in the present study would not be likely to have any significant cognitive
impairment (and were indeed screened to make sure that this was the case).
However, the Dementia Profile as outlined by Howe et al. (2007) was applied to
evaluate whether this sample of coached simulated malingerers produced profiles
different from those with true impairment, or profiles that could be
mistaken for false-positive errors. Importantly, of the 71 simulators that
performed below the cut-off on the MSVT, 59 (83%) did NOT have profiles
consistent with the criteria for the Dementia Profile and would have been correctly
identified as simulators even with this more stringent approach.
The design of the NVMSVT allows for a similar level of analysis based on
performance patterns across task difficulty. As seen in the present data, the
simulation group produced equivalent or higher mean scores for the Paired
Associates subtest of the NVMSVT than for the objectively easier forced-choice
recognition subtests. The manual provides an algorithm to evaluate whether the
pattern of performance of those individuals identified as putting forth suboptimal
effort based on cut-off scores is more consistent with severe cognitive impairment or
response bias. Among the 72 simulators scoring below the cut-off, only 6 (8%)
would not have been correctly identified as such using the supplementary criteria
included in the manual.
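The observation that simulators scored as high or higher on Paired Associates than on the easier forced-choice subtests can be checked against the NVMSVT group means in Table 2. The Python sketch below does only that; it is not the manual's algorithm, which is not reproduced in this article. Treating IR, DR, DRA, and DRV as the forced-choice recognition subtests is an assumption, and group means are a rough stand-in for individual profiles.

```python
# NVMSVT group means (% correct) transcribed from Table 2.
nvmsvt_means = {
    "BECG": {"IR": 99.81, "DR": 97.50, "DRA": 88.85, "DRV": 97.69, "PA": 99.61},
    "SCG": {"IR": 71.11, "DR": 55.74, "DRA": 52.04, "DRV": 50.00, "PA": 71.48},
    "TCG": {"IR": 80.38, "DR": 62.31, "DRA": 58.85, "DRV": 68.46, "PA": 83.85},
    "STCG": {"IR": 85.00, "DR": 71.04, "DRA": 69.37, "DRV": 69.17, "PA": 85.00},
}

# ASSUMPTION: these four subtests are the forced-choice recognition trials.
FORCED_CHOICE = ("IR", "DR", "DRA", "DRV")


def pa_at_least_forced_choice(means: dict) -> bool:
    """True if the Paired Associates mean equals or exceeds every
    forced-choice recognition mean for the group."""
    return means["PA"] >= max(means[name] for name in FORCED_CHOICE)


for group, means in nvmsvt_means.items():
    print(group, pa_at_least_forced_choice(means))
```

On these means the check holds for all three simulator groups and fails for the best-effort group, which is the pattern the paragraph describes.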
Limitations of the present study
Limitations include those associated with the simulation design, which have
been described in detail elsewhere (e.g., Larrabee, 2005), but primarily involve
concerns about generalizability to malingerers within a clinical context. This issue
clearly remains an important one and underlines the need for multiple validation
studies using diverse methodologies to evaluate new SVTs. As described elsewhere
(Larrabee, 2005; Rogers, 2008; Suhr & Gunstad, 2007), simulation and known
groups designs have complementary methodological strengths and weaknesses, and
simulation studies in particular allow for evaluation of coaching effects. As Suhr
and Gunstad (2007) point out, provision of coaching information to a
non-litigating/non-forensic research sample is not thought likely to lead to or
enhance malingering in real life, whereas provision of such coaching to those
currently in litigation or with other real-world motivation to malinger is clearly
ethically problematic.
Limitations specific to the present study include a somewhat limited sample
size; however, the sample sizes in the current study were consistent with (Powell
et al., 2004) or larger than (Gorny & Merten, 2005; Merten et al., 2005; Rees et al.,
1998) many previous studies. Such small sample sizes have generally been adequate
in simulation and coaching studies, as effect sizes tend to be very large, at least
between normal effort and simulation groups. However, as Gorny and Merten
(2005) point out, as the research questions have become more sophisticated, and
differences between coaching styles have become more of a focus, we may expect
more modest effect sizes. Consistent with this, visual inspection of SVT performance
across coaching conditions in the current study suggests a systematic
increase in raw scores as the quality and amount of coaching information increase.
A similar trend was noted by Gorny and Merten (2005) in their comparison of four
coaching conditions. The current study was not sufficiently powered to detect
medium and small effect sizes and thus it is important to note that the lack of
differences between some coaching conditions may reflect a lack of power rather
than a true absence of effect.
The lack of a clinical simulating group and use of financial incentives as
recommended by Rogers (2008) may limit generalizability. However, use of clinical
simulators has other drawbacks, such as possible malingering within the clinical
group (Larrabee, 2005) and the previously noted ethical considerations of providing
malingering coaching techniques to individuals who may be at increased risk for
symptom exaggeration. Of note, at least one study has found that simulators
(including demographically dissimilar participants) perform similarly to suspected
malingerers (Brennan & Gouvier, 2006).
Conclusions and future directions
The present study provides additional support for the validity of the MSVT
and NVMSVT, with classification accuracy statistics and resistance to coaching
comparable to an existing well-validated SVT, the TOMM. While test coaching was
associated with lower relative sensitivity values, overall classification accuracy
remained acceptable, particularly when performance across multiple SVTs was
considered. In addition, the current study evaluated only the efficacy of coaching
characterized by provision of information relevant to symptoms associated with
brain injury, as well as general SVT-related information. However, much more
specific information about approaches to identifying dissimulation is currently
available on the internet (e.g., Bauer & McCaffrey, 2006). Evaluation of the effects
of access to this more detailed and test-specific (e.g., test appearance and cut-off
scores) information should be completed as part of the validation process for
all SVTs.
ACKNOWLEDGMENTS
The authors would like to extend their gratitude to Dr. Paul Green for
providing us with the MSVT and NVMSVT, as well as the students enrolled in the
research methods unit who assisted with the data collection and processing.
REFERENCES
Armistead-Jehle, P. (2010). Symptom validity test performance in U.S. veterans referred for
evaluation of mild TBI. Applied Neuropsychology, 17, 52–59.
Bauer, L., & McCaffrey, R. J. (2006). Coverage of the Test of Memory Malingering, Victoria
Symptom Validity Test, and Word Memory Test on the internet: Is test security
threatened? Archives of Clinical Neuropsychology, 21, 121–126.
Ben-Porath, Y. S. (1994). The ethical dilemma of coached malingering research. Psychological
Assessment, 6, 14–15.
Benton, A. L., Hamsher, K. de S., & Sivan, A. B. (1994). Manual for the Multilingual
Aphasia Examination (3rd ed.). Iowa City: AJA Associates.
Brennan, A. M., & Gouvier, W. D. (2006). Are we honestly studying malingering? A profile
and comparison of simulated and suspected malingerers. Applied Neuropsychology, 13,
1–11.
Bush, S. S., Ruff, R. M., Tröster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H.,
et al. (2005). Symptom validity assessment: Practice issues and medical necessity:
NAN Policy & Planning Committee. Archives of Clinical Neuropsychology, 20,
419–426.
Carone, D. A. (2008). Children with moderate/severe brain damage/dysfunction outperform
adults with mild-to-no brain damage on the Medical Symptom Validity Test. Brain
Injury, 22, 960–971.
Cato, M. A., Brewster, J., Ryan, T., & Giuliano, A. J. (2002). Coaching and the ability to
simulate mild traumatic brain injury symptoms. The Clinical Neuropsychologist, 16,
524–535.
Chafetz, M. (2008). Malingering on the social security disability consultative exam: Predictors
and base rates. The Clinical Neuropsychologist, 22, 529–546.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Erlbaum.
Erdal, K. (2004). The effects of motivation, coaching, and knowledge of neuropsychology on
the simulated malingering of head injury. Archives of Clinical
Neuropsychology, 19, 73–88.
Frederick, R. I., & Foster, H. G. (1991). Multiple measures of malingering on a forced-choice
test of cognitive ability. Psychological Assessment: A Journal of Consulting and Clinical
Psychology, 3, 596–602.
Gervais, R., Wygant, D., Sellbom, M., & Ben-Porath, Y. (2011). Associations between
symptom validity test failure and scores on the MMPI-2-RF validity and substantive
scales. Journal of Personality Assessment, 93(5), 508–517.
Gorny, I., & Merten, T. (2005). Symptom information-warning-coaching: How do they affect
successful feigning in a neuropsychological assessment? Journal of Forensic
Neuropsychology, 4, 71–97.
Green, P. (2004). Medical Symptom Validity Test for Windows: User’s manual. Edmonton,
Canada: Green’s Publishing.
Green, P. (2007). Manual for the Nonverbal Medical Symptom Validity Test. Edmonton,
Canada: Green’s Publishing.
Green, P., Flaro, L., Brockhaus, R., & Montijo, J. (2012). Performance on the WMT, MSVT,
and NV-MSVT in children with developmental disabilities and in adults with mild
traumatic brain injury. In C. R. Reynolds & J. A. M. Horton (Eds.), Detection of
malingering during head injury litigation (pp. 201–219). USA: Springer.
Hartman, D. E. (2002). The unexamined lie is a lie worth fibbing: Neuropsychological
malingering and the Word Memory Test. Archives of Clinical Neuropsychology, 17,
709–714.