Evidence-Based Technical Skills Training in Pre-Practicum
Psychological Assessment
Jennifer L. Callahan
University of North Texas
The use of practice in prepracticum assessment training is extremely widespread (Childs & Eyde, 2002);
however, multiple studies have observed practice to be an ineffective method for fostering basic scoring
competencies. The aim of the current study was to determine whether the use of practice as a competency
training method in psychological assessment could be improved. A conceptual map was developed that
incorporates feedback with sequential, cumulative, and increasingly complex practice. This map was then
applied to prepracticum training in the technical skills required for psychological assessment of intelli-
gence and personality. In addition to examining errors associated with the Wechsler Adult Intelligence
Scale and Wechsler Intelligence Scale for Children, the Rorschach Comprehensive System was also
investigated. Using the methods herein, Wechsler scoring errors were significantly reduced within three
training experiences to a mean well below published error rates. Training in the Rorschach Comprehen-
sive System also indicated substantial gains in coding accuracy within three practice experiences that
remained largely intact at 8-week follow-up. Future research on this prepracticum training model, which
lends itself to a full range of competencies across training levels, is strongly encouraged.
Keywords: prepracticum, psychological assessment, examiner training, practice, training model
Supplemental materials: http://dx.doi.org/10.1037/tep0000061.supp
Competency in psychological assessment has been described as
“a defining aspect of psychological expertise” (p. 726) that is
“essential to all health-service practices in psychology, transcend-
ing specialties” (p. 732; Krishnamurthy et al., 2004). The identi-
fication of training in psychological assessment as necessarily
beginning during prepracticum (Hatcher & Lassiter, 2007) and
spanning one’s entire career (Krishnamurthy et al., 2004) under-
scores its importance but also poses some challenges to trainers. In
particular, trainers must determine which specific skills they will
aim to selectively develop among trainees during the brief pre-
practicum window. Childs and Eyde (2002) observed that, among
the 84 accredited clinical psychology doctoral programs they re-
viewed, all provided training in both intelligence and personality
assessment and almost all required formal supervised practice in
administration, scoring, and interpretation of psychological tests.
A more recent study by Ready (2013) reports similar findings
among 77 accredited clinical psychology doctoral programs, sug-
gesting that much of prepracticum assessment training may focus
on what the literature conceptualizes as “technical assessment
skills” (p. 732; Krishnamurthy et al., 2004).
Within the small body of existing literature, the most common
method of assessing competency in technical assessment skills is
to examine the prevalence and types of scoring errors. The
Wechsler Intelligence Scales have been the most frequently con-
sidered in this regard with studies of trainees’ competency span-
ning multiple editions of both the Wechsler Adult Intelligence
Scale (WAIS) and Wechsler Intelligence Scale for Children
(WISC) versions. Although the mean number of errors demon-
strates some variability among the different trainee samples, the
results of these studies have been remarkably consistent in two
regards: (a) errors of administration, recording, and calculation are
very common; and (b) repetitious practice does not reduce errors
(Alfonso, Johnson, Patinella, & Rader, 1998; Belk, LoBello, Ray, & Zachar, 2002; Franklin, Stillman, Burpeau, & Sabers, 1982; Levenson, Golden-Scaduto, Aiosa-Karpas, & Ward, 1988; Loe, Kadlubek, & Marks, 2007; Ryan & Schnakenberg-Ott, 2003; Sherrets, Gard, & Langner, 1979; Slate & Chick, 1989; Slate & Jones, 1990; Slate, Jones, & Covert, 1992; Slate, Jones, & Murray, 1991).
In light of Childs and Eyde’s (2002) report that most doctoral
programs require practice as a part of their assessment training, the
consistent finding that practice does not reduce errors is a training
conundrum. Can the use of practice as a competency training
method in psychological assessment be improved? Exploring this
question was the central aim of this study.
Within the broad literature on the acquisition of expertise, it has
been suggested that, unlike simple repetitious practice, practice
that deliberately focuses on improving specific aspects of increas-
ing difficulty while obtaining immediate substantive feedback can
foster expertise (see Ericsson, Krampe, & Tesch-Romer, 1993, for
a salient and oft-cited framework on the development of expertise).
Stated in the training vernacular, the suggestion is that effective
practice is “sequential, cumulative, and graded in complexity” (p.
60, Implementing Regulation C-26; Commission on Accreditation,
2007) and paired with immediate feedback. Very little research has
empirically considered this training suggestion, though Conner and
Woodall (1983) reported that coupling practice with structured
feedback reduced administrative errors and total error rate, despite
no significant effect on response scoring or calculation errors.
Subsequently, Slate and Jones (1989) used a quasi-experimental
design in teaching the WISC to small sequential cohorts. The
experimental group was given detailed information about frequent
errors made by the preceding control group and how the errors
could be avoided. This feedback resulted in the experimental group
making fewer errors; however, neither group demonstrated an
improvement in accuracy over seven practice administrations.
Though neither of these studies demonstrated strongly positive
final outcomes with respect to trainees’ scoring competency, they
do lend support to the premise that use of feedback may facilitate
acquisition of technical assessment skills during prepracticum
training.
The current study sought to more fully explore the use of
feedback, coupled with sequential, cumulative, and increasingly
complex practice, during prepracticum training in the technical
skills associated with psychological assessment of intelligence and
personality. In addition to examining errors associated with the
WAIS and WISC, the Rorschach Comprehensive System (CS) was
also included so that both intellectual and personality assessment
are represented. A basic conceptual map was crafted to guide the
development of a training curriculum that could be subjected to
empirical study and hypothesis testing. As shown in Figure 1,
across training sequences, technical skills were developed cumu-
latively while complexity was increased. For the WAIS and WISC,
three types of previously researched scoring errors were mapped
onto this conceptual framework: administrative errors, recording
errors, and computational errors (Loe, Kadlubek, & Marks, 2007).
Administrative errors included failures to query appropriately,
starting with the wrong item, failure to apply basal or ceiling rules,
assigning an incorrect number of points to an item, and/or admin-
istering an incorrect number of items. Recording errors consisted
of either a failure to record a response or a failure to indicate
completion times when required. Computational errors could result
from any of the following: miscalculation of chronological age,
subtest raw, scaled, and/or standard scores, or incorrect conversion
into index scores, IQ scores, percentile ranks, and/or confidence
intervals. Errant values in any of the computational error catego-
ries that were solely the result of an earlier administrative error were noted, but they were excluded from the computational error total to avoid progressively compounding errors (see the illustrative tallying sketch following this paragraph). The first training
experience for each Wechsler measure involved performing com-
putations (i.e., chronological age, raw scores, scaled scores, index
scores, percentile ranks, confidence intervals, and discrepancy
scores) as a homework assignment and then receiving feedback.
The second training experience additionally required filling out the
protocol record form while administering the Wechsler to an
advanced graduate student volunteer. In addition to receiving
feedback on scoring, trainees received in vivo feedback regarding
their administration skills from the advanced graduate student
volunteer. The third training experience again required adminis-
tration, recording, and computation tasks, though within the con-
text of a brief assessment battery (that, beyond the WAIS or
WISC, included a clinical interview and an achievement measure)
using a community volunteer who was previously unknown to the
trainee.
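To make the Wechsler error taxonomy and the compounding-exclusion rule concrete, the following is a minimal, hypothetical tallying sketch in Python; the error records, field names, and counts are illustrative assumptions rather than material from the study.

```python
from collections import Counter

# Hypothetical error records a rater might log while checking one protocol.
# "caused_by_admin" flags computational errors that are solely downstream
# of an earlier administrative error (e.g., a wrong discontinue point).
errors = [
    {"type": "administration", "caused_by_admin": False},
    {"type": "recording", "caused_by_admin": False},
    {"type": "computation", "caused_by_admin": True},   # noted, not counted
    {"type": "computation", "caused_by_admin": False},
]

def tally_errors(errors):
    """Count errors per category; computational errors resulting solely
    from an administrative error are noted separately, not totaled."""
    counts = Counter()
    noted_only = 0
    for e in errors:
        if e["type"] == "computation" and e["caused_by_admin"]:
            noted_only += 1          # excluded to avoid compounding errors
        else:
            counts[e["type"]] += 1
    return counts, noted_only

counts, noted = tally_errors(errors)
print(dict(counts), "noted-but-excluded computational errors:", noted)
```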
Training with the Rorschach CS was also mapped onto the
conceptual framework presented in Figure 1. The first Rorschach
CS training experience consisted of coding¹ homework, accom-
plished over several weeks, in which trainees progressively coded
isolated elements of responses (e.g., coding Location and Space for
each response, receiving feedback, coding Determinants, receiving
feedback, and so forth) until the entire response was coded. The
second experience had trainees code all responses on a nonclinical
profile before receiving feedback. All coding for Experience 2 was
required to be completed within 1 week of assignment. The third
training experience required coding of all responses on a clinical
profile under time constraints (1 hr, 45 min) with only the Ror-
schach CS workbook available (Exner, 2001). As reflected in
Figure 1, response coding was required in all three training expe-
riences (band appearing in all three columns). The response diffi-
culty level increased after the first experience but was approxi-
mately equivalent in Experiences 2 and 3 (middle band of
Experiences 2 and 3). The naturalistic conditions were more
closely approximated across Rorschach Experiences 2 and 3
(lower band in each) by tightening the time constraint and
imposing resource limitations.
In light of the established literature and the conceptual frame-
work for the current study, the first hypothesis was that trainees
would make significantly fewer (a) administration, (b) recording,
and (c) computational errors on each successive training experi-
ence associated with each Wechsler test. The second hypothesis
was that, after adjusting for protocol difficulty, coding errors
would diminish across Rorschach CS training experiences.
¹ Scoring in the Rorschach CS is referred to as coding; as such, the term coding will be used throughout this article.
[Figure 1 depicts complexity (vertical axis) increasing across the sequence of training (Experiences 1–3, horizontal axis), with feedback following each experience.]
Figure 1. Conceptual model for an evidence-based training approach. Each successive training experience is cumulative, with greater complexity progressively introduced by adding requirements (middle band of Experiences 2 and 3) or increasing difficulty associated with a requirement (lower band of Experiences 2 and 3). Feedback was additionally provided within Experience 1 for the Rorschach CS and within Experience 2 for both the WAIS and WISC.
Method
Participants
Data were derived from archival records (years 2009–2013) of
a two-course sequence in psychological assessment. The sequence
was required of all students without previous salient coursework
who were enrolled in their first year of graduate training. Students
with prior assessment coursework were individually reviewed by
faculty and, when appropriate, routed to more advanced course-
work instead. Therefore, all participants were novice trainees, at
the prepracticum level of training. A staff member of the Department of Psychology independently and randomly sorted students into two enrollment groups before the start of each semester: a monitoring condition and a nonmonitoring condition. Students were
provided with enrollment codes each semester that were linked
specifically to their identity, ensuring that no codes could be
successfully exchanged among students. The staff member was
unaware of which group was the monitoring condition and operated under only one stipulation: within each cohort of each of the programs (clinical, counseling, and so forth), an equal number of students should be assigned to each condition (i.e., restricted randomization; an illustrative sketch of this assignment procedure follows this subsection). Both conditions received the same curriculum, but because of the limited human resources available, data were captured only for those trainees prospectively assigned to the moni-
toring condition. There were 37 students assigned to the monitor-
ing condition for each course. Twenty students were randomized to
the monitoring condition for both courses in the sequence. Thus,
the total number of participants across the two-course sequence is
69, though only 37 were enrolled in either course. Participants
included 53 women (76.8%) and 16 men (23.2%). Public disclo-
sures reflected 18 minority (race, ethnicity, and/or sexual orienta-
tion; 26.1%) and 51 nonminority (73.9%) trainees. Trainees were
drawn from a graduate Counseling Psychology master’s degree
program (n = 14; 20.3%) or one of three American Psychological Association (APA) accredited doctoral (Ph.D.) programs in Clinical Psychology (n = 20; 29.0%), Clinical Health Psychology and Behavioral Medicine (accredited as Clinical; n = 15; 21.7%), or Counseling Psychology (n = 18; 26.1%). The program affiliation
of two participants was not identified. Each program espouses a
scientist-practitioner training model and is housed in the same
Department of Psychology within a large, public university. All
students and their data were treated in accordance with the APA's
Ethical Principles of Psychologists and Code of Conduct (Amer-
ican Psychological Association, 2010) and the local Institutional
Review Board granted approval of the current study.
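As a rough, hypothetical illustration of the restricted randomization described above, the sketch below shuffles each program cohort and assigns half of each cohort to the monitoring condition; the roster, program labels, and seed are invented for illustration and do not reproduce the study's actual assignment procedure.

```python
import random

# Hypothetical roster: program cohort -> list of student identifiers.
cohorts = {
    "clinical": ["s01", "s02", "s03", "s04", "s05", "s06"],
    "counseling": ["s07", "s08", "s09", "s10"],
    "clinical_health": ["s11", "s12", "s13", "s14"],
}

def restricted_randomization(cohorts, seed=2009):
    """Within each cohort, randomly assign half of the students to the
    monitoring condition and half to the nonmonitoring condition."""
    rng = random.Random(seed)
    assignment = {"monitoring": [], "nonmonitoring": []}
    for students in cohorts.values():
        shuffled = students[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        assignment["monitoring"].extend(shuffled[:half])
        assignment["nonmonitoring"].extend(shuffled[half:])
    return assignment

print(restricted_randomization(cohorts))
```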
Procedures
At the conclusion of each term, trainees were asked to provide
copies of their protocols for subsequent data entry. Data were not
compiled for hypothesis testing until after the final cohort provided
their copies; thus, no analyses were conducted between training
cohorts. Because of space constraints, details of the training cur-
riculums are not provided herein. However, the curriculum details
did undergo peer review and are offered as supplemental materials
online via this journal for trainers who may wish to replicate the
curriculum.
The Wechsler training protocols used in this study differed
among participants (i.e., they were not drawn from a standardized
bank of protocols) and were scored by seven raters under the
supervision of the principal investigator. All raters were advanced
doctoral students with experience as a graduate assessment course
teaching fellow. An additional rater independently rescored the
protocols used in rater training (WAIS n = 42; WISC n = 42). Two-way random-model, consistency intraclass correlation coefficients (ICCs) were computed and found to be uniformly excellent across both the WAIS and the WISC rater training protocols. With respect to scoring errors in administration, the WAIS ICC = .92 and the WISC ICC = .99. The ICC for recording errors was 1.00 for both the WAIS and the WISC protocols. Scoring of computational errors produced an ICC of .96 for the WAIS and .98 for the
WISC. The principal investigator was the final arbiter of accuracy.
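For trainers who wish to run a comparable reliability check, the following is a minimal sketch of a single-measure, two-way consistency ICC computed from an ANOVA decomposition of a ratings matrix; the ratings shown are fabricated, and the function is an illustrative implementation rather than the software used in this study.

```python
import numpy as np

def icc_consistency(ratings):
    """Single-measure, two-way consistency ICC from an n x k matrix of
    ratings (rows = protocols/targets, columns = raters)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    # Two-way ANOVA decomposition (no replication).
    ss_rows = k * ((row_means - grand_mean) ** 2).sum()
    ss_cols = n * ((col_means - grand_mean) ** 2).sum()
    ss_total = ((ratings - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Fabricated example: 6 protocols, each scored by 2 raters (error counts).
ratings = [[3, 4], [0, 0], [7, 6], [2, 2], [5, 5], [1, 2]]
print(round(icc_consistency(ratings), 2))  # about .95 for this fabricated matrix
```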
Rorschach CS training protocols (n = 6) were drawn from a training manual for coding the Rorschach CS, which is available from http://dx.doi.org/10.1037/1931-3918.1.2.125.supp (Hilsen-
roth & Charnas, 2007). These protocols reflect both nonclinical
and clinical populations and were standardized by expert CS
scorers, who additionally graded the inherent coding difficulty
associated with each protocol. For each Rorschach CS training
protocol, an error was tallied every time the trainee made an
inaccurate code (Guarnaccia, Dill, Sabatino, & Southwick, 2001).
Accuracy was then computed at the individual trainee level as the proportion of correct codes (i.e., the number possible minus the number of errors, divided by the number possible) on each training experience for each of the following coding segments: Location/
Space, Developmental Quality, Determinants, Form Quality, Pairs/
Reflections, Contents, Populars, Z-scores, and Special Scores.
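A simple, hypothetical illustration of this accuracy computation for one coding segment follows; the codes are invented, and the comparison is against a stand-in expert criterion.

```python
# Hypothetical expert-criterion codes and a trainee's codes for one
# segment (e.g., Determinants) across five responses.
expert_codes = ["F", "M.CF", "FC'", "FM", "C"]
trainee_codes = ["F", "M.CF", "FC", "FM", "C"]

def segment_accuracy(trainee, expert):
    """Proportion of codes that agree with the expert criterion."""
    errors = sum(t != e for t, e in zip(trainee, expert))
    return (len(expert) - errors) / len(expert)

print(f"{segment_accuracy(trainee_codes, expert_codes):.0%}")  # 80%
```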
Results
Upon compiling the archival data, it was found that some trainees had failed to submit some of their Wechsler protocols, resulting in uneven protocol numbers across training experiences. For the WAIS, 90.9% (n = 101) of protocols were submitted; for the WISC, 80.2% (n = 89) of protocols were submitted. Consulting the instructor grade books indicated that all protocols had been completed and that nonsubmitted protocols were not worse than those turned in; there were no patterns of missingness in the data. Across the Wechsler intelligence test training experiences, 90.1% (n = 91) of the 101 submitted WAIS protocols and 85.4% (n = 76) of the 89 submitted WISC protocols contained at least one error in
administration, recording, or computation. Table 1 provides de-
scriptive statistics for the types of errors associated with each
protocol classification (homework, advanced graduate student vol-
unteer, and community volunteer).
The first hypothesis was that trainees would make significantly
fewer (a) administration, (b) recording, and (c) computational
errors on each successive training experience associated with each
Wechsler test. With respect to administration errors, visual inspec-
tion of Table 1 suggests a pattern of increasing errors for both
Wechsler tests; however, only the WAIS reveals a statistically
significant increase in administrative errors between the advanced
graduate student volunteer and the community volunteer protocols,
t(34) = −3.23, p = .003, d = −0.57. Recording errors significantly decreased between WAIS protocols, t(33) = 1.98, p = .05, d = 0.48, to the point that the mean number of errors associated with the community volunteer protocols did not significantly differ from zero. For the WISC, cursory inspection of Table 1 may suggest an increase in recording errors; however, this is somewhat
illusory. Although the mean number of recording errors on the
advanced graduate student volunteer protocol appears to be near
zero, a one-sample t test indicates that the mean is actually significantly greater than zero, t(32) = 2.51, p = .02, owing to a very small associated SD. In contrast, although the mean for the community volunteer protocols is fractionally higher, the associated SD is larger, and this results in the mean score being nonsignificantly different from zero. Finally, with respect to computational errors, paired-samples t tests largely support the hypothesis of progressively fewer errors within each Wechsler test. Within the WAIS experiences, trainees made significantly fewer computational errors on the community volunteer protocol than they did on either the homework protocol, t(33) = 2.54, p = .02, d = 0.43, or the advanced graduate student volunteer protocol, t(34) = 2.33, p = .03, d = 0.38. Similarly, within the WISC experiences, trainees again made significant gains in computational accuracy between their homework protocol and their community volunteer protocol, t(24) = 2.04, p = .05, d = 0.59.
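The comparisons reported above rest on standard paired-samples and one-sample t tests with Cohen's d; the sketch below shows how such tests could be run in Python with SciPy on fabricated error counts (the values are not the study data).

```python
import numpy as np
from scipy import stats

# Fabricated per-trainee computational error counts on two protocols.
homework = np.array([6, 3, 9, 2, 5, 4, 7, 1])
community = np.array([2, 1, 4, 0, 3, 2, 3, 0])

# Paired-samples t test: do errors decline from homework to community?
t_paired, p_paired = stats.ttest_rel(homework, community)
diff = homework - community
d_paired = diff.mean() / diff.std(ddof=1)   # Cohen's d for paired data

# One-sample t test: does the community-protocol mean differ from zero?
t_one, p_one = stats.ttest_1samp(community, popmean=0)

print(f"paired: t={t_paired:.2f}, p={p_paired:.3f}, d={d_paired:.2f}")
print(f"one-sample vs 0: t={t_one:.2f}, p={p_one:.3f}")
```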
The second hypothesis was that, after adjusting for protocol
difficulty, coding errors would diminish across Rorschach CS
training experiences. To test this hypothesis, expert consensus
coding on standardized CS protocols served as the criterion against
which all Rorschach protocols (there were no missing protocols) in
this study were assessed. The proportion of agreement with the
expert criterion was computed for each trainee on every CS coding
category associated with each training experience. For ease of
comparison with findings among trainees reported elsewhere in the
literature, the mean proportions are expressed as percentages in
Table 2. As described in the methods section, expert ratings of
coding difficulty have been previously established for each of the
standardized training protocols used in this study. These difficulty
ratings were used to make coding accuracy adjustments to each
trainee protocol; accuracy adjustments served to standardize the
difficulty level across the training experiences so that changes in
trainees’ coding accuracy could be examined. To accomplish ad-
justments, the most difficult protocol (72% difficulty, clinical
protocol) was used as the reference point of comparison in the
following formula:
Difficulty-Adjusted Accuracy = attained accuracy − [attained accuracy × (0.72 − comparison protocol difficulty)]
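Read literally, the adjustment scales a trainee's attained accuracy downward in proportion to how much easier the coded protocol was than the 72%-difficulty reference protocol. A worked example with hypothetical values:

```python
def difficulty_adjusted_accuracy(attained, protocol_difficulty, reference=0.72):
    """Adjust attained accuracy relative to the hardest (72% difficulty)
    reference protocol: easier protocols have their accuracy scaled down."""
    return attained - attained * (reference - protocol_difficulty)

# Hypothetical: 86% accuracy on a protocol rated 52% in coding difficulty.
print(round(difficulty_adjusted_accuracy(0.86, 0.52), 3))  # 0.688
```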
In conducting repeated measures ANOVAs, Mauchly's test indicated that the assumption of sphericity had been violated for the coding categories of Location/Space, χ²(5) = 59.71, p < .001; Determinants, χ²(5) = 29.06, p < .001; Form Quality, χ²(5) = 15.40, p = .009; Contents, χ²(5) = 26.93, p < .001; and Populars, χ²(5) = 18.02, p = .003. Degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity for Location/Space (ε = .59), Determinants (ε = .65), and Contents (ε = .65), whereas Huynh-Feldt estimates of sphericity were used to correct the degrees of freedom associated with Form Quality (ε = .89) and Populars (ε = .83). As shown in Figure 2, the repeated measures ANOVAs were significant on each set of difficulty-adjusted accuracy scores. The corresponding sparklines consistently illustrate that the greatest change occurred between the second (nonclinical, untimed protocol) and third (clinical, timed protocol) training experiences.
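For readers wishing to reproduce this style of analysis, the sketch below shows one way to obtain Mauchly's test and a sphericity-corrected repeated measures ANOVA in Python, assuming the pingouin package; the long-format data are fabricated, pingouin reports Greenhouse-Geisser-corrected p values (the Huynh-Feldt correction applied to two categories above is not shown), and the original analyses may well have been conducted in other software.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Fabricated long-format data: difficulty-adjusted accuracy for 8 trainees
# across the four Rorschach training experiences (values are invented).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "trainee": np.repeat(np.arange(8), 4),
    "experience": np.tile(["exp1", "exp2", "exp3", "followup"], 8),
    "accuracy": rng.normal(loc=[0.70, 0.72, 0.82, 0.80] * 8, scale=0.05),
})

# Mauchly's test of sphericity for the within-subject factor.
print(pg.sphericity(df, dv="accuracy", within="experience", subject="trainee"))

# Repeated measures ANOVA; correction=True adds Greenhouse-Geisser-
# corrected p-values, which can be read when sphericity is violated.
aov = pg.rm_anova(data=df, dv="accuracy", within="experience",
                  subject="trainee", correction=True)
print(aov.round(3))
```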
Table 1
Administration, Recording, and Computation Error Rates in
Order of Protocol Completion
Order Protocol M SD Range Sum % errant
WAIS-IV
1 Homework
Computation 5.09 8.68 0–35 173 77.4%
2 Graduate volunteer 9.51 9.94 0–41 333 94.3%
Administration 4.20 3.65 0–17 147 94.3%
Recording 0.77 1.82 0–8 27 25.7%
Computation 4.54 7.80 0–33 159 71.4%
3 Community volunteer 8.63 7.89 0–41 302 97.1%
Administration 6.36 4.47 0–20 229 97.2%
Recording 0.14 0.43 0–2 5 11.4%
Computation 2.03 4.85 0–27 73 47.2%
WISC-IV
1 Homework
Computation 5.38 9.42 0–34 140 59.3%
2 Graduate volunteer 6.75 6.62 0–37 216 96.9%
Administration 4.79 6.85 0–33 163 91.2%
Recording 0.21 0.49 0–2 7 18.2%
Computation 2.88 3.94 0–16 95 63.6%
3 Community volunteer 6.80 4.74 1–21 204 96.7%
Administration 6.27 5.64 1–28 207 84.8%
Recording 0.50 1.76 0–8 16 9.4%
Computation 1.00 1.39 0–6 30 50.0%
Note. % errant indicates the percentage of protocols that contained at least one error.
Table 2
Coding Accuracy on Rorschach Response Segments
Order Protocol experience Loc DvQ Det FQ 2 Con P Z Spec
1 Sequential segments 99.9% 81.2% 79.0% 79.0% 86.2% 72.0% 77.7% 69.6% 48.1%
2 Nonclinical responses 97.2% 83.8% 80.7% 86.2% 86.5% 86.6% 72.4% 64.7% 73.5%
Hilsenroth et al. (2007) 96% 96% 85% 93% 91% 95% 92% 86% 89%
Guarnaccia et al. (2001) 82% 77% 75% 61% 93% 90% 87% 65% 56%
3 Clinical responses 98.4% 73.6% 73.5% 80.0% 91.4% 80.6% 96.2% 66.0% 61.8%
Hilsenroth et al. (2007) 99% 91% 78% 80% 92% 90% 97% 83% 65%
Guarnaccia et al. (2001) 82% 76% 51% 61% 93% 67% 93% 72% 34%
Meyer et al. (2002) 90% 92% 73% 82% 93% 76% 95% — 82%
4 Follow-up protocol 99.8% 83.6% 79.0% 68.9% 88.0% 76.0% 88.1% 71.8% 65.3%
Note. Loc = Location/Space; DvQ = Developmental Quality; Det = Determinants; FQ = Form Quality; 2 = Pairs/Reflections; Con = Contents; P = Populars; Z = Z-scores; Spec = Special Scores. Bold text refers to data gathered from participants in this study, whereas the remaining information provides comparisons with the extant literature.
Additionally, the highest
accuracy was attained on the third (clinical, timed) protocol for
coding Location/Space, Developmental Quality, Contents, and
Populars. For codes of Determinants, Form Quality, Pairs/Reflec-
tions, Z-Score, and Special Scores highest accuracy was associated
with the 8-week follow-up (clinical, timed) protocol.
Discussion
The conceptual map presented in Figure 1 was used to develop
a training strategy and corresponding hypotheses pertaining to
technical skills training in assessment prepracticum. Analyses re-
vealed that trainees demonstrated scoring competency gains across
the three WAIS training experiences, with significant reductions in
both recording and computation errors. Additional gains in scoring
accuracy were then observed within the WISC training experi-
ences. The current findings stand in stark contrast to most of the
existing literature, which strongly indicates that Wechsler practice
does not reduce scoring errors (see introduction section for refer-
ences). Previous studies of Wechsler scoring errors have found no
significant error reductions across as many as 5 to 15 administrations (Belk et al., 2002; Conner & Woodall, 1983; Slate & Jones, 1990; Slate et al., 1991); yet, in the current study, significant effects were observed within three training experiences involving only two administrations per test. More importantly, in the current study there was a deliberate focus on improving specific technical skills of increasing difficulty while obtaining immediate substantive feedback. Most of the earlier studies cited, by way of contrast, relied on nonspecific repetition without changes in difficulty, and feedback was often minimal or provided only after a delay. Although the findings from the current study appear to deviate from most of the established Wechsler training literature, there is distal support in the literature to suggest that the findings herein are not spurious (Conner & Woodall, 1983; Slate & Jones, 1989).
Furthermore, the findings are consistent with literature on the
development of expertise (Ericsson, Krampe, & Tesch-Romer,
1993). In the only other study where both the WAIS and WISC
were included, Slate, Jones, and Covert (1992) observed negative
transference from the WAIS to the WISC, concluding that, rather
than fostering proficiency, trainees seemed to practice errors.
However, in the current study, examination of the sum of errors
across protocols, and the associated error ranges, indicates the
possibility of a positive transference of learning from the WAIS
training experiences to the WISC training experiences (WAIS
graduate volunteer errors > WAIS community volunteer errors >
WISC graduate and community volunteers’ errors). Future re-
search specifically examining transference of learning in assess-
ment prepracticum would be helpful. Future research is also en-
couraged to specifically focus more on training in Wechsler
administration. Contrary to prediction, a significant increase in
WAIS administrative errors between the advanced graduate stu-
dent volunteer and the community volunteer protocols was ob-
served among participants in this study. It is possible that the
training experiences created an increase in cognitive load on
participants, thereby increasing the likelihood of error and confound-
ing analyses (Sweller, 1988). Unlike the Rorschach tasks, in which
correction was made in analyses for increasing difficulty across the
standardized protocols, the Wechsler tasks drew from nonstandard
protocols and no standardized adjustments for protocol difficulty
could be implemented. Future research on WAIS administration errors that corrects for protocol difficulty is recommended. Despite the isolated increase in WAIS administration errors, total Wechsler scoring errors were significantly reduced by the end of prepracticum training, and with fewer practice administrations than in prior studies. In addition, the mean number of total errors falls well below rates published elsewhere. As an exemplar of this point, in the most recent study of this kind, Loe and colleagues (2007) reported a mean of 25.8 errors per WISC protocol, whereas in the current study the mean number of total errors on the third WISC protocol was 6.8. Aside from replication
of these findings at other training sites, it would be clarifying if future research used a dismantling design to determine whether the strong training effects observed in this study are attributable to the incorporation of rapid substantive feedback versus the sequential and cumulative presentation of increasingly complex training experiences.
With respect to prepracticum technical skills training in Ror-
schach CS coding, analyses of difficulty-adjusted accuracy rates
clearly indicate substantial training gains in every scoring cate-
gory. As shown in Figure 2, the most substantial gains were
attained between the second (nonclinical, untimed protocol) and
third (clinical, timed protocol) training experiences. This is per-
haps surprising because the third protocol was 40% more difficult
than the second protocol and was associated with both time and
resource constraints. Although the third experience functionally
served as coding practice, it was under testing conditions and this
may be responsible for the reported findings herein. Practice
testing is perhaps the most potent way to use practice, with
consistently high utility in augmenting performance (Dunlosky,
Rawson, Marsh, Nathan, & Willingham, 2013).
Figure 2. Results of repeated measures ANOVA tests, using difficulty-adjusted scores, for scoring accuracy on
Rorschach response segments. Sparklines depict adjusted mean scores associated with each of the four training
experiences, with the darkened column demarking the protocol with the highest mean accuracy.
Also notable in the CS findings is the pattern of highest attained accuracy among
coding categories. Location/Space, Developmental Quality, Con-
tents, and Populars attained highest accuracy on the third protocol,
whereas for Determinants, Form Quality, Pairs/Reflections,
Z-Score, and Special Scores highest accuracy was associated with
the follow-up (clinical, timed) protocol. One possibility that has
not been considered within trainee samples learning the CS is that
some coding categories may be more difficult than others and
require longer for consolidation of learning. Support for this pos-
sibility may be inferred from Viglione (2002), who observed that
coding Location/Space and Populars seems to be less challenging
than other coding elements. The precipitous drop in Developmen-
tal Quality coding accuracy at follow-up is also remarkable. Parsing the nature of errors was not undertaken in this study, but a look at the raw data suggests that the decline might have been tied to occasionally forgetting to score and/or intermittently reaching
(i.e., coding responses as involving synthesis, when no synthesis
was present). At the other extreme, the very high accuracy in
coding Location and Space should be viewed cautiously. The
standardized protocols used in this study present the correct codes
for these variables as part of every response. Thus, the only way
for a trainee to get the wrong answer was to transcribe it incor-
rectly into their sequence of scores (the same point is true for the
comparison Hilsenroth et al., 2007 data shown in Table 2).
In this study, trainees’ coding accuracy was operationalized as
the proportion of agreement with the expert scoring. A limitation
to this method is that it does not distinguish between omission
versus commission coding errors. Cohen (1960) noted the addi-
tional limitation that overall proportions can be elevated according
to probabilities equal to the observed base rates if coders (in this
case, the trainee under consideration and the expert raters) engaged
in guessing. Within the existing Rorschach CS coding accuracy
literature, some researchers have reported κ coefficients (e.g., in a trainee-specific sample, see Hilsenroth, Charnas, Zodan, & Streiner, 2007), which are meant to correct for chance agreement. However, the assumptions required to utilize κ coefficients are not appropriately met in the current study. The formula for κ is based on the assumption of complete statistical independence of raters. The calculated estimate of chance agreement is applicable only if the raters guess on every case and make their guesses with the appropriate prob-
abilities. In this study, the assumption of complete guessing with
appropriate probabilities by both raters (the trainee and the expert)
was not met in any of the training experiences. Although the use
of κ coefficients would not have been appropriate, the issue of
intermittent chance agreement remains. Future research that crafts
a theoretical model of rater decision-making and then empirically
models rater agreement could potentially produce an appropriate
chance correction (Agresti, 1992; Uebersax, 1987). Such research may be informative for advancing the CS literature on interrater
reliabilities.
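To make the chance-correction issue concrete, the sketch below contrasts raw proportion agreement with Cohen's κ (via scikit-learn) for a fabricated set of trainee and expert codes; it merely illustrates how the two indices diverge and is not an endorsement of κ for these data.

```python
from sklearn.metrics import cohen_kappa_score

# Fabricated Popular (P) codes for ten responses: 1 = coded Popular.
expert =  [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
trainee = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

agreement = sum(t == e for t, e in zip(trainee, expert)) / len(expert)
kappa = cohen_kappa_score(trainee, expert)

print(f"proportion agreement = {agreement:.2f}")  # 0.80
print(f"Cohen's kappa        = {kappa:.2f}")      # lower, after chance correction
```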
The use of a quasi-experimental design, rather than a true
experimental design, could be viewed as a limitation to the current
study. Although a control group was not included, it is important
to note that restricted randomization was used so that half of each
cohort within each of the graduate programs (that varied in size
across programs and years) was included. By balancing group
sizes, the restricted randomization method is less impacted by
selection bias (Lachin, Matts, & Wei, 1988). Further, the more
naturalistic conditions within this study may result in greater
external validity and generalization of findings. An additional
limitation may be that not all Wechsler protocols were submitted
for analyses (90.9% of WAIS protocols and 80.2% of WISC
protocols were submitted). Examining the course grade books
indicated that these missing protocols were completed and not
better or worse than those submitted. Although speculative, one
possibility is that these novice trainees were simply strained for time and that the organization of their materials progressively suffered during the first course (WAIS training preceded WISC training), resulting in the loss of protocols at random. Some support
for this hypothesis may be found in the observation that all
protocols were submitted in the second course, when participants
may have grown more accustomed to the demands of graduate
training and enacted better personal management skills.
In summary, this article introduces a basic conceptual map (see
Figure 1) that can facilitate an evidence-based approach to train-
ing by providing a structure for hypothesis formation and testing.
Key to this map is the integration of sequential, cumulative, and
increasingly complex training experiences with rapid and substan-
tive feedback. In the current study this framework was illustra-
tively applied to technical skills training in psychological assess-
ment at the prepracticum level, but the conceptual model is
sufficiently flexible for application to varying levels of a full range
of practicum competencies (e.g., competencies pertaining to psy-
chotherapy, supervision, or consultation). Doing so elucidates test-
able hypotheses for investigation so that the necessary, but cur-
rently insufficient, research base fostering evidence-based training
may be strengthened. Future research in this vein is strongly
encouraged. Additionally, in light of reports that assessment train-
ing has been reduced and trainees are insufficiently prepared for
internship (e.g., Clemence & Handler, 2001; Stedman, Hatch, &
Schoenfeld, 2001), the evidence-based benefits of assessment
training as shown in this study may also be informative during
training program discussions regarding allocations of time, inten-
sity, and resources for assessment training.
References
Agresti, A. (1992). Modelling patterns of agreement and disagreement.
Statistical Methods in Medical Research, 1, 201–218. doi:10.1177/
096228029200100205
Alfonso, V. C., Johnson, A., Patinella, L., & Rader, D. E. (1998). Common
WISC-III errors: Evidence from graduate students in training. Psychol-
ogy in the Schools, 35, 119–125. doi:10.1002/(SICI)1520-6807(199804)35:2<119::AID-PITS3>3.0.CO;2-K
American Psychological Association. (2010). Ethical principles of psy-
chologists and code of conduct: including 2010 amendments. Washing-
ton, DC: American Psychological Association.
Belk, M. S., LoBello, S. G., Ray, G. E., & Zachar, P. (2002). WISC-III
administration, clerical, and scoring errors made by student examiners.
Journal of Psychoeducational Assessment, 20, 290–300. doi:10.1177/
073428290202000305
Childs, R. A., & Eyde, L. D. (2002). Assessment training in clinical
psychology doctoral programs: What should we teach? What do we
teach? Journal of Personality Assessment, 78, 130–144. doi:10.1207/
S15327752JPA7801_08
Clemence, A. J., & Handler, L. (2001). Psychological assessment on
internship: A survey of training directors and their expectations for
students. Journal of Personality Assessment, 76, 18–47. doi:10.1207/
S15327752JPA7601_2
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
26 CALLAHAN
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educa-
tional and Psychological Measurement, 20, 37–46. doi:10.1177/
001316446002000104
Commission on Accreditation. (2007). Implementing regulations: Section
C: IRs related to the guidelines and principles. Washington, DC: Amer-
ican Psychological Association.
Conner, R., & Woodall, F. E. (1983). The effects of experience and
structured feedback on WISC-R error rates made by student-examiners.
Psychology in the Schools, 20, 376–379. doi:10.1002/1520-6807(198307)20:3<376::AID-PITS2310200320>3.0.CO;2-M
Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham,
D. T. (2013). Improving students’ learning with effective learning tech-
niques: Promising directions from cognitive and educational psychol-
ogy. Psychological Science in the Public Interest, 14, 4–58. doi:
10.1177/1529100612453266
Ericsson, K. A., Krampe, R. T., & Tesch-Romer, C. (1993). The role of
deliberate practice in the acquisition of expert performance. Psycholog-
ical Review, 100, 363–406. doi:10.1037/0033-295X.100.3.363
Exner, J. E., Jr. (2001). A Rorschach workbook for the comprehensive
system (5th ed.). Asheville, NC: Rorschach Workshops.
Franklin, M. R., Stillman, P. L., Burpeau, M. Y., & Sabers, D. L. (1982).
Examiner error in intelligence testing: Are you a source? Psychology in
the Schools, 19, 563–569. doi:10.1002/1520-6807(198210)19:4<563::AID-PITS2310190427>3.0.CO;2-Q
Guarnaccia, V., Dill, C. A., Sabatino, S., & Southwick, S. (2001).
Scoring accuracy using the comprehensive system for the Rorschach.
Journal of Personality Assessment, 77, 464–474. doi:10.1207/
S15327752JPA7703_07
Hatcher, R. L., & Lassiter, K. D. (2007). Initial training in professional
psychology: The Practicum Competencies Outline. Training and Edu-
cation in Professional Psychology, 1, 49–63. doi:10.1037/1931-3918.1
.1.49
Hilsenroth, M. J., & Charnas, J. W. (2007). Training manual for Rorschach
interrater reliability (2nd ed.). Unpublished Manuscript, The Derner
Institute of Advanced Psychological Studies, Adelphi University, Gar-
den City, NY. Retrieved from doi:10.1037/1931-3918.1.2.125.supp
Hilsenroth, M. J., Charnas, J. W., Zodan, J., & Streiner, D. L. (2007).
Criterion-based training for Rorschach scoring. Training and Education
in Professional Psychology, 1, 125–134. doi:10.1037/1931-3918.1.2.125
Krishnamurthy, R., VandeCreek, L., Kaslow, N. J., Tazeau, Y. N., Miville,
M. L., Kerns, R., . . . Benton, S. A. (2004). Achieving competency in
psychological assessment: Directions for education and training. Journal
of Clinical Psychology, 60, 725–739. doi:10.1002/jclp.20010
Lachin, J. M., Matts, J. P., & Wei, L. J. (1988). Randomization in clinical
trials: Conclusions and recommendations. Controlled Clinical Trials, 9,
365–374. doi:10.1016/0197-2456(88)90049-9
Levenson, R. L., Golden-Scaduto, C. J., Aiosa-Karpas, C. J., & Ward,
A. W. (1988). Effects of examiners’ education and sex on presence and
type of clerical errors made on WISC-R protocols. Psychological Re-
ports, 62, 659–664.
Loe, S. A., Kadlubek, R. M., & Marks, W. J. (2007). Administration and
scoring errors on the WISC-IV among graduate student examiners.
Journal of Psychoeducational Assessment, 25, 237–247. doi:10.1177/
0734282906296505
Meyer, G. J., Hilsenroth, M. J., Baxter, D., Exner, J. E., Jr., Fowler, J. C.,
Piers, C. C., & Resnick, J. (2002). An examination of interrater reliabil-
ity for scoring the Rorschach comprehensive system in eight data sets.
Journal of Personality Assessment, 78, 219–274. doi:10.1207/
S15327752JPA7802_03
Ready, R. E. (2013, February). Training in psychological assessment:
Current practices of clinical psychology programs. Paper presented at
the Forty First Annual Meeting of the International Neuropsychological
Society, Waikoloa, HI.
Ryan, J. J., & Schnakenberg-Ott, S. D. (2003). Scoring reliability on the
Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). Assessment, 10,
151–159. doi:10.1177/1073191103010002006
Sherrets, S. D., Gard, G., & Langner, H. (1979). Frequency of clerical
errors on WISC protocols. Psychology in the Schools, 16, 495–496.
Slate, J. R., & Chick, D. (1989). WISC-R examiner errors: Cause for
concern. Psychology in the Schools, 26, 78–84. doi:10.1002/1520-6807(198901)26:1<78::AID-PITS2310260111>3.0.CO;2-5
Slate, J. R., & Jones, C. H. (1989). Can teaching of the WISC-R be
improved? Quasi-experimental exploration. Professional Psychology:
Research and Practice, 20, 408–410. doi:10.1037/0735-7028.20.6.408
Slate, J. R., & Jones, C. H. (1990). Student error in administering the
WISC-R: Identifying problem areas. Measurement and Evaluation in
Counseling and Development, 23, 137–140.
Slate, J. R., Jones, C. H., & Covert, T. L. (1992). Rethinking the instruc-
tional design for teaching the WISC-R: The effects of practice admin-
istrations. College Student Journal, 26, 285–289.
Slate, J. R., Jones, C. H., & Murray, R. A. (1991). Teaching administration
and scoring of the Wechsler Adult Intelligence Scale-Revised: An em-
pirical evaluation of practice administrations. Professional Psychology:
Research and Practice, 22, 375–379. doi:10.1037/0735-7028.22.5.375
Stedman, J. M., Hatch, J. P., & Schoenfeld, L. S. (2001). The current status
of psychological assessment training in graduate and professional
schools. Journal of Personality Assessment, 77, 398–407. doi:10.1207/
S15327752JPA7703_02
Sweller, J. (1988). Cognitive load during problem solving: Effects on
learning. Cognitive Science, 12, 257–285. doi:10.1207/s15516709cog1202_4
Uebersax, J. S. (1987). Diversity of decision-making models and the
measurement of interrater agreement. Psychological Bulletin, 101, 140–
146. doi:10.1037/0033-2909.101.1.140
Viglione, D. J. (2002). Rorschach coding solutions: A reference guide for
the comprehensive system. San Diego, CA: Donald J. Viglione.
Received August 30, 2013
Revision received March 21, 2014
Accepted April 23, 2014