Evidence-Based Technical Skills Training in Pre-Practicum
Psychological Assessment
Jennifer L. Callahan
University of North Texas
The use of practice in prepracticum assessment training is extremely widespread (Childs & Eyde, 2002);
however, multiple studies have observed practice to be an ineffective method for fostering basic scoring
competencies. The aim of the current study was to determine whether the use of practice as a competency
training method in psychological assessment could be improved. A conceptual map was developed that
incorporates feedback with sequential, cumulative, and increasingly complex practice. This map was then
applied to prepracticum training in the technical skills required for psychological assessment of intelli-
gence and personality. In addition to examining errors associated with the Wechsler Adult Intelligence
Scale and Wechsler Intelligence Scale for Children, the Rorschach Comprehensive System was also
investigated. Using the methods herein, Wechsler scoring errors were significantly reduced within three
training experiences to a mean well below published error rates. Training in the Rorschach Comprehen-
sive System also indicated substantial gains in coding accuracy within three practice experiences that
remained largely intact at 8-weeks follow-up. Future research of this prepracticum training model, which
lends itself to a full range of competencies across training levels, is strongly encouraged.
Keywords: prepracticum, psychological assessment, examiner training, practice, training model
Supplemental materials: http://dx.doi.org/10.1037/tep0000061.supp
Competency in psychological assessment has been described as “a defining aspect of psychological expertise” (p. 726) that is
“essential to all health-service practices in psychology, transcend-
ing specialties” (p. 732; Krishnamurthy et al., 2004). The identi-
fication of training in psychological assessment as necessarily
beginning during prepracticum (Hatcher & Lassiter, 2007) and
spanning one’s entire career (Krishnamurthy et al., 2004) under-
scores its importance but also poses some challenges to trainers. In
particular, trainers must determine which specific skills they will
aim to selectively develop among trainees during the brief pre-
practicum window. Childs and Eyde (2002) observed that, among
the 84 accredited clinical psychology doctoral programs they re-
viewed, all provided training in both intelligence and personality
assessment and almost all required formal supervised practice in
administration, scoring, and interpretation of psychological tests.
A more recent study by Ready (2013) reports similar findings
among 77 accredited clinical psychology doctoral programs, sug-
gesting that much of prepracticum assessment training may focus
on what the literature conceptualizes as “technical assessment
skills” (p. 732; Krishnamurthy et al., 2004).
Within the small body of existing literature, the most common
method of assessing competency in technical assessment skills is
to examine the prevalence and types of scoring errors. The
Wechsler Intelligence Scales have been the most frequently con-
sidered in this regard with studies of trainees’ competency span-
ning multiple editions of both the Wechsler Adult Intelligence
Scale (WAIS) and Wechsler Intelligence Scale for Children
(WISC) versions. Although the mean number of errors demon-
strates some variability among the different trainee samples, the
results of these studies have been remarkably consistent in two
regards: (a) errors of administration, recording, and calculation are
very common; and (b) repetitious practice does not reduce errors
(Alfonso, Johnson, Patinella, & Rader, 1998; Belk, LoBello, Ray, & Zachar, 2002; Franklin, Stillman, Burpeau, & Sabers, 1982; Levenson, Golden-Scaduto, Aiosa-Karpas, & Ward, 1988; Loe, Kadlubek, & Marks, 2007; Ryan & Schnakenberg-Ott, 2003; Sherrets, Gard, & Langner, 1979; Slate & Chick, 1989; Slate & Jones, 1990; Slate, Jones, & Covert, 1992; Slate, Jones, & Murray, 1991).
In light of Childs and Eyde’s (2002) report that most doctoral
programs require practice as a part of their assessment training, the
consistent finding that practice does not reduce errors is a training
conundrum. Can the use of practice as a competency training
method in psychological assessment be improved? Exploring this
question was the central aim of this study.
Within the broad literature on the acquisition of expertise, it has
been suggested that, unlike simple repetitious practice, practice
that deliberately focuses on improving specific aspects of increas-
ing difficulty while obtaining immediate substantive feedback can
foster expertise (see Ericsson, Krampe, & Tesch-Romer, 1993, for
This article was published Online First August 18, 2014.
JENNIFER L. CALLAHAN earned her PhD in clinical psychology from the
University of Wisconsin-Milwaukee, completed her internship and post-
doctoral training at Yale University, and holds board certification in
clinical psychology. She is currently associate professor and director of
clinical training for the clinical psychology program at the University of
North Texas, where she directs the Evidence-Based Training and Compe-
tencies Research Lab.
CORRESPONDENCE CONCERNING THIS ARTICLE should be addressed to
Jennifer L. Callahan, Department of Psychology, University of North
Texas, 1155 Union Circle #311280, Denton, TX 76205. E-mail: jennifer.callahan@unt.edu
Training and Education in Professional Psychology, 2015, Vol. 9, No. 1, 21–27. © 2014 American Psychological Association. 1931-3918/15/$12.00. http://dx.doi.org/10.1037/tep0000061
a salient and oft-cited framework on the development of expertise).
Stated in the training vernacular, the suggestion is that effective
practice is “sequential, cumulative, and graded in complexity” (p.
60, Implementing Regulation C-26; Commission on Accreditation,
2007) and paired with immediate feedback. Very little research has
empirically considered this training suggestion, though Conner and
Woodall (1983) reported that coupling practice with structured
feedback reduced administrative errors and the total error rate, although it had no significant effect on response scoring or calculation errors.
Subsequently, Slate and Jones (1989) used a quasi-experimental
design in teaching the WISC to small sequential cohorts. The
experimental group was given detailed information about frequent
errors made by the preceding control group and how the errors
could be avoided. This feedback resulted in the experimental group
making fewer errors; however, neither group demonstrated an
improvement in accuracy over seven practice administrations.
Though neither of these studies demonstrated strongly positive
final outcomes with respect to trainees’ scoring competency, they
do lend support to the premise that use of feedback may facilitate
acquisition of technical assessment skills during prepracticum
training.
The current study sought to more fully explore the use of
feedback, coupled with sequential, cumulative, and increasingly
complex practice, during prepracticum training in the technical
skills associated with psychological assessment of intelligence and
personality. In addition to examining errors associated with the
WAIS and WISC, the Rorschach Comprehensive System (CS) was
also included so that both intellectual and personality assessment
are represented. A basic conceptual map was crafted to guide the
development of a training curriculum that could be subjected to
empirical study and hypothesis testing. As shown in Figure 1,
across training sequences, technical skills were developed cumu-
latively while complexity was increased. For the WAIS and WISC,
three types of previously researched scoring errors were mapped
onto this conceptual framework: administrative errors, recording
errors, and computational errors (Loe, Kadlubek, & Marks, 2007).
Administrative errors included failures to query appropriately,
starting with the wrong item, failure to apply basal or ceiling rules,
assigning an incorrect number of points to an item, and/or admin-
istering an incorrect number of items. Recording errors consisted
of either a failure to record a response or a failure to indicate
completion times when required. Computational errors could result
from any of the following: miscalculation of chronological age,
subtest raw, scaled, and/or standard scores, or incorrect conversion
into index scores, IQ scores, percentile ranks, and/or confidence
intervals. Errant values in any of the computational error catego-
ries that were solely the result of an earlier administrative error
were noted, but they were excluded from the computational error
total to avoid progressively compounding errors. The first training
experience for each Wechsler measure involved performing com-
putations (i.e., chronological age, raw scores, scaled scores, index
scores, percentile ranks, confidence intervals, and discrepancy
scores) as a homework assignment and then receiving feedback.
The second training experience additionally required filling out the
protocol record form while administering the Wechsler to an
advanced graduate student volunteer. In addition to receiving
feedback on scoring, trainees received in vivo feedback regarding
their administration skills from the advanced graduate student
volunteer. The third training experience again required adminis-
tration, recording, and computation tasks, though within the con-
text of a brief assessment battery (that, beyond the WAIS or
WISC, included a clinical interview and an achievement measure)
using a community volunteer who was previously unknown to the
trainee.
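As an editorial illustration (not part of the original article or its materials), the following Python sketch shows one way the error-tallying rule described above could be represented; the class names, category labels, and example entries are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical record of one scoring error on a Wechsler protocol.
# Category labels mirror the three error types described above;
# `from_admin_error` flags computational errors that stem solely from
# an earlier administrative error (noted but excluded from the total).
@dataclass
class ScoringError:
    category: str            # "administration", "recording", or "computation"
    description: str
    from_admin_error: bool = False

@dataclass
class ProtocolErrorTally:
    errors: list = field(default_factory=list)

    def add(self, error: ScoringError) -> None:
        self.errors.append(error)

    def count(self, category: str) -> int:
        """Count errors in a category, excluding computational errors that
        were solely the result of an earlier administrative error."""
        return sum(
            1 for e in self.errors
            if e.category == category
            and not (category == "computation" and e.from_admin_error)
        )

# Example: one protocol with three recorded errors.
tally = ProtocolErrorTally()
tally.add(ScoringError("administration", "failed to apply ceiling rule"))
tally.add(ScoringError("computation", "scaled score miscalculated"))
tally.add(ScoringError("computation", "raw score errant only because of the ceiling error",
                       from_admin_error=True))

print(tally.count("administration"))  # 1
print(tally.count("computation"))     # 1 (the cascaded error is excluded)
```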
Training with the Rorschach CS was also mapped onto the
conceptual framework presented in Figure 1. The first Rorschach
CS training experience consisted of coding homework (see Footnote 1), accom-
plished over several weeks, in which trainees progressively coded
isolated elements of responses (e.g., coding Location and Space for
each response, receiving feedback, coding Determinants, receiving
feedback, and so forth) until the entire response was coded. The
second experience had trainees code all responses on a nonclinical
profile before receiving feedback. All coding for Experience 2 was
required to be completed within 1 week of assignment. The third
training experience required coding of all responses on a clinical
profile under time constraints (1 hr, 45 min) with only the Ror-
schach CS workbook available (Exner, 2001). As reflected in
Figure 1, response coding was required in all three training expe-
riences (band appearing in all three columns). The response diffi-
culty level increased after the first experience but was approxi-
mately equivalent in Experiences 2 and 3 (middle band of
Experiences 2 and 3). The naturalistic conditions were more
closely approximated across Rorschach Experiences 2 and 3
(lower band in each) by progressively tightening the time constraint and
imposing resource limitations.
In light of the established literature and the conceptual frame-
work for the current study, the first hypothesis was that trainees
would make significantly fewer (a) administration, (b) recording,
and (c) computational errors on each successive training experi-
ence associated with each Wechsler test. The second hypothesis
was that, after adjusting for protocol difficulty, coding errors
would diminish across Rorschach CS training experiences.
Footnote 1: Scoring in the Rorschach CS is referred to as coding; as such, the term coding will be used throughout this article.
[Figure 1 image not reproduced; axes: Sequence of Training (Experiences 1–3) by Complexity, with feedback arrows.]
Figure 1. Conceptual model for an evidence-based training approach. Each successive training experience is cumulative, with greater complexity progressively introduced by adding requirements (middle band of Experiences 2 and 3) or increasing difficulty associated with a requirement (lower band of Experiences 2 and 3). Feedback was additionally provided within Experience 1 for the Rorschach CS and within Experience 2 for both the WAIS and WISC.
Method
Participants
Data were derived from archival records (years 2009–2013) of
a two-course sequence in psychological assessment. The sequence
was required of all students without previous salient coursework
who were enrolled in their first year of graduate training. Students
with prior assessment coursework were individually reviewed by
faculty and, when appropriate, routed to more advanced course-
work instead. Therefore, all participants were novice trainees, at
the prepracticum level of training. A staff member of the Depart-
ment of Psychology independently randomly sorted students into
two enrollment groups before the start of each semester: a moni-
toring condition and a nonmonitoring condition. Students were
provided with enrollment codes each semester that were linked
specifically to their identity, ensuring that no codes could be
successfully exchanged among students. The staff member was
unaware of which group was the monitoring condition and had
only one stipulation: within each cohort of each of the programs
(clinical, counseling, and so forth), an equal number of students
should be assigned to each condition (i.e., restricted randomization
procedures). Both conditions received the same curriculum, but
because of the limited human resources available, data were cap-
tured only for those trainees prospectively assigned to the moni-
toring condition. There were 37 students assigned to the monitor-
ing condition for each course. Twenty students were randomized to
the monitoring condition for both courses in the sequence. Thus,
the total number of participants across the two-course sequence is
69, though only 37 were enrolled in either course. Participants
included 53 women (76.8%) and 16 men (23.2%). Public disclo-
sures reflected 18 minority (race, ethnicity, and/or sexual orienta-
tion; 26.1%) and 51 nonminority (73.9%) trainees. Trainees were
drawn from a graduate Counseling Psychology master’s degree
program (n = 14; 20.3%) or one of three American Psychological Association (APA) accredited doctoral (Ph.D.) programs in Clinical Psychology (n = 20; 29.0%), Clinical Health Psychology and Behavioral Medicine (accredited as Clinical; n = 15; 21.7%), or Counseling Psychology (n = 18; 26.1%). The program affiliation
of two participants was not identified. Each program espouses a
scientist-practitioner training model and is housed in the same
Department of Psychology within a large, public university. All
students and their data were treated in accordance with the APA's
Ethical Principles of Psychologists and Code of Conduct (Amer-
ican Psychological Association, 2010) and the local Institutional
Review Board granted approval of the current study.
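As an illustrative aside (not drawn from the study's materials), the sketch below shows a minimal version of the restricted randomization described above, balancing condition assignments within each program cohort; the cohort rosters and condition labels are hypothetical placeholders.

```python
import random

def restricted_randomize(cohort, seed=None):
    """Split one program cohort into two equal-sized (plus or minus one) groups.

    Sketch of restricted randomization: balance is enforced within each
    cohort rather than across the whole sample. The condition labels are
    arbitrary placeholders; in the study, the staff member was unaware of
    which group served as the monitoring condition.
    """
    rng = random.Random(seed)
    students = list(cohort)
    rng.shuffle(students)
    half = len(students) // 2
    return {"condition_a": students[:half], "condition_b": students[half:]}

# Hypothetical cohorts keyed by program.
cohorts = {
    "clinical": ["s01", "s02", "s03", "s04"],
    "counseling": ["s05", "s06", "s07", "s08"],
}
assignments = {program: restricted_randomize(cohort, seed=2014)
               for program, cohort in cohorts.items()}
print(assignments)
```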
Procedures
At the conclusion of each term, trainees were asked to provide
copies of their protocols for subsequent data entry. Data were not
compiled for hypothesis testing until after the final cohort provided
their copies; thus, no analyses were conducted between training
cohorts. Because of space constraints, details of the training cur-
riculums are not provided herein. However, the curriculum details
did undergo peer review and are offered as supplemental materials
online via this journal for trainers who may wish to replicate the
curriculum.
The Wechsler training protocols used in this study differed
among participants (i.e., they were not drawn from a standardized
bank of protocols) and were scored by seven raters under the
supervision of the principal investigator. All raters were advanced
doctoral students with experience as a graduate assessment course
teaching fellow. An additional rater independently rescored the
protocols used in rater training (WAIS n = 42; WISC n = 42). Two-way random model, consistency intraclass correlation coefficients (ICCs) were computed and found to be uniformly excellent across both the WAIS and the WISC rater training protocols. With respect to scoring errors in administration, the WAIS ICC = .92 and the WISC ICC = .99. The ICC for scoring recording errors was 1.00
for both the WAIS and the WISC protocols. Scoring of computa-
tional errors produced an ICC of .96 for the WAIS and .98 for the
WISC. The principal investigator was the final arbiter of accuracy.
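For readers who wish to reproduce this type of reliability check, the following sketch computes a two-way, consistency-type ICC for single ratings from an n × k ratings matrix using the standard ANOVA decomposition; the function name and the example ratings are illustrative assumptions, not data from the study.

```python
import numpy as np

def icc_consistency_single(ratings):
    """Two-way, consistency-type ICC for single ratings.

    `ratings` is an (n_targets x k_raters) array. Computed as
    (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error) from the
    two-way ANOVA decomposition of the ratings matrix.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Illustrative (fabricated) error counts from two raters on five protocols.
example = np.array([[3, 4], [0, 0], [7, 6], [2, 2], [5, 5]])
print(round(icc_consistency_single(example), 2))
```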
Rorschach CS training protocols (n = 6) were drawn from a training manual for coding the Rorschach CS, which is available from http://dx.doi.org/10.1037/1931-3918.1.2.125.supp (Hilsen-
roth & Charnas, 2007). These protocols reflect both nonclinical
and clinical populations and were standardized by expert CS
scorers, who additionally graded the inherent coding difficulty
associated with each protocol. For each Rorschach CS training
protocol, an error was tallied every time the trainee made an
inaccurate code (Guarnaccia, Dill, Sabatino, & Southwick, 2001).
Accuracy was then computed at the individual trainee level as the
number of errors divided by the number possible on each training
experience for each of the following coding segments: Location/
Space, Developmental Quality, Determinants, Form Quality, Pairs/
Reflections, Contents, Populars, Z-scores, and Special Scores.
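The sketch below illustrates this accuracy computation, treating accuracy as the complement of the error proportion (consistent with the later description of accuracy as the proportion of agreement with the expert coding); the segment error counts and number of codes possible shown here are fabricated for illustration.

```python
# Per-segment accuracy for one trainee on one Rorschach CS protocol,
# computed as 1 - (errors / codes possible). All counts are fabricated.
segments = ["Location/Space", "Developmental Quality", "Determinants",
            "Form Quality", "Pairs/Reflections", "Contents", "Populars",
            "Z-scores", "Special Scores"]

errors_by_segment = {"Location/Space": 0, "Developmental Quality": 4,
                     "Determinants": 6, "Form Quality": 5,
                     "Pairs/Reflections": 3, "Contents": 4, "Populars": 2,
                     "Z-scores": 7, "Special Scores": 9}
possible_by_segment = {name: 22 for name in segments}  # e.g., 22 responses coded

accuracy = {name: 1 - errors_by_segment[name] / possible_by_segment[name]
            for name in segments}
for name, value in accuracy.items():
    print(f"{name}: {value:.1%}")
```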
Results
Upon compiling the archival data, it was found that some
trainees had omitted some of their Wechsler protocols, resulting in uneven protocol numbers across training experiences. For the WAIS, 90.9% (n = 101) of protocols were submitted; for the WISC, 80.2% (n = 89) of protocols were submitted. Consulting the instructor grade books indicated that all protocols had been completed and that nonsubmitted protocols were not worse than those turned in; there were no patterns of missingness in the data. Among the Wechsler intelligence test training experiences, 90.1% (n = 91) of the 101 submitted WAIS protocols and 85.4% (n = 76) of the 89 submitted WISC protocols contained at least one error in
administration, recording, or computation. Table 1 provides de-
scriptive statistics for the types of errors associated with each
protocol classification (homework, advanced graduate student vol-
unteer, and community volunteer).
The first hypothesis was that trainees would make significantly
fewer (a) administration, (b) recording, and (c) computational
errors on each successive training experience associated with each
Wechsler test. With respect to administration errors, visual inspec-
tion of Table 1 suggests a pattern of increasing errors for both
Wechsler tests; however, only the WAIS reveals a statistically
significant increase in administrative errors between the advanced
graduate student volunteer and the community volunteer protocols,
t(34) = −3.23, p = .003, d = −0.57. Recording errors significantly decreased between WAIS protocols, t(33) = 1.98, p = .05, d = 0.48, to the point that the mean error count for the community volunteer protocols did not significantly differ
from zero. For the WISC, cursory inspection of Table 1 may
suggest an increase in recording errors; however, this is a bit
illusory. Although the mean number of recording errors on the
advanced graduate student volunteer protocol appears to be near
zero, a one-sample t test indicates that the mean is actually significantly greater than zero, t(32) = 2.51, p = .02, owing to a
very small associated SD. In contrast, although the mean for the
community volunteer protocols is fractionally higher, the associ-
ated SD is larger and this results in the mean score being nonsig-
nificantly different from zero. Finally, with respect to computa-
tional errors, paired-samples t tests largely support the hypothesis of progressively fewer errors within each Wechsler test. Within the WAIS experiences, trainees made significantly fewer computational errors on the community volunteer protocol than they did on either the homework protocol, t(33) = 2.54, p = .02, d = 0.43, or the advanced graduate student volunteer protocol, t(34) = 2.33, p = .03, d = 0.38. Similarly, within the WISC experiences, trainees again made significant gains in computational accuracy between their homework protocol and their community volunteer protocol, t(24) = 2.04, p = .05, d = 0.59.
The second hypothesis was that, after adjusting for protocol
difficulty, coding errors would diminish across Rorschach CS
training experiences. To test this hypothesis, expert consensus
coding on standardized CS protocols served as the criterion against
which all Rorschach protocols (there were no missing protocols) in
this study were assessed. The proportion of agreement with the
expert criterion was computed for each trainee on every CS coding
category associated with each training experience. For ease of
comparison with findings among trainees reported elsewhere in the
literature, the mean proportions are expressed as percentages in
Table 2. As described in the methods section, expert ratings of
coding difficulty have been previously established for each of the
standardized training protocols used in this study. These difficulty
ratings were used to make coding accuracy adjustments to each
trainee protocol; accuracy adjustments served to standardize the
difficulty level across the training experiences so that changes in
trainees’ coding accuracy could be examined. To accomplish ad-
justments, the most difficult protocol (72% difficulty, clinical
protocol) was used as the reference point of comparison in the
following formula:
Difficulty-Adjusted Accuracy = attained accuracy − [attained accuracy × (0.72 − comparison protocol difficulty)]
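The following Python sketch applies the adjustment as reconstructed above, with the 72%-difficulty protocol as the reference; the accuracy and difficulty values in the example are fabricated.

```python
def difficulty_adjusted_accuracy(attained, protocol_difficulty, reference=0.72):
    """Standardize accuracy to the most difficult (reference) protocol.

    Follows the formula above: attained accuracy is reduced in proportion
    to how much easier the comparison protocol is than the reference.
    """
    return attained - attained * (reference - protocol_difficulty)

# Fabricated example: 86% accuracy on a protocol rated 50% difficult is
# adjusted downward relative to the 72%-difficulty reference protocol.
print(round(difficulty_adjusted_accuracy(0.86, 0.50), 3))  # 0.671
```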
In conducting repeated measures ANOVA, Mauchly's test indicated that the assumption of sphericity had been violated for the coding categories of Location/Space, χ²(5) = 59.71, p < .001; Determinants, χ²(5) = 29.06, p < .001; Form Quality, χ²(5) = 15.40, p = .009; Contents, χ²(5) = 26.93, p < .001; and Populars, χ²(5) = 18.02, p = .003. Degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity for Location/Space (ε = .59), Determinants (ε = .65), and Contents (ε = .65), whereas Huynh-Feldt estimates of sphericity were used to correct the degrees of freedom associated with Form Quality (ε = .89) and Populars (ε = .83). As shown in Figure 2, repeated
measures analysis of variance (ANOVA) tests were significant on
each set of difficulty-adjusted accuracy scores. The corresponding
sparklines consistently illustrate that the greatest change occurred
between the second (nonclinical, untimed protocol) and third (clin-
Table 1
Administration, Recording, and Computation Error Rates in Order of Protocol Completion

Order  Protocol / error type          M     SD    Range  Sum  % errant

WAIS-IV
1      Homework
         Computation                  5.09  8.68  0–35   173  77.4%
2      Graduate volunteer (total)     9.51  9.94  0–41   333  94.3%
         Administration               4.20  3.65  0–17   147  94.3%
         Recording                    0.77  1.82  0–8     27  25.7%
         Computation                  4.54  7.80  0–33   159  71.4%
3      Community volunteer (total)    8.63  7.89  0–41   302  97.1%
         Administration               6.36  4.47  0–20   229  97.2%
         Recording                    0.14  0.43  0–2      5  11.4%
         Computation                  2.03  4.85  0–27    73  47.2%

WISC-IV
1      Homework
         Computation                  5.38  9.42  0–34   140  59.3%
2      Graduate volunteer (total)     6.75  6.62  0–37   216  96.9%
         Administration               4.79  6.85  0–33   163  91.2%
         Recording                    0.21  0.49  0–2      7  18.2%
         Computation                  2.88  3.94  0–16    95  63.6%
3      Community volunteer (total)    6.80  4.74  1–21   204  96.7%
         Administration               6.27  5.64  1–28   207  84.8%
         Recording                    0.50  1.76  0–8     16   9.4%
         Computation                  1.00  1.39  0–6     30  50.0%

Note. % errant indicates the percentage of protocols that contained at least one error of the given type.
Table 2
Coding Accuracy on Rorschach Response Segments
Order Protocol experience Loc DvQ Det FQ 2 Con P Z Spec
1 Sequential segments 99.9% 81.2% 79.0% 79.0% 86.2% 72.0% 77.7% 69.6% 48.1%
2 Nonclinical responses 97.2% 83.8% 80.7% 86.2% 86.5% 86.6% 72.4% 64.7% 73.5%
Hilsenroth et al. (2007) 96% 96% 85% 93% 91% 95% 92% 86% 89%
Guarnaccia et al. (2001) 82% 77% 75% 61% 93% 90% 87% 65% 56%
3 Clinical responses 98.4% 73.6% 73.5% 80.0% 91.4% 80.6% 96.2% 66.0% 61.8%
Hilsenroth et al. (2007) 99% 91% 78% 80% 92% 90% 97% 83% 65%
Guarnaccia et al. (2001) 82% 76% 51% 61% 93% 67% 93% 72% 34%
Meyer et al. (2002) 90% 92% 73% 82% 93% 76% 95% — 82%
4 Follow-up protocol 99.8% 83.6% 79.0% 68.9% 88.0% 76.0% 88.1% 71.8% 65.3%
Note. Loc = Location/Space; DvQ = Developmental Quality; Det = Determinants; FQ = Form Quality; 2 = Pairs/Reflections; Con = Contents; P = Populars; Z = Z-scores; Spec = Special Scores. Bold text refers to data gathered from participants in this study, whereas the remaining information provides comparisons with the extant literature.
ical, timed protocol) training experiences. Additionally, the highest
accuracy was attained on the third (clinical, timed) protocol for
coding Location/Space, Developmental Quality, Contents, and
Populars. For codes of Determinants, Form Quality, Pairs/Reflec-
tions, Z-Score, and Special Scores, highest accuracy was associated
with the 8-week follow-up (clinical, timed) protocol.
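To make the sphericity correction concrete, the sketch below (not the article's analysis code) computes a one-way repeated measures ANOVA F ratio and a Greenhouse-Geisser epsilon from the double-centered covariance matrix of the repeated measures; the data are simulated, and Mauchly's test and the Huynh-Feldt estimate are omitted for brevity.

```python
import numpy as np
from scipy import stats

def rm_anova_gg(data):
    """One-way repeated measures ANOVA with a Greenhouse-Geisser correction.

    `data` is an (n_subjects x k_conditions) array. Returns the F ratio,
    the GG epsilon, and the GG-corrected p value.
    """
    y = np.asarray(data, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_cond = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_error = ((y - grand) ** 2).sum() - ss_cond - ss_subj
    df1, df2 = k - 1, (k - 1) * (n - 1)
    f_ratio = (ss_cond / df1) / (ss_error / df2)

    # Greenhouse-Geisser epsilon from the double-centered covariance matrix.
    s = np.cov(y, rowvar=False)
    c = np.eye(k) - np.ones((k, k)) / k
    s_dc = c @ s @ c
    epsilon = np.trace(s_dc) ** 2 / ((k - 1) * np.sum(s_dc ** 2))

    p_gg = stats.f.sf(f_ratio, epsilon * df1, epsilon * df2)
    return f_ratio, epsilon, p_gg

# Simulated accuracy scores: 8 trainees x 4 training experiences.
rng = np.random.default_rng(9)
fake = rng.normal(loc=[0.70, 0.75, 0.85, 0.84], scale=0.05, size=(8, 4))
print(rm_anova_gg(fake))
```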
Discussion
The conceptual map presented in Figure 1 was used to develop
a training strategy and corresponding hypotheses pertaining to
technical skills training in assessment prepracticum. Analyses re-
vealed that trainees demonstrated scoring competency gains across
the three WAIS training experiences, with significant reductions in
both recording and computation errors. Additional gains in scoring
accuracy were then observed within the WISC training experi-
ences. The current findings stand in stark contrast to most of the
existing literature, which strongly indicates that Wechsler practice
does not reduce scoring errors (see introduction section for refer-
ences). Previous studies of Wechsler scoring errors have found no
significant error reductions across as many as 5 to 15 administra-
tions (Belk et al., 2002; Conner & Woodall, 1983; Slate & Jones, 1990; Slate et al., 1991); yet, in the current study, significant
effects were observed within three training experiences involving
only two administrations per test. More important, in the current study there was a deliberate focus on improving specific technical skills of increasing difficulty while obtaining immediate, substantive feedback. Most of the earlier studies cited, by way of
contrast, relied on nonspecific repetition without changes in diffi-
culty and feedback was often minimal or provided only after a
delay. Although the findings from the current study appear to
deviate from most of the established Wechsler training literature,
there is distal support in the literature to suggest the findings herein
are not spurious (Conner & Woodall, 1983; Slate & Jones, 1989).
Furthermore, the findings are consistent with literature on the
development of expertise (Ericsson, Krampe, & Tesch-Romer,
1993). In the only other study where both the WAIS and WISC
were included, Slate, Jones, and Covert (1992) observed negative
transference from the WAIS to the WISC, concluding that, rather
than fostering proficiency, trainees seemed to practice errors.
However, in the current study, examination of the sum of errors
across protocols, and the associated error ranges, indicates the
possibility of a positive transference of learning from the WAIS
training experiences to the WISC training experiences (WAIS
graduate volunteer errors WAIS community volunteer errors
WISC graduate and community volunteers’ errors). Future re-
search specifically examining transference of learning in assess-
ment prepracticum would be helpful. Future research is also en-
couraged to specifically focus more on training in Wechsler
administration. Contrary to prediction, a significant increase in
WAIS administrative errors between the advanced graduate stu-
dent volunteer and the community volunteer protocols was ob-
served among participants in this study. It is possible that the
training experiences created an increase in cognitive load on
participants, thereby increasing likelihood of error and confound-
ing analyses (Sweller, 1988). Unlike the Rorschach tasks, in which
correction was made in analyses for increasing difficulty across the
standardized protocols, the Wechsler tasks drew from nonstandard
protocols and no standardized adjustments for protocol difficulty
could be implemented. Future research on WAIS administration
errors that corrects for protocol difficulty is recommended. Despite
the isolated increase in WAIS administration errors, total Wechsler
scoring errors were significantly reduced, and in fewer practices, at
the end of the prepracticum training. In addition, the mean number
of total errors falls well below rates published elsewhere. As an
exemplar of this point, in the most recent study of this kind Loe
and colleagues (2007) reported a mean of 25.8 errors per WISC
protocol, whereas in the current study the mean number of total
errors on the third WISC protocol was 6.8. Aside from replication
of these findings at other training sites, it would be clarifying if future research used a dismantling design to determine whether the strong training effects observed in this study are attributable to the incorporation of rapid, substantive feedback versus the sequential
and cumulative presentation of increasingly complex training ex-
periences.
With respect to prepracticum technical skills training in Ror-
schach CS coding, analyses of difficulty-adjusted accuracy rates
clearly indicate substantial training gains in every scoring cate-
gory. As shown in Figure 2, the most substantial gains were
attained between the second (nonclinical, untimed protocol) and
third (clinical, timed protocol) training experiences. This is per-
haps surprising because the third protocol was 40% more difficult
than the second protocol and was associated with both time and
resource constraints. Although the third experience functionally
served as coding practice, it occurred under testing conditions, and this may be responsible for the findings reported herein. Practice
testing is perhaps the most potent way to use practice, with
consistently high utility in augmenting performance (Dunlosky,
Rawson, Marsh, Nathan, & Willingham, 2013). Also notable in the
Figure 2. Results of repeated measures ANOVA tests, using difficulty-adjusted scores, for scoring accuracy on
Rorschach response segments. Sparklines depict adjusted mean scores associated with each of the four training
experiences, with the darkened column demarcating the protocol with the highest mean accuracy.
CS findings is the pattern of attained highest accuracy among
coding categories. Location/Space, Developmental Quality, Con-
tents, and Populars attained highest accuracy on the third protocol,
whereas for Determinants, Form Quality, Pairs/Reflections,
Z-Score, and Special Scores highest accuracy was associated with
the follow-up (clinical, timed) protocol. One possibility that has
not been considered within trainee samples learning the CS is that
some coding categories may be more difficult than others and
require longer for consolidation of learning. Support for this pos-
sibility may be inferred from Viglione (2002) who observed that
coding Location/Space and Populars seems to be less challenging
than other coding elements. The precipitous drop in Developmen-
tal Quality coding accuracy at follow-up is also remarkable. Pars-
ing the nature of errors was not done for this study, but a look at
the raw data suggests that the decline might have been tied to occasionally forgetting to score and/or to intermittent reaching
(i.e., coding responses as involving synthesis, when no synthesis
was present). At the other extreme, the very high accuracy in
coding Location and Space should be viewed cautiously. The
standardized protocols used in this study present the correct codes
for these variables as part of every response. Thus, the only way
for a trainee to get the wrong answer was to transcribe it incor-
rectly into their sequence of scores (the same point is true for the
comparison Hilsenroth et al., 2007 data shown in Table 2).
In this study, trainees’ coding accuracy was operationalized as
the proportion of agreement with the expert scoring. A limitation
to this method is that it does not distinguish between omission
versus commission coding errors. Cohen (1960) noted the addi-
tional limitation that overall proportions can be elevated according
to probabilities equal to the observed base rates if coders (in this
case, the trainee under consideration and the expert raters) engaged
in guessing. Within the existing Rorschach CS coding accuracy
literature, some researchers have reported kappa (κ) coefficients (e.g., in a trainee-specific sample, see Hilsenroth, Charnas, Zodan, & Streiner, 2007), which are meant to correct for chance agreement. However, the assumptions required to utilize κ coefficients are not appropriately met in the current study. The formula for κ is based on the assumption of complete statistical independence of raters. The calculated κ estimate of agreement is applicable only if the raters guess on every case and make their guesses with appropriate probabilities. In this study, the assumption of complete guessing with appropriate probabilities by both raters (the trainee and the expert) was not met in any of the training experiences. Although the use of κ coefficients would not have been appropriate, the issue of
intermittent chance agreement remains. Future research that crafts
a theoretical model of rater decision-making and then empirically
models rater agreement could potentially produce an appropriate
chance correction (Agresti, 1992;Uebersax, 1987). Such research
may be informative to advancing the CS literature on interrater
reliabilities.
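For illustration of the distinction discussed above, the sketch below computes both the raw proportion of agreement (the index used in this study) and Cohen's (1960) κ on a fabricated set of trainee and expert codes; it is not taken from the study's materials.

```python
from collections import Counter

def proportion_agreement(trainee, expert):
    """Raw proportion of agreement, the accuracy index used in this study."""
    matches = sum(t == e for t, e in zip(trainee, expert))
    return matches / len(expert)

def cohens_kappa(trainee, expert):
    """Cohen's (1960) kappa: agreement corrected for chance expected from marginals."""
    n = len(expert)
    p_obs = proportion_agreement(trainee, expert)
    t_counts, e_counts = Counter(trainee), Counter(expert)
    p_chance = sum(t_counts[c] * e_counts[c] for c in set(expert) | set(trainee)) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)

# Fabricated Popular (P) vs. not-Popular (-) codes for ten responses.
expert_codes  = ["P", "P", "-", "-", "P", "-", "-", "P", "-", "-"]
trainee_codes = ["P", "P", "-", "P", "P", "-", "-", "-", "-", "-"]
print(proportion_agreement(trainee_codes, expert_codes))        # 0.8
print(round(cohens_kappa(trainee_codes, expert_codes), 2))      # 0.58
```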
The use of a quasi-experimental design, rather than a true
experimental design, could be viewed as a limitation to the current
study. Although a control group was not included, it is important
to note that restricted randomization was used so that half of each
cohort within each of the graduate programs (that varied in size
across programs and years) was included. By balancing group
sizes, the restricted randomization method is less impacted by
selection bias (Lachin, Matts, & Wei, 1988). Further, the more
naturalistic conditions within this study may result in greater
external validity and generalization of findings. An additional
limitation may be that not all Wechsler protocols were submitted
for analyses (90.9% of WAIS protocols and 80.2% of WISC
protocols were submitted). Examining the course grade books
indicated that these missing protocols were completed and not
better or worse than those submitted. Although speculative, one
possibility is that these novice trainees simply were strained for
time and their organization of their belongings progressively suf-
fered during the first course (WAIS training preceded WISC
training), resulting in the loss of random protocols. Some support
for this hypothesis may be found in the observation that all
protocols were submitted in the second course, when participants
may have grown more accustomed to the demands of graduate
training and enacted better personal management skills.
In summary, this article introduces a basic conceptual map (see
Figure 1) that can facilitate an evidence-based approach to train-
ing by providing a structure for hypothesis formation and testing.
Key to this map is the integration of sequential, cumulative, and
increasingly complex training experiences with rapid and substan-
tive feedback. In the current study this framework was illustra-
tively applied to technical skills training in psychological assess-
ment at the prepracticum level, but the conceptual model is
sufficiently flexible for application to varying levels of a full range
of practicum competencies (e.g., competencies pertaining to psy-
chotherapy, supervision, or consultation). Doing so elucidates test-
able hypotheses for investigation so that the necessary, but cur-
rently insufficient, research base fostering evidence-based training
may be strengthened. Future research in this vein is strongly
encouraged. Additionally, in light of reports that assessment train-
ing has been reduced and trainees are insufficiently prepared for
internship (e.g., Clemence & Handler, 2001;Stedman, Hatch, &
Schoenfeld, 2001), the evidence-based benefits of assessment
training as shown in this study may also be informative during
training program discussions regarding allocations of time, inten-
sity, and resources for assessment training.
References
Agresti, A. (1992). Modelling patterns of agreement and disagreement.
Statistical Methods in Medical Research, 1, 201–218. doi:10.1177/
096228029200100205
Alfonso, V. C., Johnson, A., Patinella, L., & Rader, D. E. (1998). Common
WISC-III errors: Evidence from graduate students in training. Psychol-
ogy in the Schools, 35, 119–125. doi:10.1002/(SICI)1520-6807(199804)35:2<119::AID-PITS3>3.0.CO;2-K
American Psychological Association. (2010). Ethical principles of psy-
chologists and code of conduct: including 2010 amendments. Washing-
ton, DC: American Psychological Association.
Belk, M. S., LoBello, S. G., Ray, G. E., & Zachar, P. (2002). WISC-III
administration, clerical, and scoring errors made by student examiners.
Journal of Psychoeducational Assessment, 20, 290–300. doi:10.1177/
073428290202000305
Childs, R. A., & Eyde, L. D. (2002). Assessment training in clinical
psychology doctoral programs: What should we teach? What do we
teach? Journal of Personality Assessment, 78, 130–144. doi:10.1207/
S15327752JPA7801_08
Clemence, A. J., & Handler, L. (2001). Psychological assessment on
internship: A survey of training directors and their expectations for
students. Journal of Personality Assessment, 76, 18–47. doi:10.1207/
S15327752JPA7601_2
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educa-
tional and Psychological Measurement, 20, 37–46. doi:10.1177/
001316446002000104
Commission on Accreditation. (2007). Implementing regulations: Section
C: IRs related to the guidelines and principles. Washington, DC: Amer-
ican Psychological Association.
Conner, R., & Woodall, F. E. (1983). The effects of experience and
structured feedback on WISC-R error rates made by student-examiners.
Psychology in the Schools, 20, 376–379. doi:10.1002/1520-6807(198307)20:3<376::AID-PITS2310200320>3.0.CO;2-M
Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham,
D. T. (2013). Improving students’ learning with effective learning tech-
niques: Promising directions from cognitive and educational psychol-
ogy. Psychological Science in the Public Interest, 14, 4–58. doi:
10.1177/1529100612453266
Ericsson, K. A., Krampe, R. T., & Tesch-Romer, C. (1993). The role of
deliberate practice in the acquisition of expert performance. Psycholog-
ical Review, 100, 363–406. doi:10.1037/0033-295X.100.3.363
Exner, J. E., Jr. (2001). A Rorschach workbook for the comprehensive
system (5th ed.). Asheville, NC: Rorschach Workshops.
Franklin, M. R., Stillman, P. L., Burpeau, M. Y., & Sabers, D. L. (1982).
Examiner error in intelligence testing: Are you a source? Psychology in
the Schools, 19, 563–569. doi:10.1002/1520-6807(198210)19:4<563::AID-PITS2310190427>3.0.CO;2-Q
Guarnaccia, V., Dill, C. A., Sabatino, S., & Southwick, S. (2001).
Scoring accuracy using the comprehensive system for the Rorschach.
Journal of Personality Assessment, 77, 464–474. doi:10.1207/
S15327752JPA7703_07
Hatcher, R. L., & Lassiter, K. D. (2007). Initial training in professional
psychology: The Practicum Competencies Outline. Training and Edu-
cation in Professional Psychology, 1, 49–63. doi:10.1037/1931-3918.1
.1.49
Hilsenroth, M. J., & Charnas, J. W. (2007). Training manual for Rorschach
interrater reliability (2nd ed.). Unpublished manuscript, The Derner Institute of Advanced Psychological Studies, Adelphi University, Garden City, NY. Retrieved from http://dx.doi.org/10.1037/1931-3918.1.2.125.supp
Hilsenroth, M. J., Charnas, J. W., Zodan, J., & Streiner, D. L. (2007).
Criterion-based training for Rorschach scoring. Training and Education
in Professional Psychology, 1, 125–134. doi:10.1037/1931-3918.1.2.125
Krishnamurthy, R., VandeCreek, L., Kaslow, N. J., Tazeau, Y. N., Miville,
M. L., Kerns, R.,...Benton, S. A. (2004). Achieving competency in
psychological assessment: Directions for education and training. Journal
of Clinical Psychology, 60, 725–739. doi:10.1002/jclp.20010
Lachin, J. M., Matts, J. P., & Wei, L. J. (1988). Randomization in clinical
trials: Conclusions and recommendations. Controlled Clinical Trials, 9,
365–374. doi:10.1016/0197-2456(88)90049-9
Levenson, R. L., Golden-Scaduto, C. J., Aiosa-Karpas, C. J., & Ward,
A. W. (1988). Effects of examiners’ education and sex on presence and
type of clerical errors made on WISC-R protocols. Psychological Re-
ports, 62, 659–664.
Loe, S. A., Kadlubek, R. M., & Marks, W. J. (2007). Administration and
scoring errors on the WISC-IV among graduate student examiners.
Journal of Psychoeducational Assessment, 25, 237–247. doi:10.1177/
0734282906296505
Meyer, G. J., Hilsenroth, M. J., Baxter, D., Exner, J. E., Jr., Fowler, J. C.,
Piers, C. C., & Resnick, J. (2002). An examination of interrater reliabil-
ity for scoring the Rorschach comprehensive system in eight data sets.
Journal of Personality Assessment, 78, 219–274. doi:10.1207/
S15327752JPA7802_03
Ready, R. E. (2013, February). Training in psychological assessment:
Current practices of clinical psychology programs. Paper presented at
the Forty First Annual Meeting of the International Neuropsychological
Society, Waikoloa, HI.
Ryan, J. J., & Schnakenberg-Ott, S. D. (2003). Scoring reliability on the
Wechsler Intelligence Scale-Third Edition (WAIS-III). Assessment, 10,
151–159. doi:10.1177/1073191103010002006
Sherrets, S. D., Gard, G., & Langner, H. (1979). Frequency of clerical
errors on WISC protocols. Psychology in the Schools, 16, 495–496.
Slate, J. R., & Chick, D. (1989). WISC-R examiner errors: Cause for
concern. Psychology in the Schools, 26, 78–84. doi:10.1002/1520-6807(198901)26:1<78::AID-PITS2310260111>3.0.CO;2-5
Slate, J. R., & Jones, C. H. (1989). Can teaching of the WISC-R be
improved? Quasi-experimental exploration. Professional Psychology:
Research and Practice, 20, 408–410. doi:10.1037/0735-7028.20.6.408
Slate, J. R., & Jones, C. H. (1990). Student error in administering the
WISC-R: Identifying problem areas. Measurement and Evaluation in
Counseling and Development, 23, 137–140.
Slate, J. R., Jones, C. H., & Covert, T. L. (1992). Rethinking the instruc-
tional design for teaching the WISC-R: The effects of practice admin-
istrations. College Student Journal, 26, 285–289.
Slate, J. R., Jones, C. H., & Murray, R. A. (1991). Teaching administration
and scoring of the Wechsler Adult Intelligence Scale-Revised: An em-
pirical evaluation of practice administrations. Professional Psychology:
Research and Practice, 22, 375–379. doi:10.1037/0735-7028.22.5.375
Stedman, J. M., Hatch, J. P., & Schoenfeld, L. S. (2001). The current status
of psychological assessment training in graduate and professional
schools. Journal of Personality Assessment, 77, 398–407. doi:10.1207/
S15327752JPA7703_02
Sweller, J. (1988). Cognitive load during problem solving: Effects on
learning. Cognitive Science, 12, 257–285. doi:10.1207/s15516709cog1202_4
Uebersax, J. S. (1987). Diversity of decision-making models and the
measurement of interrater agreement. Psychological Bulletin, 101, 140–146. doi:10.1037/0033-2909.101.1.140
Viglione, D. J. (2002). Rorschach coding solutions: A reference guide for
the comprehensive system. San Diego, CA: Donald J. Viglione.
Received August 30, 2013
Revision received March 21, 2014
Accepted April 23, 2014
... Similar to supervision of therapy, supervision of assessment requires helping supervisees develop mastery in ethical issues, professional behavior, interpersonal relationships, and diagnosis and case conceptualization. However, supervision of assessment also includes an emphasis on strong fidelity to testing procedures, such as flawless administration; familiarity with a wide variety of assessment instruments used for diagnostic rule-out; understanding statistical, empirical, and psychometric properties of clinical instruments; and communicating test results clearly in a comprehensive report (Callahan, 2015;Dumont & Willis, 2004;Finkelstein & Tuckman, 1997). As existing supervision models focus primarily on therapy, they do not fully account for these aspects of clinical assessment, leaving critical gaps in the development of supervision skills in assessment. ...
... Although graduate psychology programs require assessment training, errors in testing are nevertheless very common among graduate students (Loe, Kadlubek, & Marks, 2007). Results from research suggest that repeated practice and passive approaches where students are expected to learn from their mistakes have not decreased testing errors (Callahan, 2015;Loe et al., 2007). It has been suggested that several specific processes be implemented in order to increase student competency in assessment, including closely monitoring administration and scoring errors, providing focused feedback regarding all testing errors, having peers review each other's administration and scoring, and requiring that assessment supervisors review video administrations (Loe et al., 2007;Slate & Jones, 1990;Slate, Jones, Murray, & Coulter, 1993). ...
... First, while models of supervision training exist (Bernard & Goodyear, 2014;Falender & Shafranske, 2004;Stoltenberg & McNeill, 2010), these models often do not take into account the unique features of assessment trainingspecifically rigorous fidelity in administration, psychometric knowledge, and written communication. Second, although others have identified challenges in supervising assessment training, including administration errors and fidelity (Callahan, 2015;Loe et al., 2007) and have proposed suggested remedies, the comprehensive implementation of these solutions has not been explored in any explicit assessment approach. Third, while some models of training have recognized the importance of demonstrating skill in the supervision of clinical assessment as part of establishing competency in this domain (Finkelstein & Tuckman, 1997), a practical approach for developing competency specifically in supervisory skills is lacking. ...
... all other estimated kappa values were higher. Callahan (2015) evaluated coding accuracy of CS protocols through a three-stage training experience, followed by an 8week follow-up. The accuracy of coding all the Rorschach response segments improved over time, although the proportions of agreement with the expert scoring were generally lower for FQ (68.9%) and Special Scores (65.3%) at the 8-week follow-up protocol. ...
... However, before accepting the relative independence of experience from perceived coding difficulty, further research with sufficient power would be necessary. In addition, we did not investigate the relationship between experience and accuracy (e.g., Callahan, 2015), so, despite our findings, it is highly likely that accuracy improves with experience. ...
... Based on findings in the literature, Viglione and Meyer (2008) indicated that Form Dominance, shading subtypes, Cognitive Special Scores, and the distinction between Level 1 and Level 2 cognitive codes were the coding decisions with lower reliabilities. Lower interrater reliabilities for Determinants, FQ, and Special Scores were also reported by Hilsenroth et al. (2007), Guarnaccia et al. (2001), and Callahan (2015) for the CS and by Kivisalu et al. (2016) for R-PAS. These coding decisions also had higher mean ratings of difficulty in the New Learner Survey. ...
Article
Learning to code the imagery, communication, and behavior associated with Rorschach responding is challenging. Although there is some survey research on graduate students' impressions of their Rorschach training, research has not identified which coding decisions students find to be the most problematic and time-consuming. We surveyed students to identify what they struggled with most when learning coding and to quantify how difficult it is to learn how to code. Participants (n = 191) from the United States, Brazil, Denmark, Israel, and Italy rated 57 aspects of coding using a 4-point scale that encompassed both the time required to code and the subjective difficulty of doing so. Mean ratings for coding in general indicated that students considered the overall task challenging. Ratings also revealed that students struggled most with Cognitive Special Scores, Determinants, and extrapolating from the tables to code Form Quality for objects that were not specifically listed. The findings offer suggestions about how to improve the guidelines for some of the more difficult variables and where it is most necessary to focus teaching time. Taking these steps might help new students in learning the Rorschach.
... Learning how to administer and score performance-based tests like intelligence tests and the Rorschach consumes substantially more training time than self-report questionnaires like the MMPIs, which can be administered by staff and scored by the computer. Research suggests that a substantial amount of training is needed to competently administer and score the Rorschach (Callahan, 2015;Rosso & Camoirano, 2022). ...
Article
In this brief article, we update the training of newer versions of the Minnesota Multiphasic Personality Inventory (MMPI) and Rorschach and compare to a 2015 assessment training survey of American Psychological Association accredited clinical psychology doctoral programs. The survey sample sizes for 2015, 2021, and 2022 were 83, 81, and 88, respectively. By 2015, of the programs teaching any adult MMPI version, almost all (94%) were still teaching the MMPI-2, and 68% had started teaching the MMPI-2-RF. In 2021 and 2022, respectively, almost all programs (96% and 94%) had started teaching the MMPI-2-RF or MMPI-3, although most were still teaching the MMPI-2 (77% and 66%). By 2015, of the programs teaching the Rorschach, 85% were still teaching the Comprehensive System (CS) and 60% had started teaching the Rorschach Performance Assessment System (R-PAS). In 2021 and 2022, respectively, most programs had started teaching R-PAS (77% and 77%) although many (65% and 50%) were still teaching the CS. Therefore, doctoral programs are indeed switching to newer versions of the MMPI and Rorschach, although more slowly than one might expect. We recommend that APA provide more guidance in selecting test versions for training programs, practitioners, and researchers.
... Again, such summative evaluations of PWCs or locally specific competencies appear to hold promise for use in formative assessment to influence the growth trajectory of a given trainee. More to the point, feedback within training (which is akin to formative assessment) has been shown to result in greater trainee competency in less time and with fewer errors than feedback provided via post hoc supervision (which is akin to summative assessment; e.g., Callahan, 2015;Hilsenroth et al., 2007), which suggests an untapped potential for use of Competency measures in the formative assessment of trainees. ...
Article
Full-text available
There is a paucity of information concerning normative reference ranges on standardized measures of profession-wide competencies for the purpose of conducting formative assessments. The present study draws from a convenience sample to provide developmental (first/second half of training year) normative data for use in formative assessments of individual trainees and program-level quality improvement processes. Data reveal the anticipated pattern of competency scores generally improving across any given training year, with the strongest gains in competencies tied to assessment, supervision, and advocacy. A secondary aim, which emerged after study launch, was to evaluate whether training disruptions due an infectious viral pandemic (COVID-19) exerted demonstrable impacts at the aggregate level on trainee competency development. This sample of doctoral trainees evidenced no pandemic-associated suppression of Competency attainment. Rather, this sample of trainees evidenced growth in focal competencies tied to policy creation, systems-change, management structure, and leadership. Training implications are discussed.
... Contemporary training in these programs focuses on students' benchmarked progression through the development of foundational skills (Rodolfa et al., 2005). Such efforts coincide with the outcomes-focused approach to educational practices in assessment (e.g., Callahan, 2015) that dominate HSP training programs (Kaslow et al., 2006). As such, competency domains (Kaslow, 2004;Kaslow, Borden, et al., 2004;Rodolfa et al., 2005Rodolfa et al., , 2013 are codified in their HSP educational role as evidenced by their integration into training standards set by the American Psychological Association's (APA, 2018) Standards for Accreditation for Programs in Health Service Psychology. ...
Article
Full-text available
Objective Attaining competence in assessment is a necessary step in graduate training and has been defined to include multiple domains of training relevant to this attainment. While important to ensure trainees meet these standards of training, it is critical to understand how and if competence shapes a trainees' professional identity, therein promoting lifelong competency. Methods The current study assessed currently enrolled graduate trainees' knowledge and perception of their capabilities related to assessment to determine if self‐reported and performance‐based competence would incrementally predict their intention to use assessment in their future above basic training characteristics and intended career interests. Results Self‐reported competence, but not performance‐based competence, played an incremental role in trainees' intention to use assessments in their careers. Multiple graduate training characteristics and practice experiences were insignificant predictors after accounting for other relative predictors (i.e., intended career settings, integrated reports). Conclusion Findings are discussed about the critical importance of incorporating a hybrid competency‐capability assessment training framework to further emphasize the role of trainee self‐efficacy in hopes of promoting lifelong competence in their continued use of assessments.
... The shift to teleassessment (i.e., the administration of psychological assessments via telepsychology), however, has been hindered by the limited literature on teleassessment administration and training (Hames et al., 2020). This is problematic for training clinics, as psychological test administration is a ubiquitous and pivotal practice component of doctoral training and clinical practice at large (Callahan, 2015; Krishnamurthy et al., 2004; Youngstrom, 2013). ...
Article
Health service psychologists have made a rapid transition to delivering telepsychology services during the COVID-19 pandemic. The provision of remote assessment services, or teleassessment, however, has lagged behind given the limited evidence base. This delay has been uniquely challenging for university training clinics, which are equally responsible for developing trainee assessment competencies and providing high-quality assessments to clients. Training clinics have been tasked with implementing programmatic adaptation to meet this need with limited guidance. We address this gap by describing the considerations university training clinics must make under physical distancing policies, including protections for the health of trainees and clients, ensuring standardized administration of assessments, providing developmentally appropriate training opportunities, and guaranteeing transparency in the consent and feedback processes. We recommend solutions to reconcile these inherent challenges and highlight training opportunities as they relate to the development of profession-wide competencies and ethical principles. These recommendations demonstrate that by integrating flexibility into program curriculums, training clinics can continue to adhere to accreditation standards while developing trainee competencies in assessment during the COVID-19 pandemic.
... Ingram et al. (2019) also found that opportunities for supervised assessment practice were less common than theory-based learning opportunities. This discrepancy presents a potential barrier to the critically important integration of knowledge- and skill-based competencies (Childs et al., 2002; Mihura et al., 2017) for assessment training outcomes (Callahan, 2015). ...
Article
Assessment is critical to health service psychology and represents a core area of coverage during doctoral training. Despite this, training practices in assessment are understudied. Accordingly, this study utilized a national sampling of students (n = 534) enrolled in an American Psychological Association–accredited health service psychology doctoral program with substantive training in clinical or counseling psychology. We asked trainees to rate their competency for instruments in which they had training. We examined trends in training experiences, including both theory-based education and applied clinical opportunities, and explored differences in instrument training trends across program type (PhD/PsyD) and program discipline (clinical/counseling). Results of this study suggest a general convergence with professional practice trends in terms of instrument coverage; less clinical training and exposure compared with didactic methods; and generally small differences across program type and discipline in perceived competence and instrument exposure. Implications for training and education in psychological assessment are discussed. Public Significance Statement: This study examines assessment training patterns in health service psychology students using a nationally representative sample. The patterns of training coverage mirror the instrument use patterns of psychologists who are currently in clinical practice. Students receive more frequent didactic and classroom exposure during training than practice opportunities with clients. Future research will benefit from evaluating the differential impact of classroom and clinical training experiences on competency, both perceived and performance based.
... during psychology doctoral training and of psychological practice at large (Callahan, 2015). Specifically, assessment training is part of HSP programs and is noted by the Standards of Accreditation (SoA; American Psychological Association [APA], Commission on Accreditation, 2015) as one of nine profession-wide competencies that students are expected to attain. ...
Article
There is a critical lack of research on training in supervision in the area of psychological assessment within health service psychology programs. This study sought to fill this research gap by presenting empirical data on the development of profession-wide competencies delineated by the new Standards of Accreditation using a peer mentorship approach, the multilevel assessment supervision and training (MAST) approach, implemented in a university training clinic. Questionnaires on training satisfaction and the development of profession-wide competencies were administered to both peer mentors (who received training in supervision through the MAST approach) and mentees (who received the peer mentorship). Data collected from these participants (n = 49) indicated that the MAST approach provided several benefits for both peer mentors and mentees. Specifically, peer mentors reported that receiving training in supervision through the MAST approach was extremely useful for their professional development and continued to have benefits beyond graduate school. They also reported high levels of perceived competency in assessment and supervision. Mentees reported that having a peer mentor was helpful in their assessment training, especially in the development of technical skills such as scoring and report writing. Data also revealed areas where training in assessment supervision should be further developed, such as multicultural competency. This study highlights the need for further empirical research on training in supervision in the area of assessment.
... With fairly easy training modifications, significantly improved competency in these skills quickly emerges (Callahan, 2015). ...
Article
Any psychologist in the course of his or her professional career will affect thousands of lives. It stands to reason then that the training of psychologists is, in its own way, a systems-level intervention with tremendous social impact. In the ideal world, training, like other interventions, would be grounded in evidence. But what exactly do we know about the science of training? Elsewhere in this same issue, we address that question by providing a literature review of empirically-grounded, evidence-based studies pertaining to: (a) admissions, curriculum, and research training (Callahan & Watkins, 2018a); (b) prepracticum and practicum training (Callahan & Watkins, 2018b); and (c) supervision, competency, and internship training (Callahan & Watkins, 2018c). In this opening commentary to that series of articles, we conceptualize training as a systems-level intervention that is perhaps the most impactful intervention available to our field. We provide a rationale for our evidence-based training-focused literature review and an overview of our search methods herein. We provide commentary on perceived barriers to evidence-based training and identify tribalism as being particularly pernicious to this perspective. We argue that data, not tribalism, should drive educational decisions and actions, particularly given that training significantly impacts our discipline and the public it serves.
Article
This article begins by reviewing the proficiency of personality assessment in the context of the competencies movement, which has dominated health service psychology in recent years. It examines the value of including a capability framework for advancing this proficiency and enhancing the quality of personality assessments, including Therapeutic Assessment (Finn & Tonsager, 1997), that include a personality assessment component. This hybrid competency–capability framework is used to set the stage for the conduct of personality assessments in a variety of contexts and for the optimal training of personality assessment. Future directions are offered in terms of ways psychologists can strengthen their social contract with the public and offer a broader array of personality assessments in more diverse contexts and by individuals who are both competent and capable.
Article
The theoretical framework presented in this article explains expert performance as the end result of individuals' prolonged efforts to improve performance while negotiating motivational and external constraints. In most domains of expertise, individuals begin in their childhood a regimen of effortful activities (deliberate practice) designed to optimize improvement. Individual differences, even among elite performers, are closely related to assessed amounts of deliberate practice. Many characteristics once believed to reflect innate talent are actually the result of intense practice extended for a minimum of 10 years. Analysis of expert performance provides unique evidence on the potential and limits of extreme environmental adaptation and learning.
Article
Many students are being left behind by an educational system that some people believe is in crisis. Improving educational outcomes will require efforts on many fronts, but a central premise of this monograph is that one part of a solution involves helping students to better regulate their learning through the use of effective learning techniques. Fortunately, cognitive and educational psychologists have been developing and evaluating easy-to-use learning techniques that could help students achieve their learning goals. In this monograph, we discuss 10 learning techniques in detail and offer recommendations about their relative utility. We selected techniques that were expected to be relatively easy to use and hence could be adopted by many students. Also, some techniques (e.g., highlighting and rereading) were selected because students report relying heavily on them, which makes it especially important to examine how well they work. The techniques include elaborative interrogation, self-explanation, summarization, highlighting (or underlining), the keyword mnemonic, imagery use for text learning, rereading, practice testing, distributed practice, and interleaved practice. To offer recommendations about the relative utility of these techniques, we evaluated whether their benefits generalize across four categories of variables: learning conditions, student characteristics, materials, and criterion tasks. Learning conditions include aspects of the learning environment in which the technique is implemented, such as whether a student studies alone or with a group. Student characteristics include variables such as age, ability, and level of prior knowledge. Materials vary from simple concepts to mathematical problems to complicated science texts. Criterion tasks include different outcome measures that are relevant to student achievement, such as those tapping memory, problem solving, and comprehension. We attempted to provide thorough reviews for each technique, so this monograph is rather lengthy. However, we also wrote the monograph in a modular fashion, so it is easy to use. In particular, each review is divided into the following sections:
1. General description of the technique and why it should work
2. How general are the effects of this technique?
2a. Learning conditions
2b. Student characteristics
2c. Materials
2d. Criterion tasks
3. Effects in representative educational contexts
4. Issues for implementation
5. Overall assessment
The review for each technique can be read independently of the others, and particular variables of interest can be easily compared across techniques. To foreshadow our final recommendations, the techniques vary widely with respect to their generalizability and promise for improving student learning. Practice testing and distributed practice received high utility assessments because they benefit learners of different ages and abilities and have been shown to boost students' performance across many criterion tasks and even in educational contexts. Elaborative interrogation, self-explanation, and interleaved practice received moderate utility assessments. The benefits of these techniques do generalize across some variables, yet despite their promise, they fell short of a high utility assessment because the evidence for their efficacy is limited.
For instance, elaborative interrogation and self-explanation have not been adequately evaluated in educational contexts, and the benefits of interleaving have just begun to be systematically explored, so the ultimate effectiveness of these techniques is currently unknown. Nevertheless, the techniques that received moderate-utility ratings show enough promise for us to recommend their use in appropriate situations, which we describe in detail within the review of each technique. Five techniques received a low utility assessment: summarization, highlighting, the keyword mnemonic, imagery use for text learning, and rereading. These techniques were rated as low utility for numerous reasons. Summarization and imagery use for text learning have been shown to help some students on some criterion tasks, yet the conditions under which these techniques produce benefits are limited, and much research is still needed to fully explore their overall effectiveness. The keyword mnemonic is difficult to implement in some contexts, and it appears to benefit students for a limited number of materials and for short retention intervals. Most students report rereading and highlighting, yet these techniques do not consistently boost students’ performance, so other techniques should be used in their place (e.g., practice testing instead of rereading). Our hope is that this monograph will foster improvements in student learning, not only by showcasing which learning techniques are likely to have the most generalizable effects but also by encouraging researchers to continue investigating the most promising techniques. Accordingly, in our closing remarks, we discuss some issues for how these techniques could be implemented by teachers and students, and we highlight directions for future research.
Article
The purpose of this study was to examine the most frequent administration, clerical, and scoring errors made by graduate student examiners who administer the WISC-III. An additional goal was to document the effect of these errors on the IQ values and Index Scores. The graduate students' test protocols contained numerous administration, clerical, and scoring errors that influenced Full Scale IQs on two thirds of the protocols (average change was .83 points). When failure-to-record errors (failing to record responses on the test protocol) were omitted from the analysis, the subtests found most prone to error were Comprehension, Vocabulary, and Similarities. Additionally, no improvement in test administration occurred over the course of several test administrations. Findings of this study have implications for the education and training of psychology graduate students enrolled in intelligence testing courses.
Article
Analyzes 150 Wechsler Adult Intelligence Scale-Revised (WAIS-R) protocols completed by 20 graduate students to examine the effect of practice administrations in teaching the WAIS-R. Failure to record both responses and times decreased over 10 administrations, but no other improvement occurred across either 5 or 10 administrations. Rather than becoming proficient in test administration and scoring, students often practiced errors, and extended practice with the Wechsler Intelligence Scale for Children-Revised (WISC-R) created negative transfer to the WAIS-R. Verbal subtests were especially prone to examiner error. The observed errors affected 88% of the Full Scale IQs assigned by the students. Implications are discussed, including possible effects of examiner errors on placement decisions, and suggestions for improving practice and training are provided.
Book
This is the definitive, supplementary reference book for accurate coding of the Rorschach Comprehensive System. http://www.rorschachcodingsolutions.com/login.asp
Article
In this article we review the important statistical properties of the urn randomization (design) for assigning patients to treatment groups in a clinical trial. The urn design is the most widely studied member of the family of adaptive biased-coin designs. Such designs are a compromise between designs that yield perfect balance in treatment assignments and complete randomization which eliminates experimental bias. The urn design forces a small-sized trial to be balanced but approaches complete randomization as the size of the trial (n) increases. Thus, the urn design is not as vulnerable to experimental bias as are other restricted randomization procedures. In a clinical trial it may be difficult to postulate that the study subjects constitute a random sample from a well-defined homogeneous population. In this case, a randomization model provides a preferred basis for statistical inference. We describe the large-sample permutational null distributions of linear rank statistics for testing the equality of treatment groups based on the urn design. In general, these permutation tests may be different from those based on the population model, which is equivalent to assuming complete randomization. Poststratified subgroup analyses can also be performed on the basis of the urn design permutational distribution. This provides a basis for analyzing the subset of patients with observed responses when some patients' responses can be assumed to be missing-at-random. For multiple mutually exclusive strata, these tests are correlated. For this case, a combined covariate-adjusted test of treatment effect is described. Finally, we show how to generalize the urn design to a prospectively stratified trial with a fairly large number of strata.
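For readers unfamiliar with the urn design, the following minimal sketch illustrates a two-arm UD(α, β) urn scheme of the kind reviewed in the article: the urn starts with α balls per arm, and after each assignment β balls of the opposite arm are added, so small trials are nudged toward balance while assignment probabilities drift toward 1/2 as n grows. The function name, parameters, and simulation are illustrative assumptions, not code from the article.

```python
import random

def urn_design(n_patients, alpha=1, beta=1, seed=None):
    """Illustrative UD(alpha, beta) urn randomization for two arms (A, B).

    Start with `alpha` balls of each type; after each assignment, replace
    the drawn ball and add `beta` balls of the OPPOSITE type, nudging the
    allocation toward balance while keeping each assignment random.
    """
    rng = random.Random(seed)
    urn = {"A": alpha, "B": alpha}
    assignments = []
    for _ in range(n_patients):
        total = urn["A"] + urn["B"]
        # Draw an arm with probability proportional to its ball count.
        arm = "A" if rng.random() < urn["A"] / total else "B"
        assignments.append(arm)
        # The drawn ball is replaced; add beta balls of the other arm.
        other = "B" if arm == "A" else "A"
        urn[other] += beta
    return assignments

if __name__ == "__main__":
    alloc = urn_design(20, alpha=1, beta=1, seed=42)
    print(alloc)
    print("A:", alloc.count("A"), "B:", alloc.count("B"))
```

Running the example with α = β = 1 typically yields near-balanced arms even for 20 patients, consistent with the abstract's point that the design forces balance in small trials yet approaches complete randomization as the trial grows.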
Article
A total of 51 Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV) protocols, administered by graduate students in training, were examined to obtain data describing the frequency of examiner errors and the impact of errors on resultant test scores. Present results were generally consistent with previous research examining graduate students' errors on the previous two editions of the WISC. Students committed errors on 98% of the protocols examined and averaged 25.8 errors per protocol. The most common errors were failure to query verbal responses, assigning too many points to an answer, and failure to record an examinee's response on the test protocol. Errors resulted in inaccurate test composite scores, with the Full Scale IQ and Verbal Comprehension Index most frequently affected. Error rates did not improve significantly over the course of three practice administrations.
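Summary statistics like those reported above (the percentage of protocols containing errors, the mean number of errors per protocol, and the most common error categories) can be tallied with a few lines of code. The sketch below is a minimal illustration that assumes a hypothetical data layout (one dictionary of error-category counts per protocol); the category names and counts are invented for the example and are not taken from the study.

```python
from collections import Counter

# Hypothetical per-protocol tallies of examiner errors, keyed by error
# category; the layout and counts are illustrative only.
protocols = [
    {"failure_to_query": 4, "too_many_points": 2, "response_not_recorded": 3},
    {"failure_to_query": 6, "too_many_points": 1},
    {"response_not_recorded": 5, "too_many_points": 2, "failure_to_query": 2},
]

totals = Counter()
for protocol in protocols:
    totals.update(protocol)  # add this protocol's counts to the running totals

n_protocols = len(protocols)
mean_errors = sum(totals.values()) / n_protocols
pct_with_errors = 100 * sum(1 for p in protocols if sum(p.values()) > 0) / n_protocols

print(f"Protocols with at least one error: {pct_with_errors:.0f}%")
print(f"Mean errors per protocol: {mean_errors:.1f}")
for category, count in totals.most_common():
    print(f"{category}: {count}")
```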
Article
While errors on the WISC-R are conceived primarily in terms of internal consistency and stability over time, examiners make mistakes that contribute to the inaccuracy of test scores. Studies to date mainly have investigated general scoring errors, rather than specific items most prone to error. Investigation of graduate students' test protocols indicated numerous scoring and mechanical errors that influenced the Full Scale IQ scores on two-thirds of the protocols. Particularly prone to error were Verbal subtests of Vocabulary, Comprehension, and Similarities. More importantly, specific items on subtests in which numerous mistakes occurred were noted, as well as the most likely type of error for each item. These findings have implications for the education and training of assessment specialists.