Clinical Performance Evaluations of Third-Year Medical Students and Association With Student and Evaluator Gender

Purpose: Clinical performance evaluations are major components of medical school clerkship grades. But are they sufficiently objective? This study aimed to determine whether student and evaluator gender is associated with assessment of overall clinical performance. Method: This was a retrospective analysis of 4,272 core clerkship clinical performance evaluations by 829 evaluators of 155 third-year students, within the Alpert Medical School grading database for the 2013-2014 academic year. Overall clinical performance, assessed on a three-point scale (meets expectations, above expectations, exceptional), was extracted from each evaluation, as well as evaluator gender, age, training level, department, student gender and age, and length of observation time. Hierarchical ordinal regression modeling was conducted to account for clustering of evaluations. Results: Female students were more likely to receive a better grade than males (adjusted odds ratio [AOR] 1.30, 95% confidence interval [CI] 1.13-1.50), and female evaluators awarded lower grades than males (AOR 0.72, 95% CI 0.55-0.93), adjusting for department, observation time, and student and evaluator age. The interaction between student and evaluator gender was significant (P = .03), with female evaluators assigning higher grades to female students, while male evaluators' grading did not differ by student gender. Students who spent a short time with evaluators were also more likely to get a lower grade. Conclusions: A one-year examination of all third-year clerkship clinical performance evaluations at a single institution revealed that male and female evaluators rated male and female students differently, even when accounting for other measured variables.
Copyright © by the Association of American Medical Colleges. Unauthorized reproduction of this article is prohibited.
Academic Medicine, Vol. 92, No. 6 / June 2017 835
Research Report
Selection of graduating medical
students into residency programs is
driven by multiple factors. However,
according to program directors, the most
important selection criteria are students’
grades on required core clerkships.1
Clinical performance evaluations (CPEs)
are used in most core clinical clerkships as
assessment and grading tools for medical
students. Clinicians who work with
medical students are asked to complete
formal evaluations of each student’s
basic clinical skills, such as history taking
and case presentation, as well as fund of
knowledge and professionalism. In most
clerkships, these evaluations, along with
standardized written examinations and
objective structured clinical examinations
(OSCEs), provide the data from which
students’ final clerkship grades are
determined. Studies show that these
CPEs are weighted more heavily than the
other evaluation methods, accounting for
50% to 70% of the final grade across all
clerkships.2,3 Despite the importance of
core clerkship clinical evaluations, there
is a paucity of literature examining the
degree of objectivity of this measure.4
The numerous evaluations that
occur over the course of attaining
entrance to medical school and
during the preclinical years are largely
standardized and unlikely to exhibit
grader-dependent bias. In contrast,
medical students are evaluated in a
more subjective manner when being
assessed on their clinical performance.
For that reason, the association of
grading with gender and the gender
pairing of trainer and trainee is
important, yet these factors are not well
understood in the medical setting in
areas where subjectivity of grading is
high. Literature from the education field
has shown that student gender often
plays a role in how students are treated
and graded.5,6 In primary schools, girls
are awarded better grades than boys,
despite similar test scores, which some
researchers attribute to “noncognitive
skills”—specifically, “a more
developed attitude towards learning.”6
Additionally, teachers’ gender can affect
their expectations and perceptions
of educational competence and
performance.7,8 Furthermore, studies9–11
suggest that gender pairing can
enhance, through a “role-model effect,”
student engagement and behavior, or,
conversely, gender noncongruence may
induce “stereotype threat,” in which
anxiety that one will confirm a negative
stereotype can lead to a decrement in
performance.
A few small studies12–14 have suggested
an interaction between student and
evaluator gender in the grading of
medical students’ simulated clinical
performance on OSCEs by standardized
patients (SPs). One small study of OSCE
grading13 found that male and female
medical students fared similarly overall;
however, when graded solely by female
SPs, women scored significantly higher,
yet male and female students were rated
the same by male SPs. These findings
were replicated in a more recent study
of OSCE grading,14 which specifically
examined the gender interaction during
a “gender-sensitive” patient situation, the
examination of the chest.

Acad Med. 2017;92:835–840.
First published online January 17, 2017
doi: 10.1097/ACM.0000000000001565
Please see the end of this article for information
about the authors.
Correspondence should be addressed to Alison
Riese, Department of Pediatrics, Alpert Medical
School, Hasbro Children’s Hospital, 593 Eddy St.,
Providence, RI 02903; telephone: (401) 444-8531;
e-mail: ariese@lifespan.org.
Alison Riese, MD, MPH, Leah Rappaport, MD, Brian Alverson, MD,
Sangshin Park, DVM, MPH, PhD, and Randal M. Rockney, MD

Similar disparities in grading regarding
student and evaluator gender have
been found in a few small studies of
nonsimulated clinical settings.15,16 A
small study of students completing a
required one-month ambulatory care
medicine clerkship at the Medical
College of Wisconsin16 showed that the
highest mean grade was given by male
preceptors to female students, and the
lowest mean grade was given by female
preceptors to male students. In a study
of evaluations of internal medicine
residents, male residents received higher
grades from male attendings than from
female attendings.17 Conversely, a study
of medical student grading in obstetrics–
gynecology18 found that female students
performed better on written exams
and OSCEs; however, they were graded
similarly to male students by their faculty
evaluators.
The influence of gender on grading
in the clinical setting is important to
understand, considering the highly
subjective nature of clinical evaluations
compared with multiple-choice tests,
where gender has no bearing on
grade assignment, as well as the more
structured setting of OSCEs, where
graders are generally well trained and
have more uniform interactions with
the students being assessed. CPEs are
completed by evaluators of all training
levels, who interact with students in
various types of settings and over varying
durations, yet their assessments are
weighted heavily in clerkship grading.
As a first step in any effort to increase
objectivity in clinical grade assignment,
it is necessary to fully understand what
issues influence evaluators’ grading of
student clinical performance. There has
been no study examining third-year
core clerkships as a whole to see how the
gender of the evaluator and the gender
of the student may be associated with
differences in the clinical evaluation of
the student. We carried out this study
to determine whether student and
evaluator gender is associated with the
grades assigned. Secondarily, we sought
to explore other student and evaluator
factors that may be associated with
variance in grading.
Method
This was a retrospective study conducted
at the Alpert Medical School (AMS).
All 4,462 CPEs recorded in the medical
school’s grading database (OASIS) from
third-year core clerkships during the
2013–2014 academic year were initially
included. At AMS, the core clerkships
and their duration during the study
period consisted of internal medicine
(12 weeks) and surgery, obstetrics–
gynecology, family medicine, pediatrics,
and psychiatry (each 6 weeks). The
medical school’s administrative offices
compiled deidentified demographic
information about the student and
evaluator for each CPE and assigned
an ID number for each student and
each evaluator who was involved in
the CPEs being studied. The evaluator
IDs were used to account for nesting of
evaluations among evaluators—that is,
cases where evaluators assessed more
than one student. As the indicator of the
student’s global clinical performance, the
“overall clinical performance” grade on
the CPE, which from now on we refer to
as the student grade, was extracted from
each CPE. The possible grades that could
be selected by each evaluator completing
a CPE were “exceptional,” “above
expectations,” “meets expectations,” and
“below expectations.” An evaluation was
excluded if it was noted to be a duplicate
entry or if data were incomplete for the
primary outcome or predictor variables.
Additionally, CPEs with a grade of
“below expectations” were excluded
because of the rare occurrence (< 1%) of
this grade.
Because we were provided deidentified
data, we were not able to match those
data with any objective nonclinical
evaluations. However, we did compare
the United States Medical Licensing
Examination (USMLE) Step 1 scores
for men versus women in the class of
2015. The medical school administrative
offices provided the means and standard
deviations (SDs) of the USMLE Step 1
scores for the male and female students in
that class, since these students’ CPE data
were in our study. The means and SDs for
these two groups were compared using
Student t test. This study was declared
exempt by the Lifespan institutional
review board. (Lifespan Corporation,
Rhode Island’s largest health system,
is affiliated with the AMS of Brown
University.)
For each CPE, the dataset contained
demographic information about the
clerkship context, the student, and the
evaluator. Clerkship characteristics for
each CPE consisted of the clerkship
department and the length of observation
time for the student/evaluator (either
< 2 half-days or ≥ 2 half-days). Student
demographic information included
student gender and age (grouped as
25–27 years old and ≥ 28 years old).
Evaluator variables were evaluator gender,
age (in quartiles), and training level
(residency year or attending).

Table 1
Demographic Information Regarding the Third-Year Medical Students and
Their Evaluators at Alpert Medical School, 2013 to 2014 [a]

Characteristic              No. (%)
Students (n = 155)
  Gender
    Male                    76 (49.0)
    Female                  79 (51.0)
  Age quartile
    25–26 years             49 (33.3)
    27 years                36 (24.5)
    28 years                27 (18.4)
    > 28 years              35 (23.8)
Evaluators (n = 829)
  Gender
    Male                    399 (48.1)
    Female                  430 (51.9)
  Age quartile
    25–30 years             290 (37.8)
    31–40 years             210 (27.4)
    41–50 years             137 (17.9)
    > 50 years              130 (17.0)
  Training level
    Resident (PGY 1)        168 (22.3)
    Resident (PGY 2)        89 (11.8)
    Resident (PGY 3–5)      155 (20.6)
    Attending               342 (45.4)
  Department
    Family medicine         126 (15.2)
    Psychiatry              50 (6.0)
    Internal medicine       336 (40.5)
    Obstetrics–gynecology   47 (5.7)
    Pediatrics              155 (18.7)
    Surgery                 115 (13.9)

Abbreviation: PGY indicates postgraduate year.
[a] The authors carried out a retrospective analysis
of 4,272 third-year clerkship clinical performance
evaluations, involving the evaluators and the students
whose data are in this table, to determine whether
student and evaluator gender are associated with
assessment of overall clinical performance.

All statistical analyses were performed
using SAS 9.4 (SAS Institute, Cary,
North Carolina). A P value < .05 was
considered to be statistically significant.
This study examined the associations of
final grade with gender and covariates
using chi-square tests. Hierarchical
ordinal regression modeling was
conducted to examine the effects of
student and evaluator characteristics on
a student’s grade (“exceptional,” “above
expectations,” or “meets expectations”),
adjusting for nonindependence, or
“clustering,” of evaluators who rated
more than one student. Gender and
covariates with a P value < .05 in the
univariable model were incorporated
into a multivariable regression model,
which was built by the stepwise
selection procedure. Variables that
significantly reduced residual variance
were retained in the final model. To
avoid collinearity, phi coefficients were
estimated for pairs of independent variables.
If high collinearity among variables was
observed (r > 0.6), we selected the most
relevant variable to the student’s grade
for multivariable modeling. Because
of the small number of evaluations in
family medicine and psychiatry, data
from these specialties were combined
for the multivariable modeling. After the
main effects model was built, interaction
terms were explored for significance.
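
To make the odds ratios from an ordinal model easier to read, the following is a minimal sketch of a proportional odds (cumulative logit) calculation. It is not the authors' SAS model: it leaves out the evaluator-level clustering term, and the cut points in `THRESHOLDS` are hypothetical values chosen only for illustration, as are the function names. It shows that shifting the linear predictor by ln(1.30) multiplies the odds of a higher grade by 1.30 at every cut point.

```python
import math


def category_probabilities(thresholds, linear_predictor):
    """Return P(Y = k) for ordered categories under a cumulative logit model.

    Assumed form: logit P(Y <= k) = threshold_k - linear_predictor,
    so a larger linear predictor moves probability toward higher grades.
    """
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    cumulative = [logistic(t - linear_predictor) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


def odds_of_higher(probs, k):
    """Odds of receiving a grade above category index k."""
    upper = sum(probs[k + 1:])
    return upper / (1.0 - upper)


GRADES = ["meets expectations", "above expectations", "exceptional"]
# Hypothetical cut points, NOT estimates from the study.
THRESHOLDS = [-1.6, 0.4]

baseline = category_probabilities(THRESHOLDS, 0.0)
# An odds ratio of 1.30 corresponds to a log-odds shift of ln(1.30).
shifted = category_probabilities(THRESHOLDS, math.log(1.30))

for grade, p0, p1 in zip(GRADES, baseline, shifted):
    print(f"{grade}: baseline {p0:.3f}, shifted {p1:.3f}")
```

Under the proportional odds assumption a single odds ratio describes the shift at both cut points, which is why one AOR per predictor can summarize a three-level outcome.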
Results
Of the 4,462 CPEs initially included in
this study, 190 (4.3%) were excluded.
Thirty-eight were excluded because they
were duplicates, and 136 were excluded
because of missing values in predictors
of interest (student or evaluator gender;
no. = 18) or in the outcome of interest
(grade; no. = 118). In addition, 16 CPEs
were excluded because of a “below
expectations” grade. Thus, the final
study dataset comprised 4,272 CPEs,
which were completed by 829 evaluators
regarding the performance of 155
students. The mean (SD) USMLE Step
1 score for the AMS class of 2015 was
221 (18.70) for women and 231 (18.98)
for men (P = .0083). The median age
of students was 27 years (interquartile
range [IQR] 26–28 years); the median
age of evaluators was 33 years (IQR
29–45 years). (See Table 1 for student
and evaluator demographics.) While the
number of students rotating through
each clerkship was consistent, the number
of CPEs for each student varied by
clerkship. The internal medicine clerkship
evaluators completed 1,267 CPEs
(30% of all CPEs), and the pediatrics
clerkship evaluators completed 1,154
(27%), which means that these two
clerkships contributed a large percentage
of CPEs compared with the percentages
contributed by the other four clerkships.
There was variability in the number
of CPEs per student (median 27, IQR
6–39) and CPEs per evaluator (median
3, IQR 1–7). Each clerkship, student, and
evaluator characteristic examined was
associated with a statistically significant
difference in the distribution of grades
received. (See Table 2.)
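
As an illustration of the chi-square comparisons summarized in Table 2, the sketch below recomputes a Pearson chi-square statistic for student gender by grade category using the counts printed in that table. That the authors ran an uncorrected Pearson test on exactly this 2 x 3 table is an assumption; the function name is hypothetical.

```python
import math

# Student gender by grade, copied from Table 2.
# Columns: meets expectations, above expectations, exceptional.
observed = {
    "male": [381, 857, 798],
    "female": [340, 969, 927],
}


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(rows):
        for c, count in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


chi2, dof = pearson_chi_square(observed)
# The closed-form tail below holds only for 2 degrees of freedom,
# where the chi-square upper tail probability equals exp(-x / 2).
assert dof == 2
p_value = math.exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, df = {dof}, P = {p_value:.3f}")
```

With these counts the statistic is about 9.5 on 2 degrees of freedom, giving P of roughly .009, which agrees with the value reported for student gender in Table 2.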
Table 2
Associations of Third-Year Clinical Performance Grades With Clerkship, Student, and
Evaluator Characteristics, Alpert Medical School, 2013 to 2014 [a]

All cells are No. (%) of clinical evaluations; grade columns give No. (%) of grades by category.

Characteristic              All evaluations   Meets expectations   Above expectations   Exceptional     P value
                            (n = 4,272)       (n = 721)            (n = 1,826)          (n = 1,725)
Clerkships
  Department
    Obstetrics–gynecology   602 (14.1)        136 (22.6)           271 (45.0)           195 (32.4)      < .001
    Pediatrics              1,154 (27.0)      236 (20.5)           483 (41.9)           435 (37.7)
    Psychiatry              300 (7.0)         45 (15.0)            136 (45.3)           119 (39.7)
    Family medicine         369 (8.6)         52 (14.1)            158 (42.8)           159 (43.1)
    Surgery                 580 (13.6)        68 (11.7)            268 (46.2)           244 (42.1)
    Internal medicine       1,267 (29.7)      184 (14.5)           510 (40.3)           573 (45.2)
  Observation time
    ≤ 2 half-days           505 (11.8)        143 (28.4)           261 (51.8)           100 (19.8)      < .001
    > 2 half-days           3,785 (88.2)      578 (15.3)           1,565 (41.5)         1,625 (43.1)
Students
  Gender
    Male                    2,036 (47.7)      381 (18.7)           857 (42.1)           798 (39.2)      .009
    Female                  2,236 (52.3)      340 (15.2)           969 (43.3)           927 (41.5)
  Age
    25–27 years             2,542 (63.7)      482 (19.0)           1,091 (42.9)         969 (38.1)      < .001
    ≥ 28 years              1,448 (36.3)      211 (14.6)           617 (42.6)           620 (42.8)
Evaluators
  Gender
    Male                    2,081 (48.7)      263 (12.6)           893 (42.9)           925 (44.5)      < .001
    Female                  2,191 (51.3)      458 (20.9)           933 (42.6)           800 (36.5)
  Age quartile
    25–30 years             1,671 (41.2)      300 (18.0)           659 (39.4)           712 (42.6)      < .001
    31–40 years             1,111 (27.4)      193 (17.4)           450 (40.5)           468 (42.1)
    41–50 years             701 (17.3)        88 (12.6)            325 (46.4)           288 (41.1)
    ≥ 51 years              573 (14.1)        93 (16.2)            276 (48.2)           204 (35.6)
  Training level
    Resident (PGY 1)        763 (18.8)        115 (15.1)           281 (36.8)           367 (48.1)      < .001
    Resident (PGY 2)        647 (15.9)        94 (14.5)            278 (43.0)           275 (42.5)
    Resident (PGY 3–5)      883 (21.7)        179 (20.3)           350 (39.6)           354 (40.1)
    Attending               1,776 (43.7)      287 (16.2)           826 (46.5)           663 (37.3)

Abbreviation: PGY indicates postgraduate year.
[a] The authors carried out a retrospective analysis of 4,272 third-year clerkship clinical performance evaluations
by 829 evaluators of 155 students to determine whether student and evaluator gender are associated with
assessment of overall clinical performance.

In univariable models, all predictors
were associated with the grade. Because
of high correlation between faculty age
and training level (phi coefficient 0.84),
only evaluator age was considered for the
multivariable model. A total of 32.9% of the
variability in the grades was accounted for
by within-evaluator nesting of grades in the
multivariable model (intraclass correlation
coefficient = 0.329; P < .001). All significant
differences in the univariable models were
retained in the multivariable model. In the
multivariable model, female student gender
was associated with higher grades (adjusted
odds ratio [AOR], 1.30; 95% CI, 1.13–1.50).
Female faculty gender was associated
with lower grades (AOR, 0.72; 95% CI,
0.55–0.93). Longer observation time, older
student age, and younger evaluator age were
all associated with higher grades. Evaluators
in internal medicine had the highest odds
of giving a better grade, while those in
obstetrics–gynecology had the lowest odds.
(See Table 3.)
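
One way to read the reported intraclass correlation coefficient is through the latent-variable definition commonly used for logistic-type mixed models, in which the residual variance is fixed at pi^2 / 3. The sketch below assumes that definition applies here; the paper does not state how the coefficient was computed, and the function names are hypothetical.

```python
import math

# Residual variance of the standard logistic distribution.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def icc_from_variance(evaluator_variance):
    """ICC implied by an evaluator-level random-intercept variance."""
    return evaluator_variance / (evaluator_variance + LOGISTIC_RESIDUAL_VARIANCE)


def variance_from_icc(icc):
    """Invert the ICC formula to recover the random-intercept variance."""
    return icc * LOGISTIC_RESIDUAL_VARIANCE / (1.0 - icc)


reported_icc = 0.329  # value reported for the multivariable model
implied_variance = variance_from_icc(reported_icc)
print(f"implied evaluator-level variance: {implied_variance:.2f}")
```

Under this assumption, an ICC of 0.329 corresponds to an evaluator-level variance of about 1.61 on the log-odds scale, meaning evaluators differ substantially in baseline leniency before any student characteristic is considered.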
The interaction between student and faculty
gender, adjusted for all other main effects,
was also significant (P = .03; see Figure 1).
Male evaluators did not significantly differ
in their grading of male and female students
(P = .29); however, female evaluators
gave lower grades to male students
compared with female students (P < .001).
Additionally, a significant interaction
between faculty age and faculty gender was
found (P = .047), with older male evaluators
giving significantly lower grades than
younger men (P = .001), while there was
no significant difference in grading for the
female age groups (P = .71). (See Figure 2.)
There was no interaction between student
gender and student age (P = .63).
Discussion
In one year at a large U.S. medical school,
there were over 4,000 CPEs of students
in core clerkships, and data revealed
that in clerkship grading, overall, male
students received lower grades than female
students on their CPEs. This finding is
in accordance with literature examining
gender differences in clinical performance.
In general, male and female medical
students perform similarly on the MCAT
exam and have similar preclinical GPAs
and USMLE test scores,15,19,20 albeit with
factors including content area and student
and school characteristics playing a role
in performance. The class of students
represented in our dataset actually differed
in their performance on USMLE Step 1,
with men performing significantly better.
In contrast, other studies15,21 suggest
that female medical students do tend
to perform better on OSCEs, including
those that are part of the USMLE Step
2 Clinical Skills (CS) exam, and receive
better evaluations on their actual clinical
performance. There was no interaction
between evaluator gender and student
gender found in the study of Step 2 CS
scoring.21 However, our findings show that
the discrepancy in clinical performance
grades between male and female medical
students was driven primarily by female
evaluators.
Table 3
Odds Ratios of Receiving a Higher Grade, by Clerkship, Student, and Evaluator
Characteristics, on 4,272 Third-Year Clerkship Clinical Performance Evaluations,
Alpert Medical School, 2013 to 2014 [a,b]

                                        Univariable model             Multivariable model
Characteristic                          OR (95% CI)         P value   AOR (95% CI)        P value
Clerkships
  Department
    Obstetrics–gynecology               1.00 [c]            < .001    1.00 [c]            .002
    Pediatrics                          1.11 (0.67–1.83)              0.83 (0.49–1.40)
    Family medicine and psychiatry [d]  1.52 (0.91–2.53)              1.24 (0.71–2.17)
    Surgery                             1.88 (1.10–3.21)              1.31 (0.74–2.33)
    Internal medicine                   2.09 (1.30–3.36)              1.61 (0.97–2.69)
  Observation time
    ≤ 2 half-days                       1.00 [c]            < .001    1.00 [c]            < .001
    > 2 half-days                       2.72 (2.20–3.38)              2.74 (2.17–3.46)
Students
  Gender
    Male                                1.00 [c]            .016      1.00 [c]            .026
    Female                              1.17 (1.03–1.33)              1.30 (1.13–1.50)
  Age group
    25–27 years                         1.00 [c]            < .001    1.00 [c]            < .001
    ≥ 28 years                          1.39 (1.21–1.60)              1.48 (1.27–1.71)
Evaluators
  Gender
    Male                                1.00 [c]            < .001    1.00 [c]            .012
    Female                              0.64 (0.51–0.81)              0.72 (0.55–0.93)
  Age group
    25–30 years                         1.00 [c]            .034      1.00 [c]            .028
    31–40 years                         0.78 (0.57–1.06)              0.76 (0.56–1.04)
    41–50 years                         0.77 (0.54–1.09)              0.72 (0.50–1.03)
    ≥ 51 years                          0.59 (0.41–0.85)              0.57 (0.39–0.84)
  Training level [e]
    Attending                           1.00 [c]            < .001
    Resident (PGY 1)                    1.90 (1.40–2.59)
    Resident (PGY 2)                    1.68 (1.21–2.35)
    Resident (PGY 3–5)                  1.29 (0.95–1.76)

Abbreviations: PGY indicates postgraduate year; OR, odds ratio; AOR, adjusted odds ratio.
[a] The authors carried out a retrospective analysis of 4,272 third-year clerkship clinical performance evaluations
by 829 evaluators of 155 students to determine whether student and evaluator gender are associated with
assessment of overall clinical performance.
[b] A total of 32.9% of the variability in the grades was accounted for by “within-evaluator” nesting of grades in
the multivariable model (intraclass correlation coefficient = 0.33; P < .001).
[c] Reference.
[d] Because of the small number of evaluations, family medicine and psychiatry were combined for the multivariable
modeling.
[e] Training level was not retained in the multivariable model because of high phi coefficient (0.84) between faculty
age quartile and training levels.

The discrepancy between male and
female evaluators’ assessment of medical
students’ clinical performance is most
perplexing. Medical students’ clinical
performance is influenced by attributes
outside of medical knowledge and
clinical acumen. Indeed, two studies22,23
reported that medical students who
showed empathy received better clinical
evaluations, and women scored higher
on empathy scales than men did.
Additionally, some studies22,23 found
that female students’ interpersonal skills
surpassed those of men. In primary care,
a study24 showed that female physicians’
communication skills surpassed those
of their male counterparts, which, if
future studies confirm this result, is
an important finding because doctor–
patient communication has been linked
to improved health outcomes.25 If the
body of literature showing that women
outperform men in the clinical setting is
applied, our findings suggest that female
evaluators accurately detected superior
performance in their female students,
while male evaluators either were unable
to detect these differences or were biased
in their grading methods.
However, it is likely that this finding
highlights an even more complicated
interplay between gender and academic
performance and assessment. As in
the primary education world, female
students’ “learning attitude” may also
play a role, as well as the possible role
modeling of same-gender evaluators and
the stereotype threat of opposite-gender
graders, which may influence students
to perform differently depending on
the gender of their evaluators. Another
potential complicating matter is that
patients may interact differently with
medical students depending on the
student’s gender, which could also affect
the assessment of their performance.
This has been demonstrated in a study26
examining physician–patient interaction,
where patients were found to speak
differently and make more psychosocial
disclosures to female physicians.
Whatever the cause, it is concerning that
our study findings suggest that male
and female students experience different
gradings of their clinical performances,
and that the gender of the evaluator is an
independent driver of this difference.
Our data also found a significant
interaction between evaluator age and
gender, with younger male evaluators
awarding higher grades than older male
evaluators and than female evaluators in
all age groups. While younger evaluators
have been found to be more lenient
graders in other studies,27,28 to our
knowledge the age–gender interaction
has not been examined elsewhere,
and this finding warrants additional
investigation. Again, it is concerning that
intrinsic evaluator characteristics have
led to differential grading of students.
Either improved training of graders
is needed, or the characteristics of the
evaluators must be taken into account
when considering their ability to give
fair clerkship grades.
Our data also demonstrate substantial
differences in the way clerkship students
are graded by department at our school,
a finding that we suspect applies to
many schools. This variability should
be examined to provide a consistent
approach to CPEs. Differences in the
structure and duration of the different
core clerkships, as well as the time
students spend with evaluators, must be
taken into consideration when looking
at CPEs. In some cases, the structure of
the clerkship and number of evaluators
providing CPEs may result in fewer
grading events per student, which may
exaggerate the influence of gender and
age on a student’s final clerkship grade.

Figure 1 Two-way interaction effect of student and evaluator gender on predicted probabilities
of evaluators’ grading of male and female students on 4,272 clinical performance evaluations of
medical students completing third-year core clerkships at Alpert Medical School, 2013 to 2014.
Male evaluators did not significantly differ in their grading of male and female students (P = .29);
however, female evaluators gave lower grades to male students than to female students
(P < .001).

Figure 2 Two-way interaction effect of evaluator gender and age quartile on predicted probabilities
of their grading of male and female students on 4,272 clinical performance evaluations of medical
students completing third-year core clerkships at Alpert Medical School, 2013 to 2014. A significant
interaction between faculty age and faculty gender was found (P = .047), with older male evaluators
giving significantly lower grades than younger men (P = .001), while there was no significant
difference in grading for the female age groups (P = .71).

Our study has some limitations. We
evaluated only one year of grading events
at one medical school in the United States.
A multicenter study would be needed
to see if these data are generalizable to
other institutions. The grading system
used is an ordinal one, and these data
may not be reflective of data produced by
other grading systems at other medical
schools. We were not able to adjust for
or compare clinical performance grades
with standardized test scores, since the
individual-level data were not available
in our dataset. Further, we recognize
that gender representation, and thus
gender interactions at a medical school in
2013–2014, might be very different from
what was obtained in previous years, when
gender relationships and generational
differences would perhaps skew data in
other ways.
Further study is needed to learn whether
the trends of gender-pairing influence on
grading at our medical school are found
at other medical schools. Additionally,
the cause of the grading differences by
evaluator and student gender is still
unknown. Next steps may include a
qualitative approach to discover reasons
for the discrepancy in how medical
students’ performance is perceived
and assessed by evaluators of different
genders.
Acknowledgments: The authors would like to
acknowledge the assistance of Alpert Medical
School’s Medical School Administrative Office
for the compilation of the dataset used for this
study. They would also like to thank Jennifer F.
Friedman, MD, MPH, PhD, for her mentorship
and guidance, and Kelvin Moore for his efforts
assisting with the literature review.
Funding/Support: None reported.
Other disclosures: None reported.
Ethical approval: This study was reviewed by the
Lifespan institutional review board and deemed
exempt. (Lifespan Corporation, Rhode Island’s
largest health system, is affiliated with the Alpert
Medical School of Brown University.)
Previous presentations: Pediatric Hospital
Medicine Conference, Chicago, Illinois, July 29,
2016; and Lifespan Annual Research Celebration,
Providence, Rhode Island, October 20, 2016.
A. Riese is assistant professor, Department of
Pediatrics and Medical Science, Section of Medical
Education, Alpert Medical School of Brown
University, Providence, Rhode Island.
L. Rappaport is a first-year pediatrics resident,
University of Michigan Medical School, Ann Arbor,
Michigan.
B. Alverson is associate professor, Department
of Pediatrics and Medical Science, Section of
Medical Education, Alpert Medical School of Brown
University, Providence, Rhode Island.
S. Park is postdoctoral research associate, Alpert
Medical School of Brown University and Center
for International Health Research at Rhode Island
Hospital, Providence, Rhode Island.
R.M. Rockney is professor, Department of
Pediatrics, Family Medicine, and Medical Science,
Section of Medical Education, Alpert Medical School
of Brown University, Providence, Rhode Island.
References
1 Green M, Jones P, Thomas JX Jr. Selection
criteria for residency: Results of a national
program directors survey. Acad Med.
2009;84:362–367.
2 Kassebaum DG, Eaglen RH. Shortcomings
in the evaluation of students’ clinical skills
and behaviors in medical school. Acad Med.
1999;74:842–849.
3 Hemmer PA, Papp KK, Mechaber AJ,
Durning SJ. Evaluation, grading, and use of
the RIME vocabulary on internal medicine
clerkships: Results of a national survey and
comparison to other clinical clerkships. Teach
Learn Med. 2008;20:118–126.
4 Holmboe ES. Faculty and the observation
of trainees’ clinical skills: Problems and
opportunities. Acad Med. 2004;79:16–22.
5 Lavy V, Sand E. On the Origins of Gender
Human Capital Gaps: Short- and Long-Term
Consequences of Teachers’ Stereotypical
Biases. Cambridge, MA: National Bureau of
Economic Research; 2015.
6 Cornwell C, Mustard DB, Van Parys J.
Noncognitive skills and the gender disparities
in test scores and teacher assessments:
Evidence from primary school. J Hum
Resour. 2013;48:236–264.
7 Mullola S, Ravaja N, Lipsanen J, et al.
Gender differences in teachers’ perceptions
of students’ temperament, educational
competence, and teachability. Br J Educ
Psychol. 2012;82(pt 2):185–206.
8 Heyder A, Kessels U. Do teachers equate
male and masculine with lower academic
engagement? How students’ gender
enactment triggers gender stereotypes at
school. Soc Psychol Educ. 2015;18:467–485.
9 Dee TS. Teachers and the gender gaps
in student achievement. J Hum Resour.
2007;42:528–554.
10 Keller J. Stereotype threat in classroom
settings: The interactive effect of domain
identification, task difficulty and stereotype
threat on female students’ maths performance.
Br J Educ Psychol. 2007;77(pt 2):323–338.
11 Huguet P, Regner I. Stereotype threat
among schoolgirls in quasi-ordinary
classroom circumstances. J Educ Psychol.
2007;99(3):545.
12 Ramsbottom-Lucier M, Johnson MM,
Elam CL. Age and gender differences in
students’ preadmission qualifications and
medical school performances. Acad Med.
1995;70:236–239.
13 Dawson-Saunders B, Rutala PJ, Witzke DB,
Leko EO, Fulginiti JV. The influences of
student and standardized patient genders
on scoring in an objective structured
clinical examination. Acad Med. 1991;66(9
suppl):S28–S30.
14 Carson JA, Peets A, Grant V, McLaughlin K.
The effect of gender interactions on students’
physical examination ratings in objective
structured clinical examination stations.
Acad Med. 2010;85:1772–1776.
15 Haist SA, Wilson JF, Elam CL, Blue AV,
Fosson SE. The effect of gender and age on
medical school performance: An important
interaction. Adv Health Sci Educ Theory
Pract. 2000;5:197–205.
16 Wang-Cheng RM, Fulkerson PK, Barnas
GP, Lawrence SL. Effect of student and
preceptor gender on clinical grades in
an ambulatory care clerkship. Acad Med.
1995;70:324–326.
17 Rand VE, Hudes ES, Browner WS, Wachter
RM, Avins AL. Effect of evaluator and
resident gender on the American Board of
Internal Medicine evaluation scores. J Gen
Intern Med. 1998;13:670–674.
18 Bienstock JL, Martin S, Tzou W, Fox HE.
Medical students’ gender is a predictor of
success in the obstetrics and gynecology basic
clerkship. Teach Learn Med. 2002;14:240–243.
19 Cuddy MM, Swanson DB, Clauser BE. A
multilevel analysis of examinee gender and
USMLE Step 1 performance. Acad Med.
2008;83(10 suppl):S58–S62.
20 Cuddy MM, Swanson DB, Clauser BE.
A multilevel analysis of the relationships
between examinee gender and United States
Medical Licensing Exam (USMLE) Step 2
CK content area performance. Acad Med.
2007;82(10 suppl):S89–S93.
21 Swygert KA, Cuddy MM, van Zanten M,
Haist SA, Jobe AC. Gender differences in
examinee performance on the Step 2 Clinical
Skills data gathering (DG) and patient note
(PN) components. Adv Health Sci Educ
Theory Pract. 2012;17:557–571.
22 Austin EJ, Evans P, Goldwater R, Potter V. A
preliminary study of emotional intelligence,
empathy and exam performance in first
year medical students. Pers Individ Dif.
2005;39:1395–1405.
23 Hojat M, Gonnella JS, Mangione S, et al.
Empathy in medical students as related to
academic performance, clinical competence
and gender. Med Educ. 2002;36:522–527.
24 Roter DL, Hall JA, Aoki Y. Physician gender
effects in medical communication: A meta-
analytic review. JAMA. 2002;288:756–764.
25 Street RL Jr, Makoul G, Arora NK,
Epstein RM. How does communication
heal? Pathways linking clinician–patient
communication to health outcomes. Patient
Educ Couns. 2009;74:295–301.
26 Hall JA, Roter DL. Do patients talk differently
to male and female physicians? A meta-
analytic review. Patient Educ Couns.
2002;48:217–224.
27 Hull AL. Medical student performance. A
comparison of house officer and attending
staff as evaluators. Eval Health Prof.
1982;5(1):87–94.
28 Spielvogel R, Stednick Z, Beckett L, Latimore
D. Sources of variability in medical student
evaluations on the internal medicine clinical
rotation. Int J Med Educ. 2012;3:245–251.
... 5 Since that time, additional work has examined the association between gender and medical school performance with mixed results; generally, either no gender-associated difference is observed or a small increase in clinical performance is noted for women. [6][7][8][9][10] In addition, research has found that narrative evaluations of women include more personality terms, whereas the focus for men includes more competency-related skills. [11][12][13] These existing studies focus on 1 metric of evaluation, either overall scores or language analysis, without considering individual components of evaluations and how these metrics interact. ...
... 5 More recent work has found higher clinical grades for women, with gender concordance or discordance in evaluator-student pairings associated with outcomes. 6,10 Our data set consisted of composite evaluations, each including evaluators of all genders. As such, we were unable to examine whether a particular evaluator-student gender concordance was associated with the observed differences. ...
... That said, this correlation agrees with previous research showing that longer observation time is associated with higher grades. 6,19 This finding raises the possibility of an actionable intervention that may substantively improve learner outcomes and equity-cognizant consideration of interaction time between faculty and students. ...
Importance: Women studying medicine currently equal men in number, but evidence suggests that men and women might not be evaluated equally throughout their education. Objective: To examine whether there are differences associated with gender in either objective or subjective evaluations of medical students in an internal medicine clerkship. Design, setting, and participants: This single-center retrospective cohort study evaluated data from 277 third-year medical students completing internal medicine clerkships in the 2017 to 2018 academic year at an academic hospital and its affiliates in Pennsylvania. Data were analyzed from September to November 2020. Exposure: Gender, presumed based on pronouns used in evaluations. Main outcomes and measures: Likert scale evaluations of clinical skills, standardized examination scores, and written evaluations were analyzed. Univariate and multivariate linear regression were used to observe trends in measures. Word embeddings were analyzed for narrative evaluations. Results: Analyses of 277 third-year medical students completing an internal medicine clerkship (140 women [51%] with a mean [SD] age of 25.5 [2.3] years and 137 [49%] presumed men with a mean [SD] age of 25.9 [2.7] years) detected no difference in final grade distribution. However, women outperformed men in 5 of 8 domains of clinical performance, including patient interaction (difference, 0.07 [95% CI, 0.04-0.13]), growth mindset (difference, 0.08 [95% CI, 0.01-0.11]), communication (difference, 0.05 [95% CI, 0-0.12]), compassion (difference, 0.125 [95% CI, 0.03-0.11]), and professionalism (difference, 0.07 [95% CI, 0-0.11]). With no difference in examination scores or subjective knowledge evaluation, there was a positive correlation between these variables for both genders (women: r = 0.35; men: r = 0.26) but different elevations for the line of best fit (P < .001). 
Multivariate regression analyses revealed associations between final grade and patient interaction (women: coefficient, 6.64 [95% CI, 2.16-11.12]; P = .004; men: coefficient, 7.11 [95% CI, 2.94-11.28]; P < .001), subjective knowledge evaluation (women: coefficient, 6.66 [95% CI, 3.87-9.45]; P < .001; men: coefficient, 5.45 [95% CI, 2.43-8.43]; P < .001), reported time spent with the student (women: coefficient, 5.35 [95% CI, 2.62-8.08]; P < .001; men: coefficient, 3.65 [95% CI, 0.83-6.47]; P = .01), and communication (women: coefficient, 6.32 [95% CI, 3.12-9.51]; P < .001; men: coefficient, 4.21 [95% CI, 0.92-7.49]; P = .01). The model based on the men's data also included growth mindset as a significant variable (coefficient, 4.09 [95% CI, 0.67-7.50]; P = .02). For narrative evaluations, words in context with "he or him" and "she or her" differed, with agentic terms used in descriptions of men and personality descriptors used more often for women. Conclusions and relevance: Despite no difference in final grade, women scored higher than men on various domains of clinical performance, and performance in these domains was associated with evaluators' suggested final grade. The content of narrative evaluations significantly differed by student gender. This work supports the hypothesis that how students are evaluated in clinical clerkships is associated with gender.
... 28 Higher clinical scores have been reported for female third-year human medicine students. 29 This last study, based on a large population of students and evaluators and on a fairly subjective scoring system (from exceptional to below expectations), shows an interaction between the evaluator's and student's gender (i.e., female evaluators assigned lower scores to male students). 29 In the present study, although the assessment was based on only one skill (transrectal palpation) and on an objective score system (correct or incorrect answer), the disproportion between the number of female and male students probably explains the gender differences. ...
... 29 This last study, based on a large population of students and evaluators and on a fairly subjective scoring system (from exceptional to below expectations), shows an interaction between the evaluator's and student's gender (i.e., female evaluators assigned lower scores to male students). 29 In the present study, although the assessment was based on only one skill (transrectal palpation) and on an objective score system (correct or incorrect answer), the disproportion between the number of female and male students probably explains the gender differences. Moreover, the mean number of tests performed by the male students ...
In a veterinary medicine curriculum, students’ hands-on practice is essential but is still considered one of the major deficiencies in veterinary schools in Europe. After theoretical and basic practical training, students, under the control of experienced veterinarians (supervisors), monitored the reproductive cycle of embryo recipients by transrectal palpation and ultrasound. To evaluate the skills of students, the question “Has she ovulated?” was posed when a dominant follicle ≥ 35 mm was recorded in the previous day’s examination and a score of 1 or 0 was assigned in the case of a correct or incorrect answer (test palpation), respectively. Study 1 involved the retrospective evaluation of 3,509 test palpation records of 43 students (31 females, 12 males) and showed a statistically significant positive correlation between the number of test palpations performed and the proportion of correct answers. There was a statistically significant effect of the number of test palpations performed by each student, their gender, and the season on the correct answers. When performing > 50 test palpations, a statistical difference between genders was observed (p < .05). Study 2 involved the prospective evaluation of 687 records on 52 standardbred or thoroughbred recipient mares collected from nine right-handed female students. The different mares, breed, occurrence of ovulation on the left or right ovary, and the presence of one or more large follicle(s) per ovary had no effect on the correct answers (p > .05). Individual students’ performances were statistically different (p < .05), ranging from 60% to 92%.
... Finally, gender is taken into account because female students are more likely to get higher clinical grades. 30,31 Following this line of arguments, the present study tests two hypotheses: Each block starts with a few weeks of thematic education, after which the acquired knowledge is applied in that clerkship. The blocks occur in the following order: 10 weeks internal medicine; 10 weeks surgery; 10 weeks paediatrics; gynaecology and obstetrics; 10 weeks neurology and psychiatry; 9 weeks dermatology, ear, nose and throat surgery, and ophthalmology; and finally 9 weeks family and social medicine. ...
CONTEXT Ethnic minority students find that their ethnicity negatively affects the evaluation of their capacities and their feelings in medical school. This study tests whether ethnic minority and majority students differ in their ‘self-regulatory focus’ in clinical training, i.e. their ways to approach goals, due to differences in social learning experiences. Self-regulatory focus consists of a promotion and prevention focus. People who are prone to stereotypes and unfair treatments, are more likely to have a prevention focus and conceal certain identity aspects. OBJECTIVES To test whether ethnic minority students, as compared to ethnic majority students, are equally likely to have a promotion focus, but more likely to have a prevention focus in clinical training due to more negative social learning experiences (H1), and whether the relationship between student ethnicity and clinical evaluations can be explained by students’ gender, social learning experiences, self-regulatory focus, and impression management (H2). METHODS Survey and clinical evaluation data of 312 (71.2% female) clerks were collected and grouped into 215 ethnic majority (69.4%) and 95 ethnic minority students (30.6%). Students’ social learning experiences were measured as: perceptions of unfair treatment, trust in supervisors, and social academic fit. Self-regulatory focus (general and work specific) and impression management were also measured. A parallel mediation model (H1) and hierarchical multiple regression analyses were used (H2). RESULTS Ethnic minority students had higher perceptions of unfair treatment and lower trust in their supervisors in clinical training. They were more prevention focused in clinical training, but this was not mediated by having more negative social learning experiences. Lower clinical evaluations for ethnic minority students were unexplained. Promotion focus in clinical training and trust in supervisors positively relate to clinical grades. 
CONCLUSION Student ethnicity predicts social learning experiences, self-regulatory focus and grades in clinical training. The hidden curriculum plausibly plays a role here.
... Yet another [4] evaluated the effect of student selected study materials during clerkships on NBME performance. Some studies have evaluated the effect of other factors on student performance during clinical education, such as preceptor quality [5], preceptor evaluations [6][7][8], clerkship order [9], and work hours [9]. To our knowledge, only two studies [10,11] have investigated impact of an online curriculum during clerkships on student performance on National Board of Osteopathic Medical Examiners (NBOME) examinations. ...
Context Many medical schools have a distributed model for clinical clerkship education, challenging our ability to determine student gaps during clinical education. With the graduating class of 2017, A.T. Still University’s School of Osteopathic Medicine in Arizona (ATSU-SOMA) began requiring additional online curricula for all clerkship courses. Objectives To determine whether third year and fourth year students receiving ATSU-SOMA’s online curricula during core clerkships performed better overall on national standardized examinations than students from previous years who had not received the curricula, and whether scores from online coursework correlated with outcomes on standardized examinations as possible early predictors of success. Methods This retrospective cohort study analyzed existing data (demographics and assessments) from ATSU-SOMA classes of 2017–2020 (curriculum group) and 2014–2016 (precurriculum group). The effect of the curriculum on national standardized examinations (Comprehensive Osteopathic Medical Achievement Test [COMAT] and Comprehensive Osteopathic Medical Licensing Examination of the United States [COMLEX-USA]) was estimated using augmented inverse probability weighting (AIPW). Correlations between assignment scores and national standardized examinations were estimated using linear regression models. Results The curriculum group had 405 students with a mean (standard deviation [SD]) age of 25.7 (±3.1) years. Two hundred and fifteen (53.1%) students in the curriculum group were female and 190 (46.9%) were male. The precurriculum group had 308 students (mean ± SD age, 26.4 ± 4.2 years; 157 [51.0%] male; 151 [49.0%] female). The online curriculum group had higher COMAT clinical subject exam scores in obstetrics and gynecology, osteopathic principles and practice (OPP), psychiatry, and surgery (all p≤0.04), as well as higher COMLEX-USA Level 2-Cognitive Evaluation (CE) family medicine and OPP subscores (both p≤0.03). 
The curriculum group had a 9.4 point increase in mean total COMLEX-USA Level 2-CE score (p=0.08). No effect was found for the curriculum overall on COMAT mean or COMLEX-USA Level 2-Performance Evaluation scores (all p≥0.11). Total coursework scores in each core clerkship, excluding pediatrics, were correlated with COMAT mean score (all adjusted p≤0.03). Mean scores for five of the seven assignment types in core clerkships, excluding evidence based medicine types, were positively correlated with COMAT mean scores (all adjusted p≤0.049). All assignment types correlated with COMLEX-USA Level 2-CE total score (all adjusted p≤0.04), except interprofessional education (IPE). Conclusions Results from this study of 713 students from ATSU-SOMA suggested that our online curriculum supplemented clinic based learning during clerkship courses and improved student outcomes on national standardized examinations.
... Practicing mindfulness, this teacher, through self-reflection recognizes their erroneous assumption that all learners going into alternative fields are uninterested in their teaching, and instead of feeling defeated, renews their vigor to teach. Another teacher may enthusiastically teach a learner that looks/acts like them (Riese et al. 2017), but through practiced reflection recognizes that they had also been judging that learner differently. ...
Medical educators’ stressors continue to increase, and they increasingly find themselves removed from their learners. This distance is thought to contribute to the disenchantment many educators feel. The challenge for educators is to reengage with their learners and restore their satisfaction in teaching. Mindful teaching can help educators meet this challenge. Mindful teaching is not an instructional technique; rather, it is a way of being that the teacher embodies. Mindful teachers practice awareness, acceptance and curiosity. They recognize the needs of their learners, engaging with learners who are ‘at the ready’; encouraging those who might not be engaged; and advocating for those who need support. These educators are less susceptible to burnout and help learners develop their own mindfulness. The Tips noted in this article can help educators make deeper connections with their learners, garner greater sense of personal accomplishment and become invigorated by their learners’ achievements.
... This is particularly true for the surgery clerkship, where grade inflation is common and there is no clear consensus about what constitutes an "honors"-level performance [12]. Furthermore, there have been a number of recent studies demonstrating that clinical evaluations, which often are an important component of clerkship grades, may be subject to implicit biases and may disproportionately disadvantage traditionally underrepresented groups in medicine [13][14][15]. Given concerns about the objectivity of clinical grades, some institutions have begun advocating for pass/fail clerkship evaluations [16]. ...
Purpose of review: In light of the announcement that the United States Medical Licensing Examination Step 1 exam will transition to pass/fail reporting, we reviewed recent literature on evaluating residency applicants with a focus on identifying objective measurements of applicant potential. Recent findings: References from attending urologists, Step 1 scores, overall academic performance, and research publications are among the most important criteria used to assess applicants. There has been a substantial increase in the average number of applications submitted per applicant, with both applicants and residency directors indicating support for a cap on the number of applications that may be submitted. Additionally, there are increasing efforts to promote diversity with the goal of improving care and representation in urology. Despite progress in standardizing interview protocols, inappropriate questioning remains an issue. Opportunities to improve residency application include promoting diversity, enforcing prohibitions of illegal practices, limiting application numbers, and finding more transparent and equitable screening measures to replace Step 1.
FULL TEXT: https://authors.elsevier.com/c/1efK~6tEErZmP6 OBJECTIVE The purpose of this study was to determine the strength of the association between medical school ranking and orthopedic surgery residency ranking using the current cohort of orthopedic surgery residents. DESIGN We obtained a list of accredited programs from Doximity for orthopedic surgery residency programs and U.S. News & World Report for medical schools. Each orthopedic surgery residency program webpage was evaluated for the presence of an orthopedic surgery residency roster. For each resident, the medical school attended, allopathic or osteopathic degree, and year of post-graduate training was recorded. Orthopedic surgery residency programs and medical schools were assigned to one of four tiers for each based on their respective ranking. Descriptive statistics, Chi squared tests and Pearson residuals were used to analyze the association of orthopedic surgery residency tier and medical school tier. Post-hoc pairwise comparisons were performed utilizing the Bonferroni correction to account for 16 tests, correcting the significance level to p = 0.003. SETTING 187 orthopedic surgery residency program webpages. PARTICIPANTS 4123 orthopedic surgery residents. RESULTS There was a significant association between medical school tier and orthopedic surgery residency tier (X² [9] = 1214.78, p < 0.001). The post-hoc residual values were statistically significant for 75% (12/16) of tests performed. The majority of Tier 1 orthopedic surgery residents 50.5% (800/1585) attended a Tier 1 medical school. The strongest positive association exists between Tier 1 medical students attending Tier 1 residencies (residual = 23.978, p < 0.001). The strongest negative association with Tier 4 residencies was with Tier 1 medical schools (residual= -15.656, p< 0.001). 
CONCLUSIONS Medical school ranking is an important consideration for prospective orthopedic surgery applicants and may become more important with less objective measures of academic performance such as United States Medical Licensing Examination Step 1. Level of Evidence Observational
Introduction: Several factors are known to affect the way clinical performance evaluations (CPEs) of medical students are completed by supervising physicians. We sought to explore the effect of faculty-perceived “level of interaction” (LOI) on these evaluations. Methods: Our third-year CPE requires evaluators to identify perceived LOI with each student as low, moderate, or high. We examined CPEs completed during the academic year 2018–2019 for differences in (1) clinical and professionalism ratings, (2) quality of narrative comments, (3) quantity of narrative comments, and (4) percentage of evaluation questions left unrated. Results: A total of 3682 CPEs were included in the analysis. ANOVA revealed statistically significant differences between LOI and clinical ratings (p ≤ .001), with mean ratings from faculty with a high LOI significantly higher than from faculty with a moderate or low LOI (p ≤ .001). Chi-squared analysis demonstrated differences based on faculty LOI and whether questions were left unrated (p ≤ .001), quantity of narrative comments (p ≤ .001), and specificity of narrative comments (p ≤ .001). Conclusions: Faculty who perceive higher LOI were more likely to assign that student higher ratings, complete more of the clinical evaluation and were more likely to provide narrative feedback with more specific, higher-quality comments.
Purpose: Efforts to address inequities in medical education are centered on a dialogue of deficits that highlight negative underrepresented in medicine (UIM) learner experiences and lower performance outcomes. An alternative narrative explores perspectives on achievement and equity in assessment. This study sought to understand UIM learner perceptions of successes and equitable assessment practices. Method: Using narrative research, investigators selected a purposeful sample of self-identified UIM fourth-year medical students and senior-level residents and conducted semi-structured interviews. Questions elicited personal stories of achievement during clinical training, clinical assessment practices that captured achievement, and equity in clinical assessment. Using re-storying and thematic analysis, investigators coded transcripts and synthesized data into themes and representative stories. Results: Twenty UIM learners (6 medical students and 14 residents) were interviewed. Learners often thought about equity during clinical training and provided personal definitions of equity in assessment. Learners shared stories that reflected their achievements in patient care, favorable assessment outcomes, and growth throughout clinical training. Sound assessments that captured achievements included frequent observations with real-time feedback on pre-defined expectations by supportive, longitudinal clinical supervisors. Finally, equitable assessment systems were characterized as sound assessment systems that also avoided comparison to peers, used narrative assessment, assessed patient care and growth, trained supervisors to avoid bias, and acknowledged learner identity. Conclusions: UIM learners characterized equitable and sound assessment systems that captured achievements during clinical training. These findings guide future efforts to create an inclusive, fair, and equitable clinical assessment experience.
Objectives: To explore the sources of variability in evaluator ratings among third year medical students in the Internal Medicine clinical rotation. Also, to examine systematic effects and variability introduced by differences in the various student, evaluator, and evaluation settings. Methods: A multilevel model was used to estimate the amount of between-student, between-rater and rater-student interaction variability present in the students' clinical evaluations in a third year internal medicine clinical rotation. Within this model, linear regression analysis was used to estimate the effect of variables on the students' numerical evaluation scores and the reliability of those scores. Results: A total of 2,747 evaluation surveys were collected from 389 evaluators on 373 students over 4.5 years. All surveys used a nine-point grading scale, and therefore all results are reported on this scale. The calculated between-rater, between-student and rater-student interaction variance components were 0.50, 0.27 and 0.62, respectively. African American/Black students had lower scores than Caucasian students by 0.58 points (t=-3.28; P=0.001). No gender effects were noted. Conclusions: These between-rater and between-student variance components imply that the evaluator plays a larger role in the students' scores than the students themselves. The residual rater-student interaction variance was larger and did not change by accounting for the measured demographic variables. This implies there is significant variability in each rater-student interaction that remains unexplained. This could contribute to unreliability in the system, requiring that students receive between 8 and 17 clinical evaluations to achieve 80% reliability.
There is ample evidence today in the stereotype threat literature that women and girls are influenced by gender-stereotyped expectations on standardized math tests. Despite its high relevance to education, this phenomenon has not received much attention in school settings. The present studies offer the 1st evidence to date indicating that middle school girls exhibit a performance deficit in quasi-ordinary classroom circumstances when they are simply led to believe that the task at hand measures mathematical skills. This deficit occurred in girls working alone or in mixed-gender groups (i.e., presence of regular classmates) but not in same-gender groups (i.e., presence of only same-gender classmates). Compared with the mixed-gender groups, the same-gender groups were also associated for girls in the stereotype threat condition with greater accessibility of positive role models (i.e., female classmates who excel in math), at the expense of both stereotypic in-group and out-group members (i.e., low-math-achievement girls and high-math-achievement boys). Finally, the greater accessibility of positive role models mediated the impact of the activated stereotype on girls' performance, exactly as one would expect from C. M. Steele's (1997) stereotype threat theory. Taken together, these findings clearly show that reducing stereotype threat in the classroom is a crucial challenge for both scientists and teachers.
We extend the analysis of early-emerging gender differences in academic achievement to include both (objective) test scores and (subjective) teacher assessments. Using data from the 1998-99 ECLS-K cohort, we show that the grades awarded by teachers are not aligned with test scores, with the disparities in grading exceeding those in testing outcomes and uniformly favoring girls, and that the misalignment of grades and test scores can be linked to gender differences in non-cognitive development. Girls in every racial category outperform boys on reading tests and the differences are statistically significant in every case except for black fifth-graders. Boys score at least as well on math and science tests as girls, with the strongest evidence of a gender gap appearing among whites. However, boys in all racial categories across all subject areas are not represented in grade distributions where their test scores would predict. Even those boys who perform equally as well as girls on reading, math and science tests are nevertheless graded less favorably by their teachers, but this less favorable treatment essentially vanishes when non-cognitive skills are taken into account. White boys who perform on par with white girls on these subject-area tests and exhibit the same non-cognitive skill level are graded similarly. For some specifications there is evidence of a grade "bonus" for white boys with test scores and behavior like their girl counterparts. While the evidence is a little weaker for blacks and Hispanics, the message is essentially the same.
Article
We estimate the effect of primary school teachers' gender biases on boys' and girls' academic achievements during middle and high school and on the choice of advanced level courses in math and sciences during high school in Tel-Aviv, Israel. We measure bias using class-gender differences in scores between school exams graded by teachers and national exams graded blindly by external examiners. For identification, we rely on the random assignment of teachers and students to classes in primary schools. Our results suggest that assignment to a teacher with a greater bias in favor of girls (boys) has positive effects on girls' (boys') achievements. Such gender biases also have a positive impact on girls' (boys') enrollment in advanced level math courses in high school. These results suggest that teachers' biased behavior at early stages of schooling has long-run implications for occupational choices and earnings in adulthood, because enrollment in advanced courses in math and science in high school is a prerequisite for post-secondary schooling in engineering, computer science and so on.
Article
A prominent class of explanations for the gender gaps in student outcomes focuses on the interactions between students and teachers. In this study, I examine whether assignment to a same-gender teacher influences student achievement, teacher perceptions of student performance, and student engagement. This study's identification strategy exploits a unique matched-pairs feature of a major longitudinal study, which provides contemporaneous data on student outcomes in two different subjects. Within-student comparisons indicate that assignment to a same-gender teacher significantly improves the achievement of both girls and boys as well as teacher perceptions of student performance and student engagement with the teacher's subject. © 2007 by the Board of Regents of the University of Wisconsin System.
Article
Girls presently outperform boys in overall academic success. Corresponding gender stereotypes portray male students as lazy and troublesome and female students as diligent and compliant. The present study investigated whether these stereotypes impact teachers’ perceptions of students and whether students’ visible enactment of their gender at school (behaving in a very masculine or feminine way) increases the impact of these stereotypes on teachers’ perceptions of students. We hypothesized that teachers would ascribe more behavior that impedes learning and less behavior that fosters learning to male students who enact masculinity as compared with male students who show gender-neutral behavior and female students. Three pilot studies (N = 104; N = 82; N = 86) yielded pretested material for a randomized vignette study of N = 104 teachers. The teachers read one randomly assigned vignette describing a male (or female) student enacting his (or her) gender (or not) and rated how likely this student would be to display behaviors that impede or foster learning in a 2 (between: target students’ gender) × 2 (between: gender enactment [yes/no]) × 2 (between: teachers’ gender) × 2 (within: ascribed behavior) factorial design. As expected, male students enacting masculinity were rated as showing the lowest amount of academic engagement. Results are discussed with regard to the current debate on the causes of boys’ lower academic success.
Article
Studies of the evaluation of medical students' clinical performance frequently do not differentiate between ratings by house officer and attending staff evaluators. This practice is not appropriate, since research investigations have shown that house officers rate medical students' clinical performance higher and have higher interrater agreement than do attending staff. This investigation studies one aspect of the validity of medical students' clinical performance ratings and demonstrates that there are higher correlations between house officer ratings of student knowledge and student cognitive ability scores than there are between attending staff evaluations and student ability scores.
Article
Student's temperament plays a significant role in teacher's perception of the student's learning style, educational competence (EC), and teachability. Hence, temperament contributes to student's academic achievement and teacher's subjective ratings of school grades. However, little is known about the effect of gender and teacher's age on this association. We examined the effect of teacher's and student's gender and teacher's age on teacher-perceived temperament, EC, and teachability, and whether there is a significant same-gender or different-gender association between teachers and students in this relationship. The participants were a population-based sample of 3,212 Finnish adolescents (M = 15.1 years) and 221 subject teachers. Temperament was assessed with the Temperament Assessment Battery for Children - Revised and Revised Dimensions of Temperament Survey batteries and EC with three subscales covering Cognitive ability, Motivation, and Maturity. Data were analyzed with multi-level modelling. Teachers perceived boys' temperament and EC more negatively than girls'. However, the differences between boys and girls were not as large when perceived by male teachers as they were when perceived by female teachers. Males perceived boys more positively and more capable in EC and teachability than females. They were also stricter regarding their perceptions of girls' traits. With increasing age, males perceived boys' inhibition as higher and mood lower. Generally, the older the teacher, the more mature he/she perceived the student. Teachers' ratings varied systematically by their gender and age, and by students' gender. This bias may have an effect on school grades and needs to be taken into consideration in teacher education.
Article
A group of 156 first year medical students completed measures of emotional intelligence (EI) and physician empathy, and a scale assessing their feelings about a communications skills course component. Females scored significantly higher than males on EI. Exam performance in the autumn term on a course component (Health and Society) covering general issues in medicine was positively and significantly related to EI score but there was no association between EI and exam performance later in the year. High EI students reported more positive feelings about the communication skills exercise. Females scored higher than males on the Health and Society component in autumn, spring and summer exams. Structural equation modelling showed direct effects of gender and EI on autumn term exam performance, but no direct effects other than previous exam performance on spring and summer term performance. EI also partially mediated the effect of gender on autumn term exam performance. These findings provide limited evidence for a link between EI and academic performance for this student group. More extensive work on associations between EI, academic success and adjustment throughout medical training would clearly be of interest.
Article
Multiple studies examining the relationship between physician gender and performance on examinations have found consistent significant gender differences, but relatively little information is available related to any gender effect on interviewing and written communication skills. The United States Medical Licensing Examination (USMLE®) Step 2 Clinical Skills® (CS®) examination is a multi-station examination where examinees (physicians in training) interact with, and are rated by, standardized patients (SPs) portraying cases in an ambulatory setting. Data from a recent complete year (2009) were analyzed via a series of hierarchical linear models to examine the impact of examinee gender on performance on the data gathering (DG) and patient note (PN) components of this examination. Results from both components show that not only do women have higher scores on average, but women continue to perform significantly better than men when other examinee and case variables are taken into account. Generally, the effect sizes are moderate, reflecting an approximately 2% score advantage by encounter. The advantage for female examinees increased for encounters that did not require a physical examination (for the DG component only) and for encounters that involved a Women's Health issue (for both components). The gender of the SP did not have an impact on the examinee gender effect for DG, indicating a desirable lack of interaction between examinee and SP gender. The implications of the findings, especially with respect to the validity of the use of the examination outcomes, are discussed.