Sexism, racism, prejudice, and bias: a literature review and synthesis of research surrounding student evaluations of courses and teaching
Troy Heffernan
La Trobe University, Melbourne, Australia
Abstract

This paper analyses the current research regarding student evaluations of courses and teaching. The article argues that student evaluations are influenced by racist, sexist and homophobic prejudices, and are biased against particular disciplines and subject areas. This paper’s findings are relevant to policymakers and academics, as student evaluations are undertaken in over 16,000 higher education institutions at the end of each teaching period. The article’s purpose is to demonstrate to the higher education sector that the data produced by student surveys is flawed and prejudiced against those being assessed. Evaluations have been shown to be heavily influenced by student demographics, the teaching academic’s culture and identity, and other aspects not associated with course quality or teaching effectiveness. Evaluations also include increasingly abusive comments, which are mostly directed towards women and those from marginalised groups, and subsequently make student surveys a growing cause of stress and anxiety for these academics. Yet student evaluations are used as a measure of performance and play a role in hiring, firing and promotional decisions. Student evaluations are openly prejudiced against the sector’s most underrepresented academics, and they contribute to further marginalising the same groups universities declare they protect and value, and whose representation they aim to increase in their workforces.

Keywords: student evaluations; academic equity; wellbeing; prejudice; abuse; higher education
Introduction
This paper is part of a research project that investigates questions concerning student evaluations of courses and teaching (SETs). The project examines the extent to which SET results provide objective assessments of the course and the academic teaching it, how biased the information may be, and which groups are potentially disadvantaged by these processes. It also investigates what adjustments can be made to create a more equitable system. Ample evidence indicates that course evaluations (that is, evaluations of a course in terms of its content and outcomes) and teaching evaluations (surveys dedicated to evaluating the teaching academic’s performance) are both strongly correlated with the teaching academic’s demographics and other issues unrelated to the course or the academic’s performance (Boring, Ottoboni, and Stark 2016; Uttl and Smibert 2017). For this reason, course and teaching evaluations are treated as one within this paper (unless otherwise stated).
This paper’s collation and analysis of existing SET research makes several points clear: SETs are significantly biased by the demographics of the students completing them and by prejudice against the academic teaching the course; they vary with subject area; and they are affected by myriad other factors not connected to the teacher or course. Yet despite these clear prejudices and biases, SETs are used to gauge teaching quality and are a component in judging who is hired, who is let go and who is promoted. In addition to these discriminatory practices, the trend of abusive comments in SETs is increasing (Tucker 2014). These prejudices in SET results, and how the results are used, are leading to growing academic mental health and wellbeing issues that universities cannot ignore (Jordan 2011; Fan et al. 2019).
Thus, this paper provides much-needed synthesis and analysis of the existing research for the benefit of academics in every field and discipline who are subjected to these practices. Cunningham-Nelson, Baktashmotlagh, and Boles (2019) have built upon the International Association of Universities’ ‘World List of Universities’ (2006) and estimated that, globally, the teaching staff of over 16,000 higher education institutions collect SETs at the end of each teaching period. The paper makes clear that while SETs are being used as an aid in gauging performance, women and marginalised groups are losing jobs, being promoted more slowly and/or less often, and are being negatively impacted at career progression junctures within the academy (Uttl and Smibert 2017). This paper aims to inform higher education institutions, policymakers and administrators of the sexist, racist, homophobic and other biases that underpin SET data, as the evidence demonstrates that institutions are complicit in prejudicial practices associated with SET data.
Methods
This paper analyses the literature and evident themes around SETs and the prejudices and biases that influence their results. The paper follows the methodology of a systematic analysis to provide a transparent account of how data were gathered, and to inform the audience of the protocols used to survey the literature and themes from a breadth of sources (Macpherson and Holt 2007; McCrae and Purssell 2020).
Data collection
This paper’s literature review method used the following initial criteria. The search began by seeking out research published between 1990 and 2020. The period was selected to incorporate the growth of SET surveys, including how they are used today, how technology has changed how surveys are conducted, and how technology has contributed to new data analysis methods. The title, abstract and keyword search terms included all versions of ‘student evaluations’, ‘student evaluations of teaching’, ‘SETs’, ‘SECTs’, ‘course evaluations’ and ‘teacher evaluations’ in English-language peer-reviewed articles and books from standard institutional databases including EBSCO, ProQuest and Web of Science.
The advantage of these search terms is that research surrounding these topics is not limited to publications explicitly dedicated to one field, in this case education researchers or education-based journals and books (Pittaway et al. 2004). As student evaluations impact on every university discipline, research has been conducted in essentially all disciplinary areas. These findings sometimes appear in journals relating specifically to the subject area (such as medicine or engineering) rather than journals focusing specifically on teaching that subject. These search terms thus surfaced literature beyond the usual scope of education journals and higher education researchers concerned with teaching practices.
The search resulted in 293 publications. The list was then refined to remove papers addressing student surveys not connected to SETs (86), duplicate results (55), book reviews (9), media articles (4) and non-English-language publications (3). This left 136 publications meeting the criteria. Analysis of these 136 articles then led to the identification of 47 further articles that were not returned in the initial search. In most cases, these papers did not appear in the initial search because they were not publications dedicated to SETs; rather, their discussion of student surveys was carried out to inform audiences concerned with course design or professional development. However, as they contributed to current scholarly thought around SETs and their impact, they were included in the following analysis.
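The screening arithmetic reported above can be checked with a short script. The sketch below is illustrative only: the exclusion labels are paraphrased, and only the counts come from this review (293 records found; 86, 55, 9, 4 and 3 excluded; 47 added through citation chaining), which implies a final corpus of 183 publications.

```python
# Illustrative reconstruction of the review's screening counts (Methods).
# The exclusion labels are paraphrased; only the numbers come from the paper.

EXCLUSIONS = {
    "student surveys not connected to SETs": 86,
    "duplicate results": 55,
    "book reviews": 9,
    "media articles": 4,
    "non-English language publications": 3,
}

initial = 293
after_screening = initial - sum(EXCLUSIONS.values())
assert after_screening == 136  # matches the 136 publications reported

citation_chained = 47  # relevant papers identified via the 136 articles
final_corpus = after_screening + citation_chained

print(f"screened in: {after_screening}, final corpus: {final_corpus}")  # 136, 183
```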
Data analysis
After an initial reading of the final collection of material, themes were generated using Braun and Clarke’s (2006) method of thematic analysis. This analytic method provides a system by which patterns of meaning can be generated from qualitative data; in this case, themes surrounding student evaluations of courses and teaching (Clarke and Braun 2017). This process identified 58 individual themes, which after further analysis were categorised into 35 sub-themes, before finally being refined into the five main themes discussed in this paper.
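As a rough illustration of the consolidation step, the sketch below groups initial codes under sub-themes and main themes via explicit mappings. The theme names are invented for demonstration; the paper reports only the counts (58 themes, 35 sub-themes, 5 main themes).

```python
# A minimal sketch of consolidating themes in a Braun and Clarke style
# thematic analysis. Theme names are hypothetical; the paper reports only
# that 58 themes were grouped into 35 sub-themes and then 5 main themes.

from collections import defaultdict

code_to_subtheme = {
    "male students favour male staff": "gender of academic",
    "women receive lower scores": "gender of academic",
    "quantitative subjects score lower": "subject area",
    "grade expectations inflate scores": "student demographics",
}
subtheme_to_main = {
    "gender of academic": "academic demographics",
    "subject area": "discipline effects",
    "student demographics": "student demographics",
}

def consolidate(codes: list[str]) -> dict[str, set[str]]:
    """Group initial codes under main themes via the two mappings."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for code in codes:
        grouped[subtheme_to_main[code_to_subtheme[code]]].add(code)
    return dict(grouped)

print(consolidate(list(code_to_subtheme)))
```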
Findings
Evaluations technically work
A starting point of many publications examining SETs is an acknowledgement that evaluations work in the sense that they provide the university with data relating to course design, delivery of the course and teaching staff performance. The inherent problem with SETs, however, is that this data disguises the prejudices and biases underpinning what is gathered and, subsequently, the results produced (Marsh 2007; Osoian et al. 2010; Stark and Freishtat 2014).
That SETs provide data that appears sound is arguably why institutions believe that evaluations are a measure of effectiveness (Osoian et al. 2010; Tucker 2014) and therefore play a role in academic hiring and promotions (Arthur 2009; Shah and Nair 2012; Boring, Ottoboni, and Stark 2016). At the same time, it has been noted that SET results have been used to aid in firing staff deemed unproductive, or in guiding staffing decisions during times of restructure (Jones, Gaffney-Rhys, and Jones 2014; Uttl and Smibert 2017).
Apparent data quality is also what leads institutions to use SETs as signifiers of teaching standards. Many institutions expect staff to achieve a certain SET result (e.g. at least 3.75 out of 5, or over 75%) to be seen as fulfilling their duties. Stark and Freishtat (2014) also found that some universities use SETs to intentionally incite continuous cycles of competition amongst academics by setting the acceptable result above the cohort’s average. That is to say, no matter how well the cohort performs, roughly half of the teaching staff will fall below the average and be susceptible to the negative repercussions of not meeting the target, such as decreased promotion chances or leadership opportunities.
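Stark and Freishtat’s point about above-average targets is purely arithmetic, and a small simulation makes it concrete. The scores below are simulated, assuming for illustration a uniformly strong cohort; they are not real SET data.

```python
# Simulation of an "above the cohort average" SET target: even in a cohort
# where everyone teaches well, a large share of staff must miss the target.

import numpy as np

rng = np.random.default_rng(0)

# Simulated cohort of 200 staff, all scoring close to 4.5 out of 5.
scores = np.clip(rng.normal(loc=4.5, scale=0.3, size=200), 1.0, 5.0)

target = scores.mean()            # institutional target: the cohort average
below = (scores < target).mean()  # share of staff falling below the target

print(f"cohort mean = {target:.2f}; {below:.0%} of staff fall below it")
```

Because the target moves with the cohort, improving everyone’s teaching cannot shrink the below-target group.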
The issue many researchers examining SETs identify is that compiled evaluation results appear to provide a somewhat objective picture of teacher and course success. However, what is rarely considered, or perhaps not seen, by universities and researchers praising SETs are the prejudices and conditions that shape the views that form the data (Marsh 2007; Osoian et al. 2010; Stark and Freishtat 2014).
Impact of students’ cultural and demographic backgrounds
The review found that students’ backgrounds and demographics in terms of gender, age, disciplinary area and study type can all impact on an academic’s SET results. That student demographics alone impact on SET results demonstrates just how flawed the system of evaluations is, and to what extent results are determined by factors not related to course content or teaching quality (Rosen 2018).
Tucker’s (2014) study of 43,000 course evaluations found that across every discipline and course evaluated, and regardless of teaching staff, women students submitted SETs that were overall more favourable by two per cent. However, the average positive SET score increased by as much as six per cent when surveys were completed by international students, students from older age groups, external students and students with higher grade point averages (GPAs). That student demographics alone can make a difference of between two and six per cent in SET results is a significant finding. It must be reiterated, however, that this is just one of the many biases and prejudices that can accumulate to disadvantage some groups far more than others.
Variance in SET results across students’ disciplinary areas has also been noted. Beran and Violato’s (2005) study of 370,000 evaluations and Centra’s (2009) study of 238,000 evaluations found that, across multiple universities and countries, academics teaching science-based subjects receive lower results than those teaching social science/humanities-based subjects. Uttl and Smibert’s (2017) study of 325,000 SETs divided subject areas into quantitative and qualitative subjects: those assessed on right/wrong answers (such as correct calculations and formulas) and those assessed by marker judgement (such as essays). They found that those teaching quantitative subjects were 3.3 times more likely to score in a lower evaluation bracket than those being evaluated by students in qualitative subjects. Those teaching quantitative subjects were also 1.88 times more likely to fail their evaluations. With quantitative subjects continually being found to be evaluated lower by students, Uttl and Smibert (2017, p. 8) concluded that:
Professors who teach quantitative vs. non-quantitative classes are not only likely to receive
lower [SETs], but they are also at a substantially higher risk of being labelled unsatisfactory in
teaching, and thus, more likely to be fired, not re-appointed, not promoted, not tenured, and
denied merit pay.
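Figures of the kind Uttl and Smibert report (‘3.3 times more likely’) are risk ratios, which can be computed from a 2×2 table. The counts below are invented to reproduce a ratio of that size; they are not the study’s data.

```python
# A risk ratio from a 2x2 table: instructors landing in the lowest SET
# bracket, by subject type. Counts are hypothetical, chosen to yield 3.3.

def risk_ratio(events_a: int, total_a: int, events_b: int, total_b: int) -> float:
    """Risk of the event in group A relative to group B."""
    return (events_a / total_a) / (events_b / total_b)

quant_low, quant_total = 330, 1000  # quantitative-subject instructors
qual_low, qual_total = 100, 1000    # qualitative-subject instructors

rr = risk_ratio(quant_low, quant_total, qual_low, qual_total)
print(f"risk ratio = {rr:.1f}")  # -> 3.3
```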
Students are also influenced by factors not related directly to the course or teacher. Several studies have found high correlations between students’ grade expectations and the SET scores they deliver. These studies have examined expectations based on GPAs and mid-term results from institutions across several countries. Repeatedly, the findings are that students who are graded higher, or who expect to gain a high grade if the SET is completed before results are released, provide higher-scoring evaluations (Worthington 2002; Short et al. 2008; Stark and Freishtat 2014; Boring, Ottoboni, and Stark 2016). These findings also raise pedagogical concerns, as they give academics a motive to set easier assessments, or perhaps to grade more leniently, to facilitate better SET results. Considering SET results are used to aid in hiring, firing and promotional decisions, it cannot be ignored that academics may be motivated to alter their assessment processes given that their livelihoods are at risk (Carter and Lara 2016; Bachan 2017).
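The grade-expectation effect is a correlation between two observed quantities. The sketch below simulates it under an assumed linear relationship; the cited studies report the correlation itself, not this generating process.

```python
# Toy simulation of the grade-expectation effect: students who expect
# higher marks return higher SET scores. All data here are simulated.

import numpy as np

rng = np.random.default_rng(1)

n = 500
expected_grade = rng.uniform(50, 95, size=n)  # expected mark (%)
set_score = np.clip(
    2.0 + 0.03 * expected_grade + rng.normal(0, 0.5, size=n), 1.0, 5.0
)

r = np.corrcoef(expected_grade, set_score)[0, 1]
print(f"correlation between expected grade and SET score: r = {r:.2f}")
```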
Studies have also found that SET results are driven by student biases irrelevant to course content and effectiveness. Benton and Cashin (2012) found class size to be a major factor. Issues including classroom design, cleanliness of the university, quality of course websites, library services, food options available on campus, and difficulty in the admissions process (for first-year students) have all been found to play a larger role in influencing SET results than teaching quality or course design (Osoian et al. 2010).
Academic gender, ethnicity, sexual identity, and other demographics
Arguments have also been made that an academic’s gender, ethnicity, language, perceived sexual identity, age or visible disabilities impact on student evaluations (Valencia 2020). Such is the bias surrounding gender and perceptions of ethnicity, sexuality, age and disability that Ambady and Rosenthal (1993) found that student reactions to a 30-second silent video of their teacher, played at the start of the semester, correlated with the SET results the academic received. More recently, Boring, Ottoboni, and Stark’s (2016) study of 23,000 SETs, and Fan et al.’s (2019) study of 22,000 SETs, found that male students express a significant bias in favour of male academics.
Numerous studies have also found statistically significant effects of gender on academic evaluations. MacNell, Driscoll, and Hunt’s (2015) work determined that women academics consistently receive lower scores relating to course design, clarity of assessment, class engagement, turnaround time of essays, and question response times, regardless of their performance. Boring, Ottoboni, and Stark (2016) also found statistically significant examples of student expectations being amplified by the academic’s gender. They found that not only did SET results correlate highly with grade expectations; when grade expectations were met, male academics were rewarded with higher scores, and when grade expectations were not met, the impact on evaluation scores was smaller for male academics. To explore the extremes of gender prejudice in SETs, MacNell, Driscoll, and Hunt (2015) conducted a study that found that online classes led by male avatars (regardless of the academic’s actual gender) received higher SET scores than those led by women avatars. Thus, merely perceiving the teaching academic to be a woman led students to deliver lower scores.
A consistent theme within these studies is that gender, and even perceived gender, makes a difference to SET scores, and that the difference is highly prejudiced against women. In Boring, Ottoboni, and Stark’s (2016) summary of their study’s findings and those of the existing studies they analysed, they declared that SETs are ‘biased against female instructors by an amount that is large and statistically significant’ (p. 1).
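Boring, Ottoboni, and Stark reached this conclusion using nonparametric permutation tests. The sketch below shows the logic of such a test on simulated scores: if gender were irrelevant, relabelling instructors at random should often produce gaps as large as the observed one. It illustrates the method only and is not their analysis code.

```python
# Two-sample permutation test for a gender gap in mean SET scores,
# in the spirit of Boring, Ottoboni, and Stark (2016). Simulated data.

import numpy as np

rng = np.random.default_rng(2)

men = rng.normal(4.2, 0.4, size=120)    # simulated mean SET scores
women = rng.normal(4.0, 0.4, size=120)  # a small simulated penalty

observed = men.mean() - women.mean()
pooled = np.concatenate([men, women])

exceed = 0
n_perm = 10_000
for _ in range(n_perm):
    rng.shuffle(pooled)  # relabel instructors at random
    if pooled[:120].mean() - pooled[120:].mean() >= observed:
        exceed += 1

p_value = (exceed + 1) / (n_perm + 1)
print(f"observed gap = {observed:.2f}, one-sided p = {p_value:.4f}")
```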
Many of these studies find that these biases and prejudices result in large and significant variations in SET results, which raises the question of what these differences look like in practice. Fan et al.’s (2019) study of 22,000 SET results concluded that, at the extreme, women academics receive SET scores 37 percentage points lower than male academics. This figure comes from science subjects with high numbers of male students that are led by younger women academics (approximately under 35 years old). Boring, Ottoboni, and Stark’s (2016) study found similar results and concluded that the biases at play were so great that more effective women academics were being placed in lower SET grading brackets than their less effective male counterparts.
In all studies relating to gender, the analyses indicate that the highest scores are awarded in subjects filled with young, white, male students being taught by white, English-first-language-speaking, able-bodied male academics who are neither too young nor too old (approximately 35–50 years of age), and whom the students believe to be heterosexual. Most deviations from this scenario in terms of student and academic demographics equate to lower SET scores. These studies thus highlight that white, able-bodied, heterosexual men of a certain age are not only the least affected; they benefit from the practice. When every demographic group that does not fit this image is significantly disadvantaged by SETs, these processes serve to further enhance the position of the already privileged.
Prejudice in SETs stemming from the academic’s ethnicity is also a common finding in studies concerning evaluations and biases, as are issues around age, disability, sexual identity and appearance (Andersen and Miller 1997; Cramer and Alexitch 2000; Worthington 2002). A primary issue with these studies is that, given the current state of higher education inclusion and diversity, the samples concerning marginalised groups in large-scale quantitative research are often deemed too small to be valid (Hendrix 1998; Rubin 1998; Fan et al. 2019). However, smaller-scale surveys and studies relying on qualitative methods consistently find prejudices against academics of colour (DiPietro and Faye 2005; Hamermesh and Parker 2005).
Significant bias in SETs has been found to impact negatively on academics of colour (Hendrix 1998; Rubin 1998), and on academics whose native language is not that of the university. In most studies this means the English language, though some studies have explored issues of SET prejudice in European non-English-speaking countries and found similar results (DiPietro and Faye 2005; Hamermesh and Parker 2005). Fan et al.’s (2019) study also found that academics from diverse backgrounds or non-English first-language backgrounds faced different levels of prejudice according to subject area; more liberal subjects were less prejudiced, but prejudiced nonetheless. Domestic students are also more likely than international students to provide lower SET scores to academics from these groups (Tucker 2014).
Providing further evidence of the significant prejudice women academics face in the SET process: although all academics from ethnically diverse and marginalised backgrounds receive lower SET scores than their white, English-first-language-speaking colleagues, women from ethnically diverse backgrounds are graded more harshly than men from the same backgrounds. At the extreme, Fan et al. (2019) found that in science faculties, a woman from a non-English-speaking background was half as likely as a white English-speaking male to receive the same SET result.
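The sample-size problem noted above can be made concrete with the standard two-proportion power formula: detecting a gap between groups requires a minimum number of instructors per group, and underrepresented groups rarely reach it. The proportions below are illustrative only, loosely echoing Fan et al.’s ‘half as likely’ finding; this is not their method or data.

```python
# Minimum sample size per group to detect a gap between two proportions
# with a two-sided z-test (standard formula, not from the cited studies).

from scipy.stats import norm

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2) + 1

# e.g. 60% of one group vs 30% of another attaining a given SET result
print(n_per_group(0.60, 0.30))  # ~40 instructors per group
print(n_per_group(0.45, 0.30))  # ~160: halving the gap quadruples the need
```

Smaller gaps, or smaller subgroups, quickly push the required sample beyond what a single faculty employs, which is why large-scale studies so often set these groups aside.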
SET comments
A key component of SETs that the literature promoting their use highlights is their anonymity. Universities argue that anonymity allows students to provide honest feedback without fear of retribution for speaking negatively about the teaching academic, the implication being that students may feel unable to make negative comments if they can be identified (Tucker 2014; Uttl and Smibert 2017). Studies have nonetheless determined that the issue with anonymous comments in SETs is that a portion of the comments are abusive, that the abuse is growing, and that the abuse is mostly directed towards race, gender, sexual identity, ethnicity, age and other marginalising characteristics.
The literature regarding comments in SETs is somewhat limited. Tucker’s (2014) study of 43,000 SETs suggested that only around one per cent of comments were abusive (though she noted sharp increases across studies, and hypothesised that the trend was growing rapidly). However, even if the figure of one per cent is accurate, it is imperative to point out that the one per cent is not distributed evenly across academic demographics. Additionally, to the person receiving the abusive comments, the emotional damage and stress are real, and the overall rate of abusive comments is irrelevant to their stress, anxiety or mental wellbeing (Tucker 2014).
As might be expected considering the clear trends within the findings of this review, a white male who is perceived to be heterosexual and is in the 35–50-year-old age group will, statistically speaking, receive few, if any, abusive comments. Abusive comments are mostly directed towards women and marginalised groups, focus on marginalising characteristics, and are cumulative. For example, women receive abusive comments, and academics of colour receive abusive comments; thus, a woman of colour is more likely to receive abuse because of both her gender and her skin colour (Oliver et al. 2008; Jordan 2011). One way to consider this finding is that some groups are so underrepresented in the sector that they do not constitute a valid sample size in large-scale studies. Yet it is women and academics from these underrepresented groups who receive the majority of the abusive comments, which makes the notion of ‘only one per cent of comments being abusive’ a highly distorted figure.
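The distortion is easy to show arithmetically: an overall rate of one per cent is consistent with the abuse being concentrated almost entirely on a small, underrepresented group. All numbers below are hypothetical; Tucker (2014) reports only the overall rate across roughly 43,000 evaluations.

```python
# How a one per cent overall rate of abusive comments can conceal a far
# higher rate for a small group. Counts are hypothetical.

comments = {
    # group: (total comments received, abusive comments)
    "majority-group men": (40_000, 40),
    "women and marginalised academics": (3_000, 390),
}

total = sum(n for n, _ in comments.values())
abusive = sum(a for _, a in comments.values())
print(f"overall abusive rate: {abusive / total:.1%}")  # -> 1.0%

for group, (n, a) in comments.items():
    print(f"{group}: {a / n:.1%}")  # 0.1% vs 13.0%
```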
The result is that SET comments are a source of anxiety for academics. The process of SETs alone is cause for concern, given that they are used for firing, hiring and promotion purposes (Jones, Gaffney-Rhys, and Jones 2014; Uttl and Smibert 2017). That comments directed at the sector’s women and marginalised groups are then likely to be at best unconstructive and unjustified, and at worst racist, sexist, homophobic or ageist (among other prejudices), is a further cause for concern and mental distress for the academics receiving them (Jordan 2011; Tucker 2014).
Jones, Gaffney-Rhys, and Jones (2014) considered the legal implications of universities continuing to allow SETs to be collected when they are known to be a cause of distress. They discuss defamation and the university’s potential breaches of duty of care, and their conclusions add further reason for SETs to be removed. Where comments are concerned, however, it is likely that if an identifiable student provided racist, sexist, homophobic or other abusive feedback, they would be removed from classes or face other consequences.
What is known from previous SET research?
For all the research and studies conducted around SETs, their conclusions largely point to similar findings. A frequent conclusion across large-scale quantitative and smaller qualitative studies is that SETs rarely measure course or teacher quality or effectiveness, or, if they do, these elements are significantly outweighed in the final results by other factors unrelated to course or teacher quality (Stark and Freishtat 2014; Tucker 2014; Boring, Ottoboni, and Stark 2016; Fan et al. 2019).
As has been discussed, SETs are heavily influenced by student demographics and subject area, but most studies argue that the two greatest influences are the academic’s gender and culture (Stark and Freishtat 2014; Boring, Ottoboni, and Stark 2016; Boring 2017; Fan et al. 2019). It is also critical to note that these researchers, and those of many more studies within the literature review, have suggested that it is impossible to account for these biases, because the variances cannot be predicted or adjusted for in the data from one class or subject to the next, let alone from one university to another.
Many researchers have questioned not only the use of SETs, but also the value of the content they provide. Cunningham-Nelson, Baktashmotlagh, and Boles (2019) make the point that even if SETs were not guided by biases and prejudice, the students completing the evaluation will not benefit from the process: the surveyed group will have moved on before changes can be made, and the next cohort may value different attributes in the course or teacher. Researchers additionally argue that SETs are limited in what they provide because the qualities students place value on are already known (Sunindijo 2016). The existing literature declares that students want engaging lectures, assessment that is explained clearly and graded fairly, assessment returned in a timely manner, and their questions answered promptly (Sunindijo 2016; Park and Dooris 2020). Vivanti, Haron, and Barnes (2014) came to similar conclusions, adding that students want guidance, and teaching staff who are interested in them and in the subject being taught.
Similar arguments also appear in the literature alongside the conclusion that SETs are measuring customer satisfaction (Enache 2011; Osoian et al. 2010). These researchers acknowledge that the marketisation of the higher education system means customer satisfaction plays a valuable role in the modern university because enrolment numbers matter (Marginson 2013). However, these researchers are all highly critical of SETs being customer satisfaction surveys that allege to measure course outcomes and teacher effectiveness when many studies argue that this is untrue. These studies point out that customer satisfaction can be measured via methods other than SETs, and suggest opportunities within universities’ growing ‘student voice’ and ‘students as partners’ initiatives.
Discussion
This paper’s analysis of the existing literature makes it clear that SET results are strongly influenced by external factors unrelated to course content or teacher performance. These factors are frequently based on student demographics, and on students’ biases and prejudices concerning the teaching academic’s gender, sexuality, ethnicity, age or disability, as well as other marginalising factors. Ultimately, this analysis raises the question of how any university can justify the continued use of SETs.
Official university responses to issues regarding SETs are rare. Tucker (2014) sought clarification of why a university continued using SETs when they were known to attract abusive comments. The university’s response was that the rate of abusive comments was too low to alter the existing procedures, considering the (perceived) value of the attained data. This response provides some insight into institutional administrative thinking. The first insight is that universities consider there to be an acceptable level of abuse that staff must endure. A factor here, however, is that Tucker collected and evaluated this information almost a decade ago. Considering the current focus on staff wellbeing and mental health (Henning 2018), it is possible that institutions today might have different views on academics receiving abuse. Universities might also be influenced by the now-established knowledge that abuse is directed heavily towards women and academics from marginalised groups. Thus, institutions continuing to conduct SETs are allowing the sector’s most underrepresented and marginalised groups to be subjected to possible hate speech (Jones, Gaffney-Rhys, and Jones 2014; Uttl and Smibert 2017).
The second issue Tucker’s (2014) research raised (as have other researchers in this literature review) is that universities believe the data they are collecting via SETs provide accurate assessments of course content and outcomes, and of teacher quality and effectiveness. This paper’s findings make it clear why universities might believe this to be the case. Methodologically, the process appears sound: students provide anonymous feedback so as not to be concerned with potential repercussions from academics, and this data informs the faculty and administrators of students’ perspectives concerning the course and teacher (Arthur 2009; Shah and Nair 2012; Jones, Gaffney-Rhys, and Jones 2014; Boring, Ottoboni, and Stark 2016). However, this review’s findings also demonstrate that the methodology surrounding SETs is inherently flawed, because the data being input into the system are influenced by biases and prejudices that are invisible in the results (Marsh 2007; Osoian et al. 2010; Stark and Freishtat 2014). These elements also remain invisible because, unless researchers specifically focus on biases in SETs, there is little reason to suspect that the questions asked are not being answered somewhat objectively across a class of respondents, yielding an overall, somewhat accurate measure. However, this is not the case; this paper has shown that statistically significant biases exist in these data collection methods.
The groups most impacted by these prejudices are clear. This study demonstrates that, at best, SETs disadvantage women and, at worst, see women academics placed in untenable positions (MacNell, Driscoll, and Hunt 2015; Boring, Ottoboni, and Stark 2016). These results have also been theorised as a reason why women are underrepresented in both the professoriate and the upper levels of university leadership (Fan et al. 2019).
The results are worse still when ethnicity, language, or perceptions of sexuality or disability are considered. People from marginalised ethnic backgrounds, and/or who do not speak English as their first language, receive much lower SET results (Fan et al. 2019). Other researchers have examined disability, sexual identity, and cultural and linguistic diversity and found similar prejudice against these groups (Hendrix 1998; Rubin 1998; DiPietro and Faye 2005; Hamermesh and Parker 2005), though it cannot be ignored that researching prejudice against some of these groups is difficult because they are so underrepresented within the sector.
Researchers have also argued for decades that subject areas impact on SET results, with the sciences being evaluated more harshly (Beran and Violato 2005; Centra 2009; Uttl and Smibert 2017). Similarly, that studies have routinely found high correlations between student grades and SET results makes it clear that student evaluations are being influenced by factors far exceeding the intended purpose of SET questions (Worthington 2002; Short et al. 2008; Stark and Freishtat 2014; Boring, Ottoboni, and Stark 2016). Studies have also determined that issues including classroom design, cleanliness of the university, quality of course websites, library services and food options available on campus all have a larger influence on SET results than practices concerning courses and teaching (Felton, Mitchell, and Stinson 2004; Felton et al. 2008; Osoian et al. 2010; Benton and Cashin 2012).
In addition to the above findings, many researchers also argue that SETs are not needed because universities already know what students want from a course and teacher (Vivanti, Haron, and Barnes 2014; Sunindijo 2016; Park and Dooris 2020). These researchers claim that focus groups or student interviews would be a more equitable way of gaining information and judging potential biases and prejudice. What these views make clear is that the ‘student voice’ and ‘students as partners’ initiatives that have grown in recent years are well placed to provide discussions that improve learning opportunities, rather than relying on the current prejudiced practices.
Conclusion
This paper has shown that no university, and indeed the higher education sector as a whole, can claim to be a gender-equal employer, or to have an interest in growing a safe, inclusive and diverse workforce, while continuing to use SETs to evaluate course and teacher quality.

This paper provides an evidence base which can be used as part of the growing material and argument against the practice of collecting SET data. When SET data is known to be highly prejudiced against many groups, methods must be changed, and using SET data as a component of hiring, promotion and/or firing decisions must be seen as the blatantly discriminatory practice that it is.

The need for immediate policy changes is clear. Women and marginalised groups are losing jobs, failing to achieve promotion, and being negatively impacted at every step where SETs are concerned, and they will continue to be discriminated against every time SET data is collected until the practice is stopped. These practices not only harm the sector’s women and its most underrepresented and vulnerable academics; SETs also actively contribute to further marginalising the very groups universities declare to protect and value in their workforces.
ORCID
Troy Heffernan http://orcid.org/0000-0002-8156-622X
References

Ambady, N., and R. Rosenthal. 1993. “Half a Minute: Predicting Teacher Evaluations from Thin Slices of Nonverbal Behavior and Physical Attractiveness.” Journal of Personality and Social Psychology 64 (3): 431–441. doi:10.1037/0022-3514.64.3.431.

Andersen, K., and E. D. Miller. 1997. “Gender and Student Evaluations of Teaching.” PS: Political Science and Politics 30 (2): 216–219. doi:10.1017/S1049096500043407.

Arthur, L. 2009. “From Performativity to Professionalism: Lecturers’ Responses to Student Feedback.” Teaching in Higher Education 14 (4): 441–454. doi:10.1080/13562510903050228.

Bachan, R. 2017. “Grade Inflation in UK Higher Education.” Studies in Higher Education 42 (8): 1580–1600. doi:10.1080/03075079.2015.1019450.

Benton, S., and W. Cashin. 2012. Student Ratings of Teaching: A Summary of Research and Literature. Manhattan, KS: The IDEA Center. http://www.ideaedu.org

Beran, T., and C. Violato. 2005. “Ratings of University Teacher Instruction: How Much Do Student and Course Characteristics Really Matter?” Assessment and Evaluation in Higher Education 30 (6): 593–601. doi:10.1080/02602930500260688.

Boring, A. 2017. “Gender Biases in Student Evaluations of Teaching.” Journal of Public Economics 145: 27–41. doi:10.1016/j.jpubeco.2016.11.006.

Boring, A., K. Ottoboni, and P. Stark. 2016. “Student Evaluations of Teaching (Mostly) Do Not Measure Teaching Effectiveness.” ScienceOpen Research. doi:10.14293/s2199-1006.1.sor-edu.aetbzc.v1.

Braun, V., and V. Clarke. 2006. “Using Thematic Analysis in Psychology.” Qualitative Research in Psychology 3 (2): 77–101. doi:10.1191/1478088706qp063oa.

Carter, M., and P. Lara. 2016. “Grade Inflation in Higher Education: Is the End in Sight?” Academic Questions 29 (3): 346–353. doi:10.1007/s12129-016-9569-5.

Centra, J. 2009. Differences in Responses to the Student Instructional Report: Is It Bias? Princeton, NJ: Educational Testing Service.

Clarke, V., and V. Braun. 2017. “Thematic Analysis.” The Journal of Positive Psychology 12 (3): 297–298. doi:10.1080/17439760.2016.1262613.

Cramer, K., and L. Alexitch. 2000. “Student Evaluations of College Professors: Identifying Sources of Bias.” Canadian Journal of Higher Education 30 (2): 143–164. https://journals.sfu.ca/cjhe/index.php/cjhe/article/view/183360.

Cunningham-Nelson, S., M. Baktashmotlagh, and W. Boles. 2019. “Visualizing Student Opinion through Text Analysis.” IEEE Transactions on Education 62 (4): 305–311. doi:10.1109/TE.2019.2924385.

DiPietro, M., and A. Faye. 2005. “Online Student-Ratings-of-Instruction (SRI) Mechanisms for Maximal Feedback to Instructors.” Paper presented at the 30th Annual Meeting of the Professional and Organizational Development Network, Milwaukee, WI.

Enache, I. 2011. “Customer Behaviour and Student Satisfaction.” Economic Sciences 4 (53): 41–46.

Fan, Y., L. Shepherd, D. Slavich, D. Waters, M. Stone, R. Abel, and E. Johnston. 2019. “Gender and Cultural Bias in Student Evaluations: Why Representation Matters.” PLoS ONE 14 (2): e0209749. doi:10.1371/journal.pone.0209749.

Felton, J., P. Koper, J. Mitchell, and M. Stinson. 2008. “Attractiveness, Easiness, and Other Issues: Student Evaluations of Professors on Rate My Professors.” Assessment and Evaluation in Higher Education 33 (1): 45–61. doi:10.2139/ssrn.918283.

Felton, J., J. Mitchell, and M. Stinson. 2004. “Web-Based Student Evaluations of Professors: The Relations between Perceived Quality, Easiness and Sexiness.” Assessment and Evaluation in Higher Education 29 (1): 91–108. doi:10.1080/0260293032000158180.

Hamermesh, D., and A. Parker. 2005. “Beauty in the Classroom: Instructors’ Pulchritude and Putative Pedagogical Productivity.” Economics of Education Review 24 (4): 369–376. doi:10.1016/j.econedurev.2004.07.013.

Hendrix, K. 1998. “Student Perceptions of the Influence of Race on Professor Credibility.” Journal of Black Studies 28 (6): 738–764. doi:10.1177/002193479802800604.

Henning, M. 2018. Wellbeing in Higher Education: Cultivating a Healthy Lifestyle Among Faculty and Students. Abingdon: Routledge.

International Association of Universities. 2006. World List of Universities, 25th Edition: And Other Institutions of Higher Education. London: Palgrave Macmillan.

Jones, J., R. Gaffney-Rhys, and E. Jones. 2014. “Handle with Care! An Exploration of the Potential Risks Associated with the Publication and Summative Usage of Student Evaluation of Teaching (SET) Results.” Journal of Further and Higher Education 38 (1): 37–56. doi:10.1080/0309877X.2012.699514.

Jordan, D. W. 2011. “Re-Thinking Student Written Comments in Course Evaluations: Text Mining Unstructured Data for Program and Institutional Assessment.” PhD diss., California State University, Stanislaus.

MacNell, L., A. Driscoll, and A. Hunt. 2015. “What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching.” Innovative Higher Education 40 (4): 291–303. doi:10.1007/s10755-014-9313-4.

Macpherson, A., and R. Holt. 2007. “Knowledge, Learning and Small Firm Growth: A Systematic Review of the Evidence.” Research Policy 36 (2): 172–192. doi:10.1016/j.respol.2006.10.001.

Marginson, S. 2013. “The Impossibility of Capitalist Markets in Higher Education.” Journal of Education Policy 28 (3): 353–370. doi:10.1080/02680939.2012.747109.

Marsh, H. 2007. “Students’ Evaluations of University Teaching: Dimensionality, Reliability, Validity, Potential Biases and Usefulness.” In The Scholarship of Teaching and Learning in Higher Education: An Evidence-Based Perspective, edited by R. Perry and J. Smart, 319–383. Dordrecht: Springer.

McCrae, N., and E. Purssell. 2020. How to Perform a Systematic Literature Review: A Guide for Healthcare Researchers, Practitioners and Students. Berlin: Springer.

Oliver, B., B. Tucker, R. Gupta, and S. Yeo. 2008. “eVALUate: An Evaluation Instrument for Measuring Students’ Perceptions of Their Engagement and Learning Outcomes.” Assessment and Evaluation in Higher Education 33 (6): 619–630. doi:10.1080/02602930701773034.

Osoian, C., R. Nistor, M. Zaharie, and H. Flueras. 2010. “Improving Higher Education through Student Satisfaction Surveys.” In Proceedings of the 2nd International Conference on Education Technology and Computer. doi:10.1109/icetc.2010.5529347.

Park, E., and J. Dooris. 2020. “Predicting Student Evaluations of Teaching Using Decision Tree Analysis.” Assessment and Evaluation in Higher Education 45 (5): 776–793. doi:10.1080/02602938.2019.1697798.

Pittaway, L., M. Robertson, K. Munir, D. Denyer, and A. Neely. 2004. “Networking and Innovation: A Systematic Review of the Evidence.” International Journal of Management Reviews 5 (3–4): 137–168. doi:10.1111/j.1460-8545.2004.00101.x.

Rosen, A. 2018. “Correlations, Trends and Potential Biases among Publicly Accessible Web-Based Student Evaluations of Teaching: A Large-Scale Study of RateMyProfessors.com Data.” Assessment and Evaluation in Higher Education 43 (1): 31–44. doi:10.1080/02602938.2016.1276155.

Rubin, D. 1998. “Help! My Professor (or Doctor or Boss) Doesn’t Talk English.” In Readings in Cultural Contexts, edited by J. Martin, T. Nakayama, and L. Flores, 149–159. Mountain View, CA: Mayfield.

Shah, M., and C. Nair. 2012. “The Changing Nature of Teaching and Unit Evaluations in Australian Universities.” Quality Assurance in Education 20 (3): 274–288. doi:10.1108/09684881211240321.

Short, H., R. Boyle, R. Braithwaite, M. Brookes, J. Mustard, and D. Saundage. 2008. “A Comparison of Student Evaluation of Teaching with Student Performance.” In Proceedings of the 6th Australian Conference on Teaching Statistics, edited by H. MacGillivray, 1–10. Melbourne, Victoria, Australia.

Stark, P., and R. Freishtat. 2014. “An Evaluation of Course Evaluations.” ScienceOpen Research. doi:10.14293/s2199-1006.1.sor-edu.aofrqa.v1.

Sunindijo, R. 2016. “Teaching First-Year Construction Management Students: Lessons Learned from Student Satisfaction Surveys.” International Journal of Construction Education and Research 12 (4): 243–254. doi:10.1080/15578771.2015.1121937.

Tucker, B. 2014. “Student Evaluation Surveys: Anonymous Comments That Offend or Are Unprofessional.” Higher Education 68 (3): 347–358. doi:10.1007/s10734-014-9716-2.

Uttl, B., and D. Smibert. 2017. “Student Evaluations of Teaching: Teaching Quantitative Courses Can Be Hazardous to One’s Career.” PeerJ 5: e3299. doi:10.7717/peerj.3299.

Valencia, E. 2020. “Acquiescence, Instructor’s Gender Bias and Validity of Student Evaluation of Teaching.” Assessment and Evaluation in Higher Education 45 (4): 483–495. doi:10.1080/02602938.2019.1666085.

Vivanti, A., N. Haron, and R. Barnes. 2014. “Validation of a Student Satisfaction Survey for Clinical Education Placements in Dietetics.” Journal of Allied Health 43 (2): 65–71.

Worthington, A. 2002. “The Impact of Student Perceptions and Characteristics on Teaching Evaluations: A Case Study in Finance Education.” Assessment and Evaluation in Higher Education 27 (1): 49–64. doi:10.1080/02602930120105054.
... As such, we restricted our search to peer-reviewed literature published between January 1, 2012, and March 10, 2021. We identified a few reviews that included research published in the last 10 years focussing on specific disciplines (Nicolaou & Atkinson, 2019;Schiekirka & Raupach, 2015), but only one touched on bias across disciplines (Heffernan, 2021). A review by Heffernan (2021) focused on broad themes derived from thematic analysis rather than examining and addressing all the biases present in the research literature and did not report the characteristics and results of individual studies, as is the goal of the present review. ...
... We identified a few reviews that included research published in the last 10 years focussing on specific disciplines (Nicolaou & Atkinson, 2019;Schiekirka & Raupach, 2015), but only one touched on bias across disciplines (Heffernan, 2021). A review by Heffernan (2021) focused on broad themes derived from thematic analysis rather than examining and addressing all the biases present in the research literature and did not report the characteristics and results of individual studies, as is the goal of the present review. Thus, the present article provides a more comprehensive review of biases identified in the literature over this period and takes a systematic approach that also considers study reporting quality. ...
... The results of this systematic review indicate that bias can be introduced into SRIs by factors that are unrelated to the course, or the quality of teaching and our overall findings provide additional support for themes identified in a recent literature review (see Heffernan, 2021). The existence of gender bias was the most consistent and prominent finding across the studies we reviewed. ...
Article
Student ratings of instruction (SRI) are commonly used to evaluate courses and teaching in higher education. Much debate about their validity in evaluating teaching exists, which is due to concerns of bias by factors unrelated to teaching quality (Spooren et al., 2013). Our objective was to identify peer-reviewed original research published in English from January 1, 2012, to March 10, 2021, on potential sources of bias in SRIs. Our systematic review of 63 articles demonstrated strong support for the continued existence of gender bias, favoring male instructors and bias against faculty with minority ethnic and cultural backgrounds. These and other biases must be considered when implementing SRIs and reviewing results. Critical practices for reducing bias when using SRIs include implementing bias awareness training and avoiding use of SRIs as a singular measure of teaching quality when making decisions for teaching development or hiring and promotion.
... In university contexts, assessment also cuts both ways, with standard practice including student evaluation of instruction that is often incorporated into hiring and promotion processes. Researchers have underscored myriad problems with student evaluation of university faculty, including discrimination according to race, gender, accented speech, and other factors linked more to a particular instructor than a style or outcome of instructional practice (Heffernan 2021). Such bigotry can target faculty at all ranks, but the cumulative effects of these negative comments can hit precarious instructors hardest of all. ...
Article
Full-text available
This introduction to the dossier on Teaching Women’s Filmmaking contextualizes the contributions among recent work that examines classroom praxis and relevant innovations in film and media studies scholarship, such as videographic criticism and online publication. It considers three key components of building a course—what to study, how to run a classroom, and how to implement assessments—and discusses each area according to critical feminist pedagogies.
... All part of an interconnected ecology, music departments, ensembles, and the studio music class each represent different modes of teacher-student engagement, and possibility for favoritism and prejudice (Peterson et al., 2016;Heffernan, 2022). Everything the director does and says is transmitted and absorbed by students, and the wider community network of parents. ...
Article
Full-text available
Arts and culture are increasingly acknowledged as pillars of society in which all of humanity including people who identify as’ LGBTQIA+ can contribute in 21 st century society. United Nations and individual country initiatives continue to promote the notion of inclusive, egalitarian values that promote equal access and opportunity to chosen careers and passions. Jazz as an artform has evolved as a form of cultural expression, entertainment, and political metaphor, subject to societal and populist pressures that have created both a canon and popularized history. Jazz education has moved from largely informal to almost wholly formal and institutionally designed methods of learning and teaching. The jazz ensemble or stage band remains an enduring secondary education experience for most students learning jazz today. This qualitative study of music directors investigates their approaches, perspectives and concerns regarding attitudes and practices in the teaching profession, the promoting of inclusive practices, access, and equity, amidst a pervasive masculinized performance and social structure that marginalizes non-male participation. The study provides implications for how jazz education may continue to evolve in both attitude and enlightened access in the education of jazz learners.
... [16] The use of course evaluations as data might be criticised, as much research questions their validity, reliability and potential bias and whether they measure course outcomes or instead are glorified customer satisfaction surveys. [89] However, other research has reported a positive relationship between students' experiences of learning and learning outcomes, as reported in course evaluations. [90] It has been recommended that scholars of teaching and learning use student evaluations as adjunct data sources, ensuring critical and reflexive approaches to research are maintained. ...
Article
Full-text available
Objectives: Gamification involves applying game attributes to non-game contexts and its educational use is increasing. It is essential to review the outcomes and the efficacy of gamification to identify evidence to support its use in pharmacy education. This article: systematically and quantitatively reviews and evaluates the alignment of learning outcomes and the quality of peer-reviewed literature reporting gamification in pharmacy education. Key findings: A literature search was undertaken in February 2022 using CINAHL Complete, MEDLINE, Science Direct, Scopus and ERIC databases, via keywords (game* OR gaming OR gamif*) AND pharmac* AND education. Google Scholar was searched using 'gamification of pharmacy education' and 'serious games in pharmacy education'. Data extracted included type of gamified intervention, mode of delivery, game fidelity, intended learning outcomes and outcomes reported. Quality assessments aligned with key aspects of the SQUIRE-EDU Reporting Guidelines. Of 759 abstracts and 95 full-text papers assessed, 66 articles met the inclusion criteria. They described gamification from 12 countries in the education of 8272 pharmacy and health professional students. Gamified interventions ranged from board games to immersive simulations, with escape rooms most frequently reported. Reporting quality was inconsistent, with observed misalignment between intended learning outcomes and outcomes reported, an apparent overreliance on student perceptions as primary data and a lack of reference to reporting guidelines. Summary: Gamification is included in the curricula of many pharmacy degrees, across multiple subject areas. This review identified evidence gaps and reinforces the need for improved quality of gamification research, critical alignment of learning outcomes with evaluation, and use of reporting guidelines.
... Instructors in this study were cisgendered, heterosexual women in tenure-track positions; two instructors identify as White, and one identifies as Puerto Rican/White. Instructors who are members of marginalized racial groups face stigmatization that can compromise their perceived teaching effectiveness (Crittle & Maddox, 2017;Heffernan, 2021). These instructors also spend more time addressing race and racism in their classrooms than their White peers (Prieto, 2018). ...
Article
Background: Teaching students about race and racism is critical to and relevant in psychology classrooms. Objective: We explored whether direct instruction dismantling ideas that race is genetic affects students' race essentialist and other related beliefs. Method: Undergraduate students enrolled in four social psychology courses completed measures of race essentialism and other related beliefs before and after engaging in course-directed activities designed to reduce endorsement of biological essentialist beliefs about race. Results: After class activities, students reported lower levels of general racial essentialist beliefs and estimated that more progress is needed to reduce racial inequality. However, attitudes towards racially minoritized groups or perceived need for anti-racist actions did not shift, and colorblind ideology may have increased. Conclusion: These data provide evidence that essentialism shifts can be accomplished in the psychology classroom, but shifting related beliefs may require additional instruction. Teaching Implications: The class activities described in this research provide a way for instructors to introduce students to a new concept (race essentialism) and change students’ beliefs in the genetic underpinning of race.
... University teachers therefore juggle multiple roles, constructing courses of study including unit design and development and learning outcome mapping, as part of their workload (Lodewijks, 2011). Added to this, in-house teaching evaluations and student evaluations bring additional pressure into the equation, as university teachers become responsible for motivating and engaging adult learners, in an ever-changing student landscape (Lodewijks, 2011;Naidoo-Chetty and du Plessis, 2021;Heffernan, 2022). The progressive nature of these demands is significant (Lodewijks, 2011;de la Fuente et al., 2021). ...
Article
Full-text available
Prompted by the wide-spread impact of the global pandemic on the higher education sector in Australia, this study explores the wellbeing and mental health of university academics who were caught in this altering landscape. This mixed-methods study has three objectives. Firstly, the study involved the design and development of an instrument to measure the wellbeing of university teachers. Secondly, the new instrument was administered to a randomly drawn sample of university academics, in order to validate its use. Thirdly, the study sought to identify possible strategies utilized by participants during times of high pressure, conflict and stress. As an initial validation study, the project involved scale design, generating a tool which measures the wellbeing of university academics, especially during times of crisis. The measurement tool was constructed in four parts drawing on the established formula of academic workload: Teaching, Research, Service/Engagement, with Part 4 seeking out demographic variables for analysis. Findings suggested that most academics were concerned about the maintenance of their research output and teaching workloads. Maintaining responsibilities as care-givers and parents of school-going children proved challenging. Many conceded that maintaining equilibrium was complex. It is anticipated that the scale will be an effective means of quantifying academic wellbeing especially during a crisis, thereby offering a valid instrument to university leaders, when considering staff security and comfort, in the contemporary context.
Article
Student evaluations of curricular experiences and instructors are employed by institutions to obtain feedback and guide improvement. However, to be effective, evaluations must prompt faculty action. Unfortunately, evaluative comments that engender strong reactions may undermine the process by hindering innovation and improvement steps. The literature suggests that faculty interpret evaluation feedback as a judgment not just on their teaching ability but on their personal and professional identity. In this context, critical evaluations, even when constructively worded, can result in disappointment, hurt, and shame. The COVID pandemic has challenged institutions and faculty to repeatedly adapt curricula and educational practices, heightening concerns about faculty burnout. Under these conditions, the risk of 'words that hurt' is higher than ever. This article offers guidance for faculty and institutions to support effective responses to critical feedback and ameliorate the counterproductive effects of learner evaluations.
Article
Equity in education has been conceptualized in various ways, which provides different affordances and constraints for policy, research, and teaching decisions. Regarding the latter, there is a dearth of research on how postsecondary science faculty conceptualize equity and whether and how their understandings of equity may be informing their teaching and related practices. This study examined equity conceptions and reported teaching practices among forty-five faculty members in a College of Sciences at a research-intensive, historically White, public university in the US. Thematic analysis showed that faculty conceptualized equity in three ways, as "equality", "inclusion", or "justice", and these conceptions were associated with instructor-centered or student-centered practices. Professors with "equality" conceptions of equity tended to report teaching mostly via lecture, while those with "inclusion" conceptions reported using active learning and/or inclusive teaching practices. Most professors with a "justice" conception went beyond active learning and inclusive practices to also include an emerging critical pedagogy. Conceptions of equity appear to inform how university science faculty see their roles in advancing equity in their classrooms. These findings provide a foundation for future research that seeks to support college science faculty's understandings of equity issues in higher education and the pedagogical practices necessary to ameliorate them.
Article
This research explores whether student and faculty ethnic similarity produces more favorable teaching evaluations, and whether the effect is enhanced when ethnic group representation on campus is low. When student and faculty ethnicity was similar, (a) students from low-representation groups provided the highest evaluations, and (b) students from high-representation groups showed both "more favorable" and "less favorable" evaluations. Evidence suggests that the pattern of findings was strong for qualitatively oriented courses, with the results for quantitative classes less conclusive. Discussion focuses on potential influences on ethnic similarity effects, applications to real-world settings, and future research.
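In modelling terms, the similarity effect described above is an interaction between student-faculty ethnic match and campus representation. A minimal sketch under that framing follows; the variable names ("rating", "same_ethnicity", "low_representation") and the simulated data are hypothetical, not the study's.

```python
# Minimal sketch: testing an ethnic-similarity interaction on evaluation scores.
# All variable names and values are hypothetical illustrations, not study data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "same_ethnicity": rng.integers(0, 2, n),      # 1 if student and instructor match
    "low_representation": rng.integers(0, 2, n),  # 1 if student's group is under-represented
})
df["rating"] = 3.5 + 0.2 * df.same_ethnicity * df.low_representation + rng.normal(0, 0.5, n)

# The interaction term asks whether the similarity effect is larger
# when the student's group has low representation on campus.
model = smf.ols("rating ~ same_ethnicity * low_representation", data=df).fit()
print(model.summary())
```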
Article
Full-text available
Gendered and racial inequalities persist in even the most progressive of workplaces. There is increasing evidence to suggest that all aspects of employment, from hiring to performance evaluation to promotion, are affected by gender and cultural background. In higher education, bias in performance evaluation has been posited as one of the reasons why few women make it to the upper echelons of the academic hierarchy. With unprecedented access to institution-wide student survey data from a large public university in Australia, we investigated the role of conscious or unconscious bias in terms of gender and cultural background. We found potential bias against women and teachers with non-English speaking backgrounds. Our findings suggest that bias may decrease with better representation of minority groups in the university workforce. Our findings have implications for society beyond the academy, as over 40% of the Australian population now go to university, and graduates may carry these biases with them into the workforce.
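The abstract does not specify its model here, but investigations of this kind are commonly run as regressions of evaluation scores on instructor attributes that should be irrelevant to teaching quality. A minimal sketch under that assumption follows; the column names and simulated values are hypothetical.

```python
# Minimal sketch: regressing SET scores on instructor attributes that should be
# irrelevant to teaching quality. Data are simulated, not the university's records.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "gender": rng.choice(["woman", "man"], n),
    "nesb": rng.choice([0, 1], n),            # non-English-speaking background
    "class_size": rng.integers(15, 300, n),
})
df["score"] = 3.8 - 0.1 * (df.gender == "woman") - 0.15 * df.nesb + rng.normal(0, 0.4, n)

# Nonzero coefficients on gender or language background, after controlling
# for class size, would be consistent with the bias the study reports.
model = smf.ols("score ~ C(gender) + nesb + class_size", data=df).fit()
print(model.params)
```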
Article
Full-text available
Anonymous student evaluations of teaching (SETs) are used by colleges and universities to measure teaching effectiveness and to make decisions about faculty hiring, firing, re-appointment, promotion, tenure, and merit pay. Although numerous studies have found that SETs correlate with various teaching effectiveness irrelevant factors (TEIFs) such as subject, class size, and grading standards, it has been argued that such correlations are small and do not undermine the validity of SETs as measures of professors' teaching effectiveness. However, previous research has generally used inappropriate parametric statistics and effect sizes to examine and evaluate the significance of TEIFs in personnel decisions. Accordingly, we examined the influence of quantitative vs. non-quantitative courses on SET ratings and SET-based personnel decisions using 14,872 publicly posted class evaluations, where each evaluation represents a summary of SET ratings provided by individual students responding in each class. In total, 325,538 individual student evaluations from a US mid-size university contributed to these class evaluations. The results demonstrate that class subject (math vs. English) is strongly associated with SET ratings, has a substantial impact on professors being labeled satisfactory vs. unsatisfactory and excellent vs. non-excellent, and the impact varies substantially depending on the criteria used to classify professors as satisfactory vs. unsatisfactory. Professors teaching quantitative courses are far more likely not to receive tenure, promotion, and/or merit pay when their performance is evaluated against common standards.
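The paper's core comparison, ratings in math versus English classes, can be sketched with a nonparametric test and a rank-based effect size, in line with the statistical critique above. The ratings below are illustrative, not the 14,872 evaluations analysed.

```python
# Minimal sketch: nonparametric comparison of SET ratings by subject,
# in the spirit of the analysis described above. Data are illustrative.
import numpy as np
from scipy import stats

math_ratings = np.array([3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5])
english_ratings = np.array([4.0, 4.3, 3.8, 4.1, 4.4, 3.9, 4.2])

u, p = stats.mannwhitneyu(math_ratings, english_ratings, alternative="two-sided")
n1, n2 = len(math_ratings), len(english_ratings)
rank_biserial = 1 - 2 * u / (n1 * n2)   # rank-based effect size, range [-1, 1]

print(f"U = {u}, p = {p:.4f}, rank-biserial r = {rank_biserial:.2f}")
```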
Article
Full-text available
Thematic analysis (TA) is a method for identifying, analyzing, and interpreting patterns of meaning (‘themes’) within qualitative data. TA is unusual in the canon of qualitative analytic approaches, because it offers a method – a tool or technique, unbounded by theoretical commitments – rather than a methodology (a theoretically informed, and confined, framework for research). This does not mean that TA is atheoretical, or, as is often assumed, realist, or essentialist. Rather, TA can be applied across a range of theoretical frameworks and indeed research paradigms. There are versions of TA developed for use within (post)positivist frameworks that foreground the importance of coding reliability (e.g. Boyatzis, 1998; Guest, MacQueen, & Namey, 2012), and given the emphasis on positivism in positive psychology (Friedman, 2008), it is unsurprising that such approaches are often favored by qualitative researchers in this area (e.g. Selvam & Collicutt, 2013). However, there are also versions of TA – like ours – developed (primarily) for use within a qualitative paradigm (Braun & Clarke, 2006, 2013). These versions emphasize an organic approach to coding and theme development and the active role of the researcher in these processes, and some positive psychologists are embracing the greater flexibility that they offer to the qualitative researcher (e.g. Holmqvist & Frisén, 2012). Since we published our original paper on TA (Braun & Clarke, 2006), our approach has become the most widely cited of the many (many!) different versions of TA available to the qualitative researcher, and it is this version that we focus on in the rest of this brief commentary.
Article
Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show that: SET are biased against female instructors by an amount that is large and statistically significant; the bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded; the bias varies by discipline and by student gender, among other things; it is not possible to adjust for the bias, because it depends on so many factors; SET are more sensitive to students' gender bias and grade expectations than they are to teaching effectiveness; and gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors. These findings are based on nonparametric statistical tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university.
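One nonparametric approach suited to this design is a permutation test. A minimal sketch for a gender gap in mean SET scores follows; the data are simulated, not the French or US datasets, and the test shown is a generic illustration rather than the authors' exact procedure.

```python
# Minimal sketch of a permutation test for a gender gap in SET scores,
# a generic example of the nonparametric approach named above. Data simulated.
import numpy as np

rng = np.random.default_rng(42)
women = rng.normal(3.7, 0.5, 60)    # simulated mean SET scores, women instructors
men = rng.normal(3.9, 0.5, 60)      # simulated mean SET scores, men instructors

observed = men.mean() - women.mean()
pooled = np.concatenate([women, men])

count = 0
n_perm = 10_000
for _ in range(n_perm):
    rng.shuffle(pooled)                               # relabel instructors at random
    diff = pooled[:60].mean() - pooled[60:].mean()
    if abs(diff) >= abs(observed):                    # gap at least as extreme as observed
        count += 1

print(f"observed gap = {observed:.3f}, permutation p = {count / n_perm:.4f}")
```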
Book
The systematic review is a rigorous method of collating and synthesizing evidence from multiple studies, producing a whole greater than the sum of its parts. This textbook is an authoritative and accessible guide to an activity that is often found overwhelming. The authors steer readers on a logical, sequential path through the process, taking account of the different needs of researchers, students and practitioners. Practical guidance is provided on the fundamentals of systematic reviewing and also on advanced techniques such as meta-analysis. Examples are given in each chapter, with a succinct glossary to support the text. This up-to-date, accessible textbook will satisfy the needs of students, practitioners and educators in the sphere of healthcare, and contribute to improving the quality of evidence-based practice. The authors also recommend freely available or inexpensive open-source and open-access resources (such as PubMed, R and Zotero) to help students, particularly those with limited resources, learn how to perform a systematic review.
Article
This study uses decision tree analysis to determine the most important variables that predict high overall teaching and course scores on a student evaluation of teaching (SET) instrument at a large public research university in the United States. Decision tree analysis is a more robust and intuitive approach for analysing and interpreting SET scores compared to more common parametric statistical approaches. Variables in this analysis included individual items on the SET instrument, self-reported student characteristics, course characteristics and instructor characteristics. The results show that items on the SET instrument that most directly address fundamental issues of teaching and learning, such as helping the student to better understand the course material, are most predictive of high overall teaching and course scores. SET items less directly related to student learning, such as those related to course grading policies, have little importance in predicting high overall teaching and course scores. Variables irrelevant to the construct, such as an instructor’s gender and race/ethnicity, were not predictive of high overall teaching and course scores. These findings provide evidence of criterion and discriminant validity, and show that high SET scores do not reflect student biases against an instructor’s gender or race/ethnicity.
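A minimal sketch of a decision-tree analysis in the spirit of that study appears below, using scikit-learn. The feature names are hypothetical placeholders, not the instrument's actual items, and the data are simulated with categorical traits assumed to be numerically encoded.

```python
# Minimal sketch of a decision-tree analysis of SET data, in the spirit of the
# study above. Feature names are hypothetical; data are simulated.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([
    rng.uniform(1, 5, n),    # understood_material (SET item score)
    rng.uniform(1, 5, n),    # grading_policy (SET item score)
    rng.integers(0, 2, n),   # instructor_gender (encoded)
    rng.integers(0, 4, n),   # instructor_ethnicity (encoded)
])
y = (X[:, 0] > 3.5).astype(int)   # high overall score driven by the learning item only

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Near-zero importance for the gender/ethnicity columns, as simulated here,
# mirrors the pattern of findings the study reports.
names = ["understood_material", "grading_policy", "instructor_gender", "instructor_ethnicity"]
for name, imp in zip(names, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")
```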
Article
The validity of student evaluation of teaching (SET) scores depends on minimal effects of extraneous response processes, or biases. A bias may increase or decrease scores and change the relationship with other variables. In contrast, the SET literature defines bias as an irrelevant variable correlated with SET scores, and, among many, a relevant biasing factor in the literature is the instructor's gender. The study examines the extent to which acquiescence (the tendency to endorse the highest response option across items), a bias in the first sense, affects students' responses to a SET rating scale. The study also explores how acquiescence affects the difference in teaching quality (TQ) by instructor's gender, a bias in the latter sense. SET data collected at a faculty of education in Ontario, Canada, were analysed using the Rasch rating scale model. Findings provide empirical support for acquiescence affecting students' responses. Latent regression analyses show how acquiescence reduces the difference in TQ by instructor's gender. Findings encourage greater attention to response-process quality as a way to better defend the utility of SET and prevent potentially misleading conclusions from the analysis of SET data.
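For readers unfamiliar with the Rasch rating scale model named above, its category probabilities can be computed directly from a person parameter, an item location and a set of category thresholds. A minimal sketch of Andrich's formulation follows; the parameter values are illustrative, not estimates from the study.

```python
# Minimal sketch of category probabilities under the Rasch rating scale model
# (Andrich), the model used in the study above. theta: person parameter,
# beta: item location, taus: category thresholds. Values are illustrative.
import numpy as np

def rating_scale_probs(theta: float, beta: float, taus: np.ndarray) -> np.ndarray:
    """Probabilities of responding in category 0..len(taus) for one item."""
    steps = np.concatenate([[0.0], np.cumsum(theta - beta - taus)])  # cumulative logits
    expd = np.exp(steps - steps.max())    # subtract max for numerical stability
    return expd / expd.sum()

taus = np.array([-1.5, -0.3, 0.4, 1.4])  # thresholds for a 5-category rating scale
print(rating_scale_probs(theta=0.8, beta=0.0, taus=taus))
```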
Article
Contribution: An automated methodology that provides visualizations of students' free-text comments from course satisfaction surveys. Focusing on sentiment, these visualizations reveal learning and teaching aspects of the course that either may require improvement or are performing well. They provide educators with a simple, systematic way to monitor their courses and make pedagogically sound decisions on teaching strategies. Background: Student course satisfaction surveys often solicit free-text comments. This feedback can provide invaluable insights for educators, but because these comments often contain a large amount of data, they cannot easily be acted upon. Existing visualization methods are not suitable for this application, and additional capabilities were needed. Research Questions: How can large quantities of student satisfaction data be summarized and visualized? How can these visualizations be used to learn meaningful information about courses? What are the recurring themes across semesters? Methodology: Several methods based on machine learning and text analysis techniques were used to visualize student satisfaction comments. The latent Dirichlet allocation (LDA) statistical method was used to identify aspects of student opinion of a course. The sentiment of the student comments was also identified. This information was then presented visually for educators in a case study that gives examples of these visualizations. Findings: The visualization methods explored provide educators with an overview of aspects and their associated sentiment. The summary visualizations allow easy comparisons to be made between courses, or between teaching periods in the same course.
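A minimal sketch of the LDA step named above, using scikit-learn, is shown below; the four comments stand in for real survey free text, and the two-topic setting is an arbitrary illustration.

```python
# Minimal sketch of topic extraction from student comments with LDA,
# the statistical method named above. Comments are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = [
    "lectures were clear and well organised",
    "assignment feedback arrived too late to be useful",
    "tutorials helped me understand the material",
    "the exam did not match what was taught in lectures",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(comments)     # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(doc_term)

# Print the top words for each extracted aspect of student opinion.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-4:][::-1]]
    print(f"topic {i}: {', '.join(top)}")
```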
Book
This book provides an examination of the key areas that are important to the sustenance of wellbeing within higher education settings, with a view to promoting healthy learning environments. The synthesis of the issues covered in the book is crucial to the understanding of higher education as not only an environment for gaining knowledge and skills relevant for success in academic and career domains, but also as an environment for developing socially adept and authentic communication skills.
Article
Student evaluations of teaching are widely adopted across academic institutions, but there are many underlying trends and biases that can influence their interpretation. Publicly accessible web-based student evaluations of teaching are of particular relevance, due to their widespread use by students in the course selection process and the quantity of data available for analysis. In this study, data from the most popular of these websites, RateMyProfessors.com, is analysed for correlations between measures of instruction quality, easiness, physical attractiveness, discipline and gender. This study of 7,882,980 RateMyProfessors ratings (from 190,006 US professors with at least 20 student ratings) provides further insight into student perceptions of academic instruction and possible variables in student evaluations. Positive correlations were observed between ratings of instruction quality and easiness, as well as between instruction quality and attractiveness. On average, professors in science and engineering disciplines have lower ratings than in the humanities and arts. When looking at RateMyProfessors as a whole, the effect of a professor’s gender on rating criteria is small but statistically significant. When analysing the data as a function of discipline, however, the effects of gender are significantly more pronounced, albeit more complex. The potential implications are discussed.
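The correlational part of such an analysis takes only a few lines. A minimal sketch follows, with simulated ratings standing in for the scraped RateMyProfessors fields; the column names and induced relationships are hypothetical.

```python
# Minimal sketch of the correlation analysis described above. The ratings are
# simulated stand-ins for the scraped RateMyProfessors fields.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 1000
quality = rng.uniform(1, 5, n)
df = pd.DataFrame({
    "quality": quality,
    "easiness": quality * 0.4 + rng.uniform(1, 3, n),        # induced positive link
    "attractiveness": quality * 0.2 + rng.uniform(1, 4, n),  # weaker positive link
})

# Spearman correlations suit the ordinal, bounded nature of rating data.
print(df.corr(method="spearman"))
```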