Assessment & Evaluation in Higher Education
https://doi.org/10.1080/02602938.2021.1888075
Published online: 06 Mar 2021
Sexism, racism, prejudice, and bias: a literature review
and synthesis of research surrounding student evaluations
of courses and teaching
Troy Heffernan
La Trobe University, Melbourne, Australia
ABSTRACT
This paper analyses the current research regarding student evaluations
of courses and teaching. The article argues that student evaluations are
influenced by racist, sexist and homophobic prejudices, and are biased by discipline and subject area. This paper's findings are relevant to
policymakers and academics as student evaluations are undertaken in
over 16,000 higher education institutions at the end of each teaching
period. The article’s purpose is to demonstrate to the higher education
sector that the data informing student surveys is flawed and prejudiced
against those being assessed. Evaluations have been shown to be heavily
influenced by student demographics, the teaching academic’s culture
and identity, and other aspects not associated with course quality or
teaching effectiveness. Evaluations also include increasingly abusive com-
ments which are mostly directed towards women and those from mar-
ginalised groups, and subsequently make student surveys a growing
cause of stress and anxiety for these academics. Yet, student evaluations
are used as a measure of performance and play a role in hiring, firing
and promotional decisions. Student evaluations are openly prejudiced
against the sector's most underrepresented academics and contribute to further marginalising the same groups universities declare they protect and value and aim to increase in their workforces.

KEYWORDS: Student evaluations; academic equity; wellbeing; prejudice; abuse; higher education
Introduction
This paper is part of a research project that investigates questions concerning student evalua-
tions of courses and teaching (SETs). The project examines the extent to which SET results provide objective assessments of the course and of the academic teaching it, how biased the information may be, and which groups are potentially disadvantaged by these processes. It
also investigates what adjustments can be made to create a more equitable system. Ample
evidence indicates that course evaluations (that is, evaluations of a course in terms of its content
and outcomes) and teaching evaluations (surveys dedicated to evaluating the teaching academic’s
performance) are both strongly correlated with the teaching academic's demographics and
other issues unrelated to the course or the academic’s performance (Boring, Ottoboni, and Stark
2016; Uttl and Smibert 2017). For this reason, course and teaching evaluations are treated as
one within this paper (unless otherwise stated).
This paper’s collation and analysis of existing SET research makes several points clear: SETs
are significantly biased due to the demographics of students completing them, and prejudice
against the academic teaching the course, are dependent on subject areas, and are impacted
on by myriad other aspects not connected to the teacher or course. Yet despite these clear
prejudices and biases, SETs are used to gauge teaching quality and are a component in judging
who is hired, who is let go and who is promoted. In addition to these discriminatory practices,
the trend of abusive comments in SETs is increasing (Tucker 2014). These prejudices in SET
results, and how the results are used, are subsequently leading to growing academic mental
health and wellbeing issues that universities cannot ignore (Jordan 2011; Fan et al. 2019).
Thus, this paper provides much needed synthesis and analysis of the existing research for
the benefit of academics in every field and discipline who are subjected to these practices.
Cunningham-Nelson, Baktashmotlagh, and Boles (2019) have built upon the International
Association of Universities ‘World List of Universities’ (2006) and estimated that globally the
teaching staff of over 16,000 higher education institutions collect SETs at the end of each
teaching period. The paper makes clear that while SETs are being used as an aid in gauging performance, women and marginalised groups are losing jobs, being promoted more slowly and/or less often, and are being negatively impacted at career progression junctures within the academy (Uttl and Smibert 2017). This paper aims to inform higher education institutions, policymakers and administrators of the sexist, racist, homophobic and other biases that underpin SET data, as the evidence demonstrates how institutions are complicit in the prejudicial practices associated with its use.
Methods
This paper analyses the literature and the themes evident around SETs and the prejudices and biases that influence their results. It follows a systematic review methodology to provide a transparent account of how the data were gathered and of the protocols used to survey the literature and its themes across a breadth of sources (Macpherson and Holt 2007; McCrae and Purssell 2020).
Data collection
This paper’s literature review method used the following initial criteria. The search began by
seeking out research published between 1990 and 2020. This period was selected to capture the growth of SET surveys into their present-day use, the ways technology has changed how surveys are conducted, and how technology has contributed to new data analysis methods.
The title, abstract and keyword search terms included all versions of ‘student evaluations’, ‘stu-
dent evaluations of teaching’, ‘SETs’, ‘SECTs’, ‘course evaluations’ and ‘teacher evaluations’ in English
language peer-reviewed articles and books from standard institutional databases including
EBSCO, ProQuest and Web of Science.
The advantage of these search terms is that research surrounding these topics is not limited
to publications explicitly dedicated to one field; in this case education researchers or educa-
tion-based journals and books (Pittaway et al. 2004). As student evaluations impact on every
university discipline, research has been conducted in essentially all disciplinary areas. These
findings sometimes appear in journals relating specifically to the subject area (such as medicine
or engineering) rather than journals focusing specifically on teaching that subject. These search
terms thus resulted in literature beyond the usual scope of education journals and higher
education researchers concerned with teaching practices.
The search resulted in 293 publications. The list was then refined to remove the papers
addressing student surveys not connected to SETs (86), duplicate results (55), book reviews (9),
media articles (4) and non-English language publications (3). This resulted in 136 publications
meeting the criteria. Analysis of these 136 articles then led to the identification of 47 articles
that were not returned in the initial search. In most cases, the papers did not appear in the
initial search because they were not publications dedicated to SETs; rather, the discussion around
student surveys was carried out for the purpose of informing audiences concerned with course
design or professional development. However, as they contributed to current scholarly thought
around SETs and their impact, they were included in the following analysis.
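As a transparency aid, the screening arithmetic reported above can be tallied in a few lines of code. The sketch below is illustrative only: the counts come from the text, while the variable names and the implied final corpus of 183 publications are inferences rather than part of the original review protocol.

# Illustrative tally of the screening counts reported above.
# Counts are from the text; names and the final total are assumptions.
initial_results = 293
exclusions = {
    "student surveys not connected to SETs": 86,
    "duplicate results": 55,
    "book reviews": 9,
    "media articles": 4,
    "non-English language publications": 3,
}

remaining = initial_results - sum(exclusions.values())
assert remaining == 136  # publications meeting the criteria

snowballed = 47  # further articles identified while analysing the 136
final_corpus = remaining + snowballed
print(final_corpus)  # 183 publications informing the analysis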
Data analysis
After an initial reading of the final collection of material, themes were generated using Braun
and Clarke’s (2006) method of thematic analysis. This analytic method provides a system by
which patterns of meaning can be generated from qualitative data; in this case, themes surrounding student evaluations of courses and teaching (Clarke and Braun 2017). This process
identified 58 individual themes, which after further analysis were categorised into 35 sub-themes,
before finally being refined into the five main themes discussed in this paper.
Findings
Evaluations technically work
A starting point of many publications examining SETs is an acknowledgement that evaluations
work in the sense that they provide the university with data relating to course design, delivery
of the course and teaching staff performance. The inherent problem with SETs, however, is that
this data disguises the prejudices and biases underpinning both the data being gathered and, subsequently, the results being produced (Marsh 2007; Osoian et al. 2010; Stark and Freishtat 2014).
That SETs provide data that appears sound is arguably why institutions believe that evalua-
tions are a measure of effectiveness (Osoian et al. 2010; Tucker 2014) and therefore play a role
in academic hiring and promotions (Arthur 2009; Shah and Nair 2012; Boring, Ottoboni, and
Stark 2016). At the same time, it has been noted that SET results have been used to aid in
firing unproductive staff, or guiding staffing decisions during times of restructure (Jones, Gaffney-
Rhys, and Jones 2014; Uttl and Smibert 2017).
Apparent data quality is also what leads institutions to use SETs as signifiers of teaching
standards. Many institutions expect staff to achieve a certain SET result (e.g. 3.75 out of five or
higher, or over 75% etc.) to be seen as fulfilling their duties. Stark and Freishtat (2014) also
found that some universities use SETs to intentionally incite continuous cycles of competition
amongst academics by making the acceptable result one that is above the cohort’s average.
That is to say, no matter how well the cohort performs overall, roughly half of the teaching staff will fall below the average and be susceptible to the negative repercussions of not meeting the target, such as decreased promotion chances or leadership opportunities.
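To make this arithmetic concrete, the sketch below (a minimal illustration using synthetic, assumed score distributions, not data from any study cited here) shows that a benchmark of scoring above the cohort average flags roughly half of a cohort as underperforming even when every academic in that cohort scores highly.

import random

random.seed(0)

def share_below_cohort_mean(scores):
    # Fraction of staff who miss an "above the cohort average" benchmark.
    mean = sum(scores) / len(scores)
    return sum(s < mean for s in scores) / len(scores)

# Two hypothetical cohorts of 100 academics, each rated out of 5.
uniformly_strong = [random.uniform(4.2, 5.0) for _ in range(100)]  # everyone scores highly
mixed_cohort = [random.uniform(2.0, 5.0) for _ in range(100)]

print(share_below_cohort_mean(uniformly_strong))  # approximately 0.5 despite strong scores
print(share_below_cohort_mean(mixed_cohort))      # also approximately 0.5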
The issue many researchers examining SETs identify is that the data produced by compiling evaluation results appear to provide a somewhat objective picture of teacher and course success. However, what is rarely considered, or perhaps not seen, by universities and
researchers praising SETs are the prejudices and conditions that shape the views that form the
data (Marsh 2007; Osoian et al. 2010; Stark and Freishtat 2014).
Impact of students’ cultural and demographic backgrounds
The review found that students’ backgrounds and demographics in terms of gender, age, dis-
ciplinary area and study type can all impact on an academic’s SET results. That student demo-
graphics alone impact on SET results demonstrates just how flawed the system of evaluations
is, and to what extent results are determined by factors not related to course content or teaching
quality (Rosen 2018).
Tucker’s (2014) study of 43,000 course evaluations found that across every discipline and
course evaluated, and regardless of teaching staff, women students submitted SETs that were
overall more favourable by two per cent. However, scores increased by as much as six per cent when evaluations were completed by international students, students from older age groups, external students and students with higher grade point averages (GPAs). That
student demographics alone can make a difference of between two and six per cent in SET
results is a significant finding. However, it must be reiterated that this is just one of the many
biases and prejudices that can accumulate to greatly disadvantage some groups more than others.
Variance in SET results across different disciplinary areas has also been noted. Beran and Violato's (2005) study of 370,000 evaluations and Centra's (2009) study of 238,000 evaluations found that, across multiple universities and countries, academics teaching science-based subjects receive lower results than those teaching social science/humanities-based subjects. Uttl and Smibert's (2017) study of 325,000 SETs divided subject areas into quantitative and qualitative subjects: those assessed on right/wrong answers (such as correct calculations and formulas) and those assessed by assessor judgement, such as essays. They found that those teaching quantitative subjects were 3.3 times more
likely to score in a lower evaluation bracket than those being evaluated by students in quali-
tative subjects. Those teaching quantitative subjects were also 1.88 times more likely to fail
their evaluations. With quantitative subjects continually being found to be evaluated lower by
students, Uttl and Smibert (2017, p. 8) concluded that:
Professors who teach quantitative vs. non-quantitative classes are not only likely to receive
lower [SETs], but they are also at a substantially higher risk of being labelled unsatisfactory in
teaching, and thus, more likely to be fired, not re-appointed, not promoted, not tenured, and
denied merit pay.
Students are also influenced by factors not related directly to the course or teacher. Several
studies have found high correlations between students’ grade expectations and the SET scores
they deliver. These studies have examined expectations based on GPAs and mid-term results
from institutions across several countries. Repeatedly, the findings are that students who are
graded higher, or expect to gain a high grade if the SET is completed before results are released,
provide higher scoring evaluations (Worthington 2002; Short et al. 2008; Stark and Freishtat
2014; Boring, Ottoboni, and Stark 2016). These findings also raise pedagogical concerns, as these results provide motivation for academics to set easier assessment, or perhaps grade more leniently, to
facilitate better SET results. Considering SET results are used to aid in hiring, firing and promo-
tional decisions, it cannot be ignored that academics may be motivated to alter their assessment
processes given that their livelihoods are at risk (Carter and Lara 2016; Bachan 2017).
Studies have also found that SET results are being driven by student biases irrelevant to
course content and effectiveness. Benton and Cashin (2012) found class size to be
a major factor. Issues including classroom design, cleanliness of the university, quality of course
websites, library services, food options available on campus, and difficulty in the admissions
process (for first year students) have all been found to play a larger role in influencing SET
results than teaching quality or course design (Osoian et al. 2010).
Academic gender, ethnicity, sexual identity, and other demographics
Arguments have also been made that an academic's gender, ethnicity, language, perceived sexual identity, age or visible disabilities impact on student evaluations (Valencia 2020). Such is the bias associated with gender and with perceptions of ethnicity, sexuality, age and disability that Ambady and Rosenthal (1993) found that student reactions to a 30-second silent video of their
teacher played at the start of the semester correlated with the SET results the academic received. More recently, Boring, Ottoboni, and Stark's (2016) study of 23,000 SETs, and Fan et al.'s (2019) study of 22,000 SETs, found that male students express a significant bias in favour of male
academics.
Numerous studies have also found statistically significant differences in how gender influences academic evaluations. MacNell, Driscoll, and Hunt's (2015) work determined that
women academics consistently receive lower scores relating to course design, clarity of assess-
ment, class engagement, turnaround time of essays, and question response times regardless of
their performance. Boring, Ottoboni, and Stark (2016) also found statistically significant examples
of student expectations being amplified by the academic's gender: not only did SET results correlate highly with grade expectations, but when grade expectations were met, male academics were rewarded with higher scores, and when they were not met, the negative impact on evaluation scores was smaller for male academics. To explore the extremes
of gender prejudice in SETs, MacNell, Driscoll, and Hunt (2015) conducted a study that found
that online classes led by male avatars (regardless of the academic’s actual gender) received
higher SET scores than those led by women avatars. Thus, merely perceiving the teaching academic to be a woman led students to deliver lower scores.
A consistent theme within these studies is that gender, and even perceived gender, makes
a difference to SET scores and is highly prejudiced against women. In Boring, Ottoboni, and
Stark’s (2016) summary of their study’s findings and those of the existing studies they analysed,
they declared that SETs are ‘biased against female instructors by an amount that is large and
statistically significant’ (p. 1).
Many of these studies find that these biases and prejudices result in large and significant vari-
ations in SET results, but this raises the question of what these differences can look like in practice.
Fan et al.’s (2019) study of 22,000 SET results concluded that at the extreme, women academics are
receiving SET scores 37 percentage points lower than male academics. This figure comes from science subjects with high numbers of male students that are led by younger women academics (approximately under 35 years old). Boring, Ottoboni, and Stark's (2016) study found similar results and
concluded that the biases at play were so great that more effective female academics are being
placed in lower SET grading brackets than their less effective male counterparts.
In all studies relating to gender, the analyses indicate that the highest scores are awarded
in subjects filled with young, white, male students being taught by white English first language
speaking, able-bodied, male academics who are neither too young nor too old (approx.
35–50 years of age), and whom the students believe to be heterosexual. Most deviations from this scenario in terms of student and academic demographics equate to lower SET scores. These studies thus highlight that white, able-bodied, heterosexual men of a certain age are not only the least affected; they benefit from the practice. When every demographic group that does not fit this image is significantly disadvantaged by SETs, these processes serve to further enhance
the position of the already privileged.
Prejudice in SETs stemming from the academic’s ethnicity is also a common finding in studies
concerning evaluations and biases, as are issues around age, disability, sexual identity and
appearance (Andersen and Miller 1997; Cramer and Alexitch 2000; Worthington 2002). A primary
issue with these studies is that, for large-scale quantitative research, the current state of higher education inclusion and diversity means the numbers of academics from marginalised groups are often deemed too small to constitute a valid sample (Hendrix 1998; Rubin 1998; Fan et al. 2019).
However, smaller scale surveys and studies relying on qualitative methods consistently find
prejudices against academics of colour (DiPietro and Faye 2005; Hamermesh and Parker 2005).
Significant bias in SETs has been found to impact negatively on academics of colour (Hendrix
1998; Rubin 1998), and academics whose native language is not that of the university. In most
studies this means the English language, though some studies have explored issues of SET
prejudice in European non-English speaking countries and found similar results (DiPietro and
Faye 2005; Hamermesh and Parker 2005). Fan et al.’s (2019) study also found that academics
from diverse backgrounds or non-English first language backgrounds faced different levels of
prejudice according to subject area; more liberal subjects were less prejudiced but were prej-
udiced nonetheless. Domestic students are also more likely than international students to provide lower SET scores to academics from these groups (Tucker 2014).
Further evidence of the significant prejudice women academics face in the SET process is that, even though all academics from ethnically diverse and marginalised backgrounds receive lower SET scores than their white, English first language speaking colleagues, women from ethnically diverse backgrounds are graded more harshly than men from ethnically diverse
backgrounds. At the extreme, Fan et al. (2019) found that in science faculties, a woman from a
non-English speaking background was half as likely to receive the same SET result as a white
English-speaking male.
SET comments
A key feature of SETs that the literature identifies as promoting their use is their anonymity.
Universities argue that anonymity allows students to provide honest feedback without fear
of retribution for speaking negatively against the teaching academic; the implication being
that students may feel unable to make negative comments if they can be identified (Tucker
2014; Uttl and Smibert 2017). Studies have nonetheless determined that the issue with anon-
ymous comments in SETs is that a portion of the comments are abusive, that the abuse is
growing, and that the abuse mostly targets race, gender, sexual identity, ethnicity, age and other marginalising characteristics.
The literature regarding comments in SETs is somewhat limited. Tucker’s (2014) study of
43,000 SETs suggested that only around one per cent of comments were abusive (though she
noted sharp increases across studies, and hypothesised that the trend was growing rapidly).
However, even if the figure of one per cent is accurate, it is imperative to point out that the
one per cent is not distributed evenly across academic demographics. Additionally, to the person
receiving the abusive comments, the emotional damage and stress is real, and the overall rate
of abusive comments is irrelevant to their stress, anxiety or mental wellbeing (Tucker 2014).
As might be expected considering the clear trends within the findings of this review, a white
male who is perceived to be heterosexual and is in the 35–50-year-old age group will, statis-
tically speaking, receive few, if any, abusive comments. Abusive comments are mostly directed
towards women and marginalised groups, focus on marginalising characteristics, and are cumulative. For example, women receive abusive comments, and academics of colour receive abusive comments; thus, a woman of colour is more likely to receive abuse because of both her gender and her skin colour (Oliver et al. 2008; Jordan 2011). One way to consider this finding is to recall that some groups are so underrepresented in the sector that they do not constitute a valid sample size in large-scale studies. It is women and academics from these underrepresented groups who receive the majority of the abusive comments, which makes the notion of 'only one per cent of comments being abusive' a highly distorted figure.
The result is that SET comments are a source of anxiety for academics. The process of SETs alone is cause for concern given that they are used for firing, hiring and promotion
purposes (Jones, Gaffney-Rhys, and Jones 2014; Uttl and Smibert 2017). However, that comments
for the sector’s women and marginalised groups are then likely to be at best unconstructive
and unjustified, and at worst racist, sexist, homophobic or ageist (among other prejudices) is
only a further cause for concern and mental distress for the academics receiving these comments
(Jordan 2011; Tucker 2014).
Jones, Gaffney-Rhys, and Jones (2014) considered the legal implications of universities con-
tinuing to allow SETs to be collected when they are known to be a cause of distress. They
discuss defamation and the university’s potential breaches of duty of care, and their suggestions
add further reason for SETs to be removed. Where comments are concerned, however, it is likely
that if a student who could be identified provided racist, sexist, homophobic or other abusive
feedback they would be removed from classes or face other consequences.
What is known from previous SET research?
For all the research and studies conducted around SETs, their conclusions largely point to
similar findings. A frequent conclusion across large-scale quantitative and smaller qualitative
studies is that SETs rarely measure course or teacher quality or effectiveness, or if they do,
these elements are significantly outweighed in the final results by other factors unrelated to
course or teacher quality (Stark and Freishtat 2014; Tucker 2014; Boring, Ottoboni, and Stark
2016; Fan et al. 2019).
As has been discussed, SETs are heavily influenced by student demographics and subject
area, but most studies argue that the two greatest influences are the academic’s gender and
culture (Stark and Freishtat 2014; Boring, Ottoboni, and Stark 2016; Boring 2017; Fan et al. 2019).
It is also critical to note that these researchers, and those of many more studies within the
literature review, have suggested that it is impossible to correct for these biases because, from one class or subject to another, let alone from one university to another, the variances cannot be predicted or accounted for in the data.
Many researchers have not only questioned the use of SETs, but also the value of the content
they provide. Cunningham-Nelson, Baktashmotlagh, and Boles (2019) make the point that even
if SETs were not guided by biases and prejudice, the students completing the evaluation would not benefit from the process. The surveyed group will have moved on before changes can be
made and the next cohort may value different attributes in the course or teacher. Researchers
additionally argue that SETs are limited in what they provide because the qualities students
place value on are already known (Sunindijo 2016). The existing literature declares that students
want engaging lectures, assessment to be explained clearly and graded fairly, assessment returned
in a timely manner and their questions answered promptly (Sunindijo 2016; Park and Dooris
2020). Vivanti, Haron, and Barnes (2014) came to similar conclusions, and add that students want
guidance, and teaching staff who are interested in them and in the subject being taught.
Similar arguments also appear in the literature alongside the conclusion that SETs are mea-
suring customer satisfaction (Enache 2011; Osoian et al. 2010). These researchers acknowledge
that the marketisation of the higher education system means customer satisfaction plays a
valuable role in the modern university because enrolment numbers matter (Marginson 2013).
However, these researchers are all highly critical that SETs are customer satisfaction surveys purporting to measure course outcomes and teacher effectiveness when many studies argue that this is untrue. These studies point out that customer satisfaction can be measured via methods other than SETs, and suggest opportunities within universities' growing student voice and students-as-partners initiatives.
Discussion
This paper’s analysis of the existing literature makes it clear that SET results are strongly influ-
enced by external factors unrelated to course content or teacher performance. In addition, these
factors are frequently based on student demographics, and students’ biases and prejudices
based on the teaching academic’s gender, sexuality, ethnicity, age or disability as well as other
marginalising factors. Ultimately, this analysis raises the question of how any university can justify the continued use of SETs.
Official university responses to issues regarding SETs are rare. Tucker (2014) sought clarifica-
tion of why a university continued using SETs when they were known to attract abusive
comments. The university’s response was that the rate of abusive comments was too low to
alter the existing procedures, considering the value they perceived in the data obtained. This
response provides some insights into institutional administrative thinking. The first is that uni-
versities consider there to be an acceptable level of abuse that staff must endure. A factor here,
however, is that Tucker collected and evaluated this information almost a decade ago. Considering
the current focus on staff wellbeing and mental health (Henning 2018), it is possible that insti-
tutions today might have different views on academics receiving abuse. Universities might also
be influenced by the now-established knowledge that abuse is directed heavily towards
women and academics from marginalised groups. Thus, institutions continuing to conduct SETs
are allowing the sector’s most underrepresented and marginalised groups to be subjected to
possible hate speech (Jones, Gaffney-Rhys, and Jones 2014; Uttl and Smibert 2017).
The second issue Tucker’s (2014) research raised (as have other researchers in this literature
review) is that universities believe the data they are collecting via SETs provides accurate assess-
ments of course content and outcomes, and teacher quality and effectiveness. This paper’s findings
make it clear why universities might believe this to be the case. Methodologically, the process
appears sound. Students provide anonymous feedback so as not to be concerned with potential
repercussions from academics, and this data informs the faculty and administrators of students’
perspectives concerning the course and teacher (Arthur 2009; Shah and Nair 2012; Jones, Gaffney-
Rhys, and Jones 2014; Boring, Ottoboni, and Stark 2016). However, this review’s findings also
demonstrate that the methodology surrounding SETs is inherently flawed because the data being
input into the system is influenced by biases and prejudices that are invisible in the data’s results
(Marsh 2007; Osoian et al. 2010; Stark and Freishtat 2014). These elements are also invisible because, unless researchers specifically focus on biases in SETs, there is little reason to suspect that the questions being asked will not be answered somewhat objectively across a class of respondents and so provide an overall, reasonably accurate measure. However, this is not the case; this paper has
shown that statistically significant biases in these data collection methods exist.
The groups most impacted by these prejudices are clear. This study demonstrates that at
best SETs disadvantage women, and at worst, see women academics placed in untenable posi-
tions (MacNell, Driscoll, and Hunt 2015; Boring, Ottoboni, and Stark 2016). These results have
also been theorised as a reason why women are underrepresented in both the professoriate
and upper levels of university leadership (Fan et al. 2019).
The results are worse still when ethnicity, language, perceptions of sexuality or disability are
considered. Academics from ethnically diverse backgrounds, and/or those who do not speak English as their first language, receive much lower SET results (Fan et al. 2019). Other researchers have exam-
ined disability, sexual identity, and cultural and linguistic diversity and found similar results in
prejudice against these groups (Hendrix 1998; Rubin 1998; DiPietro and Faye 2005; Hamermesh
and Parker 2005), though it cannot be ignored that researching prejudice against some of these
groups is difficult because they are so underrepresented within the sector.
Researchers have also argued for decades that subject areas impact on SET results with the
sciences being evaluated more harshly (Beran and Violato 2005; Centra 2009; Uttl and Smibert
2017). Similarly, that studies have routinely found high correlations between student grades
and SET results makes it clear that student evaluations are being influenced by factors far
exceeding the intended purpose of SET questions (Worthington 2002; Short et al. 2008; Stark
and Freishtat 2014; Boring, Ottoboni, and Stark 2016). Studies have also determined that issues
including classroom design, cleanliness of the university, quality of course websites, library
services and food options available on campus all have a larger influence on SET results than
practices concerning courses and teaching (Felton, Mitchell, and Stinson 2004; Felton et al. 2008;
Osoian et al. 2010; Benton and Cashin 2012).
In addition to the above findings, many researchers also argue that SETs are not needed
because universities already know what students want from a course and teacher (Vivanti, Haron,
and Barnes 2014; Sunindijo 2016; Park and Dooris 2020). These researchers claim that focus
groups or student interviews would be a more equitable way of gaining information and judging
potential biases and prejudice. What these views make clear is that the student voice and students-as-partners initiatives that have grown in recent years are well placed to provide discussions that improve learning opportunities, rather than relying on the current prejudiced practices.
Conclusion
This paper has shown that no university, and indeed the higher education sector as a whole, can claim to be a gender-equal employer or to have an interest in growing a safe, inclusive and diverse workforce while continuing to use SETs to evaluate course and teacher quality.
This paper provides an evidence base that can be used as part of the growing body of material and argument against the practice of collecting SET data. When SET data is known to be highly
prejudiced against many groups, methods must be changed, and using SET data as a compo-
nent of hiring, promotion and/or firing decisions must be seen as the blatantly discriminatory
practice that it is.
The need for immediate policy changes is clear. Women and marginalised groups are losing
jobs, failing to achieve promotion, and are being negatively impacted at every step where SETs
are concerned, and will continue to be discriminated against every time SET data is collected
until the practice is stopped. These practices not only harm the sector's women and its most underrepresented and vulnerable academics; SETs also actively contribute to further marginalising the groups universities declare they protect and value in their workforces.
ORCID
Troy Heffernan http://orcid.org/0000-0002-8156-622X
References
Ambady, N., and R. Rosenthal. 1993. “Half a Minute: Predicting Teacher Evaluations from Thin Slices
of Nonverbal Behavior and Physical Attractiveness.” Journal of Personality and Social Psychology 64
(3): 431–441. doi:10.1037/0022-3514.64.3.431.
Andersen, Kristi, and Elizabeth D. Miller. 1997. “Gender and Student Evaluations of Teaching.” PS:
Political Science and Politics 30 (02): 216–219. doi:10.1017/S1049096500043407.
Arthur, L. 2009. “From Performativity to Professionalism: Lecturers’ Responses to Student Feedback.”
Teaching in Higher Education 14 (4): 441–454. doi:10.1080/13562510903050228.
Bachan, R. 2017. “Grade Inflation in UK Higher Education.” Studies in Higher Education 42 (8): 1580–1600.
doi:10.1080/03075079.2015.1019450.
Benton, S., and W. Cashin. 2012. Student Ratings of Teaching: A Summary of Research and Literature. Manhattan, KS: The IDEA Center. http://www.ideaedu.org
Beran, T., and C. Violato. 2005. “Ratings of University Teacher Instruction: How Much Do Student and
Course Characteristics Really Matter?” Assessment and Evaluation in Higher Education 30 (6): 593–601.
doi:10.1080/02602930500260688.
Boring, A. 2017. “Gender Biases in Student Evaluations of Teaching.” Journal of Public Economics 145:
27–41. doi:10.1016/j.jpubeco.2016.11.006.
Boring, A., K. Ottoboni, and P. Stark. 2016. “Student Evaluations of Teaching (Mostly) Do Not Measure
Teaching Effectiveness." ScienceOpen Research. doi:10.14293/s2199-1006.1.sor-edu.aetbzc.v1.
Braun, V., and V. Clarke. 2006. “Using Thematic Analysis in Psychology.” Qualitative Research in Psychology
3 (2): 77–101. doi:10.1191/1478088706qp063oa.
Carter, M., and P. Lara. 2016. “Grade Inflation in Higher Education: Is the End in Sight?” Academic
Questions 29 (3): 346–353. doi:10.1007/s12129-016-9569-5.
Centra, J. 2009. Differences in Responses to the Student Instructional Report: Is It Bias? Princeton:
Educational Testing Service.
Clarke, V., and V. Braun. 2017. “Thematic Analysis.” The Journal of Positive Psychology 12 (3): 297–298.
doi:10.1080/17439760.2016.1262613.
Cramer, K., and L. Alexitch. 2000. “Student Evaluations of College Professors: Identifying Sources of
Bias.” Canadian Journal of Higher Education 30 (2): 143–164. https://journals.sfu.ca/cjhe/index.php/
cjhe/article/view/183360.
Cunningham-Nelson, S., M. Baktashmotlagh, and W. Boles. 2019. “Visualizing Student Opinion through
Text Analysis.” IEEE Transactions on Education 62 (4): 305–311. doi:10.1109/TE.2019.2924385.
DiPietro, M., and A. Faye. 2005. “Online Student-Ratings-of-Instruction (SRI) Mechanisms for Maximal
Feedback to Instructors.” 30th Annual Meeting of the Professional and Organizational Development
Network. Milwaukee, WI.
Enache, I. 2011. “Customer Behaviour and Student Satisfaction.” Economic Sciences 4 (53): 41–46.
Fan, Y., L. Shepherd, D. Slavich, D. Waters, M. Stone, R. Abel, and E. Johnston. 2019. "Gender and Cultural Bias in Student Evaluations: Why Representation Matters." PLoS ONE 14 (2): e0209749.
doi:10.1371/journal.pone.0209749.
Felton, J., P. Koper, J. Mitchell, and M. Stinson. 2008. “Attractiveness, Easiness, and Other Issues: Student
Evaluations of Professors on Rate My Professors.” Assessment and Evaluation in Higher Education 33
(1): 45–61. doi:10.2139/ssrn.918283.
Felton, J., J. Mitchell, and M. Stinson. 2004. “Web-Based Student Evaluations of Professors: The Relations
between Perceived Quality, Easiness and Sexiness.” Assessment and Evaluation in Higher Education
29 (1): 91–108. doi:10.1080/0260293032000158180.
Hamermesh, D., and A. Parker. 2005. “Beauty in the Classroom: Instructors’ Pulchritude and Putative
Pedagogical Productivity." Economics of Education Review 24 (4): 369–376. doi:10.1016/j.econedurev.2004.07.013.
Hendrix, K. 1998. “Student Perceptions of the Influence of Race on Professor Credibility.” Journal of
Black Studies 28 (6): 738–764. doi:10.1177/002193479802800604.
Henning, M. 2018. Wellbeing in Higher Education: Cultivating a Healthy Lifestyle Among Faculty and
Students. Abingdon: Routledge.
International Association of Universities. 2006. World List of Universities, 25th Edition: And Other
Institutions of Higher Education. 25th ed. London: Palgrave Macmillan.
Jones, J., R. Gaffney-Rhys, and E. Jones. 2014. “Handle with Care! An Exploration of the Potential Risks
Associated with the Publication and Summative Usage of Student Evaluation of Teaching (SET)
Results.” Journal of Further and Higher Education 38 (1): 37–56. doi:10.1080/0309877X.2012.699514.
Jordan, D. W. 2011. “Re-Thinking Student Written Comments in Course Evaluations: Text Mining
Unstructured Data for Program and Institutional Assessment.” PhD diss., California State University,
Stanislaus.
MacNell, L., A. Driscoll, and A. Hunt. 2015. “What’s in a Name: Exposing Gender Bias in Student Ratings
of Teaching.” Innovative Higher Education 40 (4): 291–303. doi:10.1007/s10755-014-9313-4.
Macpherson, A., and R. Holt. 2007. “Knowledge, Learning and Small Firm Growth: A Systematic Review
of the Evidence.” Research Policy 36 (2): 172–192. doi:10.1016/j.respol.2006.10.001.
Marginson, S. 2013. “The Impossibility of Capitalist Markets in Higher Education.” Journal of Education
Policy 28 (3): 353–370. doi:10.1080/02680939.2012.747109.
Marsh, H. 2007. "Students' Evaluations of University Teaching: Dimensionality, Reliability, Validity,
Potential Biases and Usefulness.” In The Scholarship of Teaching and Learning in Higher Education:
An Evidence-Based Perspective, edited by R. Perry and J. Smart, 319–383. Dordrecht: Springer.
McCrae, N., and E. Purssell. 2020. How to Perform a Systematic Literature Review: A Guide for Healthcare
Researchers, Practitioners and Students. Berlin: Springer.
Oliver, B., B. Tucker, R. Gupta, and S. Yeo. 2008. “eVALUate: An Evaluation Instrument for Measuring
Students’ Perceptions of Their Engagement and Learning Outcomes.” Assessment and Evaluation in
Higher Education 33 (6): 619–630. doi:10.1080/02602930701773034.
Osoian, C., R. Nistor, M. Zaharie, and H. Flueras. 2010. “Improving Higher Education through Student
Satisfaction Surveys.” Proceedings of the 2nd International Conference on Education Technology and
Computer. doi:10.1109/icetc.2010.5529347.
Park, E., and J. Dooris. 2020. “Predicting Student Evaluations of Teaching Using Decision Tree Analysis.”
Assessment and Evaluation in Higher Education 45 (5): 776–793. doi:10.1080/02602938.2019.1697798.
Pittaway, L., M. Robertson, K. Munir, D. Denyer, and A. Neely. 2004. “Networking and Innovation: A
Systematic Review of the Evidence.” International Journal of Management Reviews 5 (3–4): 137–168.
doi:10.1111/j.1460-8545.2004.00101.x.
Rosen, A. 2018. “Correlations, Trends and Potential Biases among Publicly Accessible Web-Based
Student Evaluations of Teaching: A Large-Scale Study of RateMyProfessors.com Data.” Assessment
and Evaluation in Higher Education 43 (1): 31–44. doi:10.1080/02602938.2016.1276155.
Rubin, D. 1998. "Help! My Professor (or Doctor or Boss) Doesn't Talk English." In Readings in Cultural Contexts, edited by J. Martin, T. Nakayama, and L. Flores, 149–159. Mountain View, CA: Mayfield.
Shah, M., and C. Nair. 2012. “The Changing Nature of Teaching and Unit Evaluations in Australian
Universities.” Quality Assurance in Education 20 (3): 274–288. doi:10.1108/09684881211240321.
Short, H., R. Boyle, R. Braithwaite, M. Brookes, J. Mustard, and D. Saundage. 2008. “A Comparison of
Student Evaluation of Teaching with Student Performance.” In Proceedings of the 6th Australian
Conference on Teaching Statistics, edited by H. MacGillivray, 1–10. Melbourne, Victoria, Australia.
Stark, P., and R. Freishtat. 2014. "An Evaluation of Course Evaluations." ScienceOpen Research. doi:10.14293/s2199-1006.1.sor-edu.aofrqa.v1.
Sunindijo, R. 2016. “Teaching First-Year Construction Management Students: Lessons Learned from
Student Satisfaction Surveys.” International Journal of Construction Education and Research 12 (4):
243–254. doi:10.1080/15578771.2015.1121937.
Tucker, B. 2014. “Student Evaluation Surveys: Anonymous Comments That Offend or are Unprofessional.”
Higher Education 68 (3): 347–358. doi:10.1007/s10734-014-9716-2.
Uttl, B., and D. Smibert. 2017. “Student Evaluations of Teaching: Teaching Quantitative Courses Can
Be Hazardous to One's Career." PeerJ 5: e3299. doi:10.7717/peerj.3299.
Valencia, E. 2020. “Acquiescence, Instructor’s Gender Bias and Validity of Student Evaluation of
Teaching." Assessment and Evaluation in Higher Education 45 (4): 483–495. doi:10.1080/02602938.2019.1666085.
Vivanti, A., N. Haron, and R. Barnes. 2014. “Validation of a Student Satisfaction Survey for Clinical
Education Placements in Dietetics.” Journal of Allied Health 43 (2): 65–71.
Worthington, A. 2002. “The Impact of Student Perceptions and Characteristics on Teaching Evaluations:
A Case Study in Finance Education.” Assessment and Evaluation in Higher Education 27 (1): 49–64.
doi:10.1080/02602930120105054.