Unequal Representation and Gender Stereotypes
in Image Search Results for Occupations
Matthew Kay, Computer Science & Engineering | dub, University of Washington, mjskay@uw.edu
Cynthia Matuszek, Computer Science & Electrical Engineering, University of Maryland Baltimore County, cmat@umbc.edu
Sean A. Munson, Human-Centered Design & Engineering | dub, University of Washington, smunson@uw.edu
ABSTRACT
Information environments have the power to affect people's perceptions and behaviors. In this paper, we present the results of studies in which we characterize the gender bias present in image search results for a variety of occupations. We experimentally evaluate the effects of bias in image search results on the images people choose to represent those careers and on people's perceptions of the prevalence of men and women in each occupation. We find evidence for both stereotype exaggeration and systematic underrepresentation of women in search results. We also find that people rate search results higher when they are consistent with stereotypes for a career, and shifting the representation of gender in image search results can shift people's perceptions about real-world distributions. We also discuss tensions between desires for high-quality results and broader societal goals for equality of representation in this space.
Author Keywords
Representation; bias; stereotypes; gender; inequality; image
search
INTRODUCTION
Every day, billions of people interact with interfaces that
help them access information and make decisions. As in-
creasing amounts of information become available, systems
designers turn
to algorithms to select which information to
show to whom. These algorithms and the interfaces built on
them can influence people’s behaviors and perceptions
about the world. Both algorithms and interfaces, however,
can be biased in how they represent the world [9,34]. These
biases can be particularly
insidious when they are not trans-
parent to the user or even to the designer [28].
The infor-
mation people access affects their understanding of the
world around them and the
decisions they make: biased
information can affect
both how people treat others and
how they evaluate their own choices or opportunities.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CHI 2015, April 18–23, 2015, Seoul, Republic of Korea.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3145-6/15/04…$15.00.
http://dx.doi.org/10.1145/2702123.2702520

One of the most prevalent and persistent biases in the United States is a bias against women with respect to occupational choices, opportunities, and compensation [20,26].
Stereotypes of many careers as gender-segregated serve to
reinforce gender sorting into
different careers and unequal
compensation
for men and
women in the same career. Cul-
tivation theory, traditionally studied in the context of televi-
sion, contends that both the
prevalence and characteristics
of media portrayals can develop, reinforce, or challenge
viewers’ stereotypes [29].
Inequality in the representation of women and minorities,
and the role of online information sources in portraying and perpetuating it, have
not gone unnoticed in the technology
community. This past spring, Getty Images and LeanIn.org
announced an initiative to increase the diversity of working
women portrayed in the stock images and to improve how
they are
depicted [27].
A recent study identified discrimina-
tion in online advertising delivery: when searching for
names, search
results for black-identifying
first names were
accompanied by more ads for public records searches than
those for white-identifying
first names, and those results
were more likely to suggest searches for arrest records [34].
These findings raise
questions about the possible impacts of
this discrimination and how to
design technology in consid-
eration of issues such
as structural racism.
Despite efforts to address some of these issues, there has
been limited
public effort to measure how online infor-
mation sources represent men and women. Further,
we do
not know how people perceive these biases when
they view
information sources, or the extent to
which it affects their
choices or perceptions about the world. For example, are
gender distributions
in search results representative of
those
in the real world – and if not, how does that affect people's
perceptions of the world?
In this
paper,
we begin to address these gaps through four
studies characterizing how genders are represented in image
search results for occupations. We evaluate whether and
how these biases affect people’s
perceptions of search result
quality, their beliefs about the occupations
represented, and
the choices they make. In a series of studies on existing
image search results, manipulated search results, and peo-
ple’s perceptions of these results, we investigate the follow-
ing phenomena:
Stereotype exaggeration: While gender proportions
in
image search results are close to those in actual occupa-
tions, results for many occupations exhibit a slight exag-
geration of gender ratios according to stereotype: e.g.,
male-dominated professions tend to
have even more men
in their results than
would be expected if the proportions
reflected real-world distributions. This effect is also seen
when
people rate the quality of search
results or select the
best image from a result: they prefer images with genders
that match the stereotype
of an occupation, even
when
controlling for qualitative differences in
images.
Systematic over-/under- representation: Search results
also exhibit a slight under-representation of women in
images, such that an
occupation with
50%
women would
be expected to have about 45% women in
the results
on
average. However,
when evaluating image result quality,
people do not systematically prefer either gender: instead,
stereotyping dominates, and they prefer images that
match a given
occupation’s gender stereotype.
Qualitative differential representation: Image
search re-
sults also exhibit biases in
how genders
are depicted:
those matching the
gender stereotype
of a profession tend
to be portrayed as more professional-looking and less in-
appropriate-looking.
Perceptions
of
occupations in search results: We
find
that
people’s existing perceptions of gender ratios in
occupa-
tions are quite accurate (R2 of 0.72),
but that manipulated
search results can have a small but significant effect on
perceptions, shifting
estimations on
average ~7%.
This last point contributes to the broader motivation of this work: not only to contribute to an understanding of how everyday information systems – here, image search results – both reflect and influence perceptions about gender in occupations, but also to characterize a possible design space for correcting or adjusting for differences in representation. We do not take a stance on whether or how designers and system builders should address gender inequality and its effects in their systems, but we believe that designers should be aware of inequalities in their systems and how those inequalities can affect perceptions. We particularly note two overriding design tensions in this space: the desire to improve perceived search result quality, and societal motivations for improving equality of representation.
In the remainder of this paper, we review motivating work
and our specific research
questions. We then describe four
studies and their answers to these research
questions before
discussing the
implications for designers and society.
BACKGROUND AND MOTIVATION
The Internet and large data sets create many new opportuni-
ties for engaging with
data and using it in
communication
and to support
decision making. They also come with chal-
lenges and pitfalls. A recent
White
House report
noted that
biases in
data collection and presentation can lead to
flawed
understandings of the
need for and use of
public services,
and that this can lead to discrimination in
who receives
those services [8].
In the studies presented in
this paper,
we investigate the
prevalence and risks of gender-based stereotyping and bias
in image search results for occupations. Our research ques-
tions were guided by prior work in stereotyping and biases,
the role of media in forming, perpetuating, or challenging
these, and contemporary discussions of the effects of stereotypes and biases in information environments.
Stereotypes and bias
A stereotype refers to a belief that individuals in a group –
e.g., gender, occupation, ra
ce, ethnicity, or particular back-
ground
– generally have one or more traits or behaviors.
People make use of
stereotypes to explain their
own or oth-
ers’
behaviors
[14,35], to
justify actions or decide how to
act [4,35], and to define group boundaries [35]. While accu-
rate stereotypes may be useful for making decisions in the
absence of more specific information, inaccurate stereo-
types can be harmful. Belief that
one’s
group performs
poorly at a task can lead to lower performance (stereotype
threat
[32]). Stereotyped expectations about someone’s be-
havior can also lead them to behave in that way, a self-
fulfilling prophecy [32,38], and expectations about
one’s
own abilities can influence aspirations and choices, such as
beliefs about what career path one should follow [6,7].
Bias arises when an individual, group or process unfairly
and systematically treats an individual or
group favorably
or
unfavorably. Stereotypes about abilities or
character are a
common source of
bias [17], often to the disadvantage of a
particular race, sexual
orientation, or
gender. For example,
stereotypes about gender and parental
roles can systemati-
cally limit women’s career advancement [13,15,16].
Effects of stereotypes and bias in the media
The portrayal of
women
and racial/ethnic minorities in
tele-
vision and other media has received considerable attention
as both a possible source of stereotypes and opportunity to
challenge them
[11]. E
xclusion of these groups can imply
that they are “unimportant, inconsequential, and powerless”
[11]. Their inclusion offers specific examples whose impli-
cations depend on how
they are portrayed, and these por-
trayals can reinforce
or challenge stereotypes. Unfortunate-
ly, portrayals often
reinforce negative stereotypes, for ex-
ample by showing racial/ethnic minorities as criminals,
victims of criminals, and in low-status service jobs [11].
Cultivation theory predicts that television’s portrayal of the
world affects people’s beliefs about reality [10,31]. Portray-
als, or the lack of portrayals, can affect whether people be-
lieve that people like them commonly participate in an oc-
cupation, or their perceived self-efficacy for that role
[11,31]. Researchers studying
television commercials find
that women are less likely to be portrayed as workers and
that they exaggerate gender-occupation stereotypes [5].
They express concern that such portrayals may perpetuate
stereotypes. Cultivation theory has also been
found to
pre-
dict how people perceive risks after experiencing them in a
video game [37], and playing a sexualized female character
reduces female
players’ feelings of self-efficacy [3].
Stereotypes and bias in information systems
Like media and other built systems or environments, com-
puter systems have bias. Friedman and Nissenbaum describe
biased systems as those that “systematically and unfairly
discriminate against certain individuals or groups of indi-
viduals in favor of others” [9]. They describe three catego-
ries: pre-existing bias (arising from biases present in indi-
viduals or society), technical bias (arising from
technical
constraints), and emergent
bias (arising in real use, in which
a system is mismatched for the capabilities or
values of its
users). They argue: “freedom from bias should be counted
among the select set of criteria according to
which the qual-
ity of systems in use in society should be judged.”
Search engines have been studied and received popular at-
tention
for bias in their results, both for what they index and
present overall [19,36] and what they present to particular
users [28]. People tend to rely on search engines’
selection
and ordering
of results as signs of quality and
relevance
[21,22], and so
biased search results may affect
people’s
choices and beliefs. Scholars have previously
noted bias in
which geographic locations are indexed
and listed [36].
Others express concern that search autocomplete features
could perpetuate
preexisting biases, noting that
suggestions
varied between different religious
groups, and sexual and
racial minorities received more negatively framed questions
as suggestions [2]. As illustrated
by these examples, a
search
engine which has neither algorithms that systemati-
cally favor one group nor designers with
a particular bias
can still perpetuate preexisting societal biases: a representa-
tive indexing
of biased source material will produce results
that contain the same biases.
More recently, Getty Images and Sheryl Sandberg’s Lean In
Foundation announced an effort to improve the depiction of
working women in stock photos.
They argue that existing
images support stereotypes of working women as sidelined,
sexualized, or in supporting roles, and that these depictions
hurt women’s career aspirations and prospects [12,24,27].
RESEARCH QUESTIONS
Motivated by these concerns and questions about them, we
conducted a series of studies to evaluate bias in image
search results. Pre-existing biases that affect the images
available for image search
systems, and algorithms de-
signed to represent available content, may lead to biased
result sets, which in turn affect people’s perceptions and
choices among the search results. We specifically focus on
gender representation in image search results for occupa-
tions. We choose the portrayal of occupations because it is a
topic of societal importance that has recently received atten-
tion and efforts to ameliorate biases. While efforts such as
the partnership between Getty Images and Lean In may
make more diverse or positive images available, and partic-
ularly to those who access the Lean In collection, many
people turn to
major search engines when looking to illus-
trate a topic, and so
we focus our attention on the image
search results for a major search engine.
To the discussion of the bias in computer systems, we con-
tribute an assessment of the current extent and form of sev-
eral forms of stereotyping and differences of representation
present in image search
results: stereotype exaggeration,
systematic
over-/under-representation, and qualitative dif-
ferential representation. We also explore the effects of these
biases on perceptions of
the
occupations
in question.
We
designed four studies to answer these research questions:
Study
1: How does the prevalence of men and women in
image search
results for professions correspond to their
prevalence
in actual professions?
Are genders systemati-
cally over- or under-represented across careers, and is
there stereotype exaggeration
in gender proportions?
Study 2:
Are there qualitative differences in
how men and
women are portrayed in the image search
results?
Study 3: Do biased image search
results lead people to
perpetuate a bias in image search
results when they
choose images to represent a profession (i.e. through ste-
reotype exaggeration)?
Are there systemic over- or un-
der-representations of women in preferred results? How
do differences in representation affect people’s
percep-
tions
of the search result quality?
Study 4: Do differences in representation in image search
results affect viewers’ perceptions of the prevalence of
men and women in that occupation? Can we shift those
opinions by manipulating results?
For all studies, we recruited turkers/participants1 from
Am-
azon’s Mechanical Turk microtask market. We
required that
they be from
the United
States (as our occupation preva-
lence data is specific to that
population) and, for studies 2-
4, required them
to have the Masters qualification.
1 We use “turkers” for studies 1 and 2, where they were asked only to label data, and “participants” for studies 3 and 4, where their opinions and perceptions were solicited.
STUDY 1: GENDER PROPORTIONS IN RESULTS
COMPARED TO ACTUAL PROPORTIONS
In this study, we sought to characterize the extent to which
the prevalence of men and women in image search results
for professions correspond to their actual prevalence in
those occupations. As a gold standard for actual prevalence
of men and women by occupation, we used estimates from
the US Bureau of Labor Statistics (BLS) [5]. We did
not use all occupations, but removed occupations that:
Presented difficult polysemy problems: for example, oc-
cupations that are listed as conjunctions of multiple occu-
pations in the BLS, such as “Musicians, singers, and re-
lated workers”, are difficult to reduce to a single search.
Had non-obvious search terms: for example, “Miscella-
neous media and communication workers”.
Are typically referred to using
gender-specific terms: for
example, “tailor” and “seamstress”
Most of the remaining terms had straightforward transla-
tions from BLS categories into search terms for a worker in
that occupation (for example, we mapped “Nurse practi-
tioners” in the BLS database to the search term “nurse prac-
titioner”). Some categories required limited interpretation
(e.g., we translated “Police and sheriff’s patrol officers”
into “police officer”); for these terms, all three authors had
to agree on a corresponding search term for the category to
be included. This left us with 96 occupations having an
entry in BLS and a corresponding search term.
We then downloaded the top 100 Google Image search re-
sults for each search term (from July 26–29, 2013). For
each image, turkers were asked to indicate whether there
were no people, one person, or
more than one person
in
the
image. They were also asked whether the people were
women, men, children, or of unknown gender (and to check
all that apply).2
We had three turkers label each image.
[Figure 1 omitted: plot of empirical cumulative distribution functions; x-axis: % women in occupation; y-axis: % of occupations with women; series: full BLS dataset vs. filtered dataset.]
Figure 1. Comparison of empirical cumulative distribution functions of gender distributions in the full set of BLS occupations and our filtered dataset, showing similar distribution.
Results
Representativeness of filtered dataset
A requirement of this study was to obtain a representative
dataset of images of individuals in different occupations
with properly labelled gender. This required some filtering
to ensure that images had correctly labelled genders, de-
picted only people of that gender, and were generally imag-
es of people in
the first place. To
label gender, we took
the
majority label for each image, and dropped those images
from the results which
did not have majority agreement. We
then
dropped entire search terms which:
Had less than
80% of the images labelled
with majority
agreement (two terms failed this criterion: firefighter and
baker; notably, firefighter had only 64% agreement,
largely because most of the images were dark silhouettes
of uniformed firefighters with ambiguous gender, fre-
quently labeled as “male” by some turkers).
Had few images containing
only one gender or that most-
ly
depicted workers with
clients/patients. For
example,
hairdresser was dropped since too many of its images
contained both hairdresser and client, making it
difficult
to determine which gender label corresponds
to which.
We considered asking turkers whether the person in
ques-
tion has the given occupation; however, this implicitly
asks them to decide if a person of that gender could be a
hairdresser (thus potentially subject to
gender stereotypes
related to that
profession, which would
bias our labelled
data set), so we opted to filter occupations
with multiple
genders in the majority of images.
Had too few people in the image results; e.g., dishwasher
largely returned images of dishwashing machines.
Corresponded with a common name (e.g., baker returned
many results of people with the surname Baker).
2 As the BLS uses only binary gender classifications, we also restricted labels to binary gender classification here.
This second filtering process left us with 45 occupations. To
ensure that all levels of our filtering (from the initial selec-
tion of search terms down to the filtering of labelled imag-
es) had not biased the representativeness of our final selec-
tion of occupations in terms of gender or ethnicity, we con-
ducted a series of Kolmogorov-Smirnov tests comparing
the gender and ethnicity distributions of the filtered 45 oc-
cupations to the entire set of 535 occupations in the BLS
(using bootstrap
p
values; unlike the traditional KS test this
allows for non-continuous distributions and ties). We
did
not find evidence that
our filtered dataset significantly dif-
fered from the set
of
occupations in the BLS in terms of
gender distribution (D45,535 = 0.0997, p = 0.765),
distribu-
tion of
Asian people, (D45,535 = 0.0901, p
= 0.814), distribu-
tion of Black or African American people
(D45,535 = 0.1021,
p
= 0.729), or distribution of Hispanic or
Latino people
(D45,535 = 0.1423, p = 0.315). Note the close correspondence
of empirical cumulative distribution
functions
for the
gen-
der distribution in the filtered dataset versus the full BLS
dataset in Figure 1 (plots for ethnicity showed similar cor-
respondence).
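A permutation-style version of this test, valid in the presence of ties, can be sketched as follows; this is our own illustration (variable names and the resample count are arbitrary), not the authors' released code.

```python
import numpy as np

def ks_statistic(a, b):
    # Two-sample KS statistic: max gap between the two empirical CDFs.
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def bootstrap_ks_pvalue(a, b, n_resamples=10_000, seed=0):
    # Resample group membership from the pooled data to build a null
    # distribution for the statistic; works for discrete/tied distributions.
    rng = np.random.default_rng(seed)
    observed = ks_statistic(a, b)
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_resamples):
        perm = rng.permutation(pooled)
        if ks_statistic(perm[:len(a)], perm[len(a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_resamples + 1)

# e.g.: D, p = bootstrap_ks_pvalue(filtered_pct_women, all_bls_pct_women)
```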
Misrepresentation of genders in search results
We
ran several models to assess the possibility of systemat-
ic differences of representation in depictions of occupations
in image search
results compared to the known proportions
in the BLS. The purpose of
these models was to assess the
presence of two potential forms of quantitative differences
in representation.
The first is that
gender proportions are exaggerated accord-
ing to stereotypes for each career (stereotype exaggeration).
For example, if stereotyping occurs in image results, we
would expect a profession with 75% males in the BLS to
have more than
75% males in the image results. The second
possibility is that there is a systematic overrepresentation of
one gender, across all careers, in
the search results.
Stereotype exaggeration by career
To assess whether men or women were over- or under-
represented according to stereotypes for careers, we ran two
logistic regression models: a stereotyped and a non-
stereotyped model.
The
stereotyped model regressed the
logit of the proportion of women in the search results on the
proportion of women in BLS (exhibiting an s-curve charac-
teristic of the logit of a proportion and indicative of stereo-
typing: extreme gender proportions in the BLS are pulled
even more to the extremes in the search results). The non-
stereotyped model regressed the logit of the proportion of
women in the search results on the logit of the proportion of
women in BLS, thus not exhibiting an s-curve. While both
models can account for a systematic over-representation of
one gender across careers, only the stereotyped model can
account for the pulling at the extremes characteristic of ste-
reotyping by the typical gender of a career.
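Concretely, the two competing models can be sketched as follows, assuming a per-occupation table of labelled image counts and BLS proportions; the column names are illustrative and this is our reconstruction, not the authors' released analysis code.

```python
import numpy as np
import statsmodels.api as sm

# df columns (illustrative): n_women, n_men = labelled search-result images per
# occupation; p_bls = proportion of women in that occupation per the BLS
# (assumed strictly between 0 and 1, else it must be clipped before logit).
def logit(p):
    return np.log(p / (1 - p))

endog = df[["n_women", "n_men"]]  # binomial successes/failures per occupation

# Stereotyped model: logit(% women in results) ~ % women in BLS.
# Linear in the proportion, so the fit is s-shaped on the probability scale,
# pulling extreme occupations even further toward the extremes.
X_stereo = sm.add_constant(df["p_bls"] - 0.5)  # centered: intercept tests 50%
stereotyped = sm.GLM(endog, X_stereo, family=sm.families.Binomial()).fit()

# Non-stereotyped model: logit(% women in results) ~ logit(% women in BLS).
# Linear on the logit scale, so no exaggeration at the extremes.
X_plain = sm.add_constant(logit(df["p_bls"]))
non_stereotyped = sm.GLM(endog, X_plain, family=sm.families.Binomial()).fit()

print(stereotyped.summary())
print(non_stereotyped.summary())
```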
We found some evidence for stereotype exaggeration: the
stereotyped model had
qualitatively better residual fit
(Figure 2). Vuong’s closeness test for model fit also sug-
gested that the stereotyped model had better fit (z = 1.55,
p = 0.06). This stereotyping
effect can be seen as the overall
s-shape of the data compared
to a line with slope = 1
(we
would expect a line with slope = 1 if the data
did not exhibit
stereotype exaggeration).
Systematic over-/under- representation across careers
We can estimate overrepresentation of a gender across ca-
reers from our logistic regression model by testing to see if
the coefficient of the intercept is significantly different from
0 when the x-intercept is set to 50% women in the BLS.
Indeed, we find that the intercept does have a significant effect in this model (estimated effect: −0.26, 95% confidence interval: [−0.45, −0.07], t = −2.68, p < 0.05);3 this effect can be seen in Figure 2 as a dip in the predicted proportion of women in the search results at (50%, 50%). This effect corresponds to an odds ratio of approximately 0.77 (95% CI: [0.64, 0.93]); this means that in a profession with 50% women, we would expect about 45% of the images to be women on average (95% CI: [38.9%, 48.2%]).
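Reading off the intercept estimate above, the reported odds ratio and expected proportion follow directly (a back-of-the-envelope check rather than the paper's exact computation):

$$\mathrm{OR} = e^{-0.26} \approx 0.77, \qquad \operatorname{logit}^{-1}(-0.26) = \frac{1}{1 + e^{0.26}} \approx 0.44,$$

so an occupation that is 50% women would be expected to show roughly 44% women in its image results, consistent with the approximately 45% figure and the confidence interval reported above.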
The particular combination of stereotype exaggeration and
underrepresentation of women that we see – slight pulling
at the extremes and slight bias towards male images – com-
bine to affect male- and female-dominated professions dif-
ferently. In male-dominated professions (lower-left quad-
rant
of Figure 2A)
both effects amplify each other, so
a
higher proportion of males appear in search
results than are
in the BLS. By contrast, in female-dominated professions
(upper-right quadrant of Figure 2A) these two effects essen-
tially cancel each
other out, leading to a similar proportion
of women in the search
results as are in the BLS.
[Figure 2 omitted: two panels, (A) stereotyped model and (B) non-stereotyped model, each plotting % women in image search results against % women in occupation (BLS) alongside residuals vs. predicted values (log odds).]
Figure 2. Stereotyped (A) and non-stereotyped (B) models of image search gender distributions compared to actual distributions. Note the improved fit of the stereotyped model, suggesting stereotype exaggeration in search results.
3 This test was carried out using the stereotyped model, but we note that a similar test carried out on the non-stereotyped model yielded similar results and confidence intervals.

STUDY 2: DIFFERENCES IN QUALITATIVE REPRESENTATION
Search results can be biased even when their gender propor-
tions are representative. For example, in reviewing the im-
ages
collected for Study 1, we identified many examples of
sexualized depictions of women who were almost certainly
not engaged in the profession they portrayed; we dub this
the sexy construction worker problem, as images of fe-
male construction
workers in our results tended to
be
sexu-
alized caricatures of construction workers.
We wished
to
assess whether images that better match the stereotypical
gender
of a profession were systematically portrayed as
more or less professional, attractive, or appropriate. Note
that while there are many interesting differences to unpack
here, our primary focus is on
assessing these characteristics
so that we can
control
for them
in subsequent analysis.
Methods
We used the top 8 male and female images from each pro-
fession, as these images will be used again in study 3, be-
low. Initially, we piloted a study in which people were
asked to give 5 adjectives describing the person in each
image, but found that this task was too difficult. We there-
fore opted to select 8 adjectives derived from our pilot re-
sults and our research questions: attractive, provocative,
sexy, professional, competent, inappropriate, trustworthy,
and weird. We then had turkers indicate on a 5-point scale
(strongly disagree to strongly agree) whether they felt each
adjective described the person in the picture. Each turker
could rate each image at most once, though no turker could
rate more than 600 images. Each image was rated by at
least 3 turkers.
Results
At a high level, we found that images showing a person
matching the majority gender for a profession tend to be
ranked as slightly more professional-looking and slightly
less inappropriate than those going against stereotype.
Adjective ratings
One would expect that men and women rate images differ-
ently; however, this is not our focus here, so we have at-
tempted to factor out these differences. We conducted a
series of mixed-effects ordinal logistic regressions to model
how turkers rated images for each adjective. We included
the turker’s gender, the image gender, and their interaction
as fixed effects; we included the turker and the image as
random effects.4
This allows our models to account for (for
example) situations where women systematically rate men
as more attractive than men do. We used the coefficients of
the image effect in each model as a
normalized rating for
that adjective. These ratings have the effects of turker,
turker gender,
image gender, and their interaction factored
out and are all
approximately standard normally distributed.
Stereotyping bias in qualitative ratings
We hypothesized that images matching the typical gender
of a given profession might be portrayed differently from
images that do not match the typical gender of that profes-
sion (as in the sexy construction worker problem). To assess whether this was the case, we ran linear mixed-effects
regressions, each with one of the adjective ratings derived
above as the independent variable. Each model included
image gender, the image gender proportion
in
BLS (the %
of people in the BLS matching the gender of the image; e.g.
for a male image of a construction worker, this would be the
% of construction workers who are male according to the
BLS), and the interaction of these two terms as fixed ef-
fects. The models also included the occupation as a random
effect. As noted above, we are primarily interested in these
factors as controls in Study 3 (below), so we only summa-
rize two
high-level trends in the results here.
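One of these adjective models might be sketched as follows (shown for the professional rating), assuming one row per image with the normalized rating derived above; the column names are ours, not the paper's.

```python
import statsmodels.formula.api as smf

# df columns (illustrative): professional = normalized adjective rating for the
# image; image_gender = "male"/"female"; pct_bls_match = % of workers in the
# BLS whose gender matches the image's gender; occupation = search term.
model = smf.mixedlm(
    "professional ~ C(image_gender) * pct_bls_match",
    data=df,
    groups="occupation",  # occupation as a random intercept
)
result = model.fit()
print(result.summary())
```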
First, adjectives like professional (F1,623.6 = 36.6,
p < 0.0001), competent (F1,630 = 28.4, p < 0.0001), and
trustworthy (F1,627.8 = 33.8, p < 0.0001) had significantly
higher ratings when the proportion of people in the BLS
matching the gender of the image was higher. Second, ad-
jectives like inappropriate (F1,635.2 = 20.4, p < 0.0001)
or
provocative (F1,635.12 = 4.38, p < 0.05) had significantly
lower ratings when the proportion
of
people in the BLS
matching the
gender
of the image was
higher. In other
words, we again see an effect of stereotype exaggera-
tion: images matching the gender stereotype of a profession
tend to
be slightly
more professional-looking and slightly
less inappropriate than those
going against
stereotype.
The
reason for this effect is unclear: it may be that these images
are rated less professional/appropriate because of
raters’
biases against images going against their stereotypes for
those
professions.
However, it may also be that these depic-
tions are of lower quality – examples of the sexy construc-
tion worker problem, where depictions against
stereotype
are not true depictions of the profession at all.
4 While we have used worker and image as random effects here and elsewhere in the paper, where estimable we have also compared results with fixed effects models and found similar effects. We believe random effects to be more appropriate here as some workers have completed only a small number of tasks.

STUDY 3: PERCEPTIONS OF SEARCH RESULTS

Having described the bias present in image search results for careers – stereotype exaggeration and differences in representation – we next turn our attention to whether these differences affect people's appraisals of the quality of search results and, in a hypothetical task, what image they choose to represent an occupation. This is not a purely abstract problem: a textbook publisher recently recalled a math textbook after discovering they had selected an image from a pornographic film for the cover [18].
We generated synthetic sets of image search results for each
occupation, in which the gender balance was manipulated
by re-ranking images from the original search results. Each
synthetic result had a different gender distribution, with 8
images in each result. For each search term we generated up
to
7 synthetic results: all men, all women, equal propor-
tions, proportions from Google search, proportions from the
BLS, the reverse of
the proportions from
Google
search,
and the reverse of the proportions in the BLS.
To ensure that the proportion of
women in the BLS for a
given search term does not influence the proportion of im-
ages in the synthetic results for that search term, synthetic
subsets (other than equal) were only included if their corre-
sponding reversed subset could also be included (for exam-
ple, if
we had enough
women to make a synthetic search
result with 6/8 images of
women,
but not enough men to
make a synthetic search result with 6/8 images of men, nei-
ther synthetic result was included). This ensures that if gen-
der has no effect on the probability of an
image being se-
lected, the baseline probability of two images of different gender being selected for any occupation will be the same, regardless of the gender ratio in that occupation.
To generate a subset with k
women, we selected the top k
female images from our labelled dataset (in the order they
appeared in the original Google image search results) and
the top 8-k male images. The
images were displayed to
par-
ticipants in the order they appeared in the original search
results. Participants could view one
result set per occupa-
tion.
This was to prevent participants from
realizing
that we
manipulated the gender
proportions
of the search results, as
they might if they saw multiple synthetic results for the
same occupation with very
different
gender distributions.
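The subset-construction rule, including the requirement that a subset's reversed counterpart also be constructible, can be sketched as follows; the function and variable names are ours, and the per-occupation image lists are assumed to be ordered by original search rank.

```python
def synthetic_result(female_images, male_images, k, size=8):
    """Top-k female images plus top-(size-k) male images, if available."""
    if k > len(female_images) or (size - k) > len(male_images):
        return None
    return female_images[:k] + male_images[:size - k]

def candidate_results(female_images, male_images, k_google, k_bls, size=8):
    """Build the up-to-seven synthetic result sets for one occupation.

    A k-women subset (other than the balanced one) is kept only if its
    reversed counterpart (size-k women) can also be built, so that gender
    alone does not change an image's baseline chance of appearing.
    """
    targets = {
        "all_men": 0, "all_women": size, "equal": size // 2,
        "google": k_google, "google_reversed": size - k_google,
        "bls": k_bls, "bls_reversed": size - k_bls,
    }
    results = {name: synthetic_result(female_images, male_images, k, size)
               for name, k in targets.items()}
    for name, k in targets.items():
        if name == "equal" or results[name] is None:
            continue
        if synthetic_result(female_images, male_images, size - k, size) is None:
            results[name] = None  # drop subsets whose reverse cannot be built
    return {name: r for name, r in results.items() if r is not None}
```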
On viewing a result set, we asked participants to select one
image to illustrate the occupation
for a hypothetical busi-
ness presentation.
We then asked them to describe how well
the image results matched the search term (the
occupation),
in a drop down
from 1
(very poor) to
5
(very good), and to
describe why they rated as they did.
Image Selection Results
We used logistic regression to model the probability that a
given image is selected as the best result by a participant.
Our model included image gender, the image gender pro-
portion in BLS
(see explanation under Study 2), participant
gender, and their interactions. We also included all of the
image adjective ratings to control for differences in qualita-
tive representation. Results are shown in
Table 1.
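A minimal sketch of this selection model, assuming one row per displayed image with a 0/1 indicator of whether the participant picked it (column names are illustrative; the paper's exact specification may differ, e.g. in how the adjective controls enter):

```python
import statsmodels.formula.api as smf

# df columns (illustrative): selected (0/1); pct_bls_match = % of workers in
# the BLS matching the image's gender; image_gender; participant_gender; plus
# the normalized adjective ratings from Study 2 as controls.
adjectives = ["professional", "attractive", "inappropriate", "provocative",
              "sexy", "competent", "trustworthy", "weird"]
formula = ("selected ~ pct_bls_match * C(image_gender) * C(participant_gender)"
           " + " + " + ".join(adjectives))
fit = smf.logit(formula, data=df).fit()
print(fit.summary())
```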
                                                Est.    SE      z       p
(Intercept)                                     2.661   0.273   9.737   <0.0001  ***
% image gender in BLS                           0.928   0.414   2.244    0.0248  *
female image                                    0.239   0.360   0.663    0.5072
male participant                                0.041   0.393   0.104    0.9173
professional                                    0.089   0.082   1.082    0.2794
attractive                                      0.028   0.080   0.35     0.7262
inappropriate                                   0.018   0.109   0.168    0.8662
provocative                                     0.312   0.166   1.877    0.0606  .
sexy                                            0.021   0.087   0.244    0.8076
competent                                       0.448   0.104   4.298   <0.0001  ***
trustworthy                                     0.046   0.090   0.513    0.6078
weird                                           0.485   0.111   4.355   <0.0001  ***
% image gender in BLS × female image            0.222   0.585   0.38     0.7037
% image gender in BLS × male participant        0.003   0.594   0.005    0.996
female image × male participant                 0.011   0.512   0.021    0.9836
% image gender in BLS × female image
  × male participant                            0.175   0.832   0.21     0.8333

Table 1. Factors affecting image selection in Study 3. Coefficients are on a logit scale. Note the stereotype effect: greater % image gender in BLS is associated with higher probability that an image is selected.
Over-/under- representation and participant effects
We found no evidence of systematic over-/under- represen-
tation of either gender (there were no significant effects of
image gender). Neither were there significant effects of
participant gender (suggesting men and women generally
judge the best search result in the same way), nor any sig-
nificant interactions with either of these factors.
Stereotype exaggeration
As with the gender distributions in the search results them-
selves, we found evidence of stereotyping when people
choose image results: image gender proportion in BLS had
a significant effect on the probability of an image being
selected; i.e., an image matching the majority gender pro-
portion
of its occupation
was more likely to be selected. We
believe this is consequence of stereotype matching: an im-
age matching a person’s stereotype for that gender is more
likely to be selected as an exemplar result.
Search Result Quality Rating Results
We saw very similar effects influencing quality rating. We
ran a mixed effects ordinal logistic regression to model
quality rating
based on
proportion of women in BLS, pro-
portion
of women in the synthetic search result, participant
gender, and their interactions. We included the adjective
rating of the selected image (as possibly the most salient in
judging search quality) in the synthetic search
result to con-
trol for differences in qualitative representation. We also
included
participant and search term (occupation) as ran-
dom effects. Results are in Table 2.
                                                Est.    SE      z      p
% women in BLS                                  2.978   1.344   2.22   0.0268  *
% women in search result                        2.255   0.889   2.54   0.0112  *
male participant                                0.156   0.871   0.18   0.8581
professional                                    0.039   0.186   0.21   0.8324
attractive                                      0.160   0.180   0.89   0.3748
inappropriate                                   0.411   0.243   1.69   0.0911  .
provocative                                     0.491   0.329   1.49   0.1357
sexy                                            0.053   0.209   0.25   0.8016
competent                                       0.506   0.227   2.23   0.0257  *
trustworthy                                     0.392   0.219   1.79   0.0732  .
weird                                           0.509   0.218   2.33   0.0199  *
% women in search result × % women in BLS       5.321   1.699   3.13   0.0017  **
% women in search result × male participant     0.595   1.221   0.49   0.6261
% women in BLS × male participant               0.525   1.359   0.39   0.6992
% women in search result × % women in BLS
  × male participant                            2.036   2.325   0.88   0.3812

Table 2. Factors affecting search result quality ratings in Study 3. Coefficients are on a logit scale.
Over-/under- representation and participant effects
As above, we found no significant over-/under-
representation effect: in an occupation with 50% women,
we would not expect an all-male search result to be rated
differently from an all-female
search result (estimated dif-
ference = 0.41, SE = 0.47, z = 0.87, p = 0.38). As above,
there were no significant effects of participant gender.
Stereotype exaggeration
We again saw a stereotype exaggeration effect, manifested
here as a significant interaction between proportion of
women in BLS and proportion of women in the search re-
sult: in male-dominated
occupations, search
results with
more males are preferred; in female-dominated occupa-
tions, search
results with more females are preferred.
Viewed from the perspective of this task, these results make
sense: we asked people to select the best search result (or to
rate the quality of all results), and they tended to prefer im-
ages matching their mental image of each profession, both
in qualitative characteristics and in expected gender.
This
reflects the strong sense that people have of
expected gen-
der proportions in a broad spectrum of occupations, which
we explore in more detail next. This also emphasizes an
important
tension between possible broader societal
goals
in manipulating gender as a design
dimension in search
results versus
end-users’
quality expectations, an issue we
discuss in detail at the end of this
paper.
STUDY 4: PERCEPTIONS OF GENDER
PROPORTIONS IN OCCUPATIONS
Finally, we sought to understand whether and how gender
proportions in image search results can affect people’s per-
ceptions of
the actual prevalence of men and women in
different occupations, both to
understand how existing (ste-
reotype-exaggerating, misrepresented) results might be af-
fecting people’s
perceptions of gender proportions
and how
feasible manipulating gender distributions in image search
might be as a method for affecting those perceptions. This
gets at a primary motivation of our paper: opening up gen-
der proportions as a design
dimension in image search.
Given the many possible day-to-day influences on percep-
tions of the prevalence of genders in different fields, we
chose to collect people’s baseline perceptions, wait two
weeks, show them a synthetic image search result set for
the same career, and then immediately ask them their per-
ceptions of prevalence.
We asked each participant the demographics information
we used in studies 2 and 3. Then for each career we asked
what percent of people working in that career in the US are
women, alongside three distraction questions: what educa-
tion they believe is typical for someone in that career,
whether they believe the career was growing, and how pres-
tigious they think it is. Participants could answer for as
many careers as they wished.
After two weeks, each participant received an email thank-
ing them for their prior participation and inviting them to
participate in a new round of
tasks; we limited access both
in the script that
managed our tasks and using an assigned
qualification on Mechanical Turk. For each profession to which they had previously responded, we asked returning participants to view a synthetic search result and complete the image search task from study 3; on the next page we re-asked the four questions from the first page: typical education for the career, percent women, whether the career was growing, and its prestige.
Results
Perceptions absent influence
People’s initial perceptions of gender proportions in occu-
pations are quite good. We assessed the correlation of their
existing perceptions to real-world proportions using a
mixed-effects linear regression with gender proportions in
BLS as the fixed effect and participant as a random effect.
The marginal pseudo-R2 of this model was 0.717
(F1,297.71 = 870.21, p < 0.0001).
Perceptions after influence
After exposure to search results with manipulated gender
proportions, estimates shifted slightly in the direction of the
manipulated proportions. We ran a linear mixed-effects regression to predict perceived gender proportions, with initial perceived gender proportion and manipulated search gender proportion as fixed effects and participant as a random effect. Both fixed effects were significant: a person's original perceptions of an occupation dominated their opinion two weeks later, but approximately 7% of a person's subsequent opinion on average was determined by the result set they were exposed to (p < 0.01, see Table 3).
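A minimal sketch of this model, assuming one row per (participant, occupation) pair with the pre- and post-exposure estimates and the gender proportion of the synthetic result shown (column names are ours):

```python
import statsmodels.formula.api as smf

# df columns (illustrative): perceived_after, perceived_before = participant's
# % women estimates after/before exposure; manipulated_prop = % women in the
# synthetic result set shown; participant = worker id.
model = smf.mixedlm(
    "perceived_after ~ manipulated_prop + perceived_before",
    data=df,
    groups="participant",  # participant as a random intercept
)
result = model.fit()
print(result.summary())
# The manipulated_prop coefficient (~0.07 in Table 3) is the weight the shown
# result set carries in a participant's post-exposure estimate.
```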
While this
only shows short-term
movement due to manipu-
lated search results, cultivation theory suggests that long-
term, ongoing
exposure to such results might shift percep-
tions
over time. This suggests that there may be value in
considering gender distribution as a more deliberate design
dimension in image search
results, as we discuss next.
DISCUSSION
Our results provide
guidance on the short-term effects of
possible changes to
search engine algorithms and highlight
tensions in possible designs of search algorithms.
As a design space, what other kinds of search results
could we design and what might be the consequences?
There are two sets of adjustments that can be made: adjust-
ing the gender distribution, and adjusting the distribution of
qualitative image characteristics within genders (e.g. in-
creasing the proportion of female construction worker
re-
sults that are rated as professional or competent to correct
for the sexy construction worker problem). Taking the for-
mer as a starting point, we
outline three possible (amongst
many) ways of adjusting search results: 1) exaggerating, or
continuing to accept, exaggerated gender stereotypes; 2)
aligning the gender balance of results to match reality; or 3)
balancing the genders represented in results.
These models also surface several design
tensions in this
space. In particular, we might ask if our goal is to improve
perceptions
of search result quality, or to advance a broader
social agenda to shift perceptions of gender equality in
var-
ious
professions. While potentially at odds in the short term
(e.g., highly stereotyped results
might be highly rated but
not have desirable societal effects), cultivation theory also
suggests these goals may not be as contrary over the long
term if perceptions can be shifted to match a broader goal
of equal representation (as, at
least in the short term, Study
4 suggests is
possible). We
discuss how these motivations
interact in more detail for each
proposed model. We
do not
wish to come
down on any side of these issues, but
wish to
advance an
understanding of how the choices people make
in designing algorithms can (and
already do) define a de-
sign space that explicitly or implicitly affects these issues.
1. Stereotype model. Exaggerate gender stereotypes.
This
might improve subjective assessment of
quality over base-
line results if the dominant gender is already represented as
professional and appropriate, so
would likely not require
correcting for qualitative differences (simplifying
the tech-
nical problem). This would also give more exemplars for
the selection task. At the same time, continuing to
portray
careers as more gender-segregated than they are, or even
further exaggerating gender imbalances, has the disad-
vantage of potentially
influencing people’s
perceptions of
occupation gender distributions over time to be less accu-
rate and reinforcing stereotypes that can shape and limit
career aspirations and treatment.
                                                Est.    SE      df      t       p
(Intercept)                                     0.074   0.021   103.6    3.51   <0.001  ***
% women in manipulated search result            0.070   0.024   206.3    2.90   <0.01   **
perceived % women in occupation
  before manipulation                           0.803   0.033   211.8   24.57   <0.001  ***

Table 3. Effects of the manipulated search result and a person's pre-existing opinion of % women in an occupation on their opinion after seeing the manipulated result (Study 4).

2. Reality model. Correct the slight stereotype exaggeration and underrepresentation of women seen in the data so
that gender
distributions
better resemble the BLS. So long
as we can select images of the non-dominant gender that
have
high
professionalism and low inappropriateness, this
would at least better represent the reality of the profession
while having little effect on the perceived
search
result
quality or the selection task. Over the long term, exposure
to such
results might improve people’s estimations of real-
world gender proportions in occupations.
This also repre-
sents only a small perturbation to existing
results, and may
not even require adjustments to distributions to account
for
qualitative differences in
representation due to how close
the existing search
proportions are to actual BLS propor-
tions.
3. Balanced model.
Adjust the gender proportions in occu-
pations to
be equal or closer to equal. This may impair the
selection task
by giving
fewer gender exemplars. However,
if this is paired
with corrections for qualitative differences
in r
epresentation so that portrayals of the non-dominant
gender are similar to the dominant one (particularly for pro-
fessionalism), we do
not believe it would significantly de-
grade people’s overall perceptions of the quality of the re-
sults. This model exposes a tension between a desire for
results perceived as high-quality and possible societal goals
for advancing equal representation.
While the short-term
effects on
perceived search
result quality would likely be
negative, both cultivation theory [31] and the results of
study 4 predict that this could, in the long
term, shift peo-
ple’s perceptions towards a less gender-stereotyped view
of
these professions. Along
with that long-term shift, a possi-
ble result may be that perceptions of
quality shift back as
people
begin to
perceive gender proportions as more equal.
Feasibility of Manipulating Representation
Automatic gender classification of images of people is an
outstanding problem. While state-of-the-art classifiers per-
form well under certain circumstances, they have historical-
ly
focused on straight-on
images of a single face, typically
without visual occlusions, uneven lighting, or complicated
backgrounds [25]. However,
recent studies on gender clas-
sification on images collected in the wild [1,30] or
of
only
partial features [23] strongly suggest that automated solu-
tions will soon reach or surpass the accuracy of human an-
notators. Meanwhile, automated human labelling of these
images would not be much more costly than the data collec-
tion processes used
in
this paper, and would provide ground
truth data for future automated approaches.
Limitations and Future Work
We have focused here on search results absent personaliza-
tion or other additional context (such as time of year or lo-
cation) that may affect results shown to users in real-world
tasks. While the businessperson making a slide presentation
might be seeking an accurate depiction of
a given profes-
sion, other users with different
goals might
prefer
carica-
tured or inaccurate portrayals, such as the sexy construction
worker. We
also do not
address the cause of the biases and
misrepresentation found (e.g.,
are these due to actual preva-
lence in
webpages,
or due to the ranking algorithms used by
Google?). Future work might
try to tease these effects apart,
and to investigate these phenomena in other types of image
search, such as on
photo sharing sites and in social media.
To aid in replication,
our data
and code are available
online: https://github.com/mjskay/gender-in-image-search.
CONCLUSION
Academics and the technology community have raised con-
cerns about potential biases in search engines and in stock
photos. We contribute an assessment of gender representa-
tion in image search results and its effects on perceived
search
result quality, images selected, and
perceptions about
reality. We find that image search results for occupations
slightly exaggerate gender stereotypes and portray the minority gender for an occupation less professionally. There
is also a slight underrepresentation of
women. This stereo-
type exaggeration is consistent with perceptions of result
quality – people believe results are better when they agree
with the stereotype – but
risks reinforcing or even increas-
ing perceptions of actual
gender segregation in careers.
Addressing concerns such as
these in search engines and
other information sources, however, requires balancing de-
sign tensions. For example, maintaining perceived search
quality and accurately representing available materials may
be at odds with supporting socially desirable outcomes and
representing either real-world distributions of careers or
idealized
distributions
of careers. We hope to advance a
constructive
discussion on gender representation as
a design
dimension (explicit and implicit) in information
systems.
ACKNOWLEDGEMENTS
We thank
Gilbert Bernstein and Benjamin Mako
Hill for
their valuable feedback
on this work.
REFERENCES
1.
Arigbabu
OA; Ahmad SMS, Adnan WAN,
Yussof
S,
Iranmanesh V, Malallah, FL,
Gender recognition on
real
world faces based on shape representation and neural
network.
ICCOINS 2014.
2.
Baker P, Potts A. “Why do
white people have thin lips?”
Google and the perpetuation of stereotypes via auto-
complete search forms. Crit Disc St 10(2): 187-204.
3. Behm-Morawitz E, Mastro D. The Effects of the Sexual-
ization of Female Video Game Characters on Gender
Stereotyping and Female Self-Concept. Sex Roles 2009;
61(11-12): 808-823.
4.
Bodenhausen GV, Wyer RS. Effects of stereotypes in
decision making and information-processing
strategies.
J Pers Soc Psychol 1985; 48(2): 267.
5.
Bureau
of Labor Statistics. Labor Force Statistics from
the Current Population Survey, Section 11.
5 February
2013. http://www.bls.gov/cps/aa2012/cpsaat11.htm.Coltrane
S, Adams M. Work–family imagery and gender stereo-
types:
Television and the reproduction of difference. J
Vo c a t B e h a v 1997; 50(2): 323-347.
6.
Correll SJ. Gender and the career choice process: The
role of biased self-assessments. Am J Sociol 2001;
106(6): 1691–1730.
7.
Correll SJ. Constraints into preferences: Gender, status,
and emerging career aspirations. Am Sociol Rev 2004;
69(1): 93-113.
8.
Executive Office of the President. Big Data: Seizing
Opportunities, Preserving
Values. May 2014.
9.
Friedman B,
Nissenbaum H. Bias in computer systems.
ACM T Inform Syst 1996; 14(3): 330-347.
10.
Gerbner G, Gross L, Morgan
M, Signorielli N. Living
with television: The dynamics of the cultivation
process.
Perspectives on media effects 1986: 17-40.
11.
Graves SB. Television and Prejudice Reduction:
When
Does
Television as a Vicarious Experience Make a Dif-
ference?
J Soc
Issues 1999; 55(4): 707-727.
12.
Grossman P.
New Partnership with LeanIn.org. InFocus
by Getty Images. http://infocus.gettyimages.com/post/
new-partnership-with-leaninorg.
13.
Halpert JA, Wilson ML, Hickman JL. Pregnancy as a
source of bias in performance appraisals. J Organ Behav
1993.
14.
Haslam SA, Turner JC, Oakes PJ, Reynolds KJ, D oosje,
B From personal pictures in
the head to collective tools
in the word: how shared stereotypes allow groups to rep-
resent and change social reality. In C McGarty,
VY Yz-
erbyt, R Spears (eds.). Stereotypes as explanations: The
formation of meaningful beliefs about social
groups
2002. Cambridge University Press, 157-185.
15.
Heilman ME. Description
and prescription: How gender
stereotypes prevent women's ascent up the organization-
al ladder.
J Soc
Issues 2001; 57(4): 657-674.
16.
Heilman
ME. Okimoto,
TG. 2008. Motherhood: A po-
tential source of bias in employment decisions. J Appl
Psychol 2008; 93(1): 189-198.
17. Hilton JL, & Von Hippel W. S tereotypes. Annu Rev Psy-
chol 1996; 47(1): 237-271.
18. Hooper B. Porn star appears on cover of Thai math text-
book. United Press International.
http://upi.com/5031410787947
19.
Introna L, Nissenbaum
H. Defining
the web:
The poli-
tics of search engines.
Computer 2000; 33(1): 54-62.
20.
Jacobs J. Gender Inequality at Work. Thousand
Oaks,
CA: SAGE Publications, 1995.
21.
Kammerer,Y,
Gerjets P. How search engine users evalu-
ate and select Web search results: The impact of the
search engine interface on
credibility assessments. Libr
Inform Sci 2012; 4: 251–279.
22.
Keane MT, O'Brien M, Smyth B.
Are people biased in
their use of search engines?
Commun ACM 2008; 51(2):
49-52.
23.
Khorsandi R,
Abdel-Mottaleb M.
Gender classification
using 2-D ear images and sparse representation. 2013
IEEE Workshop
on
Applications of Computer Vision
(WACV), 461-466.
24.
Lean
In Foundation.
Getty Image Collection.
http://leanin.org/getty.
25.
Makinen E,
Raisamo R. Evaluation of Gender Classifi-
cation
Methods
with
Automatically Detected
and Aligned
Faces, IEEE T
Pattern Anal 2008; 30(3): 541-547.
26.
Massey D. Categorically Unequal: The American Strati-
fication System. NY: Russell Sage Foundation,
2007.
27.
Miller CC. 10
February 2014. LeanIn.org and
Getty Aim
to Change Women’s Portrayal in Stock Photos. New
York Times, B3.
http://nyti.ms/1eLY7ij
28.
Pariser E. The Filter Bubble: What the Internet Is Hid-
ing from
You. 2011. Penguin Press.
29.
Potter WJ. Cultivation theory and research, Hum Com-
mun Res 1993; 19(4): 564-601.
30. Shan C. Learning local binary patterns for gender classification on real-world face images. Pattern Recogn Lett 2012; 33(4): 431–437.
31. Shrum LJ. Assessing the Social Influence of Television: A Social Cognition Perspective on Cultivation Effects. Commun Res 1995; 22(4): 402–429.
32. Spencer SJ, Steele CM, Quinn DM. Stereotype threat and women's math performance. J Exp Soc Psychol 1999; 35(1): 4–28.
33. Snyder M, Tanke ED, Berscheid E. Social perception and interpersonal behavior: On the self-fulfilling nature of social stereotypes. J Pers Soc Psychol 1977; 35(9): 656–666.
34. Sweeney L. Discrimination in online ad delivery. Commun ACM 2013; 56(5): 44–54.
35. Tajfel H. Social stereotypes and social groups. In Turner JC, Giles H (eds.). Intergroup Behaviour. Oxford: Blackwell, 1981, 144–167.
36. Vaughan L, Thelwall M. Search engine coverage bias: evidence and possible causes. Inform Process Manag 2004; 40(4): 693–707.
37. Williams D. Virtual Cultivation: Online Worlds, Offline Perceptions. J Commun 2006; 56(1): 69–87.
38. Word CO, Zanna MP, Cooper J. The nonverbal mediation of self-fulfilling prophecies in interracial interaction. J Exp Soc Psychol 1974; 10(2): 109–120.
39. Tang X, Liu K, Cui J, Wen F, Wang X. IntentSearch: Capturing User Intention for One-Click Internet Image Search. IEEE T Pattern Anal 2012; 34(7): 1342–1353.
40. Zha ZJ, Yang L, Mei T, Wang M, Wang Z, Chua TS, Hua XS. Visual query suggestion: Towards capturing user intent in internet image search. ACM T Multim Comput 2010; 6(3): a13.