Gender Bias in Chatbot Design

Jasper Feine, Ulrich Gnewuch, Stefan Morana, and Alexander Maedche

Institute of Information Systems and Marketing (IISM),
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
{jasper.feine,ulrich.gnewuch,stefan.morana,alexander.maedche}@kit.edu
Abstract. A recent UNESCO report reveals that most popular voice-based
conversational agents are designed to be female. In addition, it outlines the
potentially harmful effects this can have on society. However, the report focuses
primarily on voice-based conversational agents and the analysis did not include
chatbots (i.e., text-based conversational agents). Since chatbots can also be
gendered in their design, we used an automated gender analysis approach to
investigate three gender-specific cues in the design of 1,375 chatbots listed on
the platform chatbots.org. We leveraged two gender APIs to identify the gender
of the name, a face recognition API to identify the gender of the avatar, and a
text mining approach to analyze gender-specific pronouns in the chatbot’s
description. Our results suggest that gender-specific cues are commonly used in
the design of chatbots and that most chatbots are – explicitly or implicitly – designed to convey a specific gender. More specifically, most of the chatbots have female names, female-looking avatars, and are described as female chatbots. This is particularly evident in three application domains (i.e., branded
conversations, customer service, and sales). Therefore, we find evidence that
there is a tendency to prefer one gender (i.e., female) over another (i.e., male).
Thus, we argue that there is a gender bias in the design of chatbots in the wild.
Based on these findings, we formulate propositions as a starting point for future
discussions and research to mitigate the gender bias in the design of chatbots.
Keywords: Chatbot · Gender-specific cue · Gender bias · Conversational agent
1 Introduction
Text- and voice-based conversational agents (CAs) have become increasingly popular
in recent years [19]. Many organizations use chatbots (i.e., text-based CAs) in short-
term interactions, such as customer service and content curation [22], as well as in
long-term interactions, such as personal assistants or coaches [21]. Research has found
that chatbots can increase user satisfaction [41], positively influence perceived social
presence [3], and establish long-term relationships with users [7]. Additionally, large
technology companies have successfully deployed voice-based CAs (e.g., Microsoft’s
Cortana, Amazon’s Alexa, and Apple’s Siri) on many devices such as mobile phones,
smart speakers, and computers.
Despite the many beneficial aspects of this technology, a recent UNESCO report
[43] from 2019 sheds light on the negative implications of the gendered design of most
commercial voice-based CAs. The report reveals that most voice-based CAs are
designed to be “female exclusively or female by default” [43]. For example, their name (e.g., Alexa, Cortana, Siri), their voice (i.e., voices of Alexa and Cortana are exclusively female), and how they are advertised (e.g., “Alexa lost her voice”) often cause
female gender associations. This can lead to the manifestation of gender stereotypes.
For example, since users mostly interact with voice-based CAs using short command-
like phrases (e.g., “tell me the weather”), people might deem this form of interaction
style as appropriate when conversing with (female) CAs and potentially even (female)
humans [38]. Consequently, the report highlights the urgent need to change gender
expectations towards CAs before users become accustomed to their default (female)
design [43].
While the UNESCO report provides interesting insights on gender-specific cues in
the design of voice-based CAs and its potential implications, the report does not
include an analysis of chatbots since they are “not always as clearly gendered because
their output is primarily written text, not speech”[43, p. 92]. However, many studies
have shown that the gender of a chatbot can also be manifested without spoken voice
using other social cues such as name tags or avatars [e.g., 2,3,5,9,16,24,32].
Moreover, these studies suggest that gender-specific cues in the chatbot’s design can
have both positive [e.g., 5,24] and negative outcomes [e.g., 2,9].
Therefore, we argue that there is a need to analyze how chatbots – in contrast to voice-based CAs – are gendered (i.e., through gender-specific cues in their design) and
whether there is evidence of a potential gender bias in the design of chatbots. To the
best of our knowledge, an empirical analysis of gender-specific cues in the design of
chatbots in the wild has not been conducted so far. To address this gap and to com-
plement the findings of the UNESCO report, we investigate the research question of
how gender-specific cues are implemented in the design of chatbots.
To address this question, we analyzed the design of 1,375 chatbots listed on the
platform chatbots.org. In our analysis, we focused on three cues that can indicate a
specific gender, namely the chatbot’s name, avatar, and description. In the following,
we refer to these cues as gender-specific cues. Our findings suggest that there is a
gender bias in the design of chatbots. More specifically, we find evidence that there is a
trend towards female names, female-looking avatars, and descriptions including female
pronouns, particularly in domains such as customer service, sales, and brand repre-
sentation. Overall, our work contributes to the emerging field of designing chatbots for
social good [20] by highlighting a gender bias in the design of chatbots and thus,
complementing the findings of the recent UNESCO report on the design of voice-based
CAs. Subsequently, we derive propositions to provide a starting point for future dis-
cussion and research in order to mitigate this gender bias and pave the way towards a
more gender-equal design of chatbots.
2 Related Work
2.1 Gender-Specific Cues of Conversational Agents
CAs are software-based systems designed to interact with humans using natural language [14]. This means that users interact with CAs via voice-based or text-based interfaces in a similar way as they usually interact with other human beings. Research on CAs, and in particular on text-based CAs (i.e., chatbots), has been around for several decades [e.g., 42]. However, the hype around this technology did not start until 2016 [10]. Due to the widespread adoption of mobile-messaging platforms (e.g., Facebook Messenger) and advances in the field of artificial intelligence (AI) [10], chatbots became one of the most hyped technologies in recent years in research and practice [3].

Extant research in the context of CAs builds on the Computers-Are-Social-Actors (CASA) paradigm [33]. The CASA paradigm states that human users perceive computers as social actors and treat them as relevant social entities [32]. Therefore, humans respond to computers in a similar way as they usually react to other human beings (e.g., by saying thank you to a computer). These reactions particularly occur when a computer exhibits social cues that are similar to cues usually expressed by humans during interpersonal communication [16]. Since CAs communicate via natural language (i.e., a central human capability), social reactions towards CAs almost always occur [16]. For example, humans apply gender stereotypes to CAs whenever they display specific social cues such as a male or female name, voice, or avatar (see Table 1 for an overview [18]). Moreover, not only rather obvious social cues, such as the avatar, voice, or name, indicate a specific gender; even the movements of an animated avatar are sufficient to do so [40]. Thus, it “appears that the tendency to gender stereotype is deeply ingrained in human psychology, extending even to machines” [34].
Table 1. Exemplary studies investigating the impact of a CA’s gender-specific cues.

| Type of CA | Investigated cue | Investigated gender | User reaction towards gender-specific cue | Reference |
|---|---|---|---|---|
| Voice-based | Voice | Female, male | Gender impacts competence and perceived friendliness of CA | [34] |
| Voice-based | Avatar | Female, male | A specific gender was not preferred | [13] |
| Voice-based | Avatar, voice | Female, male | Gender impacts the comprehension scores and impression ratings | [27] |
| Text-based | Avatar | Female, male | Gender influences the impact of excuses to reduce user frustration | [24] |
| Text-based | Avatar | Ambiguous, female, male | Gender impacts comfort, confidence, and enjoyment. Users did not prefer gender-ambiguous CAs | [35] |
| Voice-based | Avatar, voice | Female, male | Gender impacts perceived power, trust, expertise, and likability | [37] |

(continued)
2.2 Gender Bias in the Design of Voice-Based Conversational Agents
Since there is limited research on specific design guidelines for CAs [21, 29], major technology companies actively shape how CAs are designed [21]. However, the design of the major voice-based CAs (e.g., Cortana, Alexa, Google Assistant, Siri) raises considerable concerns as to whether the leadership position of technology companies in the design of CAs is desirable [43]. For example, if users directed sexual insults towards Siri, she used to answer “I’d blush if I could” (until April 2019) and now answers, “I don’t know how to respond to that” (since April 2019) [43].

Gender manifestations in the design of CAs also reinforce gender manifestations in users’ perceptions of CAs. This can have severe implications for everyday interpersonal interactions. For example, the fact that most of the female voice-based CAs act as personal assistants leads to the general user expectation that these types of CAs should be female [43]. Moreover, it “creates expectations and reinforces assumptions that women should provide simple, direct and unsophisticated answers to basic questions” [43, p. 115]. Therefore, such a development reinforces traditional gender stereotypes. This is particularly harmful since many children interact with voice-based CAs and gender stereotypes are primarily instilled at a very young age [9].
Similarly, the active interventions of chatbot engineers into human affairs (e.g., establishing a gender bias in the design of chatbots) raise ethical concerns. Several institutions warn against a gender-biased development of (interactive) systems. For example, the UNESCO report [43] proposes several recommendations to prevent digital assistants from perpetuating gender biases. Recently, the European Union’s High-Level Expert Group on AI defined guidelines for trustworthy AI that also highlight the importance of equality, non-discrimination, and solidarity [15]. Myers and Venable [31] propose ethical guidelines for the design of socio-technical systems and also emphasize the importance of empowerment and emancipation for all. Moreover, several research associations (e.g., AIS, ACM) provide ethical guidelines to ensure ethical practice by emphasizing the importance of designing for a gender-equal society [39].
Table 1. (continued)

| Type of CA | Investigated cue | Investigated gender | User reaction towards gender-specific cue | Reference |
|---|---|---|---|---|
| Text-based | Name, avatar | Female, male, robotic | Gender impacts the attribution of negative stereotypes | [9] |
| Voice-based | Avatar | Female, male | Gender impacts learning performance and learning effort | [26] |
| Text-based | Avatar | Female, male | Gender impacts learning performance | [23] |
| Text-based | Avatar | Female, male | Gender impacts the belief in the credibility of advice and competence of agent | [5] |
3 Method
To answer our research question, we analyzed three cues in the design of a broad
sample of chatbots. Currently, there are several online platforms that list chatbots, but
there is no central repository. Therefore, we decided to rely on the data provided by
chatbots.org. Chatbots.org is a large online community with 8,000 members. Members can add their chatbots to the repository and provide additional information (e.g., name, avatar, language, description, application purpose). We selected chatbots.org because it is one of the longest-running chatbot directory services (established in 2008) [6] and has been used in research before [e.g., 25].
For our analysis, we retrieved the data of all chatbots listed on chatbots.org on June
28, 2019 by using a web crawler. This resulted in a data sample consisting of 1,375
chatbots, including their name, avatar, description, and other meta-information such as the application domain (i.e., chatbots.org assigns twelve non-mutually exclusive application domains to the listed chatbots).
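The crawler itself is not described in detail in this paper. As a minimal illustration, a retrieval step could look like the following Python sketch, assuming a standard requests/BeautifulSoup pipeline; the base URL is the directory named above, while the CSS selectors and detail path are hypothetical placeholders that would have to be adapted to the actual page structure of chatbots.org.

```python
# Illustrative crawler sketch (not the authors' code); selectors and paths are placeholders.
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.chatbots.org"

def fetch_chatbot(detail_path: str) -> dict:
    """Download one chatbot detail page and extract name, avatar URL, and description."""
    response = requests.get(BASE_URL + detail_path, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    name_tag = soup.select_one("h1")      # placeholder selector
    avatar_tag = soup.select_one("img")   # placeholder selector
    return {
        "name": name_tag.get_text(strip=True) if name_tag else None,
        "avatar_url": avatar_tag.get("src") if avatar_tag else None,
        "description": " ".join(p.get_text(" ", strip=True) for p in soup.select("p")),
    }
```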
In our analysis, we focused on three cues: the chatbot’s (1) name, (2) avatar, and
(3) description. We selected these cues since several studies revealed that the gender of
the (1) name and the (2) avatar of a chatbot trigger stereotypical gender responses [e.g.,
5,9] and (3) that written text can convey gender attributes and personality traits [4].
Given our large sample size, we decided to automatically extract and analyze the
gender-specific design (female, male, none) of these three cues using available tools
and services. In addition, to validate our automated approach, we randomly selected
and manually coded 100 chatbots.
Our automated gender analysis approach is illustrated in Fig. 1. First, to investigate the gender of the chatbots’ names, we used two online services that identify the gender of a given name, namely www.gender-api.com (covering the gender of over two million names) and the node.js package “gender-detection” [36] (covering the gender of over 40,000 names). Only if both services recognized the same gender for a chatbot name did we include it in the analysis. Second, to investigate the gender of an avatar, we used Microsoft Azure’s Face API [30], which is able to detect, identify, analyze, organize, and tag faces in photos and also to extract the gender of a face.
Fig. 1. Automated gender analysis approach to investigate gender-specific cues in the design of chatbots: for the 1,375 chatbots listed on chatbots.org, the name is analyzed via gender-api.com and the node.js package “gender-detection”, the avatar via Microsoft Azure’s face recognition API, and the description via text mining of gender-specific pronouns.
Therefore, we used this API to analyze 1,373 chatbot avatar pictures that we downloaded from chatbots.org (two chatbots did not have a picture). Finally, to analyze the chatbot’s description, we text-mined the descriptions of all retrieved chatbots to identify gender-specific pronouns that refer to one of the two genders (female: “she”, “her”, “hers”; male: “he”, “him”, “his”) [4]. After excluding ambiguous descriptions (i.e., descriptions including both female and male pronouns), we assigned a gender to the description of a chatbot. Table 2 shows the results of the automated gender analysis approach for three examples.
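The implementation details of the name and description checks are not reported here; the following Python sketch only illustrates the underlying logic. The two lookup functions are hypothetical stand-ins for calls to gender-api.com and the node.js “gender-detection” package, and the pronoun sets correspond to those listed above.

```python
# Minimal sketch of the name-agreement and pronoun-mining checks (not the authors' code).
import re

FEMALE_PRONOUNS = {"she", "her", "hers"}
MALE_PRONOUNS = {"he", "him", "his"}

def name_gender(name, lookup_gender_api, lookup_gender_detection):
    """Assign a gender to a name only if both services agree; otherwise 'none'."""
    g1 = lookup_gender_api(name)        # e.g., "female", "male", or None
    g2 = lookup_gender_detection(name)
    return g1 if g1 is not None and g1 == g2 else "none"

def description_gender(description):
    """Assign a gender from pronouns; descriptions with both genders count as 'none'."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    has_female = bool(tokens & FEMALE_PRONOUNS)
    has_male = bool(tokens & MALE_PRONOUNS)
    if has_female and not has_male:
        return "female"
    if has_male and not has_female:
        return "male"
    return "none"

# Example (Frank, Verizon): no gender-specific pronouns, so the result is "none".
print(description_gender("Frank answers all of your Verizon customer service support questions."))
```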
To investigate the reliability of our automated gender analysis approach, we examined whether there are conflicting results between the three methods (e.g., a chatbot has a male name and a female avatar). In total, we identified only 15 conflicts in our result set. Subsequently, we analyzed these conflicts in more detail and manually coded all conflicting gender-specific cues. Overall, seven of these conflicts were caused by a wrong gender assignment to a chatbot’s avatar. After analyzing these wrong assignments, we found that Microsoft’s face recognition API potentially has problems assigning the correct gender to low-resolution cartoon avatars. Another five conflicts were caused by the text mining approach: in these cases, all pronouns in the chatbot’s description referred to another person (e.g., the chatbot engineer) and thus not to the chatbot itself. Finally, two chatbot names were labeled incorrectly since the names (i.e., Nima, Charlie) are not clearly gendered and thus could have been assigned to either gender.
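For reference, a call to the Face API’s detect endpoint as it was offered at the time of the study could look roughly like the following Python sketch. This is not the authors’ code: the endpoint and key are placeholders, and Microsoft has since restricted access to the gender attribute, so the exact request parameters should be checked against the current documentation.

```python
# Hedged sketch of an avatar gender lookup with Microsoft Azure's Face API
# (REST "detect" call with returnFaceAttributes=gender, as available in 2019).
import requests

def avatar_gender(image_bytes: bytes, endpoint: str, key: str) -> str:
    """Return 'female', 'male', or 'none' for the first detected face in an avatar image."""
    response = requests.post(
        f"{endpoint}/face/v1.0/detect",
        params={"returnFaceAttributes": "gender"},
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/octet-stream"},
        data=image_bytes,
        timeout=30,
    )
    response.raise_for_status()
    faces = response.json()
    if not faces:  # e.g., low-resolution cartoon avatars where no face is detected
        return "none"
    return faces[0].get("faceAttributes", {}).get("gender", "none")
```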
Table 2. Exemplary results of the automated gender analysis approach.

| Name (Company) | Excerpt of description | (1) Name | (2) Avatar | (3) Description |
|---|---|---|---|---|
| SOphiA (BASF) | “SOphiA is an Intranet Interactive Assistant used internally by BASF for its worldwide operations. She answers questions about […].” | Female | Female | Female |
| Frank (Verizon) | “Frank answers all of your Verizon customer service support questions.” | Male | Male | None |
| BB (KLM) | “[…]. BB has her own professional, helpful and friendly character, but be warned; she can also be a bit cheeky from time to time. […]” | None | None | Female |
Table 3. Comparison of automatic gender analysis approach and manual coding for a subsample of 100 randomly selected chatbots.

| Cue | Conflicts between automated and manual coding | Not recognized gender-specific cues |
|---|---|---|
| Name | 0 | 25 (15 female, 10 male) |
| Avatar | 0 | 20 (17 female, 3 male) |
| Description | 0 | 0 |
To further validate the reliability of the automated gender analysis approach, we retrieved a random sample of 100 chatbots from the total sample of 1,375 chatbots. The first and second author manually coded the gender of the name, avatar, and description of these chatbots. There were no disagreements between the two coders. Subsequently, we compared the results of the manual coding with our automated approach. The comparison showed that there were no conflicts between the genders that were identified. However, as illustrated in Table 3, the manual coding approach resulted in the identification of more gender-specific names and avatars. Most names that were not recognized as having a gender were female. Similarly, most of the avatars that were not recognized were female.
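The comparison itself is straightforward. The following Python sketch (not the authors’ original script) shows how the two quantities reported in Table 3 (conflicts and not recognized cues) could be computed for one cue from parallel lists of automated and manual labels, each being “female”, “male”, or “none”.

```python
# Illustrative comparison of automated vs. manual gender labels for one cue.
def compare_labels(automated, manual):
    conflicts = sum(
        1 for a, m in zip(automated, manual)
        if a != "none" and m != "none" and a != m
    )
    not_recognized = sum(
        1 for a, m in zip(automated, manual)
        if a == "none" and m != "none"
    )
    return {"conflicts": conflicts, "not_recognized": not_recognized}

# Example: the manual coder finds a female name that the automated approach missed.
print(compare_labels(["female", "none", "none"], ["female", "female", "none"]))
# -> {'conflicts': 0, 'not_recognized': 1}
```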
4 Results
In the following, we present the results of our automated analysis of three cues (i.e.,
name, avatar, and description) in our sample of 1,375 chatbots. First, we provide an overview of the total number of gendered chatbots before reporting the gender distribution (i.e., female vs. male) of the gender-specific cues and their distribution across the chatbots’ application domains.
In total, we identified the gender of 620 chatbot names (45.09% of all investigated
chatbots), 347 chatbot avatars (25.24%), and 497 chatbot descriptions (36.15%) using
our automated approach. As illustrated in Fig. 2, there are some overlaps between the cues. Overall, 501 (36.44%) of the chatbots did not have one gender-specific cue. In addition, we identified 874 chatbots (63.56%) with at least one gender-specific cue (i.e., either a gendered name, avatar, or description). Moreover, 469 chatbots (34.11%) had at least two gender-specific cues, and 121 chatbots (8.80%) had all three gender-specific cues (i.e., a gendered name, avatar, and description). Taken together, the results suggest that the majority of chatbots listed on chatbots.org are gendered in their design.

Fig. 2. Distribution of gender-specific names, avatars, and descriptions in the investigated chatbot sample. No gendered cue: 501 chatbots (36.44%); one gendered cue: 405 (29.44%); two gendered cues: 348 (25.32%); three gendered cues: 121 (8.80%). Overlaps: only name: 199 (14.47%); only avatar: 65 (4.72%); only description: 141 (10.25%); name & avatar: 113 (8.22%); name & description: 187; avatar & description: 48 (3.50%); name & avatar & description: 121 (8.80%).
Next, we identified whether the gender-specific cues are female or male. As shown
in Fig. 3, the large majority of gender-specific names were female (76.94%).
analyses of avatars and descriptions revealed similar results: 77.56% of the avatars
were classified as female and 67.40% of the descriptions were classified as female.
These results strongly suggest that most chatbots are designed to be female.
Our analysis of gendered chatbots and their application domains revealed that 48.90% of them belong to only three application domains, namely branded conversations, customer service, and sales (see Table 4). Additionally, most domains (8) were clearly dominated by female names and only three domains by male names. The same patterns emerged in the analyses of avatars (i.e., all but one domain were dominated by female avatars) and descriptions (i.e., only four categories were dominated by male descriptions). Taken together, we conclude that the gender bias is particularly evident in the design of chatbots for specific application domains such as branded conversations, customer service, and sales.
Fig. 3. Gender-specific distribution of investigated cues.
Table 4. Chatbot application domains listed on chatbots.org and their gender-specific design (note: application domains are not mutually exclusive).

| Application domain | Description of application domain as listed on chatbots.org | All chatbots | Gendered names | Gendered avatars | Gendered descriptions |
|---|---|---|---|---|---|
| Animals & aliens | “Speaking, listening and responding virtual animals, cartoonlike characters or creatures from space” | 20 | Female: 1, Male: 2 | Female: 0, Male: 0 | Female: 4, Male: 11 |
| Branded conversations | “Dialogues on behalf of an organization, on a product or service” | 511 | Female: 257, Male: 54 | Female: 137, Male: 32 | Female: 162, Male: 38 |
| Campaign | “Designed for a limited time serving a campaign objective” | 61 | Female: 9, Male: 11 | Female: 13, Male: 6 | Female: 4, Male: 8 |
| Customer service | “To answer questions about delivered goods or services” | 532 | Female: 251, Male: 55 | Female: 137, Male: 22 | Female: 164, Male: 33 |
| Knowledge management | “To acquire information from employees through natural language interaction” | 63 | Female: 30, Male: 3 | Female: 13, Male: 1 | Female: 16, Male: 4 |
| Market research | “Conducting surveys with consumers through automated chat” | 16 | Female: 6, Male: 0 | Female: 3, Male: 0 | Female: 6, Male: 0 |
| Sales | “A conversion of a dialogue focused on closing the deal” | 236 | Female: 106, Male: 16 | Female: 61, Male: 9 | Female: 81, Male: 10 |
| Clone | “A virtual version of a real human being, whether still alive or a historic person” | 40 | Female: 3, Male: 14 | Female: 5, Male: 20 | Female: 2, Male: 11 |
| E-Learning | “Human like characters in virtual reality and augmented reality with a scripted role” | 21 | Female: 4, Male: 0 | Female: 2, Male: 1 | Female: 7, Male: 6 |
| Gaming | “Conversational characters in games or virtual worlds” | 14 | Female: 5, Male: 2 | Female: 0, Male: 1 | Female: 5, Male: 5 |
| Proof of concept | “Demonstrational versions created by professional developers on their own websites” | 152 | Female: 52, Male: 18 | Female: 28, Male: 6 | Female: 45, Male: 28 |
| Robot toy | “Physical robotic gadgets with natural language processing capabilities” | 1 | Female: 0, Male: 0 | Female: 0, Male: 0 | Female: 0, Male: 1 |
5 Discussion
In this paper, we show that gender-specific cues are commonly used in the design of chatbots in the wild and that many chatbots are – explicitly or implicitly – designed to convey a specific gender. This gendering ranges from names and avatars to the textual descriptions used to introduce chatbots to their users. More specifically, most of the chatbots have female names, female-looking avatars, and are described as female chatbots. Thus, we found evidence that there is a tendency to prefer one gender (i.e., female) over another (i.e., male). Therefore, we conclude that there is a gender bias in the design of chatbots. This gender bias is particularly evident in three domains (i.e., customer service, branded conversations, and sales).
Our findings not only mirror the results of the UNESCO report [43] on gender bias in voice-based CAs, but also support an observation already made in 2006. In their analysis of gender stereotypes implemented in CAs, De Angeli and Brahnam [2] conclude that virtual assistants on corporate websites “are often embodied by seductive and nice looking young girls” (p. 5). Considering the majority of chatbots currently used in customer service or marketing, one could argue that not much has changed since then. Although recent studies have raised concerns about ethical issues of gender stereotyping in chatbot design [e.g., 28], there are no guidelines for a gender-equal design of chatbots that could support chatbot engineers in diminishing gender stereotypes (at least) in the context of text-based CAs. Since gender-specific cues are often perceived even before users interact with the chatbot, they have a large impact on how users interact with it [9]. Therefore, discussions between researchers, practitioners, and users will be highly important to answer relevant questions (e.g., “Should a chatbot have a specific gender?”, “Is it even possible to avoid gender attributions?”). To provide a starting point for discussions and suggest avenues for future research, we formulate four propositions (P) that could help to mitigate the gender bias and pave the way towards a more gender-equal design of chatbots.
P1: Diverse Composition of Chatbot Development Teams: The technology sector, its programmers, and also chatbot engineers are often dominated by males (i.e., “brogramming”) [19]. Without criticizing the individual chatbot engineer, decision makers could foster a more gender-equal distribution in teams that develop socio-technical systems that actively intervene in human affairs, such as chatbots. This could reduce potential gender biases, since women generally tend to produce less gender-biased language than men [4]. A more diverse team composition is also in line with the “ACM Code of Ethics and Professional Conduct”, which states that “computing professionals should foster fair participation of all people”, also based on their “gender identity” [1]. Moreover, chatbot design teams should not solely consist of engineers but should also include people from different domains, such as linguistics and psychology.
P2: Leverage Tool-Support for Identifying Gender Biases in Chatbot Design:
Comprehensive tool support could help chatbot engineers to avoid potential gender
stereotypes in their development. Since gender stereotypes are often processed (and
also implemented) in an unconscious manner [8], active tool support could help chatbot engineers to avoid their mindless implementation. A similar approach has been proposed in the context of general software evaluation. For example, the method “GenderMag” [11] uses personas and cognitive walkthroughs in order to identify gender inclusiveness issues in software. Such an approach could therefore also help chatbot engineers. While more effort is needed to develop tools that automatically evaluate the gender inclusiveness of chatbot designs, a first warning mechanism seems easy to implement, as sketched below. For example, chatbot engineers could use the methods described in this paper, namely gender analysis of chatbot names, avatar analysis using face recognition, and text mining of descriptions. Additionally, chatbot configuration tools could support chatbot engineers in making gender-equal design decisions [e.g., 17, 18].
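As a concrete illustration of such a warning mechanism, the following Python sketch (hypothetical, not an existing tool) takes the per-cue gender labels produced by checks such as those described in Sect. 3 and flags designs in which all gendered cues point to the same gender.

```python
# Hypothetical pre-deployment check; the dictionary holds per-cue labels
# ("female", "male", or "none") produced by checks like those in Sect. 3.
def gender_cue_warnings(cue_genders: dict) -> list:
    """Return human-readable warnings about gender-specific cues in a chatbot design."""
    gendered = {cue: g for cue, g in cue_genders.items() if g != "none"}
    warnings = []
    if gendered:
        warnings.append(f"Gender-specific cues detected: {gendered}")
    if gendered and len(set(gendered.values())) == 1:
        warnings.append("All gendered cues point to the same gender; "
                        "check whether this is an intentional design decision.")
    return warnings

# Example: a chatbot with a female name and a female avatar but a neutral description.
print(gender_cue_warnings({"name": "female", "avatar": "female", "description": "none"}))
```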
P3: Avoid “Female-by-Default” Chatbot Designs: Overall, it does not appear necessary to give a chatbot a default (female) gender. However, it is currently not clear whether developing non-gendered chatbots or challenging human perceptions of chatbot gender is the solution. Thus, chatbot engineers and the research community are still far from resolving these issues, and the community should be open to discussing them. Nevertheless, chatbot engineers need to actively implement mechanisms to respond to unsavory user queries in order to avoid the manifestation of gender stereotypes in the use of chatbots [9]. For example, Apple’s Siri no longer encourages gender-based insults (e.g., “I’d blush if I could”). Other CAs do not pretend to have a gender (e.g., if users ask Cortana, “what is your gender?”, Cortana automatically replies, “technically, I’m a cloud of infinitesimal data computation” [43]). However, further research is needed to investigate user-centered designs and mechanisms to mitigate and discourage negative stereotyping in the use of chatbots.
P4: Promote Ethical Considerations in Organizations: Although gender equality is one of the UN sustainable development goals [39], gender-specific cues in the design of CAs rarely attract the attention of governments and international organizations [43]. Therefore, decision makers and engineers need to take the first step and examine each chatbot design for potential gender stereotypes and other ethical issues. By actively promoting such considerations, chatbot development teams and other people engaged in the development process will profit from an increased awareness that helps build more gender-equal societies. Such endeavors could further complement the ongoing discussions about gender-equal designs of algorithmic decision systems and other types of artificial intelligence [e.g., 12]. Finally, such organization-driven approaches could complement the work of regulators in promoting a more gender-equal chatbot design.
5.1 Limitations and Future Research
There are limitations of this study that need to be addressed. First, our analysis is based
on a limited sample of chatbots. Although we did not differentiate between commercial
and research-based chatbots and did not check if they are still online, we argue that our
sample provides a sufficient base to draw conclusions about gender-specific cues in the
design of chatbots. Future research could investigate gender-specific cues of different
chatbots using other samples and data sources such as BotList.co. This would help to
create a broader overview of the gender bias and would enhance our understanding of
the current design of chatbots.
Second, our automated approach for identifying the gender of the chatbot’s name, avatar, and description might be susceptible to false positives and false negatives. To address this limitation, we validated our approach by manually analyzing a subsample of 100 chatbots. Because we did not identify any false positive result, we argue that the gender-specific cues identified by the approach are quite accurate. However, the manual analysis also revealed that our automated approach did not identify all gender-specific cues and indicated a few conflicts between the three methods. For example, Azure’s face recognition API struggled with extracting a gender from low-resolution cartoon avatars, and some pronouns in the descriptions did not refer to the chatbot. Therefore, we can only interpret the results of the automated gender analysis approach as a conservative estimate of the number of gender-specific cues in the chatbot sample. Thus, the true number of gendered chatbots might be much higher. Despite this limitation, we believe that our findings still hold because, according to our manual analysis, most of the unrecognized gender-specific cues were female.
Third, while our analysis included three important cues, several other cues in the design of chatbots that may convey a gender-specific attribution could be considered. Therefore, future research could extend our analysis to other relevant gender-specific cues [16].
6 Conclusion
In this study, we examined the gender-specific design of three cues in the design of
1,375 chatbots using an automated gender analysis approach. Our results provide
evidence that there is a gender bias in the design of chatbots because most chatbots
were clearly gendered as female (i.e., in terms of their name, avatar, or description).
This bias is particularly evident in three application domains (i.e., branded conversa-
tions, customer service, and sales). Therefore, our study complements the findings of a
recent UNESCO report that identified a gender bias in the design of voice-based CAs
and provides propositions as a starting point for future discussions and research.
References
1. ACM: Code of Ethics and Professional Conduct. https://www.acm.org/code-of-ethics
(2019). Accessed 26 July 2019
2. de Angeli, A., Brahnam, S.: Sex Stereotypes and Conversational Agents (2006)
3. Araujo, T.: Living up to the chatbot hype: the influence of anthropomorphic design cues and
communicative agency framing on conversational agent and company perceptions. Comput.
Hum. Behav. 85, 183–189 (2018). https://doi.org/10.1016/j.chb.2018.03.051
4. Artz, N., Munger, J., Purdy, W.: Gender issues in advertising language. Women Lang. 22(2),
20 (1999)
5. Beldad, A., Hegner, S., Hoppen, J.: The effect of virtual sales agent (VSA) gender –product
gender congruence on product advice credibility, trust in VSA and online vendor, and
purchase intention. Comput. Hum. Behav. 60,62–72 (2016). https://doi.org/10.1016/j.chb.
2016.02.046
6. Bhagyashree, R.: A chatbot toolkit for developers: design, develop, and manage
conversational UI (2019). https://hub.packtpub.com/chatbot-toolkit-developers-design-
develop-manage-conversational-ui/. Accessed 22 July 2019
7. Bickmore, T.W., Picard, R.W.: Establishing and maintaining long-term human-computer
relationships. ACM Trans. Comput.-Hum. Interact. 12(2), 293–327 (2005). https://doi.org/
10.1145/1067860.1067867
8. Bohnet, I.: What Works. Harvard University Press (2016)
9. Brahnam, S., de Angeli, A.: Gender affordances of conversational agents. Interact. Comput.
24(3), 139–153 (2012). https://doi.org/10.1016/j.intcom.2012.05.001
10. Brandtzaeg, P.B., Følstad, A.: Chatbots: changing user needs and motivations. Interactions
25(5), 38–43 (2018). https://doi.org/10.1145/3236669
11. Burnett, M., et al.: GenderMag: a method for evaluating software’s gender inclusiveness.
Interact. Comput. 28(6), 760–787 (2016). https://doi.org/10.1093/iwc/iwv046
12. Council of Europe: Discrimination, artificial intelligence, and algorithmic decision-making
(2018). https://rm.coe.int/discrimination-artificial-intelligence-and-algorithmic-decision-
making/1680925d73
13. Cowell, A.J., Stanney, K.M.: Manipulation of non-verbal interaction style and demographic
embodiment to increase anthropomorphic computer character credibility. Int. J. Hum.-
Comput. Stud. 62(2), 281–306 (2005). https://doi.org/10.1016/j.ijhcs.2004.11.008
14. Dale, R.: The return of the chatbots. Nat. Lang. Eng. 22(5), 811–817 (2016). https://doi.org/
10.1017/S1351324916000243
15. EU: Ethics Guidelines for Trustworthy AI (2019). https://ec.europa.eu/futurium/en/ai-
alliance-consultation. Accessed 30 July 2019
16. Feine, J., Gnewuch, U., Morana, S., Maedche, A.: A taxonomy of social cues for
conversational agents. Int. J. Hum.-Comput. Stud. 132, 138–161 (2019). https://doi.org/10.
1016/j.ijhcs.2019.07.009
17. Feine, J., Morana, S., Maedche, A.: Designing a chatbot social cue configuration system. In:
Proceedings of the 40th International Conference on Information Systems (ICIS). AISel,
Munich (2019)
18. Feine, J., Morana, S., Maedche, A.: Leveraging machine-executable descriptive knowledge
in design science research –the case of designing socially-adaptive chatbots. In: Tulu, B.,
Djamasbi, S., Leroy, G. (eds.) DESRIST 2019. LNCS, vol. 11491, pp. 76–91. Springer,
Cham (2019). https://doi.org/10.1007/978-3-030-19504-5_6
19. Følstad, A., Brandtzæg, P.B.: Chatbots and the new world of HCI. Interactions 24(4), 38–42
(2017). https://doi.org/10.1145/3085558
20. Følstad, A., Brandtzaeg, P.B., Feltwell, T., Law, E.L.-C., Tscheligi, M., Luger, E.A.: SIG:
chatbots for social good. In: Extended Abstracts of the 2018 CHI Conference on Human
Factors in Computing Systems, SIG06:1‐SIG06:4. ACM, New York (2018). https://doi.org/
10.1145/3170427.3185372
21. Følstad, A., Skjuve, M., Brandtzaeg, P.: Different chatbots for different purposes: towards a
typology of chatbots to understand interaction design, pp. 145–156 (2019)
22. Gnewuch, U., Morana, S., Maedche, A.: Towards designing cooperative and social
conversational agents for customer service. In: Proceedings of the 38th International
Conference on Information Systems (ICIS). AISel, Seoul (2017)
23. Hayashi, Y.: Lexical network analysis on an online explanation task. Effects of affect and
embodiment of a pedagogical agent. IEICE Trans. Inf. Syst. 99(6), 1455–1461 (2016).
https://doi.org/10.1587/transinf.2015CBP0005
24. Hone, K.: Empathic agents to reduce user frustration. The effects of varying agent
characteristics. Interact. Comput. 18(2), 227–245 (2006). https://doi.org/10.1016/j.intcom.
2005.05.003
25. Johannsen, F., Leist, S., Konadl, D., Basche, M., de Hesselle, B.: Comparison of commercial
chatbot solutions for supporting customer interaction. In: Proceedings of the 26th European
Conference on Information Systems (ECIS), Portsmouth, United Kingdom, 23–28 June
2018
26. Kraemer, N.C., Karacora, B., Lucas, G., Dehghani, M., Ruether, G., Gratch, J.: Closing the
gender gap in STEM with friendly male instructors? On the effects of rapport behavior and
gender of a virtual agent in an instructional interaction. Comput. Educ. 99,1–13 (2016).
https://doi.org/10.1016/j.compedu.2016.04.002
27. Louwerse, M.M., Graesser, A.C., Lu, S.L., Mitchell, H.H.: Social cues in animated
conversational agents. Appl. Cogn. Psychol. 19(6), 693–704 (2005). https://doi.org/10.1002/
acp.1117
28. McDonnell, M., Baxter, D.: Chatbots and gender stereotyping. Interact. Comput. 31(2), 116–
121 (2019). https://doi.org/10.1093/iwc/iwz007
29. McTear, M.F.: The rise of the conversational interface: a new kid on the block? In: Quesada,
J.F., Martín Mateos, F.J., López-Soto, T. (eds.) FETLT 2016. LNCS (LNAI), vol. 10341,
pp. 38–49. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69365-1_3
30. Microsoft: Face recognition API (2019). https://azure.microsoft.com/en-us/services/
cognitive-services/face/. Accessed 22 July 2019
31. Myers, M.D., Venable, J.R.: A set of ethical principles for design science research in
information systems. Inf. Manag. 51(6), 801–809 (2014). https://doi.org/10.1016/j.im.2014.
01.002
32. Nass, C., Moon, Y.: Machines and mindlessness: social responses to computers. J. Soc.
Issues 56(1), 81–103 (2000). https://doi.org/10.1111/0022-4537.00153
33. Nass, C., Steuer, J., Tauber, E.R.: Computers are social actors. In: Proceedings of the
SIGCHI Conference on Human Factors in Computing Systems, pp. 72–78. ACM, New York
(1994). https://doi.org/10.1145/191666.191703
34. Nass, C., Moon, Y., Green, N.: Are machines gender neutral? Gender-stereotypic responses
to computers with voices. J. Appl. Soc. Psychol. 27(10), 864–876 (1997). https://doi.org/10.
1111/j.1559-1816.1997.tb00275.x
35. Niculescu, A., Hofs, D., van Dijk, B., Nijholt, A.: How the agent’s gender influence users’
evaluation of a QA system. In: International Conference on User Science and Engineering (i-
USEr) (2010)
36. npmjs: Gender-detection (2019). https://www.npmjs.com/package/gender-detection. Accessed 22 July 2019
37. Nunamaker, J.E., Derrick, D.C., Elkins, A.C., Burgoon, J.K., Patton, M.W.: Embodied
conversational agent-based kiosk for automated interviewing. J. Manag. Inf. Syst. 28(1), 17–
48 (2011). https://doi.org/10.2753/mis0742-1222280102
38. Rosenwald, M.S.: How millions of kids are being shaped by know-it-all voice assistants
(2019). https://www.washingtonpost.com/local/how-millions-of-kids-are-being-shaped-by-
know-it-all-voice-assistants/2017/03/01/c0a644c4-ef1c-11e6-b4ff-ac2cf509efe5_story.html?
noredirect=on&utm_term=.7d67d631bd52. Accessed 16 July 2019
39. United Nations: Sustainability development goals. Goal 5: gender equality (2015). https://
www.sdgfund.org/goal-5-gender-equality. Accessed 30 Oct 2019
40. Vala, M., Blanco, G., Paiva, A.: Providing gender to embodied conversational agents. In:
Vilhjálmsson, H.H., Kopp, S., Marsella, S., Thórisson, Kristinn R. (eds.) IVA 2011. LNCS
(LNAI), vol. 6895, pp. 148–154. Springer, Heidelberg (2011). https://doi.org/10.1007/978-
3-642-23974-8_16
41. Verhagen, T., van Nes, J., Feldberg, F., van Dolen, W.: Virtual customer service agents.
Using social presence and personalization to shape online service encounters. J. Comput.-
Mediat. Commun. 19(3), 529–545 (2014). https://doi.org/10.1111/jcc4.12066
42. Weizenbaum, J.: ELIZA - a computer program for the study of natural language
communication between man and machine. Commun. ACM 9(1), 36–45 (1966)
43. West, M., Kraut, R., Chew, H.E.: I’d blush if I could: closing gender divides in digital skills
through education (2019). https://unesdoc.unesco.org/ark:/48223/pf0000367416