Proceedings of Machine Learning Research 110, 2021 AAAI Workshop on Diversity in Artificial Intelligence (AIDBEI 2021)
Measuring Diversity of Artificial Intelligence Conferences
Ana Freire ana.freire@upf.edu
Lorenzo Porcaro lorenzo.porcaro@upf.edu
Universitat Pompeu Fabra, Barcelona
Roc Boronat, 138. 08018 Barcelona (Spain)
Emilia Gómez emilia.gomez@upf.edu
Joint Research Centre, European Commission.
Edificio Expo, Calle Inca Garcilaso, 3. 41092 Sevilla (Spain)
Abstract
The lack of diversity in the Artificial Intelligence (AI) field is nowadays a concern, and several initiatives such as funding schemes and mentoring programs have been designed to overcome it. However, there is no indication of how these initiatives actually impact AI diversity in the short and long term. This work studies the concept of diversity in this particular context and proposes a small set of diversity indicators (i.e. indexes) for AI scientific events. These indicators are designed to quantify the diversity of the AI field and monitor its evolution. We consider diversity in terms of gender, geographical location and business (understood as the presence of academia versus industry). We compute these indicators for the different communities of a conference: authors, keynote speakers and organizing committee. From these components we compute a summarized diversity indicator for each AI event. We evaluate the proposed indexes for a set of recent major AI conferences and discuss their values and limitations.
Keywords: Diversity, Artificial Intelligence, Diversity Indicators, Gender.
1. Introduction
It is well recognized that the Artificial Intelligence (AI) field is facing a diversity crisis, and that the lack of diversity contributes to perpetuating historical biases and power imbalances. Different reports, such as the European Ethics guidelines for trustworthy AI1 and the latest AI Now Institute report (West et al. (2019)), emphasize the urgency of fighting for diversity and of re-considering diversity in a broader sense, including gender, culture, origin and other attributes such as discipline or domain that can contribute to a more diverse research and development of AI systems.
As a consequence, the research community has established different initiatives for increasing diversity, such as mentoring programs, visibility efforts, travel grants, committee diversity chairs and special workshops2. However, there is no mechanism to measure and monitor the diversity of a scientific community and thus assess the impact of these different initiatives and policies.
In order to address this, we propose a methodology to monitor the diversity of a scientific community. We focus on scientific conferences as they are currently the most relevant outlet for AI research dissemination. We consider diversity in terms of gender, geographical location and academia vs industry (possibly to be extended further) and incorporate three different aspects of a scientific conference: authors, keynote speakers and organizers. After a literature review on diversity, we present the proposed indicators and illustrate them on a set of high-impact AI conferences.

1. https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
2. See, for instance, the activities launched by the Women in Machine Learning initiative: https://wimlworkshop.org

©2021 A. Freire, L. Porcaro & E. Gómez.
2. Background
2.1. The concept and measurement of diversity
Addressing the problem of conceptualizing diversity is a long-lasting debate in the academic community, an object of study in several disciplines such as ecology, geography, psychology, linguistics, sociology, economics and communication, among others. The interest in estimating the degree of diversity is often justified by the relevance of its possible impact: from the promotion of pluralism and gender, racial and cultural equality, to the enhancement of productivity, innovation and creativity in sociotechnical systems (Stirling (2007)). Underlining its ubiquity, Stirling defines diversity in a very broad sense as "an attribute of any system whose elements may be apportioned into categories".
We can find in this definition two words which reflect different dimensions of diversity: elements and categories. The latter is strictly related to the concept of richness, which can be interpreted as the number of categories present in a system. The former is instead connected to the evenness of a system, i.e. the distribution of elements across the categories. Richness and evenness are the two facets of what the literature calls dual-concept diversity (McDonald and Dimmick (2003)). Along with them, disparity is a third dimension of diversity, describing the difference between categories (Stirling (2007)).
Nonetheless, even if Stirling's definition can easily be generalized to different contexts, it is fundamental to notice that several interpretations of diversity can be adopted, according to the context of use. Indeed, completely abstracting from the social context in which a technology is implanted when modelling diversity can be misleading, as Selbst et al. (2018) discuss. Even if those authors focus on the concept of fairness, the issues they identify likewise arise when treating diversity, given the several aspects these two values have in common, as pointed out by Celis et al. (2016). Similarly, Mitchell et al. (2020) discuss the link between fairness and diversity, and emphasize the difference between heterogeneity, related to the variety of any attribute within a set of instances, and diversity intended as variety with respect to sociopolitical power differentials, such as gender and race. As Drosou et al. (2017) affirm when analyzing diversity in Big Data applications, diversity can hardly be defined in a universal, unique way.
The issues which arise in the conceptualization of diversity are reflected in the attempts made to establish a universal formula to audit the diversity of a system. Nevertheless, in several fields different needs have led to the formulation of different measurements which are still in use and effective today. In the following, we refer to a diversity index as a measure able to quantify the relationship between elements distributed in the categories of a system.
Two diversity indexes still being widely used were proposed at the end of the 1940s: the commonly called Shannon index (H') (Shannon (1948)), and the Simpson index (D) (Simpson (1949)). Although originating from two different fields, namely Information Theory and Ecology, both are based on the idea of choice and uncertainty. Indeed, Shannon defined his formula by asking what measure would be suitable to describe the degree of uncertainty involved in choosing at random one event within a set of events. Similarly, Simpson formulated his index by measuring the probability of randomly choosing two individuals from the same group within a population.
The main limitation of these indexes is their focus on the analysis of the frequency of the elements, leaving aside any semantic information. Bar-Hillel and Carnap (1953) discuss this limit, considering also the meaning of symbols in contrast to the frequentist approach. This semantic gap of diversity measurements can be partly solved by the introduction of the third dimension of diversity, disparity, which joins variety and balance to create a more solid framework for diversity analysis. This dimension is present in the Rao-Stirling diversity index:

∆ = Σ_{i,j; i≠j} (d_ij)^α (p_i · p_j)^β   (1)

where d_ij indicates the disparity between elements i and j, while p_i and p_j are the proportional representations of those elements. This index, initially proposed by Rao (1982) and revisited by Stirling (2007), is often considered when analyzing research interdisciplinarity in Scientometrics studies, even if its validity is still being discussed, as recently done by Leydesdorff et al. (2019).
In the next sections, we focus separately on the indexes we will use for our diversity analysis.
2.2. Shannon Index

H' = − Σ_{i=1}^{S} p_i ln p_i   (2)

Consider that p = n/N is the proportion of individuals of one particular species, i.e. the number of individuals found n divided by the total number of individuals found N, and S is the number of species.
The Shannon index takes values between 1.5 and 3.5 in most ecological studies, and is rarely greater than 4. This measure increases as both the richness and the evenness of the community increase. The fact that the index incorporates both components of biodiversity can be seen as both a strength and a weakness. It is a strength because it provides a simple, synthetic summary, but it is a weakness because it makes it difficult to compare communities that differ greatly in richness.
2.2.1. Pielou Index
The Shannon evenness, discarding the richness, can be computed by means of the Pielou diversity index (Pielou (1966)):

J' = H' / H'_max   (3)

where H' is the Shannon diversity index and H'_max is the maximum possible value of H' (reached if every species were equally likely):

H'_max = − Σ_{i=1}^{S} (1/S) ln(1/S) = ln S   (4)

J' is constrained between 0 and 1, where 1 means the highest evenness.
2.3. Simpson Index
The Simpson diversity index is a dominance index because it gives more weight to common or dominant species; a few rare species with only a few representatives will not affect the diversity.

D = Σ_{i=1}^{S} n_i(n_i − 1) / (N(N − 1))   (5)

Since D takes values between 0 and 1 and approaches 1 in the limit of a mono-culture, 1 − D provides an intuitive proportional measure of diversity that is much less sensitive to species richness. Thus, Simpson's index is usually reported as its complement 1 − D.
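To make the definitions above concrete, the three indexes can be sketched in a few lines of Python. This is a minimal illustration under our own naming conventions (the function names are not part of the paper's tooling), assuming a community is given as a flat list of category labels:

```python
import math
from collections import Counter

def shannon(labels):
    """Shannon index H' = -sum(p_i ln p_i), Equation 2."""
    counts = Counter(labels)
    n_total = sum(counts.values())
    return -sum((n / n_total) * math.log(n / n_total)
                for n in counts.values())

def pielou(labels):
    """Pielou evenness J' = H' / ln S, Equations 3-4; constrained to [0, 1]."""
    s = len(set(labels))
    if s <= 1:
        return 0.0  # a single category: no evenness to measure
    return shannon(labels) / math.log(s)

def simpson(labels):
    """Simpson index D = sum(n_i (n_i - 1)) / (N (N - 1)), Equation 5."""
    counts = Counter(labels)
    n_total = sum(counts.values())
    return sum(n * (n - 1) for n in counts.values()) / (n_total * (n_total - 1))
```

For example, a perfectly even two-category community yields a Pielou evenness of 1.0, while a mono-culture yields a Simpson index of 1.0 (hence a complement 1 − D of 0).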
3. Diversity indexes for AI conferences
This work proposes several diversity indicators to measure gender, geographical and business diversity in top Artificial Intelligence conferences. Gender diversity is the main focus of programs such as Women in ML3; geographic diversity is linked to the presence of different countries and cultures in AI research. Finally, academia vs industry provides a way to assess the type of institutions contributing to AI research. We think these are three key socio-economic aspects of AI communities. All our indicators base their formulation on the biodiversity indexes described in the previous sections.
3.1. Gender Diversity Index (GDI)
We consider S different species in the gender dimension: "male", "female" and any gender identity beyond the binary framework. We should distinguish between two possible cases:
- Only three species collected ("male", "female" and "other"). In this case, richness is not so relevant, while evenness gains more importance; therefore, we can discard the Simpson index and compute the Shannon evenness (discarding richness) by means of the Pielou diversity index.
- More species (gender identities) collected: we can apply the Shannon index in order to measure the evenness together with the richness.
To compute the Diversity Index, we consider three different communities, as they represent complementary contributions to the scientific event: keynotes (k), authors (a) and organisers (o). Our final GDI is a weighted average of the Pielou index of each community:

GDI = w_k J'_k + w_a J'_a + w_o J'_o   (6)

By default, we give the same weight to keynotes, authors and organisers, although this can be configured to give more relevance to certain groups:

W = [w_k, w_a, w_o] = [1/3, 1/3, 1/3]   (7)

3. https://wimlworkshop.org/
3.2. Geographical Diversity Index (GeoDI)
In order to compute the Geographical Diversity Index we consider the same three communities: keynotes, authors and organisers. As we have multiple species (countries), we want to measure the richness together with the evenness, so we apply a weighted average of the per-community Shannon Index (this index may be greater than 1), using the weights W defined in Equation 7:

GeoDI = w_k H'_k + w_a H'_a + w_o H'_o   (8)

This index could also be computed using the Simpson Index, but that would suppress the effect of very infrequent species (few people from some countries).
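Equation 8 can be sketched analogously to the GDI (illustrative names of our own, one country label per person):

```python
import math
from collections import Counter

def _shannon(labels):
    """Shannon index H' = -sum(p_i ln p_i) over the observed categories."""
    counts = Counter(labels)
    n_total = sum(counts.values())
    return -sum((n / n_total) * math.log(n / n_total)
                for n in counts.values())

def geo_diversity_index(keynotes, authors, organisers,
                        weights=(1 / 3, 1 / 3, 1 / 3)):
    """GeoDI (Equation 8): weighted average of per-community Shannon indexes.
    Unlike GDI, this value is not capped at 1."""
    return sum(w * _shannon(community)
               for w, community in zip(weights, (keynotes, authors, organisers)))
```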
3.3. Business Diversity Index (BDI)
The Business Diversity Index aims to compute the diversity of a conference regarding the presence of industry, academia and research centres. Thus, we apply Equation 3, considering S = 3 when computing H'_max (similar to the GDI with 3 species). The weights W are defined in Equation 7:

BDI = w_k J'_k + w_a J'_a + w_o J'_o   (9)
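Note that Equation 9 differs from the generic Pielou evenness in that H'_max is fixed at ln 3 for the three sectors, even when a sector is absent from a community. A sketch (names are our own illustration):

```python
import math
from collections import Counter

def business_diversity_index(keynotes, authors, organisers,
                             weights=(1 / 3, 1 / 3, 1 / 3)):
    """BDI (Equation 9): weighted average of per-community evenness, with
    H'_max = ln 3 for the sectors academia / industry / research centre."""
    def evenness(labels):
        counts = Counter(labels)
        n_total = sum(counts.values())
        h = -sum((n / n_total) * math.log(n / n_total)
                 for n in counts.values())
        return h / math.log(3)  # S = 3 fixed, as in the paper
    return sum(w * evenness(community)
               for w, community in zip(weights, (keynotes, authors, organisers)))
```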
3.4. Conference Diversity Index (CDI)
The general Diversity Index of a Conference (CDI) is computed by averaging GDI, GeoDI and BDI. Since typical values of the Shannon index lie between 1.5 and 3.5 in most ecological studies and are rarely greater than 4, GeoDI needs to be normalized to [0,1] before being combined with the other indexes, so we divide it by 3.5. Table 1 summarises all the proposed indexes.

CDI = (GDI + GeoDI/3.5 + BDI) / 3   (10)
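The combination in Equation 10 is then a one-liner (illustrative sketch):

```python
def conference_diversity_index(gdi, geodi, bdi, geodi_max=3.5):
    """CDI (Equation 10): mean of GDI, GeoDI normalized by 3.5, and BDI."""
    return (gdi + geodi / geodi_max + bdi) / 3
```

For example, a conference with GDI = 0.90, GeoDI = 1.75 and BDI = 0.89 obtains CDI = (0.90 + 0.50 + 0.89) / 3 ≈ 0.76.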
4. Indexes evaluation
In this section we describe the procedures for handling the data in order to evaluate the suitability of the proposed indexes to represent the diversity of major AI events.
4.1. Dataset
The data publicly available from AI conferences is restricted to the name, surname and affiliation of authors, keynotes and organisers. This leads to some limitations when computing the proposed indicators. For instance, no data is provided about gender, so it needs to be inferred based on the name and surname, which introduces some errors and an oversimplification to binary labels. Another limitation affects the way in which we compute the geographical diversity index: having information just about the affiliation and not about the nationality makes ethnicity-based analysis extremely difficult to perform. However, these limitations could be solved if conference organisers collected more data, for instance at registration time. We are aware that some of this data might be personal and sensitive information, but collecting it can be beneficial for this kind of statistical analysis, provided that strict privacy and governance rules are ensured.
In order to compute the diversity indexes, we need to measure p = n/N (i.e. the proportion of individuals of one particular species found, n, divided by the total number of individuals found, N) and S (i.e. the number of species). For this purpose, we collected the names and affiliations of keynotes, organisers and authors (of a random sample comprising 10% of the papers) of four consecutive years of NeurIPS4, RecSys5 and ICML6. The size of each sample is listed in Table 2. This data was gathered in several hackfests using a collaborative web application7 designed for this purpose, which we also used to engage with AI students and the research community (e.g. at the AAAI conference) and to raise awareness of the relevance of the topic and the need for community efforts. All the project data and material is openly available so it can be reproduced and extended to other conferences. Note that our indicators are exclusively based on public-domain information available in conference proceedings.
4.1.1. Computing GDI
When computing the Gender Diversity Index, our dataset does not provide gender identity information, so we infer the gender based on the given first name and surname (in some cases, we made use of the NamSor gender classifier library8). Due to this limitation of the publicly available datasets for identifying more gender options, we obtained S = 2 species, "male" and "female", and we used the Pielou diversity index ([0,1]). As mentioned before, this limitation can be overcome if the dataset includes more gender options (for instance, if this data is collected by the organisers of a conference, using information provided during registration).

4. NeurIPS: Conference on Neural Information Processing Systems
5. RecSys: The ACM Recommender Systems conference
6. ICML: International Conference on Machine Learning
7. https://divinai.org
8. https://v2.namsor.com/

Table 1: Diversity Indexes.

Index | Notation | Based on | Range
Gender Diversity Index | GDI | Pielou/Shannon Index | [0,1]/[0,4]
Geographical Diversity Index | GeoDI | Shannon Index | [0,4]
Business Diversity Index | BDI | Pielou Index | [0,1]
Conference Diversity Index | CDI | - | [0,1]

Table 2: Analysed conferences and size of the collected samples per year. Note that the sample size includes authors, keynotes and organisers.

Conference | Year (Sample Size)
NeurIPS | 2017 (343), 2018 (215), 2019 (549), 2020 (851)
RecSys | 2017 (41), 2018 (68), 2019 (69), 2020 (70)
ICML | 2017 (137), 2018 (264), 2019 (358), 2020 (450)
4.1.2. Computing GeoDI
In order to measure the Geographical Diversity Index, the only available information in our dataset is the country of the affiliation. This means that we might be considering not the nationality but the current location of each individual. This limitation could again be avoided by asking for the nationality in the registration form and building the dataset with this information.

4.1.3. Computing BDI
The Business Diversity Index aims to compute the diversity of a conference regarding the presence of industry, academia and research centres. Once again, the affiliation gives us this information, although in some cases a specific web search was needed to label the dataset. In this case, we set S = 3.
4.2. Results
In this section, we analyse the diversity indexes computed for the set of selected conferences. We structure the analysis in four parts, corresponding to the four diversity indexes: GDI, GeoDI, BDI and the general CDI.
Table 3 reports the percentage of male and female among authors, keynotes and organizers. In general, the values obtained for GDI are quite high (over 0.50), except for GDI(RecSys2017) = 0.35 and GDI(RecSys2019) = 0.42, as the index penalises not just the low presence of female authors and organisers but also the presence of a single gender among the keynotes (in 2017 all keynotes were "male" and in 2019 all keynotes were "female"). If we focus on the rest of the conferences, we observe that there are efforts in the scientific community to balance gender among the keynotes. However, organisers and authors are mostly "male", and women organise more often than they author. The conferences that also balanced gender among organisers (NeurIPS 2019 and NeurIPS 2020) obtained the highest GDI (0.89 and 0.90, respectively).
Table 3: Gender Diversity Index (GDI), with the percentage of male and female among authors (from a random sample of 10% of the papers), keynotes and organisers.

Conference | %Female (Auth/Key/Org) | %Male (Auth/Key/Org) | GDI
NeurIPS 2020 | 20.01 / 42.90 / 47.10 | 79.09 / 57.10 / 52.90 | 0.90
NeurIPS 2019 | 16.60 / 42.90 / 51.90 | 83.40 / 57.10 / 48.10 | 0.89
NeurIPS 2018 | 7.10 / 42.90 / 20.90 | 92.90 / 57.10 / 79.10 | 0.70
NeurIPS 2017 | 9.45 / 42.90 / 21.30 | 90.05 / 57.10 / 78.70 | 0.73
RecSys 2020 | 8.46 / 33.30 / 23.70 | 91.50 / 66.70 / 76.30 | 0.69
RecSys 2019 | 12.70 / 100 / 23.10 | 87.30 / 0 / 76.90 | 0.42
RecSys 2018 | 14.30 / 66.70 / 30.40 | 85.70 / 33.30 / 69.60 | 0.80
RecSys 2017 | 15.30 / 0 / 13.60 | 84.70 / 100 / 86.40 | 0.35
ICML 2020 | 15.50 / 33.30 / 37.90 | 84.50 / 66.70 / 62.10 | 0.84
ICML 2019 | 11.40 / 66.70 / 38.10 | 88.60 / 33.30 / 61.90 | 0.63
ICML 2018 | 9.80 / 50.00 / 28.90 | 90.20 / 50.00 / 71.10 | 0.78
ICML 2017 | 7.70 / 50.00 / 29.40 | 92.30 / 50.00 / 70.60 | 0.76
Table 4: Geographical Diversity Index (GeoDI), with the number of countries and continents represented (we only collected the authors of 10% of the papers).

Conference | # Countries (Auth/Key/Org) | # Continents (Auth/Key/Org) | GeoDI/3.5 | GeoDI_continents
NeurIPS 2020 | 28 / 5 / 9 | 5 / 3 / 5 | 0.50 | 0.55
NeurIPS 2019 | 5 / 2 / 8 | 3 / 1 / 4 | 0.19 | 0.16
NeurIPS 2018 | 15 / 3 / 11 | 4 / 2 / 3 | 0.36 | 0.34
NeurIPS 2017 | 19 / 3 / 10 | 4 / 2 / 3 | 0.36 | 0.36
RecSys 2020 | 9 / 2 / 16 | 4 / 2 / 4 | 0.48 | 0.51
RecSys 2019 | 5 / 2 / 8 | 3 / 1 / 3 | 0.36 | 0.33
RecSys 2018 | 9 / 2 / 9 | 3 / 1 / 3 | 0.42 | 0.31
RecSys 2017 | 5 / 2 / 10 | 3 / 2 / 4 | 0.37 | 0.43
ICML 2020 | 25 / 2 / 5 | 5 / 2 / 3 | 0.33 | 0.38
ICML 2019 | 20 / 2 / 3 | 3 / 2 / 2 | 0.30 | 0.35
ICML 2018 | 14 / 2 / 7 | 4 / 2 / 2 | 0.34 | 0.37
ICML 2017 | 13 / 3 / 6 | 3 / 2 / 4 | 0.41 | 0.46
Table 5: Business Diversity Index (BDI), presenting as well the percentage of authors (from a random sample of 10% of the papers), keynotes and organisers belonging to Academia, Industry or Research Centres.

Conference | %Academia (Auth/Key/Org) | %Industry (Auth/Key/Org) | %Research Centre (Auth/Key/Org) | BDI
NeurIPS 2020 | 52.60 / 57.10 / 41.20 | 31.70 / 14.30 / 47.10 | 15.70 / 28.60 / 11.80 | 0.89
NeurIPS 2019 | 49.10 / 85.70 / 44.40 | 39.50 / 14.30 / 40.07 | 11.40 / 0 / 14.80 | 0.72
NeurIPS 2018 | 72.30 / 57.10 / 59.70 | 9.22 / 42.90 / 31.30 | 18.40 / 0 / 8.96 | 0.71
NeurIPS 2017 | 73.10 / 57.10 / 63.90 | 22.20 / 42.90 / 29.50 | 4.73 / 0 / 6.56 | 0.67
RecSys 2020 | 69.00 / 66.70 / 78.90 | 27.60 / 33.30 / 18.40 | 3.40 / 0 / 2.60 | 0.59
RecSys 2019 | 40.00 / 100.00 / 69.20 | 55.40 / 0 / 23.10 | 4.62 / 0 / 7.69 | 0.49
RecSys 2018 | 23.80 / 33.30 / 56.50 | 47.60 / 66.70 / 39.10 | 28.60 / 0 / 4.35 | 0.78
RecSys 2017 | 70.80 / 50.00 / 81.80 | 28.30 / 50.00 / 13.60 | 0.89 / 0 / 4.55 | 0.58
ICML 2020 | 48.70 / 66.70 / 58.60 | 39.20 / 33.30 / 20.70 | 12.10 / 0 / 20.70 | 0.68
ICML 2019 | 43.10 / 66.70 / 76.20 | 42.50 / 33.30 / 19.00 | 14.40 / 0 / 4.76 | 0.70
ICML 2018 | 66.70 / 100.00 / 77.80 | 27.10 / 0 / 17.80 | 6.19 / 0 / 4.44 | 0.44
ICML 2017 | 51.80 / 50.00 / 88.20 | 34.20 / 25.00 / 11.80 | 14.00 / 25.00 / 0 | 0.72
The analysis regarding the Geographical Diversity Index is summarised in Table 4. As mentioned before, we normalised it to the range [0,1] to make it comparable with the other indexes. Since an index based just on the number of countries might hide a lack of diversity regarding, for instance, the presence of researchers from least developed countries, we also computed the number of least developed countries present (following the United Nations classification9). We could not find any representation of the 46 countries included in this list. Thus, we also grouped the affiliation data in order to report the presence of continents and explore the variability of the index when considering these major geographical divisions. The GeoDI_continents index is computed following the Pielou Index formula (see Equation 3) and is also listed in Table 4. It belongs to the range [0,1], so normalization is not needed.
We can see that, in general, the indexes computed for the continents are very similar to those related to the countries, and they have values below 0.5. We consider 7 continents (Africa, Antarctica, Asia, Europe, North America, South America and Oceania), in order to avoid hiding the lower representation of Latin American countries. In most of the conferences explored, there are few species: usually 3 (North America, Europe and Asia) and rarely 4 (including Oceania). Moreover, these continents are not equally represented: most of the keynotes come from North America or Europe, and just 5 researchers from Africa were found among authors and organisers. We would also like to note the importance of the conference location for promoting the presence of minorities. We highlight the case of RecSys 2020, located in Brazil, with a high representation of organisers (13 out of 38) and even one keynote (out of 3) from South America, a continent with almost no representation in the rest of the events.

9. https://unctad.org/topic/least-developed-countries/list
Table 5 reports the Business Diversity Index for the studied conferences. Again, NeurIPS 2020 presents the highest diversity index (0.89), as it has representation from all sectors. On the other hand, the lowest indexes are reached by ICML 2018 (0.44) and RecSys 2019 (0.49), as all their keynotes belong to Academia. In general, most of the conferences are academic, with low representation from industry or research centres. In the case of RecSys 2018 (0.78), the good index is due to the good balance between the different species, even if no keynote belongs to a Research Centre.
We observe that the diversity indexes can provide more information at a glance than the other measures reported (percentages, number of countries, etc.). In fact, the general Conference Diversity Index (CDI) aims to summarize, in a single value, the gender, geographical and business diversity of a conference. This index also provides a very useful measure to monitor the diversity evolution of a conference and to easily compare it with other conferences on the same topic. Figure 1 shows how the different conferences evolve in terms of the Conference Diversity Index. Values of CDI over 0.5 mean that the conference achieves acceptable diversity indexes. For more detailed information about this indicator, we should explore each index separately.

Figure 1: Diversity evolution of the studied conferences using the general Conference Diversity Index (CDI).
5. Conclusions
This work aims to raise awareness about the lack of diversity in Artificial Intelligence by defining a set of indicators that measure the gender, geographical and business diversity of AI conferences. We have explored a set of recent top AI conferences in order to compute their related indexes and compare them in terms of diversity. The numbers have shown a huge gender imbalance among authors and a lack of geographical diversity, with no representation of least developed countries and very low representation of African countries. However, we found evidence of recent efforts to promote minorities among keynote speakers, reaching gender balance in several conferences.
Our proposed formulation for measuring diversity can be extremely useful for conference organisers, as they have access to more detailed data that can shed light on the lack of diversity from different perspectives. With this information, organisers should focus on those indexes under 0.5 and try to increase them by launching specific actions for promoting the identified minorities: scholarships or discounts for under-represented collectives, holding conferences in locations with lower representation, more diversity among keynote speakers and organisers, etc. We would like to note that the indexes proposed in this work can easily be applied to conferences on different topics.
As future work, we would like to explore the viability of studying ethnic diversity based on the analysis of names and surnames. We will also study how to improve the definition of the indexes, focusing on how to normalize them in a more suitable way, as the GeoDI index might be somewhat penalised by the current normalisation.
Acknowledgments
This work has been partially supported by the HUMAINT programme (Human Behaviour and Machine Intelligence), Centre for Advanced Studies, Joint Research Centre, European Commission. The authors acknowledge financial support from the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502). Lorenzo Porcaro acknowledges financial support from the European Commission under the TROMPA project (H2020 770376).
References

Yehoshua Bar-Hillel and Rudolf Carnap. Semantic Information. The British Journal for the Philosophy of Science, 4(14):147–157, 1953.

L. Elisa Celis, Amit Deshpande, Tarun Kathuria, and Nisheeth K. Vishnoi. How to be Fair and Diverse? In Fairness, Accountability and Transparency in Machine Learning (FAT/ML), 2016.

Marina Drosou, H.V. Jagadish, Evaggelia Pitoura, and Julia Stoyanovich. Diversity in Big Data: A Review. Big Data, 5(2):73–84, 2017.

Loet Leydesdorff, Caroline S. Wagner, and Lutz Bornmann. Interdisciplinarity as diversity in citation patterns among journals: Rao-Stirling diversity, relative variety, and the Gini coefficient. Journal of Informetrics, 13(1):255–269, 2019.

Daniel G. McDonald and John Dimmick. The conceptualization and measurement of diversity. Communication Research, 30(1):60–79, 2003.

Margaret Mitchell, Dylan Baker, Emily Denton, Ben Hutchinson, Alex Hanna, and Jamie Morgenstern. Diversity and Inclusion Metrics in Subset Selection. In AIES '20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pages 117–123, 2020. ISBN 9781450371100.

Evelyn C. Pielou. The measurement of diversity in different types of biological collections. Journal of Theoretical Biology, 13:131–144, 1966.

C.R. Rao. Diversity: Its measurement, decomposition, apportionment and analysis. The Indian Journal of Statistics, 44(1):1–22, 1982.

Andrew D. Selbst, Danah Boyd, Sorelle A. Friedler, Suresh Venkatasubramanian, and Janet Vertesi. Fairness and Abstraction in Sociotechnical Systems. In ACM Conference on Fairness, Accountability, and Transparency (FAT*), volume 1, pages 59–68, 2018.

Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.

Edward H. Simpson. Measurement of diversity. Nature, 163(4148):688, 1949.

Andy Stirling. A general framework for analysing diversity in science, technology and society. Journal of The Royal Society Interface, 4(15):707–719, 2007.

Sarah Myers West, Meredith Whittaker, and Kate Crawford. Discriminating Systems: Gender, Race and Power in AI. AI Now Institute, 2019.
... AI and DS have emerged as promising areas for developing careers with critical financial benefit perspectives. Nevertheless, professional career perspectives in these disciplines are not equal depending on gender [23,24] and other criteria [25], including race/ethnicity, socio-economic level, the institution's reputation where people did their studies, etc. For example, in the United Kingdom, the House of Lords Select Committee on Artificial Intelligence in 2018 advocated increasing gender and ethnic diversity amongst AI developers. ...
Article
Full-text available
This paper discusses the problem of missing datasets for analysing and exhibiting the role of women in STEM with a particular focus on computer science (CS), artificial intelligence (AI) and data science (DS). It discusses the problem in a concrete case of a global south country (i.e., Mexico). Our study aims to point out missing datasets to identify invisible information regarding women and the implications when studying the gender gap in different STEM disciplines. Missing datasets about women in STEM show that the first step to understanding gender imbalance in STEM is building women’s history by “completing” existing datasets.
... This index ranges from 0 to 1, with 0 indicating no diversity or the dominance of one group over the others in a given zone and 1 indicating both richness in the number of groups and evenness in their representation (seeFreire et al., 2020, for a recent application). See lines 407-426 in our replication code (https:// cutt. ...
Article
Full-text available
Federal financial aid policies for higher education may be classified based on their “for-purchase” and “post-purchase” natures. The former include grants, loans, and work-study and intend to help students finance or afford college attendance, persistence, and graduation. Post-purchase policies are designed to minimize financial burdens associated with having invested in college attendance and are granted as tax incentives/expenditures. One of these expenditures is the IRS’s Student Loan Interest Deduction (SLID)—which offers up to $2,500 as an adjustment for taxable income based on having paid interest on student loans and has an annual cost of $12.81 billion—about 45.7% of the Pell grant cost. Despite this high cost, SLID has remained virtually unstudied. Accordingly, the study’s purpose is to assess how (in)effective SLID may be in reaching lower-income taxpayers. To address this purpose, we relied on an innovative analytic framework “multilevel modelling with spatial interaction effects” that allowed controlling for contextual and systemic observed and unobserved factors that may both affect college participation and may be related with SLID disbursements over and above income prospects. Data sources included the IRS, ACS, FBI, IPEDS, and the NPSAS:2015-16. Findings revealed that SLID is regressive at the top, wealthier taxpayers and students attending more expensive colleges realize higher tax benefits than lower income taxpayers and students. Indeed, 75% of community college students were found to not be eligible to receive SLID—data and replication code are provided. Is this the best use of this multibillion tax incentive? Is SLID designed to exclude the poorest, neediest students? 
A policy similar to Education Credits, focused on outstanding debt rather than on interest, that targets below-poverty line students with up to $5,000 in debt, would represent a true commitment, and better use of public funds, to close socioeconomic gaps, by helping those more prone to default.
... Aspects such as the diversity of the teams engaged in the design and development of MIR systems, the diversity of musical works and their creators, how to diversify tools to help MIR practitioners address cultural differences, and who and how is benefiting from the diversification strategies, and who is not, are part of the challenges described by Born (2020). Those challenges are similarly identified in the broader field of Artificial Intelligence (AI), the parent of concepts such as 'Music Intelligence' and 'AI Music' (Liebman and Stone, 2020), a field in which we are already witnessing a diversity crisis, for instance with regards to the workforce involved in the design of AI systems (West et al., 2019), or the academic community participating in AI conferences (Freire et al., 2021). ...
Article
Full-text available
Music Recommender Systems (Music RS) are nowadays pivotal in shaping the listening experience of people all around the world. Partly driven by the commercial application of this technology, music recommendation research has gained increasing attention both within and outside the Music Information Retrieval (MIR) community. Thanks also to the widespread use of recommender systems in music streaming services, it has been possible to enhance several characteristics of such systems in terms of performance, design, and user experience. Nonetheless, imagining Music RS only from an application-driven perspective may generate an incomplete view of how this technology is affecting people’s habitus, from the decision-making processes to the formation of musical taste and opinions. In this overview, we address the concept of diversity in music recommendation, and taking a value-driven approach we review diversity-related methodologies proposed in the Music RS literature. Additionally, by taking as an example the wider context of Information Technology (IT), we present the elements interacting in the diversity by design paradigm. We do that to acknowledge the lack of a comprehensive framework in Music RS research to address diversity, until now mostly driven by empirical results and fragmented in different application areas. Maintaining an interdisciplinary perspective, we discuss some challenges that MIR practitioners may face when researching Music RS, going beyond the search for better performance and instead questioning the theoretical foundations on which to base future research.
... Specifically in higher education it has previously been used to measure diversity in various academic populations (e.g. Freire, Porcaro, & Gómez, 2020;McLaughlin, McLaughlin, McLaughlin, & White, 2016). SDI measures the probability that two individuals selected at random will share the same characteristic, and can be expressed as D = 1-( n(n-1)/N(N-1)). ...
Article
With the trend toward internationalization of higher education systems across the world, international journals play an important role in disseminating research from a diverse range of national contexts. While studies have continued to show a persistent western hegemony in published scholarship, research has largely focused on the most prestigious journals in the field, and it remains unclear how journals from beyond the most elite contribute to geographic diversity. This study makes a unique contribution to the existing knowledge body, through comparative analysis of the internationality (as a product of editorial boards, published authors, authorship compositions, and study contexts) of higher education journals in both the highest quartile of impact (Q1) and the lowest quartile of impact (Q4). The results show that while some journals may orient themselves as international in scope, in practice they may be more concentrated in particular regions. Although Q-ranking was not found to be a clear indicator of geographic diversity, Q4 journals are statistically more likely to include research and researchers from outside of the core anglophone countries, making an important contribution to the diversity of scholarship beyond the dominating western and English-language discourse.
... Even if these benefits have been proven in several areas, simultaneously major concerns have emerged due to the lack of diversity, and Artificial Intelligence (AI) is a notable example of a field in which we are witnessing a diversity crisis [31,76]. Indeed, AI systems -an umbrella term wherein also Recommender Systems can be included [40] -have already been proven to reinforce hegemonic biases and discrimination in different applications, in fields such as Computer Vision [60] and Natural Language Processing [5]. ...
Preprint
Full-text available
Shared practices to assess the diversity of retrieval system results are still debated in the Information Retrieval community, partly because of the challenges of determining what diversity means in specific scenarios, and of understanding how diversity is perceived by end-users. The field of Music Information Retrieval is not exempt from this issue. Even if fields such as Musicology or Sociology of Music have a long tradition in questioning the representation and the impact of diversity in cultural environments, such knowledge has not been yet embedded into the design and development of music technologies. In this paper, focusing on electronic music, we investigate the characteristics of listeners, artists, and tracks that are influential in the perception of diversity. Specifically, we center our attention on 1) understanding the relationship between perceived diversity and computational methods to measure diversity, and 2) analyzing how listeners' domain knowledge and familiarity influence such perceived diversity. To accomplish this, we design a user-study in which listeners are asked to compare pairs of lists of tracks and artists, and to select the most diverse list from each pair. We compare participants' ratings with results obtained through computational models built using audio tracks' features and artist attributes. We find that such models are generally aligned with participants' choices when most of them agree that one list is more diverse than the other, while they present a mixed behaviour in cases where participants have little agreement. Moreover, we observe how differences in domain knowledge, familiarity, and demographics can influence the level of agreement among listeners, and between listeners and diversity metrics computed automatically.
Chapter
Discrimination and bias are inherent problems of many AI applications, as seen in, for instance, face recognition systems not recognizing dark-skinned women and content moderator tools silencing drag queens online. These outcomes may derive from limited datasets that do not fully represent society as a whole or from the AI scientific community's western-male configuration bias. Although being a pressing issue, understanding how AI systems can replicate and amplify inequalities and injustice among underrepresented communities is still in its infancy in social science and technical communities. This chapter contributes to filling this gap by exploring the research question: what do diversity and inclusion mean in the context of AI? This chapter reviews the literature on diversity and inclusion in AI to unearth the underpinnings of the topic and identify key concepts, research gaps, and evidence sources to inform practice and policymaking in this area. Here, attention is directed to three different levels of the AI development process: the technical, the community, and the target user level. The latter is expanded upon, providing concrete examples of usually overlooked communities in the development of AI, such as women, the LGBTQ+ community, senior citizens, and disabled persons. Sex and gender diversity considerations emerge as the most at risk in AI applications and practices and thus are the focus here. To help mitigate the risks that missing sex and gender considerations in AI could pose for society, this chapter closes with proposing gendering algorithms, more diverse design teams, and more inclusive and explicit guiding policies. Overall, this chapter argues that by integrating diversity and inclusion considerations, AI systems can be created to be more attuned to all-inclusive societal needs, respect fundamental rights, and represent contemporary values in modern societies.
Preprint
Full-text available
DivinAI is an open and collaborative initiative promoted by the European Commission's Joint Research Centre to measure and monitor diversity indicators related to AI conferences, with special focus on gender balance, geographical representation, and presence of academia vs companies. This paper summarizes the main achievements and lessons learnt during the first year of life of the DivinAI project, and proposes a set of recommendations for its further development and maintenance by the AI community.
Article
Full-text available
Questions of definition and measurement continue to constrain a consensus on the measurement of interdisciplinarity. Using Rao-Stirling (RS) Diversity sometimes produces anomalous results. We argue that these unexpected outcomes can be related to the use of "dual-concept diversity" which combines "variety" and "balance" in the definitions (ex ante). We propose to modify RS Diversity into a new indicator (DIV) which operationalizes "variety," "balance," and "disparity" independently and then combines them ex post. "Bal-ance" can be measured using the Gini coefficient. We apply DIV to the aggregated citation patterns of 11,487 journals covered by the Journal Citation Reports 2016 of the Science Citation Index and the Social Sciences Citation Index as an empirical domain and, in more detail, to the citation patterns of 85 journals assigned to the Web-of-Science category "information science & library science" in both the cited and citing directions. We compare the results of the indicators and show that DIV provides improved results in terms of distinguishing between interdisciplinary knowledge integration (citing references) versus knowledge diffusion (cited impact). The new diversity indicator and RS diversity measure different features. A routine for the measurement of the various operationalization of diversity (in any data matrix) is made available online.
Conference Paper
Full-text available
A key goal of the fair-ML community is to develop machine-learning based systems that, once introduced into a social context, can achieve social and legal outcomes such as fairness, justice, and due process. Bedrock concepts in computer science---such as abstraction and modular design---are used to define notions of fairness and discrimination, to produce fairness-aware learning algorithms, and to intervene at different stages of a decision-making pipeline to produce "fair" outcomes. In this paper, however, we contend that these concepts render technical interventions ineffective, inaccurate, and sometimes dangerously misguided when they enter the societal context that surrounds decision-making systems. We outline this mismatch with five "traps" that fair-ML work can fall into even as it attempts to be more context-aware in comparison to traditional data science. We draw on studies of sociotechnical systems in Science and Technology Studies to explain why such traps occur and how to avoid them. Finally, we suggest ways in which technical designers can mitigate the traps through a refocusing of design in terms of process rather than solutions, and by drawing abstraction boundaries to include social actors rather than purely technical ones.
Article
Full-text available
Due to the recent cases of algorithmic bias in data-driven decision-making, machine learning methods are being put under the microscope in order to understand the root cause of these biases and how to correct them. Here, we consider a basic algorithmic task that is central in machine learning: subsampling from a large data set. Subsamples are used both as an end-goal in data summarization (where fairness could either be a legal, political or moral requirement) and to train algorithms (where biases in the samples are often a source of bias in the resulting model). Consequently, there is a growing effort to modify either the subsampling methods or the algorithms themselves in order to ensure fairness. However, in doing so, a question that seems to be overlooked is whether it is possible to produce fair subsamples that are also adequately representative of the feature space of the data set - an important and classic requirement in machine learning. Can diversity and fairness be simultaneously ensured? We start by noting that, in some applications, guaranteeing one does not necessarily guarantee the other, and a new approach is required. Subsequently, we present an algorithmic framework which allows us to produce both fair and diverse samples. Our experimental results on an image summarization task show marked improvements in fairness without compromising feature diversity by much, giving us the best of both the worlds.
Article
Full-text available
This article defines dual-concept diversity as a two-dimensional construct that holds a central place of study in many fields, including communication. The authors present 12 measures of dual-concept diversity appearing in the literature and assess the differential sensitivity of these measures in capturing the two dimensions. After assessing each measure and eliminating measures that are redundant or computationally intractable, the article compares the remaining measures of diversity in a time series of 30 years of network radio programming. Graphic and statistical interrelationships are presented to facilitate comparison and choice between measures in future research.
Article
Information content may be used as a measure of the diversity of a many-species biological collection. The diversity of small collections, all of whose members can be identified and counted, is defined by Brillouin's measure of information. With larger collections it becomes necessary to estimate diversity; what is estimated is Shannon's measure of information which is a function of the population proportions of the several species. Different methods of estimation are appropriate for different types of collections. If the collection can be randomly sampled and the total number of species is known, Basharin's formula may be used. With a random sample from a population containing an unknown number of species, Good's method is sometimes applicable. With a patchy population of sessile organisms, such as a plant community, random samples are unobtainable since the contents of a randomly placed quadrat are not a random sample of the parent population. To estimate the diversity of such a community a method is proposed whereby the sample size is progressively increased by addition of new quadrats; as this is done the diversity of the pooled sample increases and then levels off. The mean increment in total diversity that results from enlarging the sample still more then provides an estimate of the diversity per individual in the whole population.
Article
An abstract is not available.