An Empirical Study of Contributors Diversity and
Software Quality in GitHub Projects
Syafiq Kamarul Azman, Khawla AlDhaheri
Department of Electrical Engineering and Computer Science
Masdar Institute of Science and Technology
Abu Dhabi, United Arab Emirates
Abstract—GitHub is a popular software collaboration site where users from around the world can work on different projects. Development communities on GitHub are diverse in terms of demographics, status and skill level across different projects. Previous works have suggested that diverse communities are more productive but do not take into account the quality of the software that is produced. In this paper, we aim to provide an insight into the quality of the software produced by large GitHub projects with high developer diversity, using a quality metrics model suited for open source software. To understand the relation between diversity and quality, we looked at repository data through the GHTorrent dataset and the GitHub API. For this study we observed more than 100 releases of 5 different Python projects, and between 6,800 and 20,000 commits per project were checked against the details of more than 1,300 contributors. We found that diversity measures are partially positively correlated with software quality. Gender diversity and employment diversity project a loss in software quality as diversity increases. There is good evidence that increases in location diversity and commit diversity are associated with an increase in software quality.
Keywords—github, contributor, diversity, software, quality.
I. INTRODUCTION
GitHub is a popular software collaboration site which hosts
many open source software (OSS) projects. Projects range
from simple programs and scripts to bodies of knowledge
and full-fledged software used in commercial settings. These
projects are developed by many individuals from different
regions and with different experience backgrounds. However, software and projects on GitHub may not maintain consistent quality as they progress through the years, with contributors coming and going as they please. How can we know that OSS development
communities will continue to make quality software through
the years?
There have been numerous studies in the diversity of
project contributors and how productivity increases with more
geographically and technically diverse contributors on GitHub.
An empirical study about the geographical locations of OSS
contributors on GitHub shows that developers are concentrated
in certain locales and that there is a strong local bias in code
contribution and attention [1]. The productivity, and by extension the stability, of an OSS project is also evident in studies from [2] and [3], which looked at the different statuses and characteristics of contributors in GitHub projects. However, not much work has been done towards understanding the quality of OSS in connection to its contributors. Studies mention that critical contributions come from the core members of the team but do not clearly define the relation between critical contribution and quality output [3], [4], [5].
The definition of software quality has always been transient across different standards and viewpoints. Standards
like the ISO/IEC 25010 have been used to narrow down
definitions of software quality. The ISO/IEC 25010 lists eight
main attributes — along with sub-attributes — pertaining
to software quality. Attributes of software quality can also
be measured through metrics defined within the standards
[6]. However, [7] suggests that large collaborative projects
of OSS are incompatible with legacy standards to an extent
and have other attributes which should be taken into account.
Instead of strict adherence to the standards, software quality for
OSS should be measured and evaluated using the SQO-OSS
model which proposes additional attributes like documentation,
developer base, effectiveness and mailing list in place of some
of the main attributes and sub-attributes [7]. This model aids
the evaluation of a software quality based on the diversity
of the software project by considering social aspects of the
development.
In this study, we performed an empirical investigation into
OSS aiming to reveal a correlation between the diversity of
contributors of OSS projects and its measurable qualities. Our
research question is: how does the diversity of contributors
of OSS projects affect the overall quality of the software?
To answer this question we analyzed source code from five
GitHub projects. By measuring attributes of software quality
from the source code and using publicly available data, we in-
ferred the correlation between diversity and quality of GitHub
software projects. The GHTorrent dataset [8] provided data of
public GitHub projects which were significant for the empirical
studies in this paper. We also used the GitHub API to gather
related data which were not directly present in the GHTorrent
dataset.
A. Hypotheses
Hypothesis 1 (H1): A more demographically diverse group will yield a higher software quality compared to a demographically similar group.
Given the different ethnic and cultural backgrounds, geo-
graphically diverse developers will have different perspectives
in using software and understanding code. Developers could consider use cases which were not previously thought of in that software project, which increases the usability of the software.
Hypothesis 2 (H2): A more technically diverse group will yield a higher software quality compared to a technically superior group.
It is not uncommon to find veteran developers accidentally
creating loopholes and caveats especially when developing new
features for OSS. However, there is evidence from [4], [5] that
some OSS projects have a group of peripheral developers who
quickly find and patch the bugs. Having a greater mix of core
and peripheral developers could ensure a more robust software.
Sections II and III discuss the background and related work in this field. Section IV discusses the methodology used to obtain the necessary metrics from the dataset and describes the statistical analyses that were performed. Section V discusses our findings from the processed data. Section VI discusses threats to the validity and limitations of the study and concludes our findings.
II. BACKGROUND WORK
A. Contributor diversity
OSS contributors are generally very diverse. Geographical diversity is naturally present within OSS projects due to their widespread nature. A large group of developers
for the Linux kernel are from North America and Europe
with minorities in South America, Africa and Asia [9]. An
empirical study about the geography of open source software
through GitHub shows that developers are mainly concentrated
in specific regions: mainly North America and Europe, while
a significant minority is present in other different regions [1].
Gender diversity, the measure of male density against female density within a project ecosystem, is male-heavy within many GitHub projects [10], [11]. This possibly extends to other
software repository host sites. Contributor age is quite varied
ranging from 16 to 70 years old with an average of around 33
years old [11].
It has also been observed that participants in OSS have different motivations for contributing. The prior works studied in [11] reported contributions for personal gain, as a form of education (academic or employment), for reputation, or simply to reciprocate. Contributors also have different factors influencing their contributions to OSS.
B. Structures in collaborative teams
Collaborative teams have been studied for years in many
different settings from biology, education and the workplace.
Software development teams have been subject to numerous
studies to understand how large global teams form social
structures [12]. One example that is widely known and studied
is the core and periphery structure. A study of collaborative teams on the project hosting site SourceForge.com found that teams consist of 1) core developers, 2) active developers, and 3) passive developers
[13]. Another study on collaborative groups on GitHub is
consistent with the hierarchy in [13] but includes an extra
group further out in the fringe: the idle forkers [2]. This
provides information on the diversity of contributor status
within the project.
Others have different ways of categorizing contributors in a team. A study on OSS reveals three main contributor types: 1) individual contributors, 2) non-profit organizations, and 3) for-profit organizations. Individuals are known to consist mostly of hobbyists and students. Paid and unpaid employees
of non-profit organizations like Wikipedia or for-profit organi-
zations like Google are also contributors towards OSS projects
[11], [9]. From that, we know that developers with differing
technical skills and motivations also exist in the OSS project
community.
On GitHub.com, contributors are segregated into two groups: contributors and collaborators. Contributors are users who are peripheral members of a project aspiring to commit some code into the project, whilst collaborators are users who have commit access to the project's main development branch.
C. Measuring software quality
Software quality has been thoroughly studied amongst
academics for the past 40 years. Models used in the ISO/IEC
25010 have come from numerous developments since the mid-
1970s [6], [14]. In traditional, closed software development groups, the software development process may follow a well-regulated model.
However, the open source development process is not
well defined [15]. This means that there are metrics which
become obscure in OSS or metrics which are introduced as a
result of this "more chaotic" process. When studying OSS,
authors have resorted to including and focusing on defect
density as a measure of software quality [16], [17]. Therefore,
there is evidence that measurements of defect density is an
important measure of software quality. Cyclomatic complexity
introduced by McCabe is a widely agreed-upon metric for
software quality and is used in many studies [18].
Another well known measurement of software quality is
the maintainability index used in prior works such as [19],
[20]. The maintainability index can be used to analyze other
characteristics of a software including prediction of bugs and
defect which can increase the overall usability and performance
of a software product [21].
The Open Source Maturity Model (OSMM), Open Busi-
ness Readiness Rating (OpenBRR) and the Qualification and
Selection of Open Source Software (QSOS) are examples of
quality models that adapt well to the new open source envi-
ronment. While the OpenBRR is more business-oriented, the OSMM and QSOS are viable but become cumbersome due to the lack of automation in the models. The SQO-OSS model addresses the shortcomings present in the other OSS quality models. The SQO-OSS model draws measurements on
an ordinal scale and is evaluated based on a pre-defined profile
[7].
III. RELATED WORK
A. Impacts of demographical diversity
Effects of diversity on global software teams have been studied only recently and with varying conclusions. In a study of a telecommunications software development company, software quality was negatively correlated to national diversity within a team. However, the negative correlation was only evident within teams whose members had not worked together previously [22].
TABLE I. PROJECT INFORMATION
Project Period (years) Releases Contributors
IPython 2008 - 2015 34 420
boto 2006 - 2015 77 484
SymPy 2000 - 2015 51 386
matplotlib 2003 - 2015 46 396
Buildbot 2005 - 2015 80 324
Diversity in OSS teams has also been actively studied. A study of gender and tenure diversity connected an increase in productivity with a higher gender and tenure mix in a GitHub project team. Although this study does not record any relationship with geographical diversity, there is a relationship, to an extent, between gender and tenure which can be further studied [10]. Another study on
geographical diversity of contributors in GitHub showed a
strong local bias in code contribution and attention in certain
regions [1].
Geographically diverse development teams have also been
reported as having little to no effect upon the software quality.
In the case of Windows Vista, the study shows no relationship
between a geographically diverse team and the output quality
of the software [23].
B. Impacts of technical diversity
There is an abundance of works on diversity-quality corre-
lations in closed source development teams. According to [24],
software development teams with diverse technical knowledge,
opinions, and experiences are much more stable and adaptive
and hence perform better. There is evidence in [25] that a
diversity in knowledge and background of contributors leads
to task conflicts but promotes learning within a team. This
ultimately leads to an improvement in software quality output.
The downside of these studies is that the teams are non-
OSS teams and these studies may not generalize to the OSS
ecosystem.
IV. METHODOLOGY
A. Projects
The projects which we looked at are listed in this section.
The reasoning behind the choice of these projects is explained in the next subsection, where our control variables are listed.
IPython (https://github.com/ipython/ipython) is a popular interactive Python console.
boto (https://github.com/boto/boto) is a Python interface to
Amazon Web Services (AWS).
SymPy (https://github.com/sympy/sympy) is a Python li-
brary for symbolic mathematics.
matplotlib (https://github.com/matplotlib/matplotlib) is a
2D plotting library and toolkit.
Buildbot (https://github.com/buildbot/buildbot) is an open-
source framework for automating software build, test, and
release processes.
B. Control variables
To ensure that we are ultimately inferring quality attributes
from changes in diversity, there are variables that we must
control.
Programming language: The type of programming language
used in a project. Different types of programming languages
have certain impacts on the external quality of the software.
Presence of features like weak typing or functional program-
ming in different programming languages has significant dif-
ference in the overall quality of the output software [17], [16].
To control for programming languages, we look at projects
written primarily in Python which is also the second-most
popular language for GitHub projects. Choosing Python allows
us to cover a larger demographic while controlling for the
programming language used.
Project popularity: The age and reach of a project.
To ensure an impactful study, we look at relatively long-lived
projects which have been used by many users. Controlling for
project popularity allows the study to generalize on real-life
applications and programs. Projects on GitHub are rated based on the number of forks and stars. A higher star count means that the project has gained recognition from GitHub users. We chose projects which have many stars and forks as evidence of their popularity.
Project origin: The origin status of the project. Some GitHub
projects are forked from an original repository for purposes
such as personal use and for eventual authoring leading up to
a pull request. This means that different users and companies
will have a copy of a project. We selected the project repos-
itories hosted by the original developers and disregarded the
forked projects.
Team size: The number of contributors in a project team. A
larger team is likely to have better code quality and higher
productivity. More developers can check code submitted by
other developers and fix defects in the software [3], [4]. To
control for team size, we look at projects which have roughly
equal number of contributors.
C. Data collection for demographical diversity
For geographical location data, we used the GHTorrent dataset and the GitHub API. Missing data is one of many problems with public datasets. To mitigate this, we checked for the existence of geographical information. If it was available, we used it; otherwise the contributor's location was assigned to a separate "Others" category, assumed to contain all users unwilling to share their personal information. Locations were then classified further into countries, as many developers located themselves in cities of the United States, which could cause an imbalance in the diversity calculations.
For gender data, we collected the data by following an approach from the literature. The approach uses a combination of transformations, diminutive resolution and heuristics along with female/male frequency name lists gathered from thirty different countries. Country data is therefore essential for identifying a person's gender from their name. Prior work claimed that the precision of this approach, genderComputer, is 93%, which is adequate for our study [10].
To obtain the gender of a contributor using
genderComputer we considered the following steps:
1) if a name and location is provided, compute gender using
both values,
2) if only the name is provided, compute gender using the
name value,
3) if no name was provided, compute gender using the first
half of the email string provided,
4) otherwise, compute gender using the login name.
Since the failure rate of genderComputer increases down the order listed above, we assumed the imputed gender values were generally correct. For genders which could not be computed after the four stages, we assumed the contributor to be male.
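The four-step fallback above can be sketched as follows. The `resolve` callable is a hypothetical stand-in for a genderComputer-style resolver (the actual genderComputer API may differ); it is assumed to return "male", "female", or None when inference fails:

```python
def infer_gender(resolve, name=None, location=None, email=None, login=None):
    """Fallback cascade for gender inference, following the four steps above.

    `resolve(value, country)` is a hypothetical resolver interface;
    it returns "male", "female", or None when inference fails.
    """
    if name and location:
        gender = resolve(name, location)  # step 1: name + location
        if gender:
            return gender
    if name:
        gender = resolve(name, None)  # step 2: name only
        if gender:
            return gender
    if email:
        gender = resolve(email.split("@")[0], None)  # step 3: email local part
        if gender:
            return gender
    if login:
        gender = resolve(login, None)  # step 4: login name
        if gender:
            return gender
    return "male"  # the paper's assumption when all stages fail
```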
There are various diversity measurement methods, which are divided into three groups: 1) probability-based measures, 2) logarithm-based measures, and 3) rank-based measures. Each measurement in these groups is called a diversity index: a quantitative measure that indicates the number of different types (in our case, geographical locations and commit density, explained in the next subsection) in the collected dataset.
Despite the large number of diversity measurement methods, only a few are commonly used for measuring team diversity, such as the Simpson index or Blau's index of heterogeneity [26]. To measure diversity, we used the Simpson index (which Blau's index mirrors), shown in Equation 1:
D = 1 - \frac{\sum_{i=1}^{R} n_i (n_i - 1)}{N(N - 1)}    (1)
where D is the diversity of the team (values between 0 and 1), R is the number of geographical locations, n_i is the number of contributors from country i, and N is the total number of contributors in a project. Besides diversity indices, some researchers measure diversity by calculating percentages [10], [11].
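As a minimal illustration (the function below is our own sketch, not part of the paper's tooling), the Simpson index of Equation 1 can be computed over a list of per-contributor categories:

```python
from collections import Counter

def simpson_diversity(categories):
    """Simpson diversity index: D = 1 - sum(n_i*(n_i-1)) / (N*(N-1)).

    `categories` holds one entry per contributor, e.g. a country name
    for location diversity. Returns a value in [0, 1], where 0 means
    all contributors share a single category.
    """
    counts = Counter(categories)
    n_total = sum(counts.values())
    if n_total < 2:
        return 0.0
    same_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_pairs / (n_total * (n_total - 1))
```

For example, `simpson_diversity(["US", "US", "DE", "IN"])` yields 5/6, while a uni-national team yields 0.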
For gender diversity, we only have two possible values: male or female. This means that the diversity set does not grow, and hence the Simpson index yields non-ideal results for large groups. For our study we can take an equal split of male and female contributors as completely diverse, as this minimizes projected errors. In this case we used the Shannon index [27] for measuring gender diversity. The value ranges from 0 to 1, where 0 is a completely uni-gendered group and 1 is an equally split group. The equation is as follows:
D = -\sum_{i=1}^{2} p_i \log_2 p_i    (2)
where D is the diversity of the team and p_i is the probability of a gender (p_1 is male and p_2 is female).
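Equation 2 can be sketched as follows (our own illustration; with two categories and base-2 logarithms the index already falls in [0, 1]):

```python
import math

def shannon_gender_diversity(p_male):
    """Shannon index D = -sum(p_i * log2(p_i)) over two genders.

    With two categories and base-2 logs the index ranges from 0
    (a uni-gendered group) to 1 (an exactly even split).
    """
    p_female = 1.0 - p_male
    d = 0.0
    for p in (p_male, p_female):
        if p > 0:  # lim p->0 of p*log2(p) is 0, so zero terms are skipped
            d -= p * math.log2(p)
    return d
```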
D. Data collection for technical diversity
One way to measure technical diversity is to look at a group of contributors' academic or employment credentials.
Developers who have a technical university-level degree (de-
grees such as computer science or software engineering) are
likely to perform better than developers who do not have
such certifications. Developers who are working, or have worked, in a professional setting for a number of years also have a higher skill level compared to developers who have not. This can be extended to other extracurricular
activities such as freelancing or personal projects. A mixture
of these credentials in a project would yield a higher technical
diversity.
However, measuring technical skill is difficult since those
credentials are attained over time and are not readily available.
Alternatively, one can infer the technical skill by assuming
that a developer has attained all the current credentials since
the developers started contributing to a project. On GitHub,
contributors can only supply their current employer. Using this
data we can assume that the contributors have been employed
since they started contributing to a project.
A contributor becomes more versed in the code structure the more commits that contributor makes towards a project.
With more contributions, the contributor should have a good
understanding of the code. For skill diversity we looked at the
number of commits towards the project on which a contributor
is working. This allows us to separate the contributors into
the fringe developers and the core developers. The diversity
measure is made by classifying contributors based on their
commits counts. For each commit count, there will be a certain
number of developers.
For the amount of contributions towards a project, contributors were grouped based on their number of commits (or commit density). This means that contributors who committed once are in abundance compared to users who committed several times. This grouping classifies contributors by skill level. To obtain the diversity index, we use
the Simpson index (see Equation 1) for commits density and
the Shannon index (see Equation 2) for employment diversity.
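The commit-density grouping can be sketched as follows (our own illustration, assuming a hypothetical mapping from contributor to commit count); the sizes of the commit-count classes are then fed into the Simpson index of Equation 1:

```python
from collections import Counter

def commit_density_diversity(commit_counts):
    """Simpson diversity over commit-count classes.

    `commit_counts` maps contributor -> number of commits; contributors
    with the same commit count form one class, so a project dominated
    by one-commit contributors scores low diversity.
    """
    class_sizes = Counter(commit_counts.values())
    n_total = len(commit_counts)
    if n_total < 2:
        return 0.0
    same_pairs = sum(s * (s - 1) for s in class_sizes.values())
    return 1.0 - same_pairs / (n_total * (n_total - 1))
```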
E. Anonymous Contributors
Upon closer inspection of the data, it was found that a small percentage (between 0.6% and 13%) of contributors in a project are anonymous. These users are
confirmed to have contributed as the SHA digest of their
commit is logged and verified but no further information
is provided. This is possibly due to the repository structure
prior to the project’s migration to GitHub from another code
repository host (such as Google Code).
One method of handling anonymous contributors is to
remove them from the dataset. However, this means that
the commit which was made by that contributor has to be
discarded altogether. Discarding the commit will have con-
founding effects towards the projects which were studied: how
does one modify the software at a certain point in time (and in
the future) such that the commit did not happen? To mitigate
this problem, we assumed the following characteristics for an anonymous contributor: 1) the contributor is male, 2) the contributor belongs to the 'Others' location, 3) the contributor has no working experience.
This may introduce some noise to the data but with the
relatively low percentage of anonymous contributors, imputing
these values is not malignant to our study.
F. Measuring Software Quality
Software quality measures are abundant as discussed in
Section II. For this study, we considered cyclomatic complexity
(CC) as a measure of software quality for its well-acceptance
as a valid software quality metric. To ensure that we measured the quality of the software correctly, we skipped CC measurements for files which are not directly related to the core functionality of the software. These included:
configuration files,
Python init files,
test files or any files under a test folder,
readme files,
and documentation files.
These files were ignored as they added noise to the CC
measurement by decreasing the project’s overall complexity.
Another software quality metric which was measured was
the maintainability index (MI) of the project. Similar to the
measurements for CC, we considered only the core software
files.
To obtain the metrics, we used an open source software measurement tool for Python called Radon. Radon has the capability to exclude files and ignore folders. Radon can also perform its analysis recursively through the project folders, which allowed for a thorough search. To obtain each release, we cloned the complete repository of a project from its GitHub source and, using the git checkout <release> command, we were able to return to the state of the project at the provided release tag. For each official release of a project, we performed a Radon analysis of the core files and took the overall CC value and the average MI values.
G. Statistical analysis
1) Variations in release frequency: As pointed out in
[16], variations in release frequencies in different projects
are inevitable. There will be discrepancies in release versions
over time. Following closely to the methodology of [16], we
performed a linear interpolation of release data based on all
five project releases. Releases are truncated to the date of
release and for projects with no actual release on that date, an
observation is added assuming a linear increment or decrement
based on the prior and posterior values.
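The linear interpolation step can be sketched as follows (a minimal pure-Python illustration; the function and data shapes are ours, since the paper's analysis pipeline is not published):

```python
from bisect import bisect_left

def interpolate_metric(dates, values, query_date):
    """Linearly interpolate a per-release metric at an arbitrary date.

    `dates` are release dates as ordinal day numbers (sorted ascending),
    with one metric value per date; queries outside the observed range
    are clamped to the nearest release.
    """
    if query_date <= dates[0]:
        return values[0]
    if query_date >= dates[-1]:
        return values[-1]
    i = bisect_left(dates, query_date)
    if dates[i] == query_date:
        return values[i]
    # fraction of the way between the prior and posterior releases
    t = (query_date - dates[i - 1]) / (dates[i] - dates[i - 1])
    return values[i - 1] + t * (values[i] - values[i - 1])
```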
We excluded development, alpha, beta and candidate releases and focused only on official releases. This minimizes
the bias in data in contrast to aligning all releases to the
project with the longest development history when performing
a time series analysis. We performed a Kolmogorov-Smirnov
normality test to ensure that noise is not introduced into the
generated data points.
2) Hypothesis testing: The measurements which were taken were dependent on time. To measure the correlation of each of
the diversity indices against the quality metrics, we performed
a time-series analysis using linear regression on the diversity
values against the software quality metrics. We considered
the hypothesis as acceptable if the corresponding p-value is
less than 0.01. We also take into account the r-value of the linear regression, which provides a correlation value between -1 and 1. Higher r-values mean a stronger positive correlation and lower r-values mean a more negative correlation.
Measurements were taken on all aggregated release dates for
each project. Linear regression was applied to each measure of demographical diversity, namely 1) geographical diversity and 2) gender diversity, and of technical diversity, namely 1) employment status and 2) commit counts, against the maintainability index and cyclomatic complexity.
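The r-value of such a regression is the Pearson correlation between the diversity series and the quality series. A minimal stdlib sketch is shown below (in practice a library routine such as scipy.stats.linregress also supplies the slope and p-value):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series,
    e.g. a diversity index and a quality metric sampled at the same
    release dates. Returns a value in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```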
V. RESULTS
For this study we observed more than 100 releases, and between 6,800 and 20,000 commits per project were checked against the details of more than 1,300 contributors.
While obtaining the results for Boto, we found that the earliest official release version of the software was version 2.0, released on July 13, 2011. After further investigation, it was found that the Boto developers had previously hosted an earlier portion of their code on a Google Code repository. The Google Code repository dates the earliest official release version (version 0.7) to January 2007. Due to the difference in version control systems employed by Google Code (SVN as opposed to git) and the presence of inconsistencies in the stored contributor data, we disregarded any measurements for Boto versions prior to version 2.0. Unfortunately this left shallower data for the Boto project, but there was still enough data to perform a statistically significant study for Boto.
A. Maintainability Index
1) Boto: From the results obtained, we can see that the maintainability index decreased steadily over time, from about 71.0 in 2011 to 68.5 in 2015. Location diversity increased unsteadily from 2011 to 2015, which may indicate an inverse relationship with the maintainability index. Commit diversity increased from 2011 to 2013, decreased sharply until 2014, then decreased steadily to 2015. Gender diversity increased sharply until November 2011, then decreased steadily until 2015; its values through time stayed close to 0, which is very low. Employment diversity had a very high value close to 1 (between 0.992 and 0.997); it increased over time from 2011 to 2015, with a sharp decrease between November 2013 and May 2014.
2) Buildbot: For Buildbot, we found that the maintainability index decreased during the first four years, then started to increase until 2013, when it began to plateau. Figure 1 shows that the location diversity value started very high (0.8) and fluctuated between 2007 and 2008 before stabilizing by 2015. By 2015 the location diversity value was about 0.71, which is still a high diversity value. Similarly, employment diversity started by fluctuating from 2007 to 2009, then remained almost stable with a slight increase between 2014 and 2015. Gender diversity started by increasing sharply for a short time, reaching 0.2, then decreased until 2015; its value was very low, close to 0, as in the boto project. Commit diversity started very high, at almost 1, then decreased sharply to about 0.87 and remained almost stable until 2015.
Fig. 1. Plot of location diversity (left y-axis) and maintainability (right y-axis)
index against time for Buildbot
Fig. 2. Plot of employment diversity (left y-axis) and maintainability index
(right y-axis) against time for Matplotlib
3) IPython: The maintainability index in the IPython project was stable from 2009 to 2012, then increased with fluctuations towards 2015. We also found that location diversity had a high value, between 0.70 and 0.72, and it clearly decreased over time from 2009 to 2015. Gender diversity fluctuated and slightly increased over time, but its value was very small, as in the previously mentioned projects. Employment diversity increased slightly at the beginning, from 2009 to almost 2012, then started to decrease until 2015. Commit diversity had a very high value, between about 0.86 and 0.9, and it decreased from 2009 to 2015.
4) Matplotlib: For Matplotlib, we can see from Figure 2 that the maintainability index decreased from 2009 to 2010, then increased from 2010 until 2011 and remained stable onwards. We found that all diversity variables did not change drastically and remained stable from 2009 to 2015. The difference between these variables was in their values: location and commit diversity had very high values (about 0.7 and 0.9 respectively), employment diversity had a value of 0.49, and gender diversity had a very low value.
5) Sympy: For Sympy we found that the maintainability index
decreased over time from 2008 to 2015. Location diversity increased over time: it rose unevenly in the first year, then increased almost steadily. Gender diversity, as in the previous projects, was stable throughout and very low. Employment diversity decreased sharply in the first year and increased steadily after that. Commit diversity was almost stable throughout, decreasing slightly from 2011 to 2015.
In general, we drew several conclusions from these results. First, gender diversity is very low in all five projects, indicating that contributors are predominantly male, which concurs with prior works. Moreover, in most projects location diversity has an inverse relation with the maintainability index.
Fig. 3. A plot of gender diversity (left y-axis) and cyclomatic complexity (right y-axis) against time for Boto
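The maintainability index reported in these plots is conventionally computed with the Coleman-Oman formula. As a minimal sketch (assuming the classic 171-point variant; the study's exact tooling may differ):

```python
import math

def maintainability_index(halstead_volume, cyclomatic_complexity, loc):
    """Classic maintainability index; higher values are easier to maintain.

    MI = 171 - 5.2*ln(V) - 0.23*CC - 16.2*ln(LOC)
    """
    return (171
            - 5.2 * math.log(halstead_volume)
            - 0.23 * cyclomatic_complexity
            - 16.2 * math.log(loc))

# A small, simple module scores far higher than a large, complex one.
simple = maintainability_index(halstead_volume=100, cyclomatic_complexity=3, loc=40)
large = maintainability_index(halstead_volume=5000, cyclomatic_complexity=40, loc=2000)
```

The volume, complexity and line-count inputs here are invented for illustration only.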
B. Cyclomatic Complexity
1) Boto: Figure 3 shows that cyclomatic complexity in the Boto project decreased until May 2012, then remained almost stable through 2015. Location diversity started at a high value of 0.624 and increased slightly over time, reaching 0.633 by 2015. Gender diversity was again very low (between 0.04 and 0.06). Commit diversity increased slightly from 2011 to 2013, then decreased until 2015, though its value remained high throughout. Employment diversity was very high, close to 1, and increased over time with a slight dip between Nov 2013 and May 2014.
2) Buildbot: From the results, cyclomatic complexity increased slightly over time, with a sharp decrease and increase between 2010 and 2011. Location diversity fluctuated in the first year, from 2007 to 2008, then remained stable up to 2015 at a high value of about 0.7. Gender diversity started at a low value of 0.2 and decreased until it reached 0.05 by 2015. Commit diversity decreased sharply in 2007, then remained stable up to 2015 at a very high value of about 0.8. Employment diversity increased over time until 2015, with a very high value close to 1.
3) IPython: Figure 4 shows that cyclomatic complexity in the IPython project increased slightly over time. Location diversity decreased slightly between 2009 and 2015, but its value remained high, at almost 0.7. As in the previous results, gender diversity was very low, fluctuating between 0.10 and 0.16. Commit and employment diversity started almost stable, then decreased slightly until 2015; both remained very high throughout.
4) Matplotlib: Cyclomatic complexity for Matplotlib was steady from 2009 to 2012, rose and fell between 2012 and 2014, then remained stable until 2015. All diversity variables (location, gender, commit, employment) changed little over time and remained stable from 2009 to 2015. The difference between these variables was their values: location and commit diversity were very high (about 0.7 and 0.9 respectively), employment diversity was 0.49, and gender diversity was very low.
Fig. 4. A plot of commits diversity (left y-axis) and cyclomatic complexity (right y-axis) against time for IPython
5) Sympy: For the Sympy project, cyclomatic complexity increased slightly and afterwards remained stable until 2015. Location diversity increased over time: it rose unevenly in the first year, then increased almost steadily. Gender diversity, as in the previous projects, was stable throughout and very low. Employment diversity decreased sharply in the first year and increased steadily after that. Commit diversity was almost stable throughout, decreasing slightly from 2011 to 2015.
In some projects, the relation between cyclomatic complexity and location diversity was inverse, as with the maintainability index; other projects, however, show a positive relationship between cyclomatic complexity and location diversity.
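The cyclomatic complexity traced in this section is McCabe's measure: one plus the number of decision points in the code. A rough sketch for Python source, counting only the common branch constructs (an approximation for illustration, not the tool the study used):

```python
import ast

# Common decision-point nodes; `and`/`or` chains appear as BoolOp.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

sample = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for _ in range(3):
        pass
    return "positive"
"""
# Two if-branches plus one loop give a complexity of 4.
```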
C. Demographical diversity
Referring back to Hypothesis 1, we postulated that a more
demographically diverse group will yield better software. For
our first hypothesis test, we performed a linear regression for
the gender diversity values against software quality over the
years. Demographical diversity is measured with Simpson's index on the gender of the contributors in a project and also on their location.
1) Gender diversity: For gender diversity, the group is sep-
arated into two: male and female. If a gender is more dominant
in terms of population, then the diversity will decrease. The
more equal the values in each gender, the more diverse. A
diversity value of 1 would be an equal split of male and female.
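Since the text treats an equal male/female split as a diversity of 1, the index is presumably a normalized variant of Simpson's index; a minimal sketch under that assumption:

```python
def simpson_diversity(counts):
    """Simpson diversity 1 - sum(p_i^2): the chance that two randomly
    chosen contributors belong to different groups."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def normalized_simpson(counts):
    """Rescale so k equally sized groups score exactly 1, matching the
    convention that an equal male/female split gives 1."""
    k = len(counts)
    return simpson_diversity(counts) / (1 - 1 / k)

balanced = normalized_simpson([50, 50])  # equal split
skewed = normalized_simpson([95, 5])     # one gender dominates
```

The group counts are illustrative, not figures from the studied projects.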
For each release we calculated the diversity value, interpolating as necessary. To test across all releases whether an increase in diversity yields higher-quality software, we performed a linear regression of the diversity index against the two software quality metrics: cyclomatic complexity (CC) and maintainability index (MI). Our test hypotheses are as follows:
H1A₀: As gender diversity increases, the CC remains the same or increases
H1A₁: As gender diversity increases, the CC decreases
H1B₀: As gender diversity increases, the MI remains the same or decreases
H1B₁: As gender diversity increases, the MI increases
If H1A₀ is true, we expect a positive or zero r-value; conversely, if H1A₀ is false, we expect a negative r-value. This is because an increase in cyclomatic complexity is worse for a software. The expectation is reversed for H1B₀ and H1B₁ because a higher maintainability index is more ideal. The results also need to be paired with a p-value of less than 0.01 for a statistically significant result; otherwise there is not enough evidence to reject the null hypothesis.

TABLE II. LINEAR REGRESSION VALUES FOR GENDER DIVERSITY AGAINST CYCLOMATIC COMPLEXITY
Application Slope r-value p-value Conclusion (for H1A₀)
Boto 2.7 0.538 < 0.01 Not rejected
Buildbot -1.9 -0.298 < 0.01 Rejected
IPython 1.1 0.289 < 0.01 Not rejected
Matplotlib 3.2 0.600 < 0.01 Not rejected
Sympy 2.3 0.166 0.07 Not rejected

TABLE III. LINEAR REGRESSION VALUES FOR GENDER DIVERSITY AGAINST MAINTAINABILITY INDEX
Application Slope r-value p-value Conclusion (for H1B₀)
Boto 64.0 0.684 < 0.01 Rejected
Buildbot -15.8 -0.074 0.25 Not rejected
IPython 31.6 0.522 < 0.01 Rejected
Matplotlib -69.6 -0.396 < 0.01 Not rejected
Sympy -54.4 -0.276 < 0.01 Not rejected
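The per-project test can be sketched as an ordinary least-squares fit of a quality metric on the diversity series; the data below are purely illustrative, and the p-value machinery is omitted:

```python
import math

def linregress(xs, ys):
    """Ordinary least-squares slope and Pearson r for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical per-release (diversity, cyclomatic complexity) pairs:
diversity = [0.60, 0.62, 0.65, 0.68, 0.71]
complexity = [14.0, 13.1, 12.5, 11.8, 11.2]
slope, r = linregress(diversity, complexity)
# A negative r here (CC falling as diversity rises) is the pattern that,
# paired with p < 0.01, would reject a null hypothesis such as H1C0.
```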
Our results for gender diversity are shown in Tables II and III. For cyclomatic complexity, we rejected the null hypothesis only for Buildbot, where the correlation is negative with a p-value below 0.01, a significant result. We could not reject it for IPython, Matplotlib and Boto, since their correlations are positive and all statistically significant. Sympy shows a positive correlation but is only significant when testing at the 10% significance level. We conclude that the null hypothesis for H1A is, in general, not rejected.
For maintainability index, we accepted the alternative hypothesis only for IPython and Boto. It is worth noting that the majority of the correlation values are negative. We conclude that there is a mixed response on the rejection of H1B, since the values for Buildbot are not statistically significant.
In conclusion: we have more evidence that an increase in gender diversity generally yields poorer cyclomatic complexity and maintainability values than evidence of the converse. It is likely that gender diversity is detrimental towards software quality.
2) Location diversity: For location diversity, the contributors are grouped by their respective country. As with gender diversity, a large population in one country yields a lower diversity. Our primary hypothesis is that as more contributors from different countries contribute to software development, the software quality improves. For location diversity our test hypotheses are as follows:
H1C₀: As location diversity increases, the CC remains the same or increases
H1C₁: As location diversity increases, the CC decreases
H1D₀: As location diversity increases, the MI remains the same or decreases
H1D₁: As location diversity increases, the MI increases
Our results for location diversity are shown in Tables IV and V. For cyclomatic complexity, there is much more evidence that increasing location diversity of contributors comes with a decrease in cyclomatic complexity. We rejected the null hypothesis for 3 of the 5 projects; Buildbot would also be rejected had we tested at the 10% significance level, but to be consistent with our other measurements we only accept results whose p-values are less than 0.01 (i.e. the 1% significance level).

TABLE IV. LINEAR REGRESSION VALUES FOR LOCATION DIVERSITY AGAINST CYCLOMATIC COMPLEXITY
Application Slope r-value p-value Conclusion (for H1C₀)
Boto -100.4 -0.0602 0.61 Not rejected
Buildbot -55.8 -0.173 0.060 Not rejected
IPython -133.2 -0.809 < 0.01 Rejected
Matplotlib -211.6 -0.314 < 0.01 Rejected
Sympy -97.1 -0.902 < 0.01 Rejected

TABLE V. LINEAR REGRESSION VALUES FOR LOCATION DIVERSITY AGAINST MAINTAINABILITY INDEX
Application Slope r-value p-value Conclusion (for H1D₀)
Boto -4.9 -0.0848 0.46 Not rejected
Buildbot -3.4 -0.341 < 0.01 Not rejected
IPython -6.7 -0.658 < 0.01 Not rejected
Matplotlib 7.7 0.431 < 0.01 Rejected
Sympy 14.2 0.871 < 0.01 Rejected
In terms of maintainability index, location diversity shows mixed results: two projects (Buildbot and IPython) give significant evidence of a decrease in maintainability index, whilst another two (Matplotlib and Sympy) show evidence of the opposite.
In conclusion: we have more evidence that an increase in the location diversity of contributors yields lower CC, while the evidence on maintainability is mixed. We can only partially claim that the more contributors from different countries contribute to a project, the higher the quality of the software.
D. Technical diversity
Referring back to Hypothesis 2, we postulated that a more
technically diverse group will yield better software compared
to a technically superior group. The diversity measurements we
considered here are the employment status of the contributor
and the commits density of a project. A technically diverse
group would have more employed contributors and more
contributors committing to most of the code base rather than a
more sparse commit density. Diversity values were measured
with the Shannon index and Simpson index for employment
diversity and commits density respectively.
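The Shannon index used here for employment diversity can be sketched as follows (natural-log form; the paper's base and normalization are assumptions):

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i); larger when groups are
    more numerous and more evenly sized."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

even = shannon_diversity([30, 30])   # employed vs. not, evenly split
uneven = shannon_diversity([58, 2])  # almost everyone employed
```

The contributor counts are invented for illustration.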
1) Employment diversity: For employment diversity, contributors are separated into a group with working experience and a group without. This detail is inferred from the company field on each contributor's GitHub profile. Our test hypotheses are as follows:
H2A₀: As employment diversity increases, the CC remains the same or increases
H2A₁: As employment diversity increases, the CC decreases
H2B₀: As employment diversity increases, the MI remains the same or decreases
H2B₁: As employment diversity increases, the MI increases
Fig. 5. Bubble plot of the location diversity of Buildbot as of the latest release (larger bubbles imply greater population)
Our results for employment diversity are shown in Tables VI and VII. For CC the responses are mixed across projects: we can reject the null hypothesis for Boto and IPython, but the strongly positive correlations for Buildbot and Sympy suggest otherwise. For Matplotlib the p-value is far too high for significance, so the result is inconclusive.
In terms of maintainability index, employment diversity has generally negative correlations, with only one project, Buildbot, having enough evidence to reject the null hypothesis. Matplotlib displays a negative correlation at the 10% significance level, which can be considered an acceptance of the null hypothesis.
In conclusion: there is no clear correlation between CC and employment diversity, and there is a trend of negative correlation between MI and employment diversity. We can only suggest a possible negative relation between employment diversity and software quality, as we do not have enough evidence to claim a positive or negative correlation outright.
2) Commits diversity: For commits diversity, the contributors are separated into groups based on the number of commits each contributor made. As can be seen in Figure 6, there is an abundance of fringe developers (developers who commit only a few times) and far fewer core developers (developers who continue to contribute). Based on this grouping we performed a linear regression of the diversity of commits against the software quality metrics for each aggregated release.
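This grouping can be sketched as binning contributors by commit count and applying Simpson's index over the bin populations (the bin edges below are illustrative assumptions, not the paper's):

```python
from collections import Counter

def commits_diversity(commit_counts):
    """Bin contributors by commits made (fringe ... core), then apply
    Simpson diversity 1 - sum(p_i^2) over the bin populations."""
    def bucket(n):
        if n <= 2:
            return "fringe"    # drive-by contributors
        if n <= 50:
            return "regular"
        return "core"          # sustained contributors
    bins = Counter(bucket(n) for n in commit_counts)
    total = sum(bins.values())
    return 1 - sum((c / total) ** 2 for c in bins.values())

# Many fringe contributors and very few core developers, as in Figure 6:
counts = [1] * 40 + [2] * 10 + [10] * 8 + [200] * 2
diversity = commits_diversity(counts)
```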
TABLE VI. LINEAR REGRESSION VALUES FOR EMPLOYMENT DIVERSITY AGAINST CYCLOMATIC COMPLEXITY
Application Slope r-value p-value Conclusion (for H2A₀)
Boto -16.7 -0.512 < 0.01 Rejected
Buildbot 5.2 0.604 < 0.01 Not rejected
IPython -27.9 -0.554 < 0.01 Rejected
Matplotlib -0.73 -0.0386 0.7 Not rejected
Sympy 6.5 0.515 < 0.01 Not rejected

TABLE VII. LINEAR REGRESSION VALUES FOR EMPLOYMENT DIVERSITY AGAINST MAINTAINABILITY INDEX
Application Slope r-value p-value Conclusion (for H2B₀)
Boto -297.2 -0.488 < 0.01 Not rejected
Buildbot 114.1 0.413 < 0.01 Rejected
IPython -461.6 -0.579 < 0.01 Not rejected
Matplotlib -115.4 -0.181 0.07 Not rejected
Sympy -121.5 -0.674 < 0.01 Not rejected
Our primary hypothesis is that with an increasing mix of fringe and core developers, the software will be of higher quality. Our test hypotheses are as follows:
H2C₀: As commits diversity increases, the CC remains the same or increases
H2C₁: As commits diversity increases, the CC decreases
H2D₀: As commits diversity increases, the MI remains the same or decreases
H2D₁: As commits diversity increases, the MI increases
Our results for commits diversity are shown in Tables VIII and IX. For CC we find a negative correlation between CC and commits diversity. We accepted the alternative hypothesis for Buildbot, IPython and Sympy but could not do the same for Boto and Matplotlib. The p-values for Boto and Matplotlib are low enough to accept at the 10% significance level; however, we only consider the alternative hypothesis true for Buildbot, IPython and Sympy. We can partially conclude that as more contributors appear in the fringe and stay on as core developers, projects are likely to evolve into higher quality software.
In terms of maintainability index, there is not enough evidence to claim either a positive or a negative correlation. Of the correlations that are statistically significant, two (Buildbot and IPython) are negative and one (Sympy) is positive. The remaining projects have p-values too high to be significant, yielding confounding results. We can only conclude that there is not enough information to claim a positive or negative correlation.
In conclusion: there is good evidence that commits diversity is negatively correlated with CC (i.e. associated with higher quality), but there is no clear evidence regarding the correlation between MI and commits diversity. We can only partially claim that as more contributors appear in the fringe and continue as core contributors, software quality becomes higher.
Fig. 6. Bubble plot of the commits diversity of IPython as of the latest release (labels represent number of commits and larger bubbles represent greater population)
TABLE VIII. LINEAR REGRESSION VALUES FOR COMMIT DIVERSITY AGAINST CYCLOMATIC COMPLEXITY
Application Slope r-value p-value Conclusion (for H2C₀)
Boto -1.2 -0.0464 0.070 Not rejected
Buildbot -5.2 -0.567 < 0.01 Rejected
IPython -3.1 -0.600 < 0.01 Rejected
Matplotlib -0.4 -0.0423 0.067 Not rejected
Sympy -12.6 -0.712 < 0.01 Rejected

TABLE IX. LINEAR REGRESSION VALUES FOR COMMIT DIVERSITY AGAINST MAINTAINABILITY INDEX
Application Slope r-value p-value Conclusion (for H2D₀)
Boto -97.1 -0.0201 0.86 Not rejected
Buildbot -70.8 -0.426 < 0.01 Not rejected
IPython 94.3 -0.710 < 0.01 Not rejected
Matplotlib -182.4 0.129 0.20 Not rejected
Sympy -191.4 0.887 < 0.01 Rejected
VI. EVALUATION
A. Threats to Validity
There are several threats to the validity of our study. The
software tools and methods which were used to impute gender,
location and employment data are likely to introduce noise to
the dataset. We considered manually imputing data by search-
ing on the internet for information regarding a contributor but
found that it was an arduous task. Another method which was
considered was to email the contributors for information but
due to the lack of personal information provided by a majority
of contributors on their GitHub profiles, we agreed that this
was not a good option. Where possible, however, we assumed the modal case to reduce the noise introduced by imputation.
One possible explanation for the increase in software quality in certain cases is Linus' Law, which states that with more contributors to a project, bugs and defects are likely to diminish because they can be caught and fixed more frequently. To ensure that we were measuring
contributor diversity, we considered multiple facets of diversity
and performed calculations on each diversity measure.
There was also a shortage of concrete data from the GitHub API. Employment status cannot be tracked by date, so we only used the current employment status of a contributor. Skill diversity is very difficult to measure, especially over a long period of time. We believe that imputing the current employment status as the contributor's employment status for all periods is a reasonable start towards understanding different ways to measure skill level.
The location data of GitHub contributors is, in some cases, not the contributor's native country. It would be better to collect contributors' countries of origin, as this provides a better measurement of diversity than current location, but this requires resources that were not available during this study.
B. Conclusion
In this paper we investigated an under-studied area of open source software development: the effect of contributor diversity on overall software quality. We attempted to generalize the results to typical open source software by choosing popular, widely-used and long-lived projects on GitHub, one of the major software repository hosts. We considered software developed in Python and obtained software quality metrics over several years of development. We examined a total of more than 2000 contributors, roughly equally distributed across all projects, and their respective diversities in terms of location, gender, employment and commit strength.
We found that the diversity measures are partially positively correlated with software quality. For gender diversity, there is evidence that an increase projects a loss in software quality. There is good evidence that increases in location diversity and commits diversity come with an increase in software quality. There is not enough data to support any claims for employment diversity, as the correlation values are confounding, but the trend leans towards a negative impact on software quality.
Our work posits the need for extensive study in this area to better understand the importance of diversity in collaborative software development. More careful collection of user profile data is necessary to attain concurring results, and a more complete quality evaluation of software projects is necessary to better model software quality.