Activity-Based Analysis of Open Source Software
Contributors: Roles and Dynamics
Jinghui Cheng
Department of Computer and Software Engineering
Polytechnique Montreal, Montreal, Canada
jinghui.cheng@polymtl.ca
Jin L.C. Guo
School of Computer Science
McGill University, Montreal, Canada
jguo@cs.mcgill.ca
Abstract—Contributors to open source software (OSS) com-
munities assume diverse roles to take different responsibilities.
One major limitation of the current OSS tools and platforms
is that they provide a uniform user interface regardless of
the activities performed by the various types of contributors.
This paper serves as a non-trivial first step towards resolving
this challenge by demonstrating a methodology and establishing
knowledge to understand how the contributors’ roles and their
dynamics, reflected in the activities contributors perform, are
exhibited in OSS communities. Based on an analysis of user
action data from 29 GitHub projects, we extracted six activities
that distinguished four Active roles and five Supporting roles of
OSS contributors, as well as patterns in role changes. Through
the lens of the Activity Theory, these findings provided rich design
guidelines for OSS tools to support diverse contributor roles.
Index Terms—open source software, open source community,
activity-based analysis, contributor roles
I. INTRODUCTION
As a software development model, OSS has experienced
rapid growth during the past decades. The communities
around OSS projects are becoming increasingly heteroge-
neous, comprising not only developers and tech-savvies but
also designers, managers, and users with a wide-ranging level
of experience and expertise. As a result, the ways participants
contribute to the OSS projects also become increasingly di-
verse [7], [12]. However, one major limitation of the current
OSS tools and platforms is that they provide a uniform user
interface regardless of the activities performed by the various
types of contributors interacting with the platform. In other
words, the current OSS tools do not sufficiently account for
the various roles assumed by the OSS contributors.
This paper serves as a non-trivial first step towards re-
solving this challenge by demonstrating a methodology and
establishing knowledge to understand how the roles and their
dynamics are currently exhibited in OSS communities. In
particular, we focused on examining the roles and their dy-
namics based on the types of activities that OSS contributors
perform. This perspective is inspired by several aspects of
the Activity Theory [24]. Particularly, the Activity Theory,
applied to the field of Human-Computer Interaction, specifies
that information and communication tools need to focus on
mediating human activities, facilitating users to perform a
group of low-level actions and operations in order to achieve
higher-level objectives; additionally, such mediation needs to
be adjustable in an evolving context [20]. To guide our study,
we pose the following research questions:
RQ1: What are the decisive activities that distinguish the roles
assumed by OSS contributors?
RQ2: What are the prominent roles that can be identified
through analyzing a wide range of actions community
contributors perform in a diverse set of projects?
RQ3: How do the roles assumed by the community contrib-
utors in an OSS project change over time?
To answer these questions, we collected and analyzed action
data of 20,838 unique contributors from 29 diverse GitHub
projects and conducted factor and clustering analyses to iden-
tify the prominent activities and roles of OSS contributors.
In the following sections, we first briefly review the related
work (Section II). We then outline our data collection and
analysis methods (Section III). In Section IV, we report our
results on the identified activities and roles, as well as the
role dynamic patterns. We then discuss the implications of
our findings to the design of OSS tools (Section V). Finally,
we provide concluding remarks in Section VI.
II. RELATED WORK
Our study is related to previous work that focused on users’
roles in information and communication technologies (ICTs)
and studies that investigated the structure of OSS communities.
A. Role-Based Approaches in ICTs
Previous work has explored models and techniques to
identify and support different roles in ICTs of various ap-
plication domains, including collaboration tools [4], access
control systems [5], knowledge co-production platforms [3],
and software engineering tools [1], [27]. For example, Arazy et
al. [3] identified seven roles of Wikipedia contributors, such as
all-round contributors and layout shapers, through a clustering
analysis of user actions in Wikipedia articles.
Our study is most closely related to previous work that
investigated roles in tools and techniques that support software
design and development [23]. Zhu et al. [27] advocated a
complete and consistent role consideration in all aspects of
software engineering and in research about tools through the
lens of Role-Based Software Development (RBSD). Acuña
and Juristo [1] also proposed a model that consists of 20 general
capabilities crucial in software development and mapped
these capabilities with 20 predetermined roles in software
projects. Leveraging this model, they presented a procedure
for assigning people to roles according to their capabilities.
arXiv:1903.05277v1 [cs.SE] 13 Mar 2019
More recently, researchers investigated role dynamics in
self-organized software development teams. Hoda et al. [13]
conducted Grounded Theory research involving 58 agile prac-
titioners from 23 software organizations to understand the
role dynamics in agile teams. They identified six “infor-
mal, implicit, transient, and spontaneous” roles performed
by practitioners to reinforce the self-organizing nature of
agile practice. These roles include, for example, mentors who
guide and inform the team in using agile methods, translators
who communicate between customers and technical team, and
champions who acquire supports from senior management.
Our study builds upon this work and explores the dynamics
of various activity-based roles in OSS communities.
B. OSS Community Structure
Much work that investigated the OSS community structure
is based on the “onion” model [19], [25]. This model proposed
a layered structure of responsibilities for OSS projects that
included a small number of core members and a larger number
of peripheral developers and bug fixers [19]. Mockus et al.
[17] examined the Apache web server and the Mozilla browser
as case studies and empirically generated several hypotheses
concerning the OSS community structure. These hypotheses
echoed the “onion” model in that a small number of
developers contribute the majority of the codebase.
Because of the self-organizing nature of OSS communities
[13], researchers have particularly investigated the evolution
of the OSS structure [6], [15], [21]. For example, Cheng et
al. [6] identified several factors that significantly influenced
developers’ evolution into a core member in OSS ecosystems;
such factors included the total number of projects a developer
was willing to join and the degree to which the developer’s
peers were closely connected. Joblin et al. [15] also found
that OSS communities tended to evolve from a hierarchical
structure to a hybrid one with a greater distribution of
contributions as the number of developers increases.
More closely related to our work, several recent studies
focused on exploring classification methods for OSS com-
munities. Through a clustering analysis on code committing
metrics extracted from ten OSS projects, Di Bella et al.
(2013) identified three major factors and four developer role
groups that fell on the spectrum from core to occasional
rare developers [10]. Agrawal et al. (2016) also adopted a
clustering approach and explored decision tree models to
classify OSS code committers; their developer classes also
ranged from core developers to less engaged developers [2].
A major limitation of these studies is that they only focused
on code committing activities. While we adopted a similar
statistical approach, our study focused on a much wider va-
riety of actions beyond code contribution and identified more
descriptive activity-oriented factors and roles. We also aimed
to extract common roles in a wide range of OSS projects.
In sum, while previous work has devoted non-negligible
effort to understanding roles and their structures
in ICTs and OSS communities, few studies have explicitly
investigated, with the aim of improving tool support in OSS,
the correlations among actions performed during goal-driven
activities or the dynamics of activity migrations
accompanied by frequent role changes. Our work
addresses this gap by exploring these aspects with a data-driven
approach and by following up with a detailed discussion
of the implications for OSS tool design.
III. METHODS
We analyzed user action data within the last three years
from 29 GitHub projects that exhibit diverse characteristics.
All data was collected in January 2018.
A. Projects Selection
To cover a wide range of OSS communities, we focused
on projects in different application domains. Particularly, we
randomly selected one project in each category in GitHub
“Collections” (footnote 1). GitHub “Collections” are curated lists (a total
of 31 lists at the time of our data collection) of recently
active and influential projects and communities. We eliminated
two lists, “Open data” and “Policies”, which focused on non-
software projects. Table I includes the names of the selected
projects. These projects involved a total of 20,838 unique
contributors (including code contributors, issue reporters and
discussion participants, and pull request reporters and discus-
sion participants), 41,275 issues, 73,763 pull requests, and
240,024 commits. The code repositories are comprised of
4,963,540 lines of code in 24,451 files, covering 15 program-
ming languages.
B. Metrics Selection
To effectively assess the participants’ contribution to their
OSS community, we selected metrics gathered from various
aspects. Those metrics describe the detailed actions contributors
take in order to participate in the OSS projects. First,
code contribution metrics include numbers of commits made,
lines of code changed, and files edited, as well as metrics
related to pull requests (PRs) made by contributors. Second,
opinion contribution metrics assess actions associated with
reporting issues and commenting in issue and PR discussions.
Third, network-related metrics include the number of times a
participant was mentioned, or referred to other issues or PRs, in
discussions. Finally, administration metrics measure managerial
actions such as managing labels or manipulating issues or
PRs. Those metrics were inspired by several previous works
[2], [10], [14] and are summarized in Table II.
C. Data Collection
We aimed at extracting the necessary metrics from the
repositories of the 29 GitHub projects and focused on the
contributor actions within the three-year period between January
1st, 2015 and January 1st, 2018. To collect such a data set, we
first used the GitHub REST API (footnote 2) to download the raw data
about code committing actions, issue reporting and commenting
actions, PR reporting and commenting actions, as well as
issue and PR events (e.g. labels applied/removed, closed, etc.)
for each project. We then excluded any data on actions performed
by “bots” (i.e. automated processes presented as GitHub users
who perform event-driven actions). In order to understand the
dynamics of the OSS roles, the data for each contributor was then
divided by quarter of the year, and the metrics were calculated
for each quarter. As such, our data set accumulated metrics for each
participant in each project across 12 time periods. In total, this
data set comprises 38,891 data points, each including 19
dimensions corresponding to the metrics described in Table II.
Footnote 1: https://github.com/collections/
TABLE I
NAMES OF GITHUB PROJECTS SELECTED FOR OUR STUDY
accessibility-developer-tools | better_errors | hospitalrun-frontend | neovim | refined-github | the_silver_searcher
adarkroom | brew | jekyll | picongpu | SoundManager2 | TrueCraft
advocacy.mozilla.org | cocos2d-html5 | kubernetes | primer | spine | urh
artsy.github.io | csslint | madison | pysc2 | superpowers-core | utron
basscss | guardian/frontend | mention-bot | railsbridge/docs | swipl-devel |
TABLE II
ACTION METRICS TO ASSESS OSS PARTICIPANT’S CONTRIBUTION
Type: Metric
Code Contrib.:
  # of commits made
  # of lines of code changed in the codebase
  # of files worked on
  # of pull requests (PRs) made
  Avg. length of PR descriptions*
Opinion Contrib.:
  # of issues reported
  Avg. length of issue descriptions*
  # of comments made in issue discussions
  Avg. length of issue comments*
  # of comments made in PR discussions
  Avg. length of PR comments*
Network:
  # of times being mentioned in issue comments
  # of times being mentioned in PR comments
  # of times referred other issues/PRs in issue comments
  # of times referred other issues/PRs in PR comments
Admin.:
  # of times applied or removed labels on issues
  # of times applied or removed labels on PRs
  # of times closed issues
  # of times closed pull requests
* All lengths were measured in number of characters
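The quarterly split described above can be illustrated with a minimal sketch. It assumes each raw action has already been reduced to a `(project, contributor, action_type, timestamp)` tuple; this record layout and the function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from datetime import datetime

# Study window taken from the text: January 1st, 2015 to January 1st, 2018.
WINDOW_START = datetime(2015, 1, 1)
WINDOW_END = datetime(2018, 1, 1)


def quarter_index(timestamp):
    """Return the 0-based quarter (0..11) of a timestamp, or None if outside the window."""
    if not (WINDOW_START <= timestamp < WINDOW_END):
        return None
    return (timestamp.year - WINDOW_START.year) * 4 + (timestamp.month - 1) // 3


def count_actions(events):
    """Count actions per (project, contributor, quarter, action type).

    `events` is an iterable of (project, contributor, action_type, timestamp)
    tuples; this layout is an assumption made for illustration.
    """
    counts = Counter()
    for project, contributor, action_type, timestamp in events:
        quarter = quarter_index(timestamp)
        if quarter is not None:
            counts[(project, contributor, quarter, action_type)] += 1
    return counts
```

Count-based metrics in Table II could then be read off per contributor and quarter; the average-length metrics would need the text lengths as an extra field.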
D. Identifying Activities and Roles
The metrics introduced previously in Section III-B were
selected to measure the concrete actions taken by the user
from distinct perspectives. Those metrics, however, might be
interrelated and can be influenced or determined by a set
of hidden factors. We hypothesize that those hidden factors
are the common activities that OSS contributors engage in
when they are serving certain roles in the projects. To identify
these activities, we first performed a Factor Analysis on the
dataset to understand and interpret the interrelations between
those metrics. Based on these factors, we then conducted a
Clustering Analysis to identify the prominent contributor roles.
Before the factor analysis, all metrics were standardized to
have a mean of zero and unit variance.
Footnote 2: https://developer.github.com/v3/
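As an illustration of the standardization step, the following sketch rescales a single metric column to z-scores. Using the population standard deviation and mapping constant columns to zeros are assumptions; the paper does not specify either detail.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale one metric column to zero mean and unit variance (z-scores)."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        # A constant column has no spread; map it to zeros (an assumption).
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]
```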
1) Factor Analysis: Factor analysis, especially exploratory
factor analysis, is a statistical method to discover underlying
patterns in a set of variables [8]. The main procedures for
factor analysis include factor extraction and rotation.
Maximum Likelihood and Principal Axis Factors (PAF) are
two commonly adopted factor extraction techniques [9]. We
chose the PAF approach because preliminary analysis indi-
cated that the distributions of our data violate the assumption
of multivariate normality [11]. After extracting the factors,
we used the Kaiser criterion and retained the factors with
an eigenvalue larger than 1.0, indicating that those are the
most influential factors (i.e. factors that account for the most
variance in the data) [28].
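The retention rule can be sketched in a few lines. The function name is hypothetical and the eigenvalues in the usage note are made-up numbers, not values from this study.

```python
def kaiser_retain(eigenvalues, threshold=1.0):
    """Return the indices of the factors whose eigenvalue exceeds the threshold."""
    return [index for index, value in enumerate(eigenvalues) if value > threshold]
```

For example, with the illustrative eigenvalues `[4.2, 1.7, 1.0, 0.6]`, only the first two factors would be retained, because the criterion is a strict "larger than 1.0" comparison.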
The retained factors were then rotated to attain a sim-
ple structure that supports a better interpretation. In such a
structure, each rotated factor aims to define a distinct group
of interrelated metrics. Rotation techniques can be generally
divided into orthogonal and oblique rotations; the former
produces factors that are uncorrelated while the latter allows
the factors to correlate. In social science, behaviors can rarely
be partitioned into groups that are independent [9]. We hypoth-
esized that the factors influencing the contributors’ activities
in OSS communities would also exhibit some correlations. We
therefore decided to use an oblique rotation technique, as it
would render more accurate and reproducible results when the
factors are correlated.
Factor analysis produces two results: factor loadings and
factor scores. Factor loadings represent the correlation of the
original metrics with each identified factor, while factor scores
are the values of each data point mapped in the factor space.
We used the factor loading results to interpret the relations
between metrics listed in Table II. The factor scores were then
used for the clustering analysis in the next step.
2) Clustering: After the activities (i.e. factors) were iden-
tified, we conducted a hierarchical clustering analysis based
on the factor scores data to identify the prominent roles of
OSS contributors. This method aims to construct a hierarchical
structure of clusters; such structure provides more information
about the dataset than unstructured clusters produced by flat
clustering methods such as K-means. Furthermore, hierarchi-
cal methods do not require a predetermined number of clusters
and most of them are deterministic. As such, this method
supports the exploratory nature of our study.
Particularly, we used an agglomerative (or bottom-up) hier-
archical clustering method. In general, agglomerative methods
first treat each data point as a singleton cluster. Pairs of
closest clusters are then successively merged until all clusters
have been merged into a single one that contains all data.
This process produces a hierarchy of clusters that can be
visualized in a tree diagram called a dendrogram. Cutting the
dendrogram at a certain level creates a partition of disjoint
clusters. This step is equivalent to grouping only the clusters
with high similarity. Different strategies have been proposed
for measuring the similarity between two clusters. Based on
our initial experiment, we decided to use Ward’s method
[18]. This method produces clusters that are more compact
and suitable for identifying and interpreting prominent roles.
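To make the agglomerative procedure concrete, here is a naive sketch of bottom-up merging with the Ward criterion, in which the cost of merging clusters A and B is |A||B|/(|A|+|B|) times the squared distance between their centroids. It is a didactic, cubic-time illustration with hypothetical function names, not the implementation used in the study; stopping at `n_clusters` stands in for cutting the dendrogram.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clustering(points, n_clusters):
    """Merge the cheapest pair of clusters until n_clusters remain."""
    clusters = [[tuple(p)] for p in points]  # every point starts as a singleton
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

For a data set of the size reported here, an optimized library routine would be needed in practice; the sketch only shows the merge rule.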
We used the silhouette value to measure the quality of
clusters [22]. It represents how similar one data point is to its
own cluster compared to other clusters. To choose the optimal
number of clusters, we considered the silhouette value while
also referring to the dendrogram produced by Ward’s
hierarchical algorithm.
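The silhouette value of a point compares a, its mean distance to the other members of its own cluster, with b, the smallest mean distance to the members of any other cluster: s = (b - a) / max(a, b). A minimal sketch follows, assuming Euclidean distance and at least two clusters; assigning 0 to singleton clusters is a common convention rather than something stated in the paper.

```python
from math import dist
from statistics import fmean


def silhouette_values(points, labels):
    """Silhouette value of every point, given one cluster label per point."""
    members = {}
    for point, label in zip(points, labels):
        members.setdefault(label, []).append(point)

    values = []
    for point, label in zip(points, labels):
        own = members[label]
        if len(own) == 1:
            values.append(0.0)  # convention for singleton clusters (an assumption)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            fmean([dist(point, other) for other in group])
            for other_label, group in members.items()
            if other_label != label
        )
        denominator = max(a, b)
        values.append((b - a) / denominator if denominator > 0 else 0.0)
    return values
```

Averaging these values over all points gives a single score per candidate number of clusters, which can then be compared across different cuts of the dendrogram.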
3) Interpreting activities and roles: In order to identify the
meaningful activities and roles represented in the factors and
the clusters, we followed a qualitative process that involved the
following steps. First, both authors independently examined
the actual actions represented by the influential metrics for
each factor and each wrote three to five keywords/phrases to
describe their understanding of the factor. Then the authors
discussed their notes and conducted an “Affinity Diagramming”
study to group their keywords/phrases. Next, a phrase of
higher-level abstraction was given to each group to describe
the factor. Finally, the authors discussed and agreed on the
phrase that described the biggest group in the affinity diagram
of each factor as the activity it represented. We adopted a
similar process in identifying the roles from the clustering
analysis results.
E. Analyzing Role Dynamics
To identify patterns in the dynamics of changes in roles
assumed by individual contributors, we first analyzed the
frequency of changes among the roles with respect to all
contributors. We then measured the role change intensity
(RCI) for each contributor. A contributor’s RCI was calculated
by accumulating, over the 12 time periods, the quantity of
role change between each two consecutive time periods; this
quantity is measured using the Euclidean distance between
cluster centroids of the two roles taken by the contributor
in two consecutive time periods. To accommodate the large
range of change intensity values and to ease comparison, we
calculate RCI using a logarithmic scale. Therefore, the overall
Role Change Intensity (RCI) for each contributor $i$ is:
\[
\mathrm{RoleChangeIntensity}(i) = \log_{10} \sum_{t=2}^{12} \mathrm{dist}\left(R^{i}_{t}, R^{i}_{t-1}\right) \tag{1}
\]
where $R^{i}_{t}$ is the cluster centroid of the role assumed by
contributor $i$ at time $t$ and $\mathrm{dist}(A, B) = \sqrt{\sum_{n} (a_n - b_n)^2}$
represents the Euclidean distance between vectors $A$ and $B$.
This measure provides an ordinal evaluation of the intensity
of the contributors’ role change.
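Equation (1) can be sketched as a short function. It assumes the contributor has exactly one role label per consecutive time period and that a mapping from role labels to cluster centroids is available; both inputs and the function name are hypothetical. Returning `None` when the summed distance is zero is also an assumption, since the logarithm is undefined there and the text does not say how such contributors are treated.

```python
from math import dist, log10


def role_change_intensity(roles, centroids):
    """Compute Eq. (1) for one contributor.

    `roles` lists the role label held in each consecutive time period;
    `centroids` maps each role label to its cluster centroid in factor space.
    """
    total = sum(
        dist(centroids[previous], centroids[current])
        for previous, current in zip(roles, roles[1:])
    )
    # log10(0) is undefined; None marks contributors who never change role
    # (an assumption, not specified in the text).
    return log10(total) if total > 0 else None
```

For instance, with illustrative centroids `{"A": (0, 0), "B": (3, 4)}` and the role sequence `["A", "B", "B", "A"]`, the summed distance is 5 + 0 + 5 = 10, so the intensity is 1.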
IV. RESULTS
In the following sections, we first present our results on
factor and clustering analyses. We then present findings on
role dynamics.
A. Activity Extraction
Based on the criterion introduced in Section III-D1, we
retained six factors that had eigenvalues greater than 1.0. These
six factors explained 61% of the data variance. The factor
loading results are shown in Table III. Based on the qualitative
analysis described in Section III-D3, we explain the activities
represented in these factors as follows:
− Factor 1 measures three types of actions: commenting,
being mentioned in comments, and manipulating labels
on PRs. The commenting actions may be associated with
several purposes such as voicing opinions, providing
suggestions, and asking or answering questions. But this factor
is most heavily influenced by the number of times the
contributor is mentioned; it also puts a heavy weight
on label manipulation actions. These facts indicate that it
mainly measures behaviors of providing information and
knowledge about the project. We thus name this activity
Knowledge Sharing.
− Factor 2 exclusively measures participants’ contributions to
the codebase. We name it Code Contribution.
− Factor 3 measures issue referring and label manipulating
actions. We found that issue referring actions are usually
associated with identifying duplicated issues or redirecting
participants to move their discussion to other issues. At
the same time, manipulating issue labels usually involves
categorizing issues (e.g. into bugs or feature requests),
identifying duplicated issues, and/or indicating stages in
the issue resolution progress (e.g. triaging, assigned). We thus
name this activity Issue Coordination.
− Factor 4 is mostly associated with actions of closing issues
or PRs. We name this activity Progress Control.
− Factor 5 is influenced by actions of making PRs and
working on a large number of files. These indicate feature
tweaking or bug fixing activities in which contributors
make small changes to many files and file PRs for these
changes to be included in the main repository. We thus
name this activity Code Tweaking.
− Factor 6 is only influenced by the number of issues
reported. We thus name it Issue Reporting.
The factor analysis result also demonstrated some corre-
lations among the extracted activity dimensions (see Table
IV). Particularly, Knowledge Sharing, Issue Coordination, and
Progress Control exhibited high correlations (all pair-wise cor-
relation coefficients r > 0.5). Two other pairs of dimensions,
Knowledge Sharing–Code Tweaking and Issue Reporting–
Issue Coordination, also demonstrated moderate correlation
(r > 0.4). These results supported our hypothesis that factors
influencing the contributors’ actions in the OSS community
are not independent.
TABLE III
FACTOR LOADINGS OF THE ACTION METRICS
Metrics | Factor1 | Factor2 | Factor3 | Factor4 | Factor5 | Factor6 | h2 | u2
# of commits made | 0.00 | 1.03 | 0.02 | -0.07 | -0.04 | 0.05 | 0.997 | 0.003
# of lines of code changed in the codebase | 0.00 | 1.03 | 0.02 | -0.07 | -0.04 | 0.05 | 0.997 | 0.003
# of files worked on | 0.00 | 0.26 | -0.07 | 0.35 | 0.45 | -0.17 | 0.575 | 0.425
# of pull requests (PRs) made | 0.06 | -0.04 | -0.11 | 0.45 | 0.70 | 0.05 | 0.876 | 0.124
Avg. length of PR descriptions | 0.05 | -0.01 | -0.03 | -0.01 | 0.13 | 0.01 | 0.022 | 0.978
# of issues reported | 0.30 | 0.04 | 0.18 | 0.11 | 0.11 | 0.35 | 0.643 | 0.358
Avg. length of issue descriptions | -0.01 | 0.00 | -0.03 | 0.00 | 0.02 | 0.16 | 0.024 | 0.976
# of comments made in issue discussions | 0.52 | -0.01 | 0.30 | 0.24 | -0.13 | 0.16 | 0.900 | 0.100
Avg. length of issue comments | -0.02 | 0.01 | 0.00 | -0.01 | 0.00 | 0.1 | 0.008 | 0.992
# of comments made in PR discussions | 0.72 | -0.01 | 0.09 | 0.03 | 0.25 | -0.12 | 0.820 | 0.180
Avg. length of PR comments | 0.00 | -0.01 | 0.04 | -0.02 | 0.09 | 0.03 | 0.013 | 0.987
# of times being mentioned in issue comments | 0.78 | 0.01 | 0.1 | 0.01 | -0.02 | 0.14 | 0.840 | 0.160
# of times being mentioned in PR comments | 1.02 | 0.01 | -0.16 | -0.2 | 0.2 | -0.22 | 0.743 | 0.257
# of times referred other issues/PRs in issue comments | -0.12 | 0.02 | 1.06 | -0.07 | 0.02 | -0.15 | 0.808 | 0.192
# of times referred other issues/PRs in PR comments | 0.06 | 0.00 | 0.59 | -0.05 | 0.21 | 0.00 | 0.503 | 0.497
# of times applied or removed labels on issues | 0.06 | 0.01 | 0.72 | 0.24 | -0.23 | -0.07 | 0.725 | 0.275
# of times applied or removed labels on PRs | 0.58 | 0.00 | 0.13 | -0.03 | 0.15 | -0.07 | 0.516 | 0.485
# of times closed issues | 0.28 | -0.03 | 0.08 | 0.65 | -0.24 | 0.04 | 0.701 | 0.299
# of times closed pull requests | -0.30 | -0.08 | 0.07 | 0.95 | 0.33 | -0.06 | 0.857 | 0.144
Activity Name | Knwl. Sharing | Code Contrib. | Issue Coord. | Prog. Ctrl. | Code Twking. | Issue Rptg. | |
Note 1: The h2 column represents the estimated proportion of the variance of each metric that is shared with other metrics and can
be explained by the factors. The u2 column (equal to 1 − h2) denotes the variance that is unique to the metric itself.
Note 2: Yellow cells indicate that the loading is greater than 0.5; green cells indicate that the loading is between 0.3 and 0.5.
TABLE IV
CORRELATION COEFFICIENTS AMONG FACTORS
 | Knwl. Sharing | Code Contrib. | Issue Coord. | Prog. Ctrl. | Code Twking.
Code Contrib. | 0.18 | | | |
Issue Coord. | 0.70 | 0.09 | | |
Progress Ctrl. | 0.61 | 0.35 | 0.56 | |
Code Twking. | 0.47 | 0.30 | 0.29 | 0.32 |
Issue Rptg. | 0.38 | -0.05 | 0.44 | 0.13 | 0.22
B. Roles Identification
Figure 1 shows the dendrogram of our hierarchical clustering
results. We observed two major groups of
clusters that exhibited markedly different structures. The
majority of the data points (N = 37,310) fell into a cluster with a
low dendrogram height, while the remaining data points (N = 1,581)
merged at a much greater height. In other words, the variance
among sub-clusters in the first group was much smaller than
that of the second group. This difference indicated that our
data included two very distinct groups of users.
[Fig. 1. Dendrogram of the hierarchical clustering results. Panel title: "Upper tree of cut at h=30"; height axis from 0 to 300; leaves labeled Branch 1 to Branch 712.]
By examining the cluster centers and samples from each
group, we found that the second group has much
higher values in all factor dimensions when compared with
the first group; in other words, contributors in this group
are much more active in terms of all activities indicated by
the factors. We thus consider the second group as comprising
Active Contributors in their communities, while the first
group represents the Supporting Contributors. Because the
Branch 713
Branch 714
Branch 715
Branch 716
Branch 717
Branch 718
Branch 719
Branch 720
Branch 721
Branch 722
Branch 723
Branch 724
Branch 725
Branch 726
Branch 727
Branch 728
Branch 729
Branch 730
Branch 731
Branch 732
Branch 733
Branch 734
Branch 735
Branch 736
Branch 737
Branch 738
Branch 739
Branch 740
Branch 741
Branch 742
Branch 743
Branch 744
Branch 745
Branch 746
Branch 747
Branch 748
Branch 749
Branch 750
Branch 751
Branch 752
Branch 753
Branch 754
Branch 755
Branch 756
Branch 757
Branch 758
Branch 759
Branch 760
Branch 761
Branch 762
Branch 763
Branch 764
Branch 765
Branch 766
Branch 767
Branch 768
Branch 769
Branch 770
Branch 771
Branch 772
Branch 773
Branch 774
Branch 775
Branch 776
Branch 777
Branch 778
Branch 779
Branch 780
Branch 781
Branch 782
Branch 783
Branch 784
Branch 785
Branch 786
Branch 787
Branch 788
Branch 789
Branch 790
Branch 791
Branch 792
Branch 793
Branch 794
Branch 795
Branch 796
Branch 797
Branch 798
Branch 799
Branch 800
Branch 801
Branch 802
Branch 803
Branch 804
Branch 805
Branch 806
Branch 807
Branch 808
Branch 809
Branch 810
Branch 811
Branch 812
Branch 813
Branch 814
Branch 815
Branch 816
Branch 817
Branch 818
Branch 819
Branch 820
Branch 821
Branch 822
Branch 823
Branch 824
Branch 825
Branch 826
Branch 827
Branch 828
Branch 829
Branch 830
Branch 831
Branch 832
Branch 833
Branch 834
Branch 835
Branch 836
Branch 837
Branch 838
Branch 839
Branch 840
Branch 841
Branch 842
Branch 843
Branch 844
Branch 845
Branch 846
Branch 847
Branch 848
Branch 849
Branch 850
Branch 851
Branch 852
Branch 853
Branch 854
Branch 855
Branch 856
Branch 857
Branch 858
Branch 859
Branch 860
Branch 861
Branch 862
Branch 863
Branch 864
Branch 865
Branch 866
Branch 867
Branch 868
Branch 869
Branch 870
Branch 871
Branch 872
Branch 873
Branch 874
Branch 875
Branch 876
Branch 877
Branch 878
Branch 879
Branch 880
Branch 881
Branch 882
Branch 883
Branch 884
Branch 885
Branch 886
Branch 887
Branch 888
Branch 889
Branch 890
Branch 891
Branch 892
Branch 893
Branch 894
Branch 895
Branch 896
Branch 897
Branch 898
Branch 899
Branch 900
Branch 901
Branch 902
Branch 903
Branch 904
Branch 905
Branch 906
Branch 907
Branch 908
Branch 909
Branch 910
Branch 911
Branch 912
Branch 913
Branch 914
Branch 915
Branch 916
Branch 917
Branch 918
Branch 919
Branch 920
Branch 921
Branch 922
Branch 923
Branch 924
Branch 925
Branch 926
Branch 927
Branch 928
Branch 929
Branch 930
Branch 931
Branch 932
Branch 933
Branch 934
Branch 935
Branch 936
Branch 937
Branch 938
Branch 939
Branch 940
Branch 941
Branch 942
Branch 943
Branch 944
Branch 945
Branch 946
Branch 947
Branch 948
Branch 949
Branch 950
Branch 951
Branch 952
Branch 953
Branch 954
Branch 955
Branch 956
Branch 957
Branch 958
Branch 959
Branch 960
Branch 961
Branch 962
Branch 963
Branch 964
Branch 965
Branch 966
Branch 967
Branch 968
Branch 969
Branch 970
Branch 971
Branch 972
Branch 973
Branch 974
Branch 975
Branch 976
Branch 977
Branch 978
Branch 979
Branch 980
Branch 981
Branch 982
Branch 983
Branch 984
Branch 985
Branch 986
Branch 987
Branch 988
Branch 989
Branch 990
Branch 991
Branch 992
Branch 993
Branch 994
Branch 995
Branch 996
Branch 997
Branch 998
Branch 999
Branch 1000
Branch 1001
Branch 1002
Branch 1003
Branch 1004
Branch 1005
Branch 1006
Branch 1007
Branch 1008
Branch 1009
Branch 1010
Branch 1011
Branch 1012
Branch 1013
Branch 1014
Branch 1015
Branch 1016
Branch 1017
Branch 1018
Branch 1019
Branch 1020
Branch 1021
Branch 1022
Branch 1023
Branch 1024
Branch 1025
Branch 1026
Branch 1027
Branch 1028
Branch 1029
Branch 1030
Branch 1031
Branch 1032
Branch 1033
Branch 1034
Branch 1035
Branch 1036
Branch 1037
Branch 1038
Branch 1039
Branch 1040
Branch 1041
Branch 1042
Branch 1043
Branch 1044
Branch 1045
Branch 1046
Branch 1047
Branch 1048
Branch 1049
Branch 1050
Branch 1051
Branch 1052
Branch 1053
Branch 1054
Branch 1055
Branch 1056
Branch 1057
Branch 1058
Branch 1059
Branch 1060
Branch 1061
Branch 1062
Branch 1063
Branch 1064
Branch 1065
Branch 1066
Branch 1067
Branch 1068
Branch 1069
Branch 1070
Branch 1071
Branch 1072
Branch 1073
Branch 1074
Branch 1075
Branch 1076
Branch 1077
Branch 1078
Branch 1079
Branch 1080
Branch 1081
Branch 1082
Branch 1083
Branch 1084
Branch 1085
Branch 1086
Branch 1087
Branch 1088
Branch 1089
Branch 1090
Branch 1091
Branch 1092
Branch 1093
Branch 1094
Branch 1095
Branch 1096
Branch 1097
Branch 1098
Branch 1099
Branch 1100
Branch 1101
Branch 1102
Branch 1103
Branch 1104
Branch 1105
Branch 1106
Branch 1107
Branch 1108
Branch 1109
Branch 1110
Branch 1111
Branch 1112
Branch 1113
Branch 1114
Branch 1115
Branch 1116
Branch 1117
Branch 1118
Branch 1119
Branch 1120
Branch 1121
Branch 1122
Branch 1123
Branch 1124
Branch 1125
Branch 1126
Branch 1127
Branch 1128
Branch 1129
Branch 1130
Branch 1131
Branch 1132
Branch 1133
Branch 1134
Branch 1135
Branch 1136
Branch 1137
Branch 1138
Branch 1139
Branch 1140
Branch 1141
Branch 1142
Branch 1143
Branch 1144
Branch 1145
Branch 1146
Branch 1147
Branch 1148
Branch 1149
Branch 1150
Branch 1151
Branch 1152
Branch 1153
Branch 1154
Branch 1155
Branch 1156
Branch 1157
Branch 1158
Branch 1159
Branch 1160
Branch 1161
Branch 1162
Branch 1163
Branch 1164
Branch 1165
Branch 1166
Branch 1167
Branch 1168
Branch 1169
Branch 1170
Branch 1171
Branch 1172
Branch 1173
Branch 1174
Branch 1175
Branch 1176
Branch 1177
Branch 1178
Branch 1179
Branch 1180
Branch 1181
Branch 1182
Branch 1183
Branch 1184
Branch 1185
Branch 1186
Branch 1187
Branch 1188
Branch 1189
Branch 1190
Branch 1191
Branch 1192
Branch 1193
Branch 1194
Branch 1195
Branch 1196
Branch 1197
Branch 1198
Branch 1199
Branch 1200
Branch 1201
Branch 1202
Branch 1203
Branch 1204
Branch 1205
Branch 1206
Branch 1207
Branch 1208
Branch 1209
Branch 1210
Branch 1211
Branch 1212
Branch 1213
Branch 1214
Branch 1215
Branch 1216
Branch 1217Branch 1218
Branch 1219
Branch 1220
Branch 1221
Branch 1222Branch 1223
Branch 1224
Branch 1225
Branch 1226
Branch 1227
Branch 1228
Branch 1229
Branch 1230
Branch 1231
Branch 1232
Branch 1233
Branch 1234
Branch 1235
Branch 1236
Branch 1237
Branch 1238
Branch 1239
Branch 1240
Branch 1241
Branch 1242
Branch 1243
Branch 1244
Branch 1245
Branch 1246
Branch 1247
Branch 1248
Branch 1249
Branch 1250
Branch 1251
Branch 1252
Branch 1253
Branch 1254
Branch 1255
Branch 1256
Branch 1257
Branch 1258
Branch 1259
Branch 1260
Branch 1261
Branch 1262
Branch 1263
Branch 1264
Branch 1265
Branch 1266
Branch 1267
Branch 1268
Branch 1269
Branch 1270
Branch 1271
Branch 1272
Branch 1273
Branch 1274
Branch 1275
Branch 1276
Branch 1277Branch 1278
Branch 1279
Branch 1280
Branch 1281
Branch 1282
Branch 1283
Branch 1284
Branch 1285
Branch 1286
Branch 1287
Branch 1288
Branch 1289
Branch 1290
Branch 1291
Branch 1292
Branch 1293
Branch 1294
Branch 1295
Branch 1296
Branch 1297
Branch 1298
Branch 1299
Branch 1300
Branch 1301
Branch 1302
Branch 1303
Branch 1304
Branch 1305
Branch 1306
Branch 1307
Branch 1308
Branch 1309
Branch 1310
Branch 1311
Branch 1312
Branch 1313
Branch 1314
Branch 1315
Branch 1316
Branch 1317
Branch 1318
Branch 1319
Branch 1320
Branch 1321
Branch 1322
Branch 1323
Branch 1324
Branch 1325
Branch 1326
Branch 1327
Branch 1328
Branch 1329
Branch 1330
Branch 1331
Branch 1332
Branch 1333
Branch 1334
Branch 1335
Branch 1336
Branch 1337
Branch 1338
Branch 1339
Branch 1340
Branch 1341
Branch 1342
Branch 1343
Branch 1344
Branch 1345
Branch 1346
Branch 1347
Branch 1348
Branch 1349
Branch 1350
Branch 1351
Branch 1352
Branch 1353
Branch 1354
Branch 1355
Branch 1356
Branch 1357
Branch 1358
Branch 1359
Branch 1360
Branch 1361
Branch 1362
Branch 1363
Branch 1364
Branch 1365
Branch 1366
Branch 1367
Branch 1368
Branch 1369
Branch 1370
Branch 1371
Branch 1372
Branch 1373
Branch 1374
Branch 1375
Branch 1376
Branch 1377
Branch 1378
Branch 1379
Branch 1380
Branch 1381
Branch 1382
Branch 1383
Branch 1384
Branch 1385
Branch 1386
Branch 1387
Branch 1388
Branch 1389
Branch 1390
Branch 1391
Branch 1392
Branch 1393
Branch 1394
Branch 1395
Branch 1396
Branch 1397
Branch 1398
Branch 1399
Branch 1400
Branch 1401
Branch 1402
Branch 1403
Branch 1404
Branch 1405
Branch 1406
Branch 1407
Branch 1408
Branch 1409
Branch 1410
Branch 1411
Branch 1412
Branch 1413
Branch 1414
Branch 1415
Branch 1416
Branch 1417
Branch 1418
Branch 1419
Branch 1420
Branch 1421
Branch 1422
Branch 1423
Branch 1424
Branch 1425
Branch 1426
Branch 1427
Branch 1428
Branch 1429
Branch 1430
Branch 1431
Branch 1432
Branch 1433
Branch 1434
Branch 1435
Branch 1436
Branch 1437
Branch 1438
Branch 1439
Branch 1440
Branch 1441
Branch 1442
Branch 1443
Branch 1444
Branch 1445
Branch 1446
Branch 1447
Branch 1448
Branch 1449
Branch 1450
Branch 1451
Branch 1452
Branch 1453
Branch 1454
Branch 1455
Branch 1456
Branch 1457
Branch 1458
Branch 1459
Branch 1460
Branch 1461
Branch 1462
Branch 1463
Branch 1464
Branch 1465
Branch 1466
Branch 1467
Branch 1468
Branch 1469
Branch 1470
Branch 1471
Branch 1472
Branch 1473
Branch 1474
Branch 1475
Branch 1476
Branch 1477
Branch 1478
Branch 1479
Branch 1480
Branch 1481
Branch 1482
Branch 1483
Branch 1484
Branch 1485
Branch 1486
Branch 1487
Branch 1488
Branch 1489
Branch 1490
Branch 1491
Branch 1492
Branch 1493
Branch 1494
Branch 1495
Branch 1496
Branch 1497
Branch 1498
Branch 1499
Branch 1500
Branch 1501
Branch 1502
Branch 1503
Branch 1504
Branch 1505
Branch 1506
Supporting Participants
Active Participants
Supporting Participants
Active Participants
Fig. 1. Hierarchical tree of clustering results. The red lines indicates the
height cutoff values for the two groups.
sub-cluster distances within these two high-level groups are
very different, we cut the two sub-trees at different heights
when identifying the specific role clusters.
Based on the silhouette measure and the dendrogram
structure, we considered four clusters in the Active
Contributors group and five clusters in the Supporting
Contributors group; the red lines in Figure 1 indicate the
height cutoff values. Once the clusters were determined, we
followed a qualitative process and named the clusters based on
the activity space characteristics of each cluster centroid and
an analysis of actual actions performed by representative users
in each cluster. The characteristics of those clusters are shown
in Figure 2 and Figure 3, respectively. We discuss those
clusters and our rationale for naming the roles as follows.
[Chart of activity levels per role. Activity axes include Knowledge Sharing, Code Contribution, Code Tweaking, Issue Coordination, Progress Control, and Issue Reporting. Roles shown: All-Rounder, Coordinator, Core Developer, Intense Code Contributor.]
Fig. 2. Activity space characteristics of roles in the Active Contributors group.
[Chart of activity levels per role, on the same activity axes. Roles shown: Progress Controller, Engaged Issue Reporter, Issue Fixer, Occasional Issue Reporter, Rare Contributor.]
Fig. 3. Activity space characteristics of roles in the Supporting Contributors
group.
Among the Active Contributors, Intense Code Contributors
exerted an extremely high contribution to the codebase.
Additionally, only a small number of contributors
assumed this role. Their main focus seemed to be developing a
certain functionality of the software within a short time period.
Coordinators provided only a small amount of code
contribution. Instead, they focused mainly on Knowledge Sharing,
Issue Coordination, and Issue Reporting activities. They are
usually the owner of the project or a core member of the
community. Core Developers contributed very little to
Issue Reporting but were active in Code Contribution,
Code Tweaking, Progress Control, and Knowledge Sharing.
They seemed to focus mainly on development and knowledge
sharing about the code. All-Rounders provided a medium
level of contribution in all dimensions.
The Supporting Contributors usually focused on only one
or two activities. Engaged Issue Reporters and Occasional
Issue Reporters both focused on Issue Reporting but differed
in the quantity of their contributions. Progress Controllers
mostly engaged in the Progress Control activity, with some
contribution to Knowledge Sharing and Issue Coordination;
they almost never engaged in Code Tweaking.
An analysis of sample users in this group revealed that they
are usually core members of the community who focus
on activities such as code reviewing, quality control, and
approving and merging PRs. Issue Fixers focused on making
small tweaks to the code or fixing bugs. Rare Contributors
participated in only a minuscule number of activities.
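The silhouette measure [22] used above to choose the number of clusters can be illustrated with a small sketch. This is not the authors' implementation: the Euclidean distance, the function names, and the four two-dimensional points are assumptions made only for illustration. The sketch computes the mean silhouette width of a labeling, which is higher when points sit close to their own cluster and far from the nearest other cluster.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette width of a clustering (Rousseeuw, 1987)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical activity vectors: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_labels = [0, 0, 1, 1]   # matches the two groups
mixed_labels = [0, 1, 0, 1]  # splits each group across clusters

good_score = mean_silhouette(points, good_labels)
mixed_score = mean_silhouette(points, mixed_labels)
```

Comparing such scores across candidate cluster counts, together with the dendrogram structure, is one way to arrive at a cutoff; here the labeling that matches the two groups scores much higher than the mixed one.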
C. Role Dynamics
Among all contributors included in our data set, most
(N = 16,706, 78.9%) only assumed the Rare Contributor
role in certain periods of time during the past three years.
There were also two contributors who engaged in their projects
throughout the analyzed time periods with the same role
(All-Rounder). We excluded them from our analysis. Among the rest
of the contributors, 4,483 only assumed roles
in the Supporting role group throughout the 12 time periods.
The rest (N = 479) assumed roles in the Active role
group at least once in the past three years. We focused
on analyzing the role change dynamics of these two types
of OSS community contributors. When performing the analysis,
we considered "Absent" (i.e., did not perform any contribution
during a time period) as an additional role type.
Figure 4 shows the heatmap of role transition frequency
for contributors who only assumed the Supporting roles. Not
surprisingly, the most frequent transitions happened among the
Absent, Rare Contributor, and Occasional Issue Reporter roles.
The transition frequency from Absent to the other roles also
indicated ways people got involved in an OSS community:
people rarely started as a Progress Controller; aside from
occasional contributions, contributors usually began to engage
in a project by assuming the Issue Fixer and Engaged Issue
Reporter roles. Interestingly, while any transition to the Progress
Controller role was rare, the transition from Progress Controller to
Occasional Issue Reporter was frequent. This may illustrate
a retiring path of community core members if they do not
continue contributing as an Active Participant.
From \ To            Absent     Rare Contri.  Issue Fixer  Progress Controller  OCC Issue Reporter  ENG Issue Reporter
Absent               ⎼          2048/1587     253/250      3/3                  3213/3100           278/278
Rare Contri.         2591/1738  ⎼             194/163      8/5                  752/686             81/76
Issue Fixer          95/95      272/252       ⎼            1/1                  71/71               5/5
Progress Controller  1/1        18/11         3/3          ⎼                    511/377             0/0
OCC Issue Reporter   2248/2167  1492/1325     76/70        2/2                  ⎼                   92/82
ENG Issue Reporter   166/165    136/135       16/13        0/0                  0/0                 ⎼
Fig. 4. Role transition frequency heatmap for contributors who only assumed
the Supporting roles.
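As a rough illustration of how transition counts of the kind shown in Figure 4 could be tallied, the sketch below counts From -> To role changes between consecutive time periods. It is a sketch under stated assumptions, not the authors' procedure: the function name, the input layout (one role per period, with None for a period without contributions), and the choice to skip periods where the role stays the same (the heatmap diagonal is empty) are all assumptions, and the two contributors are made up.

```python
from collections import Counter

ABSENT = "Absent"  # additional role type for periods without contributions


def count_transitions(role_sequences):
    """Count role changes between consecutive time periods.

    role_sequences maps a contributor id to a list with one role per time
    period; None means the contributor made no contribution in that period.
    Returns a Counter keyed by (from_role, to_role).
    """
    counts = Counter()
    for roles in role_sequences.values():
        filled = [role if role is not None else ABSENT for role in roles]
        for previous, current in zip(filled, filled[1:]):
            if previous != current:  # staying in a role is not a transition
                counts[(previous, current)] += 1
    return counts


# Hypothetical contributors observed over four time periods.
example = {
    "alice": [None, "Rare Contributor", "Rare Contributor", "Issue Fixer"],
    "bob": ["Occasional Issue Reporter", None, "Occasional Issue Reporter", None],
}
transitions = count_transitions(example)
```

Keeping the counts keyed by (from, to) pairs makes it straightforward to lay them out as a From-by-To matrix afterwards.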
Figure 5 shows the heatmap of role transition frequency for
contributors who have assumed the Active roles. Aside from
transitions between Absent and Rare Contributor, the most
frequent transitions happened between All-Rounder (Active
role) and Issue Fixer (Supporting role). This type of transition
may represent the working style of a group of OSS
contributors who generally engaged in all aspects of the
community but switched to focus on issue fixing when issues
accumulated. Transitions among All-Rounder (Active role), Rare
Contributor (Supporting role), and Occasional Issue Reporter
(Supporting role) were also frequent, indicating that many active
community contributors may take breaks from their work.
From (rows) / To (columns, in order): Absent, All-Rounder, Coordinator, Core Developer, Intense Code Contri., Rare Contri., Issue Fixer, Progress Controller, OCC Issue Reporter, ENG Issue Reporter
Absent ⎼108 412180 55 143 25
All-Rounder 26 ⎼34 61 2109 181 32 82 30
Coordinator 132 ⎼4146412
Core Developer 044 10 ⎼125001
Intense Code Contri. 0420⎼11000
Rare Contri. 174 93 330⎼71 11 74 11
Issue Fixer 14 152 51094 ⎼455 4
Progress Controller 326 01017 8⎼5 1
OCC Issue Reporter 14 72 120109 46 4⎼23
ENG Issue Reporter 234 03020 9 0 33 ⎼
Fig. 5. Role transition frequency heatmap for contributors who have assumed
the Active roles.
We calculated a Role Change Intensity (RCI) score for each
participant in each project using Equation 1. As the cluster
distances among the Supporting roles are close to each other, the
RCI scores for contributors who only assumed the Supporting
roles are expected to be close to zero. As a result, we only
focused on RCI scores for contributors who have ever assumed
the Active roles. Figure 6 shows the histogram of their RCI
scores. The results showed a right-skewed distribution with a
median of 0.99 (IQR = 1.21 − 0.92). Examining the scenario
in which only one role change occurred across all
time periods, we found that the median RCI across all
types of role changes is 1.11 (IQR = 1.26 − 0.35). These
results indicated that most contributors experienced role
changes of medium intensity, while some experienced role
changes of medium to high intensity.
[Histogram; x-axis: RCI score (0.5 to 2.0); y-axis: frequency (0 to 150).]
Fig. 6. Histogram of RCI across contributors who have assumed the Active
roles.
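The summary statistics reported for the RCI scores (a median with the first and third quartiles) can be computed as in the following minimal sketch. The scores below are made-up values, not data from the study, and the RCI definition itself (Equation 1) is not reproduced here. The "inclusive" quartile convention is also an assumption, since the quartile method used in the analysis is not stated.

```python
import statistics


def median_and_quartiles(scores):
    """Return (median, first quartile, third quartile) of a list of scores."""
    # method="inclusive" treats the data as the full population of interest;
    # this convention is an assumption, not taken from the paper.
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q2, q1, q3


# Hypothetical RCI scores for five contributors.
rci_scores = [0.2, 0.9, 1.0, 1.2, 1.8]
median, q1, q3 = median_and_quartiles(rci_scores)
iqr_width = q3 - q1
```

Reporting both quartiles, rather than only their difference, shows where the middle half of the scores lies, which matters for a skewed distribution.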
V. DISCUSSION
Our work provides several implications for designing OSS
tools to support role-based interactions. In this section, we first
discuss these implications. We then consider the limitations of
our current study and discuss directions for future work.
A. Mediate OSS Activities Instead of Actions
Current OSS tools usually involve features that focus on
supporting low-level actions such as code committing, issue
reporting, commenting, applying labels, etc. There is a lack of
focus on mediating higher-level OSS activities. Through the
factor analysis, we identified six activities that most clearly
distinguished the roles of OSS contributors. Some of them,
such as Code Contribution and Issue Reporting, only involved
a small number of actions and were well supported in the
current tools. However, others involve multiple actions that
are currently supported by different, and sometimes isolated,
tool features. For example, the Issue Coordination activity
involves actions to create links among the issues and the pull
requests, as well as managing issue labels; on GitHub, there is
no connection between the link creation (through commenting)
and the issue labeling features.
Reflecting on the Activity Theory, tools need to serve as
“functional organs” to help users achieve their goal-oriented
activities [16], [20]. We argue that the OSS tool designers need
to consider the activities identified in this paper, which were
aligned with the representative goals that OSS contributors
hold when performing the corresponding actions. Particularly,
they may explore connections surrounding the features that
support the underlying actions for each activity to facilitate
a smooth transition among the actions. Moreover, based on
the correlations found among the activities, OSS tool
designers may consider more sophisticated feature connections to
help users move among activities.
B. Support Role-Based Interaction
Through the clustering analysis, we identified four Active
roles and five Supporting roles of OSS contributors. While
the literature has strongly advocated role-based interaction
in software engineering tools [26], the realization of such
interaction is still immature in the OSS world. One reason for
this gap is that there is currently little knowledge or guidance
for the tool designers to have a clear conception as to what
high-level activities and detailed actions each role takes. Our
data-driven and activity-based roles addressed this limitation.
On the one hand, the roles identified in this paper reflect some
characteristics of the roles in the literature (e.g., the "onion"
model [19]). For example, confirming the hypothesis posed
in the "onion" model, our data indicated a small number of
Active contributors who make a large number of contributions
and a vast number of Supporting contributors. On the other
hand, our roles provide a more nuanced mapping to the
main activities each role focuses on. For example, our results
showed that Progress Controllers not only perform the
Progress Control activity but are usually also involved
in Knowledge Sharing and Issue Coordination; Engaged
Issue Reporters usually also perform the Issue Coordination
activity. As a result, these roles paint a more comprehensive
picture of the activities and responsibilities of OSS
contributors. OSS tool designers can use this information to better
satisfy the goals and needs of OSS contributors in role-based
interaction design. In particular, they can use the activities and
the actions underlying each role as a design guideline.
C. Support Role Change: Onboarding and Retiring
There is limited discussion about OSS tools that support role
change in the literature. However, our data indicated that role
change in OSS communities is both frequent and somewhat
intense. As a result, techniques and tools that facilitate a
smooth change of roles can be useful for OSS contributors.
While our results have indicated a complex role change
model, onboarding and retiring are among the most important
types of role change for OSS communities. Our data confirmed
a common impression that OSS contributors usually get in-
volved in a project through issue reporting and fixing. We also
identified that a common retiring path of Active contributors
is also through issue-related activities. These findings indicated
a central role of Issue Management Systems in the
onboarding and retiring processes. Tool designers may consider
including features in Issue Management Systems to help
new contributors engage with the community culture
and acquire the necessary knowledge and skills. They may
also enhance Issue Management Systems to help retiring
members transfer knowledge and tasks.
D. Limitations and Future Work
Although diverse, the contribution metrics used in our
study are based only on quantity, rather than quality, of the
actions taken by OSS contributors. Our research can thus
be extended with studies focused on extracting qualitative
measures of contribution. Additionally, while the 29 OSS
projects analyzed in this work were carefully selected to cover
a wide variety of application domains, programming
languages, community sizes, and code base sizes, future work
that validates our findings on a larger number of OSS projects
and communities would be useful. Moreover, we focused on
investigating OSS contributors' roles within a project. However,
many contemporary OSS communities are structured around
a group of projects (i.e., a project ecosystem). Exploring how
our model and method can generalize to such larger-scale
OSS communities is an interesting direction for future work.
VI. CONCLUSION
In this study, we adopted a data-driven approach to under-
standing the diverse roles and their dynamics in OSS commu-
nities. From an analysis of 29 OSS projects, we extracted six
activities that determined four Active roles and five Supporting
roles. This approach allowed us to provide rich informa-
tion, grounded in the data, about the actions and activities
performed by each role. Through the lens of the Activity
Theory, such information rendered useful design guidelines
for role-based OSS tools. We argue that such methodology
and the generated information are crucial to understanding and
supporting the collaboration among diverse OSS contributors.
REFERENCES
[1] S. T. Acuña and N. Juristo. Assigning people to roles in software
projects. Software: Practice and Experience, 34(7):675–696, jun 2004.
[2] K. Agrawal, M. Aschauer, T. Thonhofer, S. Bala, A. Rogge-Solti, and
N. Tomsich. Resource Classification from Version Control System
Logs. In 2016 IEEE 20th International Enterprise Distributed Object
Computing Workshop (EDOCW), pages 1–10. IEEE, sep 2016.
[3] O. Arazy, J. Daxenberger, H. Lifshitz-Assaf, O. Nov, and I. Gurevych.
Turbulent Stability of Emergent Roles: The Dualistic Nature of Self-
Organizing Knowledge Coproduction. Information Systems Research,
27(4):792–812, dec 2016.
[4] J. B. Barlow. Emergent roles in decision-making tasks using group
chat. In Proceedings of the 2013 conference on Computer supported
cooperative work - CSCW ’13, page 1505, New York, New York, USA,
2013. ACM Press.
[5] A. Ben Fadhel, D. Bianculli, and L. C. Briand. Model-driven run-time
enforcement of complex role-based access control policies. In Proceed-
ings of the 33rd ACM/IEEE International Conference on Automated
Software Engineering - ASE 2018, pages 248–258, New York, New
York, USA, 2018. ACM Press.
[6] C. Cheng, B. Li, Z.-Y. Li, Y.-Q. Zhao, and F.-L. Liao. Developer Role
Evolution in Open Source Software Ecosystem: An Explanatory Study
on GNOME. J. Comput. Sci. Technol., 32(2):396–414, mar 2017.
[7] J. Cheng and J. L. Guo. How do the open source communities address
usability and ux issues: An exploratory study. In Extended Abstracts
of the 2018 CHI Conference on Human Factors in Computing Systems,
CHI EA ’18, New York, NY, USA, 2018. ACM.
[8] D. Child. The essentials of factor analysis. A&C Black, 2006.
[9] A. B. Costello and J. W. Osborne. Best practices in exploratory factor
analysis: Four recommendations for getting the most from your analysis.
Practical assessment, research & evaluation, 10(7):1–9, 2005.
[10] E. di Bella, A. Sillitti, and G. Succi. A multivariate classification of
open source developers. Information Sciences, 221:72–83, feb 2013.
[11] L. R. Fabrigar, D. T. Wegener, R. C. MacCallum, and E. J. Strahan.
Evaluating the use of exploratory factor analysis in psychological
research. Psychological methods, 4(3):272, 1999.
[12] GitHub. How to contribute to open source. https://opensource.guide/,
2019.
[13] R. Hoda, J. Noble, and S. Marshall. Self-Organizing Roles on Agile
Software Development Teams. IEEE Transactions on Software Engi-
neering, 39(3):422–444, mar 2013.
[14] M. Joblin, S. Apel, C. Hunsen, and W. Mauerer. Classifying Developers
into Core and Peripheral: An Empirical Study on Count and Network
Metrics. In Proceedings of the 39th International Conference on
Software Engineering, pages 164–174. IEEE, may 2017.
[15] M. Joblin, S. Apel, and W. Mauerer. Evolutionary trends of developer
coordination: a network approach. Empirical Software Engineering,
22(4):2050–2094, Aug 2017.
[16] V. Kaptelinin. Activity Theory: Implications for Human-Computer
Interaction. In Context and consciousness: Activity theory and
human-computer interaction, pages 103–116. MIT, 1996.
[17] A. Mockus, R. T. Fielding, and J. D. Herbsleb. Two case studies of
open source software development: Apache and Mozilla. ACM Trans.
Softw. Eng. Methodol., 11(3):309–346, jul 2002.
[18] F. Murtagh and P. Legendre. Ward’s hierarchical agglomerative cluster-
ing method: Which algorithms implement ward’s criterion? Journal of
Classification, 31(3):274–295, Oct 2014.
[19] K. Nakakoji, Y. Yamamoto, Y. Nishinaka, K. Kishida, and Y. Ye.
Evolution patterns of open-source software systems and communities.
In Proceedings of the International Workshop on Principles of Software
Evolution, page 76, New York, New York, USA, 2002. ACM Press.
[20] B. A. Nardi. Activity Theory and Human-Computer Interaction. In Con-
text and consciousness: Activity theory and human-computer interaction,
pages 7–16. MIT, 1996.
[21] S. Onoue, H. Hata, and K. Matsumoto. Software population pyramids:
The current and the future of oss development communities. In Pro-
ceedings of the 8th ACM/IEEE International Symposium on Empirical
Software Engineering and Measurement, ESEM ’14, pages 34:1–34:4,
New York, NY, USA, 2014. ACM.
[22] P. J. Rousseeuw. Silhouettes: a graphical aid to the interpretation and
validation of cluster analysis. Journal of computational and applied
mathematics, 20:53–65, 1987.
[23] J. E. Tomayko and O. Hazzan. Human Aspects of Software Engineering.
Laxmi Publications, 2005.
[24] L. S. Vygotsky. Mind in Society: Development of Higher Psychological
Processes. Harvard University Press, 1981.
[25] Y. Ye and K. Kishida. Toward an understanding of the motivation of open
source software developers. In Proceedings of the 25th International
Conference on Software Engineering, ICSE '03, pages 419–429,
Washington, DC, USA, 2003. IEEE Computer Society.
[26] H. Zhu, M. Zhou, and M. Hou. Support Collaboration with Roles. In
Contemporary Issues in Systems Science and Engineering, pages 575–
598. John Wiley & Sons, Inc., Hoboken, NJ, USA, apr 2015.
[27] H. Zhu, M. Zhou, and P. Seguin. Supporting Software Development
With Roles. IEEE Transactions on Systems, Man, and Cybernetics -
Part A: Systems and Humans, 36(6):1110–1123, nov 2006.
[28] W. R. Zwick and W. F. Velicer. Comparison of five rules for determining
the number of components to retain. Psychological bulletin, 99(3):432,
1986.