Activity-Based Analysis of Open Source Software Contributors: Roles and Dynamics



Jinghui Cheng
Department of Computer and Software Engineering
Polytechnique Montreal, Montreal, Canada
Jin L.C. Guo
School of Computer Science
McGill University, Montreal, Canada
Abstract—Contributors to open source software (OSS) com-
munities assume diverse roles to take different responsibilities.
One major limitation of the current OSS tools and platforms
is that they provide a uniform user interface regardless of
the activities performed by the various types of contributors.
This paper serves as a non-trivial first step towards resolving
this challenge by demonstrating a methodology and establishing
knowledge to understand how the contributors’ roles and their
dynamics, reflected in the activities contributors perform, are
exhibited in OSS communities. Based on an analysis of user
action data from 29 GitHub projects, we extracted six activities
that distinguished four Active roles and five Supporting roles of
OSS contributors, as well as patterns in role changes. Through
the lens of the Activity Theory, these findings provided rich design
guidelines for OSS tools to support diverse contributor roles.
Index Terms—open source software, open source community,
activity-based analysis, contributor roles
As a software development model, OSS has experienced
rapid growth during the past decades. The communities
around OSS projects are becoming increasingly heteroge-
neous, comprising not only developers and tech-savvies but
also designers, managers, and users with a wide-ranging level
of experience and expertise. As a result, the ways participants
contribute to the OSS projects also become increasingly di-
verse [7], [12]. However, one major limitation of the current
OSS tools and platforms is that they provide a uniform user
interface regardless of the activities performed by the various
types of contributors interacting with the platform. In other
words, current OSS tools do not sufficiently account for
the various roles assumed by OSS contributors.
This paper serves as a non-trivial first step towards re-
solving this challenge by demonstrating a methodology and
establishing knowledge to understand how the roles and their
dynamics are currently exhibited in OSS communities. In
particular, we focused on examining the roles and their dy-
namics based on the types of activities that OSS contributors
perform. This perspective is inspired by several aspects of
the Activity Theory [24]. Particularly, the Activity Theory,
applied to the field of Human-Computer Interaction, specifies
that information and communication tools need to focus on
mediating human activities, facilitating users to perform a
group of low-level actions and operations in order to achieve
higher-level objectives; additionally, such mediation needs to
be adjustable in an evolving context [20]. To guide our study,
we pose the following research questions:
RQ1: What are the decisive activities that distinguish the roles
assumed by OSS contributors?
RQ2: What are the prominent roles that can be identified
through analyzing a wide range of actions community
contributors perform in a diverse set of projects?
RQ3: How do the roles assumed by the community contrib-
utors in an OSS project change over time?
To answer these questions, we collected and analyzed action
data of 20,838 unique contributors from 29 diverse GitHub
projects and conducted factor and clustering analyses to iden-
tify the prominent activities and roles of OSS contributors.
In the following sections, we first briefly review the related
work (Section II). We then outline our data collection and
analysis methods (Section III). In Section IV, we report our
results on the identified activities and roles, as well as the
role dynamic patterns. We then discuss the implications of
our findings to the design of OSS tools (Section V). Finally,
we provide concluding remarks in Section VI.
Our study is related to previous work that focused on users’
roles in information and communication technologies (ICTs)
and studies that investigated the structure of OSS communities.
A. Role-Based Approaches in ICTs
Previous work has explored models and techniques to
identify and support different roles in ICTs of various ap-
plication domains, including collaboration tools [4], access
control systems [5], knowledge co-production platforms [3],
and software engineering tools [1], [27]. For example, Arazy et
al. [3] identified seven roles of Wikipedia contributors, such as
all-round contributors and layout shapers, through a clustering
analysis of user actions in Wikipedia articles.
Our study is most closely related to previous work that
investigated roles in tools and techniques that support software
design and development [23]. Zhu et al. [27] advocated a
complete and consistent role consideration in all aspects of
software engineering and in research about tools through the
lens of Role-Based Software Development (RBSD). Acuña
and Juristo [1] also proposed a model that consists of 20 general
capabilities crucial in software development and mapped
these capabilities with 20 predetermined roles in software
projects. Leveraging this model, they presented a procedure
for assigning people to roles according to their capabilities.
arXiv:1903.05277v1 [cs.SE] 13 Mar 2019
More recently, researchers investigated role dynamics in
self-organized software development teams. Hoda et al. [13]
conducted Grounded Theory research involving 58 agile prac-
titioners from 23 software organizations to understand the
role dynamics in agile teams. They identified six “infor-
mal, implicit, transient, and spontaneous” roles performed
by practitioners to reinforce the self-organizing nature of
agile practice. These roles include, for example, mentors who
guide and inform the team in using agile methods, translators
who communicate between customers and technical team, and
champions who acquire supports from senior management.
Our study builds upon this work and explores the dynamics
of various activity-based roles in OSS communities.
B. OSS Community Structure
Much work that investigated the OSS community structure
is based on the “onion” model [19], [25]. This model proposed
a layered structure of responsibilities for OSS projects that
included a small number of core members and a larger number
of peripheral developers and bug fixers [19]. Mockus et al.
[17] examined the Apache web server and the Mozilla browser
as case studies and empirically generated several hypotheses
concerning the OSS community structure. These hypotheses
echoed the “onion” model in that a small number of
developers contribute the majority of the codebase.
Because of the self-organizing nature of OSS communities
[13], researchers have particularly investigated the evolution
of the OSS structure [6], [15], [21]. For example, Cheng et
al. [6] identified several factors that significantly influenced
developers’ evolution into a core member in OSS ecosystems;
such factors included the total number of projects a developer
was willing to join and the degree to which the developer’s
peers were closely connected. Joblin et al. [15] also found
that OSS communities tended to evolve from a hierarchical
structure to a hybrid one with a greater distribution of
contributions as the number of developers increased.
More closely related to our work, several recent studies
focused on exploring classification methods for OSS com-
munities. Through a clustering analysis on code committing
metrics extracted from ten OSS projects, Di Bella et al.
(2013) identified three major factors and four developer role
groups that fell on the spectrum from core to occasional
rare developers [10]. Agrawal et al. (2016) also adopted a
clustering approach and explored decision tree models to
classify OSS code committers; their developer classes also
ranged from core developers to less engaged developers [2].
A major limitation of these studies is that they only focused
on code committing activities. While we adopted a similar
statistical approach, our study focused on a much wider va-
riety of actions beyond code contribution and identified more
descriptive activity-oriented factors and roles. We also aimed
to extract common roles in a wide range of OSS projects.
In sum, while previous work has devoted considerable effort
to understanding the roles and their structures
in ICTs and OSS communities, few studies have explicitly
investigated, with the aim of improving tool support in OSS,
the correlations among actions performed during goal-driven
activities or the dynamics of activity migrations
accompanied by frequent role changes. Our work therefore
fills this gap by exploring those aspects with a data-driven
approach and by following up with a detailed discussion
of the implications for OSS tool design.
We analyzed user action data within the last three years
from 29 GitHub projects that exhibit diverse characteristics.
All data was collected in January 2018.
A. Projects Selection
To cover a wide range of OSS communities, we focused
on projects in different application domains. Particularly, we
randomly selected one project in each category in GitHub
“Collections”. GitHub “Collections” are curated lists (a total
of 31 lists at the time of our data collection) of recently
active and influential projects and communities. We eliminated
two lists, “Open data” and “Policies”, which focused on non-
software projects. Table I includes the names of the selected
projects. These projects involved a total of 20,838 unique
contributors (including code contributors, issue reporters and
discussion participants, and pull request reporters and discus-
sion participants), 41,275 issues, 73,763 pull requests, and
240,024 commits. The code repositories comprise
4,963,540 lines of code in 24,451 files, covering 15 programming
languages.
B. Metrics Selection
To effectively assess the participants’ contribution to their
OSS community, we selected metrics gathered from various
aspects. Those metrics describe the detailed actions contributors
take in order to participate in the OSS projects. First,
code contribution metrics include numbers of commits made,
lines of code changed, and files edited, as well as metrics
related to pull requests (PRs) made by contributors. Second,
opinion contribution metrics assess actions associated with
reporting issues and commenting in issue and PR discussions.
Third, network-related metrics include the number of times a
participant was mentioned, or referred to other issues or PRs, in
discussions. Finally, administration metrics measure managerial
actions such as managing labels or manipulating issues or
PRs. Those metrics were inspired by several previous works
[2], [10], [14] and are summarized in Table II.
C. Data Collection
We aimed at extracting the necessary metrics from the
repositories of the 29 GitHub projects and focused on the contributor
actions within the three-year period between January
1st, 2015 and January 1st, 2018. To collect such a data set, we
first used the GitHub REST API to download the raw data
about code committing actions, issue reporting and commenting
actions, PR reporting and commenting actions, as well as
issue and PR events (e.g. labels applied/removed, closed, etc.)
for each project. We then excluded any action data performed
by “bots” (i.e. automated processes presented as GitHub users
who perform event-driven actions). In order to understand the
dynamics of the OSS roles, data for each contributor was then
divided by quarter of the year, and the metrics were calculated
for each quarter. As such, our data set accumulated metrics for each
participant in each project across 12 time periods. In total, this
data set comprises 38,891 data points, each of which includes 19
dimensions corresponding to the metrics described in Table II.

Table I: Names of the selected projects
accessibility-developer-tools  adarkroom          basscss
better_errors                  brew               cocos2d-html5
csslint                        guardian/frontend  hospitalrun-frontend
jekyll                         kubernetes         madison
mention-bot                    neovim             picongpu
primer                         pysc2              railsbridge/docs
refined-github                 SoundManager2      spine
superpowers-core               swipl-devel        the_silver_searcher
TrueCraft                      urh                utron

Table II: Metrics selected to describe contributor actions
Type                   Metric
Code contribution      # of commits made
                       # of lines of code changed in the codebase
                       # of files worked on
                       # of pull requests (PRs) made
                       Avg. length of PR descriptions*
Opinion contribution   # of issues reported
                       Avg. length of issue descriptions*
                       # of comments made in issue discussions
                       Avg. length of issue comments*
                       # of comments made in PR discussions
                       Avg. length of PR comments*
Network                # of times being mentioned in issue comments
                       # of times being mentioned in PR comments
                       # of times referred other issues/PRs in issue comments
                       # of times referred other issues/PRs in PR comments
Administration         # of times applied or removed labels on issues
                       # of times applied or removed labels on PRs
                       # of times closed issues
                       # of times closed pull requests
* All lengths were measured in number of characters.
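The quarterly split described in the data collection step can be illustrated with a minimal sketch. The event tuple layout, the action names, and the `bots` set are hypothetical and are not the GitHub REST API's own format; the sketch also assumes that the end date of January 1st, 2018 is exclusive.

```python
from collections import Counter
from datetime import datetime

STUDY_START_YEAR = 2015  # first year of the three-year window in the text
N_PERIODS = 12           # 3 years x 4 quarters

def quarter_index(timestamp):
    """Return 0..11 for timestamps inside the study window, else None."""
    index = (timestamp.year - STUDY_START_YEAR) * 4 + (timestamp.month - 1) // 3
    return index if 0 <= index < N_PERIODS else None

def count_actions(events, bots=frozenset()):
    """Count actions per (project, contributor, quarter, action type).

    `events` is an iterable of (project, login, action, timestamp) tuples;
    this record layout is an assumption made for illustration only.
    """
    counts = Counter()
    for project, login, action, timestamp in events:
        if login in bots:
            continue  # actions performed by bots are excluded
        period = quarter_index(timestamp)
        if period is None:
            continue  # outside the three-year window
        counts[(project, login, period, action)] += 1
    return counts

events = [
    ("neovim", "alice", "commit", datetime(2015, 1, 15)),
    ("neovim", "alice", "commit", datetime(2015, 2, 3)),
    ("neovim", "alice", "issue_comment", datetime(2017, 12, 31)),
    ("neovim", "ci-bot", "label", datetime(2016, 5, 1)),
    ("neovim", "bob", "commit", datetime(2018, 1, 1)),
]
counts = count_actions(events, bots={"ci-bot"})
```

Each key of `counts` corresponds to one metric value of one contributor in one project and one time period; a full pipeline would aggregate such counts into the 19 dimensions of Table II.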
D. Identifying Activities and Roles
The metrics introduced previously in Section III-B were
selected to measure the concrete actions taken by the user
from distinct perspectives. Those metrics, however, might be
interrelated and can be influenced or determined by a set
of hidden factors. We hypothesize that those hidden factors
are the common activities that OSS contributors engage in
when they are serving certain roles in the projects. To identify
these activities, we first performed a Factor Analysis on the
dataset to understand and interpret the interrelations between
those metrics. Based on these factors, we then conducted a
Clustering Analysis to identify the prominent contributor roles.
Before the factor analysis, all metrics were standardized to
have a mean of zero and unit variance.
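As a minimal sketch of this standardization step, each metric column can be converted to z-scores. The text does not say whether the population or the sample standard deviation was used; the population form is assumed here, and the function name is illustrative.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Rescale one metric column to zero mean and unit variance."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation (an assumption)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0] * len(column)
    return [(value - mean) / std for value in column]

z_scores = standardize([2, 4, 4, 4, 5, 5, 7, 9])
```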
1) Factor Analysis: Factor analysis, especially exploratory
factor analysis, is a statistical method to discover underlying
patterns in a set of variables [8]. The main procedures for
factor analysis include factor extraction and rotation.
Maximum Likelihood and Principal Axis Factors (PAF) are
two commonly adopted factor extraction techniques [9]. We
chose the PAF approach because preliminary analysis indi-
cated that the distributions of our data violate the assumption
of multivariate normality [11]. After extracting the factors,
we used the Kaiser criterion and retained the factors with
an eigenvalue larger than 1.0, indicating that those are the
most influential factors (i.e. factors that account for the most
variance in the data) [28].
The retained factors were then rotated to attain a sim-
ple structure that supports a better interpretation. In such a
structure, each rotated factor aims to define a distinct group
of interrelated metrics. Rotation techniques can be generally
divided into orthogonal and oblique rotations; the former
produces factors that are uncorrelated while the latter allows
the factors to correlate. In social science, behaviors can rarely
be partitioned into groups that are independent [9]. We hypoth-
esized that the factors influencing the contributors’ activities
in OSS communities would also exhibit some correlations. We
therefore decided to use an oblique rotation technique, as it
would render more accurate and reproducible results when the
factors are correlated.
Factor analysis produces two results: factor loadings and
factor scores. Factor loadings represent the correlation of the
original metrics with each identified factor, while factor scores
are the values of each data point mapped into the factor space.
We used the factor loadings to interpret the relations
between the metrics listed in Table II. The factor scores were then
used for the clustering analysis in the next step.
2) Clustering: After the activities (i.e. factors) were iden-
tified, we conducted a hierarchical clustering analysis based
on the factor scores data to identify the prominent roles of
OSS contributors. This method aims to construct a hierarchical
structure of clusters; such structure provides more information
about the dataset than unstructured clusters produced by flat
clustering methods such as K-means. Furthermore, hierarchi-
cal methods do not require a predetermined number of clusters
and most of them are deterministic. As such, this method
supports the exploratory nature of our study.
Particularly, we used an agglomerative (or bottom-up) hier-
archical clustering method. In general, agglomerative methods
first treat each data point as a singleton cluster. Pairs of
closest clusters are then successively merged until all clusters
have been merged into a single one that contains all data.
This process produces a hierarchy of clusters that can be
visualized in a tree diagram called a dendrogram. Cutting the
dendrogram at a certain level creates a partition of disjoint
clusters. This step is equivalent to grouping only the clusters
with high similarity. Different strategies have been proposed
for measuring the similarity between two clusters. Based on
our initial experiments, we decided to use Ward’s method
[18]. This method produces clusters that are more compact
and suitable for identifying and interpreting prominent roles.
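The agglomerative procedure with Ward's criterion can be sketched in a few lines. This is a naive illustration that uses only the standard library and is not the implementation used in the study; the function names are hypothetical. At every step it merges the pair of clusters whose union gives the smallest increase in the within-cluster sum of squares, and stopping at `n_clusters` plays the role of cutting the dendrogram.

```python
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(points, n_clusters):
    """Merge the cheapest pair of clusters until n_clusters remain."""
    clusters = [[tuple(p)] for p in points]  # start from singleton clusters
    merge_costs = []
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        merge_costs.append(ward_cost(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merge_costs

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, merge_costs = ward_clusters(points, 2)
```

The list `merge_costs` records the cost of each merge in order; a sharp jump in these costs is the kind of signal one would look for in a dendrogram when choosing where to cut. The sketch takes cubic time in the number of points and is meant only for small examples.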
We used the silhouette value to measure the quality of
clusters [22]. It represents how similar one data point is to its
own cluster compared to other clusters. To choose the optimal
number of clusters, we considered the silhouette value while
also referring to the dendrogram produced by Ward’s
hierarchical algorithm.
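For reference, the silhouette value of a data point is (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below computes it with Euclidean distance using only the standard library. It assumes at least two clusters and points that do not all coincide, and it assigns 0 to singleton clusters, which is a common convention rather than something stated in the text.

```python
from math import dist
from statistics import fmean

def silhouette_values(points, labels):
    """Return the silhouette value of every point under the given labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(fmean(dist(point, q) for q in members)
                for other, members in clusters.items() if other != label)
        values.append((b - a) / max(a, b))
    return values

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_values(points, [0, 0, 1, 1])  # compact, well-separated
bad = silhouette_values(points, [0, 1, 0, 1])   # clusters cut across groups
```

Averaging the values over all points gives one score per candidate number of clusters, which can then be compared across candidate cuts of the dendrogram.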
3) Interpreting activities and roles: In order to identify the
meaningful activities and roles represented in the factors and
the clusters, we followed a qualitative process that involved the
following steps. First, both authors independently examined
the actual actions represented by the influential metrics for
each factor and each wrote three to five keywords/phrases to
describe their understanding of the factor. Then the authors
discussed their notes and conducted an “Affinity Diagramming”
study to group their keywords/phrases. Next, a phrase of
higher-level abstraction was given to each group to describe
the factor. Finally, the authors discussed and agreed on the
phrase that described the biggest group in the affinity diagram
of each factor as the activity it represented. We adopted a
similar process in identifying the roles from the clustering
analysis results.
E. Analyzing Role Dynamics
To identify patterns in the dynamics of changes in roles
assumed by individual contributors, we first analyzed the
frequency of changes among the roles with respect to all
contributors. We then measured the role change intensity
(RCI) for each contributor. A contributor’s RCI was calculated
by accumulating, over the 12 time periods, the quantity of
role change between each two consecutive time periods; this
quantity is measured using the Euclidean distance between
cluster centroids of the two roles taken by the contributor
in two consecutive time periods. To accommodate the large
range of change intensity values and to ease comparison, we
calculate RCI using a logarithmic scale. Therefore, the overall
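Role Change Intensity (RCI) for each contributor $i$ is:
$$\mathrm{RoleChangeIntensity}(i) = \log_{10}\left(\sum_{t=1}^{11} \mathrm{dist}\left(R^i_t, R^i_{t+1}\right)\right)$$
where $R^i_t$ is the cluster centroid of the role assumed by
contributor $i$ at time $t$ and $\mathrm{dist}(A, B) = \sqrt{\sum_n (a_n - b_n)^2}$
represents the Euclidean distance between vectors $A$ and $B$.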
This measure provides an ordinal evaluation of the intensity
of the contributors’ role change.
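A minimal sketch of the RCI computation follows. The role names and centroid coordinates are invented for illustration. The text does not specify how a contributor with no role change (a total distance of zero, for which the logarithm is undefined) or a contributor absent in some periods is handled; this sketch returns `None` in the first case and does not address the second.

```python
from math import dist, log10

def role_change_intensity(roles, centroids):
    """log10 of the summed centroid distances between consecutive periods.

    `roles` lists the role held in each time period, in order;
    `centroids` maps each role to the centroid of its cluster.
    """
    total = sum(dist(centroids[a], centroids[b])
                for a, b in zip(roles, roles[1:]))
    return log10(total) if total > 0 else None

# Hypothetical two-dimensional centroids for three made-up roles.
centroids = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (3.0, 14.0)}

# A -> B contributes 5, B -> B contributes 0, B -> C contributes 10.
rci = role_change_intensity(["A", "B", "B", "C"], centroids)
```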
In the following sections, we first present our results on
factor and clustering analyses. We then present findings on
role dynamics.
A. Activity Extraction
Based on the criterion introduced in Section III-D1, we
retained six factors that had eigenvalues greater than 1.0. These
six factors explained 61% of the data variance. The factor
loading results are shown in Table III. Based on the qualitative
analysis described in Section III-D3, we explain the activities
represented in these factors as follows:
Factor 1 measures three types of actions: commenting,
being mentioned in comments, and manipulating labels
on PRs. The commenting actions may be associated with
several purposes such as voicing opinions, providing suggestions,
and asking or answering questions. However, this factor
is most heavily influenced by the number of times the
contributor was mentioned; it also puts a heavy weight
on label manipulation actions. These facts indicate that it
mainly measures behaviors of providing information and
knowledge about the project. We thus name this activity
Knowledge Sharing.
Factor 2 exclusively measures participants’ contributions to
the codebase. We name it Code Contribution.
Factor 3 measures issue referring and label manipulating
actions. We found that issue referring actions are usually
associated with identifying duplicated issues or redirecting
participants to move their discussion to other issues. At
the same time, manipulating issue labels usually involves
categorizing issues (e.g. into bugs or feature requests),
identifying duplicated issues, and/or indicating stages in
issue resolving progress (e.g. triaging, assigned). We thus
name this activity Issue Coordination.
Factor 4 is mostly associated with actions of closing issues
or PRs. We name this activity Progress Control.
Factor 5 is influenced by actions of making PRs and
working on a large number of files. These indicate feature
tweaking or bug fixing activities in which contributors
make small changes on many files and file PRs for these
changes to be included in the main repository. We thus
name this activity Code Tweaking.
Factor 6 is only influenced by the number of issues
reported. We thus name it Issue Reporting.
The factor analysis result also demonstrated some corre-
lations among the extracted activity dimensions (see Table
IV). Particularly, Knowledge Sharing, Issue Coordination, and
Progress Control exhibited high correlations (all pair-wise cor-
relation coefficients r > 0.5). Two other pairs of dimensions,
Knowledge Sharing–Code Tweaking and Issue Reporting–
Issue Coordination, also demonstrated moderate correlation
(r > 0.4). These results supported our hypothesis that factors
influencing the contributors’ actions in the OSS community
are not independent.

Table III: Factor loading results
Metrics                                                  Factor1 Factor2 Factor3 Factor4 Factor5 Factor6      h2      u2
# of commits made                                           0.00    1.03    0.02   -0.07   -0.04    0.05   0.997   0.003
# of lines of code changed in the codebase                  0.00    1.03    0.02   -0.07   -0.04    0.05   0.997   0.003
# of files worked on                                        0.00    0.26   -0.07    0.35    0.45   -0.17   0.575   0.425
# of pull requests (PRs) made                               0.06   -0.04   -0.11    0.45    0.70    0.05   0.876   0.124
Avg. length of PR descriptions                              0.05   -0.01   -0.03   -0.01    0.13    0.01   0.022   0.978
# of issues reported                                        0.30    0.04    0.18    0.11    0.11    0.35   0.643   0.358
Avg. length of issue descriptions                          -0.01    0.00   -0.03    0.00    0.02    0.16   0.024   0.976
# of comments made in issue discussions                     0.52   -0.01    0.30    0.24   -0.13    0.16   0.900   0.100
Avg. length of issue comments                              -0.02    0.01    0.00   -0.01    0.00    0.10   0.008   0.992
# of comments made in PR discussions                        0.72   -0.01    0.09    0.03    0.25   -0.12   0.820   0.180
Avg. length of PR comments                                  0.00   -0.01    0.04   -0.02    0.09    0.03   0.013   0.987
# of times being mentioned in issue comments                0.78    0.01    0.10    0.01   -0.02    0.14   0.840   0.160
# of times being mentioned in PR comments                   1.02    0.01   -0.16   -0.20    0.20   -0.22   0.743   0.257
# of times referred other issues/PRs in issue comments     -0.12    0.02    1.06   -0.07    0.02   -0.15   0.808   0.192
# of times referred other issues/PRs in PR comments         0.06    0.00    0.59   -0.05    0.21    0.00   0.503   0.497
# of times applied or removed labels on issues              0.06    0.01    0.72    0.24   -0.23   -0.07   0.725   0.275
# of times applied or removed labels on PRs                 0.58    0.00    0.13   -0.03    0.15   -0.07   0.516   0.485
# of times closed issues                                    0.28   -0.03    0.08    0.65   -0.24    0.04   0.701   0.299
# of times closed pull requests                            -0.30   -0.08    0.07    0.95    0.33   -0.06   0.857   0.144
Activity names: Factor1 = Knowledge Sharing; Factor2 = Code Contribution; Factor3 = Issue Coordination;
Factor4 = Progress Control; Factor5 = Code Tweaking; Factor6 = Issue Reporting.
Note 1: The h2 column represents the estimated proportion of the variance of each metric that is shared with other metrics and can
be explained by the factors. The u2 column (equal to 1 - h2) denotes the variance that is unique to the metric itself.
Note 2: In the typeset table, yellow cells indicate that the loading is greater than 0.5; green cells indicate that the loading is between 0.3 and 0.5.

Table IV: Correlations among the extracted activity dimensions
                          (1)     (2)     (3)     (4)     (5)
(1) Knowledge Sharing
(2) Code Contribution    0.18
(3) Issue Coordination   0.70    0.09
(4) Progress Control     0.61    0.35    0.56
(5) Code Tweaking        0.47    0.30    0.29    0.32
(6) Issue Reporting      0.38   -0.05    0.44    0.13    0.22
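The correlation thresholds quoted above can be checked directly against the lower-triangular values of Table IV. The sketch below only re-reads those published values; the helper name and the exact interval boundaries are illustrative choices.

```python
NAMES = [
    "Knowledge Sharing", "Code Contribution", "Issue Coordination",
    "Progress Control", "Code Tweaking", "Issue Reporting",
]

# Lower triangle of Table IV; row k holds correlations with activities 0..k-1.
LOWER = [
    [],
    [0.18],
    [0.70, 0.09],
    [0.61, 0.35, 0.56],
    [0.47, 0.30, 0.29, 0.32],
    [0.38, -0.05, 0.44, 0.13, 0.22],
]

def pairs_in_range(low, high):
    """Return the set of activity pairs whose correlation r satisfies low < r <= high."""
    return {
        frozenset((NAMES[row], NAMES[col]))
        for row, values in enumerate(LOWER)
        for col, r in enumerate(values)
        if low < r <= high
    }

high_pairs = pairs_in_range(0.5, 1.0)      # described as high correlations
moderate_pairs = pairs_in_range(0.4, 0.5)  # described as moderate correlations
```

With these values, `high_pairs` contains exactly the three pairs formed by Knowledge Sharing, Issue Coordination, and Progress Control, and `moderate_pairs` contains the two pairs named in the text.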
B. Roles Identification
Figure 1 shows the dendrogram of our hierarchical cluster-
ing results. We observed that there are two major groups of
clusters that exhibited markedly different structures. The majority
of the data points (N = 37,310) fell into a cluster with a
low dendrogram height, while the remaining data points (N = 1,581)
formed clusters at a much greater height. In other words, the variance
among sub-clusters in the first group was much smaller than
that of the second group. This difference indicated that our
data included two very distinct groups of users.
By examining the cluster centers and samples from each
group, we found that the second group had much
higher values in all factor dimensions when compared with
the first group; in other words, contributors in this group
are much more active in terms of all activities indicated by
the factors. We thus consider the second group to comprise
Active Contributors in their communities, while the first
group represents the Supporting Contributors.
[Figure 1: Dendrogram of the hierarchical clustering results. Panel title: "Upper tree of cut at h=30"; height axis from 0 to 300; leaves labeled by branch number.]
Because the
Branch 775
Branch 776
Branch 777
Branch 778
Branch 779
Branch 780
Branch 781
Branch 782
Branch 783
Branch 784
Branch 785
Branch 786
Branch 787
Branch 788
Branch 789
Branch 790
Branch 791
Branch 792
Branch 793
Branch 794
Branch 795
Branch 796
Branch 797
Branch 798
Branch 799
Branch 800
Branch 801
Branch 802
Branch 803
Branch 804
Branch 805
Branch 806
Branch 807
Branch 808
Branch 809
Branch 810
Branch 811
Branch 812
Branch 813
Branch 814
Branch 815
Branch 816
Branch 817
Branch 818
Branch 819
Branch 820
Branch 821
Branch 822
Branch 823
Branch 824
Branch 825
Branch 826
Branch 827
Branch 828
Branch 829
Branch 830
Branch 831
Branch 832
Branch 833
Branch 834
Branch 835
Branch 836
Branch 837
Branch 838
Branch 839
Branch 840
Branch 841
Branch 842
Branch 843
Branch 844
Branch 845
Branch 846
Branch 847
Branch 848
Branch 849
Branch 850
Branch 851
Branch 852
Branch 853
Branch 854
Branch 855
Branch 856
Branch 857
Branch 858
Branch 859
Branch 860
Branch 861
Branch 862
Branch 863
Branch 864
Branch 865
Branch 866
Branch 867
Branch 868
Branch 869
Branch 870
Branch 871
Branch 872
Branch 873
Branch 874
Branch 875
Branch 876
Branch 877
Branch 878
Branch 879
Branch 880
Branch 881
Branch 882
Branch 883
Branch 884
Branch 885
Branch 886
Branch 887
Branch 888
Branch 889
Branch 890
Branch 891
Branch 892
Branch 893
Branch 894
Branch 895
Branch 896
Branch 897
Branch 898
Branch 899
Branch 900
Branch 901
Branch 902
Branch 903
Branch 904
Branch 905
Branch 906
Branch 907
Branch 908
Branch 909
Branch 910
Branch 911
Branch 912
Branch 913
Branch 914
Branch 915
Branch 916
Branch 917
Branch 918
Branch 919
Branch 920
Branch 921
Branch 922
Branch 923
Branch 924
Branch 925
Branch 926
Branch 927
Branch 928
Branch 929
Branch 930
Branch 931
Branch 932
Branch 933
Branch 934
Branch 935
Branch 936
Branch 937
Branch 938
Branch 939
Branch 940
Branch 941
Branch 942
Branch 943
Branch 944
Branch 945
Branch 946
Branch 947
Branch 948
Branch 949
Branch 950
Branch 951
Branch 952
Branch 953
Branch 954
Branch 955
Branch 956
Branch 957
Branch 958
Branch 959
Branch 960
Branch 961
Branch 962
Branch 963
Branch 964
Branch 965
Branch 966
Branch 967
Branch 968
Branch 969
Branch 970
Branch 971
Branch 972
Branch 973
Branch 974
Branch 975
Branch 976
Branch 977
Branch 978
Branch 979
Branch 980
Branch 981
Branch 982
Branch 983
Branch 984
Branch 985
Branch 986
Branch 987
Branch 988
Branch 989
Branch 990
Branch 991
Branch 992
Branch 993
Branch 994
Branch 995
Branch 996
Branch 997
Branch 998
Branch 999
Branch 1000
Branch 1001
Branch 1002
Branch 1003
Branch 1004
Branch 1005
Branch 1006
Branch 1007
Branch 1008
Branch 1009
Branch 1010
Branch 1011
Branch 1012
Branch 1013
Branch 1014
Branch 1015
Branch 1016
Branch 1017
Branch 1018
Branch 1019
Branch 1020
Branch 1021
Branch 1022
Branch 1023
Branch 1024
Branch 1025
Branch 1026
Branch 1027
Branch 1028
Branch 1029
Branch 1030
Branch 1031
Branch 1032
Branch 1033
Branch 1034
Branch 1035
Branch 1036
Branch 1037
Branch 1038
Branch 1039
Branch 1040
Branch 1041
Branch 1042
Branch 1043
Branch 1044
Branch 1045
Branch 1046
Branch 1047
Branch 1048
Branch 1049
Branch 1050
Branch 1051
Branch 1052
Branch 1053
Branch 1054
Branch 1055
Branch 1056
Branch 1057
Branch 1058
Branch 1059
Branch 1060
Branch 1061
Branch 1062
Branch 1063
Branch 1064
Branch 1065
Branch 1066
Branch 1067
Branch 1068
Branch 1069
Branch 1070
Branch 1071
Branch 1072
Branch 1073
Branch 1074
Branch 1075
Branch 1076
Branch 1077
Branch 1078
Branch 1079
Branch 1080
Branch 1081
Branch 1082
Branch 1083
Branch 1084
Branch 1085
Branch 1086
Branch 1087
Branch 1088
Branch 1089
Branch 1090
Branch 1091
Branch 1092
Branch 1093
Branch 1094
Branch 1095
Branch 1096
Branch 1097
Branch 1098
Branch 1099
Branch 1100
Branch 1101
Branch 1102
Branch 1103
Branch 1104
Branch 1105
Branch 1106
Branch 1107
Branch 1108
Branch 1109
Branch 1110
Branch 1111
Branch 1112
Branch 1113
Branch 1114
Branch 1115
Branch 1116
Branch 1117
Branch 1118
Branch 1119
Branch 1120
Branch 1121
Branch 1122
Branch 1123
Branch 1124
Branch 1125
Branch 1126
Branch 1127
Branch 1128
Branch 1129
Branch 1130
Branch 1131
Branch 1132
Branch 1133
Branch 1134
Branch 1135
Branch 1136
Branch 1137
Branch 1138
Branch 1139
Branch 1140
Branch 1141
Branch 1142
Branch 1143
Branch 1144
Branch 1145
Branch 1146
Branch 1147
Branch 1148
Branch 1149
Branch 1150
Branch 1151
Branch 1152
Branch 1153
Branch 1154
Branch 1155
Branch 1156
Branch 1157
Branch 1158
Branch 1159
Branch 1160
Branch 1161
Branch 1162
Branch 1163
Branch 1164
Branch 1165
Branch 1166
Branch 1167
Branch 1168
Branch 1169
Branch 1170
Branch 1171
Branch 1172
Branch 1173
Branch 1174
Branch 1175
Branch 1176
Branch 1177
Branch 1178
Branch 1179
Branch 1180
Branch 1181
Branch 1182
Branch 1183
Branch 1184
Branch 1185
Branch 1186
Branch 1187
Branch 1188
Branch 1189
Branch 1190
Branch 1191
Branch 1192
Branch 1193
Branch 1194
Branch 1195
Branch 1196
Branch 1197
Branch 1198
Branch 1199
Branch 1200
Branch 1201
Branch 1202
Branch 1203
Branch 1204
Branch 1205
Branch 1206
Branch 1207
Branch 1208
Branch 1209
Branch 1210
Branch 1211
Branch 1212
Branch 1213
Branch 1214
Branch 1215
Branch 1216
Branch 1217Branch 1218
Branch 1219
Branch 1220
Branch 1221
Branch 1222Branch 1223
Branch 1224
Branch 1225
Branch 1226
Branch 1227
Branch 1228
Branch 1229
Branch 1230
Branch 1231
Branch 1232
Branch 1233
Branch 1234
Branch 1235
Branch 1236
Branch 1237
Branch 1238
Branch 1239
Branch 1240
Branch 1241
Branch 1242
Branch 1243
Branch 1244
Branch 1245
Branch 1246
Branch 1247
Branch 1248
Branch 1249
Branch 1250
Branch 1251
Branch 1252
Branch 1253
Branch 1254
Branch 1255
Branch 1256
Branch 1257
Branch 1258
Branch 1259
Branch 1260
Branch 1261
Branch 1262
Branch 1263
Branch 1264
Branch 1265
Branch 1266
Branch 1267
Branch 1268
Branch 1269
Branch 1270
Branch 1271
Branch 1272
Branch 1273
Branch 1274
Branch 1275
Branch 1276
Branch 1277Branch 1278
Branch 1279
Branch 1280
Branch 1281
Branch 1282
Branch 1283
Branch 1284
Branch 1285
Branch 1286
Branch 1287
Branch 1288
Branch 1289
Branch 1290
Branch 1291
Branch 1292
Branch 1293
Branch 1294
Branch 1295
Branch 1296
Branch 1297
Branch 1298
Branch 1299
Branch 1300
Branch 1301
Branch 1302
Branch 1303
Branch 1304
Branch 1305
Branch 1306
Branch 1307
Branch 1308
Branch 1309
Branch 1310
Branch 1311
Branch 1312
Branch 1313
Branch 1314
Branch 1315
Branch 1316
Branch 1317
Branch 1318
Branch 1319
Branch 1320
Branch 1321
Branch 1322
Branch 1323
Branch 1324
Branch 1325
Branch 1326
Branch 1327
Branch 1328
Branch 1329
Branch 1330
Branch 1331
Branch 1332
Branch 1333
Branch 1334
Branch 1335
Branch 1336
Branch 1337
Branch 1338
Branch 1339
Branch 1340
Branch 1341
Branch 1342
Branch 1343
Branch 1344
Branch 1345
Branch 1346
Branch 1347
Branch 1348
Branch 1349
Branch 1350
Branch 1351
Branch 1352
Branch 1353
Branch 1354
Branch 1355
Branch 1356
Branch 1357
Branch 1358
Branch 1359
Branch 1360
Branch 1361
Branch 1362
Branch 1363
Branch 1364
Branch 1365
Branch 1366
Branch 1367
Branch 1368
Branch 1369
Branch 1370
Branch 1371
Branch 1372
Branch 1373
Branch 1374
Branch 1375
Branch 1376
Branch 1377
Branch 1378
Branch 1379
Branch 1380
Branch 1381
Branch 1382
Branch 1383
Branch 1384
Branch 1385
Branch 1386
Branch 1387
Branch 1388
Branch 1389
Branch 1390
Branch 1391
Branch 1392
Branch 1393
Branch 1394
Branch 1395
Branch 1396
Branch 1397
Branch 1398
Branch 1399
Branch 1400
Branch 1401
Branch 1402
Branch 1403
Branch 1404
Branch 1405
Branch 1406
Branch 1407
Branch 1408
Branch 1409
Branch 1410
Branch 1411
Branch 1412
Branch 1413
Branch 1414
Branch 1415
Branch 1416
Branch 1417
Branch 1418
Branch 1419
Branch 1420
Branch 1421
Branch 1422
Branch 1423
Branch 1424
Branch 1425
Branch 1426
Branch 1427
Branch 1428
Branch 1429
Branch 1430
Branch 1431
Branch 1432
Branch 1433
Branch 1434
Branch 1435
Branch 1436
Branch 1437
Branch 1438
Branch 1439
Branch 1440
Branch 1441
Branch 1442
Branch 1443
Branch 1444
Branch 1445
Branch 1446
Branch 1447
Branch 1448
Branch 1449
Branch 1450
Branch 1451
Branch 1452
Branch 1453
Branch 1454
Branch 1455
Branch 1456
Branch 1457
Branch 1458
Branch 1459
Branch 1460
Branch 1461
Branch 1462
Branch 1463
Branch 1464
Branch 1465
Branch 1466
Branch 1467
Branch 1468
Branch 1469
Branch 1470
Branch 1471
Branch 1472
Branch 1473
Branch 1474
Branch 1475
Branch 1476
Branch 1477
Branch 1478
Branch 1479
Branch 1480
Branch 1481
Branch 1482
Branch 1483
Branch 1484
Branch 1485
Branch 1486
Branch 1487
Branch 1488
Branch 1489
Branch 1490
Branch 1491
Branch 1492
Branch 1493
Branch 1494
Branch 1495
Branch 1496
Branch 1497
Branch 1498
Branch 1499
Branch 1500
Branch 1501
Branch 1502
Branch 1503
Branch 1504
Branch 1505
Branch 1506
Supporting Participants
Active Participants
Supporting Participants
Active Participants
Fig. 1. Hierarchical tree of clustering results. The red lines indicate the
height cutoff values for the two groups.
sub-cluster distances within these two high-level groups are
very different, we cut the two sub-trees at different heights
when identifying the specific role clusters.
Based on the silhouette measure and the dendrogram
structure, we considered four clusters in the Active Contributors
group and five clusters in the Supporting Contributors group;
the red lines in Figure 1 indicate the height cutoff values.
Once the clusters were determined, we followed a qualitative
process and named the clusters based on the activity space
characteristics of each cluster centroid and analysis of actual
actions performed by representative users in each cluster.
The characteristics for those clusters are shown in Figure 2
and Figure 3, respectively.
[Figures 2 and 3: plots of the activity space characteristics of each role. Labels recovered from the figures: Progress Control, Issue Fixing, Issue Reporting, Code Tweaking; Coordinator, Core Developer, Intense Code Contributor; Progress Controller, Engaged Issue Reporter, Issue Fixer, Occasional Issue Reporter, Rare Contributor.]
Fig. 2. Activity space characteristics of roles in the Active Contributors group.
Fig. 3. Activity space characteristics of roles in the Supporting Contributors group.
We discuss those clusters and our rationale for naming the roles as follows.
Among the Active Contributors, Intense Code Contributors
made an extremely high contribution to the codebase.
Only a small number of contributors
assumed this role. Their main focus seemed to be developing a
certain functionality of the software within a short time period.
Coordinators provided only a small amount of code contribution.
Instead, they focused mainly on Knowledge Sharing,
Issue Coordination, and Issue Reporting activities. They are
usually project owners or core members of the
community. Core Developers contributed very little to
Issue Reporting but were active in Code Contribution,
Code Tweaking, Progress Control, and Knowledge Sharing.
They seemed to focus mainly on development and knowledge
sharing about the code. All-Rounders provided a medium
level of contribution in all dimensions.
The Supporting Contributors usually focused on only one
or two activities. Engaged Issue Reporters and Occasional
Issue Reporters both focused on Issue Reporting, but differed
in the quantity of their contributions. Progress Controllers
mostly engaged in the Progress Control activity, with some
contribution to Knowledge Sharing and Issue Coordination;
they almost never made Code Tweaking contributions.
An analysis of sample users in this group revealed that they
are usually core members of the community who focus
on activities such as code reviewing, quality control, and
approving and merging PRs. Issue Fixers focused on making
small tweaks to the code or fixing bugs. Rare Contributors
only participated in a minuscule number of activities.
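As an illustration of the cluster-selection step described in this subsection, the following is a minimal sketch of the silhouette measure [22] in plain Python. The toy two-dimensional points, the two candidate labelings, and the function name `silhouette_score` are assumptions made for illustration only; the actual analysis operated on the activity scores of each contributor and is not reproduced here.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette width over all points.

    points: list of equal-length numeric tuples
    labels: list of cluster ids, one per point
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a singleton cluster has silhouette width 0.
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return mean(widths)


# Hypothetical toy data: two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = [0, 0, 0, 1, 1, 1]  # labeling that follows the two groups
bad = [0, 1, 0, 1, 0, 1]   # labeling that mixes the two groups

print(silhouette_score(points, good) > silhouette_score(points, bad))
```

A labeling that follows the natural grouping yields a silhouette close to 1, while a labeling that mixes the groups scores much lower; comparing such scores across candidate cutoffs is one way to choose the number of clusters.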
C. Role Dynamics
Among all contributors included in our data set, most
(N = 16,706, 78.9%) only assumed the Rare Contributor
role in certain periods of time during the past three years.
There were also two contributors who engaged in their projects
throughout the analyzed time periods with the same role (All-Rounder).
We excluded these contributors from our analysis. Among the remaining
contributors, 4,483 only assumed roles
in the Supporting role group throughout the 12 time periods.
The rest (N = 479) assumed roles in the Active role
group at least once in the past three years. We focused
on analyzing the role change dynamics of these two types of
OSS community contributors. When performing the analysis,
we considered “Absent” (i.e., did not make any contribution
during a time period) as an additional role type.
Figure 4 shows the heatmap of role transition frequency
for contributors who only assumed the Supporting roles. Not
surprisingly, the most frequent transitions happened among the
Absent, Rare Contributor, and Occasional Issue Reporter roles.
The transition frequency from Absent to the other roles also
indicated how people got involved in an OSS community:
people rarely started as a Progress Controller; aside from
occasional contributions, contributors usually began to engage
in a project by assuming the Issue Fixer and Engaged Issue Reporter
roles. Interestingly, while any transition to
Progress Controller was rare, the transition from Progress Controller to
Occasional Issue Reporter was frequent. This may illustrate
a retiring path of community core members if they do not
continue contributing as an Active Participant.
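The counting behind such a transition heatmap can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (one role label per time period, with `None` for periods without contributions), the function name `count_transitions`, and the toy histories are all hypothetical. The sketch also assumes that staying in the same role across two consecutive periods is not counted as a transition, and it additionally tallies how many distinct contributors made each transition.

```python
from collections import Counter

ABSENT = "Absent"


def count_transitions(role_sequences):
    """Count role changes between consecutive time periods.

    role_sequences maps a contributor id to a list with one entry per
    time period; None means no contribution in that period and is
    treated as the additional "Absent" role.
    Returns (transition_counts, contributor_counts), both keyed by
    (from_role, to_role).
    """
    transitions = Counter()
    contributors = Counter()
    for sequence in role_sequences.values():
        roles = [ABSENT if role is None else role for role in sequence]
        seen = set()
        for prev, curr in zip(roles, roles[1:]):
            if prev == curr:
                continue  # assumption: staying in a role is not a transition
            transitions[(prev, curr)] += 1
            seen.add((prev, curr))
        # Each contributor counts at most once per transition type.
        contributors.update(seen)
    return transitions, contributors


# Hypothetical role histories over four time periods.
history = {
    "alice": [None, "Rare Contributor", None, "Occasional Issue Reporter"],
    "bob": ["Issue Fixer", "Issue Fixer", None, "Rare Contributor"],
    "carol": [None, "Rare Contributor", None, "Rare Contributor"],
}
transitions, contributors = count_transitions(history)
for pair, count in sorted(transitions.items()):
    print(pair, count, contributors[pair])
```

Each `(from_role, to_role)` key corresponds to one cell of a heatmap such as the one in Figure 4, with the source role as the row and the target role as the column.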
From \ To             Absent     Rare Contri.  Issue Fixer  Progress Controller  OCC Issue Reporter  ENG Issue Reporter
Absent                -          2048/1587     253/250      3/3                  3213/3100           278/278
Rare Contri.          2591/1738  -             194/163      8/5                  752/686             81/76
Issue Fixer           95/95      272/252       -            1/1                  71/71               5/5
Progress Controller   1/1        18/11         3/3          -                    511/377             0/0
OCC Issue Reporter    2248/2167  1492/1325     76/70        2/2                  -                   92/82
ENG Issue Reporter    166/165    136/135       16/13        0/0                  0/0                 -
Fig. 4. Role transition frequency heatmap for contributors who only assumed
the Supporting roles.
Figure 5 shows the heatmap of role transition frequency for
contributors who have assumed the Active roles. Aside from
transitions between Absent and Rare Contributor, the most
frequent transitions happened between All-Rounder (Active
role) and Issue Fixer (Supporting role). This type of transition
may represent a working style of a group of OSS
contributors who generally engage in all aspects of the
community but switch to focus on issue fixing when issues accumulate.
Transitions among All-Rounder (Active role), Rare
Contributor (Supporting role), and Occasional Issue Reporter
(Supporting role) were also frequent, indicating that many active
community contributors may take breaks from their work.
[Figure 5: heatmap of role transition frequency; rows and columns cover Absent, All-Rounder, Coordinator, Core Developer, Intense Code Contributor, Rare Contributor, Issue Fixer, Progress Controller, Occasional Issue Reporter, and Engaged Issue Reporter.]
Fig. 5. Role transition frequency heatmap for contributors who have assumed
the Active roles.
We calculated a Role Change Intensity (RCI) score for each
participant in each project using Equation 1. As the cluster
distances among the Supporting roles are close to each other, the
RCI scores for contributors who only assumed the Supporting
roles are expected to be close to zero. As a result, we only
focused on RCI scores for contributors who have ever assumed
the Active roles. Figure 6 shows the histogram of their RCI
scores. The results showed a right-skewed distribution with a
median of 0.99 (IQR = 1.21 − 0.92). Examining the scenario
in which only one role change occurred throughout the
entire time span, we found that the median RCI across all
types of role changes is 1.11 (IQR = 1.26 − 0.35). These
results indicated that most contributors experienced role changes of
medium intensity, while some experienced role changes of
medium to high intensity.
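The summary statistics reported above can be computed as in the following minimal sketch. It does not reproduce Equation 1; it only shows how a median and an interquartile range are obtained from a list of scores, assuming the two values reported after "IQR" are the third and first quartiles. The scores in `rci_scores` are invented for illustration, and the helper name `summarize` is hypothetical.

```python
from statistics import median, quantiles


def summarize(scores):
    """Return (median, first quartile, third quartile) of the scores."""
    # method="inclusive" treats the data as the whole population,
    # so the quartiles never fall outside the observed range.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


# Hypothetical RCI scores, invented for illustration only.
rci_scores = [0.2, 0.9, 1.0, 1.2, 2.0]
med, q1, q3 = summarize(rci_scores)
print(f"median = {med:.2f}, quartiles = {q1:.2f} to {q3:.2f}")
```

In a right-skewed distribution the third quartile lies farther from the median than the first quartile does, which is the pattern described for the RCI scores of contributors who have assumed the Active roles.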
[Figure 6: histogram of RCI scores; axis ticks span 0.5 to 2.0 and 0 to 150.]
Fig. 6. Histogram of RCI across contributors who have assumed the Active
roles.
Our work provides several implications for designing OSS
tools to support role-based interactions. In this section, we first
discuss these implications. We then consider the limitations of
our current study and discuss directions for future work.
A. Mediate OSS Activities Instead of Actions
Current OSS tools usually involve features that focus on
supporting low-level actions such as code committing, issue
reporting, commenting, applying labels, etc. There is a lack of
focus on mediating higher-level OSS activities. Through the
factor analysis, we identified six activities that most clearly
distinguished the roles of OSS contributors. Some of them,
such as Code Contribution and Issue Reporting, only involved
a small number of actions and were well supported in the
current tools. However, others involved multiple actions that
are currently supported in different, and sometimes isolated,
tool features. For example, the Issue Coordination activity
involves actions to create links among the issues and the pull
requests, as well as managing issue labels; on GitHub, there is
no connection between the link creation (through commenting)
and the issue labeling features.
Reflecting on the Activity Theory, tools need to serve as
“functional organs” to help users achieve their goal-oriented
activities [16], [20]. We argue that the OSS tool designers need
to consider the activities identified in this paper, which were
aligned with the representative goals that OSS contributors
hold when performing the corresponding actions. Particularly,
they may explore connections surrounding the features that
support the underlying actions for each activity to facilitate
a smooth transition among the actions. Moreover, based on
the correlations found among the activities, OSS tool designers
may consider more sophisticated feature connections to
help users move among activities.
B. Support Role-Based Interaction
Through the clustering analysis, we identified four Active
roles and five Supporting roles of OSS contributors. While
the literature has strongly advocated role-based interaction
in software engineering tools [26], the realization of such
interaction is still immature in the OSS world. One reason for
this gap is that there is currently little knowledge or guidance
for the tool designers to have a clear conception as to what
high-level activities and detailed actions each role takes. Our
data-driven and activity-based roles addressed this limitation.
On the one hand, the roles identified in this paper reflected some
characteristics of the roles in the literature (e.g., the “onion”
model [19]). For example, confirming the hypothesis posed
in the “onion” model, our data indicated a small number of
Active contributors who make a large number of contributions
and a vast number of Supporting contributors. On the other
hand, our roles provided a more detailed trace to the
main activities each role focuses on. For example, our results
showed that the Progress Controllers not only perform the
Progress Control activity but are usually also involved
in Knowledge Sharing and Issue Coordination; the Engaged
Issue Reporters usually also perform the Issue Coordination
activity. As a result, these roles paint a more comprehensive
picture about activities and responsibilities of OSS contribu-
tors. The OSS tool designers can use this information to better
satisfy the goals and needs of OSS contributors in role-based
interaction design. Particularly, they can use the activities and
the actions underlying each role as a design guideline.
C. Support Role Change: Onboarding and Retiring
There is limited discussion about OSS tools that support role
change in the literature. However, our data indicated that role
change in OSS communities is both frequent and somewhat
intense. As a result, techniques and tools that facilitate a
smooth change of roles can be useful for OSS contributors.
While our results have indicated a complex role change
model, onboarding and retiring are among the most important
types of role change for OSS communities. Our data confirmed
a common impression that OSS contributors usually get involved
in a project through issue reporting and fixing. We also
identified that a common retiring path of Active contributors
is also through issue-related activities. These findings indicated
a central role of the Issue Management Systems in the onboarding
and retiring processes. Tool designers may consider
including features in the Issue Management Systems to support
new contributors to be better engaged in the community culture
and acquire the necessary knowledge and skill. They may
also enhance the Issue Management Systems to help retiring
members transfer knowledge and tasks.
D. Limitations and Future Work
Although diverse, the contribution metrics used in our
study are based only on quantity, rather than quality, of the
actions taken by OSS contributors. Our research can thus
be extended with studies focused on extracting qualitative
measures of contribution. Additionally, while the 29 OSS
projects analyzed in this work were carefully selected to cover
a wide variety of application domains, programming
languages, community sizes, and code base sizes, future work
that validates our findings on a larger number of OSS projects
and communities would be useful. Moreover, we focused on
investigating OSS contributors’ roles within a project. However,
many contemporary OSS communities are structured around
a group of projects (i.e., a project ecosystem). Exploring how
our model and method can generalize to such larger-scale
OSS communities is an interesting direction for future work.
In this study, we adopted a data-driven approach to under-
standing the diverse roles and their dynamics in OSS commu-
nities. From an analysis of 29 OSS projects, we extracted six
activities that determined four Active roles and five Supporting
roles. This approach allowed us to provide rich informa-
tion, grounded in the data, about the actions and activities
performed by each role. Through the lens of the Activity
Theory, such information rendered useful design guidelines
for role-based OSS tools. We argue that such methodology
and the generated information are crucial to understanding and
supporting the collaboration among diverse OSS contributors.
[1] S. T. Acuña and N. Juristo. Assigning people to roles in software
projects. Software: Practice and Experience, 34(7):675–696, jun 2004.
[2] K. Agrawal, M. Aschauer, T. Thonhofer, S. Bala, A. Rogge-Solti, and
N. Tomsich. Resource Classification from Version Control System
Logs. In 2016 IEEE 20th International Enterprise Distributed Object
Computing Workshop (EDOCW), pages 1–10. IEEE, sep 2016.
[3] O. Arazy, J. Daxenberger, H. Lifshitz-Assaf, O. Nov, and I. Gurevych.
Turbulent Stability of Emergent Roles: The Dualistic Nature of Self-
Organizing Knowledge Coproduction. Information Systems Research,
27(4):792–812, dec 2016.
[4] J. B. Barlow. Emergent roles in decision-making tasks using group
chat. In Proceedings of the 2013 conference on Computer supported
cooperative work - CSCW ’13, page 1505, New York, New York, USA,
2013. ACM Press.
[5] A. Ben Fadhel, D. Bianculli, and L. C. Briand. Model-driven run-time
enforcement of complex role-based access control policies. In Proceed-
ings of the 33rd ACM/IEEE International Conference on Automated
Software Engineering - ASE 2018, pages 248–258, New York, New
York, USA, 2018. ACM Press.
[6] C. Cheng, B. Li, Z.-Y. Li, Y.-Q. Zhao, and F.-L. Liao. Developer Role
Evolution in Open Source Software Ecosystem: An Explanatory Study
on GNOME. J. Comput. Sci. Technol., 32(2):396–414, mar 2017.
[7] J. Cheng and J. L. Guo. How do the open source communities address
usability and ux issues: An exploratory study. In Extended Abstracts
of the 2018 CHI Conference on Human Factors in Computing Systems,
CHI EA ’18, New York, NY, USA, 2018. ACM.
[8] D. Child. The essentials of factor analysis. A&C Black, 2006.
[9] A. B. Costello and J. W. Osborne. Best practices in exploratory factor
analysis: Four recommendations for getting the most from your analysis.
Practical assessment, research & evaluation, 10(7):1–9, 2005.
[10] E. di Bella, A. Sillitti, and G. Succi. A multivariate classification of
open source developers. Information Sciences, 221:72–83, feb 2013.
[11] L. R. Fabrigar, D. T. Wegener, R. C. MacCallum, and E. J. Strahan.
Evaluating the use of exploratory factor analysis in psychological
research. Psychological methods, 4(3):272, 1999.
[12] GitHub. How to contribute to open source.,
[13] R. Hoda, J. Noble, and S. Marshall. Self-Organizing Roles on Agile
Software Development Teams. IEEE Transactions on Software Engi-
neering, 39(3):422–444, mar 2013.
[14] M. Joblin, S. Apel, C. Hunsen, and W. Mauerer. Classifying Developers
into Core and Peripheral: An Empirical Study on Count and Network
Metrics. In Proceedings of the 39th International Conference on
Software Engineering, pages 164–174. IEEE, may 2017.
[15] M. Joblin, S. Apel, and W. Mauerer. Evolutionary trends of developer
coordination: a network approach. Empirical Software Engineering,
22(4):2050–2094, Aug 2017.
[16] V. Kaptelinin. Activity Theory: Implications for Human-Computer
Interaction. In Context and consciousness: Activity theory and human-
computer interaction, pages 103–116. MIT, 1996.
[17] A. Mockus, R. T. Fielding, and J. D. Herbsleb. Two case studies of
open source software development: Apache and Mozilla. ACM Trans.
Softw. Eng. Methodol., 11(3):309–346, jul 2002.
[18] F. Murtagh and P. Legendre. Ward’s hierarchical agglomerative cluster-
ing method: Which algorithms implement ward’s criterion? Journal of
Classification, 31(3):274–295, Oct 2014.
[19] K. Nakakoji, Y. Yamamoto, Y. Nishinaka, K. Kishida, and Y. Ye.
Evolution patterns of open-source software systems and communities.
In Proceedings of the International Workshop on Principles of Software
Evolution, page 76, New York, New York, USA, 2002. ACM Press.
[20] B. A. Nardi. Activity Theory and Human-Computer Interaction. In Con-
text and consciousness: Activity theory and human-computer interaction,
pages 7–16. MIT, 1996.
[21] S. Onoue, H. Hata, and K. Matsumoto. Software population pyramids:
The current and the future of oss development communities. In Pro-
ceedings of the 8th ACM/IEEE International Symposium on Empirical
Software Engineering and Measurement, ESEM ’14, pages 34:1–34:4,
New York, NY, USA, 2014. ACM.
[22] P. J. Rousseeuw. Silhouettes: a graphical aid to the interpretation and
validation of cluster analysis. Journal of computational and applied
mathematics, 20:53–65, 1987.
[23] J. E. Tomayko and O. Hazzan. Human Aspects of Software Engineering.
Laxmi Publications, 2005.
[24] L. S. Vygotsky. Mind in Society: Development of Higher Psychological
Processes. Harvard University Press, 1981.
[25] Y. Ye and K. Kishida. Toward an understanding of the motivation of open
source software developers. In Proceedings of the 25th International
Conference on Software Engineering, ICSE ’03, pages 419–429, Wash-
ington, DC, USA, 2003. IEEE Computer Society.
[26] H. Zhu, M. Zhou, and M. Hou. Support Collaboration with Roles. In
Contemporary Issues in Systems Science and Engineering, pages 575–
598. John Wiley & Sons, Inc., Hoboken, NJ, USA, apr 2015.
[27] H. Zhu, M. Zhou, and P. Seguin. Supporting Software Development
With Roles. IEEE Transactions on Systems, Man, and Cybernetics -
Part A: Systems and Humans, 36(6):1110–1123, nov 2006.
[28] W. R. Zwick and W. F. Velicer. Comparison of five rules for determining
the number of components to retain. Psychological bulletin, 99(3):432,
... We assess the collected data using different sets of characteristics, each covering a (mail-based) software development aspect: (1) contribution, (2) exposition, and (3) administration, see Table I for their summary. These aspects are interrelated and thus their enclosed characteristics can be assembled into descriptive action categories using Exploratory Factor Analysis [6] (EFA), according to the process described in Cheng and Guo [5]. In EFA language, these descriptive action categories are known as latent factors. ...
... Doing so revealed 5 action types: (1) Code Contribution, (2) Knowledge Sharing, (3) Patch Posting, (4) Progress Control, and (5) Acknowledgement / Response. Like Cheng and Guo's work [5], we use X j 's factor scores to extract clusters of action categories, called general developer activity types (see Figure 3). Unlike their work, we generalize the factor scores to sequential data. ...
Open-source is frequently described as a driver for unprecedented communication and collaboration, and the process works best when projects support teamwork. Yet, their cooperation processes in no way protect project contributors from considerations of trust, power, and influence. Indeed, achieving the level of trust necessary to contribute to a project and thus influence its direction is a constant process of change, and developers take many different routes over many communication channels to achieve it. We refer to this process of influence-seeking and trust-building, trust ascendancy. This paper describes a methodology for understanding the notion of trust ascendancy, and introduces the capabilities that are needed to localizing trust ascendancy operations happening over open-source projects. Much of the prior work in understanding trust in open-source software development has focused on a static view of the problem, and study it using different forms of quantity measures. However, trust ascendancy is not static but rather adapt to changes in the open-source ecosystem in response to developer role changes, new functionality, new technologies, and so on. This paper is the first attempt to articulate and study these signals, from a dynamic view of the problem. In that respect, we identify related work that may help illuminate research challenges, implementation tradeoffs, and complementary solutions. Our preliminary results show the effectiveness of our method at capturing the trust ascendancy developed by individuals involved in a well-documented 2020 social engineering attack. Our future plans highlight research challenges, and encourage cross-disciplinary collaboration to create more automated, accurate, and efficient ways to modeling and then tracking trust ascendancy in open-source projects.
... This type of unhealthy, and sometimes disturbing or harmful, behavior can have a variety of causes. For example, even though diversity has many benefits for open source communities [29,30], the mix of cultures, personalities, and interests of open source contributors can cause a clash of personal values and opinions [2]. Furthermore, as a social-technical platform, ITSs sometimes host social context discussions, such as conversations about the Black Lives Matter and Me Too movements, which can increase the chances of conflicts and arguments. ...
... Inspired by the coding framework of Miller et al. [17], we investigate where the uncivil comments are positioned in uncivil issues. Particularly, we considered three locations: (1) in the issue description, (2) in the first comment, and (3) in later comments (i.e., emerged from the discussion). For each one of the 138 issues that included at least one uncivil comment, the combination of the above three locations resulted in seven conditions of where the uncivil comments were positioned: (i) only in the issue description, (ii) only in the first comment, (iii) only in the issue description and the first comment, (iv) in the issue description and it emerged from the discussion, (v) in the first comment and it emerged from the discussion, (vi) in the issue description, first comment, and it emerged from the discussion, or (vii) only emerged from the discussion. ...
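The seven conditions in the excerpt above correspond to the non-empty combinations of the three locations (2^3 - 1 = 7). A minimal Python sketch of that enumeration follows; the function name `placement_conditions` and the label strings are illustrative assumptions, not taken from the cited paper, and the output order (smallest combinations first) differs from the excerpt's (i) to (vii) numbering.

```python
from itertools import combinations

# Hypothetical labels for the three locations named in the excerpt.
LOCATIONS = ("issue description", "first comment", "later comments")


def placement_conditions(locations=LOCATIONS):
    """Return every non-empty combination of locations, smallest first."""
    return [
        combo
        for size in range(1, len(locations) + 1)
        for combo in combinations(locations, size)
    ]


conditions = placement_conditions()
for number, combo in enumerate(conditions, start=1):
    print(number, " + ".join(combo))
```

Each tuple names the locations in which uncivil comments appear for one condition; a single-element tuple is an "only in ..." condition.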
Although issues of open source software are created to discuss and solve technical problems, conversations can become heated, with discussants getting angry and/or agitated for a variety of reasons, such as poor suggestions or violation of community conventions. To prevent and mitigate discussions from getting heated, tools like GitHub have introduced the ability to lock issue discussions that violate the code of conduct or other community guidelines. Despite some early research on locked issues, there is a lack of understanding of how communities use this feature and of potential threats to validity for researchers relying on a dataset of locked issues as an oracle for heated discussions. To address this gap, we (i) quantitatively analyzed 79 GitHub projects that have at least one issue locked as too heated, and (ii) qualitatively analyzed all issues locked as too heated of the 79 projects, a total of 205 issues comprising 5,511 comments. We found that projects have different behaviors when locking issues: while 54 locked less than 10% of their closed issues, 14 projects locked more than 90% of their closed issues. Additionally, locked issues tend to have a similar number of comments, participants, and emoji reactions to non-locked issues. For the 205 issues locked as too heated, we found that one-third do not contain any uncivil discourse, and only 8.82% of the analyzed comments are actually uncivil. Finally, we found that the locking justifications provided by maintainers do not always match the label used to lock the issue. Based on our results, we identified three pitfalls to avoid when using the GitHub locked issues data and we provide recommendations for researchers and practitioners.
... software production communities is divided among the 1% of participants who lead the start of new software projects, the 5% to 10% of participants who edit the source code of the applications they use, contributing to the whole, and the rest, those who merely use the software produced by the other participants (Ducheneaut, 2005; Cheng and Guo, 2019; von Hippel and Lakhani, 2000; von Hippel, 2016; Mockus et al., 2000; Koch and Schneider, 2002). ...
This is a book dedicated to interpreting and decoding what the communication of communication is: how our way of communicating is shaping our institutions, how mediation has shaped our communication, and how the network has transformed mass communication into networked communication and mass culture into a mediatized culture, creating in the process a new media system and a new communicational paradigm. This is a book about the need for a sociology of algorithmic mediation that explains why, in a mediatized culture generated by networked communication, people are the message and why its most distinctive feature lies in the communication of communication.
... They chose extraversion and conscientiousness as their main criteria and focused on the binary combinations of them. Cheng and Guo (2019) made an activity-based analysis of OSS contributors, then adopted a data-driven approach to finding out the dynamics and roles of the contributors. Milewicz et al. (2019) worked on the contributor roles in scientific OSS projects. ...
Context: In a software project, properly analyzing the contributions of developers could provide valuable insights for decision-makers. The contributions of a developer could be in many different forms such as committing and reviewing code, opening and resolving issues. Previous approaches mainly consider the commit-based contributions which provide an incomplete picture of developer contributions. Objective: Different from the traditional commit-based approaches for analyzing developer contributions, we aim to provide a more holistic approach to reflect the rich set of software development activities using artifact traceability graphs. Method: For analyzing the developer contributions, we propose a novel categorization of developers (Jacks, Mavens and Connectors) in a software project. We introduce a set of algorithms on artifact traceability graphs to identify key developers, recommend replacements for leaving developers and evaluate knowledge distribution among developers. Results: We evaluate our proposed algorithms on six open-source projects and demonstrate that the identified key developers match the top commenters up to 98%, recommended replacements are correct up to 91% and identified knowledge distribution labels are compatible 94% on average with the baseline approaches. Conclusions: The proposed algorithms using artifact traceability graphs for analyzing developer contributions could be used by software project decision-makers in several scenarios: (1) identifying different types of key developers, (2) finding a replacement developer in large teams, and (3) evaluating the overall knowledge distribution amongst developers to take early precautions.
... We contribute to this emerging body of research with particular attention to power relationships between new Web technologies and established corporate platforms, especially how these relationships impact the work of building and maintaining alternatives. In addition, researchers have used GitHub data to study social structures in software development (Cheng & Guo, 2019;Strzalkowski et al., 2019), and GitHub issues in particular to understand community governance around codes of conduct (Li et al., 2021). However, to our knowledge, ours is the first study to use GitHub issues to examine how interoperability among software can challenge the ongoing achievement of desired social and ethical outcomes. ...
Open access: Concentrations of power over the internet among a small number of corporate platforms have motivated attempts to build alternative social media. Using the contemporary internet routinely involves relying on a small number of dominant corporate platforms. In reaction against this centralization of power, there are many attempts to build alternative Web technologies that reconfigure the internet’s power structures and enact their own values. However, given the entrenchment of large corporate platforms, this typically involves co-existing with rather than replacing them, at least in the present. Accordingly, it is important to investigate challenges arising when alternative social media operate alongside and even within the systems to which they propose an alternative. We investigate this through an empirical study of the IndieWeb, a community of personal websites with social networking features including syndication to and from corporate platforms. Using GitHub data, we study the development of a tool for this syndication called Bridgy, focusing on its relationship with the Facebook API. By identifying breakdowns in this relationship, we identify the following challenges: translating differing logics between the open Web and APIs, occasional ambiguity in Facebook’s presentation of privacy settings, and ongoing precarity due to API updates. Our analysis illustrates the reality of maintaining alternative technical systems as part of present-day infrastructures and generates insights for building socially empowering technologies for the future.
... It uses information retrieval and machine learning techniques to build triage-assisting recommendation models by leveraging bug reports' structured information, such as title, comment, and description. Second, the graph-based approach [4] takes full advantage of developer attributes (such as reputation, team, and role) and the collaboration among developers to calculate the probability that a developer fixes a given bug. It models the relationship of bug triaging between developers as goal-oriented tossing graphs and searches out appropriate developers by the weight-based searching algorithm. ...
The bug triaging process, an essential process of assigning bug reports to the most appropriate developers, is related closely to the quality and costs of software development. Since manual bug assignment is a labor-intensive task, especially for large-scale software projects, many machine learning-based approaches have been proposed to triage bug reports automatically. Although developer collaboration networks (DCNs) are dynamic and evolving in the real world, most automated bug triaging approaches focus on static tossing graphs at a single time slice. Also, none of the previous studies consider periodic interactions among developers. To address the problems mentioned above, in this article, we propose a novel spatial–temporal dynamic graph neural network (ST-DGNN) framework, including a joint random walk (JRWalk) mechanism and a graph recurrent convolutional neural network (GRCNN) model. In particular, JRWalk aims to sample topological structures in a developer collaboration network with two sampling strategies by considering both developer reputation and interaction preference. GRCNN has three components with the same structure, i.e., hourly-periodic, daily-periodic, and weekly-periodic components, to learn the spatial–temporal features of nodes on dynamic DCNs. We evaluated our approach’s effectiveness by comparing it with several state-of-the-art graph representation learning methods in three domain-specific tasks (i.e., the bug fixer prediction task and two downstream tasks of graph representation learning: node classification and link prediction). In the three tasks, experiments on two real-world, large-scale developer collaboration networks collected from the Eclipse and Mozilla projects indicate that the proposed approach outperforms all the baseline methods on three different time scales (i.e., long-term, medium-term, and short-term predictions) in terms of F1-score.
Design systems represent a user interaction design and development approach that is currently of avid interest in the industry. However, little research work has been done to synthesize knowledge related to design systems in order to inform the design of tools to support their creation, maintenance, and usage practices. This paper represents an important step in which we explored the issues that design system projects usually deal with and the perceptions and values of design system project leaders. Through this exploration, we aim to investigate the needs for tools that support the design system approach. We found that the open source communities around design systems focused on discussing issues related to behaviors of user interface components of design systems. At the same time, leaders of design system projects faced considerable challenges when evolving their design systems to make them both capable of capturing stable design knowledge and flexible to the needs of the various concrete products. They valued a bottom-up approach for design system creation and maintenance, in which components are elevated and merged from the evolving products. Our findings synthesize the knowledge and lay foundations for designing techniques and tools aimed at supporting the design system practice and related modern user interaction design and development approaches.
Using popular open source projects on GitHub, we provide evidence that bots are regularly among the most active contributors, even though GitHub does not explicitly acknowledge their presence. This poses a problem for techniques that analyze human contributor activity.
Conference Paper
Usability and user experience (UX) issues are often not well emphasized and addressed in open source software (OSS) development. There is an imperative need for supporting OSS communities to collaboratively identify, understand, and fix UX design issues in a distributed environment. In this paper, we provide an initial step towards this effort and report on an exploratory study that investigated how the OSS communities currently reported, discussed, negotiated, and eventually addressed usability and UX issues. We conducted in-depth qualitative analysis of selected issue tracking threads from three OSS projects hosted on GitHub. Our findings indicated that discussions about usability and UX issues in OSS communities were largely influenced by the personal opinions and experiences of the participants. Moreover, the characteristics of the community may have greatly affected the focus of such discussion.
Conference Paper
Collaboration in business processes and projects requires a division of responsibilities among the participants. Version control systems allow us to collect profiles of the participants that hint at participants' roles in the collaborative work. The goal of this paper is to automatically classify participants into the roles they fulfill in the collaboration. Two approaches are proposed and compared in this paper. The first approach finds classes of users by applying k-means clustering to users based on attributes calculated for them. The classes identified by the clustering are then used to build a decision tree classification model. The second approach classifies individual commits based on commit messages and file types. The distribution of commit types is used for creating a decision tree classification model. The two approaches are implemented and tested against three real datasets, one from academia and two from industry. Our classification covers 86% of the total commits. The results are evaluated with actual role information that was manually collected from the teams responsible for the analyzed repositories.
Knowledge about the roles developers play in a software project is crucial to understanding the project's collaborative dynamics. Developers are often classified according to the dichotomy of core and peripheral roles. Typically, operationalizations based on simple counts of developer activities (e.g., number of commits) are used for this purpose, but there is concern regarding their validity and ability to elicit meaningful insights. To shed light on this issue, we investigate whether commonly used operationalizations of core-peripheral roles produce consistent results, and we validate them with respect to developers' perceptions by surveying 166 developers. Improving over the state of the art, we propose a relational perspective on developer roles, using developer networks to model the organizational structure, and by examining core-peripheral roles in terms of developers' positions and stability within the organizational structure. In a study of 10 substantial open-source projects, we found that the existing and our proposed core-peripheral operationalizations are largely consistent and valid. Furthermore, we demonstrate that a relational perspective can reveal further meaningful insights, such as that core developers exhibit high positional stability, upper positions in the hierarchy, and high levels of coordination with other core developers.
Increasingly, new forms of organizing for knowledge production are built around self-organizing coproduction community models with ambiguous role definitions. Current theories struggle to explain how high-quality knowledge is developed in these settings and how participants self-organize in the absence of role definitions, traditional organizational controls, or formal coordination mechanisms. In this article, we engage the puzzle by investigating the temporal dynamics underlying emergent roles on individual and organizational levels. Comprised of a multilevel large-scale empirical study of Wikipedia stretching over a decade, our study investigates emergent roles in terms of prototypical activity patterns that organically emerge from individuals' knowledge production actions. Employing a stratified sample of 1,000 Wikipedia articles, we tracked 200,000 distinct participants and 700,000 coproduction activities, and recorded each activity's type. We found that participants' role-taking behavior is turbulent across roles, with substantial flow in and out of coproduction work. Our findings at the organizational level, however, show that work is organized around a highly stable set of emergent roles, despite the absence of traditional stabilizing mechanisms such as predefined work procedures or role expectations. This dualism in emergent work is conceptualized as "turbulent stability." We attribute the stabilizing factor to the artifact-centric production process and present evidence to illustrate the mutual adjustment of role taking according to the artifact's needs and stage. We discuss the importance of the affordances of Wikipedia in enabling such tacit coordination. This study advances our theoretical understanding of the nature of emergent roles and self-organizing knowledge coproduction. We discuss the implications for custodians of online communities as well as for managers of firms engaging in self-organized knowledge collaboration.
Conference Paper
A Role-based Access Control (RBAC) mechanism prevents unauthorized users from performing an operation, according to authorization policies which are defined on the user’s role within an enterprise. Several models have been proposed to specify complex RBAC policies. However, existing approaches for policy enforcement do not fully support all the types of policies that can be expressed in these models, which hinders their adoption among practitioners. In this paper we propose a model-driven enforcement framework for complex policies captured by GemRBAC+CTX, a comprehensive RBAC model proposed in the literature. We reduce the problem of making an access decision to checking whether a system state (from an RBAC point of view), expressed as an instance of the GemRBAC+CTX model, satisfies the constraints corresponding to the RBAC policies to be enforced at run time. We provide enforcement algorithms for various types of access requests and events, and a prototype tool (MORRO) implementing them. We also show how to integrate MORRO into an industrial Web application. The evaluation results show the applicability of our approach on an industrial system and its scalability with respect to the various parameters characterizing an AC configuration.
An open source software (OSS) ecosystem refers to an OSS development community composed of many software projects and developers contributing to these projects. The projects and developers co-evolve in an ecosystem. To sustain the healthy evolution of such OSS ecosystems, there is a need to attract and retain developers, particularly project leaders and core developers who have a major impact on the project and the whole team. Therefore, it is important to figure out the factors that influence developers’ chance to evolve into project leaders and core developers. To identify such factors, we conducted a case study on the GNOME ecosystem. First, we collected indicators reflecting developers’ subjective willingness to contribute to the project and the project environment that they stay in. Second, we calculated such indicators based on the GNOME dataset. Then, we fitted logistic regression models by taking as independent variables the resulting indicators after eliminating the most collinear ones, and taking as a dependent variable the future developer role (the core developer or project leader). The results showed that some of these indicators (e.g., the total number of projects that a developer joined) of subjective willingness and project environment significantly influenced the developers’ chance to evolve into core developers and project leaders. With different validation methods, our obtained model performs well on predicting developmental core developers, resulting in stable prediction performance (F-value of 0.770).
This chapter determines the requirements for roles by examining collaboration in a group. Roles are fundamental tools to support collaboration activities. It considers "collaboration" as a generalized concept. The chapter denotes collaboration in the following categories: among people, that is, natural collaboration; among people through systems, that is, computer-supported cooperative work (CSCW); among people and systems, that is, human-computer interaction (HCI); among systems, that is, distributed and collaborative systems. Roles can be made to facilitate an organizational structure, provide orderly system behavior, and consolidate system security for both human and non-human entities that collaborate and coordinate their activities with or within systems. Because the roles have inherent advantages to facilitate collaboration, a role-based collaboration (RBC) methodology was proposed. The E-CARGO model is an abstract structure based on roles for a group of people and agents involved in collaboration or team work. © 2015 The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.