GierCom - A Dataset of Open Source Developer
Communications in Gier
Esteban Parra
parrarod@cs.fsu.edu
Florida State University
Tallahassee, Florida
Ashley Ellis
ake17@my.fsu.edu
Florida State University
Tallahassee, Florida
Sonia Haiduc
shaiduc@cs.fsu.edu
Florida State University
Tallahassee, Florida
Abstract
Team communication is essential for the development of modern software systems. For distributed software development teams, such as those found in many open source projects, this communication usually takes place through electronic tools. Among these, modern chat platforms such as Gitter are becoming the de facto choice for many software projects due to their advanced features geared towards software development and effective team communication. Gitter channels contain numerous messages exchanged by developers regarding the state of the project, issues and features of the system, team logistics, and more. These messages can contain important information for researchers studying open source software systems and for developers who are new to a project and trying to become familiar with the software. Therefore, uncovering what developers communicate about through Gitter is an essential first step towards understanding and leveraging this information. We present a new dataset, called GitterCom, which is meant to enable research in this direction and represents the largest manually labeled and curated dataset of Gitter developer messages. The dataset comprises 10,000 Gitter messages collected from 10 Gitter communities associated with the development of open source software systems. Each message was manually annotated and verified by two of the authors, capturing the purpose of the communication expressed by the message. While the dataset has not yet been used in any publication, we discuss how it can enable interesting research opportunities in the field.
CCS Concepts
Software and its engineering → Collaboration in software development; Open source model; Documentation.
Keywords
datasets, communication, chat, social media, team communication
platforms
ACM Reference Format:
Esteban Parra, Ashley Ellis, and Sonia Haiduc. 2020. GitterCom - A Dataset of Open Source Developer Communications in Gitter. In MSR’20: MSR, May 25–26, 2020, Seoul, South Korea. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/1122445.1122456

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MSR’20, May 25–26, 2020, Seoul, South Korea
©2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-9999-9/18/06. . . $15.00
https://doi.org/10.1145/1122445.1122456
1 Introduction
Modern, complex open source software systems are often developed by large teams. These teams are usually geographically distributed across different locations, countries, and even continents. To collaborate, communicate, and coordinate, they use electronic tools such as instant messaging and email [4, 5, 8, 10]. Recently, modern messaging and collaboration platforms such as Gitter¹ and Slack² have revolutionized team communication and project coordination by providing a user-friendly way of managing and organizing conversations, facilitating knowledge sharing, and integrating with external software development tools such as GitHub, Asana, and Jira [9]. Given these features and their support for software development, many open source projects have adopted Gitter and Slack as their preferred means of communication [5]. In particular, Gitter is currently the most popular instant messaging platform among open source development teams [5]. It also presents some advantages over Slack, such as:
Open access to communications: in Slack, communities are controlled by their administrators, whereas in Gitter, access to user-generated data is public. In particular, public messages and user-generated content in Gitter are subject to the Creative Commons Attribution-NonCommercial-ShareAlike (BY-NC-SA) license³.
Free access to historical data: in Slack communities, only the latest 10,000 messages are accessible without paying. Since most public Slack channels use the free tier [3], their historical data is unavailable. Conversely, messages posted to public Gitter channels are preserved and accessible indefinitely in chat room logs.
Despite Gitter's advantages over Slack, its greater popularity among open source developers, and the availability of tens of thousands of message exchanges between developers of open source software, no papers have so far investigated developer communications in Gitter. Rather, existing work analyzing developer communications in modern instant messaging platforms has focused solely on Slack [1–3, 6].
We argue that Gitter developer communications are an untapped information resource that researchers could leverage to gain a deeper understanding of the nature of developer communications in open source software. With this paper, we aim to encourage research in this direction by introducing GitterCom, the first manually labeled dataset of Gitter instant message histories in open source systems. The dataset consists of 10,000 messages across ten Gitter communities devoted to the development of ten different open source systems. The messages were automatically extracted and then manually labeled by two of the authors with respect to the communication purpose they express, based on the categories identified in previous work by Lin et al. [6] through surveys of developers. GitterCom is the largest manually labeled dataset of developer instant messages; the only other manually labeled dataset available comprises 500 developer Slack messages from one software company [10].

¹ https://gitter.im/
² https://slack.com/
³ https://creativecommons.org/licenses/by-nc-sa/3.0/us/

Table 1: Distribution of messages per purpose category
Category | Cucumber | Freezing | ImageJ | Jhipster | JSPM | MJS⁴ | Sklearn⁵ | THW⁶ | UIKit | Xenko | Overall
Communication | 325 | 794 | 490 | 446 | 635 | 695 | 506 | 583 | 321 | 480 | 5275
Customer support | 442 | 0 | 150 | 239 | 0 | 0 | 4 | 145 | 451 | 0 | 1431
Dev-Ops | 198 | 183 | 308 | 269 | 305 | 235 | 464 | 240 | 190 | 383 | 2775
Discovery and news | 13 | 1 | 10 | 9 | 7 | 2 | 3 | 15 | 5 | 32 | 97
Fun | 0 | 2 | 0 | 0 | 0 | 39 | 0 | 0 | 1 | 0 | 42
Networking and social activities | 0 | 0 | 1 | 3 | 0 | 3 | 0 | 0 | 0 | 32 | 39
Participation in Communities of Practice | 4 | 2 | 9 | 15 | 21 | 13 | 13 | 10 | 12 | 54 | 153
Team Collaboration | 18 | 18 | 32 | 19 | 32 | 13 | 10 | 7 | 20 | 19 | 188
The rest of the paper is structured as follows. Section 2 presents an overview of the dataset, Section 3 outlines the data collection process we followed, Section 4 discusses potential research directions using this dataset, Section 5 presents limitations and future improvements that could be made to the dataset, and Section 6 concludes the paper.
2 Dataset Description
GitterCom includes data about 10,000 messages collected from 10 open source software development Gitter communities (1,000 messages per community). Each message was manually labeled with information about the purpose of the communication it expresses, based on the categories identified by Lin et al. [6].
GitterCom is available in CSV le format online
7
. In the CSV le,
each line is a data record. Each record contains the information for
a single message and consists of seven information elds, separated
by comma and using quotes as the text delimiter. In particular, each
row contains: (i) the channel/system the message belongs to, (ii)
a unique messageID, (iii) the date and time at which the message
was posted, (iv) the author of the message, (v) the content of the
message in plain text, (vi) the corresponding purpose category
(manual label), and (vii) the purpose subcategory (manual label).
Next, we present brief descriptions of the dierent purposes,
their categories and subcategories we used to manually label the
messages in GitterCom. These were rst identied by Lin et al. [
6
],
who surveyed software developers about their use of Slack.
The rst purpose, called
Personal benets
, includes messages
in which the developer’s main purpose is to fulll personal needs.
Messages within this purpose can be further divided into three cat-
egories: discovery and aggregation of news and information, where
4MarionneteJS
5SciKit-Learn
6TheHollyWae
7https://gshare.com/s/9b3df36e22a8a8f77169
developers post reliable, interesting, and relevant blogs or other
sources of information; networking and social activities, where de-
velopers interact with other developers who share similar interests
or jobs; and fun, which are messages sharing gifs and memes or
meant for participating in gaming activities.
The second purpose relates to Team-wide activities and includes messages aimed at carrying out software development activities related to the system being developed. Messages within this purpose can be further divided into the following four categories: communication messages, in which developers engage in activities such as communicating with teammates (e.g., members of a distributed team) during meetings and note-taking, communicating with other stakeholders, or discussing non-work topics; team collaboration messages, in which developers engage in activities such as team management, file sharing, and code sharing; Dev-Ops messages, in which developers engage in activities such as communicating updates regarding the status of the project (e.g., development operation notifications about recent changes to the system, commits, bug fixes, pushes to the repository, merges), software deployments, and team Q&As; and finally, customer support messages, in which developers help new or existing users of the system perform certain tasks, identify bugs, and troubleshoot errors.
The last purpose is represented by Community support messages, in which developers participate in communities of practice or special interest groups. These messages are characterized by developers aiming to keep up with specific frameworks/communities, to learn about new tools and frameworks for developing applications, or to brainstorm ideas with other people in the community.
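The three purposes and their categories form a two-level hierarchy that can be written down as a small lookup table, for example to roll message-level category labels up to purposes. A minimal sketch, assuming the category spellings used in Table 1 (the labels in the CSV file may be spelled differently); `PURPOSES` and `purpose_of` are hypothetical names.

```python
from typing import Dict, Tuple

# Purpose -> categories, following the hierarchy described above (after Lin et al. [6]).
# Category spellings follow Table 1 and are an assumption about the CSV labels.
PURPOSES: Dict[str, Tuple[str, ...]] = {
    "Personal benefits": (
        "Discovery and news",
        "Networking and social activities",
        "Fun",
    ),
    "Team-wide": (
        "Communication",
        "Team Collaboration",
        "Dev-Ops",
        "Customer support",
    ),
    "Community support": (
        "Participation in Communities of Practice",
    ),
}

# Inverted index: category label -> purpose.
CATEGORY_TO_PURPOSE: Dict[str, str] = {
    category: purpose
    for purpose, categories in PURPOSES.items()
    for category in categories
}


def purpose_of(category: str) -> str:
    """Return the top-level purpose for a category label, ignoring case and padding."""
    wanted = category.strip().casefold()
    for known, purpose in CATEGORY_TO_PURPOSE.items():
        if known.casefold() == wanted:
            return purpose
    raise KeyError(f"unknown category: {category!r}")
```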
Table 1 shows the number of messages per category in GitterCom for each of the 10 open source systems/communities we considered, while Figure 1 shows the overall distribution of messages across categories for all the communities in GitterCom.
Based on the hierarchy presented above, the majority of the messages in GitterCom serve Team-wide purposes. Figure 1 shows that the distribution of messages varies significantly across categories. In particular, 96.69% of the messages are meant to support activities directly associated with the development of the system (Communication, Team Collaboration, Dev-Ops, and Customer support). In contrast, only 1.53% of the messages relate to community support and engagement with communities of practice, and 1.78% are linked to personal benefits. Moreover, 53% of the messages involve communication between the developers and stakeholders, 28% communicate updates regarding the status of the system, and 14% involve customer support.
Figure 1: Distribution of messages by category (pie chart: Communication 52.75%, Dev-Ops 27.75%, Customer support 14.31%, Team collaboration 1.88%, Communities of practice 1.53%, Discovery and news 0.97%, Fun 0.42%, Networking and social 0.39%)
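The purpose-level shares discussed above can be recomputed from the Overall column of Table 1 by summing category counts under each purpose of the Section 2 hierarchy. A short cross-check sketch; the counts are copied from Table 1, and the dictionary and function names are illustrative.

```python
# Overall column of Table 1 (messages per category across all ten channels).
OVERALL = {
    "Communication": 5275,
    "Customer support": 1431,
    "Dev-Ops": 2775,
    "Discovery and news": 97,
    "Fun": 42,
    "Networking and social activities": 39,
    "Participation in Communities of Practice": 153,
    "Team Collaboration": 188,
}

# Purpose -> categories, following the hierarchy described in Section 2.
PURPOSES = {
    "Team-wide": ["Communication", "Team Collaboration", "Dev-Ops", "Customer support"],
    "Community support": ["Participation in Communities of Practice"],
    "Personal benefits": ["Discovery and news", "Networking and social activities", "Fun"],
}


def category_shares(counts: dict) -> dict:
    """Percentage of all messages in each category, rounded to two decimals."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 2) for category, n in counts.items()}


def purpose_shares(counts: dict) -> dict:
    """Percentage of all messages under each purpose, rounded to two decimals."""
    total = sum(counts.values())
    return {
        purpose: round(100 * sum(counts[c] for c in categories) / total, 2)
        for purpose, categories in PURPOSES.items()
    }


if __name__ == "__main__":
    print(sum(OVERALL.values()))    # 10000
    print(purpose_shares(OVERALL))  # {'Team-wide': 96.69, 'Community support': 1.53, 'Personal benefits': 1.78}
```

The category-level values returned by `category_shares` match the slice labels of Figure 1 (for example, 14.31% for Customer support).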
3 Data Collection
This section describes in detail the data collection and curation procedure we used to create the GitterCom dataset. We first gathered the list of all the Gitter communities listed in Gitter’s Explore interface⁸ on April 1, 2019. We then excluded the channels in which the conversations were not in English, resulting in a list of 139 Gitter communities. Afterwards, using the Gitter API⁹, we extracted all of the messages in the main channels of these communities, from their inception until April 1, 2019. This data collection resulted in a set of 2,939,335 messages across all 139 channels.
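Collecting a channel's full history from its inception means paging backwards through the Gitter REST API. The sketch below is not the authors' extraction script; it is a standard-library illustration of backward pagination. The endpoint path (`/v1/rooms/{roomId}/chatMessages`), the `limit` and `beforeId` query parameters, the 100-message page cap, and bearer-token authentication reflect our understanding of the 2019 Gitter API documentation and should be treated as assumptions; the service may no longer be available in this form. The paging logic is kept separate from HTTP so it can be exercised with a stand-in fetcher.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Assumptions: base URL, endpoint path, `limit`/`beforeId` parameters and the
# 100-message page cap follow the Gitter REST API documentation as of 2019.
API_ROOT = "https://api.gitter.im/v1"
PAGE_SIZE = 100

# A page fetcher takes (room_id, before_id) and returns messages oldest-first.
PageFetcher = Callable[[str, Optional[str]], list]


def http_fetch_page(room_id: str, before_id: Optional[str], token: str) -> list:
    """Fetch up to PAGE_SIZE messages posted before `before_id` (or the newest ones)."""
    params = {"limit": PAGE_SIZE}
    if before_id is not None:
        params["beforeId"] = before_id
    url = f"{API_ROOT}/rooms/{room_id}/chatMessages?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_messages(room_id: str, fetch_page: PageFetcher) -> list:
    """Page backwards from the newest message until the room's history is exhausted."""
    pages = []
    before_id = None
    while True:
        page = fetch_page(room_id, before_id)
        if not page:
            break
        pages.append(page)
        before_id = page[0]["id"]  # oldest message in this page
    # Pages were collected newest-first; flatten them in chronological order.
    return [message for page in reversed(pages) for message in page]
```

Against a live endpoint, `functools.partial(http_fetch_page, token=...)` could serve as the `fetch_page` argument.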
To extract the raw data for GitterCom, we used a custom Python script that uses pycurl to connect to Gitter’s REST API and obtain all the messages and their corresponding metadata. Afterwards, to facilitate the labeling process, we ran a custom Java script to convert the extracted messages from the JSON format provided by Gitter’s API to CSV format. The data collection scripts and instructions on their usage are available in our replication package [7].
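The conversion step can be illustrated as follows. This is an independent Python sketch, not the authors' conversion script. It assumes each message object exposes `id`, `sent`, `text`, and `fromUser.username`, as in Gitter API responses as we understand them, and it leaves the two label columns empty for the annotators.

```python
import csv
import io
import json
import sys
from typing import Iterable, TextIO


def messages_to_csv(channel: str, messages: Iterable[dict], out: TextIO) -> int:
    """Write one seven-field, fully quoted CSV row per message; return the row count."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    rows = 0
    for message in messages:
        writer.writerow([
            channel,                                          # (i) channel/system
            message["id"],                                    # (ii) message ID (assumed key)
            message["sent"],                                  # (iii) timestamp (assumed key)
            message.get("fromUser", {}).get("username", ""),  # (iv) author (assumed key)
            message.get("text", ""),                          # (v) content
            "",                                               # (vi) category, filled in manually
            "",                                               # (vii) subcategory, filled in manually
        ])
        rows += 1
    return rows


def to_csv_string(channel: str, messages: Iterable[dict]) -> str:
    """Convenience wrapper that returns the CSV text instead of writing to a stream."""
    buffer = io.StringIO()
    messages_to_csv(channel, messages, buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical usage: python to_csv.py jhipster < jhipster.json > jhipster.csv
    messages_to_csv(sys.argv[1], json.load(sys.stdin), sys.stdout)
```

Quoting every field matches the "quotes as text delimiter" layout of the released file and keeps commas and line breaks inside messages from splitting records.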
The 139 channels collected as raw data vary in three main ways: by membership (between 100 and 17,000 members per channel); by level of activity (the smallest channel contains 21 messages, whereas the largest contains over 423,000); and by type (a channel can be dedicated to the development of a particular software system, where the developers communicate with each other and with the system’s stakeholders, or to building a community of practice, whose discussion revolves around particular topics, frameworks, or programming languages but does not involve the active development of a system).
While we make all the data we extracted for the 139 channels available for download to other researchers¹⁰, our main goal for GitterCom was to manually curate and label a subset of the messages, based on the purposes/intents identified by Lin et al. [6] (as described in Section 2). We therefore selected the first ten channels that met the following criteria: (i) they are linked to an active GitHub repository, (ii) they are used as a communication tool for the active development of an open source software system, (iii) they cover different application domains, (iv) they have been active in the past year, and (v) they contain at least 1,000 messages. Table 2 shows the details of the selected systems/channels.

⁸ https://gitter.im/explore
⁹ https://developer.gitter.im
¹⁰ https://figshare.com/s/3fd5af0b869b8fd010bb
Table 2: Subset of Gitter communities included in GitterCom
Community | Members | Messages | Application domain
Marionette | 3014 | 181108 | JavaScript framework
jspm | 1103 | 27245 | Package manager
scikit-learn | 3188 | 9844 | Machine learning
Xenko3d | 103 | 2890 | Game engine
FreezingMoon | 109 | 207925 | Video game
UIkit | 2155 | 41265 | Front-end framework
jHipster | 2575 | 39418 | Application generator
Cucumber | 337 | 2030 | Testing framework
Imagej | 209 | 8149 | Image processing
TheHolyWaffle | 196 | 15046 | VoIP communication
From each of the ten selected channels, we then collected the 1,000 most recent consecutive messages up to April 1, 2019, for a total of 10,000 messages. The first two authors then carried out a coding procedure to label these messages, using the categories and subcategories identified by Lin et al. [6] as labels. More specifically, each message was assigned a category describing the main purpose of the message and a subcategory describing the specific activity the message relates to. If a message did not provide any meaningful information by itself (e.g., a single emoji, "ok", "great", ""), it was classified as "Uninformative". After the individual coding, the two authors met, discussed, and resolved any coding conflicts. The messages for which a classification of "Uninformative" was agreed upon were discarded and replaced by an equal number of messages from the same channel. The coding process was then applied to these new messages. This procedure was repeated until 1,000 messages were obtained for each channel, all having a label other than "Uninformative". Across all channels, a total of 1,061 messages were labeled as "Uninformative" during the labeling process.
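The replacement procedure amounts to walking backwards through a channel's history until enough informative messages have been kept. A minimal sketch, assuming replacements are drawn from the next older messages of the same channel (the paper says only "from the same channel"); the `label` callback is a hypothetical stand-in for the two annotators' agreed label.

```python
from typing import Callable, List, Sequence, Tuple

UNINFORMATIVE = "Uninformative"


def label_channel(
    messages_newest_first: Sequence[str],
    label: Callable[[str], str],
    target: int = 1000,
) -> Tuple[List[Tuple[str, str]], int]:
    """Label consecutive recent messages until `target` informative ones are kept.

    Each message labeled Uninformative is dropped and, in effect, replaced by
    the next older message. Returns the kept (message, label) pairs and the
    number of discarded messages.
    """
    kept: List[Tuple[str, str]] = []
    discarded = 0
    for message in messages_newest_first:
        if len(kept) == target:
            break
        agreed = label(message)
        if agreed == UNINFORMATIVE:
            discarded += 1
        else:
            kept.append((message, agreed))
    if len(kept) < target:
        raise ValueError(f"only {len(kept)} informative messages available, need {target}")
    return kept, discarded
```

Labeling one message at a time yields the same final set as the batch-wise replacement described above, provided the replacements come from the next older messages.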
During the coding process, when the content of a message was insufficient to determine a category, we used the list of contributors to the system’s repository as a source of additional information that could give insight into the nature of the message. One example of such ambiguous messages were questions that could be interpreted either as a customer asking about the system (Customer Support) or as a developer of the system asking about a part of the system they are unfamiliar with (Team Q&A). In this case, if a question was asked by a contributor to the system, it was classified as Team Q&A, and as Customer Support otherwise.
The manual coding procedure took the two authors three weeks overall. After completing the manual labeling, we obtained GitterCom, a dataset comprising 10,000 Gitter messages, 1,000 per Gitter channel, classified according to their purpose.
4 Potential Research Applications
Previous studies have investigated the growing use of alternative communication means by developers [5, 6, 10]. The results of these studies show the rise of instant messaging tools and their impact on reshaping team dynamics and the communication landscape in increasingly distributed software development environments. Future studies could use GitterCom to study the relationship between open source development activity and communication trends. In particular, GitterCom enables further research to analyze and understand patterns in developer communications and to address important questions such as: How do software teams use tools like Gitter to communicate among themselves and with other stakeholders? How are team dynamics reflected in team communications? Do developers exchange different types of messages at different times in the software life cycle? Do developers new to a project post different types of messages than more senior developers?

Figure 2: GitterCom sample
GitterCom could also be used as a training dataset for machine learning approaches that automatically classify new developer messages based on their purpose. This could, in turn, be useful for automatically organizing messages into threads or creating summaries of developer conversations based on their purpose, so that developers who were away for a while, or newcomers to a project, could quickly catch up on important conversations they missed.
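As one illustrative baseline for such a classifier (not something the paper builds or evaluates), a multinomial Naive Bayes model over bag-of-words counts can be sketched in plain Python. The tokenizer, class name, and the toy training messages in the usage example are hypothetical; real training pairs would come from the text and category columns of GitterCom.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercased word tokens; deliberately simple for short chat messages."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesPurposeClassifier:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts: Iterable[str], labels: Iterable[str]) -> "NaiveBayesPurposeClassifier":
        self.doc_counts = Counter()              # label -> number of training messages
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def _log_score(self, tokens: List[str], label: str) -> float:
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / self.total_docs)  # log prior
        for token in tokens:
            if token in self.vocab:  # tokens unseen in training carry no evidence
                score += math.log((counts[token] + 1) / denominator)
        return score

    def predict(self, text: str) -> str:
        """Return the label with the highest posterior log-probability."""
        tokens = tokenize(text)
        return max(self.doc_counts, key=lambda label: self._log_score(tokens, label))
```

With as few as 39 examples in the rarest category (Table 1), class imbalance would need attention in practice, for example by reporting per-category recall rather than overall accuracy.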
Another avenue for future work would be to use GitterCom to perform large-scale replications of previous studies that analyzed developer communications in Slack [1, 2, 10] but used much smaller or more restricted datasets (e.g., communications in student projects or in a particular software company). These replications could help corroborate previous findings or uncover new information about how developers communicate through instant messaging tools. One example of work that could benefit from a large-scale replication is the identification of messages that contain rationale for the decisions made by developers throughout the software life cycle [2]. Thus far, work on rationale has been limited to analyzing the chat messages of three student teams working on a multi-project capstone course.
5 Limitations and Future Improvements
Although GitterCom is the largest dataset of curated and manually labeled developer instant messages, it still covers only a small subset of all existing Gitter developer communications. One limitation of GitterCom, therefore, is that the selected projects may not be representative of all open source projects, and the most recent 1,000 messages of a project may not be representative of all the messages exchanged by its developers. Expanding the labeled data in GitterCom to include more messages from more projects would help increase the generalizability of future studies that analyze this dataset. For this purpose, we also release the raw, unlabeled data extracted by our crawler script, containing over 2 million messages from 139 Gitter communities, at https://figshare.com/s/3fd5af0b869b8fd010bb. We hope other researchers will join our effort by selecting more of this raw data to label and contribute to GitterCom.
6 Conclusions
Due to the rapid growth in the adoption of instant messaging tools in open source development communities, there is a strong need to study the nature of this type of communication between developers and its implications for open source software development. However, such analysis is not possible without data to explore. We introduced GitterCom, the largest manually labeled and curated dataset of Gitter developer messages. It comprises 10,000 messages and their corresponding purpose labels from ten open source Gitter channels, corresponding to systems that cover a wide range of application domains. We believe that our dataset provides substantial opportunities for researchers to perform large-scale empirical research and further analysis of developer discussions, communication with stakeholders, and team dynamics in open source systems. We hope that this initial dataset will spur interest in the continued collection and analysis of developer instant messaging communications.
References
[1] Rana Alkadhi, Jan Ole Johanssen, Emitza Guzman, and Bernd Bruegge. 2017. REACT: An Approach for Capturing Rationale in Chat Messages. In Proceedings of the 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM’17). IEEE, Toronto, ON, Canada, 175–180.
[2] R. Alkadhi, T. Lata, E. Guzman, and B. Bruegge. 2017. Rationale in Development Chat Messages: An Exploratory Study. In Proceedings of the 14th IEEE/ACM International Conference on Mining Software Repositories (MSR’17). 436–446.
[3] Preetha Chatterjee, Kostadin Damevski, Lori Pollock, Vinay Augustine, and Nicholas A. Kraft. 2019. Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools. In Proceedings of the 16th IEEE International Conference on Mining Software Repositories (MSR’19). IEEE, Montreal, Canada, 490–501.
[4] Shaiful Alam Chowdhury and Abram Hindle. 2015. Mining StackOverflow to Filter out Off-topic IRC Discussion. In Proceedings of the 12th IEEE Working Conference on Mining Software Repositories (MSR’15). IEEE, Florence, Italy, 422–425.
[5] Verena Käfer, Daniel Graziotin, Ivan Bogicevic, Stefan Wagner, and Jasmin Ramadani. 2018. Communication in Open-Source Projects-End of the E-mail Era? In Proceedings of the 40th IEEE/ACM International Conference on Software Engineering (ICSE’18). IEEE, Gothenburg, Sweden, 242–243.
[6] Bin Lin, Alexey Zagalsky, Margaret-Anne Storey, and Alexander Serebrenik. 2016. Why Developers Are Slacking Off: Understanding How Software Teams Use Slack. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW’16). ACM, 333–336.
[7] Esteban Parra. 2020. GitterCom, dataset. https://figshare.com/s/9b3df36e22a8a8f77169
[8] M. Storey, A. Zagalsky, F. F. Filho, L. Singer, and D. M. German. 2017. How Social and Communication Channels Shape and Challenge a Participatory Culture in Software Development. IEEE Transactions on Software Engineering 43, 2 (Feb. 2017), 185–204.
[9] Margaret-Anne Storey, Leif Singer, Brendan Cleary, Fernando Figueira Filho, and Alexey Zagalsky. 2014. The (R)Evolution of Social Media in Software Engineering. In Proceedings of the 36th ACM/IEEE International Conference on Software Engineering, Future of Software Engineering (FOSE’14). ACM, Hyderabad, India, 100–116.
[10] Viktoria Stray, Nils Brede Moe, and Mehdi Noroozi. 2019. Slack Me if You Can! Using Enterprise Social Networking Tools in Virtual Agile Teams. In Proceedings of the 14th International Conference on Global Software Engineering (ICGSE’19). IEEE, Montreal, Quebec, Canada, 101–111.
Conference Paper
Slack is a modern communication platform for teams that is seeing wide and rapid adoption by software develop-ment teams. Slack not only facilitates team messaging and archiving, but it also supports a wide plethora of inte-grations to external services and bots. We have found that Slack and its integrations (i.e., bots) are playing an increas-ingly significant role in software development, replacing email in some cases and disrupting software development processes. To understand how Slack impacts development team dynamics, we designed an exploratory study to inves-tigate how developers use Slack and how they benefit from it. We find that developers use Slack for personal, team-wide and community-wide purposes. Our research also reveals that developers use and create diverse integrations (called bots) to support their work. This study serves as the first step towards understanding the role of Slack in sup-porting software engineering.
Article
Software developers use many different communication tools and channels in their work. The diversity of these tools has dramatically increased over the past decade and developers now have access to a wide range of socially enabled communication channels and social media to support their activities. The availability of such social tools is leading to a participatory culture of software development, where developers want to engage with, learn from, and co-create software with other developers. However, the interplay of these social channels, as well as the opportunities and challenges they may create when used together within this participatory development culture are not yet well understood. In this paper, we report on a large-scale survey conducted with 1,449 GitHub users. We discuss the channels these developers find essential to their work and gain an understanding of the challenges they face using them. Our findings lay the empirical foundation for providing recommendations to developers and tool designers on how to use and improve tools for software developers.
Communication in Open-Source Projects-End of the E-mail Era
  • Verena Käfer
  • Daniel Graziotin
  • Ivan Bogicevic
  • Stefan Wagner
  • Jasmin Ramadani
Verena Käfer, Daniel Graziotin, Ivan Bogicevic, Stefan Wagner, and Jasmin Ramadani. 2018. Communication in Open-Source Projects-End of the E-mail Era?. In Proceedings of the 40th IEEE/ACM International Conference on Software Engineering(ICSE'18). IEEE, Gothenburg, Sweden, 242-243.