All content in this area was uploaded by Jürgen Münch on Oct 10, 2016
Onboarding in Open Source Projects
Fabian Fagerholm
Department of Computer Science
University of Helsinki
P.O. Box 68
FI-00014 University of Helsinki
Finland
fabian.fagerholm@helsinki.fi
Alejandro Sanchez Guinea
Department of Computer Science
University of Helsinki
P.O. Box 68
FI-00014 University of Helsinki
Finland
azsanche@cs.helsinki.fi
Jay Borenstein
Department of Computer Science
Stanford University
353 Serra Mall
Stanford, CA 94305 USA
Facebook
1601 Willow Road
Menlo Park, CA 94025 USA
borenstein@cs.stanford.edu
Jürgen Münch
Department of Computer Science
University of Helsinki
P.O. Box 68
FI-00014 University of Helsinki
Finland
juergen.muench@cs.helsinki.fi
Abstract
In today’s world, many companies turn to open source projects as a source for increased
productivity and innovation. A major challenge with managing this kind of development is the
onboarding of new developers into virtual teams which drive such projects. However, there is
little guidance on how to arrange the initiation of new members into such teams and how to
overcome the learning curve. This case study on Open Source Software projects shows that
mentoring can have a significant impact on onboarding new members into virtual software
development teams.
Keywords
onboarding; open source software projects; virtual teams; mentoring; global software
development; distributed software development; case study
Imagine working for a software company that has decided to make one of its projects go open
source. You expect to benefit from an influx of new talent, enabling you to grow your product
beyond the capabilities of a single organization. Your continuous concern will be to involve new
developers in your virtual team, but time is scarce and you risk slowing down progress by adding
more people. In 1975, Fred Brooks observed that adding people to a late software project makes
it later [1] – an observation that has become known as Brooks' Law. To this day, almost 40 years
later, software project managers struggle with questions on how and when to introduce new
people into software projects. Although Brooks acknowledged that the “law” is an “outrageous
simplification”, he observed two main factors that impede the introduction of newcomers: the
ramp-up time it takes for them to learn enough about the technical content to become productive,
and the increased communication overhead that results from more people having to coordinate
work.
THE NEED FOR CONTINUOUS ONBOARDING
The general problem of introducing new people into an existing organization is especially
pronounced in distributed settings with virtual teams. Virtual teams are small, often temporary
groups of knowledge workers, separated by geographical, temporal, and cultural distances which
add to the communication challenges involved in getting them to work effectively [2]. Today,
Open Source Software (OSS) projects are among the most distributed kinds of projects carried
out by humans. They rely exclusively on virtual teams, whose members can be placed on a
continuum ranging from more permanent core developers to more temporary peripheral
developers. In many application domains, engagement with open source is an inevitable part of
the computing business. To gain market share, companies may have no choice but to participate
in an existing open source ecosystem. The possibilities for low-cost innovation, access to large-
scale project capabilities, and opportunities for recruiting proven talent are among the factors
driving companies towards open source. OSS also plays an important role in government IT and
the open source approach is often considered an enabler for technology and knowledge transfer
to developing countries. Simultaneously, open source development presents several challenges to
organizations that are used to a more traditional workforce, with opaque organizational
boundaries, hierarchical management, and relatively long-term employment. The flexibility
provided by virtual teams in open source projects requires rethinking resource management and
integration of new project members.
Onboarding, or organizational socialization, is a process that helps newcomers become
integrated members of their organization. As part of onboarding, new members learn the
knowledge, skills, and behavior they need to succeed and be productive in their work [3].
Onboarding is well explored in the organizational management literature. In software
engineering research, there are some case studies concerning the process of onboarding (e.g. [4,
5]). However, there is little research or advice on onboarding for open source projects with
virtual teams. The literature does not provide evidence-based guidance that would help project
managers successfully involve developers in such scenarios. In this article, we present findings
that complement an earlier study on onboarding in open source projects [6, 7]. We previously
showed the general effect of onboarding support on newcomer activity [6, 7] and the moderating
effect of project characteristics, such as age, number of contributors, and appeal, on the speed of
the onboarding process [7]. This article continues the analysis by examining developer activity
during onboarding more closely, assessing the potential cost of mentoring in terms of lost
productivity, and suggesting guidelines for using mentoring as an onboarding support
mechanism. Rather than focusing on developer retention, we focus on the very initial stage of
integrating newcomers, i.e. climbing the learning curve in virtual teams. This particular concern
is especially relevant for virtual teams in an open source context, where developers join and
leave at a rapid rate, and onboarding is needed on a continuous basis.
ELEMENTS FOR ONBOARDING IN OPEN SOURCE
The precise practices and actions involved in onboarding differ depending on context. In this
case, the context was a large-scale collaboration program with multiple universities and open
source projects (see sidebar). The onboarding procedures consisted of two elements: a co-located
Hackathon event, and mentoring by experienced open source developers. At a three-day
Hackathon event at Facebook’s headquarters in Palo Alto, California, student developer teams
met face to face to start working on their respective OSS projects. A mentor from the project was
assigned to each team. Three days of intensive coding and socialization allowed developers to
get to know each other and their mentors, and provided an immersive introduction to the world
of distributed OSS development. Part of the activities of the Hackathon included familiarization
with the code base, tools, and procedures used in the project.
Mentors were tasked with recommending and detailing tasks, explaining the software
architecture, and assisting in technical development details. With the exception of the
Hackathon, all interactions between mentors and developers took place in the regular channels
used in each OSS project, including mailing lists, discussion forums, blogs, social networks, and
internet relay chat (IRC).
Developers were free to work on any tasks relevant to their projects. Initially, mentors would
typically direct them to small tasks suitable for novices. Developers were expected to gradually
take the initiative and tackle tasks of greater complexity. Most tasks involved programming,
ranging from small bug fixes to complicated new features. Other tasks included writing test
cases, creating new issues in tracking systems when new bugs were found, and improving some
non-functional aspect of the software, taking into account maintainability, performance, and user
experience.
The developers were integrated into each open source project and community through its regular
procedures. They were exposed to the norms and implicit policies of each community. In
addition, they received support from their mentors, from their local and remote team members,
as well as any support provided by their home organizations. In company settings, similar
support structures can realistically be enacted both to enable entry into external open source
projects as well as to enable third parties to enter projects driven by the company itself.
DOES MENTORING HELP?
To assess the impact of mentoring support on developers, we use a compound metric called
activity. This is the sum of basic metrics that can be directly obtained from GitHub, a web-based
hub for software development where most of the development took place. Thus, activity is
defined here as the total number of commits, pull requests, and interactions by the developers
considered.
A commit is a submission of source code changes to a version control system. A pull request is a
notification that a proposed set of commits is available for integration into the main code branch.
The changes can be reviewed and any required modifications discussed before a final
decision is made to integrate or reject the pull request. The number of pull requests reflects the
group work that occurs between developers, since they involve discussions and code reviews in
which more than one developer is involved. Finally, we define an interaction as a single message
posted by a developer in a GitHub discussion forum. Discussions are usually linked to commits
or pull requests. They may concern source code modifications, messages justifying submitted
modifications, or general messages related to the current development of a pull request or
commit. The number of interactions corresponds to the amount of communication and group
work between developers. Each component in the activity metric has an associated time stamp.
This allows us to calculate the activity over a certain period of time for a certain set of
developers.
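The definition above can be illustrated with a minimal Python sketch. It assumes the commits, pull requests, and interactions have already been collected as timestamped records; the `Event` type, the developer names, and the sample data are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from datetime import datetime

# Event kinds counted by the activity metric, as defined in the text.
KINDS = ("commit", "pull_request", "interaction")


@dataclass(frozen=True)
class Event:
    developer: str       # hypothetical developer identifier
    kind: str            # one of KINDS
    timestamp: datetime  # time stamp attached to the event


def activity(events, developers, start, end):
    """Count events by the given developers with start <= timestamp < end."""
    devs = set(developers)
    return sum(
        1
        for e in events
        if e.developer in devs and e.kind in KINDS and start <= e.timestamp < end
    )


# Invented sample data, for illustration only.
events = [
    Event("alice", "commit", datetime(2013, 2, 4, 10, 0)),
    Event("alice", "interaction", datetime(2013, 2, 5, 9, 30)),
    Event("bob", "pull_request", datetime(2013, 2, 6, 14, 0)),
    Event("bob", "commit", datetime(2013, 3, 1, 8, 0)),     # outside the window
    Event("carol", "commit", datetime(2013, 2, 6, 12, 0)),  # not in the developer set
]
week_start = datetime(2013, 2, 4)
week_end = datetime(2013, 2, 11)
print(activity(events, ["alice", "bob"], week_start, week_end))  # prints 3
```

Treating every event as one unit mirrors the unweighted sum described in the text; any weighting of commits against messages would be a different metric.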
We assess whether mentoring positively influences the performance of developers by comparing
the activity of developers receiving mentoring support with the activity of developers that
haven’t received such support. The non-mentored group was randomly selected among
developers who participate in the corresponding OSS project but are not part of the core
development group, since being a core developer implies not only successful project
involvement, but a level of expertise which cannot be expected from a newcomer.
To make a comparison over time, we defined a time-series sampling strategy. We sampled the
weekly activity of each developer for 16 weeks. For the mentored group, we selected the initial
week of the collaboration as the starting point. For the non-mentored group, we defined the
starting point as one week before the first activity found for each developer. The weeks are thus
relative to when the developer began their onboarding process.
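The sampling strategy can be sketched as follows. This is an illustrative reading of the description, not the study's actual tooling: the function names and the sample time stamps are assumptions, and a "week" is taken to be a 7-day window counted from the starting point.

```python
from datetime import datetime, timedelta
from itertools import accumulate

WEEKS = 16  # length of the observation period described in the text


def weekly_activity(timestamps, start):
    """Bin event time stamps into WEEKS consecutive 7-day windows from start."""
    counts = [0] * WEEKS
    for ts in timestamps:
        index = (ts - start).days // 7
        if 0 <= index < WEEKS:
            counts[index] += 1
    return counts


def non_mentored_start(timestamps):
    """Starting point for a non-mentored developer: one week before first activity."""
    return min(timestamps) - timedelta(weeks=1)


# Invented time stamps for one non-mentored developer.
stamps = [datetime(2013, 2, 11), datetime(2013, 2, 12), datetime(2013, 2, 20)]
start = non_mentored_start(stamps)
counts = weekly_activity(stamps, start)
cumulative = list(accumulate(counts))  # accumulated activity per relative week
print(counts[:3], cumulative[-1])
```

For a mentored developer, `start` would instead be the first week of the collaboration, so that both groups are compared on weeks relative to the beginning of onboarding.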
Figure 1 shows the accumulated activity over time of developers with onboarding support
compared to the non-supported developers. Initially, the progression of both groups is similar.
However, as the onboarding process unfolds, activity among supported developers increases
significantly compared to the activity registered by non-mentored developers. The level of
activity in the supported group then continues to rise in a more or less constant fashion
throughout the whole time period.
Fig. 1. Accumulated activity over time of mentored and non-mentored developers. Mentored developers
perform significantly higher than non-mentored developers. Performance bumps and plateaus likely reflect a
human learning effect. The linear regression trendlines display a high goodness of fit (R²) and illustrate the
difference in activity over time more clearly.
[Figure 1 plot: Activity (y-axis, 0 to 600) versus Weeks (x-axis, 1 to 16); series: Mentored, Non-Mentored. Trendline labels shown in the plot: y = 37.241x - 23.55 (R² = 0.922) and y = 18.165x - 41.275 (R² = 0.9196).]
The bumps that appear in Figure 1 likely reflect the nature of human learning. When
encountering a previously unknown task, we expect that developers engage in learning activities
such as gathering and interpreting information to clarify the task and understand the software
they are about to modify. Once they have accumulated enough knowledge and have reached a
satisfactory level of understanding, they can begin to perform actual visible work.
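The trendlines and goodness-of-fit values reported with Figure 1 are of the kind produced by an ordinary least-squares line fit. The sketch below shows how such a slope, intercept, and R² could be computed for a cumulative activity series; the data points are invented for illustration and are not the study's measurements.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented cumulative activity with plateaus and bumps (not the study's data).
weeks = list(range(1, 17))
cumulative = [2, 5, 5, 20, 38, 40, 41, 70, 95, 99, 130, 160, 162, 190, 230, 260]
slope, intercept, r2 = linear_fit(weeks, cumulative)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r2:.4f}")
```

A high R² on a cumulative series mainly says that growth was roughly steady; the plateaus and bumps discussed above show up as residuals around the line.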
The supported developers surpass the performance of the ones without support. We assume that
the positive results with increased activity are related to the support given by the mentor
throughout the duration of the program. Thus, the effect of mentoring is reflected as an increase
in the activity and participation of a developer in an OSS project. For instance, if a mentor
provides suggestions for suitable tasks that a developer could perform, this could have a positive
impact on the number of commits by making developers focus on appropriate activities and
reduce time wasted on other activities. Similarly, the amount of collaborative activities between
developers could be affected by having the mentor mediate discussions and collaborative work.
In our interpretation, the higher degree of activity among supported developers is, in part, a
consequence of the mentoring support provided to the team.
THE RELEVANCE OF INTERACTION
The aggregation of the activity metric can hide relevant patterns which would be visible with
more refined metrics. Such patterns can help understand the way in which developers contribute
to their projects. For this reason, we analyze each component of activity separately.
Figure 2 shows the time series for each of the components of the activity metric for the
developers that received onboarding support. The peaks reflect the learning effect shown in
Figure 1. The figure shows that the largest portion of the activity metric stems from interactions
with other developers, which seems natural for a group of newcomers who are still learning the
processes and practices of a distributed project. Also, the interactions themselves may be
valuable, as they may contain important information that spurs new ideas and innovation in the
projects.
Since open source development relies on trust relationships built and maintained by developers
themselves, it is important to prepare newcomers to interact according to the project culture. The
same applies to other kinds of distributed projects with virtual teams. Project managers should
emphasize the importance of informal communication, as it may increase the chances that new
developers build trust and gain deeper access to important project members more quickly.
Fig. 2. Detailed activity patterns of newcomers during the onboarding process. The amount of interactions
reflects the coordination required to accomplish actual technical tasks.
[Figure 2 plot: Magnitude (y-axis, 0 to 140) versus Weeks (x-axis, 1 to 16); series: Commits, Pulls, Interactions.]
DOES MENTORING PAY OFF?
It appears to be highly beneficial to utilize experienced developers to help newcomers become
successfully involved in OSS projects. However, a potential problem may arise: the mentor
needs to be engaged in support activities for some time, during which regular duties may suffer.
It is important to consider the impact of the mentoring task on the performance of the mentor.
To gain a coarse-grained understanding of the amount of work involved in mentoring besides
other duties, we observed mentors’ own development contributions to their respective OSS
project over time. Figure 3 shows the development contributions by each mentor during a period
of 13 months. The highlighted areas indicate the Open Academy program. Regardless of the
differences in contribution patterns between each mentor, the amount of contributions is reduced
while they are performing mentoring duties. This difference is seen more clearly in Figure 4,
which compares the mentors’ average contribution per week while performing mentoring tasks
(“during mentoring”) and while performing regularly (“regular development”). However, in all
cases, mentors continued to contribute to their projects throughout the program. While the
performance drop can be significant, it can be justified by the fact that it is limited to the
onboarding period and by the increased productivity and innovation that may be gained from
new project members. In practice, this opportunity cost must be
evaluated on a case-by-case basis. In many situations, the cost of mentoring is overshadowed by
the potential benefits of gaining new members, even if they are temporary.
Fig. 3. Mentor contributions over a time span of 13 months. Mentors are visibly less productive while engaged
in mentoring tasks during the Open Academy Program (months three to six).
Fig. 4. Comparison of average mentor contribution per week during mentoring and non-mentoring
periods. Mentoring activities reduce mentor productivity, but contributions do continue even during the
mentoring period.
[Figure 4 plot: Average contribution per week (y-axis, 0 to 40) for Ruby on Rails, Kotlin, Phabricator, and Socket.IO; series: During mentoring, Regular Development.]
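The comparison behind Figure 4 can be sketched as a simple calculation: average a mentor's weekly contributions inside and outside the mentoring period and express the difference as a relative drop. The function name, the weekly counts, and the choice of mentoring weeks below are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean


def mentoring_cost(weekly_contributions, mentoring_weeks):
    """Return (avg during mentoring, avg regular, relative drop) for one mentor.

    weekly_contributions: contribution counts indexed by week (0-based).
    mentoring_weeks: set of week indices that fall in the mentoring period.
    """
    during = [c for w, c in enumerate(weekly_contributions) if w in mentoring_weeks]
    regular = [c for w, c in enumerate(weekly_contributions) if w not in mentoring_weeks]
    avg_during, avg_regular = mean(during), mean(regular)
    drop = (avg_regular - avg_during) / avg_regular if avg_regular else 0.0
    return avg_during, avg_regular, drop


# Invented weekly contribution counts for a single mentor.
contributions = [10, 12, 4, 6, 5, 11, 9, 13]
avg_during, avg_regular, drop = mentoring_cost(contributions, {2, 3, 4})
print(avg_during, avg_regular, f"{drop:.0%}")
```

Such a figure only captures the visible cost in contributions; whether it is outweighed by the newcomers' output remains the case-by-case judgment described above.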
The onboarding process in the Open Academy program resulted in better performance than
what would have been expected from the usual joining procedures in the participating OSS
projects. Our interpretation is that the support given to developers did influence their onboarding
process, allowing them to become more active. The results indicate that mentoring increases the
chance of developers being exposed to, selecting, and performing tasks in a proactive and self-
directed manner in open source projects. We assume that this applies in other kinds of settings
where development is conducted by virtual teams.
The results presented here will inevitably vary depending on the particular context in which
projects take place and depend on validity limitations of the study. Onboarding is a complex
construct with multiple impact and context factors that may limit the applicability of the results.
The analysis presented here cannot distinguish between mentoring and other onboarding support
factors. Other contextual factors may play a role. For example, developers in the group receiving
onboarding support may be more involved and spend more time more regularly with the project.
Also, the Hackathon event may be another factor impacting the results. A further limitation is
that we have not examined whether developers in the non-supported group have received any
treatment during the observed period of time that would impact the results.
An interesting question is whether the effect of onboarding support is permanent after mentoring
activities are removed. Since our focus was on newcomer initiation and not retention, and on
rather temporary virtual teams, we did not examine this issue further. Future work could assess
what is required to increase the likelihood of retaining project members after onboarding in
highly volatile team environments.
In summary, mentoring is a viable support activity for onboarding, helping to rapidly integrate
new members into OSS projects. Directly engaging with an open source project through an
experienced mentor gives better results than having no mentor. Furthermore, improved
onboarding performance seems to justify the cost of mentoring in many cases.
Our results have direct relevance for leading and managing virtual teams in an open source
context. In light of our results, we have the following recommendations for project leaders and
managers.
• Identify core developers who can spend a limited time on intensive mentoring. Provide
direct incentives for mentoring. For example, the opportunity to get help for pending
tasks can be attractive for potential mentors. Clearly limiting the duration of mentoring
reduces the negative effect on the mentor’s performance in other project tasks and can
reduce some of the resistance to participate.
• Organize or sponsor collocated events, such as Hackathons, and use them to kick off the
mentoring period. Face-to-face events can help team members and mentors to focus on
problems which are difficult to overcome in a distributed setting, and can further boost
the success of onboarding new members into virtual teams. Many open source projects
already arrange periodic collocated events and welcome participation by newcomers.
Engaging with these provides direct access to the project community.
• Expect considerable variation in performance increases over time. Assessing the cost and
outcomes of mentoring requires understanding onboarding as a learning process which
does not proceed linearly. Some onboarding activity will not be publicly visible. Engage
directly with mentors and newcomers to gain insight into how onboarding is progressing.
• Adapt the onboarding program to project characteristics and culture and be prepared to
provide different kinds of support to mentors in different kinds of projects. Take the
maturity of the target project and its existing onboarding practices into account.
Low-maturity projects may require more support to instill a productive mentoring culture,
while mature projects may already have an existing culture of integrating new developers
and may be ready for tailoring towards more specific inclusion targets. Consider taking
on an expert with knowledge of open source projects and software engineering pedagogy
to guide the mentors so that their expertise is transferred most effectively.
REFERENCES
1. Frederick P. Brooks, Jr. “The Mythical Man-Month”. Addison-Wesley, 1975.
2. Nader, A. E., Shamsuddin, A., Zahari, T. “Virtual R & D teams in small and medium
enterprises: A literature review”. Scientific Research and Essays, pp. 1575-1590, Vol. 4, No. 13,
2009.
3. Bauer, T. N., Erdogan, B. “Organizational socialization: The effective onboarding of new
employees” in S. Zedeck (Ed.), APA Handbook of industrial and organizational psychology,
Vol. 3, pp. 51-64. Washington, DC, USA, 2011, American Psychological Association.
4. Steinmacher, I., Wiese, I., Chaves, A.P., Gerosa, M.A., “Why do newcomers abandon open
source software projects?,” 6th International Workshop on Cooperative and Human Aspects of
Software Engineering (CHASE), pp. 25-32, 2013.
5. A. Begel and B. Simon, “Novice Software Developers, All Over Again,” in Proceedings of the
Fourth International Workshop on Computing Education Research (ICER), pp. 3-14, New York,
2008, ACM.
6. F. Fagerholm, Johnson, P., Sanchez Guinea, A., Borenstein, J., Münch, J., “Onboarding in
Open Source Software Projects: A Preliminary Analysis,” IEEE 8th International Conference on
Global Software Engineering Workshops (ICGSEW), 2013.
7. Fagerholm, F., Sanchez Guinea, A., Münch, J., Borenstein, J. “The Role of Mentoring and
Project Characteristics for Onboarding in Open Source Software Projects”. Empirical Software
Engineering and Measurement (ESEM), 2014.
8. F. Fagerholm, Oza, N., Münch, J., “A Platform for Teaching Applied Distributed Software
Development: The Ongoing Journey of the Helsinki Software Factory”, 3rd International
Workshop on Collaborative Teaching of Globally Distributed Software Development
(CTGDSD), 2013.
ABOUT THE AUTHORS
Fabian Fagerholm is a doctoral student at the University of Helsinki,
working for the Department of Computer Science in its Software Systems
Engineering Research group. He has driven the design, implementation, and
operation of the department’s Software Factory laboratory for experimental
software engineering research and education. His main interests are human
aspects of software engineering and software development processes.
Fagerholm obtained his Master’s degree from the University of Helsinki.
Contact him at fabian.fagerholm@helsinki.fi.
Alejandro Sanchez Guinea is a research assistant at the University of
Helsinki, working for the Department of Computer Science in its Software
Systems Engineering Research group. His main research interests are
software measurement, software processes improvement, and software
engineering design methodologies. He obtained his Master’s degree from the
Institut Supérieur de l’Aéronautique et de l’Espace. Contact him at
azsanche@cs.helsinki.fi.
Jay Borenstein teaches computer science at Stanford University and also
runs Facebook Open Academy as part of a larger Facebook effort to
modernize education. Jay finds it very fulfilling to help motivated, bright
minds grow and succeed. Contact him at borenstein@cs.stanford.edu or
through www.facebook.com/openacademyprogram.
Jürgen Münch is a professor of software systems engineering at the
University of Helsinki, head of its Software Systems Engineering Research
group, and principal investigator of the experimental R&D laboratory
“Software Factory”. His main interests are quantitative modeling and
analysis of software systems and processes, data analytics, and continuous
experimentation. He received his PhD degree (Dr. rer. nat.) in Computer
Science from the University of Kaiserslautern, Germany. Contact him at
juergen.muench@cs.helsinki.fi.
[sidebar]
GLOBAL OPEN SOURCE PROGRAM
Starting in spring 2013, Stanford University and Facebook, Inc. launched the Open Academy
program (www.facebook.com/openacademyprogram), which involves several open source
projects and a significant number of top universities around the world. The intention of the
program is to improve computer science university curricula through a practical and applied
learning experience.
Table 1. Participating Open Source Projects.
Project Name          Web Site                        Included in This Study?
Freeseer              http://freeseer.github.io/      No
Kotlin                http://kotlin.jetbrains.org/    Yes
MongoDB               http://www.mongodb.org/         No
Mozilla OpenBadges    http://openbadges.org/          No
ReviewBoard           http://www.reviewboard.org/     No
Phabricator           http://phabricator.org/         Yes
PouchDB               http://pouchdb.com/             No
Ruby on Rails         http://rubyonrails.org/         Yes
Socket.IO             http://socket.io/               Yes
The findings we report here are based on results from the 2013 edition of the Open Academy
program, including nine OSS projects, more than a dozen universities, and more than 120
students, referred to in the study as developers. Table 1 shows all projects participating in the
program and indicates four projects which we examined in this study. Students at the University
of Helsinki participated in these four projects through Software Factory, an experimental
research and development laboratory [8], which allowed us to follow the onboarding process in
these projects very closely. The projects represent a range of different project sizes, ages, and
technology types.
The teams participating in the program followed the typical practices of open source
development and can be characterized as virtual teams with members being distributed across
different geographical locations with their local cultures. All the participating open source
projects had a wide temporal distribution, each receiving a constant stream of contributions
around the clock.