Knowledge Transfer in Modern Code Review
Maria Caulo1, Bin Lin2, Gabriele Bavota2, Giuseppe Scanniello1, Michele Lanza2
1: University of Basilicata, Italy, 2: REVEAL @ Software Institute - Università della Svizzera Italiana, Switzerland
Email:{maria.caulo,giuseppe.scanniello}@unibas.it,{bin.lin,gabriele.bavota,michele.lanza}@usi.ch
ABSTRACT
Knowledge transfer is one of the main goals of modern code review, as shown by several studies that surveyed and interviewed developers. While knowledge transfer is a clear expectation of the code review process, there are no analytical studies using data mined from software repositories to assess the effectiveness of code review in "training" developers and improving their skills over time. We present a mining-based study investigating how and whether the code review process helps developers to improve their contributions to open source projects over time. We analyze 32,062 peer-reviewed pull requests (PRs) made across 4,981 GitHub repositories by 728 developers who created their GitHub account in 2015. We assume that PRs performed in the past by a developer D that have been subject to a code review process have "transferred knowledge" to D. Then, we verify whether over time (i.e., as more and more reviewed PRs are made by D), the quality of the contributions made by D to open source projects increases (as assessed by proxies we defined, such as the acceptance of PRs, or the polarity of the sentiment in the review comments left for the submitted PRs). With the above measures, we were unable to capture a positive impact of the code review process on the quality of developers' contributions. This might be due to several factors, including the choices we made in our experimental design. Additional investigations are needed to confirm or contradict such a negative result.
CCS CONCEPTS
• Software and its engineering → Collaboration in software development; Software libraries and repositories; • Information systems → Sentiment analysis.
KEYWORDS
knowledge transfer, code review, mining software repositories
ACM Reference Format:
Maria Caulo, Bin Lin, Gabriele Bavota, Giuseppe Scanniello, Michele Lanza. 2020. Knowledge Transfer in Modern Code Review. In 28th International Conference on Program Comprehension (ICPC '20), October 5–6, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3387904.3389270
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICPC '20, October 5–6, 2020, Seoul, Republic of Korea
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7958-8/20/05...$15.00
https://doi.org/10.1145/3387904.3389270
1 INTRODUCTION
Code review is the process by which peer developers inspect the code written by a teammate to assess its quality, to recommend changes and, finally, to approve it for merging [3]. Previous works have investigated code review from several perspectives. Some authors studied the factors influencing the likelihood of getting a patch accepted as the result of the code review process [5, 41], while others studied the reviewing habits of developers in specific contexts [34]. Several works focused on the benefits, motivations, and expectations of the review process. Most of these studies are qualitative in nature [2, 6, 33], and were conducted by surveying/interviewing developers or by inspecting their conversations in mailing lists or issue trackers of open source projects. Only a few researchers analyzed data from a quantitative perspective, mostly to assess the impact of code review on code quality (e.g., the relationship between code review and post-release defects) [4, 20, 24, 25].
The work conducted at Microsoft by Bacchelli and Bird [2] provided qualitative evidence of the central role played by code review in knowledge transfer among developers. However, no quantitative, mining-based study has tried to investigate this phenomenon, and in particular to answer the following high-level research question (RQ): Does code review enable knowledge transfer among developers?
Answering this RQ by mining software repositories is far from trivial since: (i) quantitatively measuring knowledge transfer is challenging and an open research problem by itself, and (ii) many confounding factors come into play when collecting developer-related data from online repositories. We quantitatively answer the above research question by making the following assumptions:
The number of reviewed pull requests (PRs) a developer made in the past across all repositories she contributed to is a proxy of the transferred knowledge she benefited from. Given a developer D, we assume that the higher the number of closed PRs (i.e., accepted and rejected ones) that were subject to review (i.e., received comments from peer developers) D performed, the higher the knowledge transfer D benefited from.
We can measure the actual benefits of the knowledge transfer experienced by a developer through the code review process by observing whether, with the increase of the received knowledge transfer, the quality of her contributions to open source projects increases as well. Given the various types of projects involved, it is necessary to adopt contribution quality measures which are independent from project languages and domains. We assume that how code reviewers respond to developers' PRs can reflect the quality of the submitted contribution. We use as proxies for the quality of the contributions provided by D: (i) the percentage of D's PRs that are accepted (expected to increase over time); (ii) the time required to review the changes D contributes (expected to decrease); (iii) the amount of recommendations provided by the reviewers to improve the code D contributes in PRs (expected to decrease); and (iv) through sentiment analysis, the polarity of the sentiment in the discussion of the PRs D submits (expected to become more positive).
Based on these assumptions, we analyzed the contribution history of 728 developers across 4,981 repositories hosted on GitHub. We studied whether the number of reviewed PRs opened in the past by a developer impacts the quality of her contributions over time.
We grouped developers into different sets based on the amount of knowledge transfer they benefited from (low, medium-low, medium-high, high), as assessed by the number of reviewed PRs they performed in the past. Any result achieved with such an experimental design may be due to a simple increase of the developer's experience over time rather than to the knowledge transfer that took place over the reviewed PRs. To control for this, we replicated our analysis by grouping the developers based on the number of commits rather than the number of reviewed PRs they performed in the past (into the four groups listed above). Using our experimental design with the measures mentioned above, we were not able to capture the positive impact played by the code review process on the quality of developers' contributions. Such a negative result might be due to several factors, including the choices we made in our experimental design (see Section 3). For this reason, additional studies are needed to corroborate or contradict our findings.
2 RELATED WORK
Recent works studying PR-based software development [13, 21, 31, 32, 35–37, 39] have focused on the motivations of acceptance or rejection of changes proposed in the form of PRs after the code review process, identifying various influencing factors, such as:
Programming Language: proposed changes in Java are the least easily accepted, whereas for C, Typescript, Scala and Go the opposite happens [32], [36];
Size and Complexity of the PR: the greater the size and complexity of the PR to be reviewed (e.g., the number of commits, or the committed files), the lower the likelihood of acceptance [39], [37], [35], [31], [21];
Addition and Change of files: PRs which propose to add files have an 8% lower chance of acceptance [36]; the same applies to PRs which contain many changed files [31];
Excessive forking: PR acceptance decreases when many forks are present [32];
Tests: contributions including test code are more likely to be merged [39], [13];
Developer's type: if the PR was made by a member of the core team, it has more chances to be accepted as compared to a PR made by an external contributor. The existence of a social connection between the requester, the project, and the reviewer positively influences merge decisions [36], [39], [21];
Experience in making PRs: the higher the percentage of previously merged PRs by a developer, the higher the chances of acceptance [13]. Developers with 20 to 50 months of experience are the most productive in submitting PRs and having them accepted [32]. When a PR is the first made by a developer, the chance of a merge considerably decreases [39], [37], [36], [21];
Number of comments: the more comments have been made in the PR discussion, the lower the chance of acceptance [39], [35].
Bosu et al. [10] investigated which factors lead to high-quality code reviews. To discern whether code review feedback is useful or not, the authors built and verified a classification model, and executed it on 1.5 million review comments from 5 Microsoft projects, finding several factors that affect the usefulness of review feedback: (i) the working period of the reviewer in the company: in the first year she tends to provide more useful comments than afterward; (ii) reviewers from different teams gave slightly more useful comments than reviewers from the same team; (iii) the density of useful comments increases over time; (iv) source code files had a higher density of useful comments than other types of files; and (v) the larger the size of the change (i.e., the number of files involved) that the author would bring to a project, the lower the usefulness of the review comments to such an author, confirming in some sense the results by Weißgerber et al. [41]. Weißgerber et al. studied the email archives of two open source projects to find which factors affect the acceptance of patches. They found that small patches (at most 4 lines changed) have higher chances to get accepted, but the size of a patch does not significantly influence acceptance time. Baysal et al. [5] investigated which factors affect the likelihood of a code change being accepted after code review. They extracted both "ordinary" factors (code quality-related) and non-technical ones, such as organizational (company-related) and personal (developer-related) features, finding that non-technical factors significantly impact the code review outcome.
Company- and developer-related factors of review practices (in open-source projects) have also been qualitatively studied by Rigby et al. [33, 34], who compared, by means of email archives and version control repositories, the two techniques used by developers of the Apache server project: review-then-commit and commit-then-review [33]. Apache reviews turned out to be early and frequent, related to small and complete patches (in line with Weißgerber et al. [41]), and conducted by a small number of developers. Rigby et al. [34] also investigated (i) the mechanisms and behaviours that developers use to find (or ignore) code changes they are competent to review and (ii) how developers interact with one another during the review process.
Research has also been conducted to study how software quality is impacted by code reviews, and how they help to identify defects. Kemerer and Paulk [20] studied the review rate to adopt in order to have effective reviews when removing defects or influencing software quality. The authors studied two datasets from a personal software process (PSP) approach with regression and mixed models. The PSP review rate turned out to be significant for the effectiveness of bug-fixing tasks. Mäntylä et al. [29] classified the issues found by both students and professional developers during code review. They found that 75% of the issues concerned "evolvability" (e.g., limited readability/maintainability of code). Beller et al. [6] confirmed this finding by classifying changes brought to the reviewed code of two open-source software projects. They found a 3:1 ratio between maintainability-related and functional defects. They also found that bug-fixing tasks need fewer changes than others, and the person who conducts the review does not impact the number of required changes. Czerwonka et al. [15] observed that code reviews often do not identify functionality problems. The authors found that code reviews performed by unskilled developers are not effective, highlighting the importance of social aspects in code review.
McIntosh et al. quantitatively studied the relationship between software quality and (i) the amount of changes that have been code reviewed, and (ii) code review participation, i.e., the degree of reviewer involvement in the code review process [24]. The authors studied three projects and found that both aspects are linked to software quality: poorly reviewed code leads to components with up to two post-release defects; low participation, up to five. Bavota and Russo [4] studied the impact of code review on the quality of the committed code. They found that unreviewed commits have twice the chances of introducing bugs as compared to reviewed commits. Also, code committed after a review is more readable than unreviewed code.
Morales et al. [26] studied the effect of code review practices on software design quality. They considered the occurrences of 7 design and implementation anti-patterns and found that the lower the review coverage, the higher the likelihood of observing those anti-patterns in code. Bernart et al. [7, 8] highlighted that continuous code review practices in agile development bring high benefits to a project, such as (i) the reduction of the effort in software engineering practices, (ii) the support of collective ownership, and (iii) improvements in the general understandability of the code.
Recent research work also focused on the content of conversations deriving from the code review activity, the topic of the discussions, and how developers emotionally felt [16, 22, 30]. Li et al. [22] classified review comments according to a custom taxonomy of topics, finding that (i) PRs submitted by inexperienced contributors are likely to have potential problems even if they passed the tests; and (ii) external contributors tend to not follow project conventions in their early contributions. Destefanis et al. [16] analyzed GitHub issue commenters (i.e., those users who only post comments without posting any issues nor proposing changes to repositories) from the effectiveness perspective. The authors found that commenters are less polite and positive, and express a lower level of emotions in their comments than other types of users. Ortu et al. [30] found that GitHub issues with a high level of Anger, Sadness, Arousal and Dominance are less likely to be merged, while high values of Valence and Joy tend to lead to merged issues.
Bacchelli and Bird [2] studied the tool-based code review practices adopted at Microsoft, reporting that even if finding defects remains the main motivation for reviews, they provide additional benefits, such as knowledge transfer, increased team awareness, and the creation of alternative solutions to problems.
2.1 Taking Stock
The relevance of code reviews has been investigated from different perspectives. The effect of code reviews on knowledge transfer has been only marginally studied, let alone from a quantitative perspective, which is the goal of this paper: we use the number of past reviewed PRs submitted by a developer as a proxy for the amount of knowledge transfer she has been subject to. Then, we assess whether, with the increase in received knowledge transfer, the quality of submitted code contributions improves over time. From this perspective, the most similar work is the recent one by Chen et al. [13], in which the authors found that the higher the percentage of previously merged PRs by a developer, the higher the chances of acceptance of new PRs.
Differently from Chen et al. [13], we consider past submitted PRs (both accepted and rejected) that have been actually reviewed (i.e., received at least one comment from peer developers), to get a "reliable" proxy of the amount of knowledge transferred to a developer in the past. Also, besides analyzing the impact of the received knowledge transfer on the likelihood of acceptance of future submitted PRs, we consider many other proxies to assess the quality of the contributions submitted by a developer.
3 STUDY DESIGN
3.1 Hypothesis
Software development is a knowledge-intensive activity [9]. Qualitative research provided evidence that code review plays a pivotal role in knowledge transfer among developers [2]. However, no quantitative evidence exists in support of this claim. In this study, we mine software repositories to quantitatively assess the knowledge transfer happening thanks to code review.
There is no well-established metric to assess the "quantity of knowledge" involved in a given process. Knowledge can be classified as either explicit (which "can be spoken and codified in words, figures or symbols") or tacit (which "is embedded in individuals' minds and is hard to express and communicate to others") [1]. We focus on the tacit knowledge acquired by developers over time, which cannot be easily seen and quantified. More specifically, we investigate whether the experience gained by receiving feedback during code review improves the quality of developers' future contributions to open source projects. Intuitively, one might expect that developers gradually gain knowledge by receiving feedback from their peers, thus improving their skills over time. Therefore, we formulated and studied the following hypothesis:
H. The quality of developers' contributions to software projects will increase with the experience gained from their past reviewed PRs.
3.2 Study Context
The study context consists of 728 developers, 4,981 software repositories they contributed to, and 77,456 closed PRs (among which 32,062 are peer-reviewed).
3.2.1 Developers selection. To run our study, we collected information about GitHub users (from here onward also referred to as developers) who created their account in 2015. This was done to collect at least four years of contribution history for each developer. Since data was collected in September 2019, we can observe four years of contributions even for users who created their GitHub account in December 2015. A four-year time window is long enough to observe enough PRs submitted by developers and, consequently, to study the knowledge transfer over time.
We used the GitHub Search API¹ to retrieve the developers who joined GitHub on the first day of each month in 2015. Since the GitHub Search API only provides up to 1,000 results per search, we collected a total of 12,000 developers who created their account in 2015 (i.e., 1,000 per month). As the next step, we collected all the PRs submitted by these 12,000 developers across all the GitHub repositories they contributed to.
¹ https://developer.github.com/v3/search/
Since the GitHub Search API cannot return over 1,000 PRs for a single developer, to ensure data completeness we excluded nine developers who submitted over 1,000 PRs in the studied time window. This reduced the number of developers to 11,991.
We removed from our dataset developers who submitted too few PRs. This was needed since we want to analyze how the quality of developers' contributions to open source projects changes over time. Having only one or two PRs submitted by a developer would not allow us to perform such an analysis. For this reason, we excluded from our study all developers who submitted fewer than 30 PRs in the considered time period (i.e., 2014-2019). This further filter removed 11,173 developers, leaving 818 developers in total.
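As a rough illustration, the developer selection step could be scripted as in the sketch below. This is only a minimal sketch of our use of the GitHub Search API: it assumes a personal access token is available in the GITHUB_TOKEN environment variable, and it ignores rate limiting, which a complete implementation would need to handle.

```python
import os
import requests

API = "https://api.github.com/search/users"
HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}

def users_created_on(day: str, max_results: int = 1000):
    """Return up to `max_results` logins of users whose account was created on `day`
    (the Search API caps each query at 1,000 results)."""
    logins, page = [], 1
    while len(logins) < max_results:
        resp = requests.get(
            API,
            headers=HEADERS,
            params={"q": f"created:{day} type:user", "per_page": 100, "page": page},
        )
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:
            break
        logins.extend(user["login"] for user in items)
        page += 1
    return logins[:max_results]

# 1,000 users for the first day of each month in 2015.
candidates = []
for month in range(1, 13):
    candidates.extend(users_created_on(f"2015-{month:02d}-01"))
print(len(candidates))  # roughly 12,000 before further filtering
```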
3.2.2 Pull requests collection and filtering. We collected all the "closed" PRs submitted by the 818 subject developers from the day they joined GitHub until the end of September 2019, when we collected the data. This led to a total of 77,456 PRs spanning 9,845 repositories. We only focused on closed PRs to be sure that the PRs underwent a code review process and, thus, were either accepted or rejected instead of still pending. For each PR, we collected the following information:
(1) Creation date: the date on which the PR was submitted.
(2) Acceptance: whether the closed PR was accepted.
(3) Closing date: the date on which the PR was closed.
(4) Source code comments: the comments left by the reviewers that are explicitly linked to parts of the code submitted for review. Comments left by the PR author are excluded.
(5) General comments: all the comments left in the PR discussion by all the developers other than the PR author, excluding source code comments. These comments are generally used to ask for clarifications or to explain why a PR should be accepted/rejected. Source code comments, instead, report explicit action items for the PR author to improve the submitted code. We separate the source code comments and the general comments, as there might be different levels of technical detail in these two categories.
(6) Author: the author of the PR.
(7) Contributors: all the developers who have been involved in the discussion and handling of the PR.
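For illustration, the information collected for each PR can be represented as a simple record; the sketch below is only an assumed shape for the rows of our dataset, not the exact structure used in the replication package.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class PullRequestRecord:
    """One closed PR as collected for the study (assumed field names)."""
    author: str                       # GitHub login of the PR author
    repository: str                   # "owner/name" of the target repository
    created_at: datetime              # (1) creation date
    closed_at: datetime               # (3) closing date
    accepted: bool                    # (2) whether the PR was merged
    source_code_comments: List[str] = field(default_factory=list)  # (4) review comments on the diff
    general_comments: List[str] = field(default_factory=list)      # (5) discussion comments (non-author, non-bot)
    contributors: List[str] = field(default_factory=list)          # (7) developers involved in the discussion

    @property
    def reviewed(self) -> bool:
        """A PR is considered peer-reviewed if it received at least one comment."""
        return bool(self.source_code_comments or self.general_comments)
```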
Since we plan to use the comments related to each PR as one of the variables for our study, i.e., to assess the amount of feedback received by developers as well as to check whether a PR was actually subject to code review (meaning it received at least one comment), we removed general comments posted by bots (this problem does not occur for source code comments). We discriminated whether a comment was left by a bot following the steps below:
(1) We calculated how many general comments each commenter (i.e., entity who posted at least one comment in the considered PRs) left in the PRs and sorted them in descending order. As a result, around 60% of the comments were left by the top-500 commenters, with a long tail of commenters only posting a handful of comments in their history.
(2) For these top-500 commenters, we manually checked their usernames and profile images. If the username contained "bot," or the profile image represented a robot, we then further inspected whether their comments followed a predefined structure, e.g., "Automated fastforward with [GitMate.io] (https://gitmate.io) was successful!", by gitmate-bot. If this was the case, we considered the commenter as a bot.
(3) For the rest of the commenters, we manually checked the GitHub profiles of those whose username contained "bot".
This process led to the identification of 147 bot commenters. The manual identification of the bots was done by the first author, and the final output (i.e., the 147 removed bots) is available in our replication package [12].
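The automatic part of this filtering (ranking commenters and flagging candidate bots by username) could look like the sketch below; the final decision in our process was manual, so the code only illustrates the candidate-selection heuristic, with `comments` assumed to be a list of (commenter login, comment text) pairs.

```python
from collections import Counter

def candidate_bots(comments, top_n=500):
    """comments: iterable of (commenter_login, comment_text) pairs.
    Returns the top commenters whose login suggests they are bots;
    these candidates were then inspected manually."""
    per_commenter = Counter(login for login, _ in comments)
    top_commenters = [login for login, _ in per_commenter.most_common(top_n)]
    # Heuristic: the login contains "bot" (e.g., "gitmate-bot").
    return [login for login in top_commenters if "bot" in login.lower()]

def drop_bot_comments(comments, bots):
    """Remove general comments left by identified bot accounts."""
    bots = set(bots)
    return [(login, text) for login, text in comments if login not in bots]
```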
After this cleaning process, we further excluded 90 developers from our study since they authored fewer than 30 closed PRs (including those which did not receive comments). This led to the final number of 728 developers considered in our study, who authored a total of 77,456 PRs (among which 32,062 PRs received comments).
3.2.3 Project collection. We cloned all the projects² in which the selected developers submitted at least one PR, for a total of 4,981 repositories. To provide a better overview of the collected projects, our replication package [12] also includes basic information (e.g., programming languages, project size) about these repositories.
² This was done since we also used in our analysis the number of commits performed by the studied developers over time. While this information can be collected through the GitHub APIs as well, cloning the repositories simplified data collection.
3.3 Measures
To verify our hypothesis, we use proxies to measure the knowledge transfer experienced by developers through their past reviewed PRs and to assess the quality of developers' contributions over time.
3.3.1 Knowledge measures. We use the number of reviewed PRs a developer contributed (authored) in the past (i.e., before the current PR) as a proxy of the amount of knowledge transferred to her thanks to the code review process. That is, we assume that the more closed and peer-reviewed PRs a developer has, the more knowledge the developer gained. In our study, we consider as peer-reviewed PRs those which received at least one comment by non-bot users. The rationale behind this choice is that if no comments are given by other developers, we assume that the PR was not subject to a formal review process and, thus, it is not interesting for our goals, since no knowledge transfer can happen in that PR. We compute this number for each developer before each of their peer-reviewed PRs. We use this variable to split developers into different groups based on the knowledge transfer they experienced (i.e., low, medium-low, medium-high, and high), and compare the quality of the submitted contributions (as assessed by the proxies described in the following section) among the different groups. This means that the same developer can belong, in different time periods, to different groups (i.e., she starts in the low knowledge transfer group, then moves to medium-low, etc.). The exact process used for data analysis is detailed later on.
To verify whether the quality of the submitted contributions is actually influenced by the knowledge transfer during code review, or whether it is just a result of the increasing developer's experience over time, we also collected the number of commits performed in the past by each developer before submitting each PR. The commits are extracted from all repositories in which the developers submitted at least one PR. As done for the past PRs, we use past commits to split developers into groups and contrast the quality of their contributions over time.
This allows us to see whether potential differences in contribution quality among the groups can be attributed to the code review process put into place in PRs (i.e., these differences are visible when splitting developers based on past reviewed PRs, but not when splitting them based on past commits) or if they are mainly due to changes in experience over time (i.e., the differences can be observed both when splitting by past reviewed PRs and by past commits). When retrieving past commits for developers, there are two issues worth noting: 1) the developer's username on GitHub (as extracted using the GitHub API) might be different from the author name in the Git commit history (as extracted from the Git logs); 2) one developer might use several different identities to author commits. Therefore, we employed the following process to map GitHub accounts to their corresponding identities. For each of the 728 developers included in our study, we first tried to match their GitHub account to the author names in the commits of the repositories they contributed to through PRs. As a result, 360 GitHub usernames could be matched to the commit author names, while no link could be established for the remaining 368 accounts. For the latter, we manually checked their GitHub profile and tried to match their displayed name and email to the author names and emails in the Git logs. If no match was found, we manually inspected the "contributors" page of their corresponding repositories on GitHub to check if the developer had made any commits. If the developer did not appear in the list of contributors, we assumed no commit was made by the developer. Otherwise, we manually browsed the developer's commits to those repositories (which is not possible to retrieve with the GitHub API), and obtained the commit hashes. Then, in the local repository, we checked the commit information linked to each commit hash, such that we could obtain the author names used for commits. As developers might use multiple author names in the commits, we also recorded the other author names associated with the same email addresses they used, and iterated this process with the newly found author names until no new author name emerged. This process was performed by the second author. Through this manual process, we managed to collect the identities of 715 developers, while for the remaining 13 we assume they did not make any commits.
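The automatic part of this mapping (matching a GitHub account against the author names and emails found in the cloned repositories) can be sketched as follows. It parses the Git logs of the cloned repositories and relies on hypothetical inputs (the developer's login, display name, and public email); the remaining ambiguous cases were resolved manually as described above.

```python
import subprocess
from pathlib import Path

def commit_identities(repo_path: Path):
    """Return the set of (author name, author email) pairs found in a cloned repository."""
    out = subprocess.run(
        ["git", "-C", str(repo_path), "log", "--format=%an|%ae"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {tuple(line.split("|", 1)) for line in out.splitlines() if line}

def match_developer(login, display_name, email, repo_paths):
    """First automatic pass: link a GitHub account to commit identities by
    exact login / display-name / email match. Unmatched accounts are left
    for manual inspection."""
    matches = set()
    for repo in repo_paths:
        for name, mail in commit_identities(repo):
            if name in {login, display_name} or (email and mail == email):
                matches.add((name, mail))
    return matches
```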
3.3.2 Contribution quality measures. We assume that one of the major benefits developers receive from knowledge transfer is the improvement of the quality of their contributions (i.e., PRs) over time. While there are a few existing metrics to evaluate code quality (see, e.g., CK metrics [38] and bug count [28]), some limitations hinder their application in our study context: 1) the software repositories involved can be written in different programming languages, making it impossible to set universal thresholds for CK metrics, let alone that not all programming languages are object-oriented; 2) metrics like bug count rely on the assumption that bugs can be identified thanks to the consistent usage of issue tracking systems, which is not always the case. We do not pick repositories of specific languages or programming domains, as we believe knowledge gained from different types of projects can still be beneficial. In our study we adopt contribution quality measures which are independent from the programming language and application domain. For each submitted PR, we use the following contribution quality measures as dependent variables:
General comments received. The number of general comments received from all the developers other than the PR author. We expect that with the increase of past reviewed PRs (i.e., with more knowledge transfer the developer benefited from), fewer discussions will be triggered by the PR, leading to a reduction of general comments.
Source code comments received. The number of source code comments received from all the developers other than the PR author. Similarly to general comments received, we would expect the source code comments received to decrease over time as well.
Acceptance Rate. The rate of past PR acceptance. We expect that the percentage of accepted PRs will increase over time.
Accepted PR closing time. The time (expressed in minutes) between the creation and the closing of the accepted PRs. We expect that the time needed to accept PRs will decrease over time.
Sentiment of source code comments. The sentiment polarity of all source code comments in the PRs. We expect that with the increase of contribution quality more appreciation will be received in the code review. Thus, the sentiment embedded in the comments should be increasingly positive over time.
Sentiment of general comments. The sentiment polarity of all the general comments in the PRs. Similarly to source code comments, we expect general comments to also become more positive over time.
Sentiment analysis. To calculate the sentiment polarity of the comments in the PRs, we adopted SentiStrength-SE [19] and Senti4SD [11]. Both tools are designed to work on software-related datasets. For each PR, we aggregate all comments and feed them into these two sentiment analysis tools. Comments are not considered if 1) they are empty, which is possible for general comments when the reviewer just assigns a status to the PR (e.g., "Approved"); or 2) the text contains special characters other than English letters, numbers, punctuation, or emoticons.
SentiStrength-SE returns a negative sentiment score (from -1 to -5) and a positive sentiment score (from +1 to +5). We summed up the two scores and standardized the result in the following way, as suggested by the original authors:
(1) a new score "-1" is assigned if the sum is lower than -1;
(2) a new score "0" is assigned if the sum is in [-1; 1];
(3) a new score "1" is assigned if the sum is higher than 1.
Senti4SD returns three sentiment polarity categories (i.e., "positive", "negative", or "neutral"), which we standardized to "1", "-1", and "0", respectively.
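The standardization of the two tools' outputs onto a single {-1, 0, 1} scale is straightforward; a minimal sketch, assuming the raw tool outputs are already available, is shown below.

```python
def standardize_sentistrength(negative: int, positive: int) -> int:
    """SentiStrength-SE: negative score in [-5, -1], positive score in [+1, +5].
    The sum is mapped to -1 / 0 / 1 as suggested by the tool's authors."""
    total = negative + positive
    if total < -1:
        return -1
    if total > 1:
        return 1
    return 0

def standardize_senti4sd(label: str) -> int:
    """Senti4SD: categorical output mapped to the same scale."""
    return {"positive": 1, "neutral": 0, "negative": -1}[label]

assert standardize_sentistrength(-4, 1) == -1
assert standardize_sentistrength(-1, 2) == 0
assert standardize_senti4sd("positive") == 1
```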
3.4 Data Analysis
Our hypothesis suggests that developers who benefited from higher knowledge transfer, thanks to the past reviewed PRs they submitted, are also the ones contributing higher quality PRs to the projects. We verify this hypothesis thanks to the data previously extracted: each peer-reviewed PRi submitted by any of the studied developers represents a row in our dataset, reporting (i) the knowledge transfer measures, meaning the number of past reviewed PRs performed by the developer before PRi as well as our control variable, represented by the number of commits she performed in the past (i.e., before PRi); and (ii) the contribution quality measures (i.e., acceptance of PRs, number of general comments, etc.). However, the contribution quality measures cannot be computed only for the current PR. Indeed, this would make our analysis heavily biased by outliers. For example, a developer having a certain level of knowledge transfer may have submitted nine PRs before PRi, having all of them accepted but PRi. Indicating a 90% acceptance rate as a proxy for the quality of her recent contributions would be more representative of the actual facts than reporting a 0% computed by only considering PRi. Therefore, we rely on a fixed sliding window with a length of five PRs to compute the contribution quality measures for each row in our dataset. Instead of reporting the contribution quality measures only for PRi, we compute these measures on the most recent five PRs (including PRi) submitted by PRi's author. There are two exceptions to this process. First, for the measure accepted PR closing time we consider the most recent five accepted PRs. Second, for the sentiment polarity, we only considered the comments in PRi, since there is a guarantee that PRi contains at least one comment. We ignore the history of each developer before she performed at least five PRs. This ensures that there are always five PRs falling into the fixed sliding window.
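A pandas sketch of the sliding-window computation for one of the quality measures (acceptance rate over the five most recent PRs of each author) is given below, in the cross-project flavor (PRs are not split by repository). The column names are assumptions about how the dataset rows are stored, not the exact schema of our replication package.

```python
import pandas as pd

# Assumed columns: author, created_at, reviewed (0/1), accepted (0/1).
prs = pd.read_csv("pull_requests.csv", parse_dates=["created_at"])
prs = prs.sort_values(["author", "created_at"])

# Knowledge measure: number of reviewed PRs the author submitted before PR_i.
prs["past_reviewed_prs"] = prs.groupby("author")["reviewed"].cumsum() - prs["reviewed"]

# Quality measure: acceptance rate over the 5 most recent PRs (including PR_i).
prs["acceptance_rate_w5"] = (
    prs.groupby("author")["accepted"]
       .transform(lambda s: s.rolling(window=5, min_periods=5).mean())
)

# Ignore each author's history before she has at least five PRs in the window.
dataset = prs.dropna(subset=["acceptance_rate_w5"])
```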
Following the above-described process, we created two different datasets, named cross-project scenario and single-project scenario. In the first, we consider all PRs and all commits performed across all repositories to which a developer contributed, assuming that knowledge acquired thanks to the code review process performed on project Px can help developers in submitting better contributions not only to project Px, but also to project Py. While both datasets contain one row for each PR performed by the developer in any repository, they differ in the way we compute the knowledge transfer measures and the contribution quality measures. Given a row in the dataset representing PRi, in the single-project scenario only PRs and commits performed in the past by the developer in the same project PRi belongs to are considered. This means, for example, that a developer who made 50 PRs in the past, only 12 of which belong to the same project as PRi, will get 12 as the number of past reviewed PRs she submitted in the row corresponding to PRi. Differently, in the cross-project scenario, these measures are computed by considering all PRs and commits submitted in any project by PRi's developer (50 in the example).
Once the datasets were created, we split their rows (i.e., contributions representing PRs) based on the knowledge transfer measures of the developer who submitted them. In particular, we extract the first (Q1), second (Q2), and third (Q3) quartile of the distributions for the number of past reviewed PRs submitted and the number of past commits performed by developers. Then, we split the rows into four groups based on the number of past reviewed PRs submitted: low (≤ Q1), medium-low (> Q1 & ≤ Q2), medium-high (> Q2 & ≤ Q3), and high (> Q3). Note that, while a contribution (i.e., a row in our dataset) can only appear in one of these groups, the PRs submitted by a developer can appear in more than one group, since her number of past reviewed PRs submitted increases over time. We perform the same grouping also for the number of past commits. Table 1 lists the value ranges of each "knowledge" measure (the value denoted by n) for each group in both the cross-project and single-project scenarios.
Table 1: Groups for each "knowledge" measure (columns "Single project" and "Cross project" are the two study scenarios)

Knowledge measure     Knowledge group   Single project    Cross project
# past reviewed PRs   low               n ≤ 11            n ≤ 19
                      medium-low        11 < n ≤ 26       19 < n ≤ 46
                      medium-high       26 < n ≤ 64       46 < n ≤ 110
                      high              n > 64            n > 110
# past commits        low               n ≤ 20            n ≤ 52
                      medium-low        20 < n ≤ 67       52 < n ≤ 171
                      medium-high       67 < n ≤ 215      171 < n ≤ 446
                      high              n > 215           n > 446
For example, when considering the single-project scenario and the knowledge measure # past reviewed PRs, all the PRs whose author made up to eleven PRs in the past fall into the low knowledge group.
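Continuing the earlier pandas sketch, the quartile-based grouping could be implemented as follows; the thresholds reported in Table 1 are simply the quartiles of the resulting distributions. This is a sketch under the same assumed column names, not the exact code used in the study.

```python
import numpy as np
import pandas as pd

def knowledge_group(values: pd.Series) -> pd.Series:
    """Assign each row to low / medium-low / medium-high / high based on the
    quartiles (Q1, Q2, Q3) of the given knowledge measure."""
    q1, q2, q3 = values.quantile([0.25, 0.50, 0.75])
    bins = [-np.inf, q1, q2, q3, np.inf]
    labels = ["low", "medium-low", "medium-high", "high"]
    return pd.cut(values, bins=bins, labels=labels)

# e.g., grouping rows by the two knowledge measures:
# dataset["group_prs"] = knowledge_group(dataset["past_reviewed_prs"])
# dataset["group_commits"] = knowledge_group(dataset["past_commits"])
```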
3.4.1 Statistical methods. For both the cross-project and single-project scenarios and each of the experience measures (i.e., # past reviewed PRs, # past commits), we compare via box plots the contribution quality measures in the different knowledge groups. The comparisons are also performed via the Mann-Whitney test [14], with results considered statistically significant at α = 0.05. We use the Mann-Whitney test because it is a robust non-parametric test and we did not know a priori (and could not assume) what kind of data distribution we had [27]. To control the impact of multiple pairwise comparisons (e.g., the "low knowledge group" is compared with all the other three groups), we adjust p-values with Holm's correction [18]. We estimate the magnitude of the differences by using Cliff's Delta (d), a non-parametric effect size measure. We follow well-established guidelines to interpret the effect size: negligible for |d| < 0.148, small for 0.148 ≤ |d| < 0.33, medium for 0.33 ≤ |d| < 0.474, and large for |d| ≥ 0.474 [17].
Note that, before running the above-described analyses, we first remove outliers from the compared data distributions. Given Q1 and Q3 the first and third quartiles of a given distribution, and IQR the interquartile range computed as Q3 - Q1, we remove all values lower than Q1 - (1.5 × IQR) or higher than Q3 + (1.5 × IQR) [40]. This was done for the analyses carried out for (i) the number of general comments received, (ii) the number of source code comments received, and (iii) the accepted PR closing time. This was instead not needed for the percentage of accepted PRs (as it is always between 0 and 1), and for the comment sentiment scores (always between -1 and 1).
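A sketch of the statistical comparison between two knowledge groups (outlier removal, Mann-Whitney test, Holm correction, Cliff's delta) is shown below. It uses SciPy for the test itself, while Holm's correction and Cliff's delta are implemented directly for self-containment; the group variables at the bottom are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def remove_outliers(x):
    """Drop values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x >= q1 - 1.5 * iqr) & (x <= q3 + 1.5 * iqr)]

def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all pairs."""
    a, b = np.asarray(a), np.asarray(b)
    greater = sum((x > b).sum() for x in a)
    less = sum((x < b).sum() for x in a)
    return (greater - less) / (len(a) * len(b))

def holm_correction(p_values):
    """Holm's step-down adjustment of a list of p-values."""
    m = len(p_values)
    order = np.argsort(p_values)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

# Example: compare a quality measure between the "low" and "high" groups.
# low, high = remove_outliers(low_group_values), remove_outliers(high_group_values)
# stat, p = mannwhitneyu(low, high, alternative="two-sided")
# d = cliffs_delta(low, high)
```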
4 RESULTS
The box plots in Figures 1, 2, 3, and 4 show the trends of the dependent variables (i.e., the contribution quality measures), for both the cross-project (left) and single-project (right) scenarios, with respect to the two independent variables (i.e., the knowledge measures). In particular, the top part of each figure reports the results obtained when splitting developers into "knowledge groups" based on the past reviewed PRs they submitted, while the bottom part shows the same results when grouping developers based on the number of past commits they performed. The red dot represents the mean value in each box plot.
In Table 2, we report the results of the Mann-Whitney test and Cliff's Delta for past reviewed PRs in the cross-project scenario. The same analyses are reported in Tables 3 (cross-project) and 4 (single-project) for past commits. Due to lack of space, the tables only report results of comparisons that are (i) statistically significant (i.e., adjusted p-value lower than 0.05), and (ii) have at least a small effect
Figure 1: Acceptance rate for PRs submitted by developers. [Box plots (a)-(d): cross-project (left) and single-project (right) scenarios; knowledge groups by past reviewed PRs submitted (top) and by commits performed in the past (bottom); y-axis: acceptance rate.]
[Figure: box plots of accepted PR closing time, cross-project and single-project scenarios, knowledge groups by past reviewed PRs submitted.]