Who Should Review My Code?
A File Location-Based Code-Reviewer Recommendation Approach for Modern Code Review
Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Raula Gaikovina Kula,
Norihiro Yoshida, Hajimu Iida, Ken-ichi Matsumoto
Nara Institute of Science and Technology, Osaka University, Nagoya University, Japan
{patanamon-t, chakkrit-t, matumoto}@is.naist.jp, iida@itc.naist.jp, raula-k@ist.osaka-u.ac.jp, yoshida@ertl.jp
Abstract: Software code review is an inspection of a code change by an independent third-party developer in order to identify and fix defects before integration. Effectively performing code review can improve the overall software quality. In recent years, Modern Code Review (MCR), a lightweight and tool-based code inspection, has been widely adopted in both proprietary and open-source software systems. Finding appropriate code-reviewers in MCR is a necessary step of reviewing a code change. However, little is known about the difficulty of finding code-reviewers in distributed software development and its impact on reviewing time. In this paper, we investigate the impact that reviews with a code-reviewer assignment problem have on reviewing time. We find that reviews with a code-reviewer assignment problem take 12 days longer to approve a code change. To help developers find appropriate code-reviewers, we propose REVFINDER, a file location-based code-reviewer recommendation approach. We leverage the similarity of previously reviewed file paths to recommend an appropriate code-reviewer. The intuition is that files located in similar file paths would be managed and reviewed by code-reviewers with similar experience. Through an empirical evaluation on a case study of 42,045 reviews of the Android Open Source Project (AOSP), OpenStack, Qt, and LibreOffice projects, we find that REVFINDER correctly recommended code-reviewers for 79% of reviews within its top-10 recommendations. REVFINDER also recommended the correct code-reviewers with a median rank of 4. The overall ranking of REVFINDER is 3 times better than that of a baseline approach. We believe that REVFINDER could be applied to MCR in order to help developers find appropriate code-reviewers and speed up the overall code review process.
Keywords: Distributed Software Development, Modern Code Review, Code-Reviewer Recommendation
I. INTRODUCTION
Software code review has been an engineering best practice for over 35 years [1]. It is an inspection of a code change by an independent third-party developer to identify and fix defects before integrating the code change into a system. While traditional code review, a formal code review involving in-person meetings, has been shown to improve the overall quality of software products [2–4], the traditional practice is difficult to adopt in globally-distributed software development [5].
Recently, Modern Code Review (MCR) [6], an informal, lightweight, and tool-based code review methodology, has emerged as a widely used practice in both industrial and open-source software. Generally, when a code change, i.e., a patch, is submitted for review, the author invites a set of code-reviewers to review the code change. Then, the code-reviewers discuss the change and suggest fixes. The code change is integrated into the main version control system when one or more code-reviewers approve the change. Rigby et al. [7] find that code reviews are expensive because they require code-reviewers to read, understand, and critique a code change. To effectively assess a code change, an author should find appropriate code-reviewers who have a deep understanding of the related source code, so that they can thoroughly examine the change and find defects [4]. As a large number of code changes must be reviewed before integration, finding appropriate code-reviewers for every code change can be time-consuming and labor-intensive for developers [8].
In this paper, we first set out to better understand how reviews with a code-reviewer assignment problem impact reviewing time. In particular, we investigate (1) the proportion of reviews with a code-reviewer assignment problem; (2) how reviews with a code-reviewer assignment problem impact reviewing time; and (3) whether a code-reviewer recommendation tool is necessary in distributed software development. We manually examine 7,597 comments from 1,461 representative review samples of four open-source software systems to identify the reviews with a discussion of code-reviewer assignment. Our results show that 4%-30% of reviews have a code-reviewer assignment problem. These reviews take 12 days longer to approve a code change, a statistically significant difference. Our findings also indicate that a code-reviewer recommendation tool is necessary in distributed software development to speed up the code review process.
To help developers find appropriate code-reviewers, we propose REVFINDER, a file location-based code-reviewer recommendation approach. We leverage the similarity of previously reviewed file paths to recommend an appropriate code-reviewer. The intuition is that files located in similar file paths would be managed and reviewed by code-reviewers with similar experience. To evaluate REVFINDER, we perform a case study on 42,045 reviews of four open-source software systems, i.e., Android Open Source Project (AOSP), OpenStack, Qt, and LibreOffice. The results show that REVFINDER correctly recommended code-reviewers for 79% of reviews within its top-10 recommendations. REVFINDER is 4 times more accurate than a baseline approach, indicating that leveraging the similarity of previously reviewed file paths can accurately recommend code-reviewers. REVFINDER also recommended the correct code-reviewers with a median rank of 4. The overall ranking of REVFINDER is 3 times better than that of the baseline approach, indicating that REVFINDER provides a better ranking of recommended code-reviewers. Therefore, we believe that REVFINDER can help developers find appropriate code-reviewers and speed up the overall code review process.
The main contributions of this paper are:
• An exploratory study on the impact of code-reviewer assignment on reviewing time.
• REVFINDER, a file location-based code-reviewer recommendation approach, with promising evaluation results, to automatically suggest appropriate code-reviewers for MCR.
• A rich dataset of reviews to encourage future research in the area of code-reviewer recommendation.¹
The remainder of the paper is organized as follows. Section II describes the background of the MCR process and related work. Section III presents an exploratory study of the impact of code-reviewer assignment on reviewing time. Section IV presents our proposed approach, REVFINDER. Section V describes an empirical evaluation of the approach. Section VI presents the results of our empirical evaluation. Section VII discusses the performance and applicability of REVFINDER and addresses the threats to validity. Finally, Section VIII draws our conclusions and outlines future work.
II. BACKGROUND AND RELATED WORK
A. Modern Code Review
Code review is the manual assessment of source code by humans, mainly intended to identify defects and quality problems [9]. However, the traditional code review practice is difficult to adopt in globally-distributed software development [5]. In recent years, Modern Code Review (MCR) has been developed as a tool-based code review practice that is less formal than the traditional one. MCR has become popular and is widely used in both proprietary software (e.g., Google, Cisco, Microsoft) and open-source software (e.g., Android, Qt, LibreOffice) [6]. Below, we briefly describe the Gerrit-based code review system, a prominent tool that has been widely used in previous studies [10–13].
To illustrate the Gerrit-based code review system, we use the example of Android review ID 18767² (see Figure 1). The developers' information is anonymized for privacy reasons. In general, the process has four steps:
1) An author (Smith) creates a change and submits it for review.
2) The author (Smith) invites a set of reviewers (i.e., code-reviewers and verifiers) to review the patch.
3) A code-reviewer (Alex) discusses the change and suggests fixes. A verifier (John) executes tests to ensure that (1) the patch truly fixes the defect or adds the feature that the author claims, and (2) it does not cause a regression of system behavior. The author (Smith) needs to revise and re-submit the change to address the suggestions of the reviewers (Alex and John).
4) The change is integrated into the main repository when it receives a code-review score of +2 (Approved) from a code-reviewer and a verified score of +1 (Verified) from a verifier. The review is then marked as “Merged”. Otherwise, the change is automatically rejected if it receives a code-review score of -2 (Rejected), and the review is marked as “Abandoned”.

Fig. 1: An example of Gerrit code reviews in Android Open Source Project.

¹ http://github.com/patanamon/revfinder
² https://android-review.googlesource.com/#/c/18767/
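As a rough illustration of step 4, the hypothetical helper below maps the review scores described above to a review status. It is a sketch of the rule as stated in this section, not Gerrit's actual submit-rule implementation; the function name, the "Pending" label, and the choice to let a -2 vote override a +2 vote are our assumptions.

```python
from typing import Iterable


def review_status(code_review_scores: Iterable[int], verified_scores: Iterable[int]) -> str:
    """Hypothetical model of step 4 (not Gerrit's real submit-rule engine)."""
    code_review = list(code_review_scores)
    verified = list(verified_scores)
    # Assumption: a -2 (Rejected) vote takes precedence over any +2 vote.
    if -2 in code_review:
        return "Abandoned"
    # A +2 (Approved) code-review score plus a +1 (Verified) score allows integration.
    if 2 in code_review and 1 in verified:
        return "Merged"
    # Hypothetical label for reviews still waiting for the required scores.
    return "Pending"


print(review_status([1, 2], [1]))   # Merged
print(review_status([2, -2], [1]))  # Abandoned
print(review_status([1], []))       # Pending
```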
From the example in Figure 1, we observe that finding appropriate code-reviewers is a tedious task for developers. The author (Smith) has a code-reviewer assignment problem since he cannot find code-reviewers to review his change. To find code-reviewers, the author (Smith) asks other developers in the discussion: “Can you please add appropriate reviewers for this change?” The effort of finding an appropriate code-reviewer can increase the reviewing time and decrease the effectiveness of the MCR process. Therefore, an automatic code-reviewer recommendation tool would help developers reduce their time and effort.
B. Related Work
We discuss the related work with respect to code review in
distributed software development and expert recommendation.
Software Code Review. Understanding software code review practices has received much attention in the last few years. Rigby et al. [14] empirically investigate the activity, participation, review interval, and quality of the code review process in open-source software. They observe that if a code change is not reviewed immediately, it is unlikely to be reviewed. Weißgerber et al. [15] examine the characteristics of patch acceptance and find that smaller code changes are more likely to be accepted. They also observe that some code changes need more than two weeks until being merged. Rigby and Bird [11] also observe that 50% of reviews have a reviewing time of almost 30 days. A recent study by Tsay et al. [16] also finds that some code changes wait 2 months to be merged.
To better understand what influences the code review interval, Jiang et al. [17] empirically investigate the characteristics of accepted code changes and their reviewing time in a Linux case study. They found that reviewing time is impacted by submission time, the number of affected subsystems, the number of code-reviewers, and the developer's experience. Bosu et al. [18] also found that the reviewing time of code changes submitted by core developers is shorter than that of changes submitted by peripheral developers. Pinzger et al. [19] also found that reviewing time on GitHub is influenced by the developer's contribution history, the size of the project, its test coverage, and the project's openness to external contributions. However, none of these studies addresses the difficulty of finding appropriate code-reviewers, nor do they investigate the impact of reviews with a code-reviewer assignment problem on reviewing time. In this paper, we conduct an exploratory study to investigate the impact of reviews with a code-reviewer assignment problem on reviewing time.

TABLE I: A statistical summary of datasets for each studied system.
Statistic | Android | OpenStack | Qt | LibreOffice
Studied Period | 10/2008 - 01/2012 | 07/2011 - 05/2012 | 05/2011 - 05/2012 | 03/2012 - 06/2014
# Selected Reviews | 5,126 | 6,586 | 23,810 | 6,523
# Code-reviewers | 94 | 82 | 202 | 63
# Files | 26,840 | 16,953 | 78,401 | 35,273
Avg. Code-reviewers per Review | 1.06 | 1.44 | 1.07 | 1.01
Avg. Files per Review | 8.26 | 6.04 | 10.64 | 11.14
Expert Recommendation. Finding relevant expertise is a critical need in collaborative software engineering, particularly in geographically distributed development [20]. We briefly address two closely related research areas, i.e., expert recommendation for the bug-fixing process and for the MCR process. To recommend experts for the bug-fixing process, Anvik et al. [21] propose an approach based on machine learning techniques to recommend developers to fix a new bug report. Shokripour et al. [22] propose an approach to recommend developers based on information in bug reports and the history of fixed files. Xia et al. [23] propose a developer recommendation approach using bug report and developer information. Surian et al. [24] propose a developer recommendation approach using developers' collaboration networks. In a recent study, Tian et al. [25] propose an expert recommendation system for Question & Answer communities using topic modeling and collaborative voting scores. In contrast, we focus on the code review process, for which code review systems contain only limited textual information.
To recommend experts for the code review process, Jeong et al. [26] extract features from patch information and build a prediction model using a Bayesian network. Yu et al. [27] use social relationships among developers to recommend code-reviewers for GitHub systems. Balachandran et al. [28] use the line-by-line modification history of source code to recommend code-reviewers in the industrial environment of VMware, in a tool called REVIEWBOT. Our prior study has shown that the performance of REVIEWBOT is limited in other software systems with little or no line-by-line modification history of source code [29]. Therefore, code-reviewer recommendation approaches can be further improved. In this paper, we use the similarity of previously reviewed file paths to recommend code-reviewers.
III. AN EXPLORATORY STUDY OF THE IMPACT OF CODE-REVIEWER ASSIGNMENT ON REVIEWING TIME
In this section, we report the results of our exploratory study on the difficulty of finding code-reviewers in distributed software development and its impact on reviewing time.
(RQ1) How do reviews with a code-reviewer assignment problem impact reviewing time?
Motivation. Little is known about the difficulty of finding code-reviewers in distributed software development and its impact on reviewing time. We suspect that reviews with a code-reviewer assignment problem are likely to require more time and discussion in order to identify an appropriate code-reviewer. Hence, we set out to empirically investigate the impact that reviews with a code-reviewer assignment problem have on reviewing time, compared to reviews without such a problem. In particular, we investigate: (1) the proportion of reviews with a code-reviewer assignment problem; (2) how reviews with a code-reviewer assignment problem impact reviewing time; and (3) whether a code-reviewer recommendation tool is necessary in distributed software development.
Approach. To address RQ1, we first select a representative sample of reviews. We then manually examine their discussions to identify the reviews with a code-reviewer assignment problem. Next, we calculate the reviewing time of the review samples. Finally, we analyze and discuss the results. We describe each step in detail below.
(Step 1) Data Collection: We use the review data of the Android Open Source Project (AOSP), OpenStack, and Qt projects provided by Hamasaki et al. [30]. We also expand the dataset to include the review data of the LibreOffice project using the same collection technique [30]. After collecting reviews, we select them in the following manner: (1) reviews are marked as “Merged” or “Abandoned”; and (2) reviews contain at least one code change, to conform to the purpose of code review practice [13]. Table I shows the statistical summary for each studied system.
Fig. 2: A comparison of the code reviewing time (days) of reviews with and without code-reviewer assignment problem for Android, OpenStack, Qt, and LibreOffice (beanplots; y-axis: reviewing time in days). The horizontal lines indicate the average (median) review time (days).

(Step 2) Representative Sample Selection: To identify the proportion of reviews with a code-reviewer assignment problem, we select a representative sample of reviews for manual analysis, since the full set of reviews is too large to study entirely. To obtain proportion estimates that are within 5% bounds of the actual proportion with a 95% confidence level, we use a sample size

$$s = \frac{z^2 \, p \, (1 - p)}{0.05^2},$$

where p is the proportion that we want to estimate and z = 1.96. Since we did not know the proportion in advance, we use p = 0.5. We further correct for the finite population of reviews P using

$$ss = \frac{s}{1 + \frac{s - 1}{P}}$$

to obtain our sample for manual analysis. Table II shows the numbers of samples for manual analysis: 357 Android reviews, 363 OpenStack reviews, 378 Qt reviews, and 363 LibreOffice reviews. In total, we manually examine 7,597 comments of 1,461 review samples.
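The sample-size computation in Step 2 can be reproduced with a few lines of Python. This is a minimal sketch: the function names are ours, the population P for each system is taken to be its number of selected reviews from Table I, and rounding to the nearest whole review is an assumption, since the text does not say how fractional sizes were rounded.

```python
def sample_size(z: float = 1.96, p: float = 0.5, margin: float = 0.05) -> float:
    """Uncorrected sample size s = z^2 * p * (1 - p) / margin^2."""
    return z ** 2 * p * (1 - p) / margin ** 2


def corrected_sample_size(population: int, z: float = 1.96, p: float = 0.5,
                          margin: float = 0.05) -> int:
    """Finite population correction ss = s / (1 + (s - 1) / P).

    Rounding to the nearest whole review is an assumption.
    """
    s = sample_size(z, p, margin)
    return round(s / (1 + (s - 1) / population))


# Number of selected reviews per system (Table I), used as the population P.
selected_reviews = {"Android": 5126, "OpenStack": 6586, "Qt": 23810, "LibreOffice": 6523}
samples = {name: corrected_sample_size(P) for name, P in selected_reviews.items()}
print(samples)
print(sum(samples.values()))
```

With the defaults, s is 384.16, and under the rounding assumption the sketch yields 357, 363, 378, and 363 reviews (1,461 in total), which matches Table II.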
(Step 3) Review Classification: To determine whether a review has a code-reviewer assignment problem, we manually classify the reviews by examining each of their comments. Since the comments are written in natural language, it is very difficult to analyze them automatically. We define reviews with a code-reviewer assignment problem as reviews whose discussions address the issue of who should review the code change.
(Step 4) Data Analysis: Once we have manually examined the review samples, we calculate their reviewing time. Reviewing time is the time difference between the time that a code change is submitted and the time that it is approved or abandoned. We then compare the distributions of reviewing time between reviews with and without a code-reviewer assignment problem using beanplots [31]. Beanplots are boxplots in which the vertical curves summarize the distributions of different data sets. The horizontal lines indicate the average (median) reviewing time (days). We use the Mann-Whitney U test (α = 0.05), a non-parametric test, to statistically determine the difference between the reviewing time distributions. Since reviewing time can be influenced by patch size [15, 19], we divide the reviewing time by the patch size before performing the statistical test.
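Step 4 can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the review tuples, the patch-size unit, and the helper names are hypothetical, and the hand-written Mann-Whitney U test uses a normal approximation without tie or continuity correction (in practice a statistics package would normally be used). With only three reviews per group, as in the usage, the approximation is rough, so the printed p-value is illustrative only.

```python
import math
from datetime import datetime
from typing import List, Sequence, Tuple


def reviewing_days(submitted: datetime, closed: datetime) -> float:
    """Reviewing time in days: from submission until approval or abandonment."""
    return (closed - submitted).total_seconds() / 86400


def normalized_times(reviews: Sequence[Tuple[datetime, datetime, int]]) -> List[float]:
    """Divide each reviewing time by its patch size (the size unit is an assumption)."""
    return [reviewing_days(sub, done) / size for sub, done, size in reviews]


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return U for x and a two-sided p-value from the normal approximation."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which an x value exceeds a y value; ties count 1/2.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical reviews: (submitted, approved or abandoned, patch size).
with_problem = [
    (datetime(2012, 3, 1), datetime(2012, 3, 19), 40),
    (datetime(2012, 4, 2), datetime(2012, 4, 12), 15),
    (datetime(2012, 5, 7), datetime(2012, 5, 21), 30),
]
without_problem = [
    (datetime(2012, 3, 1), datetime(2012, 3, 1, 20), 35),
    (datetime(2012, 4, 2), datetime(2012, 4, 3), 10),
    (datetime(2012, 5, 7), datetime(2012, 5, 7, 6), 25),
]
u, p = mann_whitney_u(normalized_times(with_problem), normalized_times(without_problem))
print(f"U = {u}, p = {p:.3f}")
```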
Results. 4%-30% of reviews have a code-reviewer assignment problem. The percentage of reviews with a code-reviewer assignment problem is shown in Table II. From our empirical investigation, we found that 10%, 5%, 30%, and 4% of reviews have a code-reviewer assignment problem for Android, OpenStack, Qt, and LibreOffice, respectively. We observe that Qt has the highest proportion of reviews with a code-reviewer assignment problem. This may be partly due to the size of its community and software system (i.e., the number of reviews, code-reviewers, and files). This suggests that the larger the system is, the more difficult it is to find appropriate code-reviewers.
On average, reviews with a code-reviewer assignment problem require 12 days longer to approve code changes. Figure 2 shows that reviews with a code-reviewer assignment problem require 18, 9, 13, and 6 days to reach an integration decision on code changes for Android, OpenStack, Qt, and LibreOffice, respectively. In contrast, reviews without a code-reviewer assignment problem can be integrated within one day. Mann-Whitney U tests confirm that the differences are statistically significant (p-value < 0.001 for Android, OpenStack, and Qt, and p-value < 0.01 for LibreOffice). This finding indicates that reviews with a code-reviewer assignment problem can slow down the code review process in distributed software development.

TABLE II: The numbers of statistically representative samples for each studied project and the percentage of reviews with a code-reviewer assignment problem, with a 95% confidence level and a confidence interval of ±5%.
Statistic | Android | OpenStack | Qt | LibreOffice
# Reviews | 357 | 363 | 378 | 363
Percentage | 10% | 5% | 30% | 4%
A code-reviewer recommendation tool is necessary in distributed software development to speed up the code review process. During our manual examination, we found that finding a code-reviewer is truly a necessary step of the MCR process. For example, a Qt developer said: “You might want to add some approvers to the reviewers list if you want it reviewed/approved.”³ Additionally, developers often ask how to find appropriate code-reviewers. For example, a Qt developer said: “Feel free to add reviewers, I am not sure who needs to review this...”⁴ One of the Android developers also said: “Can you please add appropriate reviewers for this change?”⁵ Therefore, finding an appropriate code-reviewer to review a code change is a tedious task for developers and poses a problem in distributed software development. Moreover, a Qt developer suggested that an author add code-reviewers to speed up the code review process: “for the future, it speeds things up often if you add reviewers for your changes :)”⁶
4%-30% of reviews have a code-reviewer assignment problem. These reviews take 12 days longer to approve a code change, a statistically significant difference. A code-reviewer recommendation tool is necessary in distributed software development to speed up the code review process.
³ Qt-16803: https://codereview.qt-project.org/#/c/16803
⁴ Qt-40477: https://codereview.qt-project.org/#/c/40477
⁵ AOSP-18767: https://android-review.googlesource.com/#/c/18767/
⁶ Qt-14251: https://codereview.qt-project.org/#/c/14251
IV. REVFINDER: A FILE LOCATION-BASED CODE-REVIEWER RECOMMENDATION APPROACH
A. An Overview of REVFINDER
REVFINDER combines the lists of code-reviewers recommended by the Code-Reviewers Ranking Algorithm. REVFINDER aims to recommend code-reviewers who have previously reviewed similar functionality. Therefore, we leverage the similarity of previously reviewed file paths to recommend code-reviewers. The intuition is that files located in similar file paths would be managed and reviewed by code-reviewers with similar experience. REVFINDER has two main parts: the Code-Reviewers Ranking Algorithm and the Combination Technique.
The Code-Reviewers Ranking Algorithm computes code-reviewer scores using the similarity of previously reviewed file paths. To illustrate, we use Figure 3 as a calculation example of the algorithm. Given a new review R3 and two previously closed reviews R1 and R2, the algorithm first calculates a review similarity score for each of the previous reviews (R1, R2) by comparing their file paths with those of the new review R3. This yields two review similarity scores, (R3, R1) and (R3, R2). To compute a review similarity score, we use a state-of-the-art string comparison technique [32], which has been used successfully in computational biology. In this example, the file paths of R3 and R2 share more common keywords (video, src) than those of R3 and R1. We presume that the review similarity score of (R3, R1) is 0.1 and that of (R3, R2) is 0.5. Then, these review similarity scores are propagated to each code-reviewer involved in the corresponding previous review, i.e., Code-Reviewer A earns review similarity scores of 0.5 + 0.1 = 0.6, and Code-Reviewer B earns a review similarity score of 0.1. Finally, the algorithm produces a list of code-reviewers along with their scores. Since there are many well-known variants of string comparison techniques [32], REVFINDER combines the different lists of code-reviewers into a unified list of code-reviewers. By combining them, the truly-relevant code-reviewers are likely to “bubble up” to the top of the combined list, yielding recommendations with fewer false-positive matches.
Below, we explain the calculation of the Code-Reviewers Ranking Algorithm, the String Comparison Techniques, and the Combination Technique.
B. The Code-Reviewers Ranking Algorithm
The pseudo-code of the Code-Reviewers Ranking Algorithm is shown in Algorithm 1. It takes as input a new review (Rn) and produces a list of code-reviewer candidates (C) with code-reviewer scores. The algorithm begins by retrieving the reviews created before Rn as pastReviews and sorting them by their creation date in reverse chronological order (Lines 7 and 8). We note that pastReviews are previously closed reviews that are marked as “Merged” or “Abandoned” and must have been created before Rn. The algorithm calculates a review similarity score between each of the pastReviews and the new review (Rn). Then, the review similarity scores are propagated to the code-reviewers who were involved in each past review (Lines 9 to 24). For each review (Rp) of the pastReviews, the review similarity score (ScoreRp) is the average file path similarity over every pair of file paths (fn, fp) from Rn and Rp, computed with the filePathSimilarity(fn, fp) function (Lines 13 to 19). After calculating the review similarity score, every code-reviewer in Rp is added to the list of code-reviewer candidates (C) with their review similarity score (ScoreRp) (Lines 21 to 23). If a code-reviewer is already in C, ScoreRp is added to that code-reviewer's existing score.

Fig. 3: A calculation example of the Code-Reviewers Ranking Algorithm. The new review R3 changes video/src/a.java and video/src/b.java. In the review history, R2 (video/src/x.java, video/src/y.java) was reviewed by Code-Reviewer A, and R1 (video/resource/a.xml) was reviewed by Code-Reviewers A and B. With ReviewSimilarity(R3, R1) = 0.1 and ReviewSimilarity(R3, R2) = 0.5, the code-reviewer scores are A = 0.1 + 0.5 = 0.6 and B = 0.1.

Algorithm 1 The Code-Reviewers Ranking Algorithm
1: Code-ReviewersRankingAlgorithm
2: Input:
3:    Rn: A new review
4: Output:
5:    C: A list of code-reviewer candidates
6: Method:
7:    pastReviews ← A list of previously closed reviews
8:    pastReviews ← order(pastReviews).by(createdDate)
9:    for Review Rp ∈ pastReviews do
10:      Filesn ← getFiles(Rn)
11:      Filesp ← getFiles(Rp)
12:      # Compute review similarity score between Rn and Rp
13:      ScoreRp ← 0
14:      for fn ∈ Filesn do
15:         for fp ∈ Filesp do
16:            ScoreRp ← ScoreRp + filePathSimilarity(fn, fp)
17:         end for
18:      end for
19:      ScoreRp ← ScoreRp / (length(Filesn) × length(Filesp))
20:      # Propagate review similarity scores to code-reviewers who were involved in a previous review Rp
21:      for Code-Reviewer r : getCodeReviewers(Rp) do
22:         C[r].score ← C[r].score + ScoreRp
23:      end for
24:   end for
25:   return C
To compute the file path similarity value between file fn and file fp, the filePathSimilarity(fn, fp) function is calculated as follows:

$$\mathit{filePathSimilarity}(f_n, f_p) = \frac{\mathit{StringComparison}(f_n, f_p)}{\max\left(\mathit{Length}(f_n), \mathit{Length}(f_p)\right)} \qquad (1)$$

We split each file path into components using a slash character as a delimiter. The StringComparison(fn, fp) function compares the file path components of fn and fp and returns the number of common components that appear in both files. The value of filePathSimilarity(fn, fp) is then normalized by the maximum length of fn and fp, i.e., their number of file path components. The details of the string comparison techniques are presented in the next subsection.

TABLE III: A description of file path comparison techniques and examples of calculation. The examples are obtained from the review history of Android for the LCP, LCSubstr, and LCSubseq techniques, and of Qt for the LCS technique. For each technique, the example files were reviewed by the same code-reviewer.
Function | Description | Example
Longest Common Prefix (LCP) | Longest consecutive path components that appear at the beginning of both file paths | f1 = "src/com/android/settings/LocationSettings.java", f2 = "src/com/android/settings/Utils.java"; LCP(f1, f2) = length([src, com, android, settings]) = 4
Longest Common Suffix (LCS) | Longest consecutive path components that appear at the end of both file paths | f1 = "src/imports/undo/undo.pro", f2 = "tests/auto/undo/undo.pro"; LCS(f1, f2) = length([undo, undo.pro]) = 2
Longest Common Substring (LCSubstr) | Longest consecutive path components that appear in both file paths | f1 = "res/layout/bluetooth_pin_entry.xml", f2 = "tests/res/layout/operator_main.xml"; LCSubstr(f1, f2) = length([res, layout]) = 2
Longest Common Subsequence (LCSubseq) | Longest path components that appear in both file paths in relative order but not necessarily contiguously | f1 = "apps/CtsVerifier/src/com/android/cts/verifier/sensors/MagnetometerTestActivity.java", f2 = "tests/tests/hardware/src/android/hardware/cts/SensorTest.java"; LCSubseq(f1, f2) = length([src, android, cts]) = 3
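To make Algorithm 1 and Equation (1) concrete, the sketch below is one possible Python rendering. It is not the authors' released implementation: the Review record, its field names, the empty-file guard, and the use of LCP as the only StringComparison are assumptions for illustration (REVFINDER itself combines four techniques). The usage reuses the file paths of Fig. 3; the resulting scores differ from the presumed values 0.1 and 0.5 in that example because here they are actually computed with LCP.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Sequence, Tuple

Components = Sequence[str]


@dataclass
class Review:
    """Hypothetical review record; field names are illustrative."""
    review_id: str
    created: datetime
    files: List[str]
    reviewers: List[str]
    status: str = "Merged"  # "Merged" or "Abandoned" once closed


def lcp(a: Components, b: Components) -> int:
    """Longest Common Prefix: number of shared leading path components."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def file_path_similarity(fn: str, fp: str,
                         compare: Callable[[Components, Components], int] = lcp) -> float:
    """Equation (1): StringComparison normalized by the longer path's component count."""
    cn, cp = fn.split("/"), fp.split("/")
    return compare(cn, cp) / max(len(cn), len(cp))


def rank_code_reviewers(rn: Review, history: List[Review],
                        compare: Callable[[Components, Components], int] = lcp
                        ) -> List[Tuple[str, float]]:
    # Lines 7-8: closed reviews created before Rn, newest first.
    past_reviews = sorted(
        (r for r in history
         if r.status in ("Merged", "Abandoned") and r.created < rn.created),
        key=lambda r: r.created, reverse=True)
    scores: Dict[str, float] = defaultdict(float)
    for rp in past_reviews:
        if not rn.files or not rp.files:
            continue  # guard added in this sketch to avoid dividing by zero
        # Lines 13-19: average similarity over all (fn, fp) file pairs.
        total = sum(file_path_similarity(fn, fp, compare)
                    for fn in rn.files for fp in rp.files)
        score_rp = total / (len(rn.files) * len(rp.files))
        # Lines 21-23: add ScoreRp to every code-reviewer of Rp.
        for reviewer in rp.reviewers:
            scores[reviewer] += score_rp
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage with the file paths from Fig. 3 (dates are made up).
r1 = Review("R1", datetime(2012, 1, 1), ["video/resource/a.xml"], ["A", "B"])
r2 = Review("R2", datetime(2012, 1, 2), ["video/src/x.java", "video/src/y.java"], ["A"])
r3 = Review("R3", datetime(2012, 1, 3), ["video/src/a.java", "video/src/b.java"], [])
ranking = rank_code_reviewers(r3, [r1, r2])
print(ranking)  # A scores about 1.0 (2/3 from R2 + 1/3 from R1); B about 0.33
```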
C. String Comparison Techniques
To compute file path similarity score (filePathSimilarity),
we use four state-of-the-art string comparison techniques [32]
i.e., Longest Common Prefix (LCP), Longest Common Suffix
(LCS), Longest Common Substring (LCSubstr), and Longest
Common Subsequence (LCSubseq). Table III presents the
definitions and a calculation example for these techniques.
We briefly explain the rationale of these techniques below.
Longest Common Prefix. Files under the same directory
would have similar or related functionality [33]. LCP calcu-
lates the number of common path components that appears in
both file paths from the beginning to the last. This is the most
simple and efficient way to compare two strings.
Longest Common Suffix. Files having the same name would
have the same functionality [32]. LCS calculates the number
of common path components that appears from the end of both
file paths. This is a simply reverse calculation of LCP.
Longest Common Substring. Since the file path can repre-
sent their functionality [34], the related functionality should
be under the same directory structure. However, their root
directories or filename might not be the same. LCSubstr
calculates the number of path components that appears in both
file path consecutively. The advantage of this technique is that
the common paths can be appeared at any position of file path.
Longest Common Subsequence. Files under a similar directory structure would have similar or related functionality [32]. LCSubseq calculates the number of path components in the longest sequence that appears in both file paths in the same relative order. The advantage of this technique is that the common components do not need to be contiguous.
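The two dynamic-programming routines below are a minimal sketch of LCSubstr and LCSubseq over path components. They assume the same split on "/" and exact component equality, and the function names are hypothetical. Applied to the Table III example, they reproduce LCSubseq = 3. They also show that the longest contiguous run is only one component long, which is why that example illustrates the subsequence rather than the substring.

```python
from typing import List


def components(path: str) -> List[str]:
    """Split a file path into its components, ignoring empty segments."""
    return [part for part in path.split("/") if part]


def lcsubstr(f1: str, f2: str) -> int:
    """Longest Common Substring: longest run of consecutive shared components."""
    a, b = components(f1), components(f2)
    prev = [0] * (len(b) + 1)  # prev[j]: run length ending at a[i-2] and b[j-1]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lcsubseq(f1: str, f2: str) -> int:
    """Longest Common Subsequence: shared components in order, gaps allowed."""
    a, b = components(f1), components(f2)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    F1 = "apps/CtsVerifier/src/com/android/cts/verifier/sensors/MagnetometerTestActivity.java"
    F2 = "tests/tests/hardware/src/android/hardware/cts/SensorTest.java"
    print(lcsubstr(F1, F2), lcsubseq(F1, F2))  # 1 3
```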
D. Combination Technique
A combination of the results of individual techniques has been shown to improve performance in the data mining and software engineering domains [35, 36]. Since we use four variants of string comparison techniques [32], REVFINDER combines the four resulting lists of code-reviewers into a unified list. The truly relevant code-reviewers are therefore likely to “bubble up” to the top of the combined list, yielding recommendations with fewer false positive matches. We use the Borda count [37] as the combination technique. It is a voting technique that simply combines the recommendation lists based on rank. For each code-reviewer candidate c_k, the Borda count method assigns points based on the rank of c_k in each recommendation list: the higher the rank, the higher the score. For example, if the recommendation list R_LCP ranks candidate c_1 first and contains M candidates in total, then c_1 receives a score of M − 1 from R_LCP, while the candidate c_10 (ranked 10th) receives a score of M − 10. Given the set of recommendation lists R = {R_LCP, R_LCS, R_LCSubstr, R_LCSubseq}, the score of a code-reviewer candidate c_k is defined as follows:
$$\mathit{Combination}(c_k) = \sum_{R_i \in R} \bigl( M_i - \mathrm{rank}(c_k \mid R_i) \bigr) \qquad (2)$$
where M_i is the total number of code-reviewer candidates that received a non-zero score in R_i, and rank(c_k | R_i) is the rank of code-reviewer candidate c_k in R_i. Finally, REVFINDER recommends the list of code-reviewer candidates ranked by their Borda scores.
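A minimal Python sketch of the Borda count in Equation 2 follows. It assumes that each input list already holds only the candidates with a non-zero similarity score, so M_i is the list length, and that a candidate missing from a list earns no points from it. The tie-breaking rule (first-seen order) is our assumption, since the text does not specify one. Reviewer names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def borda_combine(ranked_lists: List[List[str]]) -> List[str]:
    """Combine ranked candidate lists with the Borda count of Equation 2."""
    scores: Dict[str, int] = defaultdict(int)
    for ranked in ranked_lists:
        m_i = len(ranked)  # M_i: candidates with a non-zero score in R_i
        for rank, candidate in enumerate(ranked, start=1):
            scores[candidate] += m_i - rank  # M_i - rank(c_k | R_i)
    # Highest Borda score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda c: -scores[c])


if __name__ == "__main__":
    r_lcp = ["alice", "bob", "carol"]
    r_lcs = ["bob", "alice"]
    r_lcsubstr = ["bob", "carol", "alice"]
    r_lcsubseq = ["alice", "bob"]
    print(borda_combine([r_lcp, r_lcs, r_lcsubstr, r_lcsubseq]))
    # ['bob', 'alice', 'carol']  (Borda scores 4, 3, 1)
```

Although alice tops two of the four lists, bob ranks high in all of them and therefore "bubbles up" to first place in the combined list.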
V. EMPIRICAL EVALUATION
We perform an empirical study to evaluate the effectiveness of REVFINDER. First, we describe the goal and the research questions we address. Second, we describe the studied systems. Third, we present the evaluation metrics. Last, we briefly describe the baseline approach used for comparison.
A. Goal and Research Questions
The goal of our empirical study is to evaluate the effectiveness of REVFINDER in terms of the accuracy and the ranking of the correct code-reviewers. The results of REVFINDER are then compared with those of REVIEWBOT [28], which serves as the baseline approach since it is the only existing code-reviewer recommendation approach for MCR.
To achieve our goal, we address the following two research
questions:
(RQ2) Does REVFINDER accurately recommend code-reviewers?
Motivation: We propose REVFINDER to find appropriate code-reviewers based on the similarity of previously reviewed file paths. We aim to evaluate the performance of our approach in terms of accuracy. Better-performing approaches are of great value to practitioners since they allow them to make better-informed decisions.
(RQ3) Does REVFINDER provide a better ranking of recommended code-reviewers?
Motivation: Recommending correct code-reviewers in the top ranks could ease developers' work and avoid disturbing unrelated code-reviewers. The higher an approach ranks the correct code-reviewers, the more effective it is [38]. We set out this research question to evaluate the overall performance of our approach in terms of the ranking of its recommendations.
B. Studied Systems
To evaluate REVFINDER, we use four open-source software systems, i.e., Android, OpenStack, Qt, and LibreOffice. We chose these systems for three main reasons. First, these systems use Gerrit as the tool for their code review process. Second, these systems are large, active, real-world software, which allows us to perform a realistic evaluation of REVFINDER. Third, each system carefully maintains its code review system, which allows us to build oracle datasets to evaluate REVFINDER. Table I shows a statistical summary of the studied systems. Note that we use the same datasets as in RQ1.
Android Open Source Project (AOSP)⁷ is a mobile operating system developed by Google. Qt⁸ is a cross-platform application and UI framework developed by Digia Plc. OpenStack⁹ is a free and open-source cloud computing software platform supported by many well-known companies, e.g., IBM, VMware, and NEC. LibreOffice¹⁰ is a free and open-source office suite.
C. Evaluation Metrics
To evaluate our approach, we use the top-kaccuracy
and the Mean Reciprocal Rank (MRR). These metrics are
commonly used in recommendation systems for software en-
gineering [28, 39, 40]. Since most of reviews have only one
code-reviewer (cf. Table I), other evaluation metrics (e.g. Mean
7https://source.android.com/
8http://qt-project.org/
9http://www.OpenStack.org/
10http://www.libreoffice.org/
Average Precision) that consider all of the correct answer
might not be appropriate for this evaluation.
Top-k accuracy calculates the percentage of reviews for which an approach correctly recommends code-reviewers, out of the total number of reviews. Given a set of reviews R, the top-k accuracy can be calculated using Equation 3. The isCorrect(r, Top-k) function returns 1 if at least one of the top-k recommended code-reviewers actually approved review r, and 0 otherwise. For example, a top-10 accuracy of 75% indicates that for 75% of the reviews, at least one correct code-reviewer was returned in the top 10 results. Following previous studies [28, 39, 40], we set k to 1, 3, 5, and 10.
$$\text{Top-}k\ \text{accuracy}(R) = \frac{\sum_{r \in R} \mathrm{isCorrect}(r, \text{Top-}k)}{|R|} \times 100\% \qquad (3)$$
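A minimal sketch of Equation 3 follows. It assumes that recommendations are stored as a ranked list per review ID and that the actual approvers are stored as a set per review ID. The names and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def is_correct(recommended: List[str], actual: Set[str], k: int) -> int:
    """isCorrect(r, Top-k): 1 if any top-k candidate actually approved r, else 0."""
    return int(any(c in actual for c in recommended[:k]))


def top_k_accuracy(recommendations: Dict[str, List[str]],
                   actual: Dict[str, Set[str]], k: int) -> float:
    """Equation 3: share of reviews with a correct top-k recommendation, in %."""
    hits = sum(is_correct(recommendations.get(r, []), actual[r], k) for r in actual)
    return hits / len(actual) * 100.0


if __name__ == "__main__":
    recs = {"r1": ["a", "b", "c"], "r2": ["d", "e"], "r3": ["f"]}
    truth = {"r1": {"c"}, "r2": {"x"}, "r3": {"f"}}
    for k in (1, 3):
        print(k, round(top_k_accuracy(recs, truth, k), 1))
    # 1 33.3
    # 3 66.7
```

Widening k from 1 to 3 turns review r1 into a hit because its approver sits at rank 3.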
Mean Reciprocal Rank (MRR) calculates the average of the reciprocal ranks of the correct code-reviewers in a recommendation list. Given a set of reviews R, MRR can be calculated using Equation 4. The function rank(candidates(r)) returns the rank of the first actual code-reviewer in the recommendation list candidates(r). If no actual code-reviewer appears in the recommendation list, the value of 1/rank(candidates(r)) is set to 0. An approach that provides a perfect ranking achieves an MRR value of 1.
$$\mathrm{MRR} = \frac{1}{|R|} \sum_{r \in R} \frac{1}{\mathrm{rank}(candidates(r))} \qquad (4)$$
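A minimal sketch of Equation 4 follows, using the same hypothetical data layout as a ranked list per review and a set of actual approvers. A review with no correct candidate contributes 0, as described above.

```python
from typing import Dict, List, Set


def reciprocal_rank(recommended: List[str], actual: Set[str]) -> float:
    """1 / rank of the first actual code-reviewer; 0 if none is recommended."""
    for rank, candidate in enumerate(recommended, start=1):
        if candidate in actual:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(recommendations: Dict[str, List[str]],
                         actual: Dict[str, Set[str]]) -> float:
    """Equation 4: average reciprocal rank over all reviews."""
    total = sum(reciprocal_rank(recommendations.get(r, []), actual[r]) for r in actual)
    return total / len(actual)


if __name__ == "__main__":
    recs = {"r1": ["a", "b", "c"], "r2": ["d", "e"], "r3": ["f"]}
    truth = {"r1": {"c"}, "r2": {"x"}, "r3": {"f"}}
    print(round(mean_reciprocal_rank(recs, truth), 3))  # (1/3 + 0 + 1) / 3 = 0.444
```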
D. REVIEWBOT: A Baseline Approach
We re-implement REVIEWBOT [28] as our baseline. REVIEWBOT is a code-reviewer recommendation approach based on the assumption that “the most appropriate reviewers for a code review are those who previously modified or previously reviewed the sections of code which are included in the current review” [28, p.932]. Thus, REVIEWBOT finds code-reviewers using the line-by-line modification history of the source code. The calculation of REVIEWBOT can be summarized as follows. Given a new review, 1) it computes the line change history, i.e., a list of past reviews that relate to the same lines changed in the new review. 2) The code-reviewers in the line change history become code-reviewer candidates for the new review; each candidate receives points based on her frequency of reviews in the line change history. 3) The candidates who reviewed most recently and have the highest scores are recommended as appropriate code-reviewers. To conserve space, we refer the reader to [28] for a full description of REVIEWBOT.
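A simplified Python sketch of the three REVIEWBOT steps above follows. It rests on three assumptions: a line is identified by a (file path, line number) pair, a reviewer earns one point per past review they took part in, and "reviewed most recently" is interpreted as a tie-break on the latest review timestamp. The PastReview type and the reviewbot_like function are hypothetical. This is not Balachandran's implementation, and [28] should be consulted for its exact scoring.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Line = Tuple[str, int]  # assumed line identity: (file path, line number)


@dataclass
class PastReview:
    timestamp: int  # hypothetical: larger means more recent
    reviewers: Set[str]
    changed_lines: Set[Line] = field(default_factory=set)


def reviewbot_like(new_lines: Set[Line], history: List[PastReview]) -> List[str]:
    # 1) Line change history: past reviews that touched any line changed now.
    line_history = [r for r in history if r.changed_lines & new_lines]
    # 2) Every reviewer in that history earns one point per review.
    points: Dict[str, int] = defaultdict(int)
    last_seen: Dict[str, int] = {}
    for review in line_history:
        for reviewer in review.reviewers:
            points[reviewer] += 1
            last_seen[reviewer] = max(last_seen.get(reviewer, review.timestamp),
                                      review.timestamp)
    # 3) Highest score first; ties go to the reviewer who reviewed most recently.
    return sorted(points, key=lambda c: (-points[c], -last_seen[c]))


if __name__ == "__main__":
    history = [
        PastReview(1, {"alice"}, {("a.c", 10)}),
        PastReview(2, {"bob"}, {("a.c", 10), ("a.c", 11)}),
        PastReview(3, {"carol"}, {("b.c", 5)}),
        PastReview(4, {"alice", "dave"}, {("a.c", 11)}),
    ]
    print(reviewbot_like({("a.c", 10), ("a.c", 11)}, history))
    # ['alice', 'dave', 'bob']  (carol only touched b.c)
```

Because candidates come only from reviews that touched the exact same lines, a reviewer of a neighbouring line, such as carol here, is never suggested. This sparsity is what the Discussion returns to.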
VI. RESULTS
In this section, we present the results of our empirical evaluation with respect to our two research questions. For each research question, we present its approach and results.
TABLE IV: The top-k accuracy of our approach, RevFinder, and the baseline, ReviewBot, for each studied system. The results show that RevFinder outperforms ReviewBot.
System        REVFINDER (Top-1 / Top-3 / Top-5 / Top-10)    REVIEWBOT (Top-1 / Top-3 / Top-5 / Top-10)
Android       46% / 71% / 79% / 86%                          21% / 29% / 29% / 29%
OpenStack     38% / 66% / 77% / 87%                          23% / 35% / 39% / 41%
Qt            20% / 34% / 41% / 69%                          19% / 26% / 27% / 28%
LibreOffice   24% / 47% / 59% / 74%                          6% / 9% / 9% / 10%
[Figure 4: one panel per studied system (e.g., OpenStack, Qt); x-axis: Approach (RevFinder, ReviewBot); y-axis: Rank]
Fig. 4: A rank distribution of the first correct code-reviewers recommended by RevFinder and ReviewBot. The results show that RevFinder provides a better ranking of recommended code-reviewers.
(RQ2) Does REVFINDER accurately recommend code-reviewers?
Approach. Since REVFINDER leverages the review history to recommend code-reviewers, we perform an experiment based on a realistic scenario. To address RQ2, for each studied system, we execute REVFINDER for every review in chronological order to obtain the lists of recommended code-reviewers. To evaluate how accurately REVFINDER recommends code-reviewers, we compute the top-k accuracy for each studied system. We also compare the results of our approach with those of REVIEWBOT.
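A minimal sketch of the chronological evaluation loop follows. It assumes that each review carries a timestamp and the set of code-reviewers who actually approved it. The names replay, Review, and most_frequent_reviewer are hypothetical. The key property is that each recommendation may only use the reviews that precede it, so no future information leaks into a recommendation. The stand-in recommender exists only to make the loop runnable and is not REVFINDER.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Review:
    timestamp: int
    files: List[str]
    reviewers: Set[str]  # code-reviewers who actually approved the change


Recommender = Callable[[List[Review], Review], List[str]]


def replay(reviews: List[Review], recommend: Recommender, k: int) -> float:
    """Top-k accuracy (%) when each review only sees the reviews sorted before it."""
    ordered = sorted(reviews, key=lambda r: r.timestamp)
    hits = 0
    for i, review in enumerate(ordered):
        history = ordered[:i]  # past reviews only
        top_k = recommend(history, review)[:k]
        hits += any(c in review.reviewers for c in top_k)
    return hits / len(ordered) * 100.0


def most_frequent_reviewer(history: List[Review], review: Review) -> List[str]:
    """Stand-in recommender: past reviewers ordered by how often they reviewed."""
    counts = Counter(c for past in history for c in past.reviewers)
    return [c for c, _ in counts.most_common()]


if __name__ == "__main__":
    reviews = [
        Review(3, ["c.py"], {"bob"}),
        Review(1, ["a.py"], {"alice"}),
        Review(2, ["b.py"], {"alice"}),
    ]
    print(round(replay(reviews, most_frequent_reviewer, k=1), 1))  # 33.3
```

The first review in time always gets an empty history, so any history-based recommender misses it. This is one reason why a chronological replay is stricter than a random train/test split.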
Result. On average, REVFINDER correctly recommended code-reviewers for 79% of reviews with a top-10 recommendation. Table IV presents the top-1, top-3, top-5, and top-10 accuracy of REVFINDER and REVIEWBOT for each studied system. REVFINDER achieves a top-10 accuracy of 86%, 87%, 69%, and 74% for Android, OpenStack, Qt, and LibreOffice, respectively. This indicates that leveraging the similarity of previously reviewed file paths can accurately recommend code-reviewers.
On average, REVFINDER is 4 times more accurate than REVIEWBOT. Table IV shows that, for every studied system, REVFINDER achieves a higher top-k accuracy than REVIEWBOT. The top-10 accuracy values of REVFINDER are 2.9, 2.1, 2.5, and 7.4 times higher than those of REVIEWBOT for Android, OpenStack, Qt, and LibreOffice, respectively. We also find similar results for the other top-k accuracy metrics, indicating that REVFINDER considerably outperforms REVIEWBOT.
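As a quick arithmetic check, the ratios and averages can be recomputed from the rounded values in Table IV. This is only an approximation of the authors' calculation, which may have used unrounded accuracies.

```python
# Top-10 accuracy (%) from Table IV: (REVFINDER, REVIEWBOT).
# Table IV is rounded, so these ratios only approximate the reported ones.
top10 = {
    "Android": (86, 29),
    "OpenStack": (87, 41),
    "Qt": (69, 28),
    "LibreOffice": (74, 10),
}

ratios = {system: rf / rb for system, (rf, rb) in top10.items()}
mean_ratio = sum(ratios.values()) / len(ratios)
mean_top10 = sum(rf for rf, _ in top10.values()) / len(top10)

for system, ratio in ratios.items():
    print(f"{system}: {ratio:.2f}x")
print(f"mean ratio: {mean_ratio:.2f}x")
print(f"mean REVFINDER top-10 accuracy: {mean_top10:.1f}%")
```

This yields ratios of about 2.97, 2.12, 2.46, and 7.40, with a mean of about 3.7 (reported as 4 times), and a mean REVFINDER top-10 accuracy of 79.0%. The small gaps from the reported 2.9 and 2.5 are consistent with rounding in Table IV.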
REVFINDER correctly recommended code-reviewers for 79% of reviews with a top-10 recommendation. REVFINDER is 4 times more accurate than REVIEWBOT. This indicates that leveraging the similarity of previously reviewed file paths can accurately recommend code-reviewers.
TABLE V: The Mean Reciprocal Rank (MRR) of our approach, RevFinder, and the baseline, ReviewBot. An MRR value of 1 indicates a perfect recommendation approach.
Approach     Android   OpenStack   Qt     LibreOffice
REVFINDER    0.60      0.55        0.31   0.40
REVIEWBOT    0.25      0.30        0.22   0.07
(RQ3) Does REVFINDER provide a better ranking of recommended code-reviewers?
Approach. To address RQ3, we present the distribution of the ranks of the correct code-reviewers. We also use the Mean Reciprocal Rank (MRR) to represent the overall ranking performance of REVFINDER. The results are then compared with those of REVIEWBOT.
Result. REVFINDER recommended the correct code-reviewers with a median rank of 4. Figure 4 shows that the ranks of the correct code-reviewers recommended by REVFINDER are lower than those of REVIEWBOT for all studied systems. The median correct ranks of REVFINDER are 2, 3, 8, and 4 for Android, OpenStack, Qt, and LibreOffice, respectively. In contrast, the median correct ranks of REVIEWBOT are 94, 82, 202, and 63 for Android, OpenStack, Qt, and LibreOffice, respectively. This indicates that REVFINDER provides a higher chance of inviting a correct code-reviewer and a lower chance of disturbing unrelated code-reviewers.
The overall ranking of REVFINDER is 3 times better than that of REVIEWBOT. Table V shows the MRR values of REVFINDER and REVIEWBOT for each studied system. For Android, OpenStack, Qt, and LibreOffice, the MRR values of REVFINDER are 2.4, 1.8, 1.4, and 5.7 times higher than those of REVIEWBOT, respectively. This indicates that REVFINDER recommends the first correct code-reviewer at a lower rank than REVIEWBOT does.
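The MRR ratios can be recomputed directly from Table V as a sanity check. The dictionary layout below is only an illustrative choice.

```python
# MRR values from Table V: (REVFINDER, REVIEWBOT).
mrr = {
    "Android": (0.60, 0.25),
    "OpenStack": (0.55, 0.30),
    "Qt": (0.31, 0.22),
    "LibreOffice": (0.40, 0.07),
}

mrr_ratios = {system: rf / rb for system, (rf, rb) in mrr.items()}
mean_mrr_ratio = sum(mrr_ratios.values()) / len(mrr_ratios)

for system, ratio in mrr_ratios.items():
    print(f"{system}: {ratio:.1f}x")
print(f"mean: {mean_mrr_ratio:.2f}x")
```

The per-system ratios match the reported 2.4, 1.8, 1.4, and 5.7. Their mean is about 2.84, which the text rounds to "3 times better".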
REVFINDER recommended the correct code-reviewers with a median rank of 4. The code-reviewer ranking of REVFINDER is 3 times better than that of REVIEWBOT, indicating that REVFINDER provides a better ranking of correct code-reviewers.
VII. DISCUSSION
We discuss the performance and applicability of REVFINDER. We also discuss the threats to the validity of our study.
Performance: Why does REVFINDER outperform REVIEWBOT?
The results of our empirical evaluation show that the proposed approach, REVFINDER, outperforms the baseline approach, REVIEWBOT. The two approaches differ in the granularity of the code review history they use: REVFINDER uses the code review history at the file path level, while REVIEWBOT uses the code review history at the line level of source code. Intuitively, finding code-reviewers who have examined the exact same lines seems to be the best choice for projects whose source code changes frequently. However, files are not often changed at the same lines [41]. Moreover, since MCR is relatively new, the performance of REVIEWBOT could be limited by the small amount of available review history. To better understand why REVFINDER outperforms REVIEWBOT, we investigate the frequency of review history at the line and file levels of granularity. We observe that 70% to 90% of lines of code are changed only once, indicating that code review systems lack line-level history. Therefore, the performance of REVIEWBOT is limited.
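A small sketch of the kind of measurement described above follows. It assumes that a review's changes are given as (file path, line number) pairs and ignores line-number drift between revisions. The function name and the toy data are hypothetical. The same history is much sparser at the line level than at the file level, which is the effect that limits a line-based recommender.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set, Tuple

Line = Tuple[str, int]  # assumed line identity: (file path, line number)


def share_changed_once(reviews: Iterable[Set[Hashable]]) -> float:
    """Fraction of distinct items (lines or files) that occur in exactly one review."""
    counts = Counter(item for review in reviews for item in review)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


if __name__ == "__main__":
    line_reviews: List[Set[Line]] = [
        {("a.c", 1), ("a.c", 2)},
        {("a.c", 3)},
        {("a.c", 1), ("b.c", 7)},
    ]
    file_reviews = [{path for path, _ in review} for review in line_reviews]
    print(share_changed_once(line_reviews))  # 0.75
    print(share_changed_once(file_reviews))  # 0.5
```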
Applicability: Can REVFINDER effectively help developers find code-reviewers?
In RQ1, the results of our exploratory study show that reviews with code-reviewer assignment problems required more time to integrate a code change. To confirm how effectively REVFINDER helps developers, we execute REVFINDER for the reviews with code-reviewer assignment problems in the representative samples. We find that, on average, REVFINDER correctly recommends code-reviewers for 80% of these reviews with a top-10 recommendation. This result indicates that if a developer cannot find an appropriate code-reviewer for a new change, REVFINDER could accurately recommend appropriate code-reviewers. Therefore, we believe that REVFINDER can help developers find appropriate code-reviewers and speed up the overall code review process.
Threats to Validity: We discuss potential threats to the validity of our work as follows:
Internal Validity: The review classification process in RQ1 involves manual examination. The classification was conducted by the authors, who are not involved in the code review process of the studied systems. A manual classification by a domain expert might produce different results.
External Validity: Our empirical results are limited to four datasets, i.e., Android, OpenStack, Qt, and LibreOffice. We cannot claim that the same results would be achieved with other systems. Our future work will focus on evaluating our approach on other systems with larger numbers of code-reviewers to better generalize the results.
Construct Validity: The first threat involves the lack of code-reviewer retirement information. Some code-reviewers may have retired or may no longer be involved in the code review system. Therefore, the performance of our approach might be affected by the past activities of retired code-reviewers. Another threat involves the workload of code-reviewers. Some code-reviewers may be burdened with a huge number of assigned reviews. Therefore, considering workload balancing could reduce the tasks of these potential code-reviewers and the number of pending reviews.
VIII. CONCLUSION AND FUTURE WORK
In this paper, we empirically investigate the impact that reviews with code-reviewer assignment problems have on reviewing time, compared to reviews without such problems. From our manual examination, we find that 4% to 30% of reviews have code-reviewer assignment problems. These reviews take significantly longer to approve a code change, i.e., 12 days longer. A code-reviewer recommendation tool is necessary in distributed software development to speed up the code review process.
To help developers find appropriate code-reviewers, we propose REVFINDER, a file location-based code-reviewer recommendation approach. To evaluate REVFINDER, we perform a case study on 42,045 reviews of four open-source software systems, i.e., Android Open Source Project (AOSP), OpenStack, Qt, and LibreOffice. The results show that REVFINDER correctly recommended code-reviewers for 79% of reviews with a top-10 recommendation. REVFINDER is 4 times more accurate than REVIEWBOT. This indicates that leveraging the similarity of previously reviewed file paths can accurately recommend code-reviewers. REVFINDER recommended the correct code-reviewers with a median rank of 4. The overall ranking of REVFINDER is 3 times better than that of a baseline approach, indicating that REVFINDER provides a better ranking of correctly recommended code-reviewers. Therefore, we believe that REVFINDER can help developers find appropriate code-reviewers and speed up the overall code review process.
In our future work, we will deploy REVFINDER in a development environment and perform experiments with developers to analyze how effectively and practically REVFINDER can help developers in recommending code-reviewers.
REFERENCES
[1] M. Fagan, “Design and code inspections to reduce errors in program development,” IBM Systems Journal, vol. 15, no. 3, pp. 182–211, 1976.
[2] A. F. Ackerman, P. J. Fowler, and R. G. Ebenau, “Software Inspections and the Industrial Production of Software,” in Proceedings of a Symposium on Software Validation: Inspection-testing-verification-alternatives, 1984, pp. 13–40.
[3] A. F. Ackerman, L. S. Buchwald, and F. H. Lewski, “Software inspections: an effective verification process,” IEEE Software, vol. 6, no. 3, pp. 31–36, 1989.
[4] A. Aurum, H. Petersson, and C. Wohlin, “State-of-the-art: software inspections after 25 years,” Software Testing, Verification and Reliability, vol. 12, no. 3, pp. 133–154, Sep. 2002.
[5] L. G. Votta, “Does Every Inspection Need a Meeting?” in SIGSOFT’93, 1993, pp. 107–114.
[6] A. Bacchelli and C. Bird, “Expectations, Outcomes, and Challenges of Modern Code Review,” in ICSE’13, 2013, pp. 712–721.
[7] P. C. Rigby and M.-A. Storey, “Understanding broadcast based peer review on open source software projects,” in ICSE’11, 2011, pp. 541–550.
[8] V. Mashayekhi, J. Drake, W.-T. Tsai, and J. Riedl, “Distributed, collaborative software inspection,” IEEE Software, vol. 10, no. 5, pp. 66–75, 1993.
[9] M. Beller, A. Bacchelli, and A. Zaidman, “Modern Code Reviews in Open-Source Projects: Which Problems Do They Fix?” in MSR’14, 2014, pp. 202–211.
[10] R. G. Kula, C. C. A. Erika, N. Yoshida, K. Hamasaki, K. Fujiwara, X. Yang, and H. Iida, “Using Profiling Metrics to Categorise Peer Review Types in the Android Project,” in ISSRE’12, 2012, pp. 146–151.
[11] P. C. Rigby and C. Bird, “Convergent Contemporary Software Peer Review Practices,” in ESEC/FSE 2013, 2013, pp. 202–212.
[12] S. McIntosh, Y. Kamei, B. Adams, and A. E. Hassan, “The Impact of Code Review Coverage and Code Review Participation on Software Quality,” in MSR’14, 2014, pp. 192–201.
[13] P. Thongtanunam, X. Yang, N. Yoshida, R. G. Kula, C. C. Ana Erika, K. Fujiwara, and H. Iida, “ReDA: A Web-based Visualization Tool for Analyzing Modern Code Review Dataset,” in ICSME’14, 2014, pp. 606–609.
[14] P. C. Rigby, D. M. German, and M.-A. Storey, “Open Source Software Peer Review Practices: A Case Study of the Apache Server,” in ICSE’08, 2008, pp. 541–550.
[15] P. Weißgerber, D. Neu, and S. Diehl, “Small Patches Get In!” in MSR’08, 2008, pp. 67–75.
[16] J. Tsay, L. Dabbish, and J. Herbsleb, “Let’s Talk About It: Evaluating Contributions through Discussion in GitHub,” in FSE’14, 2014, pp. 144–154.
[17] Y. Jiang, B. Adams, and D. M. German, “Will My Patch Make It? And How Fast? Case Study on the Linux Kernel,” in MSR’13, 2013, pp. 101–110.
[18] A. Bosu and J. C. Carver, “Impact of Developer Reputation on Code Review Outcomes in OSS Projects: An Empirical Investigation,” in ESEM’14, 2014, pp. 33–42.
[19] G. Gousios, M. Pinzger, and A. van Deursen, “An Exploratory Study of the Pull-based Software Development Model,” in ICSE’14, 2014, pp. 345–355.
[20] A. Mockus and J. D. Herbsleb, “Expertise Browser: a quantitative approach to identifying expertise,” in ICSE’02, 2002, pp. 503–512.
[21] J. Anvik, L. Hiew, and G. C. Murphy, “Who should fix this bug?” in ICSE’06, 2006, pp. 361–370.
[22] R. Shokripour, J. Anvik, Z. M. Kasirun, and S. Zamani, “Why So Complicated? Simple Term Filtering and Weighting for Location-Based Bug Report Assignment Recommendation,” in MSR’13, 2013, pp. 2–11.
[23] X. Xia, D. Lo, X. Wang, B. Zhou, M. Kersten, F. Heidrich, O. Thomann, and G. Gayed, “Accurate Developer Recommendation for Bug Resolution,” in WCRE’13, 2013, pp. 72–81.
[24] D. Surian, N. Liu, D. Lo, H. Tong, E.-P. Lim, and C. Faloutsos, “Recommending People in Developers’ Collaboration Network,” in WCRE’11, 2011, pp. 379–388.
[25] Y. Tian, P. S. Kochhar, E.-P. Lim, F. Zhu, and D. Lo, “Predicting Best Answerers for New Questions: An Approach Leveraging Topic Modeling and Collaborative Voting,” in SocInfo’14, 2014, pp. 55–68.
[26] G. Jeong, S. Kim, T. Zimmermann, and K. Yi, “Improving code review by predicting reviewers and acceptance of patches,” in ROSAEC-2009-006, 2009.
[27] Y. Yu, H. Wang, G. Yin, and C. X. Ling, “Reviewer Recommender of Pull-Requests in GitHub,” in ICSME’14, 2014, pp. 610–613.
[28] V. Balachandran, “Reducing Human Effort and Improving Quality in Peer Code Reviews using Automatic Static Analysis and Reviewer Recommendation,” in ICSE’13, 2013, pp. 931–940.
[29] P. Thongtanunam, R. G. Kula, A. E. C. Cruz, N. Yoshida, and H. Iida, “Improving Code Review Effectiveness through Reviewer Recommendations,” in CHASE’14, 2014, pp. 119–122.
[30] K. Hamasaki, R. G. Kula, N. Yoshida, C. C. A. Erika, K. Fujiwara, and H. Iida, “Who does what during a Code Review? An extraction of an OSS Peer Review Repository,” in MSR’13, 2013, pp. 49–52.
[31] P. Kampstra, “Beanplot: A boxplot alternative for visual comparison of distributions,” Journal of Statistical Software, vol. 28, no. 1, pp. 1–9, 2008.
[32] D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, 1997.
[33] E. T. Barr, C. Bird, P. C. Rigby, A. Hindle, D. M. German, and P. Devanbu, “Cohesive and Isolated Development with Branches,” in FASE’12, 2012, pp. 316–331.
[34] I. T. Bowman, R. C. Holt, and N. V. Brewster, “Linux as a case study: Its extracted software architecture,” in ICSE’99, 1999, pp. 555–563.
[35] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On Combining Classifiers,” IEEE TPAMI, vol. 20, no. 3, pp. 226–239, 1998.
[36] T. K. Ho, J. Hull, and S. N. Srihari, “Decision Combination in Multiple Classifier Systems,” IEEE TPAMI, vol. 16, no. 1, pp. 66–75, 1994.
[37] R. Ranawana and V. Palade, “Multi-Classifier Systems - Review and a Roadmap for Developers,” IJHIS, vol. 3, no. 1, pp. 1–41, 2006.
[38] Z. Guan and E. Cutrell, “An Eye Tracking Study of the Effect of Target Rank on Web Search,” in CHI’07, 2007, pp. 417–420.
[39] C. Tantithamthavorn, R. Teekavanich, A. Ihara, and K.-I. Matsumoto, “Mining A Change History to Quickly Identify Bug Locations: A Case Study of the Eclipse Project,” in ISSREW’13, 2013, pp. 108–113.
[40] C. Tantithamthavorn, A. Ihara, and K.-I. Matsumoto, “Using Co-change Histories to Improve Bug Localization Performance,” in SNPD’13, Jul. 2013, pp. 543–548.
[41] D. Ma, D. Schuler, T. Zimmermann, and J. Sillito, “Expert Recommendation with Usage Expertise,” in ICSM’09, 2009, pp. 535–538.
... In this sense, it is also crucial to find suitable reviewers for a certain PR, especially in the context of the OSS development where the potentially massive participants are usually geographically distributed and not necessarily known to each other. In fact, as Thongtanunam et al. pointed out, inappropriate assignment of code reviewer may take 12 days longer to approve a code change in OSS development, thus a recommendation tool is necessary to speed up a code review process [50]. ...
... This type of recommenders suggests new reviewers with simple heuristic rules. For example, Thongtanunam et al. [49] proposed a recommender based on file path similarity, which subsequently evolved into RevFinder [50]. The RevFinder is based on the similarity between the file paths of a previous PR and a new PR. ...
... and take the same meaning as in Equation 1. PR-PR: The profiles (e.g., language, code lines) and content (e.g., the source code) included in PRs to a certain degree can reflect the expertise of contributors. Moreover, it is also common that closely located source files share similar functions, and hence can be used for reviewer recommendation [50]. Therefore, the weight of PR-PR relationship is achieved by considering the distances between PRs in the file path set (as shown in Equation 3). ...
Preprint
Full-text available
Modern code review is a critical and indispensable practice in a pull-request development paradigm that prevails in Open Source Software (OSS) development. Finding a suitable reviewer in projects with massive participants thus becomes an increasingly challenging task. Many reviewer recommendation approaches (recommenders) have been developed to support this task which apply a similar strategy, i.e. modeling the review history first then followed by predicting/recommending a reviewer based on the model. Apparently, the better the model reflects the reality in review history, the higher recommender's performance we may expect. However, one typical scenario in a pull-request development paradigm, i.e. one Pull-Request (PR) (such as a revision or addition submitted by a contributor) may have multiple reviewers and they may impact each other through publicly posted comments, has not been modeled well in existing recommenders. We adopted the hypergraph technique to model this high-order relationship (i.e. one PR with multiple reviewers herein) and developed a new recommender, namely HGRec, which is evaluated by 12 OSS projects with more than 87K PRs, 680K comments in terms of accuracy and recommendation distribution. The results indicate that HGRec outperforms the state-of-the-art recommenders on recommendation accuracy. Besides, among the top three accurate recommenders, HGRec is more likely to recommend a diversity of reviewers, which can help to relieve the core reviewers' workload congestion issue. Moreover, since HGRec is based on hypergraph, which is a natural and interpretable representation to model review history, it is easy to accommodate more types of entities and realistic relationships in modern code review scenarios. As the first attempt, this study reveals the potentials of hypergraph on advancing the pragmatic solutions for code reviewer recommendation.
... Tools and processes can be developed to recognize security-critical changes and automatically assign security experts as reviewers. Previous research [9,36,53,68] has investigated reviewer recommendation approaches, including recommendations based on cross-project work experience of potential reviewers and estimation of their expertise in specific technologies [53]. These approaches may be adapted to take security aspects into consideration. ...
Preprint
Full-text available
To avoid software vulnerabilities, organizations are shifting security to earlier stages of the software development, such as at code review time. In this paper, we aim to understand the developers' perspective on assessing software security during code review, the challenges they encounter, and the support that companies and projects provide. To this end, we conduct a two-step investigation: we interview 10 professional developers and survey 182 practitioners about software security assessment during code review. The outcome is an overview of how developers perceive software security during code review and a set of identified challenges. Our study revealed that most developers do not immediately report to focus on security issues during code review. Only after being asked about software security, developers state to always consider it during review and acknowledge its importance. Most companies do not provide security training, yet expect developers to still ensure security during reviews. Accordingly, developers report the lack of training and security knowledge as the main challenges they face when checking for security issues. In addition, they have challenges with third-party libraries and to identify interactions between parts of code that could have security implications. Moreover, security may be disregarded during reviews due to developers' assumptions about the security dynamic of the application they develop. Data and materials: https://doi.org/10.5281/zenodo.6875435
... Importantly, prior studies proposed various recommendation approaches for the code review process. For example, code review prioritization approaches based on the characteristics of code changes [11,28,47] and the defect-proneness [23,31], reviewer recommendation approaches [4,17,45,49,56,59], automated code transformation [48,50,52]. ...
Conference Paper
Full-text available
Code review is an effective quality assurance practice, but can be labor-intensive since developers have to manually review the code and provide written feedback. Recently, a Deep Learning (DL)-based approach was introduced to automatically recommend code review comments based on changed methods. While the approach showed promising results, it requires expensive computational resource and time which limits its use in practice. To address this limitation , we propose CommentFinder ś a retrieval-based approach to recommend code review comments. Through an empirical evaluation of 151,019 changed methods, we evaluate the effectiveness and efficiency of CommentFinder against the state-of-the-art approach. We find that when recommending the best-1 review comment candidate, our CommentFinder is 32% better than prior work in recommending the correct code review comment. In addition, CommentFinder is 49 times faster than the prior work. These findings highlight that our CommentFinder could help reviewers to reduce the manual efforts by recommending code review comments, while requiring less computational time.
... In traditional approaches, researchers cannot solve the main challenge in code review: understanding the code [7]. Therefore, researchers can only improve efficiency from other aspects of code review, such as recommending suitable reviewers [8], [9] and using static analysis tools [10], [11]. ...
Preprint
Automatic code review (ACR), aiming to relieve manual inspection costs, is an indispensable and essential task in software engineering. The existing works only use the source code fragments to predict the results, missing the exploitation of developer's comments. Thus, we present a Multi-Modal Apache Automatic Code Review dataset (MACR) for the Multi-Modal ACR task. The release of this dataset would push forward the research in this field. Based on it, we propose a Contrastive Learning based Multi-Modal Network (CLMN) to deal with the Multi-Modal ACR task. Concretely, our model consists of a code encoding module and a text encoding module. For each module, we use the dropout operation as minimal data augmentation. Then, the contrastive learning method is adopted to pre-train the module parameters. Finally, we combine the two encoders to fine-tune the CLMN to decide the results of Multi-Modal ACR. Experimental results on the MACR dataset illustrate that our proposed model outperforms the state-of-the-art methods.
... This measure has been widely adapted in various software engineering studies [28], [31], [60] and less affected by moved code in a PR, compared with other measures. On that same note, we apply the file location-based model that is used in typical code review recommendations [51], [62], [57] to compute file path similarity that is defined as follows: ...
Preprint
The popular adoption of third-party libraries in contemporary software development has led to the creation of large inter-dependency networks, where sustainability issues in a single library can have widespread network effects. Maintainers of these libraries are often overworked and rely on contributions from volunteers to sustain them. In this work, we measure contributions that are aligned with dependency changes to understand where they come from (i.e., non-maintainer, client maintainer, library maintainer, and library and client maintainer), analyze whether they contribute to library dormancy (i.e., a lack of activity), and investigate the similarities between these contributions and developers' typical contributions. To do so, we leverage socio-technical techniques to measure the dependency-contribution congruence (DC congruence), i.e., the degree to which contributions align with dependencies. We conduct a large-scale empirical study to measure DC congruence for the NPM ecosystem using 1.7 million issues, 970 thousand pull requests (PRs), and over 5.3 million commits belonging to 107,242 NPM packages. At the ecosystem level, we pinpoint the points in time at which congruence with dependency changes peaks (i.e., a 16% DC congruence score). Surprisingly, these contributions came from the ecosystem itself (i.e., from non-maintainers of either the client or the library). At the project level, we find that DC congruence shares a statistically significant relationship with the likelihood of a package becoming dormant. Finally, by comparing the source code of contributions, we find that congruent contributions are statistically different from typical contributions. Our work has implications for encouraging and sustaining contributions, especially to support library maintainers who require dependency changes.
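The abstract does not give the DC congruence formula. As an assumption borrowed from the general socio-technical congruence idea (the fraction of required coordination links that are actually realized), the sketch below treats each dependency change as creating a required link between a client's maintainers and the changed library, and measures the share of those links matched by an actual contribution. The link rule, the dictionary formats, and the names are hypothetical, and this maintainer-only rule would not capture the non-maintainer contributions the study reports.

```python
def required_links(maintainers, changed_deps):
    """Hypothetical rule: every maintainer of a client that changed a dependency on a
    library is 'required' to coordinate with that library.

    maintainers: client -> set of developer names
    changed_deps: client -> set of libraries whose dependency the client changed
    """
    return {
        (dev, lib)
        for client, devs in maintainers.items()
        for dev in devs
        for lib in changed_deps.get(client, set())
    }


def congruence(required, actual):
    """Share of required (developer, library) links matched by an actual contribution."""
    if not required:
        return 0.0
    return len(required & actual) / len(required)


if __name__ == "__main__":
    req = required_links({"app": {"alice", "bob"}}, {"app": {"left-pad"}})
    contributed = {("alice", "left-pad"), ("carol", "lodash")}
    print(congruence(req, contributed))  # alice's link is matched, bob's is not
```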
... In this respect, researchers have been investigating methodologies and techniques to support developers during the code review process. For instance, researchers have devised automated solutions to identify proper reviewers for a new code change (Ouni et al. 2016; Kovalenko et al. 2020; Thongtanunam et al. 2015b; Zanjani et al. 2015), and have investigated both the factors that make the process more or less effective (Baysal et al. 2016; Bosu and Carver 2014; Kononenko et al. 2015; McIntosh et al. 2014; Ram et al. 2018; Thongtanunam et al. 2016; Spadini et al. 2018) and the actual influence of code review on the resulting software quality (Abelein and Paech 2015; Porter et al. 1998; Rigby and Bird 2013; Rigby et al. 2014; Sauer et al. 2000; McIntosh et al. 2016; Morales et al. 2015; Kemerer and Paulk 2009; Spadini et al. 2019; Vassallo et al. 2019a). ...
Article
Code reviewing is a widespread practice used by software engineers to maintain high code quality. To date, knowledge of the effect of code review on source code is still limited. Some studies have addressed this problem by classifying the types of changes that take place during the review process (a.k.a. review changes), as this strategy can, for example, pinpoint the immediate effect of reviews on code. Nevertheless, this classification (1) is not scalable, as it was conducted manually, and (2) was not assessed in terms of how meaningful the provided information is for practitioners. This paper aims to address these limitations. First, we investigate to what extent a machine learning-based technique can automatically classify review changes. Then, we evaluate the relevance of information on review change types and its potential usefulness by conducting (1) semi-structured interviews with 12 developers and (2) a qualitative study with 17 developers, who are asked to assess reports on the review changes of their project. Key results of the study show that not only is it possible to automatically classify code review changes, but this information is also perceived by practitioners as valuable for improving the code review process. Data and materials: https://doi.org/10.5281/zenodo.5592254
Article
Automatic code review (ACR), which can reduce the cost of manual inspection, is an essential task in software engineering. To tackle ACR, existing work serializes the abstract syntax tree (AST). However, making sense of the whole AST with a sequence-encoding approach is difficult, mostly because redundant nodes in the AST hinder the propagation of node information. Moreover, a serialized representation cannot adequately capture the tree structure of the AST. In this paper, we first present a new large-scale Apache Automatic Code Review (AACR) dataset for the ACR task, since no publicly available dataset exists for this task yet; releasing this dataset should help advance research in this field. Based on it, we propose a novel Simplified AST based Graph Convolutional Network (SimAST-GCN) to address the ACR task. Concretely, to make node information propagate more efficiently, we first simplify the AST of the code by deleting redundant nodes that do not contain connection attributes, thereby deriving a Simplified AST. Then, we construct a relation graph for each code fragment based on the Simplified AST so that the relations among code fragments in the tree structure are embodied in the graph. Next, taking advantage of the graph structure, we explore a graph convolutional network architecture with an attention mechanism that leverages the crucial implications of code fragments to derive code representations. Finally, we apply a simple but effective subtraction operation to the representations of the original and revised code, so that the revision difference can be learned to decide the results of ACR. Experimental results on the AACR dataset show that our proposed model outperforms state-of-the-art methods.
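To make the data flow concrete, the sketch below uses Python's standard `ast` module on Python code (the paper targets Java) to build a simplified AST graph, runs one propagation step, mean-pools node features, and subtracts the original representation from the revised one. The simplification rule (dropping `Load`/`Store`/`Del` context leaves) is a hypothetical stand-in for the paper's criterion, and the propagation is plain mean aggregation with no learned weights or attention, so this only illustrates the pipeline, not SimAST-GCN itself.

```python
import ast

# Hypothetical simplification rule: drop expression-context leaves, which have no children.
REDUNDANT = (ast.Load, ast.Store, ast.Del)


def simplified_ast_graph(source):
    """Return (labels, edges): node type names and parent-child edges of a simplified AST."""
    tree = ast.parse(source)
    labels, index = [], {}
    for node in ast.walk(tree):
        if not isinstance(node, REDUNDANT):
            index[id(node)] = len(labels)
            labels.append(type(node).__name__)
    edges = []
    for node in ast.walk(tree):
        if id(node) not in index:
            continue
        for child in ast.iter_child_nodes(node):
            if id(child) in index:
                edges.append((index[id(node)], index[id(child)]))
    return labels, edges


def propagate(labels, edges, vocab):
    """One mean-aggregation step over each node and its neighbours (no learned weights)."""
    feats = [[1.0 if label == t else 0.0 for t in vocab] for label in labels]
    neighbours = [{i} for i in range(len(labels))]  # self-loops
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return [[sum(feats[j][k] for j in nb) / len(nb) for k in range(len(vocab))] for nb in neighbours]


def graph_vector(labels, edges, vocab):
    """Mean-pool the propagated node features into one vector per code fragment."""
    h = propagate(labels, edges, vocab)
    return [sum(col) / len(h) for col in zip(*h)]


def revision_difference(original, revised):
    """Subtract the original fragment's vector from the revised fragment's vector."""
    g_old, g_new = simplified_ast_graph(original), simplified_ast_graph(revised)
    vocab = sorted(set(g_old[0]) | set(g_new[0]))
    old_vec = graph_vector(*g_old, vocab)
    new_vec = graph_vector(*g_new, vocab)
    return dict(zip(vocab, (n - o for n, o in zip(new_vec, old_vec))))


if __name__ == "__main__":
    print(revision_difference("x = 1", "x = 1\ny = x + 2"))
```

In the real model the difference vector would feed a classifier that decides the review outcome; here it simply shows which node types gained or lost weight.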
Conference Paper
Community Question Answering (CQA) sites are becoming an increasingly important source of information where users can share knowledge on various topics. Although these platforms bring new opportunities for users to seek help or provide solutions, they also pose many challenges as the community keeps growing. The sheer number of questions posted every day motivates the problem of routing questions to the users who can answer them. In this paper, we propose an approach to predict the best answerer for a new question on a CQA site. Our approach considers both user interest and user expertise relevant to the topics of the given question. A user's interests in various topics are learned by applying topic modeling to previous questions answered by the user, while the user's expertise is learned by leveraging the collaborative voting mechanism of CQA sites. We applied our model to a dataset extracted from StackOverflow, one of the biggest CQA sites. The results show that our approach outperforms the TF-IDF based approach.
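The abstract does not state how interest and expertise are combined. The sketch below assumes each answered question already has a topic distribution (for example, from a topic model, which is not implemented here), derives per-topic interest from the user's average topic distribution and per-topic expertise from vote-weighted topic mass, and ranks users by the question-weighted product of the two. The product-based scoring, the vote clamping, and all names are hypothetical.

```python
def user_profile(answers, n_topics):
    """answers: list of (topic_distribution, votes) for questions the user answered.

    interest[t]  = mean weight of topic t over the answered questions
    expertise[t] = vote-weighted mass of topic t (negative votes clamped to 0)
    """
    if not answers:
        return [0.0] * n_topics, [0.0] * n_topics
    interest = [sum(dist[t] for dist, _ in answers) / len(answers) for t in range(n_topics)]
    expertise = [sum(dist[t] * max(votes, 0) for dist, votes in answers) for t in range(n_topics)]
    return interest, expertise


def score(question_topics, interest, expertise):
    """Question-weighted product of interest and expertise (hypothetical combination)."""
    return sum(p * i * e for p, i, e in zip(question_topics, interest, expertise))


def rank_answerers(question_topics, users):
    """users: name -> list of (topic_distribution, votes). Best candidate first."""
    n = len(question_topics)
    scores = {u: score(question_topics, *user_profile(a, n)) for u, a in users.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    users = {"ann": [([0.9, 0.1], 10)], "ben": [([0.1, 0.9], 10)], "cat": []}
    print(rank_answerers([0.8, 0.2], users))
```

A user with many up-voted answers on the question's dominant topic ranks first; a user with no history scores zero.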
Article
Software code review, i.e., the practice of having third-party team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that the formal code inspections of the past tend to improve the quality of software delivered by students and small teams. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process qualitatively, little research quantitatively explores the relationship between properties of the modern code review process and software quality. Hence, in this paper, we study the relationship between software quality and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. Through a case study of the Qt, VTK, and ITK projects, we find that both code review coverage and participation share a significant link with software quality. Low code review coverage and participation are estimated to produce components with up to two and five additional post-release defects respectively. Our results empirically confirm the intuition that poorly reviewed code has a negative impact on software quality in large systems using modern reviewing tools.
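Code review coverage, as defined above, is a simple ratio per component. The abstract does not spell out how participation is measured, so the sketch below pairs coverage with a hypothetical participation proxy (mean number of reviewers per reviewed change); the `Change` record and its fields are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Change:
    component: str
    reviewed: bool       # went through code review before integration
    reviewers: int = 0   # distinct reviewers who took part (hypothetical proxy)


def review_metrics(changes):
    """Per component: (coverage, participation).

    coverage      = reviewed changes / all changes
    participation = mean number of reviewers over reviewed changes (0.0 if none reviewed)
    """
    by_component = {}
    for ch in changes:
        by_component.setdefault(ch.component, []).append(ch)
    metrics = {}
    for comp, items in by_component.items():
        reviewed = [c for c in items if c.reviewed]
        coverage = len(reviewed) / len(items)
        participation = sum(c.reviewers for c in reviewed) / len(reviewed) if reviewed else 0.0
        metrics[comp] = (coverage, participation)
    return metrics


if __name__ == "__main__":
    print(review_metrics([Change("core", True, 2), Change("core", False), Change("gui", True, 1)]))
```

Per-component values like these could then be related to post-release defect counts, which is the kind of relationship the study examines.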
Conference Paper
The advent of distributed version control systems has led to the development of a new paradigm for distributed software development: instead of pushing changes to a central repository, developers pull them from other repositories and merge them locally. Various code hosting sites, notably GitHub, have seized the opportunity to facilitate pull-based development by offering workflow support tools, such as code reviewing systems and integrated issue trackers. In this work, we explore how pull-based software development works, first on the GHTorrent corpus and then on a carefully selected sample of 291 projects. We find that the pull request model offers fast turnaround, increased opportunities for community engagement, and decreased time to incorporate contributions. We show that a relatively small number of factors affect both the decision to merge a pull request and the time to process it. We also examine the reasons for pull request rejection and find that technical ones are only a small minority.
Conference Paper
ReDA (http://reda.naist.jp/) is a web-based visualization tool for analyzing Modern Code Review (MCR) datasets of large Open Source Software (OSS) projects. MCR is a commonly practiced, lightweight inspection of source code using a support tool such as Gerrit. Recently, mining the code review history of such systems has received attention as a potentially effective method of ensuring software quality. However, due to the increasing size and complexity of the software being developed, these datasets are becoming unmanageable. ReDA aims to assist researchers mining code review data by enabling a better understanding of the dataset context and by identifying abnormalities. Through real-time data interaction, users can quickly gain insight into the data and home in on interesting areas to investigate. A video highlighting the main features can be found at:
Conference Paper
Software peer review is practiced on a diverse set of software projects that have drastically different settings, cultures, incentive systems, and time pressures. In an effort to characterize and understand these differences we examine two Google-led projects, Android and Chromium OS, three Microsoft projects, Bing, Office, and MS SQL, and projects internal to AMD. We contrast our findings with data taken from traditional software inspection conducted on a Lucent project and from open source software peer review on six projects, including Apache, Linux, and KDE. Our measures of interest include the review interval, the number of developers involved in review, and proxy measures for the number of defects found during review. We find that despite differences among projects, many of the characteristics of the review process have independently converged to similar values which we think indicate general principles of code review practice. We also introduce a measure of the degree to which knowledge is shared during review. This is an aspect of review practice that has traditionally only had experiential support. Our knowledge sharing measure shows that conducting peer review increases the number of distinct files a developer knows about by 66% to 150% depending on the project. This paper is one of the first studies of contemporary review in software firms and the most diverse study of peer review to date.
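The knowledge sharing result can be read as the relative growth in the number of distinct files a developer knows about once reviewed files are counted alongside authored ones. The sketch below assumes exactly that definition (files known = authored ∪ reviewed) and summarizes a project by the median across developers; the paper's own measure may differ in detail, and the function names are hypothetical.

```python
import statistics


def knowledge_increase(authored, reviewed):
    """Relative growth in distinct known files when reviewed files are counted too.

    0.66 would correspond to a 66% increase over authored files alone.
    """
    if not authored:
        raise ValueError("developer has no authored files to compare against")
    known = authored | reviewed
    return (len(known) - len(authored)) / len(authored)


def project_median(developers):
    """developers: name -> (authored_files, reviewed_files); median increase across developers."""
    return statistics.median(knowledge_increase(a, r) for a, r in developers.values())


if __name__ == "__main__":
    print(knowledge_increase({"a.c", "b.c"}, {"b.c", "c.c", "d.c"}))
```

Reviewing a file the developer already authored adds nothing, which is why the union, not the sum, is used.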
Conference Paper
Effectively performing code review increases the quality of software and reduces the occurrence of defects. However, this requires experienced reviewers with a deep understanding of the system's code. Manually selecting such reviewers can be a costly and time-consuming task. To reduce this cost, we propose FPS, a reviewer recommendation algorithm based on file path similarity. Using three OSS projects as case studies, the FPS algorithm achieved an accuracy of up to 77.97%, significantly outperforming the previous approach.
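The abstract names file path similarity but not the scoring rule. The sketch below assumes a common reading of FPS-style recommendation: each past review contributes the average path similarity between its files and the new change's files to every reviewer of that past review, optionally decayed by recency with a factor `delta`, and reviewers are ranked by accumulated score. The similarity measure (longest common prefix over path components, normalized by the longer path), the decay scheme, the input format, and all names are assumptions.

```python
def components(path):
    """Non-empty path components, e.g. 'src/ui/A.java' -> ['src', 'ui', 'A.java']."""
    return [part for part in path.split("/") if part]


def path_similarity(p1, p2):
    """Longest common prefix of components, normalized by the longer path."""
    f1, f2 = components(p1), components(p2)
    shared = 0
    for a, b in zip(f1, f2):
        if a != b:
            break
        shared += 1
    longest = max(len(f1), len(f2))
    return shared / longest if longest else 0.0


def fps_recommend(new_files, past_reviews, delta=1.0, top_k=10):
    """Rank reviewers for a new change.

    past_reviews: list of (files, reviewers), ordered oldest first (hypothetical format).
    delta: recency decay in (0, 1]; 1.0 means no decay, and the newest review has weight 1.
    """
    scores = {}
    newest = len(past_reviews) - 1
    for idx, (files, reviewers) in enumerate(past_reviews):
        pairs = [(f, g) for f in new_files for g in files]
        if not pairs:
            continue
        similarity = sum(path_similarity(f, g) for f, g in pairs) / len(pairs)
        weight = delta ** (newest - idx)
        for reviewer in reviewers:
            scores[reviewer] = scores.get(reviewer, 0.0) + similarity * weight
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda r: (-scores[r], r))[:top_k]


if __name__ == "__main__":
    past = [(["src/ui/Label.java"], ["alice"]), (["docs/readme.md"], ["bob"])]
    print(fps_recommend(["src/ui/Button.java"], past))
```

With `delta` below 1, a reviewer of a recent, similar change can overtake one whose similar reviews are older.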
Conference Paper
Open source software projects often rely on code contributions from a wide variety of developers to extend the capabilities of their software. Project members evaluate these contributions and often engage in extended discussions to decide whether to integrate changes. These discussions have important implications for project management regarding new contributors and evolution of project requirements and direction. We present a study of how developers in open work environments evaluate and discuss pull requests, a primary method of contribution in GitHub, analyzing a sample of extended discussions around pull requests and interviews with GitHub developers. We found that developers raised issues around contributions over both the appropriateness of the problem that the submitter attempted to solve and the correctness of the implemented solution. Both core project members and third-party stakeholders discussed and sometimes implemented alternative solutions to address these issues. Different stakeholders also influenced the outcome of the evaluation by eliciting support from different communities such as dependent projects or even companies. We also found that evaluation outcomes may be more complex than simply acceptance or rejection. In some cases, although a submitter's contribution was rejected, the core team fulfilled the submitter's technical goals by implementing an alternative solution. We found that the level of a submitter's prior interaction on a project changed how politely developers discussed the contribution and the nature of proposed alternative solutions.
Article
Context: Gaining an identity and building a good reputation are important motivations for Open Source Software (OSS) developers. It is unclear whether these motivations have any actual impact on OSS project success. Goal: To identify how an OSS developer's reputation affects the outcome of his/her code review requests. Method: We conducted a social network analysis (SNA) of the code review data from eight popular OSS projects. Working on the assumption that core developers have a better reputation than peripheral developers, we developed an approach, Core Identification using K-means (CIK), to divide the OSS developers into core and periphery groups based on six SNA centrality measures. We then compared the outcome of the code review process for members of the two groups. Results: The results suggest that core developers receive quicker first feedback on their review requests, complete the review process in a shorter time, and are more likely to have their code changes accepted into the project codebase. Peripheral developers may have to wait 2 to 19 times (or 12 to 96 hours) longer than core developers for the review process of their code to complete. Conclusion: We recommend that projects allocate resources or create tool support to triage code review requests, so that prospective developers are motivated through quick feedback.
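CIK is described as K-means over six centrality measures that splits developers into two groups. The sketch below assumes k = 2, plain Lloyd iterations on unnormalized centrality vectors, and labels the cluster whose centroid has the larger total centrality as the core group. The toy two-measure data, the seed, and the function names are hypothetical, and a real analysis would likely normalize the measures first.

```python
import random


def sq_dist(p, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def nearest(p, centroids):
    """Index of the centroid closest to p."""
    dists = [sq_dist(p, c) for c in centroids]
    return dists.index(min(dists))


def kmeans_two(points, iterations=20, seed=0):
    """Plain Lloyd's k-means with k = 2; returns the two centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, 2)]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[k]
            for k, cl in enumerate(clusters)
        ]
    return centroids


def core_developers(centrality):
    """centrality: developer -> list of centrality scores. Returns the 'core' cluster."""
    devs = list(centrality)
    centroids = kmeans_two([centrality[d] for d in devs])
    core = max(range(2), key=lambda k: sum(centroids[k]))
    return {d for d in devs if nearest(centrality[d], centroids) == core}


if __name__ == "__main__":
    dev = {"a": [0.9, 0.8], "b": [0.85, 0.9], "c": [0.1, 0.05], "d": [0.05, 0.1], "e": [0.12, 0.08]}
    print(sorted(core_developers(dev)))
```

Once developers are labeled, review outcomes such as time to first feedback can be compared between the two groups.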
Conference Paper
Code review is the manual assessment of source code by humans, mainly intended to identify defects and quality problems. Modern Code Review (MCR), a lightweight variant of the code inspections investigated since the 1970s, prevails today both in industry and in open-source software (OSS) systems. The objective of this paper is to increase our understanding of the practical benefits that the MCR process produces on reviewed source code. To that end, we empirically explore the problems fixed through MCR in OSS systems. We manually classified over 1,400 changes taking place in reviewed code from two OSS projects into a validated categorization scheme. Surprisingly, results show that the types of changes due to the MCR process in OSS are strikingly similar to those in the industrial and academic systems from the literature, featuring a similar 75:25 ratio of maintainability-related to functional problems. We also reveal that 7–35% of review comments are discarded and that 10–22% of the changes are not triggered by an explicit review comment. Patterns emerged in the review data; we investigated them, revealing the technical factors that influence the number of changes due to the MCR process. We found that bug-fixing tasks lead to fewer changes and that tasks with more altered files and higher code churn have more changes. Contrary to intuition, the person of the reviewer had no impact on the number of changes.