Anti-patterns in Modern Code Review:
Symptoms and Prevalence
Moataz Chouchen, Ali Ouni, Raula Gaikovina Kula, Dong Wang,
Patanamon Thongtanunam, Mohamed Wiem Mkaouer§, Kenichi Matsumoto
ETS Montreal, University of Quebec, QC, Canada
Nara Institute of Science and Technology, Nara, Japan
University of Melbourne, Melbourne, Australia
§Rochester Institute of Technology, Rochester, NY, USA
Abstract—Modern code review (MCR) is now broadly adopted as an established and effective software quality assurance practice, with an increasing number of open-source as well as commercial software projects identifying code review as a crucial practice. During the MCR process, developers review, provide constructive feedback, and/or critique each other's patches before a code change is merged into the codebase. Nevertheless, code review is basically a human task that involves technical, personal, and social aspects. Existing literature hints at the existence of poor reviewing practices, i.e., anti-patterns, that may contribute to a tense reviewing culture, degrade software quality, slow down integration, and affect the overall sustainability of the project. To better understand these practices, we present in this paper the concept of Modern Code Review Anti-patterns (MCRA) and take a first step toward defining a catalog that enumerates common poor code review practices. In detail, we explore and characterize MCRA symptoms, causes, and impacts. We also conduct a series of preliminary experiments to investigate the prevalence and co-occurrences of such anti-patterns on a random sample of 100 code reviews from various OpenStack projects.
Index Terms—Modern code review, review anti-pattern
I. INTRODUCTION
Modern Code Review (MCR) is an established and broadly
adopted software engineering practice in both commercial and
open-source software (OSS) projects [1], [2]. Code review is
defined as the process of reviewing other developers' code to ensure software quality and find potential problems in their code changes before they are merged into the codebase.
MCR derives from the formal and disciplined process of
software inspection, which requires synchronous face-to-face
meetings among developers to make a checklist-based code
inspection and interactive discussion [3]. Conversely, MCR
provides practitioners with a convenient environment to read
and discuss code changes and makes this activity lightweight,
less formal, and asynchronous through specialized tool support for geographically distributed code review [1], [4]. There
is an increasing number of available MCR platforms including
Gerrit, ReviewBoard, and Phabricator.
The MCR process is most effective when developers follow
best practices, such as efficient, collaborative and timely re-
view discussions and code updates to improve the code quality,
enhance knowledge transfer, increase team awareness and
share code ownership. Such practices help reduce conflicts
in the team while ensuring that the code meets common quality
standards before it is merged into the code base [5]. In practice,
it is often challenging to follow these standards due to the
nature of code review being basically a human task involving
technical, personal and social aspects [1], [5]–[11]. Hence,
some poor code review practices can be observed and manifest in the form of anti-patterns, i.e., common but ineffective responses to a recurring problem that should be avoided. In recent years, researchers and practitioners have attempted to define catalogs of MCR anti-patterns, since such anti-patterns can become a major problem that hinders software quality, maintainability, and sustainability if not properly addressed [12]–[14].
In practice, such code review anti-patterns can manifest
in different forms such as divergent reviews, low reviewer
participation and responsiveness, toxic conversations, etc. [7],
[11], [15], [16]. For example, since reviewer opinions may
differ, patches can receive both positive and negative scores
leading to conflicts in the peer review process, due to the
disagreement about whether a developer’s contribution should
be accepted. Hence, if reviews with divergent scores are not
carefully resolved, they may contribute to a tense reviewing
culture and may slow down integration and lead to detrimental
effects on contributors’ continuing participation in the commu-
nity, affecting the sustainability of the project [11], [16].
In this paper, we further study this phenomenon to overcome
these problems. With the aim of providing practitioners with
a code review quality-oriented dashboard, we compiled a
catalog for MCR anti-patterns. It contains 5 common anti-
patterns related to different aspects of the MCR management
and process, explaining their symptoms, causes and potential
impacts on the development team and project. To gain more
understanding, we further conducted a set of preliminary
experiments to investigate the prevalence and occurrences of
such anti-patterns on a random set of 100 code reviews from
various OpenStack projects that use Gerrit as a MCR platform.
Overall, our results indicate that MCR anti-patterns are indeed
prevalent in the studied projects. Practitioners should be aware
of these anti-patterns and consider detecting and preventing
them using dedicated techniques.
II. RELATED WORK
Factors that impact the effectiveness of the MCR process. Various studies have focused on the MCR process in both open-source and industrial settings. Bosu et al. [17] studied code review at Microsoft by investigating various factors that make code reviews useful to developers based on review comments. Jiang et al. [18] studied the factors that impact the decision on the acceptance of Linux patches. They conclude that patches written by experienced developers are more easily accepted and that the number of invited reviewers has an impact on review time. Ram et al. [19] suggest that the change description, size, and coherency with the project coding style impact the likelihood of the code change being reviewed. Baysal et al. [20] showed that various technical and non-technical factors affect the review process, including the complexity of the patch, the patch writer's experience, and the reviewer's previous history and workload. Recently, Hirao et al. [16] studied patches with divergent review scores and found that 15%–37% of patches that receive multiple review scores suffer from divergent scores. Furthermore, Hirao et al. [8] showed that patches with a low level of agreement among reviewers are more likely to have longer review times and discussions. Thongtanunam et al. [21] investigated patches that do not attract reviewers, are not discussed, or receive slow initial feedback. They found that the length of the patch description plays an important role in the likelihood of receiving poor reviewer participation or discussion.
Socio-technical aspects in MCR. Huang et al. [11] studied issues related to potential conflicts in code review. They indicate that conflicts generally have detrimental effects on contributors' continuing participation in the community, while constructive suggestions help retain contributors. Bosu et al. [22] showed that core developers receive quicker first feedback on their review requests, complete the review process in a shorter time, and are more likely to have their code changes accepted into the code base. Later, Bosu et al. [23] studied the impact of interpersonal relations on patch review in MCR. They found that the patch author is one of the important factors in peer reviewers' decision of whether to review a patch. In addition, Steinmacher et al. [24] gave evidence of the existence of several social barriers faced by newcomers in the code review process. Baysal et al. [25] studied the patch life cycle in the code review process of Firefox and found that patches submitted by casual contributors are disproportionately more likely to be abandoned compared to those from core contributors. Later, McIntosh et al. [26] suggested that review coverage, reviewer participation, and expertise have a strong impact on code quality. Recently, Uchoa et al. [27] found that long discussions and review disagreements increase design degradation. Ebert et al. [6] found that missing rationale and lack of familiarity with the code are the major reasons for confusion in code review. Raman et al. [15] studied toxic conversations and unhealthy interactions in open source projects, indicating their potential to demotivate and burn out developers, creating challenges for sustaining open source.
III. MCRA: A CATALOG OF MCR ANTI-PATTERNS
Despite bringing several benefits, MCR can be problematic and challenging, especially when it does not follow good practices [5], [28]. A catalog of MCR anti-patterns is of crucial importance to increase practitioners' awareness of such practices. In particular, we identify a list of common anti-patterns based on the existing literature and provide an illustrative example.
A. Description of MCR Anti-patterns
1) Confused reviewers (CR): Confusion in code review
refers to the inability or the uncertainty of the reviewers to
understand the reason(s) for the code change or any related
aspects of the patch [6]. There are several reasons behind
confusion in code review such as missing rationale, lack of
experience with the source code, and complex patches [10].
2) Divergent reviewers (DR): A code patch under review
can suffer from divergent reviewers when reviewers cannot
agree on the final evaluation by providing conflicting reviews
and scores [16]. DR can lead to several problems in the
review process including developer abandonment [11], poor
team performance [29] and slow integration processes [8].
3) Low review participation (LRP): This anti-pattern is
defined as the low involvement of reviewers when reviewing a
given code patch. Patches with a low number of reviewers can
be more defect-prone [7]. Rigby et al. [30] showed that the
level of review participation is the most influential factor in
the code review efficiency.
4) Shallow review (SR): The SR anti-pattern happens when the review comments are not relevant for the patch author and/or focus on insignificant details (e.g., code nitpicking, variable names, spacing) instead of addressing quality issues in the patch [5].
5) Toxic review (TR): Developers and reviewers can experience high stress levels due to several socio-technical factors. Toxic conversations and unhealthy interactions may demotivate and burn out developers and reviewers, creating challenges for sustaining open source.
To get more details about the salient aspects of each anti-
pattern type, Table I describes the symptoms and potential
impacts/consequences on the software product, the review
process, and the team.
B. Illustrative examples
To show the salient aspects of MCR anti-patterns, Figure 1 depicts an example of a code review1 taken from the Opendev project, using the Gerrit code review platform. Several anti-patterns can be found in this example, including confusion in reviewer comments. For example, we can see that the reviewer “Sean McGinnis” is not clear about the rationale of the patch through his comment “Do you have a link to somewhere that says this is deprecated? I tried to find one, but I don’t see anything stating they are deprecating this. Nothing in the source either [...]” (cf. discussion box B in Figure 1). Moreover, this code review suffers from low review participation since the patch was updated on May 11, 2018 and the first reviewer comment was only posted on May 29, 2018.

1 https://review.opendev.org/#/c/567926/
TABLE I: MCR anti-patterns and their associated symptoms and consequences.

Confused reviews (CR)
  Symptoms:
  - Reviewers ask question(s) about the rationale or the solution approach of the patch.
  - Reviewers express incertitude in review comments.
  Potential Consequences:
  - Process: Confusion decreases the efficiency and the effectiveness of the review [6], [10].
  - Artifact: The patch may still have poor quality after the review as reviewers may not fully understand the patch [6].
  - People: Reviewers may feel frustrated and may express negative sentiments due to the confusion [6].

Divergent reviews (DR)
  Symptoms:
  - The review decisions do not reach consensus.
  - The review scores diverge.
  - Reviewers post conflicting review comments to each other.
  Potential Consequences:
  - Process: The divergence can slow down the integration process [16].
  - Artifact: It is correlated with negative development outcomes (i.e., patches without changes) [16].
  - People: The divergence increases contributors' likelihood of leaving the community and leads to poor team performance [11], [29].

Low review participation (LRP)
  Symptoms:
  - The patch does not have other developers as reviewers.
  - The patch receives few/short comments.
  - The patch does not receive prompt/timely feedback from reviewers.
  Potential Consequences:
  - Process: The lack of review participation has a negative impact on review efficiency and effectiveness [30].
  - Artifact: Patches with a low number of reviewers can be more defect-prone [7].
  - People: The inefficient reviewer feedback could make the patch author forget the change [5].

Shallow review (SR)
  Symptoms:
  - The patch receives superficial or shallow comments despite the complexity or size of the patch.
  - The review comments mainly focus on the visual representation (e.g., code styling) or minor issue(s).
  - The review comments are unclear, or a concern is raised without a clear explanation.
  - Inline comments are absent despite the complexity of the patch.
  Potential Consequences:
  - Process: Focusing on small and irrelevant issues wastes time for no benefit [5].
  - Artifact: Unknown. There is a potential gap in the literature on its impact.
  - People: Small problems (i.e., a lot of style comments) would make the author annoyed [5].

Toxic review (TR)
  Symptoms:
  - The patch has a controversial discussion that does not relate to/focus on criticizing the code.
  - The patch receives a comment with a negative sentiment (e.g., harsh, rage expressions).
  Potential Consequences:
  - Process: Reviews with negative sentiments take longer to get accepted [31].
  - Artifact: Harmful sentiments could erode the benefits of suggested changes [31].
  - People: Sentiments influence the quality of the relationship between two persons [32].
Furthermore, we observe from the review scores and discussions that there are divergent review scores (cf. box A in Figure 1), in which the reviewer Jay Bryant provided a +2 score (i.e., accept), whereas the reviewer Sean McGinnis provided a -1 score (i.e., reject). Clearly, all these problems resulted in heated discussions between the reviewers and the patch author, and led to the patch being abandoned, as shown in the last comment of the author Eric Harney: “Sure. I mean, it’s not. But ... yeah, let’s go with the ”it doesn’t matter” plan.”. Hence, from this example we can observe the importance of identifying such anti-patterns and preventing them as early as possible during the code review process.
IV. PRELIMINARY EXPERIMENTS
While our long-term agenda is much broader, we conducted
a preliminary study to investigate the phenomenon of anti-
patterns in MCR. We first conducted a manual inspection
to detect the existence of anti-patterns in a sample of code
reviews. Thereafter, we designed our experiments to address two main research questions on the frequency of each anti-pattern type (RQ1) and the prevalence of such anti-patterns in practice (RQ2), to gain more insights into this phenomenon.
A. RQ1: How frequent are MCR anti-patterns?
Context selection. To investigate the frequency of each
MCR anti-pattern in practice, we considered the OpenStack
project that adopts a review process based on the Gerrit review
system. OpenStack is a large open source software ecosystem (i.e., OpenStack attracts more than 100,000 contributors spread across more than 600 repositories [33]), where many well-known organizations and companies collaboratively develop a platform for cloud computing. From the latest available online OpenStack datasets [34], [35], we randomly selected a sample of 100 code reviews to be manually inspected to identify the possible existence of anti-patterns. The random sample spans all repositories and covers reviews from November 2011 to July 2019. This dataset has been used in several similar studies, especially those that motivated our anti-patterns [6], [10], [16], [34]–[36]. We also provide our replication package2 for future replications and extensions of our study.
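As an illustration of this sampling step, the following is a minimal sketch, assuming the review dataset has been exported to a CSV file with one row per code review (the file and column names are hypothetical, not those of the replication package):

```python
# Minimal sketch of drawing a reproducible random sample of code reviews.
import pandas as pd

# Hypothetical export of the OpenStack review dataset, one row per code review.
reviews = pd.read_csv("openstack_reviews.csv")  # e.g., columns: change_id, project, created

# Draw a fixed-seed random sample of 100 reviews for manual inspection.
sample = reviews.sample(n=100, random_state=42)
sample.to_csv("manual_inspection_sample.csv", index=False)
```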
Analysis Method. To detect anti-patterns in the studied sample, each code review has been manually inspected by three authors individually. The manual inspection of each code review consists of reading through (1) the author/reviewers discussion threads, and (2) the source code change diff in Gerrit, in order to identify potential symptoms of MCR anti-patterns. In a preliminary iteration, the authors conducted an open discussion on a sample of 10 code reviews to check whether the observed practices match the definitions and symptoms of the defined anti-patterns. Thereafter, considering the workload required for the manual inspection (100 code reviews × 5 anti-pattern types, i.e., 500 inspections), we divided the authors into two three-person groups, so that each group performs 250 inspections. Each inspection took on average from 5 to 25 minutes, depending on (1) the length of the review discussions and (2) the size of the code change.

2 https://github.com/moatazchouchen/MCRA
Fig. 1: Example of MCR anti-patterns from the OpenStack project, code change ID #567926.
TABLE II: The detection results of MCR anti-patterns.
Anti-pattern # instances Kappa Agreement
Confused Reviews (CR) 21 0.93 Perfect
Divergent Reviews (DR) 20 1.0 Perfect
Low Review Participation (LRP) 32 1.0 Perfect
Shallow Review (SR) 14 0.81 Substantial
Toxic Review (TR) 5 0.65 Substantial
To validate the consistency between the participants' inspections, we measured the inter-rater agreement using Fleiss's Kappa coefficient κ [37]. Fleiss's Kappa coefficient κ is interpreted as follows: poor agreement if κ < 0; slight agreement if 0.01 ≤ κ ≤ 0.20; fair agreement if 0.21 ≤ κ ≤ 0.40; moderate agreement if 0.41 ≤ κ ≤ 0.60; substantial agreement if 0.61 ≤ κ ≤ 0.80; and almost perfect agreement if 0.81 ≤ κ ≤ 1.00. Based on the encouraging Kappa scores (i.e., near perfect for CR, DR, and LRP, and substantial for SR and TR), two three-person groups coded the remaining samples.
Results. Table II shows the frequencies of anti-patterns and the obtained Fleiss's Kappa coefficients. The low review participation (LRP) anti-pattern is the most frequent, affecting 32% of the analyzed code reviews. The divergent review (DR) and confused review (CR) anti-patterns manifest in 20% and 21% of the studied code reviews, respectively. The shallow review (SR) anti-pattern is detected in 14% of the examples, and finally the toxic review (TR) anti-pattern turns out to be the least frequent one, with 5%. Moreover, we achieved perfect agreement for three out of the five anti-patterns (CR, DR, and LRP), with Kappa scores of 0.93, 1.0, and 1.0, respectively. We also achieved substantial agreement for the two remaining anti-patterns, SR and TR, with Kappa scores of 0.81 and 0.65, respectively. The lower agreement level for SR and TR could be explained by the degree of subjectivity needed to detect them. Therefore, manually detecting code review anti-patterns is challenging, which motivates the need for automated support tools and a deeper understanding of the main roots behind these anti-patterns.

TABLE III: The prevalence of MCR anti-patterns.
Category Count
Code reviews affected by one anti-pattern 67
Code reviews affected by two anti-patterns 21
Code reviews affected by three anti-patterns 4
B. RQ2: How prevalent are MCR anti-patterns?
To gain more insights from the detected anti-pattern in-
stances, we further analyze their prevalence in the dataset.
Analysis Method. Our analysis consists of counting the number of anti-patterns that exist in each code review studied in RQ1, regardless of the specific anti-pattern types.
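This counting step can be illustrated with the following minimal sketch, assuming the manual labels are stored in a CSV file with one row per code review and one 0/1 column per anti-pattern type (the file and column names are hypothetical):

```python
# Minimal sketch: counting how many anti-pattern types affect each code review.
import pandas as pd

labels = pd.read_csv("manual_labels.csv")  # hypothetical: change_id, CR, DR, LRP, SR, TR
antipattern_cols = ["CR", "DR", "LRP", "SR", "TR"]

# Number of anti-pattern types per review, regardless of the specific type.
labels["n_antipatterns"] = labels[antipattern_cols].sum(axis=1)
print(labels["n_antipatterns"].value_counts().sort_index())
```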
Results. The obtained results are summarized in Table III.
We observe that 67% of the studied code reviews contain at least one MCR anti-pattern instance. Moreover, 21% of the studied code reviews have at least two anti-pattern instances, and 4% contain three or more anti-patterns. These results indicate that MCR anti-patterns are
indeed highly prevalent, thus practitioners should be aware
of them and consider detecting and preventing them using
dedicated techniques.
V. CONCLUSION AND FUTURE WORK
In this paper, we identified a list of five common MCR
anti-patterns, their symptoms and potential impacts on the
quality of the software product, the process and the team. To
showcase the prevalence of these anti-patterns, we conducted a
study on a random sample of 100 code reviews extracted from
the OpenStack project. Preliminary results show that these anti-patterns are indeed prevalent in MCR, affecting 67% of the studied code reviews. While we do not claim that the provided catalogue is
exhaustive, further investigations should be conducted.
As part of our future work, we plan to study the phe-
nomenon of MCR anti-patterns in more depth by surveying
developers in order to provide an exhaustive list of MCR anti-
patterns. Moreover, we plan to design automated techniques to
detect these anti-patterns. We also plan to study the sensitivity
of MCR anti-patterns detection in different scenarios mainly
within-project detection and cross-project detection. Finally,
we plan to design and integrate dedicated bots in MCR tools
that help developers to avoid such anti-patterns.
Acknowledgments. This work has been supported by the
Natural Sciences and Engineering Research Council of Canada
(NSERC) discovery grant RGPIN-2018-05960, JSPS KAK-
ENHI Grant Numbers 18H04094, 20K19774, and 20H05706.
P. Thongtanunam was partially supported by the Australian
Research Council’s Discovery Early Career Researcher Award
(DECRA) funding scheme (DE210101091).
REFERENCES
[1] A. Bacchelli and C. Bird, “Expectations, outcomes, and challenges of
modern code review,” in 35th International Conference on Software
Engineering (ICSE), 2013, pp. 712–721.
[2] M. Beller, A. Bacchelli, A. Zaidman, and E. Juergens, “Modern code
reviews in open-source projects: Which problems do they fix?” in
Working Conf. on Mining Software Repositories, 2014, pp. 202–211.
[3] M. Fagan, “Design and code inspections to reduce errors in program
development,” in Software pioneers. Springer, 2002, pp. 575–607.
[4] V. Kovalenko, N. Tintarev, E. Pasynkov, C. Bird, and A. Bacchelli,
“Does reviewer recommendation help developers?” IEEE Transactions
on Software Engineering, 2018.
[5] L. MacLeod, M. Greiler, M.-A. Storey, C. Bird, and J. Czerwonka,
“Code reviewing in the trenches: Challenges and best practices,” IEEE
Software, vol. 35, no. 4, pp. 34–42, 2017.
[6] F. Ebert, F. Castor, N. Novielli, and A. Serebrenik, “Confusion in code
reviews: Reasons, impacts, and coping strategies,” in Int. Conference on
Software Analysis, Evolution and Reengineering, 2019, pp. 49–60.
[7] P. Thongtanunam, S. McIntosh, A. E. Hassan, and H. Iida, “Investigating
code review practices in defective files: An empirical study of the qt
system,” in W. Conf. on Mining Soft. Repositories, 2015, pp. 168–179.
[8] T. Hirao, A. Ihara, Y. Ueda, P. Phannachitta, and K.-i. Matsumoto, “The
impact of a low level of agreement among reviewers in a code review
process,” in Int. Conference on Open Source Systems, 2016, pp. 97–110.
[9] H. S. Qiu, A. Nolte, A. Brown, A. Serebrenik, and B. Vasilescu, “Going
farther together: The impact of social capital on sustained participation
in open source,” in 2019 IEEE/ACM 41st International Conference on
Software Engineering (ICSE). IEEE, 2019, pp. 688–699.
[10] F. Ebert, F. Castor, N. Novielli, and A. Serebrenik, “Confusion detec-
tion in code reviews,” in IEEE International Conference on Software
Maintenance and Evolution (ICSME), 2017, pp. 549–553.
[11] W. Huang, T. Lu, H. Zhu, G. Li, and N. Gu, “Effectiveness of conflict
management strategies in peer review process of online collaboration
projects,” in ACM Conf. on Computer-Supported Cooperative Work &
Social Computing, 2016, pp. 717–728.
[12] E. Dietrich, “Manual code review anti-patterns - https://dzone.com/articles/manual-code-review-anti-patterns,” in DZone, Agile Zone, 2017.
[13] T. Gee, “Five code review antipatterns - https://blogs.oracle.com/javamagazine/five-code-review-antipatterns,” in ORACLE Java Magazine, 2020.
[14] T. Knierim, “Code review antipatterns. - https://thomasknierim.com/code-review-antipatterns/,” 2018.
[15] N. Raman, M. Cao, Y. Tsvetkov, C. Kästner, and B. Vasilescu, “Stress
and burnout in open source: Toward finding, understanding, and miti-
gating unhealthy interactions,” in International Conference on Software
Engineering, New Ideas and Emerging Results (ICSE-NIER), 2020.
[16] T. Hirao, S. McIntosh, A. Ihara, and K. Matsumoto, “Code reviews
with divergent review scores: An empirical study of the openstack and
qt communities,” IEEE Transactions on Software Engineering, 2020.
[17] A. Bosu, M. Greiler, and C. Bird, “Characteristics of useful code
reviews: An empirical study at microsoft,” in IEEE/ACM 12th Working
Conference on Mining Software Repositories, 2015, pp. 146–156.
[18] Y. Jiang, B. Adams, and D. M. German, “Will my patch make it? and
how fast? case study on the linux kernel,” in Working Conference on
Mining Software Repositories (MSR), 2013, pp. 101–110.
[19] A. Ram, A. A. Sawant, M. Castelluccio, and A. Bacchelli, “What makes
a code change easier to review: an empirical investigation on code
change reviewability,” in Joint Meeting on European Soft. Eng. Conf.
and Symp. on the Foundations of Software Eng., 2018, pp. 201–212.
[20] O. Baysal, O. Kononenko, R. Holmes, and M. W. Godfrey, “Investigating
technical and non-technical factors influencing modern code review,”
Empirical Software Engineering, vol. 21, no. 3, pp. 932–959, 2016.
[21] P. Thongtanunam, S. McIntosh, A. E. Hassan, and H. Iida, “Review
participation in modern code review,” Empirical Software Engineering,
vol. 22, no. 2, pp. 768–817, 2017.
[22] A. Bosu and J. C. Carver, “Impact of developer reputation on code
review outcomes in oss projects: An empirical investigation,” in Int.
Symp. on Empirical Software Eng. and Measurement, 2014, pp. 1–10.
[23] A. Bosu, J. C. Carver, C. Bird, J. Orbeck, and C. Chockley, “Process
aspects and social dynamics of contemporary code review: Insights from
open source development and industrial practice at microsoft,” IEEE
Transactions on Software Engineering, vol. 43, no. 1, pp. 56–75, 2016.
[24] I. Steinmacher, T. Conte, M. A. Gerosa, and D. Redmiles, “Social bar-
riers faced by newcomers placing their first contribution in open source
software projects,” in 18th ACM conference on Computer supported
cooperative work & social computing, 2015, pp. 1379–1392.
[25] O. Baysal, O. Kononenko, R. Holmes, and M. W. Godfrey, “The secret
life of patches: A firefox case study,” in 19th Working Conference on
Reverse Engineering, 2012, pp. 447–455.
[26] S. McIntosh, Y. Kamei, B. Adams, and A. E. Hassan, “An empirical
study of the impact of modern code review practices on software
quality,” Empirical Software Eng., vol. 21, no. 5, pp. 2146–2189, 2016.
[27] A. Uchôa, C. Barbosa, W. Oizumi, P. Blenílio, R. Lima, A. Garcia, and C. Bezerra, “How does modern code review impact software design degradation? An in-depth empirical study,” 36th ICSME, pp. 1–12, 2020.
[28] M. Greiler, C. Bird, M.-A. Storey, L. MacLeod, and J. Czerwonka,
“Code reviewing in the trenches: Understanding challenges, best prac-
tices and tool needs,” IEEE Software, pp. 34–42, 2016.
[29] A. Filippova and H. Cho, “The effects and antecedents of conflict in free
and open source software development,” in ACM Conf. on Computer-
Supported Cooperative Work & Social Computing, 2016, pp. 705–716.
[30] P. C. Rigby, D. M. German, L. Cowen, and M.-A. Storey, “Peer review
on open-source software projects: Parameters, statistical models, and
theory,” ACM TOSEM, vol. 23, no. 4, pp. 1–33, 2014.
[31] I. E. Asri, N. Kerzazi, G. Uddin, F. Khomh, and M. Janati Idrissi,
“An empirical study of sentiments in code reviews,” Information and
Software Technology, vol. 114, pp. 37 – 54, 2019.
[32] T. Ahmed, A. Bosu, A. Iqbal, and S. Rahimi, “Senticr: A customized
sentiment analysis tool for code review interactions,” in International
Conference on Automated Software Engineering, 2017, pp. 106–111.
[33] Y. Zhang, M. Zhou, A. Mockus, and Z. Jin, “Companies’ participation in
oss development - an empirical study of openstack,” IEEE Transactions
on Software Engineering, 2019.
[34] P. Thongtanunam and A. E. Hassan, “Review dynamics and their impact
on software quality,” IEEE Transactions on Software Engineering, 2020.
[35] X. Yang, R. G. Kula, N. Yoshida, and H. Iida, “Mining the modern code
review repositories: A dataset of people, process and product,” in Int.
Conference on Mining Software Repositories, 2016, pp. 460–463.
[36] K. Hamasaki, R. G. Kula, N. Yoshida, A. E. C. Cruz, K. Fujiwara, and
H. Iida, “Who does what during a code review? datasets of oss peer
review repositories,” in MSR, 2013, pp. 49–52.
[37] J. L. Fleiss, “Measuring nominal scale agreement among many raters,”
Psychological bulletin, vol. 76, no. 5, p. 378, 1971.
Article
Selecting reviewers for code changes is a critical step for an efficient code review process. Recent studies propose automated reviewer recommendation algorithms to support developers in this task. However, the evaluation of recommendation algorithms, when done apart from their target systems and users (i.e. code review tools and change authors), leaves out important aspects: perception of recommendations, influence of recommendations on human choices, and their effect on user experience. This study is the first to evaluate a reviewer recommender in vivo. We compare historical reviewers and recommendations for over 21,000 code reviews performed with a deployed recommender in a company environment and set to measure the influence of recommendations on users' choices, along with other performance metrics. Having found no evidence of influence, we turn to the users of the recommender. Through interviews and a survey we find that, though perceived as relevant, reviewer recommendations rarely provide additional value for the respondents. We confirm this finding with a larger study at another company. The confirmation of this finding brings up a case for more user-centric approaches to designing and evaluating the recommenders. Finally, we investigate information needs of developers during reviewer selection and discuss promising directions for the next generation of reviewer recommendation tools.