Understanding Code Smell Detection via Code
Review: A Study of the OpenStack Community
Xiaofeng Han1, Amjed Tahir2, Peng Liang1∗, Steve Counsell3, Yajing Luo1
1School of Computer Science, Wuhan University, Wuhan, China
2School of Fundamental Sciences, Massey University, Palmerston North, New Zealand
3Department of Computer Science, Brunel University London, London, United Kingdom
Abstract—Code review plays an important role in software
quality control. A typical review process would involve a careful
check of a piece of code in an attempt to ﬁnd defects and
other quality issues/violations. One type of issue that may
impact the quality of the software is code smells, i.e., bad
programming practices that may lead to defects or maintenance
issues. Yet, little is known about the extent to which code smells
are identiﬁed during code reviews. To investigate the concept
behind code smells identiﬁed in code reviews and what actions
reviewers suggest and developers take in response to the identiﬁed
smells, we conducted an empirical study of code smells in code
reviews using the two most active OpenStack projects (Nova
and Neutron). We manually checked 19,146 review comments
obtained through keyword search and random selection, identifying 1,190 smell-related reviews used to study the causes of code smells and the actions taken against the identified smells. Our analysis found
that 1) code smells were not commonly identiﬁed in code reviews,
2) smells were usually caused by violation of coding conventions, 3)
reviewers usually provided constructive feedback, including ﬁxing
(refactoring) recommendations to help developers remove smells,
and 4) developers generally followed those recommendations and
actioned the changes. Our results suggest that 1) developers
should closely follow coding conventions in their projects to avoid
introducing code smells, and 2) review-based detection of code
smells is perceived to be a trustworthy approach by developers,
mainly because reviews are context-sensitive (as reviewers are
more aware of the context of the code given that they are part
of the project’s development team).
Index Terms—Code Review, Code Smell, Mining Software
Repositories, Empirical Study
I. INTRODUCTION

Code smells are identified as symptoms of possible code or design problems, which may potentially have a negative impact on software quality attributes such as maintainability, code readability, testability, and defect-proneness.
A large number of studies have focused on smell detection and removal techniques, and many static analysis tools for smell detection are available, including PMD, SonarQube, and Designite. However, previous work has indicated that the program context and domain are important in identifying smells. This makes it difficult for program analysis tools to correctly identify smells, since contextual information is rarely taken into account. Existing smell detection tools are also known to produce false positives; therefore, manual detection of smells could be considered more valuable than automatic approaches.

This work was partially funded by the National Key R&D Program of China with Grant No. 2018YFB1402800.
Code review is a process that aims to verify the quality of the software by detecting defects and other issues in the code, and to ensure that the code is readable, understandable, and maintainable. It has been linked to improved quality, reduced defects, reduced anti-patterns, and the identification of vulnerabilities. Compared with static analysis tools for smell detection, code reviews are usually performed by developers belonging to the same project, so it is possible that reviewers will take full account of contextual information and thus better identify code smells in the code.
However, little is known about the extent to which code smells are identified during code reviews, and whether developers (the code authors) take any action when a piece of code is deemed “smelly” by reviewers. Therefore, we set out to study the concept behind code smells identified in code reviews and to track the actions taken after reviews were carried out. To this end, we mined code review discussions from the two most active OpenStack projects: Nova and Neutron. We then conducted a quantitative and qualitative analysis to study how common it was for reviewers to identify code smells during code review, why the code smells were introduced, what actions reviewers recommended for those smells, and how developers proceeded with those recommendations. In total, we analyzed 1,190 smell-related reviews obtained by manually checking 19,146 review comments.
Our results suggest that: 1) code smells are not widely identified in modern code reviews, 2) following coding conventions can help reduce the introduction of code smells, 3) reviewers usually provide useful suggestions to help developers better fix the identified smells, while developers commonly accept reviewers’ recommendations regarding the identified smells and tend to refactor their code based on those recommendations, and 4) review-based detection of code smells is seen as a trustworthy mechanism by developers.
The paper is structured as follows: related work is presented
in Section II, the study design and data extraction methods are
explained in Section III, the results are presented in Section
IV, followed by a discussion of the results in Section V, and
the threats to the validity of the results are covered in Section
VI, followed by conclusions and future work in Section VII.
II. RELATED WORK
A. Studies on Code Smells
A growing number of studies have investigated the impact of code smells on software quality, including defects, maintenance, and program comprehension. Other studies have looked at the impact of code smells on software quality using a group of developers working on a specific project.
Tufano et al. mined the version histories of 200 open source projects to study when code smells were introduced and the main reasons behind their introduction. It was found that smells generally appeared as a result of maintenance and evolution activities. Sjøberg et al. investigated the relationship between the presence of code smells and maintenance effort through a set of controlled experiments. Their study did not find significant evidence that the presence of smells led to increased maintenance effort. Previous studies also include work investigating the impact of different forms of smells on software quality, such as architectural smells, test smells, and spreadsheet smells.
A number of previous studies have investigated developers’ perception of code smells and their impact in practice. A
survey on developers’ perception of code smells conducted
by Palomba et al. found that developer experience and system knowledge are critical factors for the identification of code smells. Yamashita and Moonen reported that developers are moderately concerned about code smells in their code. A more recent study by Taibi et al. replicated the two previous studies and found that the majority of developers always considered smells to be harmful; however, it was found that developers perceived smells as critical in theory, but not as much in practice. Tahir et al. mined posts
from Stack Exchange sites to explore how the topics of code
smells and anti-patterns were discussed amongst developers.
Their study found that developers widely used online forums
to ask for general assessments of code smells or anti-patterns
instead of asking for particular refactoring solutions.
B. Code Reviews in Software Development
Code review is an integral part in modern software devel-
opment. In recent years, empirical studies on code reviews
have investigated the potential code review factors that affect
software quality. For example, McIntosh et al. investigated the impact of code review coverage and participation on
software quality in the Qt, VTK, and ITK projects. The authors
used the incidence rates of post-release defects as an indicator
and found that poorly reviewed code (e.g. with low review
coverage and participation) had a negative impact on software
quality. A study by Kemerer et al. investigated the impact
of review rate on software quality. The authors found that the
Personal Software Process review rate was a signiﬁcant factor
affecting defect removal effectiveness, even after accounting
for developer ability and other signiﬁcant process variables.
Several studies have investigated the impact of modern code review on software quality. Other studies have also investigated the impact of code reviews on different aspects of software quality, such as vulnerabilities, design decisions, anti-patterns, and code smells.
Aziz and Apatta examined review comments from code reviewers and described the need for an empirical analysis of the relationship between code smells and peer code review. Their preliminary analysis of review comments from the OpenStack and WikiMedia projects indicated that code review
processes identiﬁed a number of code smells. However, the
study only provided preliminary results and did not investigate
the causes or resolution strategies of these smells. A more
recent study by Pascarella et al. found that code reviews helped in reducing the severity of code smells in source code, but this was mainly a side effect of other changes unrelated to the smells themselves.
III. STUDY DESIGN

A. Research Questions
The goal of this study is to investigate the concept behind
code smells identiﬁed during code reviews and what actions
are suggested by those reviewers and performed by developers
in response to the identiﬁed smells. To achieve this goal, we
formulated the following three research questions (RQs).
RQ1: Which code smells are the most frequently identiﬁed
by code reviewers?
Rationale: This question aims to find out how frequently smells are identified by code reviewers and which particular code smells are repeatedly detected by reviewers. Such information can help improve developers’ awareness of these frequently identified code smells.
RQ2: What are the common causes for code smells that
are identiﬁed during code reviews?
Rationale: This question investigates the main reasons
behind the identiﬁed smells as explained by the reviewers
or developers. When reviewing code, reviewers can express
why they think the code under review may contain a smell.
Developers can also reply to reviewers and explain how they
introduced the smells. Understanding the common causes of
smells identiﬁed manually by reviewers will shed some light
on the effectiveness of manual detection of smells and help
developers better understand the nature of identiﬁed smells
and reduce such smells in the future.
RQ3: How do reviewers and developers treat the identified code smells?
Rationale: This question investigates the actions suggested
by reviewers and those taken by developers on the identiﬁed
smells. When a smell is identified, reviewers can provide suggestions to resolve the smell, and developers can then decide whether to fix or ignore it. This question can be further decomposed into three sub-research questions from the perspectives of reviewers, developers, and the relationship between their actions:

Fig. 1: An overview of our data mining and analysis process
RQ3.1: What actions do reviewers suggest to deal with
the identiﬁed smells?
RQ3.2: What actions do developers take to resolve the identified smells?
RQ3.3: What is the relationship between the actions
suggested by reviewers and those taken by developers?
B. OpenStack Projects and Gerrit Review Workﬂow
OpenStack is a set of software tools for building and managing cloud computing platforms, and one of the largest open source communities. Based on the most recent data, OpenStack projects contain around 13 million lines of code, contributed by around 12k developers. We deemed the platform to be appropriate for our analysis, since the community has long invested in its code review process.
We then selected two of the most active OpenStack projects as our subject projects: Nova (a fabric controller) and Neutron (a network connectivity platform). Table I provides an overview of the data obtained from the two projects. Both projects are written in Python and use Gerrit, a web-based code review platform built on top of Git. The Gerrit review workflow is explained next.
TABLE I: An overview of the subject projects (Nova and Neutron)

Project Review Period #Code Changes #Comments
Nova Jan 14 - Dec 18 22,762 156,882
Neutron Jan 14 - Dec 18 15,256 152,429
Total 38,018 309,311
As of October 2020: https://www.openhub.net/p/openstack

Gerrit provides a detailed code review workflow. First, a developer (author) makes a change to the code and submits the code (patch) to the Gerrit server so that it can be reviewed.
Then, verification bots check the code using static analysers and run automated tests. A reviewer (usually another developer who has not been involved in writing the code under review) will then conduct a formal review of the code and provide
comments. The original author can reply to the reviewer’s
comments and action the required changes by producing a
new revision of the patch. This process is repeated until the
change is merged to the code base or abandoned by the author.
C. Mining Code Review Repositories
Fig. 1 outlines our data extraction and mining process. We
mined code review data via the RESTful API provided by
Gerrit, which returns the results in a JSON format. We used
a Python script to automatically mine the review data in the
studied period and store the data in a local database. In total,
we mined 38,018 code changes and 309,311 review comments
between Jan 2014 and Dec 2018 from the two projects.
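For illustration, the mining step can be sketched as follows. The server URL, query parameters, and page size are our assumptions (the actual script and database layer are in the replication package), but the handling of the ")]}'" prefix reflects Gerrit's documented JSON response format:

```python
import json
import urllib.request

# Assumed endpoint; the paper does not name the Gerrit server it queried.
GERRIT = "https://review.opendev.org"

def parse_gerrit_json(raw: str):
    """Gerrit prefixes every JSON response with ")]}'" to prevent
    cross-site script inclusion; strip it before parsing."""
    prefix = ")]}'"
    if raw.startswith(prefix):
        raw = raw[len(prefix):]
    return json.loads(raw)

def changes_url(project: str, after: str, before: str, start: int = 0) -> str:
    """Build a /changes/ query URL for one project and review period
    (query syntax per the Gerrit REST API; the page size is our choice)."""
    query = f"project:{project}+after:{after}+before:{before}"
    return f"{GERRIT}/changes/?q={query}&start={start}&n=100"

def fetch_changes(project: str, after: str, before: str, start: int = 0):
    """Fetch one page of code changes; callers paginate by advancing `start`."""
    with urllib.request.urlopen(changes_url(project, after, before, start)) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))
```

Pagination via repeated calls with increasing `start` would then collect all 38,018 changes for the studied period.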
D. Building the Keyword Set
To locate code review comments that include code smell
discussions, we used several variations of terms referring to
code smells or anti-patterns, including “code smell”, “bad
smell”, “bad pattern”, “anti-pattern”, and “technical debt”. In
addition, considering that reviewers may point out the speciﬁc
code smell by its name (e.g., dead code) rather than using
generic terms, we included a list of code smell terms obtained from Tahir et al., who extracted these smell terms from several relevant studies on this topic, including the first work on code smells by Fowler and the systematic review by Zhang et al. The list of smell terms used in our study is shown in Table II.
Since the effectiveness of a keyword-based mining approach relies on the set of keywords used in the search, we followed the systematic approach used by Bosu et al. to identify the keywords included in our search. This includes the following steps:
1) Build an initial keyword set.
2) Build a corpus by searching for review comments that contain at least one keyword of our initial keyword set (e.g., “dead” or “duplicated”) in the code review data we collected in Section III-C.
3) Process the identified review comments that contain at least one keyword of our initial keyword set, and then apply identifier splitting rules (i.e., “isDone” becomes “is done” and “is_done” becomes “is done”).
4) Create a list of tokens for each document in the corpus.
5) Clean the corpus by removing stopwords, punctuation, and numbers, and then convert all words to lowercase.
6) Apply the Porter stemming algorithm to obtain the stem of each token.
7) Create a Document-Term matrix from the corpus.
8) Find additional words that co-occurred frequently with each of our initial keywords (co-occurrence probability of 0.05 in the same document).

(The keyword processing steps were implemented using the NLTK package: http://www.nltk.org)

TABLE II: Code smell terms included in our mining

Code Smell Terms
Accidental Complexity, Anti Singleton, Bad Naming, Blob Class,
Circular Dependency, Coding by Exception, Complex Class, Complex Conditionals,
Data Class, Data Clumps, Dead Code, Divergent Change,
Duplicated Code, Error Hiding, Feature Envy, Functional Decomposition,
God Class, God Method, Inappropriate Intimacy, Incomplete Library Class,
ISP Violation, Large Class, Lazy Class, Long Method,
Long Parameter List, Message Chain, Middle Man, Misplaced Class,
Parallel Inheritance Hierarchies, Primitive Obsession, Refused Bequest, Shotgun Surgery,
Similar Subclasses, Softcode, Spaghetti Code, Speculative Generality,
Suboptimal Information Hiding, Swiss Army Knife, Temporary Field, Use Deprecated Components
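As an illustration of steps 3, 4-5, and 8, the dependency-free sketch below implements identifier splitting, token cleaning, and the per-document co-occurrence probability. The study used the NLTK package; the abbreviated stopword list, the omission of Porter stemming, and all function names here are our own simplifications:

```python
import re
from collections import Counter

# Abbreviated stopword list for illustration; NLTK ships a full one.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "or", "in"}

def split_identifier(token: str):
    """Step 3: split camelCase and snake_case identifiers,
    e.g. 'isDone' and 'is_done' both become ['is', 'done']."""
    token = token.replace("_", " ")
    token = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", token)
    return token.lower().split()

def tokenize(comment: str):
    """Steps 4-5: tokenize, split identifiers, drop stopwords,
    punctuation and numbers, and lowercase everything."""
    words = []
    for raw in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", comment):
        words.extend(w for w in split_identifier(raw) if w not in STOPWORDS)
    return words

def cooccurrence_probability(corpus, seed: str):
    """Step 8: P(word appears in a document | the seed keyword appears in it),
    computed over the documents that contain the seed."""
    docs = [set(tokenize(c)) for c in corpus]
    with_seed = [d for d in docs if seed in d]
    counts = Counter(w for d in with_seed for w in d if w != seed)
    return {w: n / len(with_seed) for w, n in counts.items()} if with_seed else {}
```

Words whose conditional probability exceeds the 0.05 threshold would then be candidate additions to the keyword set.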
After performing these eight steps, we found that no additional keywords co-occurred with any of our initial keywords at the 0.05 co-occurrence probability threshold. Therefore, we believe that our initial keyword set is sufficient to support the keyword-based mining method. Due to space constraints, we provide the initial set of keywords (which is the same as the final set of keywords) associated with code smells in our replication package.
E. Identifying Smell-related Reviews in Keyword-searched Review Comments

We followed four steps to identify smell-related reviews.

In step one, we developed a Python script to search for
review comments that contained at least one of the keywords
identiﬁed in Section III-D. The search returned a total of
18,082 review comments from the two projects.
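A minimal sketch of such a search, assuming word-boundary matching so that, e.g., "deadline" does not match the keyword "dead code" (the exact matching rules are not specified in the paper):

```python
import re

def smell_keyword_pattern(keywords):
    """Compile one case-insensitive pattern with word boundaries
    around the whole alternation of keywords."""
    alternation = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

def search_comments(comments, keywords):
    """Return the review comments that contain at least one smell keyword."""
    pattern = smell_keyword_pattern(keywords)
    return [c for c in comments if pattern.search(c)]
```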
In step two, to increase the reliability of our verification process, two of the authors independently and manually analyzed the review comments obtained in step one to exclude comments clearly unrelated to code smells. If a review comment was deemed by both coders to be unrelated to code smells, it was excluded. As a result of this step, the number of review comments was reduced to 3,666.
To illustrate this process, consider the following two re-
view comments that contain the keyword “dead”. In the ﬁrst
example, the reviewer commented that “why not to put the
port on dead vlan ﬁrst?”11. Although this comment contains
the keyword “dead”, both coders thought that it was unrelated
to code smells, and the comment was therefore excluded. In
the second example, the reviewer commented “remove dead
code”12, which was regarded as related to “dead code” by the
two coders and thus was included in the analysis.
In step three, two of the authors worked together to further
manually analyze the remaining review comments. The same
two authors carefully analyzed the contextual information of
each review comment, including the code review discussions
and associated source code to determine whether the code
reviewers identiﬁed any smells in the review comments. We
considered a comment to be related to code smell only when
both coders agreed. The agreement between the two authors
was calculated using Cohen’s Kappa coefficient, which was 0.85. When the coders were unsure or disagreed, a third
author was then involved in the discussion until an agreement
was reached. This resulted in a reduction in the number of
review comments to 1,235.
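For reference, Cohen's Kappa for two coders labeling the same items can be computed as below; this is the standard formula, not the specific tooling used in the study:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance from each
    coder's marginal label frequencies."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)
```

A value of 0.85, as obtained here, is conventionally read as almost perfect agreement.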
To better explain our selection process, consider the two ex-
amples in Fig. 2. In the top example13, the reviewer suggested
adding another argument to the method to eliminate code
duplication. Then the developer replied “Done”, which implies
an acknowledgment of the code duplication. We considered
this as a clear smell-related review, and the review comment
was retained for further analysis. In contrast, in the bottom
example14, we observed that the comment was just used to
explain the meaning of the “DRY” principle, but did not
indicate that the code contained duplication according to the
context. Thus, this comment was excluded from analysis.
Finally, in step four, we recorded the contextual information
of each review comment in an external text ﬁle for further
analysis, which contained: 1) a URL to the code change,
2) the type of the identiﬁed code smell, 3) the discussion
between reviewers and developers, and 4) a URL to the
source code. We ended up with a total of 1,174 smell-related reviews (we note that several review comments appearing in the same discussion were merged).

Fig. 2: Review comments related to ‘duplicated code’: the top review is smell-related, while the bottom one is not.

An example of an extracted source file is shown below:
Code Change URL: http://alturl.com/2ne85
Code Smell: Dead Code
Code Smell Discussions:
1) Reviewer: “Looks like copy-paste of above and,
more importantly, dead code.”
2) Developer: “yes, sorry for that.”
Source Code URL: http://alturl.com/yai68
F. Identifying Smell-related Reviews in Randomly-selected Review Comments

Knowing that reviewers and developers may not use the
same keywords as we used in Section III-E when detecting and
discussing code smells during code review, we supplemented
our keyword-based mining approach by including a randomly
selected set of review comments from the rest of the review
comments (291,229) that did not contain any of the keywords
used in Section III-D. Based on a 95% confidence level and a 3% margin of error, we ended up with an additional 1,064 review comments. We then followed the same process of
manual analysis (i.e., from step two to step four as described
in Section III-E) to identify smell-related reviews in these
randomly selected review comments. Finally, we identiﬁed a
total of 16 smell-related reviews.
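The sample size follows from Cochran's formula with a finite-population correction (assumed here; the paper cites a statistics reference for the calculation). With N = 291,229, z = 1.96 (95% confidence), e = 0.03, and p = 0.5, this yields 1,064:

```python
import math

def sample_size(population: int, z: float = 1.96,
                margin: float = 0.03, p: float = 0.5) -> int:
    """Cochran's formula, n0 = z^2 * p(1-p) / e^2, followed by the
    finite-population correction n = n0 / (1 + (n0 - 1) / N)."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)
```

Here `sample_size(291229)` returns 1064, matching the number of randomly selected comments.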
Combined with the reviews obtained by keyword search in Section III-E, we finally obtained a total of 1,190 smell-related reviews for further analysis. We provide a full replication package containing all the data, scripts, and results online.
G. Manual Analysis and Classiﬁcation
For RQ1, in Sections III-E and III-F, we identiﬁed and
recorded the smell type of each review when analyzing the
review comments. When a reviewer used general terms (e.g.,
“smelly” or “anti-pattern”) to describe the identiﬁed smell, we
classiﬁed the type in these reviews as “general”. The others
were classiﬁed as speciﬁc smell (e.g., “duplicated code”).
For RQ2, we adopted Thematic Analysis to find the causes for the identified code smells in Sections III-E and III-F. We used MAXQDA - a software package for
qualitative research - to code the contextual information of the
identiﬁed code smells. Firstly, we coded the collected smell-
related reviews by highlighting sections of the text related to
the causes of the code smell in the review. When no cause
was found, we used “cause not provided/unknown”. Next, we looked over all the codes that we created to identify common patterns among them and generated themes. We then reviewed
the generated themes by returning to the dataset and comparing
our themes against it. Finally, we named and deﬁned each
theme. This process was performed by the same two coders
in Sections III-E and III-F. A third author was involved in
cases of disagreement by the two coders.
For RQ3, we decided to manually check the code reviews
obtained in Sections III-E and III-F to identify the actions
suggested by reviewers and taken by developers.
For RQ3.1, we categorized the actions recommended by reviewers into three categories, as proposed in prior work:
1) Fix: recommendations are made to refactor the code.
2) Capture: the reviewer detects that there may be a code smell, but gives no direct refactoring recommendation.
3) Ignore: recommendations are to ignore the identified smell.
For RQ3.2, we investigated how developers responded to reviewers who identified code smells in their code. We conducted
this analysis in three steps: We ﬁrst checked the developer’s
response to the reviewer in the discussion (Gerrit provides
a discussion platform for both reviewers and developers).
Second, we investigated the associated source code ﬁle(s) of
the patch before the review was conducted, and the changes
in the source code made after the review. Finally, if the
developers neither responded to reviewers nor modiﬁed source
code, we then checked the status (i.e., merged or abandoned)
of the corresponding code change.
We considered the identified code smells to be resolved in three cases: 1) the original developer self-admitted a refactoring (as part of the review discussion), 2) changes were made in the source code file(s), or 3) the corresponding code change was abandoned.
For RQ3.3, based on the results of RQ3.1 and RQ3.2, we categorized the relationship between the actions recommended by reviewers and those taken by developers into the following three categories:
1) A developer agreed with the reviewer’s recommendation.
2) A developer disagreed with the reviewer’s recommendation.
3) A developer did not respond to the reviewer’s comments.
These three categories were then mapped into two actions: 1) fixed the smell (i.e., refactoring was done) or 2) ignored the change (i.e., no changes were made to the source code with regard to the smell).
This process was conducted by the ﬁrst author and the result
of each step was cross-validated by another author. Again, a
third author was involved in cases of disagreement. In total, the manual analysis process took around thirty days of full-time work by the coders. We also provide a full replication package containing all the data, scripts, and results from the manual analysis online.
IV. RESULTS
In this section, we present the results of our three RQs. We note that, due to space constraints, detailed results of our analysis are provided externally.
RQ1: Which code smells are the most frequently iden-
tiﬁed by code reviewers?
Figure 3 shows the distribution of code smells identiﬁed
in the code reviews obtained in Sections III-E and III-F. In
general, we identiﬁed 1,190 smell-related reviews. Compared
to the number of all the review comments we obtained, we
found that code smells are not commonly identiﬁed in code
reviews. In addition, of all the code smells we identiﬁed,
duplicated code is by far the most frequently identiﬁed smell
by name, with exactly 620 instances. The smells of bad
naming and dead code were also frequently identiﬁed, as they
were discussed in 304 and 221 code reviews, respectively.
There were 30 code reviews which identiﬁed long method,
while other smells such as circular dependency and swiss army
knife were discussed in only 4 code reviews. The rest of the code reviews (11) used general terms (e.g., code smell) to describe the identified smells.
Fig. 3: Number of reviews for the identiﬁed code smells
RQ1: the most frequently identiﬁed smells in code reviews.
Code smells are not widely identiﬁed in code reviews. Of the
identified smells, duplicated code, bad naming, and dead code
are the most frequently identiﬁed smells in code reviews.
RQ2: What are the common causes for code smells that
are identiﬁed during code reviews?
For RQ2, we used Thematic Analysis to identify the com-
mon causes for the identiﬁed code smells as noted by code
reviewers or developers. We then identiﬁed ﬁve causes:
•Violation of coding conventions: certain violations of
coding conventions (e.g. naming convention) cause the
smell. (Example: “moreThanOneIp (CamelCase) is not
our naming convention”16).
•Lack of familiarity with existing code: developers intro-
duced the smell due to unfamiliarity with the functionality
or structure of the existing code. (Example: “this useless
line because None will be returned by default”17).
•Unintentional mistakes of developers: the developer
forgets to ﬁx the smell or introduces the smell by mistake.
(Example: “You can see I renamed all of the other test
methods and forgot about this one”18).
•Improper design: the smell is identiﬁed to be related to
improper design of the code. (Example: “...If that’s the
case something is smelly (too coupled)...”19).
•Detection by code analysis tools: the reviewer points out that the smell was detected by code analysis tools. (Example: “pass is considered as dead code by python coverage tool”).

Fig. 4: Reasons for the identified smells
As demonstrated in Fig. 4, we found that the majority of
reviews (70%) did not provide any explanation for the iden-
tiﬁed smells - in most cases, the reviewer(s) simply pointed
out the problem, but did not provide any further reasoning
for their decisions. 276 (23%) of the reviews indicate that
violation of coding conventions is the main reason for the
smell. For example, a reviewer suggested that the developer
should adhere to the naming standard of ‘test [method under
test] [detail of what is being tested]’, as shown below:
Reviewer: “Please adhere to the naming standard
of ‘test [method under test] [detail of what is being
tested]’ to ensure that future maintainers will have
an easier time associating tests and the methods they test.”
In addition, 40 (3%) of the reviews indicate that the smells
were caused by developers’ lack of familiarity with existing
code. An example of such a case is shown below. In this case,
the reviewer pointed out that the exception handling should
be removed. It could imply that the developer was not aware
that the speciﬁc exception is not raised.
Reviewer: “on block device.BlockDeviceDict.from api(),
exception.InvalidBDMVolumeNotBootable does not
raise. so it is necessary to remove the exception here.”
Nineteen reviews attributed unintentional mistakes of developers (such as copy-and-paste errors) as the cause of the smell, similar to the example shown below:
Reviewer: “I think you forgot to remove this.”
Developer: “Darn, yes bad copy / paste. Will ﬁx it.”
Eighteen reviews indicate that improper design was the
cause for the identiﬁed smell. In the rest (6) of the reviews,
reviewers would note that the smell was detected by code
analysis tools. For example, a reviewer pointed out that the
code ‘pass’ would be regarded as dead code by coverage tool.
Reviewer: “you can remove ‘pass’, it’s commonly
considered as dead code by coverage tool”
RQ2: common causes for smells as identified during code reviews. Taken overall, over half of the reviews did not provide an explanation of the cause of the smells. In terms of the formulated causes, violation of coding conventions is the main cause for the smells as noted by reviewers and developers.
RQ3: How do reviewers and developers treat the iden-
tiﬁed code smells?
RQ3.1: What actions do reviewers suggest to deal with
the identiﬁed smells?
The results of this research question are shown in Table
III. In the majority of reviews (870, representing 73% of all reviews), reviewers recommended a fix for resolving the identified code smells. These fixes include either general directions (such as the name of a refactoring technique to be used) or specific actions (pointers to specific changes to the code base that could remove the smell). 303 (35%) of these fixes provided example code snippets to help developers
better refactor the smells. Below is an example of a review that
suggested a ﬁx recommendation. In this example, the reviewer
suggested removing duplicated code from a test case, and also
provided a working example of how to apply “extract method”
refactoring to deﬁne a new test method, so that it could be
referenced from multiple methods to remove code duplication.
Reviewer: “I think you can do function that remove
duplicated code, something like that following...”
def compare(self, exp_real):
    for exp, real in exp_real:
        self.assertEqual(exp['count'], real.count)
        self.assertEqual(exp['alias_name'], real.alias_name)
        self.assertEqual(exp['spec'], real.spec)
272 reviews (23%) fell under the capture category. In those
reviews, the reviewers just pointed to the presence of the
smells, but did not provide any refactoring suggestions. In
a small number of reviews (48, 4%), reviewers suggested
ignoring the code smell found in the code review.
TABLE III: Actions recommended by reviewers to resolve
smells in the code
Reviewer’s recommendation Count
Fix (without recommending any speciﬁc implementation) 567
Fix (provided speciﬁc implementation) 303
Capture (just noted the smell) 272
Ignore (no side effects) 48
RQ3.2: What actions do developers take to resolve the identified smells?
identiﬁed code smells versus the number of ﬁxes of the
identiﬁed code smells. Of the 1,190 code smells identiﬁed
in the reviews, the majority (1,029, representing 86%) were
refactored by the developers after the review (i.e., changes
were made to the patch). The remainder did not result in any
changes in the code, indicating that the developers chose to
ignore such recommendations. This could be a case where
developers thought that those smells were not as harmful as
suggested by the reviewers, or that there were other issues
requiring more urgent attention, resulting in those smells being
counted as technical debt in the code.
As per the results of RQ1, duplicated code, bad naming,
and dead code were the most frequently identiﬁed smells
by reviewers. Those smells were also widely resolved by
developers. 508 (82%) of the duplicated code, 276 (91%) of the bad naming, and 210 (95%) of the dead code smell instances were
refactored by developers after they were identiﬁed in the
reviews. The proportion of other smells being ﬁxed was nearly
78% (35/45). However, the sample size for these smells (35
instances) is still too small to make any generalisations.
Below is an example of a review with a recommendation
by the reviewer to remove dead code in Line 132 of the
original ﬁle (i.e., remove the pass statement); the developer
then agreed to the reviewer’s recommendation and deleted the
unused code. Fig. 5 shows the code before review (5a) and
after the action taken by the developer (5b).
Reviewer: “you can remove ‘pass’, it’s commonly
considered as dead code by coverage tool”
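A hedged sketch of the kind of pattern the reviewer is describing is shown below; the function and names are invented, not taken from the Nova patch:

```python
import logging

LOG = logging.getLogger(__name__)

def release_resource(resource):
    """Hypothetical handler; names are illustrative, not from Nova."""
    try:
        resource.close()
    except AttributeError:
        # Once the handler body contains a real statement, a trailing
        # 'pass' is dead code: it changes nothing at runtime and shows
        # up as a never-needed line in coverage reports.
        LOG.warning('resource has no close() method')
        pass  # <- the kind of line the reviewer asked to delete
```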
TABLE IV: Developers' actions to code smells identified

Code smell          | #Reviews | #Fixed by developers | % of fixes
Duplicated Code     | 620      | 508                  | 82%
Bad Naming          | 304      | 276                  | 91%
Dead Code           | 221      | 210                  | 95%
Long Method         | 30       | 25                   | 83%
Circular Dependency | 3        | 2                    | 67%
Swiss Army Knife    | 1        | 1                    | 100%
General Smell       | 11       | 7                    | 64%
Total               | 1190     | 1029                 | 86%
RQ3.3: What is the relationship between the actions
suggested by reviewers and those taken by developers?
To answer this RQ, a map of reviewer recommendations
and resulting developer actions is shown in Fig. 6. In 775
(65%) of the obtained reviews, developers agreed with the
reviewers’ suggestions and took exactly the same actions
(either ﬁx or ignore) as suggested by reviewers. Of those cases,
there were 20 cases where developers agreed with reviewers on
ignoring the smell (i.e., a smell has been identiﬁed, but the
reviewer may think that the impact of the smell is minor). The
example below shows a case where a reviewer pointed out that
they could accept duplicated code if there was a reasonable
justiﬁcation and the developer gave their explanation and
ignored the smell.
Reviewer: “...I just don’t like duplicated code but if
there is a reasonable justiﬁcation for this I can be sold
cheaply and easily.”
Developer: “we need create_vm here to support a
lot of the other testing in this method. I agree it’s
duplicate code, but it’s needed here too and this one is
more complex that (sic) the test_config one....”
In 274 (23%) reviews, even when developers did not
respond to reviewers directly in the review system, they
still made the required changes to the source code ﬁles.
We noted another 66 (5%) reviews where developers had
different opinions from reviewers and decided to ignore the
recommendations to refactor the code and remove the smell. In
those cases, the developers themselves decided that the smells
were either not as critical as perceived by the reviewers, or there were time or project constraints preventing them from implementing the changes, which is typically self-admitted technical debt. An example review is shown below:
Reviewer: "This method has a lot duplicated code of '_apply_instance_name_template'. The differ in the use of 'index' and the CONF parameters. With a bit refactoring only one method would be necessary I..."
Developer: "I thought to make / leave this separate in case one wants to configure the multi_instance_name_template different to that of sin..."
Similarly, there were also 75 (6%) reviews in which
developers neither replied to reviewers nor modiﬁed the
source code. For those cases, we assume that developers did
not ﬁnd the recommendations regarding how to deal with
the speciﬁc smells in the code helpful, and therefore decided
not to perform any changes. In all of those cases, no further
explanation/reasons were provided by the developers on why
they ignored these recommended changes.
RQ3: reviewers' recommendations and developers' actions.
In most reviews, reviewers provided ﬁxing (refactoring) rec-
ommendations (e.g., in the form of code snippets) to help
developers remove the identiﬁed smells. Developers generally
followed those recommendations and performed the suggested
refactoring operations, which then appeared in the patches
committed after the review.
V. DISCUSSION
A. RQ1: The most frequently identified smells
In general, code smells are not commonly identiﬁed during
code reviews. The results of RQ1 imply that duplicated code,
bad naming, and dead code were, by far, the most frequently
identified code smells in code reviews. The results regarding duplicated code are in line with previous findings, which indicate that this smell is also frequently discussed among developers in online forums, and is also the smell that developers are most concerned about. However, dead code and bad naming were not ranked highly in previous studies.
These differing results are likely due to differences in context and domain, which are critical in identifying smells, as shown by previous studies. The results reported in those two previous studies are based on a more generic investigation of code smells among online Q&A forums' users and
developers. The context of some of these code smells was not
fully taken into account, even if the developers may provide
some speciﬁc scenario to explain their views. In contrast,
our study is project-centric, and the context of the identiﬁed
code smells during code reviews is known to reviewers and
developers involved in identiﬁcation and removal of the smells.
B. RQ2: The causes for identiﬁed smells
We identiﬁed ﬁve types of common causes for code smells
in code reviews (RQ2). Among these, violation of coding
Fig. 5: An example of a remove dead code operation after review (the change is highlighted in Line 132 (a)). (a) method before review. (b) after change made by the developer.
Fig. 6: A treemap of the relationship between developers’
actions in response to reviewers’ recommendations regarding
code smells identiﬁed in the code
conventions is the major cause of code smells identiﬁed in
reviews. Coding conventions are important in reducing the
cost of software maintenance while the existence of smells
can increase this cost. We conjecture that this is because developers may not be familiar with the coding conventions of their community and the system they are working on. For example, duplicated code and dead code may occur because developers are not aware of existing functionality, while bad naming may occur because developers are not familiar with the naming conventions. This implies that developers may inadvertently violate coding conventions in their company or community, leading to smells or other problems. This may
have a negative impact on software quality.
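As an illustration of how a naming-convention slip produces a bad naming smell, consider the hedged sketch below (the function names and data are invented; OpenStack's Python style guidelines build on PEP 8, under which function and variable names use snake_case):

```python
# Smelly: a camelCase name full of abbreviations hides what the
# function does and violates snake_case conventions (PEP 8).
def getNwCfg(srv_d):
    return srv_d['network']

# Conforming: the spelled-out snake_case name documents itself,
# so reviewers and maintainers need no extra comment.
def get_network_config(server_data):
    return server_data['network']
```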
Another main observation is that more than half of reviewers (in review comments where they indicated that there was a code smell) simply pointed out the smell in the code, but did not provide any further explanation of why they considered it a smell. One explanation for this is that the identified
smells were simple or self-explanatory (e.g., duplicated code,
dead code). Therefore, it is not expected that reviewers need
to provide further explanation for these smells. Although the
point of code review is to identify shortcomings (e.g., code
smells) in the contributed code, understanding the causes of
code smells can help practitioners understand how the code
smell is introduced, and then take corresponding measures.
C. RQ3: The relationship between what reviewers suggest and
the actions taken by developers
The results of RQ3 show that reviewers usually provide use-
ful recommendations (sometimes in the form of code snippets)
when they identify smells in the code and developers usually
follow these suggestions. Given the constructive nature of most
reviews, developers tend to agree with the review-based smell
detection mechanism (i.e., where a reviewer detects and reports
a smell) and in most cases they perform the recommended
actions (i.e., refactoring their code) to remove the smell. We
believe that this is because reviewers can take the contextual
information into full account as the program context and
domain are important in identifying smells.
Although not as frequent, there were cases where changes recommended by reviewers were ignored (see Fig. 6). This situation is partially due to reviewers and developers understanding the severity of the identified code smells differently, i.e., when a reviewer identifies a code smell to be resolved, the developer may not agree that it must be fixed, for example treating it as technical debt.
First, although we built the initial set of keywords with 5 general code smell terms and 40 specific code smell terms, most of these smells, such as long parameter list, temporary field, and lazy class, were not identified in code reviews. One potential reason is that code smells considered problematic in academic research may not be considered a pressing problem in industry. More research should be
conducted with practitioners to explore existing code smells
and to understand the driving force behind industry efforts on
code smell detection and elimination. This will further help
guide the design of next-generation code smell detection tools.
Second, violation of coding conventions is the main cause of code smells identified in code reviews. This implies that developers' lack of familiarity with the coding conventions in their company or organization can have a significant negative impact on software quality. To reduce code smells, project
leaders not only need to adopt code analysis tools, but also
need to help and educate their developers to become familiar
with the coding conventions adopted in the system.
Third, in smell-related reviews, reviewers usually give useful suggestions to help developers better fix the identified code smells, and developers generally tend to accept those suggestions. This implies that review-based detection of smells is seen as a trustworthy mechanism by developers. In general, code reviews are useful for finding defects and locating code smells. Although code analysis tools (both static analyzers and dynamic (coverage-based) tools) are able to find some of those smells, their large outputs restrict their usefulness. Most tools are context- and domain-insensitive, making their results less useful due to potential false positives. Context seems to matter in deciding whether a smell is bad or not. There have been some recent attempts to develop smell-detection tools that take developer context into account. Still, other contextual factors such as project structure
and developer experience are much harder to capture with
tools. Code reviewers are much better positioned to understand
and account for those contextual factors (as they are involved
in the project) and therefore their assessment of smells might
be trusted more by developers than automated detection tools.
To increase reliability, it may be that we need a two-step
detection mechanism; static analysis tools to identify smells
(as they are faster than human assessment and also scalable)
and then for reviewers to go through those smell instances.
They should decide, based on the additional contextual factors,
which of those smells should be removed and at what cost.
The problem with such an approach is that most tools would
probably produce large sets of outputs, making it impractical
for reviewers working on a large code base.
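A minimal sketch of the first (automated) step of such a two-step mechanism is given below, assuming a naive line-hashing detector (real clone detectors are far more sophisticated; the function name and inputs are invented). Its output would then be triaged by reviewers, who apply the contextual knowledge no tool has:

```python
from collections import defaultdict

def duplicate_line_candidates(files, min_len=20):
    """Step 1 (automated): group identical non-trivial lines.

    'files' maps filename -> source text. Returns a dict mapping each
    duplicated line to the [(filename, line_no), ...] locations where
    it appears -- candidate duplicated code for a reviewer to triage
    in step 2 using project context.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            stripped = line.strip()
            if len(stripped) >= min_len:  # ignore short/trivial lines
                seen[stripped].append((name, line_no))
    return {line: locs for line, locs in seen.items() if len(locs) > 1}
```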
VI. THREATS TO VALIDITY
External Validity: our study considered two major projects
from the OpenStack community (Nova and Neutron), since
those projects have invested a signiﬁcant effort in their code
review process (see Section III-B). Due to our sample size, our
study may not be generalizable to other systems. However, we
believe that our ﬁndings could help researchers and developers
understand the importance of the manual detection of code
smells better. Including code review discussions from other
communities will supplement our ﬁndings, and this may lead
to more general conclusions.
Internal Validity: the main threat to internal validity is
related to the quality of the selected projects. It is possible that
the projects we included do not provide a good representation
of the types of code smells we included in our study. While
we only selected two projects from the OpenStack community with Gerrit as their code review tool, OpenStack's investment in its code review process, its commitment to reviewing its entire code base, and its adherence to coding best practices make it a good candidate for our analysis.
Construct Validity: a large part of the study depends on
manual analysis of the data, which could affect the construct
validity due to personal oversight and bias. In order to reduce
its impact, each step in the manual analysis (i.e., identifying
smell-related reviews and classiﬁcation) was conducted by at
least two authors, and results were always cross-validated.
The selection of the keywords used to identify the reviews
which contain smell discussions is another threat to construct
validity since reviewers and developers may use terms other
than those that we used in our mining query. To minimize
the impact of this threat, we first compiled a list of code smell terms that developers and researchers frequently use, as reported in several previous studies. Then, we identified the keywords by following the systematic approach used by Bosu et al. to minimize the impact of missing keywords due to misspelling or other textual issues. Moreover, we randomly
selected a collection of review comments that did not contain
any of our keywords to supplement our approach, reducing
the threat to the construct validity.
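A hedged sketch of this keyword-matching step is shown below; it uses naive suffix stripping in place of a proper Snowball stemmer, and the keyword list is an invented subset, not the study's actual 45 terms:

```python
import re

# Illustrative subset of smell-related keywords, not the study's list.
SMELL_KEYWORDS = ['duplicate code', 'dead code', 'smell']

def _stem(word):
    # Naive suffix stripping standing in for a real Snowball stemmer,
    # so that e.g. 'duplicated' and 'duplicate' collapse to one form.
    for suffix in ('ation', 'ated', 'ate', 'ing', 'ed', 's'):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def mentions_smell(comment):
    """Return True if any stemmed keyword occurs in the comment."""
    words = re.findall(r'[a-z]+', comment.lower())
    stemmed_text = ' '.join(_stem(w) for w in words)
    for keyword in SMELL_KEYWORDS:
        stemmed_kw = ' '.join(_stem(w) for w in keyword.split())
        if stemmed_kw in stemmed_text:
            return True
    return False
```

Comments matched this way still require manual inspection, which is why the study cross-validated each classification step.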
Reliability: before starting our full scale study, we con-
ducted a pilot run to check the suitability of the data source.
In addition, the execution of all the steps in our study, including
the mining process, data ﬁltering, and manual analysis, was
discussed and conﬁrmed by at least two of the authors.
VII. CONCLUSIONS
Code review is a common software quality assurance practice. One of the issues that may impact software quality is
the presence of code smells. Yet, little is known about the
extent to which code smells are identiﬁed and resolved during
code reviews. To this end, we performed an empirical study
of code smell discussions in code reviews by collecting and
analyzing code review comments from the two most active
OpenStack projects (Nova and Neutron). Our results show that:
1) code smells are not commonly identiﬁed in code reviews,
and when identified, duplicated code, bad naming, and dead
code are, by far, the most frequently identiﬁed smells; 2)
violation of coding conventions is the most common cause for
smells as identiﬁed during code reviews; 3) when smells are
identiﬁed, most reviewers provide recommendations to help
developers ﬁx the code and remove the smells (via speciﬁc
refactoring operations or through an example code snippet);
and 4) developers mostly agree with reviewers and remove the
identiﬁed smells through the suggested refactoring operations.
Our results suggest that: 1) developers should follow the
coding conventions in their projects to reduce code smell
incidents; and 2) code smell detection via code reviews is
seen as a trustworthy approach by developers (given their con-
structive nature) and smell-removal recommendations made by
reviewers appear more actionable by developers. We found that
the majority of smell-related recommendations were accepted
by developers. We believe this is mainly due to the context-
sensitivity of the reviewer-centric smell detection approach.
We plan to extend this work by studying code reviews in
a larger set of projects from different communities. We also
plan to explore, in more detail, the refactoring actions taken by developers when removing certain smells, and the reasons (e.g., trade-offs in managing technical debt) why developers disagreed with reviewers' recommendations or ignored the recommended changes, e.g., by analyzing code smell discussions.
REFERENCES
M. Fowler, Refactoring: Improving the Design of Existing Code, 2nd ed.
Addison-Wesley Professional, 2018.
 F. Palomba, G. Bavota, M. Di Penta, F. Fasano, R. Oliveto, and
A. De Lucia, “On the diffuseness and the impact on maintainability
of code smells: A large scale empirical investigation,” in Proceedings
of the 40th International Conference on Software Engineering (ICSE).
ACM, 2018, pp. 1188–1221.
 M. Abbes, F. Khomh, Y.-G. Gueheneuc, and G. Antoniol, “An empirical
study of the impact of two antipatterns blob and spaghetti code on pro-
gram comprehension,” in Proceedings of the 15th European Conference
on Software Maintenance and Reengineering (CSMR). IEEE, 2011,
 A. Tahir, S. Counsell, and S. G. MacDonell, “An empirical study into the
relationship between class features and test smells,” in Proceedings of the
23rd Asia-Paciﬁc Software Engineering Conference (APSEC). IEEE,
2016, pp. 137–144.
 F. Khomh, M. Di Penta, and Y.-G. Gueheneuc, “An exploratory study of
the impact of code smells on software change-proneness,” in Proceed-
ings of the 16th Working Conference on Reverse Engineering (WCRE).
IEEE, 2009, pp. 75–84.
 N. Tsantalis and A. Chatzigeorgiou, “Identiﬁcation of move method
refactoring opportunities,” IEEE Transactions on Software Engineering,
vol. 35, no. 3, pp. 347–367, 2009.
 N. Moha, Y.-G. Gueheneuc, L. Duchien, and A.-F. Le Meur, “Decor: A
method for the speciﬁcation and detection of code and design smells,”
IEEE Transactions on Software Engineering, vol. 36, no. 1, pp. 20–36, 2010.
 A. Yamashita and L. Moonen, “Do developers care about code smells?
an exploratory survey,” in Proceedings of the 20th Working Conference
on Reverse Engineering (WCRE). IEEE, 2013, pp. 242–251.
 A. Tahir, J. Dietrich, S. Counsell, S. Licorish, and A. Yamashita, “A large
scale study on how developers discuss code smells and anti-pattern in
stack exchange sites,” Information and Software Technology, vol. 125, 2020.
 F. A. Fontana, J. Dietrich, B. Walter, A. Yamashita, and M. Zanoni,
“Anti-pattern and code smell false positives: Preliminary conceptualisa-
tion and classiﬁcation,” in Proceedings of the 23rd International Con-
ference on Software Analysis, Evolution, and Reengineering (SANER).
IEEE, 2016, pp. 609–613.
 T. Sharma and D. Spinellis, “A survey on software smells,” Journal of
Systems and Software, vol. 138, pp. 158–173, 2018.
 R. A. Baker Jr, “Code reviews enhance software quality,” in Proceedings
of the 19th International Conference on Software Engineering (ICSE).
ACM, 1997, pp. 570–571.
 S. McIntosh, Y. Kamei, B. Adams, and A. E. Hassan, “An empirical
study of the impact of modern code review practices on software
quality,” Empirical Software Engineering, vol. 21, no. 5, pp. 2146–2189, 2016.
 R. Morales, S. McIntosh, and F. Khomh, “Do code review practices
impact design quality? a case study of the Qt, VTK, and ITK projects,”
in Proceedings of the 22nd IEEE International Conference on Software
Analysis, Evolution, and Reengineering (SANER). IEEE, 2015, pp.
 A. Meneely, A. C. R. Tejeda, B. Spates, S. Trudeau, D. Neuberger,
K. Whitlock, C. Ketant, and K. Davis, “An empirical investigation
of socio-technical code review metrics and security vulnerabilities,”
in Proceedings of the 6th International Workshop on Social Software
Engineering (SSE). ACM, 2014, pp. 37–44.
 S. McConnell, Code Complete. Pearson Education, 2004.
 T. Hall, M. Zhang, D. Bowes, and Y. Sun, “Some code smells have a
signiﬁcant but small effect on faults,” ACM Transactions on Software
Engineering and Methodology, vol. 23, no. 4, pp. 1–39, 2014.
D. I. Sjøberg, A. Yamashita, B. C. Anda, A. Mockus, and T. Dybå, “Quantifying the effect of code smells on maintenance effort,” IEEE Transactions on Software Engineering, vol. 39, no. 8, pp. 1144–1156, 2013.
 F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, D. Poshyvanyk, and
A. De Lucia, “Mining version histories for detecting code smells,” IEEE
Transactions on Software Engineering, vol. 41, no. 5, pp. 462–489, 2015.
Z. Soh, A. Yamashita, F. Khomh, and Y.-G. Guéhéneuc, “Do code smells
impact the effort of different maintenance programming activities?” in
Proceedings of the 23rd International Conference on Software Analysis,
Evolution, and Reengineering (SANER). IEEE, 2016, pp. 393–402.
 M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia,
and D. Poshyvanyk, “When and why your code starts to smell bad,” in
Proceedings of the IEEE/ACM 37th IEEE International Conference on
Software Engineering (ICSE), vol. 1. IEEE, 2015, pp. 403–414.
 J. Garcia, D. Popescu, G. Edwards, and N. Medvidovic, “Identifying ar-
chitectural bad smells,” in Proceedings of the 13th European Conference
on Software Maintenance and Reengineering (CSMR). IEEE, 2009, pp.
 A. Martini, F. A. Fontana, A. Biaggi, and R. Roveda, “Identifying and
prioritizing architectural debt through architectural smells: A case study
in a large software company,” in Proceedings of the 12th European
Conference on Software Architecture (ECSA). Springer, 2018, pp. 320–
 G. Bavota, A. Qusef, R. Oliveto, A. De Lucia, and D. Binkley, “Are
test smells really harmful? an empirical study,” Empirical Software
Engineering, vol. 20, no. 4, pp. 1052–1094, 2015.
 W. Dou, S.-C. Cheung, and J. Wei, “Is spreadsheet ambiguity harmful?
detecting and repairing spreadsheet smells due to ambiguous computa-
tion,” in Proceedings of the 36th International Conference on Software
Engineering (ICSE). ACM, 2014, pp. 848–858.
 F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, and A. De Lucia, “Do
they really smell bad? a study on developers’ perception of bad code
smells,” in Proceedings of the 30th International Conference on Software
Maintenance and Evolution (ICSME). IEEE, 2014, pp. 101–110.
 D. Taibi, A. Janes, and V. Lenarduzzi, “How developers perceive
smells in source code: A replicated study,” Information and Software
Technology, vol. 92, pp. 223–235, 2017.
 S. McIntosh, Y. Kamei, B. Adams, and A. E. Hassan, “The impact of
code review coverage and code review participation on software quality:
A case study of the Qt, VTK, and ITK projects,” in Proceedings of
the 11th Working Conference on Mining Software Repositories (MSR).
ACM, 2014, p. 192–201.
 C. F. Kemerer and M. C. Paulk, “The impact of design and code
reviews on software quality: An empirical study based on psp data,”
IEEE Transactions on Software Engineering, vol. 35, no. 4, pp. 534–
 O. Kononenko, O. Baysal, L. Guerrouj, Y. Cao, and M. W. Godfrey,
“Investigating code review quality: Do people and participation matter?”
in 2015 IEEE International Conference on Software Maintenance and
Evolution (ICSME), 2015, pp. 111–120.
 A. Bosu, J. C. Carver, M. Haﬁz, P. Hilley, and D. Janni, “Identifying
the characteristics of vulnerable code changes: An empirical study,” in
Proceedings of the 22nd ACM SIGSOFT International Symposium on
Foundations of Software Engineering (FSE). ACM, 2014, p. 257–268.
 F. E. Zanaty, T. Hirao, S. McIntosh, A. Ihara, and K. Matsumoto, “An
empirical study of design discussions in code review,” in Proceedings
of the 12th ACM/IEEE International Symposium on Empirical Software
Engineering and Measurement (ESEM). ACM, 2018, pp. 1–10.
 A. Nanthaamornphong and A. Chaisutanon, “Empirical evaluation of
code smells in open source projects: preliminary results,” in Proceedings
of the 1st International Workshop on Software Refactoring (IWoR).
ACM, 2016, pp. 5–8.
 L. Pascarella, D. Spadini, F. Palomba, and A. Bacchelli, “On the
effect of code review on code smells,” in Proceedings of the 27th
IEEE International Conference on Software Analysis, Evolution and
Reengineering (SANER). IEEE, 2020.
 A. Tahir, A. Yamashita, S. Licorish, J. Dietrich, and S. Counsell, “Can
you tell me if it smells? a study on how developers discuss code
smells and anti-patterns in stack overﬂow,” in Proceedings of the 22nd
International Conference on Evaluation and Assessment in Software
Engineering (EASE). ACM, 2018, pp. 68–78.
 M. Zhang, T. Hall, and N. Baddoo, “Code bad smells: a review of current
knowledge,” Journal of Software Maintenance and Evolution: research
and practice, vol. 23, no. 3, pp. 179–202, 2011.
 M. F. Porter, “Snowball: A language for stemming algorithms,” Open
Source Initiative Osi, 2001.
 P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to data mining.
Pearson Education India, 2016.
 X. Han, A. Tahir, P. Liang, S. Counsell, and Y. Luo, “Replication
package for the paper understanding code smell detection via code
review: A study of the openstack community,” Jan. 2021. [Online].
 J. Cohen, “A coefﬁcient of agreement for nominal scales,” Educational
and Psychological Measurement, vol. 20, no. 1, pp. 37–46, 1960.
 G. D. Israel, “Determining sample size,” Florida Cooperative Extension
Service, Institute of Food and Agricultural Sciences, University of
Florida, Florida, U.S.A, Fact Sheet PEOD-6, November 1992.
 V. Braun and V. Clarke, “Using thematic analysis in psychology,”
Qualitative Research in Psychology, vol. 3, no. 2, pp. 77–101, 2006.
 A. Potdar and E. Shihab, “An exploratory study on self-admitted
technical debt,” in 2014 IEEE International Conference on Software
Maintenance and Evolution. IEEE, 2014, pp. 91–100.
 N. Sae-Lim, S. Hayashi, and M. Saeki, “Context-based approach to
prioritize code smells for prefactoring,” Journal of Software: Evolution
and Process, vol. 30, no. 6, pp. 1–24, 2018.
 Z. Li, P. Avgeriou, and P. Liang, “A systematic mapping study on
technical debt and its management,” Journal of Systems and Software,
vol. 101, pp. 193–220, 2015.
 F. Pecorelli, F. Palomba, F. Khomh, and A. De Lucia, “Developer-
driven code smell prioritization,” in Proceedings of the 17th Working
Conference on Mining Software Repositories (MSR). ACM, 2020, pp.
 S. Shcherban, P. Liang, A. Tahir, and X. Li, “Automatic identiﬁcation of
code smell discussions on stack overﬂow: A preliminary investigation,”
in Proceedings of the 14th ACM/IEEE International Symposium on
Empirical Software Engineering and Measurement (ESEM). ACM,
2020, pp. 1–6.