
Characterizing and Identifying Composite Refactorings: Concepts, Heuristics and Patterns


Abstract

Refactoring consists of a transformation applied to improve the program's internal structure, for instance, by contributing to remove code smells. Developers often apply multiple interrelated refactorings, called composite refactoring. Even though composite refactoring is a common practice, an investigation from different points of view on how composite refactoring manifests in practice is missing. Previous empirical studies also neglect how different kinds of composite refactorings affect the removal, prevalence or introduction of smells. To address these matters, we provide a conceptual framework and two heuristics to respectively characterize and identify composite refactorings within and across commits. Then, we mined the commit history of 48 GitHub software projects. We identified and analyzed 24,911 composite refactorings involving 104,505 single refactorings. Amongst several findings, we observed that most composite refactorings occur in the same commit and have the same refactoring type. We found that several refactorings are semantically related to each other: they occur in different parts of the system but are still related to the same task. Our study is the first to reveal that many smells are introduced in a program due to "incomplete" composite refactorings. Our study is also the first to reveal 111 patterns of composite refactorings that frequently introduce or remove certain smell types. These patterns can be used as guidelines for developers to improve their refactoring practices as well as for designers of recommender systems.
Leonardo Sousa
Electrical & Computer Engineering
Carnegie Mellon University, USA
Diego Cedrim
Alessandro Garcia, Willian Oizumi
PUC-Rio, Brazil
Ana C. Bibiano, Daniel Oliveira
PUC-Rio, Brazil
Miryung Kim
Anderson Oliveira
PUC-Rio, Brazil
Software and its engineering → Software design engineering
ACM Reference Format:
Leonardo Sousa, Diego Cedrim, Alessandro Garcia, Willian Oizumi, Ana C. Bibiano, Daniel Oliveira, Miryung Kim, and Anderson Oliveira. 2020. Characterizing and Identifying Composite Refactorings: Concepts, Heuristics and Patterns. In 17th International Conference on Mining Software Repositories (MSR '20), October 5–6, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 12 pages.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from
MSR '20, October 5–6, 2020, Seoul, Republic of Korea
©2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7517-7/20/05...$15.00
Software refactoring is a widely used technique in practice [ ]. Refactoring consists of a program transformation used to improve software structure, such as removing code smells [14]. Well-known refactoring types include Extract Method, Rename Method, and Move Method. Since the term refactoring first appeared in the literature [ ], studies have been actively investigating it [ ]. Most of these studies analyze the characteristics and the impact of each single refactoring on the software structure.
However, 40% to 60% of the time, developers apply more than one refactoring in conjunction [ ], even for removing simple code smells, such as Long Methods [ ]. In other words, developers often apply what we call here composite refactoring. A composite refactoring (from now on also called a composite) comprises two or more interrelated refactorings that affect one or more elements [ ]. There are two broad categories of composites: (i) temporally-related composites, i.e., refactorings that are applied in the same commit and are likely to be related to the same developer's task, and (ii) spatial composites, i.e., sets of refactorings applied to structurally related code elements, regardless of whether they are performed in the same change (commit) or not.
Recent studies (e.g., [ ]) have analyzed a single category of composite at a time. For example, Palomba et al. [ ] and Tufano et al. [ ] analyze temporally-related composites, while Bibiano et al. [ ] and Brito et al. [ ] explore spatial composites. As no study analyzes these different categories together, we might have missed a more comprehensive understanding of composites. For example, while certain complex smells (e.g., a God Class) are likely to be fully removed over time through a spatial composite refactoring, other smells (e.g., Shotgun Surgery) may be removed in a single commit, but require changes in non-structurally related parts of the program. As composite categories were studied only under a single perspective, we have the opportunity to investigate the impact of refactoring on the program structure from different perspectives.
To investigate composite refactorings, we mined the commit history of 48 GitHub software projects to identify (i) the characteristics of different categories of composite refactorings, and (ii) their effect on either removing or introducing smells. To support our study, we provide a conceptual framework and two heuristics for detecting composites. The heuristics are named commit-based and range-based heuristics, and they serve to automatically identify composites in software projects. The first supports the analysis of refactorings that have a temporal relation. The second intends to capture refactorings that have a spatial relation. These heuristics enabled us to investigate composites and their impact on smells from different perspectives. We expect that our contributions and study findings can help tool builders by uncovering the blind spots in the relation between composite refactoring and smells.
Our contributions and study findings can be summarized as follows. First, we provide a formal and unambiguous definition for composites, which serves to guide researchers who aim to further investigate composites. Second, our heuristics enabled us to reveal characteristics of composites that were not investigated by related studies [ ]. Some of these characteristics are reported below. We observe that nearly 41% of composites are complex, i.e., are composed of 3 to 20 interrelated refactorings, which contradicts a recent finding [ ]. The majority of the composites are confined to the same commit and homogeneously formed by refactorings of the same type, e.g., various syntactically related method extractions. There is also a non-negligible frequency of: (i) heterogeneous and cross-commit composites, and (ii) semantically related composites within the same commit, i.e., sequences of refactorings located in different parts of the code, but still related to the same task (e.g., removing non-trivial, scattered smells).
Contradicting previous findings [ ], we observe that refactorings do have a considerable effect on smells. We found that nearly 50% of composites either remove or introduce smells. Previous studies often suggest otherwise. For instance, Bavota et al. [ ] stated that refactorings are not related to smell removal. Cedrim et al. [ ] and Bibiano et al. [ ] reported that refactorings are most often neutral, i.e., they neither introduce nor remove smells. These studies analyze either each single refactoring in isolation or multiple refactorings affecting only a single element.
We used our heuristics to identify patterns of composites that recurrently introduce or remove specific smell types, which have not been reported in the literature. A manual analysis confirmed a total of 111 composite-smell patterns: 84 smell-removing patterns and 27 smell-introducing patterns. As refactoring tools tend to be underused [ ], these patterns can be used to improve recommendation systems [ ] by leveraging the removal patterns that developers apply in practice. This strategy would increase the chance of such developers adopting automated refactoring tools. We also provide a replication package [43], which includes the scripts that we used to implement the proposed heuristics and the catalog of composite-smell patterns for 11 smell types. Our dataset is available for other researchers who are interested in studying composites and their effects on smells.
Diverse views on composite refactoring. Many researchers have investigated composites [ ]. However, they use different terms (e.g., batch refactoring [ ]) or definitions to refer to composite refactoring. Some studies consider a composite as a set of two or more interrelated refactorings applied by the same developer [ ]. Other studies define a composite as a set of refactorings applied by multiple developers [ ]. Bibiano et al. [ ] consider the scope of a composite refactoring as an individual code element. Other studies consider that a composite refactoring may be applied in the scope of multiple elements [ ]. There is a study that assumes time constraints to define a composite [ ]. There are also studies that proposed approaches to recommend composite refactorings [24, 31, 34, 51].
Figure 1: Refactorings applied to the Mobile Media
Bibiano et al. [ ] and Vassalo et al. [ ] are representative examples of recent studies that explicitly investigated composites. However, they investigated composites through a single perspective. For example, Bibiano et al. [ ] provided a partial view on composite refactoring since they analyze only composites in the scope of individual code elements. Hence, composite refactorings that crosscut two or more elements were not completely investigated. Thus, their findings may not yield a comprehensive understanding of more complex forms of composites. Next, we illustrate how relying on a definition can compromise a researcher's study.
Eect of composites on smells.
In the example of Figure 1,
the researcher wants to investigate the eect of composites on
smells. The gure shows three commits of Mobile Media (MM),
a software product line to derive mobile applications [
]. A de-
veloper performed seven refactorings:
r1,r2, . .,r7
along these com-
mits. According to Bibiano et al.’s denition [
], a composite com-
prises two or more refactorings within the scope of a single ele-
ment. If the researcher follows this denition, s/he would consider
as composites. This
denition forces the researcher to restricting composites to those
occurring in the context of an element, which may be inappropri-
ate to investigate the eects of composites on smells. For example,
in Figure 1, the refactorings
removed the God Class. As
these refactorings belong to the composite
, the researcher would
conclude that composites have a positive eect on the program
structure since
reduced the incidence of smells. However, this
conclusion is misleading due to the use of a composite denition
that does not properly cover cases such as the one discussed above.
Let us consider the Extract Superclass refactoring, which crosscuts multiple elements. This refactoring creates a superclass (AbstractCtrl) shared by UserCtrl and MediaCtrl, which led to the introduction of the Speculative Generality smell [ ]. Since the smell is introduced in the scope of another element, Bibiano et al.'s definition would not consider it when assessing the effect of a composite: their definition does not consider the scope of all elements affected by the refactorings. In this scenario, the composite removed a smell (God Class) but introduced another (Speculative Generality). Therefore, the researcher should have concluded that composites have no effect on the introduction or removal of smells. To have a better understanding of composite refactorings and their effect on smells, the researcher would need other heuristics (Section 3.3) to identify composites that affect the scope of multiple elements. In addition, although there are several works that study the complex nature of code smells [ ], they do not address the relationship between composite refactorings and smells.
We dene here basic concepts needed to study composites (Sec-
tion 3.1). We rely on these concepts to characterize an existing
heuristic (Section 3.2) and to propose two new ones (Section 3.3).
3.1 A Conceptual Framework
This section presents a conceptual framework for composite refactoring. We used this framework to provide a foundation for our heuristics (Section 3.3) and our empirical study. Other researchers can also use it to conduct studies based on unambiguous concepts.
3.1.1 Composite Refactoring. Composite refactoring occurs when two or more interrelated refactorings are applied to a set of code elements. Thus, $cr = [r_1, r_2, \cdots, r_n]$ is a composite of size $n \geq 2$. The notion of interrelation depends on the composite scope (Section 3.1.4). Most studies restrict the composite to refactorings applied by the same developer [ ]. However, developers can work together to apply a composite [ ]. This scenario can happen, for example, when they have to team up to plan and perform a major restructuring in the system, or when they create branches to apply refactoring exclusively [20].
3.1.2 Composite Uniformity. All the refactorings in the composite can have the same type or not, which we define as composite uniformity. In this context, $type(r)$ is a function that returns the type of the refactoring $r$. In our example of Figure 1, $type(r)$ = Move Method for a refactoring $r$ that moves a method. Therefore, the composite $cr = [r_1, r_2, \cdots, r_n]$ is heterogeneous if and only if $|type(r_1) \cup type(r_2) \cup \cdots \cup type(r_n)| > 1$. If $|type(r_1) \cup type(r_2) \cup \cdots \cup type(r_n)| = 1$, then the composite is homogeneous. Most studies do not consider that a composite only exists if all refactorings have the same type [33, 42, 45, 48].
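
The uniformity definition can be illustrated with a minimal Python sketch. The `Refactoring` record and the example data are hypothetical and are not taken from the replication package; the sketch only counts the distinct refactoring types in a composite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Refactoring:
    name: str  # hypothetical identifier, e.g. "r1"
    kind: str  # refactoring type, e.g. "Move Method"


def uniformity(composite):
    """Classify a composite by the number of distinct refactoring types."""
    if len(composite) < 2:
        raise ValueError("a composite needs at least two refactorings")
    distinct_types = {r.kind for r in composite}
    return "homogeneous" if len(distinct_types) == 1 else "heterogeneous"


# Hypothetical composites used only for illustration.
same = [Refactoring("r1", "Extract Method"), Refactoring("r2", "Extract Method")]
mixed = [Refactoring("r1", "Extract Method"), Refactoring("r2", "Move Method")]
print(uniformity(same))   # homogeneous
print(uniformity(mixed))  # heterogeneous
```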
3.1.3 Composite Timespan. A developer can start a composite in a commit and finish it in the same commit or in the subsequent commits. In this sense, composite timespan indicates if the composite is either single-commit or cross-commit. To identify the timespan, let us define the function $commit(r)$ to find the commit where the refactoring $r$ was performed. Thus, a composite $cr = [r_1, r_2, \cdots, r_n]$ is cross-commit if and only if $|commit(r_1) \cup \cdots \cup commit(r_n)| > 1$. Similarly, if $|commit(r_1) \cup \cdots \cup commit(r_n)| = 1$, then $cr$ is single-commit. Studies of refactoring variously consider only major versions [5], a single commit [10], or the entire project history [6].
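
As a small sketch of the timespan definition, the function below counts the distinct commits of a composite. The `commit_of` mapping and its commit labels are assumptions made for illustration.

```python
def timespan(composite, commit_of):
    """Return 'single-commit' or 'cross-commit' for a list of refactoring ids."""
    commits = {commit_of[r] for r in composite}
    return "single-commit" if len(commits) == 1 else "cross-commit"


# Hypothetical mapping from refactoring id to the commit that contains it.
commit_of = {"r1": "c2", "r2": "c2", "r3": "c3"}
print(timespan(["r1", "r2"], commit_of))        # single-commit
print(timespan(["r1", "r2", "r3"], commit_of))  # cross-commit
```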
3.1.4 Refactoring and Composite Scope. Elements directly affected by the refactoring constitute the refactoring scope. Given a refactoring $r$, $scope(r)$ is a function that returns the set of elements belonging to the scope of $r$. For instance, the Move Method refactoring in Figure 1 moved a method from class UserCtrl to MediaCtrl. Hence, the refactoring scope is {mediaDao, UserCtrl, MediaCtrl}. Similar to a single refactoring, composites also have a scope. The composite scope is the set of code elements affected by the refactorings within a composite. The composite scope also indicates how the refactorings within the composite are interrelated.
One might naturally say that the union of all refactoring scopes from a composite determines the composite scope, but this is not necessarily true in all scenarios. Related studies have different ways to define the composite scope. In general, these studies can be divided into two groups: composite refactoring affects only the scope of a single element [ ] or the scope of multiple elements [ ]. In the first group, all refactorings within the composite are related to each other because they affect the same element. In the second group, if a refactoring crosscuts two elements, then all refactorings in one element will be related to the refactorings in the other element. For example, suppose a developer applied one refactoring to each of two classes. These two refactorings are not related; thus they do not compose a composite. However, the developer then applied a third refactoring, which moves a method from one of these classes to the other. Thus, the three refactorings became related to each other, creating a composite. In this case, the composite scope includes both classes.
3.1.5 Composite Synthesis. The process of grouping interrelated refactorings to find composites is defined as composite synthesis. To synthesize a composite, we need first to detect the refactorings that occurred in the system. Related studies have different strategies to identify refactorings applied by developers. A strategy is to analyze the commit message to identify the refactorings [ ]. Another strategy is to use a tool that compares two subsequent commits to identify refactorings [ ]. For the sake of explanation, let us assume that a refactoring detection tool implements a function $R$. This function expresses all refactorings in the history $H$ of a system $s$, which is composed of all refactorings detected between subsequent pairs of commits:
$H(s) = \bigcup_{i=1}^{|Commits(s)|-1} R(c_i, c_{i+1})$
To illustrate the output of function $H$, let us visit the MM system presented in Figure 1. This system has four commits, where three of them are represented in the figure. The fourth one is produced as the result of applying the last group of refactorings. Thus, $H(s_1) = R(c_1, c_2) \cup R(c_2, c_3) \cup R(c_3, c_4)$. In other words, $H(s_1)$ contains all refactorings presented in Figure 1, which are $r_1, \cdots, r_7$.
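
The construction of $H(s)$ can be sketched in Python as a loop over consecutive commit pairs. The `detect` function below is a stand-in for a refactoring detection tool, and the assignment of $r_1, \ldots, r_7$ to commit pairs is an assumption inferred from the commit-based example in Section 3.3.1.

```python
def history(commits, detect):
    """Collect detect(c_i, c_{i+1}) over all consecutive commit pairs."""
    refactorings = []
    for older, newer in zip(commits, commits[1:]):
        refactorings.extend(detect(older, newer))
    return refactorings


# Hypothetical detector output for the four MM commits (assumed assignment).
DETECTED = {
    ("c1", "c2"): ["r1", "r2"],
    ("c2", "c3"): ["r3"],
    ("c3", "c4"): ["r4", "r5", "r6", "r7"],
}


def detect(older, newer):
    """Stub for R(c_i, c_{i+1}); a real study would call a detection tool here."""
    return DETECTED.get((older, newer), [])


h_s1 = history(["c1", "c2", "c3", "c4"], detect)
print(h_s1)  # ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7']
```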
3.2 Element-Based Heuristic
This section presents a formal definition of the element-based heuristic [6], which we will use in our study.
Formal Definition. The element-based heuristic synthesizes composites using an individual code element, i.e., either a method or a class, as scope. The goal of this heuristic is to investigate how composites affect a specific element. Formally, a given composite $cr = [r_1, r_2, \cdots, r_n]$ is synthesized by the element-based heuristic if and only if there is an element $e$ such that $e \in scope(r_i)\ \forall r_i \in cr$. For instance, let $CR_e$ be the function that implements the element-based heuristic over a particular refactoring history $H(s_1)$ (Figure 1). Then $CR_e(H(s_1)) = \{cr_a[r_1, r_2, r_3, r_4, r_5], cr_b[r_3, r_6, r_7]\}$. Thus, this heuristic synthesizes two composites. The first one, $cr_a$, is a composite because $r_1, \ldots, r_5$ affected the same element: UserCtrl. The second composite, $cr_b$, affects the MediaCtrl class.
In this heuristic, the composite scope is determined by the element used to synthesize the composites. In this way, $scope(cr_a) = \{UserCtrl\}$ and $scope(cr_b) = \{MediaCtrl\}$.
The element-based heuristic focuses on the element to find composites. Focusing on the element is a strength as it allows us to investigate what occurs with the element during its evolution. At the same time, focusing on the element is also a weakness. The scope of some refactoring types goes beyond a single element. Suppose a developer applies an Extract Method in one class, and then a Move Method from that class to a second class. The heuristic will only synthesize a composite in the first class. Since the second class is out of scope, the effects of the composite on it will not be considered. As the effect in each element will be treated independently, this heuristic may not be entirely appropriate to investigate the effect of composites on smells.
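
A minimal sketch of the element-based grouping, assuming each refactoring comes with a set of affected elements. The identifiers and scopes below are hypothetical toy data, not the Figure 1 data, and the function is an illustration rather than the implementation used in the study.

```python
from collections import defaultdict


def element_based(history, scope):
    """Group refactorings per element; keep groups with two or more members."""
    by_element = defaultdict(list)
    for r in history:
        for element in scope[r]:
            by_element[element].append(r)
    return {e: rs for e, rs in by_element.items() if len(rs) >= 2}


# Hypothetical scopes: r3 crosscuts classes A and B.
scope = {"r1": {"A"}, "r2": {"A"}, "r3": {"A", "B"}, "r4": {"B"}}
composites = element_based(["r1", "r2", "r3", "r4"], scope)
print(composites["A"])  # ['r1', 'r2', 'r3']
print(composites["B"])  # ['r3', 'r4']
```

Note that the crosscutting refactoring `r3` appears in both composites, while each composite only sees the effect on its own element, which mirrors the weakness discussed above.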
3.3 Composite Synthesis Heuristics
We propose here two heuristics to synthesize composites.
3.3.1 Commit-Based Heuristic. The composite scope also indicates how the refactorings are interrelated (Section 3.1.4). Sometimes the refactorings are not structurally related to each other but they occur in the same context. For example, a developer may apply several refactorings to address a task associated with a commit. Hence, it makes sense to group these refactorings. For this purpose, this heuristic considers a single commit as the timespan (Section 3.1.3). In fact, there is a commit policy, widely accepted in the community, that recommends developers not to perform code changes for multiple tasks in the same commit [ ]. Thus, each commit should have refactorings somehow related to the same task.
Formal Denition.
The commit-based composite heuristic syn-
thesizes as a composite all refactorings performed within a commit.
The goal of this heuristic is to capture a temporal relation among
the refactorings made at the time frame of a single commit. For-
mally, a composite
cr =[r1,r2,· · · ,rn]
is synthesized if and only
|commit(r1) ∪ commit(r2) · · · commit(rn)| =
1. For instance,
H(s1)=[r1,· · · ,r7]
(Figure 1). Now, let
be the
function that implements the commit-based heuristic over a refac-
toring history
. Thus, the commit-based heuristic produces two
composites: CRc(H(s1)) ={crc[r1,r2],crd[r4,r5,r6,r7]}.
The composite scope includes the elements aected by
the refactorings within the commit. Thus,
scope (crc)={U ser Ct rl ,
MediaCtrl }
, and
scope (crd)={U ser Ct rl ,MediaCtr l,Abstr actCtrl }
The commit-based heuristic is useful to observe the effect of all refactorings that occur in a commit. Assuming that all the changes within a commit are related to the same task [ ], researchers can use this heuristic to understand how refactorings affect elements related to a task. This heuristic partially solves the limitation of the element-based heuristic. Instead of considering only the scope of a single element, it considers all elements affected by the refactorings performed along the commit's task. Thus, this heuristic does not discard refactorings that crosscut elements. However, there are cases in which the commit-based heuristic discards refactorings that it should not. A developer can start a composite in a commit and finish it in the subsequent commits. For example, a developer can start a composite, commit the changes, and then continue refactoring the same elements. In this case, the commit-based heuristic would synthesize two composites rather than one.
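
The commit-based grouping can be sketched as follows. The commit labels assigned to $r_1, \ldots, r_7$ are assumptions chosen to reproduce the two composites of the example above; the function itself is only an illustration of the definition.

```python
from collections import defaultdict


def commit_based(history, commit_of):
    """Group refactorings per commit; keep groups with two or more members."""
    by_commit = defaultdict(list)
    for r in history:
        by_commit[commit_of[r]].append(r)
    return {c: rs for c, rs in by_commit.items() if len(rs) >= 2}


# Assumed commit labels for the seven refactorings of the example.
commit_of = {"r1": "c2", "r2": "c2", "r3": "c3",
             "r4": "c4", "r5": "c4", "r6": "c4", "r7": "c4"}
history = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
print(commit_based(history, commit_of))
# {'c2': ['r1', 'r2'], 'c4': ['r4', 'r5', 'r6', 'r7']}
```

The lone refactoring `r3` forms no composite, since a composite requires at least two refactorings.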
3.3.2 Range-Based Heuristic. Some refactorings are structurally related to each other because they affect elements that are located in the same part of the source code. Thus, if we want to understand the effect of composites on the program structure, we need to analyze how these structurally related refactorings affect the elements. For example, if a refactoring crosscuts two elements, both elements should be analyzed to understand the effect of the refactoring. We propose the range-based heuristic to identify composites whose refactorings affect the same location in the code.
Formal Denition.
The range-based composite heuristic con-
siders the notion of refactoring scope to synthesize composites.
In this heuristic, the scope of all refactorings form the compos-
ite scope. A composite starts with an arbitrary refactoring
. A
second refactoring
is part of the same composite if and only
such as
escope (ra)
. A possible third
will be added to the composite if
escope (ra)
escope (rb)
. This process continues until all
refactorings in a particular history are explored.
In this heuristic, the composite scope is determined by the union of the scopes of all refactorings: $scope(cr) = \bigcup_{i=1}^{n} scope(r_i)$. The $r_1$ and $r_2$ refactorings in Figure 1 moved elements from the UserCtrl class to the MediaCtrl class. Hence, $scope(r_1) = scope(r_2) = \{UserCtrl, MediaCtrl\}$. The composite synthesis in this example starts with $r_1$. As $r_2$ is applied in one element of $scope(r_1)$, the composite grows bigger and turns into $[r_1, r_2]$. The $r_3$ refactoring affects elements of $scope(r_1)$, so the composite is now $[r_1, r_2, r_3]$. The same reasoning can be used for the remaining refactorings, so the composite synthesis produces the composite $cr_e = [r_1, r_2, r_3, r_4, r_5, r_6, r_7]$.
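
One way to sketch the range-based synthesis is to grow groups of refactorings whose scopes overlap, merging groups whenever a new refactoring bridges them. The code below is an illustration under that reading of the definition; the identifiers and scopes are hypothetical toy data, not the Figure 1 data.

```python
def range_based(history, scope):
    """Merge refactorings with overlapping scopes; return (members, scope) pairs."""
    order = {r: i for i, r in enumerate(history)}
    groups = []  # each entry: (set of refactoring ids, set of elements)
    for r in history:
        members, elements = {r}, set(scope[r])
        untouched = []
        for g_members, g_elements in groups:
            if g_elements & elements:
                # The new refactoring shares an element with this group.
                members |= g_members
                elements |= g_elements
            else:
                untouched.append((g_members, g_elements))
        groups = untouched + [(members, elements)]
    return [(sorted(m, key=order.get), e) for m, e in groups if len(m) >= 2]


# Hypothetical scopes: r4 bridges the group around A/B and the group around C.
scope = {"r1": {"A", "B"}, "r2": {"C"}, "r3": {"B"},
         "r4": {"C", "A"}, "r5": {"Z"}}
composites = range_based(["r1", "r2", "r3", "r4", "r5"], scope)
print(composites[0][0])  # ['r1', 'r2', 'r3', 'r4']
```

The composite scope returned for each group is the union of the member scopes, as in the definition above; `r5` shares no element with any other refactoring and therefore forms no composite.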
4.1 Research Questions
In the previous section, we proposed heuristics to identify composites. These heuristics allow one to analyze composites from different, albeit complementary, perspectives. To propose them, we formally defined concepts that characterize a composite. Our goal is to use these concepts to understand (i) how composites manifest in software systems and (ii) their effect on smells. To achieve this goal, we aim to answer the following research question:
RQ1. What are the characteristics of composites in software
We address RQ1 by applying the heuristics to identify three categories of composites: element-based, commit-based, and range-based composites. The concepts defined in our conceptual framework allow us to compare these categories of composites. Thus, we can also have a better understanding of the effect of composites on the program structure. For this purpose, we use the following research question to investigate if composites affect the incidence of smells:
RQ2. How do composites affect the incidence of smells?
Notice that answering RQ2 is not trivial. First, we need to identify the elements affected by each category of composite, taking into consideration their composite scope. Then, we analyze what happened with the smells before and after developers apply the composites. To support this analysis, we classify each composite according to its effect on the incidence of smells. We classify a composite as a positive one if it reduces the number of code smells. Conversely, we classify it as a negative composite if it increases the number of smells. Otherwise, we classify it as neutral. Other empirical studies applied this type of analysis [6, 9–11].
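
The classification rule amounts to comparing two smell counts over the composite scope. A minimal sketch, assuming the counts before and after the composite are already available:

```python
def classify(smells_before, smells_after):
    """Label a composite by its effect on the number of smells in its scope."""
    if smells_after < smells_before:
        return "positive"
    if smells_after > smells_before:
        return "negative"
    return "neutral"


print(classify(3, 1))  # positive
print(classify(1, 2))  # negative
print(classify(2, 2))  # neutral
```

Under this count-based rule, a composite that removes one smell and introduces another is labeled neutral.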
As a complement to RQ2, understanding and distinguishing the effect of specific types of composites on smells is an essential investigation. First, our investigation may help tool builders by uncovering the blind spots in the relation between refactoring and smells. Second, this investigation aims (i) to identify topics that require further investigation and (ii) to contrast the results with findings established in the literature. For example, Fowler [ ] presented a catalog of composite types that can be used to remove code smells, each of which we name a composite-smell pattern. A composite-smell pattern establishes a frequently observed relationship between a composite type and the introduction or removal of a smell type. For instance, suppose that there is a method affected by the Feature Envy code smell. In this case, Fowler recommends applying a composite pattern composed of Extract Method followed by Move Method. Unfortunately, we do not know if developers apply this composite pattern in practice. More specifically, we do not know which patterns govern the relation between refactorings and smells. These patterns are the focus of our next research question:
RQ3. What are the patterns governing composites and smells?
We address RQ3 by investigating creational and removal patterns. A creational pattern represents a recurring case where the composite tends to introduce a code smell. A removal pattern represents a recurring case where the composite tends to remove a smell. There is no empirical study in the literature that reports composites that typically remove or introduce smells. By answering RQ3, we are able to reveal composites used by developers not only to remove, but also to inadvertently introduce smells. Knowledge about creational patterns informs developers about the risks of introducing certain smells during composite refactoring. The removal patterns can be useful to implement recommendation systems to support developers when removing smells.
4.2 Study Phases
This section presents the five phases of the study design.
Phase 1: Dataset Acquisition. In this phase, we chose a set of software projects to analyze. We established GitHub as the source of projects. To select them, we followed criteria based on closely related studies [ ]. We selected projects with (1) different levels of popularity, based on the number of GitHub stars, (2) an active issue tracking system, and (3) at least 90% of code written in Java. These criteria allowed us to select 48 projects with a diversity of structure, domain, size and popularity. The replication package contains information about them [43].
Phase 2: Smell and Refactoring Detection. In this phase, we detected (i) the refactorings in all subsequent pairs of commits, and (ii) all smells in each commit. We chose Refactoring Miner [ ] to detect refactorings for two reasons. First, the tool has precision of 98% and recall of 87% as reported by Tsantalis et al. [ ], which leads to a low rate of false positives and false negatives. Second, the tool identifies the most common refactoring types applied by developers [ ]. We considered all 14 refactoring types identified by the tool. Refactoring Miner gives us as output a list of refactorings $R(c_i, c_{i+1}) = \{r_1, \cdots, r_k\}$ as defined before, where $k$ is the number of identified refactorings.
Code smells are often detected with metric-based strategies [ ]. Each strategy is defined based on a set of metrics and thresholds. After collecting metrics for all projects, we applied the rules to detect smells [ ]. These rules were used because: (i) they represent refinements of well-known rules proposed by Lanza et al. [ ], which are used in related studies [ ]; and (ii) they have, on average, precision of 72% and recall of 81% [ ]. We collected 19 smells: Brain Class, Brain Method, Class Data Should Be Private, Complex Class, Data Class, Dispersed Coupling, Divergent Change, Feature Envy, God Class, Intensive Coupling, Large Class, Lazy Class, Long Method, Long Parameter List, Message Chain, Refused Bequest, Shotgun Surgery, Spaghetti Code, Speculative Generality.
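
To make the idea of a metric-based strategy concrete, the sketch below combines two metrics with thresholds into a rule for Long Method. The metric names, the thresholds and the rule itself are hypothetical and chosen only for illustration; they are not the rules used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodMetrics:
    name: str
    lines_of_code: int
    cyclomatic_complexity: int


# Hypothetical thresholds, not taken from the detection rules of the study.
MAX_LINES = 50
MAX_COMPLEXITY = 10


def is_long_method(m):
    """Flag a method when both metrics exceed their thresholds."""
    return m.lines_of_code > MAX_LINES and m.cyclomatic_complexity > MAX_COMPLEXITY


print(is_long_method(MethodMetrics("saveMedia", 120, 18)))  # True
print(is_long_method(MethodMetrics("saveUser", 12, 2)))     # False
```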
Phase 3: Manual Validation.
We randomly sampled refactorings of each type to validate them manually. To ensure
an acceptable confidence level in the results, we calculated the sample size of each
refactoring type based on a confidence level of 95% and a confidence interval of
5 points. We recruited ten undergraduate students from another research group to
also analyze the samples. The samples were divided into ten disjoint sets, and each
student validated one. For each pair of elements, they had to mark it as a valid
refactoring or not. Thus, we estimated the number of false positives generated by
Refactoring Miner [ ]. We highlight that our goal was to ensure the trustworthiness
of the tool for our set of systems. For that matter, we relied on students who were
familiar with refactoring to validate the tool. After the manual validation, we
observed that the tool achieved high precision for all refactoring types, with a
median of 88.36%. The precision for all refactoring types is within one standard
deviation (7.73). Applying the Grubbs outlier test (alpha = 0.05), we did not find
any outlier. This result indicates that no refactoring type is strongly influencing
the median precision. Thus, the precision for all the refactorings in the validated
sample lends trustworthiness to our results.
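The text does not state which formula was used to size the samples. One common choice for a 95% confidence level and a 5-point interval is Cochran's formula with a finite population correction, sketched below. The function name and the population values in the usage note are hypothetical.

```python
import math


def sample_size(population, z=1.96, margin=0.05, proportion=0.5):
    """Cochran's sample size with finite population correction.

    z=1.96 corresponds to a 95% confidence level; proportion=0.5 is the
    most conservative assumption about the unknown precision.
    """
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)
```

For instance, under these assumptions a refactoring type with 1,000 detected instances would need `sample_size(1000)` validated instances, and a type with only 100 instances would need `sample_size(100)`.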
Some smells can be introduced by functional changes, such as the implementation of a
new feature. Thus, we also validated whether the smells were introduced or removed
by the refactorings. First, we ran the eGit plugin and the Linux diff tool to find
changes between commits. Then, we manually analyzed each change. We also analyzed
the commit message to verify if there was any sign that the developer applied a pure
refactoring. When we identified a functional change, we classified it as a non-pure
refactoring [ ]; otherwise, we classified it as a pure refactoring. We validated 1,168
pure refactorings and 3,817 non-pure refactorings. We used the pure refactorings to
confirm some results in Sections 5 and 6.
MSR ’20, October 5–6, 2020, Seoul, Republic of Korea Sousa et al.
Phase 4: Synthesis and Classification of Composites.
The heuristics to synthesize composites require a refactoring history as input
(Section 3.3). We collected this history for each project in Phase 2. Each
refactoring history was submitted to the algorithms that implement the heuristics,
allowing us to collect: (i) element-based, (ii) range-based, and (iii) commit-based
composites. After collecting the composites, we classified them according to their
effect on smells as positive, negative, or neutral. Finally, we identified composite
patterns related to the introduction and removal of specific types of smell. More
details about the composite patterns are provided in Section 6. The algorithms
(scripts) that implement the heuristics and classify the composites are available in
the replication package [43].
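As a rough illustration of composite synthesis, the sketch below groups a refactoring history by commit, which is how we read the commit-based category. The record layout `(commit_id, refactoring_type, element)` is an assumption made for this example, and the minimum size of two refactorings mirrors the minimum composite size in Table 1. The actual scripts are the ones in the replication package; this is not a copy of them.

```python
from collections import defaultdict


def commit_based_composites(history):
    """Group refactorings that share a commit; keep groups of two or more.

    `history` is an iterable of (commit_id, refactoring_type, element)
    tuples, a hypothetical layout used only for this sketch.
    """
    by_commit = defaultdict(list)
    for commit_id, ref_type, element in history:
        by_commit[commit_id].append((ref_type, element))
    # A single refactoring in a commit is not a composite.
    return {c: refs for c, refs in by_commit.items() if len(refs) >= 2}


history = [
    ("c1", "Extract Method", "A.foo"),
    ("c1", "Rename Method", "A.bar"),
    ("c2", "Move Method", "B.baz"),
]
composites = commit_based_composites(history)
```

Here only commit `c1` yields a composite; the lone Move Method in `c2` stays a single refactoring.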
Phase 5: Systematic Validation of Composite Patterns.
To increase the reliability of our results, we conducted a systematic manual
validation of a random sample of composites. First, we selected 130 composites
associated with the introduction and removal of Feature Envy and God Class. We
focused on these smells since they are the ones with the most complex composites
(Section 6). Then, we randomly divided the composites among 4 researchers. For each
composite, the researcher conducted the following steps.
(1) Select the GitHub project where the composite happened;
(2) Identify the commits where the composite occurs;
(3) Validate the refactorings and the smells in the elements;
(4) Confirm if the composite is a creational or removal pattern;
    (a) If yes: confirm if the composite explicitly introduced/removed the smell
        or if it is at least associated with the smell;
    (b) If no: verify if the composite is an incomplete one, i.e., if one or more
        refactorings in the removal pattern would have removed the smell.
(5) Analyze the commit messages to find the developer’s intention when performing
    the composite.
(6) We also verified if the refactorings within a commit-based composite were
    semantically related. For this purpose, we analyzed the commit message and
    also whether the refactorings addressed a task associated with a commit.
We validated 40 creational patterns, 43 removal patterns and 47 incomplete
composites. We will use the validated composites to exemplify our discussions. In
these cases, we will identify the composite by the “#” symbol followed by its id
(e.g., composite #21517). Our replication package contains all the validated
instances and the detailed steps and information to validate them.
We identified 27,911 composites in our dataset. We present their characteristics
(Section 5.1) and smell effects (Section 5.2).
5.1 Synthesized Composites
5.1.1 Quantity and Size. This section addresses the corresponding research question.
Table 1 shows, for each heuristic (1st column), the quantity (2nd column) and size
of composites.
Table 1: Quantity and size of composites by heuristic
                        Ref. in        Size
Heuristic  Composites   composites     Min  Med.  Max    Avg  Std.  Grubbs    Elem.
Element    12,636       28,394 (54%)   2    2     333    3.9  6.6   49.89538  4,579
Commit     11,545       47,218 (91%)   2    3     2,562  8.0  44.4  57.76980  51,472
Range      3,761        28,883 (55%)   2    2     2,556  7.7  62.2  41.09278  18,132
Providing a broader view on the composites.
In Section 3.2, we discussed that the element-based heuristic proposed by Bibiano
et al. [ ] may not be appropriate for researchers who want to investigate the effect
of composites on smells. The reason is that there are several elements affected by
the refactorings that this heuristic would ignore by definition. Indeed, the number
of refactored elements in the element-based composites is lower when compared to
the other categories of composites (last column in Table 1). When we compare the
average size of element-based composites with the commit- and range-based
composites (7th column), we notice a difference in the number of refactorings in
each category of composite. Comparing the number of elements with the average size,
we notice that the commit- and range-based composites are fragmented in the
element-based composites. This result shows how the element-based heuristic only
provides a partial view of the composites. The analysis of refactored elements leads
to our first finding:
Finding 1: Commit- and range-based heuristics allow a broader assessment of the
interrelation among refactored elements.
Capturing complex composites.
Our heuristics are helpful to find complex composites. A composite is complex when
it is composed of a high number of refactorings, usually affecting multiple
elements. When we consider the average number of refactorings in a composite
(7th column), the size of commit-based (8.0) and range-based (7.7) composites is
nearly twice the size of element-based composites (3.9). This comparison shows that
the number of interrelated refactorings (in commit-based or range-based composites)
is much larger than any occurrence in the context of a single element. We also found
that 1,545 (41%) out of 3,761 composites of the range-based heuristic, and 5,793
(50%) out of 11,545 composites of the commit-based heuristic, have 3 to 20
interrelated refactorings in conjunction. Therefore, studies that investigated only
single refactorings or only refactorings affecting an element [ ] are not able to
identify complex composites. Thus, they are oversimplifying the study of
refactoring. This result leads us to our next finding:
Finding 2: There is a non-ignorable frequency of complex composites that most
empirical studies missed.
Most refactorings are interrelated.
After applying the heuristics, a given refactoring will be either classified as a
single refactoring or interrelated with others in a composite. In this vein, the
3rd column of Table 1 presents the quantity of interrelated refactorings. As
expected, the commit-based heuristic was the one that grouped the highest number of
interrelated refactorings. The heuristic synthesized 11,545 composites, totaling
47,218 interrelated refactorings, which represents 91% of the total of refactorings
in our dataset. Previous empirical studies [ ] reported that Extract Method and
Rename Method are the most common refactoring types applied by developers. These
studies may give the simplistic impression that developers tend to most commonly
apply single refactorings with a strict scope, i.e., refactorings that affect one or
two methods of a single class. However, this is not the case.
Even though Extract and Rename Method are the most common refactoring types, they
are most often interrelated with other refactorings and they tend to be complex. For
example, when we manually validated the 130 composite instances, we found that when
these two refactoring types are applied, they are frequently part of a much more
complex transformation that goes beyond the scope of a single method or class. For
instance, when developers had the intention to improve the source code, all the
refactorings were associated with the same task: code improvement (e.g., composites
#22691 and #22703, which are available in our replication package [ ]). This is even
clearer for the commit-based composites. Since most of the refactorings occur within
a commit (91%), the refactorings are associated with the task of that commit.
Finding 3: Refactoring composites are much more complex than what existing
empirical studies suggest.
Semantic relation among refactorings.
When we analyze the commit-based composites, only 9% of the refactorings do not
belong to a composite. This result indicates that 91% of the refactorings are
interrelated. Thus, either these refactorings are part of range-based composites
(55%) or they occur in elements that are not structurally related to each other.
This result indicates that when developers are working on a task, there are several
refactorings that are not syntactically related to each other. As the refactorings
in the commit-based composites are not syntactically related, we investigated if
they had any relation. We found that several of these refactorings are semantically
associated with the task that the developer is addressing in the commit. For
example, several of the refactorings were applied to remove smells in different
elements. These refactorings were not structurally related to each other, but they
were semantically related to each other since they aimed to remove smells
(Section 5.2). Notice that if one analyzes only the range-based composites, s/he
would not be able to identify the semantic relation between the refactorings. This
result leads us to our next finding:
Finding 4: Several commit-based composites contain refactorings that are
semantically related to each other.
This finding may jeopardize most refactoring recommendation systems [ ]. These
systems tend to consider only the structurally related refactorings to learn how to
recommend refactorings. However, they do not explore the semantic relation among
refactorings. Only considering structurally related refactorings may not suffice to
provide recommendations for developers.
Our dataset also contains extremely large composites (Table 1). However, we consider
them outliers, since they are rare. For the commit-based heuristic, for example, 87%
of the composites are composed of 10 or fewer refactorings. Only 0.004% of the
commit-based composites have more than 100 refactorings. To confirm that the largest
composites are outliers, we applied the Grubbs test for one outlier. Table 1 shows
the Grubbs score in the penultimate column. The score is calculated as the highest
composite size minus the mean, divided by the standard deviation. We can accept the
hypothesis that the highest sizes of all heuristics are outliers since, for all of
them, the Grubbs scores were higher than the critical values. Besides that, we
observed p-values smaller than 0.00001 for all heuristics, which means that the
results are statistically significant. Our replication package [43] contains a
manual analysis of these outliers.
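The Grubbs score described above, $G = (x_{\max} - \bar{x}) / s$, can be sketched in a few lines. The function names are hypothetical, and the comparison against the critical value (which needs a Student's t quantile) is deliberately left out of this sketch.

```python
import statistics


def grubbs_statistic(sizes):
    """Grubbs score for the largest value: (max - mean) / sample std."""
    mean = statistics.mean(sizes)
    std = statistics.stdev(sizes)  # sample standard deviation (n - 1)
    return (max(sizes) - mean) / std


def grubbs_from_summary(maximum, mean, std):
    """Same score computed from already aggregated values."""
    return (maximum - mean) / std


# Commit-based row of Table 1: Max = 2,562, Avg = 8.0, Std. = 44.4
commit_score = grubbs_from_summary(2562, 8.0, 44.4)
```

With the rounded Table 1 values, `commit_score` comes out near 57.5, close to the 57.77 reported in the table; the small gap is presumably due to the rounding of the average and standard deviation.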
5.1.2 Heterogeneity and Timespan of Composites. Table 2 presents the results about
the timespan and uniformity of composites.
Table 2: Timespan and uniformity characteristics
          Timespan                          Uniformity
Heur.     Single-Commit     Cross-Commit    Homoge.          Heteroge.
Element   9,094 (72.0%)     3,542 (28.0%)   11,107 (87.9%)   1,529 (12.1%)
Commit    11,545 (100.0%)   0 (0.0%)        6,484 (56.0%)    5,061 (44.0%)
Range     3,486 (93.5%)     244 (6.5%)      2,875 (77.0%)    855 (23.0%)
Most composites are single-commit.
Table 2 shows that most composites are single-commit. This occurs even in the case
of the range-based composites, which may have a larger composite scope. We were
expecting that developers could start a composite in a commit and finish it in the
following commits. However, our results show that developers tend to limit the
composites to a single commit. This suggests that they intend to perform all
refactorings at once, without splitting the task into multiple commits.
Most composites are homogeneous.
Table 2 shows that most composites are homogeneous, i.e., all their refactorings
have the same refactoring type. We were not expecting this result. Fowler [ ], in
his book, presents a catalog of multiple refactorings that can be applied to remove
some smells. Hence, we assumed that developers would apply heterogeneous composites
in practice. However, our assumption does not hold since most composites are
homogeneous. The highest incidence of heterogeneous composites is among the
commit-based composites, which can be explained by the semantic relation among
refactorings. As discussed, any refactoring performed in a given commit can be
semantically related to the same task, even if these refactorings are applied in
structurally unrelated elements. The result about uniformity indicates that
developers frequently apply the same refactoring type when restructuring related
elements. These results lead us to our next finding:
Finding 5: Even though homogeneous and single-commit composites are more frequent
than their counterparts, heterogeneous and cross-commit composites occur with a
non-ignorable frequency, which should not be overlooked.
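The two characteristics reported in Table 2 can be derived from a composite with a couple of set operations. The sketch below assumes a composite is represented as a list of `(commit_id, refactoring_type)` pairs; this layout and the function name are hypothetical.

```python
def characterize(composite):
    """Return (timespan, uniformity) labels for one composite.

    `composite` is a list of (commit_id, refactoring_type) pairs,
    an assumed representation used only for this sketch.
    """
    commits = {commit_id for commit_id, _ in composite}
    types = {ref_type for _, ref_type in composite}
    timespan = "single-commit" if len(commits) == 1 else "cross-commit"
    uniformity = "homogeneous" if len(types) == 1 else "heterogeneous"
    return timespan, uniformity


labels = characterize([("c1", "Extract Method"), ("c1", "Extract Method")])
```

Note that a commit-based composite is single-commit by construction under this representation, which is consistent with the 100% in the Commit row of Table 2.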
5.2 Effect of Composites on Code Smells
To answer the corresponding research question, we classified the composites as
positive, negative or neutral (Section 4.1). Table 3 shows this classification.
Table 3: Composite classification by heuristic
Heuristic       Positive        Neutral          Negative
Element-based   751 (6.0%)      11,264 (89.1%)   621 (4.9%)
Commit-based    1,653 (14.3%)   6,019 (52.1%)    3,873 (33.6%)
Range-based     542 (14.5%)     2,020 (54.2%)    1,168 (31.3%)
Several positive and negative composites.
Table 3 shows that the frequency of positive, negative and neutral composites from
the element-based heuristic differs from that of the commit- and range-based
heuristics. Bibiano et al. found similar values for the element-based heuristic. If
we analyze only from the perspective of the element-based heuristic, we will
conclude that the frequency of positive and negative composites is almost
negligible. However, this conclusion is not correct. The other heuristics show that
the positive and negative composites are almost as frequent as neutral composites.
In fact, the frequency of positive, negative and neutral composites is higher than
the results reported in the literature [ ]. As discussed, the scope of some
refactoring types goes beyond a single element, whereas the element-based heuristic
only considers the scope of a single element. Thus, this heuristic is not entirely
appropriate to investigate refactorings that crosscut elements. This limitation
compromises the study of Bibiano et al. [ ]. In their study, the effect of several
refactorings out of the composite scope is ignored. This result leads to our next
finding:
Finding 6: Effects of composites often can only be observed through the reasoning
of refactoring relations in the scope of a range or a commit.
Negative composites are more likely than positive ones.
We had an increase in the number of positive composites when we compare the
element-based composites with the other categories. As discussed in Finding 4
(Section 5.1.1), several refactorings are not syntactically related to each other
but are semantically related. This scenario occurred, for instance, when developers
had the task of removing the Duplicate Code smell scattered over different parts of
the system. When we manually analyzed the commit message for some of the
refactorings, we noticed that the developers tagged the commits as “structural
improvements.” In these commits, we found three distinct cases where each developer
was removing Duplicate Code. All the commits were tagged with the structural
improvement label, and the developer applied, throughout multiple commits,
refactorings to remove the duplication.
We found several instances of the following commit-based composite
$cr_1 = \{\text{Extract Superclass}, \text{Rename Method}\}$ to remove Duplicate
Code. The developer applied the Extract Superclass to create a superclass for the
classes with the smell. Then, s/he renamed the method in the superclass to be
consistent with the functionality provided. We found a case in which a system had
three different unrelated instances of Duplicate Code in the same commit. For each
instance, the developer applied the composite $cr_1$. Despite the increase in
positive composites, developers are more likely to introduce smells than to remove
them, as shown in Table 3. This result leads to the next finding:
Finding 7: Even though most composites are neutral, a non-ignorable frequency of
composites introduce smells.
Effect of the composite on the smell type.
We relied on the classification of each composite to investigate its influence on
the incidence of smells (Section 4.1). We found a case in which the developer
applied a composite to a class that had two smells: Feature Envy and Message Chain.
After the composite was applied, we noticed that the developer removed the Message
Chain, but s/he introduced a God Class. In this case, our classification scheme
would classify the composite as neutral. However, a God Class would often be
considered worse than a Message Chain. Hence, it would not be fair to label the
composite as neutral. Considering the “criticality” of the smell, this composite is
more likely to be considered negative because the structure is worse than before. To
mitigate the risk of misclassifying neutral composites, we verified in our dataset
the smells present before and after each neutral composite. We observed only 30
cases, in a set that contains 27,911 composites, in which a smell was replaced by
another of a different type. This investigation leads to our next finding:
Finding 8: The refactorings in neutral composites very often do not replace a smell
type with another type.
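The classification scheme itself is defined in Section 4.1. The sketch below is only our reading of it, inferred from the example above: a composite is labeled by comparing how many smells the affected elements have before and after it, so swapping one smell for another counts as neutral. Both function names and the list-of-smell-names representation are assumptions.

```python
from collections import Counter


def classify_composite(smells_before, smells_after):
    """Label a composite by comparing smell counts before and after it."""
    if len(smells_after) < len(smells_before):
        return "positive"
    if len(smells_after) > len(smells_before):
        return "negative"
    return "neutral"


def replaces_smell_type(smells_before, smells_after):
    """True when the count is unchanged but the smell types differ."""
    return (len(smells_before) == len(smells_after)
            and Counter(smells_before) != Counter(smells_after))


# The example from the text: Message Chain removed, God Class introduced.
before = ["Feature Envy", "Message Chain"]
after = ["Feature Envy", "God Class"]
label = classify_composite(before, after)
swapped = replaces_smell_type(before, after)
```

Under this reading, the example is labeled neutral while `replaces_smell_type` flags it, which is the kind of case the 30 instances above refer to.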
To address the corresponding research question, we analyzed removal and creational
patterns emerging from the relationship between range-based composites and smells
(Section 4.1). We focus here on the patterns of range-based composites that affect
Feature Envy and God Class. We discuss these smells because they are usually
associated with the structural degradation of the system [ ]. Patterns for the other
smells and categories of composites are available in our replication package [ ]. We
manually inspected several instances of the patterns to understand what happened. In
particular, we also confirmed whether the composites were directly related to the
removal or introduction of the smell. We ended up identifying 111 composite-smell
patterns: 84 removal patterns and 27 creational patterns.
6.1 Feature Envy
Feature Envy is a code smell that represents a method much more interested in the
data of a class other than the one in which it is actually declared [ ]. This smell
is the most frequent one in our dataset. Figure 2 presents all 13 composite types
related to Feature Envy. Green boxes represent the removal patterns; they appear on
the right side of Figure 2. The red ones, on the left side, represent the creational
patterns. The content of each box represents the type of composite involved in the
pattern. There is a caveat regarding the repetition structure: a repetition symbol
indicates that the refactoring type was observed more than once in the composite
structure.
The arrow weight indicates the frequency of a pattern with: (i) a removal behavior
if the arrow is pointing to a green box, and (ii) a creational behavior if the arrow
is departing from a red box. For instance, the top-right green box indicates that,
77% of the time, a composite with more than one Inline Method followed by more than
one Extract Method removes one instance of Feature Envy. The same rationale is used
to interpret the creational patterns.
Figure 2: Feature Envy patterns
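The frequencies attached to the arrows can be thought of as a simple ratio per composite type. The sketch below counts, for each composite signature, how often it removed the smell and keeps the signatures above a cut-off. The `(signature, removed_smell)` observation layout, the signature strings and the default 50% threshold are assumptions for illustration rather than the study's actual mining script.

```python
from collections import Counter


def removal_patterns(observations, threshold=0.5):
    """Map each composite signature to its removal frequency.

    `observations` is an iterable of (signature, removed_smell) pairs,
    where `signature` is a hashable description of the composite type
    and `removed_smell` is a boolean. Only signatures whose frequency
    exceeds `threshold` are returned.
    """
    total = Counter()
    removed = Counter()
    for signature, removed_smell in observations:
        total[signature] += 1
        if removed_smell:
            removed[signature] += 1
    frequencies = {sig: removed[sig] / total[sig] for sig in total}
    return {sig: f for sig, f in frequencies.items() if f > threshold}


observations = [
    (("Inline Method+", "Extract Method+"), True),
    (("Inline Method+", "Extract Method+"), True),
    (("Inline Method+", "Extract Method+"), True),
    (("Inline Method+", "Extract Method+"), False),
    (("Rename Method",), True),
    (("Rename Method",), False),
    (("Rename Method",), False),
    (("Rename Method",), False),
]
patterns = removal_patterns(observations)
```

In this toy input the first signature removes the smell in 3 of 4 cases and is kept, while the second (1 of 4) is dropped. Creational patterns could be computed the same way with an "introduced smell" flag.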
We discussed in Section 5.1.1 that Extract Method is one of the most common
refactorings and that it is most often interrelated with other refactorings. Indeed,
Figure 2 shows that all patterns have at least one Extract Method (EM). Neither the
discussion about Extract Method in Section 5.1.1 nor the identification of composite
patterns would be possible if we had (i) only analyzed single refactorings or
(ii) used the element-based heuristic.
Incomplete composites. We noticed cases of composites consistently introducing
Feature Envies in 31 projects. Composites with Move Attribute, Extract Method
introduced Feature Envies in more than 60% of the cases, as shown in Figure 2. These
creational patterns indicate that the composites are “incomplete”, which contributed
to the introduction (rather than the removal) of the Feature Envy. An incomplete
composite occurs when a set of refactorings affects the smelly structure, but is not
sufficient to fully remove a smell. It may even worsen the smelly structure. For
instance, the developers moved attributes in the first three creational patterns in
Figure 2; however, they did not move the corresponding extracted methods to fully
remove the envy structures. Consequently, the “unmoved methods” became more
interested in the classes to which the attributes were moved. These composites led
to the introduction of the Feature Envy because they were incomplete; i.e., a Move
Method should also be part of such composites. Examples falling into this scenario
include composites #22092, #22156 and #22419.
This type of scenario reinforces our discussion about the high number of negative
composites (Finding 7). As we discussed in Section 5.2, our heuristics show that
several composites are negative. This increase in the number of negative impacts is
related to the incomplete composites. We found that developers are trying to improve
the program structure during the refactoring process but, for different reasons,
they are not necessarily completing the restructuring process to fully remove the
smelly structure. As a consequence, incomplete composites lead to the introduction
of smells, such as the Feature Envy. These incomplete composites were also observed
in patterns for the other smell types.
Finding 9: Developers tend to introduce smells, such as Feature Envies, due to
incomplete composites.
Avoiding misleading results.
As discussed, Bibiano et al. [ ] do not provide a broader understanding of the
effect of composites on smells, which can lead to misleading results. The same
occurs with studies that only focus on single refactorings [ ]. For example, Bavota
et al. [ ] did not find any relation between specific smells (e.g., Feature Envy)
and specific refactorings (e.g., EM). To illustrate how these studies are not able
to either provide a broader view or find a relation between refactorings and smells,
let us consider the EM refactoring, since it occurs in all the patterns associated
with Feature Envy (Figure 2). We applied Fisher’s Exact Test to investigate the
relation between EM and Feature Envy (Table 4). For each heuristic (1st column), we
present the number of composites containing EM that removed and introduced Feature
Envies (2nd and 3rd columns, respectively). The 4th and 5th columns show the same
information for composites without EM. The last two columns show the p-value and
odds ratio (OR) for Fisher’s Exact Test.
Table 4: Fisher’s test results for Feature Envy patterns
            Positive   Negative   Positive     Negative
Heuristic   With EM    With EM    Without EM   Without EM   p-value     OR
Element     496        86         0            0            1           0
Commit      15,632     2,013      31,398       39,000       <0.000001   9.64
Range       360        110        25           0            0.002338    0
We ran the test with 95% confidence, which means that we can reject the null
hypothesis (H0) when the p-value is smaller than 0.05. In our case, H0 is that the
introduction or removal of Feature Envies by composites is independent of the
presence of EM. Given the p-values, the element-based heuristic is the only case in
which we cannot reject H0. Therefore, the element-based composites mislead us to
believe that composites without EM will never remove or introduce Feature Envies.
However, the results of the other heuristics show the opposite, especially in the
case of commit-based composites. Thus, our heuristics were able to reveal that EM
often “partially” contributes to the removal (and introduction) of Feature Envy when
performed with other refactorings (composites). In summary, only analyzing
element-based composites [ ] or single refactorings [ ] does not provide a broader
understanding of composites and, in the worst-case scenario, can lead to an
erroneous result. This discussion reinforces Finding 1 (Section 5.1.1).
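For readers who want to retrace Table 4, the sketch below computes the sample odds ratio and a two-sided Fisher exact p-value for a 2x2 table using only the standard library. It is a plain textbook formulation (summing the hypergeometric probabilities of all tables no more likely than the observed one), not necessarily the exact procedure or tool used in the study; the function names are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) of the table [[a, b], [c, d]].

    Returns None when the ratio is undefined (b*c == 0).
    """
    if b * c == 0:
        return None
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is at most as likely as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Rows of Table 4: (positive with EM, negative with EM,
#                   positive without EM, negative without EM)
commit_or = odds_ratio(15632, 2013, 31398, 39000)
range_p = fisher_exact_two_sided(360, 110, 25, 0)
```

The commit-based odds ratio comes out at roughly 9.65, in line with the 9.64 in Table 4, and the range-based p-value falls well below the 0.05 level. The pure-Python p-value is only practical for small tables such as the range-based row; for the commit-based counts a statistics library would be the more sensible choice.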
6.2 God Class
Our second set of composite-smell patterns concerns the God Class. This smell exists
when a class accumulates several responsibilities [ ]. We found that this smell is
more frequent than one might expect: there are 425 distinct instances of God Class
distributed across 26 projects. Figure 3 presents all 12 patterns.
Figure 3: God Class patterns
Palomba et al. showed that when developers implement new features, they often apply
complex refactorings to improve the code cohesion [ ]. Our results provide a new
perspective regarding this scenario. We found that developers tend to decrease the
code cohesion when interleaving refactorings with additional changes. For example,
when developers apply composites of Rename Methods and Extract Methods, they tend to
introduce God Class, as shown in Figure 3. At first sight, this pattern is not
intuitive. Developers are not expected to increase the size of classes while
performing Rename and Extract Methods. We analyzed these composites to understand
why they led to the God Class.
Inappropriate additional changes.
We found that this creational pattern exists when developers interleave refactorings
with additional changes and do not perform them in conjunction with other
refactorings (e.g., composites #21517 and #20932). The additional changes comprise
the creation of new methods (Extract Methods), which are, unfortunately,
implementing unrelated functionalities. As a consequence of these additions in the
extracted methods, developers have to change the methods’ names to express the new
functionalities (Rename Methods). As new functionalities are introduced, the class
cohesion decreases, which leads to the appearance of a God Class. The composites
with Rename Methods and Extract Methods were not the main reason for the
introduction of the God Class. Still, a recommender system can use this pattern to
improve its refactoring recommendations. For example, if a developer is introducing
non-structural changes along with Rename Methods and Extract Methods, the system can
alert the developer that s/he may introduce a structural problem.
Moving data to remove the God Class.
We identified 11 removal patterns associated with the God Class. This result shows
that developers often apply a wide range of non-trivial composites to remove the
smell across software projects. For example, as discussed in the previous
paragraphs, the God Class was introduced when the composites of Rename Methods and
Extract Methods occurred with additional changes. We found that these changes
introduced pieces of code that should not be in the classes, contributing to the God
Class. Later on, developers had to apply several refactorings to move these pieces
of code to the classes that suit them better, removing the God Class. This behavior
of applying refactorings that move data is reflected in the removal patterns. All
the removal patterns had refactorings that moved data between classes, except for
Inline Method and Extract Method. This scenario is another example of why an
element-based heuristic fails to show the effect of composites on smells. To remove
God Class, developers apply refactorings that affect multiple elements, such as the
classes to which the data is moved. However, if we analyze only the scope of a
single element, we would not be able to notice that composites moving data play a
central role in the addition and removal of God Classes. This behavior leads us to
our next finding:
Finding 10: The range-based heuristic detects how data is moved among classes to
either introduce or remove God Class.
Providing knowledge based on practice.
Although some patterns emerge in the element-based heuristic, they only provide a
partial view of composite effects. Several of the composite patterns reported here
and in the replication package can only be identified with range-based and
commit-based heuristics. Even Fowler’s catalog [ ], which lists common composites to
remove smells, does not report our patterns. For example, Fowler’s catalog indicates
that developers should apply Extract Class or Extract Subclass to remove a God
Class. However, we noticed that developers much more often follow other strategies
regarding the refactoring types: Inline Method, Extract Method, Pull Up Method and
Attribute, and Move Method. Thus, our results suggest that existing refactoring
catalogs [ ] may not reflect the practice. We also observed that existing
recommenders for code smell removal do not recommend these patterns [ ]. They should
refine their recommendations with our smell-removal composite patterns.
We relied on Refactoring Miner [ ], which leads to a threat associated with the
false positives generated by the tool. To minimize this threat, we manually
validated each refactoring type (Section 4.2). We observed a high precision for each
refactoring type.
Some findings are centered around the difference among positive, negative and
neutral composites. If our classification procedure were inaccurate, it would be a
major threat to the validity of our data. To mitigate that, we studied all the cases
where the classification procedure could be inaccurate (Section 5.2). We found a
risk of the classification scheme being wrong in 0.01% of the cases. Thus, this risk
was mitigated by the data disposition.
Our proposed heuristics may have limitations regarding how they group refactorings
(composite synthesis). For example, a reason for defining the range-based heuristic
is to capture composites that would be incomplete from the commit-based perspective.
Even so, the range-based heuristic can still miss refactorings; thus, an incomplete
composite can be a complete one if we use another synthesis strategy. One can
consider these limitations as opportunities for other researchers to define their
own synthesis strategy. One could also investigate a unified heuristic that infers,
for each refactoring, the most appropriate scope in that particular case by
exploring additional contextual information from where it occurs.
We presented several patterns that remove or introduce smells. We computed them by
verifying how often they happen in the projects, so they might suffer from lack of
generality. To mitigate this threat, we only reported patterns that happened in more
than 50% of the instances in our dataset. Additionally, to make sure that all three
heuristics could find these patterns, we verified the intersection among them. We
found that 16 (out of 27) creational patterns and 80 (out of 84) removal patterns
were found by all heuristics.
Composite refactoring is common in practice, but broad empirical
knowledge about it is scarce. To tackle this issue, first, we
provided a conceptual characterization of composites and defined two
heuristics to identify composites in different categories. Second, we
investigated how composites manifest in practice, and how they
affect the program structure. Our results show that studying composites
requires relying on different heuristics: they are complementary
to each other, but most empirical studies tend to use only a single
heuristic. For example, the identification of the semantically-related
refactorings was only possible using the commit-based and range-based
heuristics together. Similarly, the identification of several
composite-smell patterns was only possible with the range-based
heuristic.
Our results can be useful both for researchers and practitioners.
In particular, our study helped to explain conflicting results in the
literature. For instance, different studies (e.g., [ ] and [ ]) have come
to different conclusions regarding the relation of refactoring types
with specific code smells. Thus, we provided new evidence that
there are composite patterns strongly related to the introduction
or removal of specific code smells (which explains the divergence in
their results). On the practical side, we contributed with insights
and a set of composite-smell patterns that are useful for improving
existing refactoring detection tools or recommender systems.
We want to thank the reviewers for their valuable suggestions. This
work is funded by CNPq (grants 434969/2018-4, 312149/2016-6),
CAPES (grant 175956), and FAPERJ (grant 22520-7/2016).
Characterizing and Identifying Composite Refactorings MSR ’20, October 5–6, 2020, Seoul, Republic of Korea
M Abbes, F Khomh, Y Gueheneuc, and G Antoniol. 2011. An Empirical Study
of the Impact of Two Antipatterns, Blob and Spaghetti Code, on Program Com-
prehension. In Proceedings of the 15th European Software Engineering Conference;
Oldenburg, Germany. 181–190.
Vahid Alizadeh and Marouane Kessentini. 2018. Reducing Interactive Refactoring
Eort via Clustering-based Multi-objective Search. In Proceedings of the 33rd
ACM/IEEE International Conference on Automated SoftwareEngine ering (ASE 2018).
ACM, New York, NY, USA, 464–474.
Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni, and Marouane
Kessentini. 2019. Do Design Metrics Capture Developers Perception of Quality?
An Empirical Study on Self-Armed Refactoring Activities. In 13th ACM/IEEE
International Symposium on Empirical Software Engineering and Measurement
(ESEM 2019).
Roberta Arcoverde, Isela Macia, Alessandro Garcia, and Arndt von Staa. 2012.
Automatically Detecting Architecturally-Relevant Code Anomalies. Proceedings
of the International Workshop on Recommendation Systems for Software Engineering
(2012), 90–91.
Gabriele Bavota, Andrea De Lucia, Massimiliano Di Penta, Rocco Oliveto, and
Fabio Palomba. 2015. An Experimental Investigation On The Innate Relationship
Between Quality And Refactoring. Journal of Systems and Software 107 (2015),
Ana Carla Bibiano, Eduardo Fernandes, Daniel Oliveira, Alessandro Garcia, Mar-
cos Kalinowski, Baldoino Fonseca, Roberto Oliveira, Anderson Oliveira, and
Diego Cedrim. 2019. A Quantitative Study on Characteristics and Effect of
Batch Refactoring on Code Smells. In 13th International Symposium on Empirical
Software Engineering and Measurement (ESEM). 1–11.
Arnaud Blouin, Valéria Lelli, Benoit Baudry, and Fabien Coulon. 2018. User
interface design smell: Automatic detection and refactoring of Blob listeners.
Information and Software Technology 102 (2018), 49 – 64.
Aline Brito, Andre Hora, and Marco Tulio Valente. 2020. Refactoring Graphs:
Assessing Refactoring over Time. In 2020 IEEE 27th International Conference on
Software Analysis, Evolution and Reengineering (SANER). IEEE.
Diego Cedrim, Leonardo da Silva Sousa, Alessandro F. Garcia, and Rohit Gheyi.
2016. Does Refactoring Improve Software Structural Quality? A Longitudinal
Study of 25 Projects. In Proceedings of the 30th Brazilian Symposium on Software
Engineering. ACM, New York, NY, USA, 73–82.
Diego Cedrim, Alessandro Garcia, Melina Mongiovi, Rohit Gheyi, Leonardo
Sousa, Rafael de Mello, Baldoino Fonseca, Márcio Ribeiro, and Alexander Chávez.
2017. Understanding the Impact of Refactoring on Smells: A Longitudinal Study
of 23 Software Projects. In Proceedings of the 11th Joint Meeting on Foundations
of Software Engineering (ESEC/FSE 2017). ACM, New York, NY, USA, 465–475.
Alexander Chávez, Isabella Ferreira, Eduardo Fernandes, Diego Cedrim, and
Alessandro Garcia. 2017. How Does Refactoring Affect Internal Quality Attributes?
A Multi-Project Study. In Proceedings of the 31st Brazilian Symposium
on Software Engineering (SBES’17). ACM, New York, NY, USA, 74–83.
Rafael Maiani de Mello, Anderson G. Uchôa, Roberto Felicio Oliveira,
Willian Nalepa Oizumi, Jairo Souza, Kleyson Mendes, Daniel Oliveira, Baldoino
Fonseca, and Alessandro Garcia. 2019. Do Research and Practice of Code
Smell Identication Walk Together? A Social Representations Analysis. In 2019
ACM/IEEE International Symposium on Empirical Software Engineering and Mea-
surement, ESEM 2019, Porto de Galinhas, Recife, Brazil, September 19-20, 2019. IEEE,
Danny Dig, Kashif Manzoor, Ralph Johnson, and Tien N. Nguyen. 2007.
Refactoring-Aware Conguration Management for Object-Oriented Programs.
In Proceedings of the 29th International Conference on Software Engineering
(ICSE ’07). IEEE Computer Society, Washington, DC, USA, 427–436. https:
[14] Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts. 1999.
Refactoring: Improving The Design Of Existing Code (1st ed.). Addison-Wesley
Longman Publishing Co., Inc., Boston, MA, USA. 464 pages.
Kenji Fujiwara, Kyohei Fushida, Norihiro Yoshida, and Hajimu Iida. 2013. Assessing
Refactoring Instances and the Maintainability Benefits of Them from
Version Archives. Springer Berlin Heidelberg, Berlin, Heidelberg, 313–323. 642-39259- 7_25
Birgit Geppert, Audris Mockus, and Frank Rossler. 2005. Refactoring for Change-
ability: A Way to Go?. In Proceedings of the 11th IEEE International Software
Metrics Symposium (METRICS ’05). IEEE Computer Society, Washington, DC,
USA, 13–.
Everton T. Guimarães, Alessandro F. Garcia, and Yuanfang Cai. 2015. Architecture-sensitive
heuristics for prioritizing critical code anomalies. In Proceedings of the
14th International Conference on Modularity, MODULARITY 2015, Fort Collins, CO,
USA, March 16 - 19, 2015, Robert B. France, Sudipto Ghosh, and Gary T. Leavens
(Eds.). ACM, 68–80.
Mark Harman and Laurence Tratt. 2007. Pareto optimal search based refactoring
at the design level. In 9th Genetic and Evolutionary Computation Conference
(GECCO). 1106–1113.
Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan. 2012. A Field
Study of Refactoring Challenges and Benets. In Proceedings of the ACM SIGSOFT
20th International Symposium on the Foundations of Software Engineering (FSE
’12). ACM, New York, NY, USA, Article 50, 11 pages.
Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan. 2014. An
Empirical Study of Refactoring Challenges and Benefits at Microsoft. IEEE
Transactions on Software Engineering 40, 7 (2014), 633–649.
H. Kirinuki, Y. Higo, K. Hotta, and S. Kusumoto. 2016. Splitting Commits via
Past Code Changes. In 2016 23rd Asia-Pacific Software Engineering Conference
(APSEC). 129–136.
Martin Kuhlemann, Liang Liang, and Gunter Saake. 2010. Algebraic and cost-
based optimization of refactoring sequences. In 2nd International Workshop on
Model-driven Product Line Engineering (MDPLE). 37–48.
Michele Lanza and Radu Marinescu. 2010. Object-Oriented Metrics in Practice:
Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-
Oriented Systems (1st ed.). Springer Publishing Company, Incorporated.
Yun Lin, Xin Peng, Yuanfang Cai, Danny Dig, Diwen Zheng, and Wenyun Zhao.
2016. Interactive and guided architectural refactoring with search-based recom-
mendation. In 24th International Symposium on Foundations of Software Engineer-
ing (FSE). 535–546.
Kui Liu, Dongsun Kim, Tegawendé F. Bissyandé, Taeyoung Kim, Kisub Kim,
Anil Koyuncu, Suntae Kim, and Yves Le Traon. 2019. Learning to Spot and
Refactor Inconsistent Method Names. In Proceedings of the 41st International
Conference on Software Engineering (ICSE ’19). IEEE Press, Piscataway, NJ, USA,
Isela Macia. 2013. On The Detection Of Architecturally Relevant Code Anomalies
In Software Systems. Ph.D. Dissertation. Pontifical Catholic University of Rio de
Isela Macia, Roberta Arcoverde, Alessandro Garcia, Christina Chavez, and Arndt
von Staa. 2012. On the Relevance of Code Anomalies for Identifying Architecture
Degradation Symptoms. Proceedings of the 16th European Conference on Software
Maintenance and Reengineering (2012), 277–286.
Mehran Mahmoudi, Sarah Nadi, and Nikolaos Tsantalis. 2019. Are Refactorings
to Blame? An Empirical Study of Refactorings in Merge Conflicts. In 2019 IEEE
26th International Conference on Software Analysis, Evolution and Reengineering
(SANER). IEEE, 151–162.
Leandra Mara, Gustavo Honorato, Francisco Dantas Medeiros, Alessandro Garcia,
and Carlos Lucena. 2011. Hist-Inspect: A Tool for History-Sensitive Detection of
Code Smells. In Proceedings of the 10th International Conference on Aspect-oriented
Software Development Companion (AOSD ’11). ACM, New York, NY, USA, 65–66.
Panita Meananeatra. 2012. Identifying Refactoring Sequences For Improving
Software Maintainability. In Proceedings of the 27th IEEE/ACM International
Conference on Automated Software Engineering. ACM Press, New York, New
York, USA, 406–409.
Mohamed Wiem Mkaouer, Marouane Kessentini, Slim Bechikh, Kalyanmoy Deb,
and Mel Ó Cinnéide. 2014. Recommendation system for software refactoring
using innovization and interactive dynamic optimization. In 29th International
Conference on Automated Software Engineering (ASE). 331–336.
E. Murphy-Hill and A. P. Black. 2008. Refactoring Tools: Fitness for Purpose.
IEEE Software 25, 5 (Sep. 2008), 38–44.
E. Murphy-Hill, C. Parnin, and A. P. Black. 2012. How We Refactor, and How We
Know It. IEEE Transactions on Software Engineering 38, 1 (2012), 5–18. https:
Mel Ó Cinnéide and Paddy Nixon. 2000. Composite refactorings for Java programs.
In Proceedings of the Workshop on Formal Techniques for Java Programs, co-located
with the 14th European Conference on Object-Oriented Programming (ECOOP).
Willian Nalepa Oizumi, Leonardo da Silva Sousa, Anderson Oliveira, Alessandro
Garcia, O. I. Anne Benedicte Agbachi, Roberto Felicio Oliveira, and Carlos Lucena.
2018. On the identication of design problems in stinky code: experiences and
tool support. J. Braz. Comp. Soc. 24, 1 (2018), 13:1–13:30.
Willian Nalepa Oizumi, Alessandro F. Garcia, Leonardo da Silva Sousa, Bruno
Barbieri Pontes Cafeo, and Yixue Zhao. 2016. Code anomalies flock together:
exploring code anomaly agglomerations for locating design problems. In Pro-
ceedings of the 38th International Conference on Software Engineering, ICSE 2016,
Austin, TX, USA, May 14-22, 2016, Laura K. Dillon, Willem Visser, and Laurie
Williams (Eds.). ACM, 440–451.
Mark O’Keeffe and Mel Ó Cinnéide. 2008. Search-based Refactoring: An Empirical
Study. J. Softw. Maint. Evol. 20, 5 (Sept. 2008), 345–364.
Roberto Felicio Oliveira, Leonardo da Silva Sousa, Rafael Maiani de Mello, Natasha
M. Costa Valentim, Adriana Lopes, Tayana Conte, Alessandro F. Garcia, Edson
Cesar Cunha de Oliveira, and Carlos José Pereira de Lucena. 2017. Collaborative
Identication of Code Smells: A Multi-Case Study. In 39th IEEE/ACM International
Conference on Software Engineering: Software Engineering in Practice Track, ICSE-
SEIP 2017, Buenos Aires, Argentina, May 20-28, 2017. IEEE Computer Society,
Roberto Felicio Oliveira, Rafael Maiani de Mello, Eduardo Fernandes, Alessandro
Garcia, and Carlos Lucena. 2020. Collaborative or individual identification of
code smells? On the effectiveness of novice and professional developers. Inf.
Softw. Technol. 120 (2020).
William F. Opdyke. 1992. Refactoring Object-oriented Frameworks. Ph.D. Disserta-
tion. Champaign, IL, USA. UMI Order No. GAX93-05645.
Ali Ouni, Marouane Kessentini, Mel Ó Cinnéide, Houari Sahraoui, Kalyanmoy
Deb, and Katsuro Inoue. 2017. MORE: A multi-objective refactoring recommendation
approach to introducing design patterns and fixing code smells. Journal
of Software: Evolution and Process 29, 5 (2017), e1843.
Ali Ouni, Marouane Kessentini, and Houari Sahraoui. 2013. Search-based refac-
toring using recorded code changes. In 17th European Conference on Software
Maintenance and Reengineering (CSMR). 221–230.
[43] 2020 Replication Package. 2020. https://
Fabio Palomba, Andy Zaidman, Rocco Oliveto, and Andrea De Lucia. 2017. An
exploratory study on the relationship between changes and refactoring. In 2017
IEEE/ACM 25th International Conference on Program Comprehension (ICPC). IEEE,
E. Piveta, J. Araujo, M. Pimenta, A. Moreira, P. Guerreiro, and R. T. Price. 2008.
Searching for Opportunities of Refactoring Sequences: Reducing the Search
Space. In 2008 32nd Annual IEEE International Computer Software and Applications
Conference. 319–326.
K. Prete, N. Rachatasumrit, N. Sudan, and M. Kim. 2010. Template-Based Recon-
struction of Complex Refactorings. In Proceedings of IEEE International Conference
on Software Maintenance. 1–10.
Jacek Ratzinger, Thomas Sigmund, and Harald C Gall. 2008. On The Relation of
Refactorings and Software Defect Prediction. In Proceedings of the International
Workshop on Mining Software Repositories. ACM Press, New York, New York, USA,
Veselin Raychev, Max Schäfer, Manu Sridharan, and Martin Vechev. 2013. Refac-
toring with synthesis. ACM SIGPLAN Notices 48, 10 (2013), 339–354.
Danilo Silva, Nikolaos Tsantalis, and Marco Tulio Valente. 2016. Why We Refac-
tor? Confessions of GitHub Contributors. In Proceedings of the 24th ACM SIGSOFT
International Symposium on Foundations of Software Engineering (FSE 2016). ACM,
New York, NY, USA, 858–870.
Gábor Szőke, Gábor Antal, Csaba Nagy, Rudolf Ferenc, and Tibor Gyimóthy. 2017.
Empirical study on refactoring large-scale industrial systems and its effects on
maintainability. Journal of Systems and Software 129 (2017), 107–126.
Nikolaos Tsantalis, Theodoros Chaikalis, and Alexander Chatzigeorgiou. 2018.
Ten years of JDeodorant: Lessons learned from the hunt for smells. In 2018 IEEE
25th International Conference on Software Analysis, Evolution and Reengineering
(SANER). IEEE, 4–14.
Nikolaos Tsantalis, Matin Mansouri, Laleh M. Eshkevari, Davood Mazinanian, and
Danny Dig. 2018. Accurate and Efficient Refactoring Detection in Commit History.
In Proceedings of the 40th International Conference on Software Engineering (ICSE
’18). ACM, New York, NY, USA, 483–494.
Michele Tufano, Fabio Palomba, Gabriele Bavota, Rocco Oliveto, Massimiliano
Di Penta, Andrea De Lucia, and Denys Poshyvanyk. 2015. When and Why Your
Code Starts to Smell Bad. In Proceedings of the 37th International Conference on
Software Engineering (ICSE ’15). IEEE Press, Piscataway, NJ, USA, 403–414.
Carmine Vassallo, Giovanni Grano, Fabio Palomba, Harald C. Gall, and Alberto
Bacchelli. 2019. A large-scale empirical exploration on refactoring activities in
open source software projects. Science of Computer Programming 180 (2019), 1 –
Santiago A. Vidal, Willian Nalepa Oizumi, Alessandro Garcia, J. Andres Diaz-Pace,
and Claudia Marcos. 2019. Ranking architecturally critical agglomerations of
code smells. Sci. Comput. Program. 182 (2019), 64–85.
Aiko Yamashita and Leon Moonen. 2013. Exploring the Impact of Inter-Smell
Relations on Software Maintainability: An Empirical Study. Proceedings of the
International Conference on Software Engineering (2013), 682–691. https://doi.
Aiko Yamashita and Leon Moonen. 2013. To What Extent can Maintenance
Problems be Predicted by Code Smell Detection? An Empirical Study. Information
and Software Technology 55, 12 (2013), 2223–2242.
Young Seok Yoon and Brad A. Myers. 2015. Supporting Selective Undo in a Code
Editor. In Proceedings of the 37th International Conference on Software Engineering
- Volume 1 (ICSE ’15). IEEE Press, Piscataway, NJ, USA, 223–233. http://dl.acm.
Trevor J. Young. 2005. Using AspectJ to build a software product line for mobile
devices. Ph.D. Dissertation.
... Other studies explored the developers' intentions to engage in refactoring actions, including MMR, where developers were found to recognize that improvement of quality metrics is the main driver for refactoring [54]. MMR was found to be most used by developers in refactoring activities targeting code reusability [55], code readability [56], and feature envy code smell [57], addressing self-admitted technical debt [58], and improving energy consumption [59,60]. Developers should be aware of some of the unwanted consequences of using MMR, such as breaking class API in instances, especially when MMR is used in changes that introduce new features or bug fixes [61]. ...
... MMR was predicted to occur with changes that target bug fixing, introduce a new feature, or perform general maintenance [67,68]. It was also found to occur in association with increases in CBO, LOC, and LCOM [58]. ...
Full-text available
Refactoring is a maintenance task that aims at enhancing the quality of a software’s source code by restructuring it without affecting the external behavior. Move method refactoring (MMR) involves reallocating a method by moving it from one class to the class in which the method is used most. Several studies have been performed to explore the impact of MMR on several quality attributes. However, these studies have several limitations related to the applied approaches, considered quality attributes, and size of the selected datasets. This paper reports an empirical study that applies statistical and machine learning (ML) approaches to explore the impact of MMR on code quality. The study overcame the limitations of the existing studies, and this improvement is expected to make the results of this study more reliable and trustworthy. We considered eight quality attributes and thirty quality measures, and a total of approximately 4 K classes from seven Java open-source systems were involved in the study. The results provide evidence that most of the quality attributes were significantly improved by MMR in most cases. In addition, the results show that a limited number of measures, when considered individually, have a significant ability to predict MMR, whereas most of the considered measures, when considered together, significantly contribute to the MMR prediction model. The constructed ML-based prediction model has an area under curve (AUC) value of 96.6%.
... The significance of refactoring within software development processes has grown significantly, as it can be prompted by various factors such as new requirements, adaptation to diverse contexts, and subpar quality [3], [9]. Refactoring is widely adopted as a prominent technique for enhancing the quality of existing software systems in practical settings [10], [11], [12], establishing an important connection between quality attributes [13]. ...
Full-text available
The expenses associated with software maintenance and evolution constitute a significant portion, surpassing more than 80% of the overall costs involved in software development. Refactoring, a widely embraced technique, plays a crucial role in streamlining and minimizing maintenance activities and expenses. However, the effect of refactoring techniques on quality attributes presents inconsistent and conflicting findings, making it challenging for software developers to enhance software quality effectively. Additionally, the absence of a comprehensive framework further complicates the decision-making process for developers in selecting appropriate refactoring techniques aligned with specific design objectives. In light of these considerations, this research aims to introduce a novel framework for classifying refactoring techniques based on their measurable influence on internal quality attributes. Initially, an exploratory study was conducted to identify commonly employed refactoring techniques, followed by an experimental analysis involving five case studies to evaluate the effects of these techniques on internal quality attributes. Subsequently, the framework was constructed based on the outcomes of the exploratory and experimental studies, further reinforced by a multi-case analysis. Comprising three key components, namely the methodology for applying refactoring techniques, the Quality Model for Object-Oriented Design (QMOOD), and the classification scheme for refactoring techniques, this proposed framework serves as a valuable guideline for developers. By comprehending the effect of each refactoring technique on internal quality attributes, developers can make informed decisions and select suitable techniques to enhance specific aspects of their software. 
Consequently, this framework optimizes developers’ time and effort by minimizing the need to weigh the pros and cons of different refactoring techniques, potentially leading to a reduction in maintenance activities and associated costs.
... We look at the entire change set of PRs without differentiating between initial and refactoring-inducing commits, which can be investigated in a separate study. Sousa et al. [35] introduce the idea of composite refactorings that consist of multiple atomic refactorings of the same or different types and that are considered as one change. Our testability refactoring patterns can be considered as composite refactorings. ...
To create unit tests, it may be necessary to refactor the production code, e.g. by widening access to specific methods or by decomposing classes into smaller units that are easier to test independently. We report on an extensive study to understand such composite refactoring procedures for the purpose of improving testability. We collected and studied 346,841 java pull requests from 621 GitHub projects. First, we compared the atomic refactorings in two populations: pull requests with changed test-pairs (i.e. with co-changes in production and test code and thus potentially including testability refactoring) and pull requests without test-pairs. We found significantly more atomic refactorings in test-pairs pull requests, such as Change Variable Type Operation or Change Parameter Type. Second, we manually analyzed the code changes of 200 pull requests, where developers explicitly mention the terms "testability" or "refactor + test". We identified ten composite refactoring procedures for the purpose of testability, which we call testability refactoring patterns. Third, we manually analyzed additional 524 test-pairs pull requests: both randomly selected and where we assumed to find testability refactorings, e.g. in pull requests about dependency or concurrency issues. About 25% of all analyzed pull requests actually included testability refactoring patterns. The most frequent were extract a method for override or for invocation, widen access to a method for invocation, and extract a class for invocation. We also report on frequent atomic refactorings which co-occur with the patterns and discuss the implications of our findings for research, practice, and education
Refactoring is a program transformation to improve the internal structure of a program while preserving its external behavior. Developers frequently apply multiple refactorings that depend on each other to achieve goals such as improving code reusability. Although manually applying a sequence of dependent refactorings is a common practice, existing refactoring recommendation tools treat refactorings in isolation without revealing the dependencies among them to developers. One reason is that these relationships among refactorings are poorly understood. Current approaches treat refactoring recommendations as a strictly ordered sequence limiting developers’ ability to understand, validate, and apply recommended refactorings. To address this gap, this paper describes a theory for reasoning about collections of refactorings through defining an ordering dependency relation among refactorings and organizing collection of refactorings as a set of refactoring graphs. We propose an algorithm for identifying refactoring dependencies and illustrate these concepts with a tool for visualizing such refactoring dependencies and refactoring graphs. Our validation results demonstrate that 43% of the 1,457,873 recommended refactorings from 9,595 projects that we studied are part of dependent refactoring graphs. Furthermore, refactorings are not only commonly involved in dependent relations, but also when applied, dependent refactoring graphs improve all of the quality attribute metrics in our experiments more than individual refactorings.
Catalogs of refactoring have key importance in software maintenance and evolution, since developers rely on such documents to understand and perform refactoring operations. Furthermore, these catalogs constitute a reference guide for communication between practitioners since they standardize a common refactoring vocabulary. Fowler's book describes the most popular catalog of refactorings, which documents single and well‐known refactoring operations. However, sometimes, refactorings are composite transformations, that is, a sequence of refactorings is performed over a given program element. For example, a sequence of Extract Method operations (a single refactoring) can be performed over the same method, in one or in multiple commits, to simplify its implementation, therefore, leading to a Method Decomposition operation (a composite refactoring). In this paper, we propose and document a catalog with eight composite refactorings. We also implement a set of scripts to mine composite refactorings by preprocessing the results of refactoring detection tools. Using such scripts, we search for composites in a representative refactoring oracle with hundreds of confirmed single refactoring operations. Next, to complement this first study, we also search for composites in the full history of 10 well‐known open‐source projects. We characterize the detected composite refactorings, under dimensions such as size and location. We conclude by addressing the applications and implications of the proposed catalog. Catalogs of refactoring have key importance in software development, since developers rely on such documents to perform refactoring operations, and they also act as a reference guide for communication among practitioners. In this paper, we propose and document a catalog with eight composite refactorings, that is, sequences of refactorings performed over a given program element. We searched for occurrences of each composite instance in real scenarios. 
We characterize the detected composites, and we also address applications and implications.
Full-text available
Context: code refactoring is a code transformation that aims to improve software quality. A composite refactoring (or, simply, composite) is defined by two or more interrelated refactorings, which is often applied by developers. Each composite needs to be somehow represented and has its own characteristics (e.g., code scope) as well as its effects on software quality. However, these basic elements of composites are rarely studied systematically. The lack of systematic knowledge also misguides the design of automated support tools for supporting composite refactoring. Thus, researchers might have controversial views about basic elements of composite refactorings. An example of these literature conflicts concerns the effect of composites: while some studies suggest composites more often remove code smells, other studies indicate composites often introduce code smells. Objective: in this sense, our study aims at analyzing the technical literature of composite refactoring and building a conceptual framework of the representation models, characteristics, and the effect of composite refactoring. Method: we conducted a systematic mapping with 140 primary empirical studies about refactoring. Our systematic mapping summarizes the current knowledge on composites and also presents a conceptual framework intended to characterize composite refactoring. Results: our conceptual framework presents seven representation models, nine characteristics, and thirteen effects of composites. We found out that studies used multidimensional representations, like graphs, to determine what refactoring(s) may be suggested and combined. On composite characteristics, studies mentioned developers often finish a composite in up to a month. However, these studies do not detail why and when composites span for several weeks. Then, we discussed other existing gaps on the current literature of composites. 
For instance, while most of the studies report the effect of composites on internal software quality, e.g., code smells, their effect on external software quality is little explored. Conclusion: Our results can motivate future studies to more deeply investigate composite refactoring applications, and the improvement of tooling support for composite refactorings.
Conference Paper
Full-text available
Refactoring is an essential activity during software evolution. Frequently, practitioners rely on such transformations to improve source code maintainability and quality. As a consequence , this process may produce new source code entities or change the structure of existing ones. Sometimes, the transformations are atomic, i.e., performed in a single commit. In other cases, they generate sequences of modifications performed over time. To study and reason about refactorings over time, in this paper, we propose a novel concept called refactoring graphs and provide an algorithm to build such graphs. Then, we investigate the history of 10 popular open-source Java-based projects. After eliminating trivial graphs, we characterize a large sample of 1,150 refactoring graphs, providing quantitative data on their size, commits, age, refactoring composition, and developers. We conclude by discussing applications and implications of refactoring graphs, for example, to improve code comprehension, detect refactoring patterns, and support software evolution studies.
Conference Paper
Full-text available
Context: It is frequently claimed the need for bridging the gap between software engineering research and practice. In this sense, the theory of social representations may be useful to characterize the actual concerns of software developers. It comprises the system of values, behaviors, and practices of communities regarding a particular social object, such as the task of smell identification. Aim: To characterize the social representations of smell identification by software developers. Method: Based on the answers given to a questionnaire, we analyzed the associations made by the developers about smell identification, i.e., what immediately comes to their minds when they think about this task. Results: We found that developers strongly associate smell identification with the practice of smell removal and with the incidence of bugs. They also frequently associate the task with the practice of inspection and with the need of having individual skills. Besides, we verified that the current state of the art on smell identification partially address the social representations of the software developers. Conclusion: There is a considerable gap between the research of smell identification and its practice. We propose directions to mitigating this gap.
Conference Paper
Full-text available
Background: Code refactoring aims to improve code structures via code transformations. A single transformation rarely suffices to fully remove code smells that reveal poor code structures. Most transformations are applied in batches, i.e. sets of interrelated transformations, rather than in isolation. Nevertheless, empirical knowledge on batch application, or batch refactoring, is scarce. Such scarceness helps little to improve current refactoring practices. Aims: We analyzed 57 open and closed software projects. We aimed to understand batch application from two perspectives: characteristics that typically constitute a batch (e.g., the variety of transformation types employed), and the batch effect on smells. Method: We analyzed 19 smell types and 13 transformation types. We identified 4,607 batches, each applied by the same developer on the same code element (method or class); we expected to have batches whose transformations are closely interrelated. We computed (1) the frequency in which five batch characteristic manifest, (2) the probability of each batch characteristics to remove smells, and (3) the frequency in which batches introduce and remove smells. Results: Most batches are quite simple: although most batches are applied on more than one method (90%), they are usually composed of the same transformation type (72%) and only two transformations (57%). Batches applied on a single method are 2.6 times more prone to fully remove smells than batches affecting more than one method. Surprisingly, batches mostly ended up introducing (51%) or not fully removing (38%) smells. Conclusions: The batch simplicity suggests that developers have sub-explored the combinations of transformations within a batch. We summarized some batches that may fully remove smells, so that developers can incorporate them into current refactoring practices.
Abstract Background Developers often have to locate design problems in the source code. Several types of design problems may manifest as code smells in the program. A code smell is a source code structure that may reveal a partial hint about the manifestation of a design problem. Recent studies suggest that developers should ignore smells occurring in isolation in a program location. Instead, they should focus on analyzing stinkier code, i.e., program locations—e.g., a class or a hierarchy—affected by multiple smells. There is evidence that the stinkier a program location is, the more likely it contains a design problem. However, there is no empirical evidence on whether developers can effectively identify a design problem in stinkier code. Developers may struggle to make an analysis of inter-related smells affecting the same program location. Besides that, the analysis of stinkier code may require proper tool support due to its analysis complexity. However, there is little knowledge on what are the requirements for a tool that helps developers in revealing stinkier program locations. As a result, developers may not be able to identify design problems due to tool issues. Method To address this matter, we aimed at achieving three goals. In the first case, we proposed Organic—a tool supporting the analysis of stinky code. In the second case, we applied a mixed-method approach to analyze if and how developers can effectively find design problems when reflecting upon stinky code—i.e., a program location affected by multiple smells. We conducted a study with 11 software professionals. Finally, in the third case, we aimed at understanding if Organic could be used by developers to identify design problems. To achieve this goal, we used a method from the Semiotic Engineering theory. This method enabled us to evaluate what are the tool issues that may hinder the identification of design problems in stinky code. 
Result Our study revealed that only 36.36% of the developers found more design problems when explicitly reasoning about multiple smells as compared to single smells. Moreover, 63.63% of the developers reported far fewer false positives when using the first approach as compared to the latter. The second study, in turn, showed that most developers may be unable to identify design problems in stinky code without proper tool support. Conclusion Our experiences, in particular the second study, helped us to refine the features of Organic for better supporting developers in reflecting upon stinkier code. For example, the analysis of stinky code scattered in class hierarchies or packages is often difficult, time-consuming, and requires proper visualization support. Moreover, without effective support, it remains time-consuming to discard stinky program locations that do not represent design problems.
Context Code smell identification aims to reveal code structures that harm software maintainability. Such identification usually requires a deep understanding of multiple parts of a system. Unfortunately, developers in charge of identifying code smells individually can struggle to identify, confirm, and refute code smell suspects. Developers may reduce their struggle by identifying code smells in pairs through collaborative smell identification. Objective The current knowledge on the effectiveness of collaborative smell identification remains limited. Some scenarios were not explored by previous work on the effectiveness of collaborative versus individual smell identification. In this paper, we address a particular scenario that reflects various organizations worldwide. We also compare our study results with recent studies. Method We have carefully designed and conducted a controlled experiment with 34 developers. We exploited a particular scenario that reflects various organizations: novices and professionals inspecting systems they are unfamiliar with. We expect to minimize some critical threats to the validity of previous work. Additionally, we interviewed 5 project leaders to understand the potential adoption of collaborative smell identification in practice. Results Statistical testing suggests 27% more precision and 36% more recall through collaborative smell identification for both novices and professionals. These results partially confirm previous work in a not previously exploited scenario. Additionally, the interviews showed that leaders would strongly adopt collaborative smell identification. However, some organization and tool constraints may limit such adoption. We derived recommendations for organizations concerned about adopting collaborative smell identification in practice. Conclusion We recommend that organizations allocate novice developers for identifying code smells in collaboration. 
Thus, these organizations can promote knowledge sharing and correct smell identification. We also recommend allocating developers who are unfamiliar with the system to identify smells. Thus, organizations can allocate more experienced developers to more critical tasks.
Code smells are symptoms in the source code that could help to identify architectural problems. However, developers may feel discouraged from analyzing multiple smells if they are not able to focus their attention on a small set of source code locations. Unfortunately, current techniques fall short in assisting developers to prioritize smelly locations that are likely to indicate architectural problems. Furthermore, developers often have trouble analyzing interconnected smells that contribute together to realize an architectural problem. To deal with these issues, this work presents and evaluates a suite of five criteria for ranking groups of code smells as indicators of architectural problems in evolving systems. These criteria were implemented in a tool called JSpIRIT. In a first experiment, we have assessed the criteria in the context of 23 versions of 4 systems and analyzed their effectiveness for revealing architectural problem locations. In addition, we conducted a second experiment for analyzing similarities between the prioritization provided by developers and the prioritization provided by our best performing criterion. The results provide evidence that one of the proposed criteria helped to correctly prioritize more than 80 code locations of architectural problems, alleviating tedious manual inspection of the source code vis-à-vis the architecture.
Refactoring is a well-established practice that aims at improving the internal structure of a software system without changing its external behavior. Existing literature provides evidence of how and why developers perform refactoring in practice. In this paper, we continue on this line of research by performing a large-scale empirical analysis of refactoring practices in 200 open source systems. Specifically, we analyze the change history of these systems at commit level to investigate: (i) whether developers perform refactoring operations and, if so, which are more diffused, (ii) when refactoring operations are applied, and (iii) which are the main developer-oriented factors leading to refactoring. Based on our results, future research can focus on enabling automatic support for less frequent refactorings and on recommending refactorings based on the developer's workload, project's maturity and developer's commitment to the project.