Factors Characterizing Reopened Issues: A Case Study
Bora Caglayan1, Ayse Tosun Misirli2, Andriy Miranskyy3, Burak Turhan4, Ayse Bener5
Bogazici University1,2, IBM Canada Ltd.3, University of Oulu4, Ryerson University5
Department of Computer Engineering, Istanbul, Turkey1,2
IBM Toronto Software Laboratory, Toronto, ON Canada3
Department of Information Processing Science, Oulu, Finland4
Ted Rogers School of Information Technology Management, Toronto, ON Canada5
email@example.com 3, burak.turhan@oulu.ﬁ 4, firstname.lastname@example.org
Background: Reopened issues may cause problems in man-
aging software maintenance eﬀort. In order to take actions
that will reduce the likelihood of issue reopening the possi-
ble causes of bug reopens should be analysed.
Aims: In this paper, we investigate potential factors that
may cause issue reopening.
Method: We extracted issue activity data from a large
release of an enterprise software product. We consider four
dimensions as possible factors that may cause issue reopening:
developer activity, issue proximity network, static code metrics
of the source code changed to fix an issue, and issue reports
and fixes. We performed exploratory analysis on the data and
built logistic regression models to identify the key factors
leading to issue reopening. We also conducted a survey regarding
these factors with the QA Team of the product and interpreted
the results.
Results: Our results indicate that centrality in the issue
proximity network and developer activity are important fac-
tors in issue reopening. We have also interpreted our results
with the QA Team to point out potential implications for
Conclusions: Quantitative findings of our study suggest
that issue complexity and developers' workload play an
important role in triggering issue reopening.
Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics—process metrics,
complexity measures, performance measures
PROMISE ’12, September 21–22, 2012, Lund, Sweden
Copyright 2012 ACM 978-1-4503-1241-7/12/09 ...$15.00.
software maintenance, issue management, issue repository,
Issue management is a key activity within software projects,
especially during the maintenance phase. For popular software,
a steady influx of new issue reports is filtered, prioritized,
assigned and handled by the maintainers daily. Managing the
issue handling process effectively is an important element for
the long term stability of the software product.
In an idealized issue management scenario, the issue re-
port and the records of issue activity are moved to the
archives indeﬁnitely after an issue is closed. Reopened is-
sues are a group of issues that are exceptions to this ideal-
ized scenario. These issues go through the issue handling
procedures at least one more time after they are archived.
Understanding reopened issues is of signiﬁcant interest to
practitioners, since these issues may represent a miscommu-
nication between issue assigner and assignee. Furthermore,
reopened issues may cause waste of time and eﬀort if they
are frequent in issue repositories.
We asked the quality assurance (QA) team of a large-scale
company about the possible reasons for reopening and
the benefits of investigating these reasons. The opinions of
the QA team are as follows:
• Reopened issues may have different explanations, but
the major one is a back-and-forth discussion between the
issue originator and owner on “yes, it’s a bug / no, it
works as designed”. Most of the time, people spend time
arguing about the issue, and it is set to “opened”
multiple times due to this discussion.
• Reopened issues constitute less than 10% of all issues,
and hence they are not a major pain point. However, they
still waste resources. Identifying the main reasons for
these issues gives us a chance for process improvement.
Reducing this ratio to 5% would be very beneficial.
We consider an issue as reopened if it has changed from a
termination state such as closed, cancelled or postponed to
an active state such as assigned or work-in progress. Identi-
fying the factors that lead to issue reopens is crucial in such
a situation in order to take the necessary actions.
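As an illustration of this definition, the following minimal Python sketch flags an issue as reopened from its list of recorded state changes. The state names and the (old state, new state) tuple layout are assumptions made for the example; the actual workflow labels in the issue database may differ.

```python
# Assumed state labels, following the definition in the text; the real
# workflow may use different names for termination and active states.
TERMINATION_STATES = {"Closed", "Cancelled", "Postponed"}
ACTIVE_STATES = {"Assigned", "Working"}


def is_reopened(transitions):
    """Return True if any (old_state, new_state) pair moves the issue
    from a termination state back to an active state."""
    return any(
        old in TERMINATION_STATES and new in ACTIVE_STATES
        for old, new in transitions
    )


history = [("Opened", "Assigned"), ("Assigned", "Closed"), ("Closed", "Assigned")]
print(is_reopened(history))  # True
```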
In this paper, our research question concerns the possible
factors that may lead to issues getting reopened:
• RQ: Which factors lead to issues getting reopened?
In order to answer our research question, we analysed a
large-scale enterprise software product and its issue and code
repositories. We modelled four dimensions, namely 1) issue-code
relation, 2) issue proximity network, 3) issue reports and 4)
developer activity, in order to check their individual effects
on issues getting reopened.
In the analysis, we used logistic regression to ﬁt a predic-
tive model on the issue data. As a ﬁrst step, we conducted
univariate regression by using each of the factors from the
four dimensions namely developer activity, issue-code rela-
tion, issue proximity network and the issue report. After
that, we did an exhaustive search for the best factor combi-
nation by optimizing the AUC (area under the receiver op-
erating characteristic curve) for all the possible factor com-
binations. Finally, we showed the model with the highest
success rate among all possible model combinations.
Previously two independent research groups investigated
the factors that may cause issue reopens for Microsoft Win-
dows  and Eclipse projects . However, to the best of
our knowledge, factors used in our study related to issue-
code relation and issue proximity network have not been
considered previously. The main contribution of this paper
is analysis of factors that lead to issues getting reopened in
a large-scale software developed in multiple locations. Since
some of the measures used in the two previous studies are
not extractable for our dataset, our aim is complementing
the ﬁndings of previous researchers rather than testing their
ﬁndings on our new dataset.
The rest of the paper is structured as follows: In Related
Work section, we discuss the relevant work on understanding
the reasons for issue reopening. In Methodology, we present
the dataset, data extraction process and metrics and logistic
regression models. In Results section, we show and inter-
pret the outcomes of the logistic regression model. Finally
a discussion of threats to the validity of the results and con-
clusions with possible future research topics are presented.
In this paper, we perform quantitative analysis on an issue
activity database, whose attributes are described in Section
2.1, extract factors that may have significant influences on
reopened issues (Section 2.2), and build a statistical model
to interpret them (Section 2.3).
We used the issue activity database of a large-scale
enterprise software product that has a long development
history with a 20-year-old code base. The company uses IBM
Rational ClearQuest with customised defect forms as the
issue management system. In this database, each issue record
includes, but is not limited to, the following features:
• Originator: The person who opens an issue. Often,
testers (or support personnel) are the originators.
• Owner: The person who is assigned to an issue. The
owner is often the developer who fixes the issue,
especially when the issue is classified as a defect.
• State: There are 11 distinct states in the database of
the company: Opened, Assigned, Working, Delivered,
Returned, Integrated, Validated, Rejected, Closed, Post-
• Phase Found: The phase in the development life cycle
in which an issue is reported. The list of phases is
summarized in the Appendix with their occurrence rates
in the issue database.
• Symptom: A sign of a problem (“crash/outage”)
experienced by a customer. Some of the common symptoms
are Build/Test Failed, Core Dump, Program Defect and
Incorrect I/O. We have found that symptoms are
significantly correlated with phase found.
A typical life cycle of an issue in our case study can be
seen in Figure 1. Bold arrows show a typical life cycle, while
the dashed arrows numbered (1), (3), (4) and (5) indicate
a reopening. It is often the case that an issue is
assigned right from the start and the owner starts working
on it immediately (arrow number (2)). The final status of
an issue is stored in the State field, but changes of state are
stored in two other fields (old state and new state) with the
person id (generally owner or originator) who makes this
Figure 1: Life cycle of an issue in our case study
In our case study, the issue activity database contains 3645
unique issue reports, with the earliest record opened in
January 2005. We kept only the closed issues from this database
and obtained 2287 issues in total. Of these issues, 219
(approximately 9%) are classified as reopened. This dataset is
further filtered as we incorporate factors from the code base.
2.2 Factors Affecting Issue Reopening
We have previously extracted code, network and churn
metrics from the code base of the same product 16 months
prior to release date at the method level. Our data extrac-
tion methodology and prediction models built with these
metric sets can be read from  and . In this paper, we
identify four dimensions about 1) developers who ﬁx these
issues, 2) size and complexity of methods edited during issue
ﬁxes, 3) the relationship between issues and 4) other factors
about issue reports and ﬁx activities. For each of these fac-
tors, we deﬁne hypotheses. We also deﬁne abbreviations for
all factors to ease their usage on tables and ﬁgures.
Of these 2287 closed issues in the database, 1318 issue
records are matched with developer activities in which there
are 88 (7%) reopened issues. Furthermore, when code and
network metrics are extracted, these issues are mapped with
source code at method level. After issue-code mapping, ﬁnal
dataset includes 1046 issues, of which 62 (6%) are reopened.
2.2.1 Developer Activity
Previously, Guo et al. and Zimmermann et al. defined
a bug opener’s reputation as the ratio of the “total number of
previous bugs opened and gotten fixed” to the “total number of
bugs he/she opened”. In a case study on Firefox performance
and security bugs, the authors analysed “who fixes these bugs”
by measuring the expertise of developers in terms of the number
of bugs previously fixed by the developer and experience
in days, i.e. the number of days from the first fix by the given
developer to the latest bug’s fix date. Similar to these
approaches, we extracted 6 different metrics representing the
development activities of issue owners, i.e., developers who
owned and fixed issues.
We analyse developers’ defect fixing activity to verify the
following hypotheses:
H1: Issues owned by developers who fix few issues are
more likely to be reopened.
H2: Issues owned by developers who have not fixed an
issue for a long time are more likely to be reopened.
H3: Issues owned by developers who edit a large number
of methods to fix an issue are more likely to be reopened.
# Fixed Issues (dev fix count): To test H1, we computed
the number of issues previously fixed by the developer
who owns the current issue, up to the current issue’s opening
date. For example, suppose john.black@XXX.com owns
issue ID #120, which was opened in May 2009. Then, we
calculated the number of issues fixed by john.black
before May 2009. Therefore, even though two issues are fixed
by the same developer, this metric’s value may differ for
these issues if their opening dates are different from each
other.
Rank in terms of # Fixed Issues (dev rank fix count):
This metric is calculated from # Fixed Issues by ranking
the developers who owned and fixed issues. For each issue’s
opening date, the # fixed issues and the ranks of developers are
re-calculated. For example, suppose john.black@XXX.com
fixed issue ID #120, which was opened on May 5th, 2009 and
closed on June 6th, 2009, and he also fixed issue ID #133,
which was opened on July 5th, 2009 and closed on October
6th, 2009. In this case, we computed the previously
fixed issues for john.black twice, for May 2009 (k) and for
July 2009 (k+1, by adding issue #120). The rank of this
developer is also calculated based on other developers’ issue fix
performance in May 2009 and July 2009. So, john.black can
be at the first rank in May 2009 with k fixed issues, but at
the third rank in July 2009 with (k+1) fixed issues.
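The two metrics above can be sketched as follows. The sketch assumes a list of (developer, fix date) pairs, counts only fixes dated strictly before the opening date, and assumes that rank 1 denotes the developer with the most fixes so far, with ties sharing a rank. None of these details is specified in the text, and the developer names and dates are illustrative.

```python
from collections import Counter
from datetime import date


def fix_count_and_rank(fixes, owner, opened_on):
    """fixes: iterable of (developer, fix_date) pairs.
    Returns (count, rank) for `owner`, using only fixes before `opened_on`."""
    counts = Counter(dev for dev, fixed_on in fixes if fixed_on < opened_on)
    count = counts[owner]  # 0 if the owner has no earlier fix
    # Assumption: rank 1 = most fixes so far; tied developers share a rank.
    rank = 1 + sum(1 for c in counts.values() if c > count)
    return count, rank


fixes = [
    ("john.black", date(2009, 6, 6)),
    ("jane.doe", date(2009, 3, 1)),
    ("jane.doe", date(2009, 4, 1)),
]
print(fix_count_and_rank(fixes, "john.black", date(2009, 5, 5)))  # (0, 2)
print(fix_count_and_rank(fixes, "john.black", date(2009, 7, 5)))  # (1, 2)
```

Because both values are recomputed per opening date, the same developer can receive different counts and ranks for different issues, as in the john.black example.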
Duration between the First and Last Fix (dev first
last fix): This metric is the number of days from
the first fix of the developer to the current issue’s fix date.
To test H2, we computed this metric for all developers
associated with issues in our database. The reason for choosing
this metric is as follows: Reopened issues are critical in the
sense that they require thorough knowledge of the source
code, and developers who fix reopened issues also need
an active development history to avoid forgetting possible
bottlenecks in the source code. If the duration between the
first and last fix of a developer is long, the developer may
have a harder time understanding the main reasons
for the issue in an updated software system, and this may
cause issues to be reopened.
Total # of Edits (dev total edits): To test H3, we
extracted total # of edits a developer has done on a method
(i.e. function) from development commit logs and associ-
ated them with issues. The more edits are done on software
methods by a developer, the more likely the developer is
well informed about the software system and the more likely
he/she is an active developer.
# Unique Methods Edited (dev methods edited):
To test H3, we also considered the number of unique methods
edited for fixing an issue. Total # of edits is not enough
to evaluate whether developers of reopened issues have
strong code ownership. If a developer edits the majority of
methods in the software, it may indicate his/her ownership of
the code. Having strong ownership may also prevent issue
reopenings, since it suggests that the issue owner has
extensive knowledge of the source code as well as potential
Rank in terms of # Methods Edited (dev rank
methods count): This metric is calculated based on unique
methods edited and ranks of developers who owned and ﬁxed
an issue are computed. For each issue, its owner’s (devel-
oper) rank in terms of number of methods he/she edited so
far is computed and added as a new metric. Computation of
these ranks is similar to rank in terms of # ﬁxed issues. This
metric also completes the general deﬁnition of code owner-
ship by adding both the number of methods edited by a
developer as well as what percentage of edits are done by
this developer among all developers (i.e., developer’s rank).
2.2.2 Issue-Code Relation
Shihab et al. considered the fact that reopened bugs
may be harder to fix than others because they
require many files or more complex files to be changed. We
have also considered the complexity of reopened issues in
terms of the software methods changed during their fixes and
define two hypotheses:
H4: Issues related with many methods are more likely to
be reopened.
H5: Issues related with larger (in terms of lines of code)
and more complex methods are more likely to be reopened.
# Methods Changed (methods changed): This metric
is calculated by counting the number of methods changed
for fixing an issue, by mining commit messages from version
control systems and matching each commit with an issue. It
is then used to test H4.
LOC: The number of methods changed during a fix may not
be enough to represent the complexity of an issue. For
example, an issue may require changes to 3 methods, but each
method may be longer than 100 lines of code (LOC), and
hence its fix may be harder than other fixes. Therefore, we
also extracted a size indicator (in terms of lines of code)
for the methods changed for fixing an issue. If more than one
method is changed during a fix, we aggregated their
lines of code by taking the maximum and sum values over all
methods.
Cyclomatic complexity (CC): As an extension to the LOC
measure, we used McCabe’s cyclomatic complexity of
a method changed for fixing an issue. If more than one
method is changed, we aggregated their cyclomatic
complexity values by taking the maximum and sum values over
all methods. This metric and LOC are used
to test H5.
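The aggregation of method-level values to the issue level can be sketched as below. The input layout, a list of (LOC, CC) pairs for the methods changed by one fix, is an assumption for illustration, as are the numbers in the example.

```python
def aggregate_code_metrics(methods):
    """methods: non-empty list of (loc, cc) pairs, one pair per method
    changed while fixing a single issue."""
    locs = [loc for loc, _ in methods]
    ccs = [cc for _, cc in methods]
    return {
        "methods_changed": len(methods),
        "max_loc": max(locs),
        "sum_loc": sum(locs),
        "max_cc": max(ccs),
        "sum_cc": sum(ccs),
    }


# An issue whose fix touched three methods (illustrative values).
print(aggregate_code_metrics([(120, 8), (150, 12), (100, 3)]))
```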
2.2.3 Issue Proximity Network
Issue proximity network models the relation between is-
sues. It measures the distance between issues in terms of
the number of common methods changed during their ﬁxes.
If an issue is connected with many other issues in terms of
the number of common methods changed during a ﬁx, it may
increase the probability of this issue being reopened. The
reason for this can be explained as follows: Reopened is-
sues may have close connections with many other issues and
therefore they reside at the center of this proximity network.
However, being in the core part also indicates that reopened
issues may aﬀect many methods in the source code, which
increases the risk of failures afterwards. We deﬁne our hy-
pothesis for measuring this dimension as follows:
H6: Issues linked with many other issues are more likely
to be reopened.
The metrics extracted from the issue proximity network were
all used in previous studies to measure caller-callee
relations between software modules and their effects
on defect proneness. In this paper, we used four network
metrics to quantify the complexity of issues and how complexity
(in terms of methods changed) is related to issue reopening.
Degree: This metric is computed by counting the number
of direct relations (edges) an issue has. Having higher degree
means that an issue is connected to many other issues, such
as a hub in traﬃc networks.
Degree Centrality: In our previous studies, we
extracted both in-degree and out-degree centrality metrics
[8, 18]. However, in this paper, the proximity network is
undirected, with edge weights set to the number of
methods shared by two issues. Therefore, this metric is
calculated as the degree of an issue over the number of all
issues (degree/N, where N is the number of issues).
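A minimal sketch of how such a network and the two degree-based metrics could be derived is given below. It assumes a mapping from issue id to the set of methods changed by its fix; the issue ids and method names are hypothetical.

```python
from itertools import combinations


def build_proximity_network(issue_methods):
    """issue_methods: dict mapping issue id -> set of changed method names.
    Returns a dict mapping frozenset({a, b}) -> number of shared methods."""
    edges = {}
    for a, b in combinations(sorted(issue_methods), 2):
        shared = len(issue_methods[a] & issue_methods[b])
        if shared:  # issues are linked only if they share a method
            edges[frozenset((a, b))] = shared
    return edges


def degree(edges, issue):
    """Number of issues directly linked to `issue`."""
    return sum(1 for pair in edges if issue in pair)


def degree_centrality(edges, issue, n_issues):
    """degree / N, following the definition in the text."""
    return degree(edges, issue) / n_issues


issue_methods = {
    120: {"parse", "emit"},
    133: {"emit", "flush"},
    140: {"flush"},
    150: {"init"},
}
edges = build_proximity_network(issue_methods)
print(degree(edges, 133))                                 # 2
print(degree_centrality(edges, 133, len(issue_methods)))  # 0.5
```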
Betweenness Centrality: This metric is calculated by
counting the number of shortest paths that contain the is-
sue X over all shortest paths between all issue pairs, i,j. It
evaluates the location of an issue, since being in a popular
location may be very critical due to the fact that an issue
has association with many issues as well as it aﬀects many
methods in the source code.
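In the usual notation, this description corresponds to the standard betweenness centrality. The symbols below are introduced here for illustration and do not appear in the original text: \sigma_{ij} is the number of shortest paths between issues i and j, and \sigma_{ij}(X) is the number of those paths that pass through issue X. Whether the value was further normalised is not stated.

```latex
C_B(X) = \sum_{\substack{i \neq j \\ i \neq X,\; j \neq X}} \frac{\sigma_{ij}(X)}{\sigma_{ij}}
```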
Pagerank: This metric measures the relative importance
of an issue. It also evaluates the centrality in issues by con-
sidering the fact that the eﬀect of being related with a cen-
tral issue should be more important than being related with
a decentralized issue.
2.2.4 From Issue Reports
From issue reports, we have extracted 2 categorical met-
rics, namely Symptom and Phase found. Our objective is to
observe whether reopened issues have unique symptoms or
they are more likely to be reported during a speciﬁc phase.
Same Location (same loc): We have also extracted
geographical locations of the owner and originator of is-
sues based on their email addresses’ domain and deﬁned a
boolean metric, Same location to observe the eﬀect of com-
munication across diﬀerent locations on issue re-openings.
Our hypothesis to test this relation is as follows:
H7: Issues whose owner and originator are from diﬀerent
locations are more likely to be reopened.
In a study done by Herbsleb and Mockus, it was found
that issues reported (and fixed) in distributed teams have
a higher resolution time than issues reported and fixed in
the same location. Zimmermann et al. also investigated
the effects of location differences between assigners and
assignees on reopened bugs and found that bugs initially
assigned across teams, buildings or countries are more likely
to be reopened. Thus, we defined Same location
and assigned 1 if both the originator and the owner of an issue
were located in the same country, and 0 otherwise.
Based on the data, issues were reported from 12 distinct
geographical locations. Only 20% of issues had their owners
and originators in different locations.
Figure 2: Correlations of the measures. The shape
of the ellipse represents the correlation between two
variables. In the correlation visualisation, bolder colors
indicate higher correlations. If an ellipse bends towards
the right, it indicates a positive correlation; if it bends
towards the left, a negative correlation.
Fix Days: Reopened issues may have a long life cycle
from their opened to closed dates, since they were assigned
to the same states (opened/assigned) more than once. We
defined a new metric, namely Fix days, to measure the number
of days between an issue’s opened and closed dates. The
hypothesis to test the effect of this metric is defined as follows:
H8: Issues which take a long time (in days) to fix are more
likely to be reopened.
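The Same Location and Fix Days metrics can be sketched as follows. The mapping from e-mail domain to country is hypothetical, since the mapping used in the study is not given in the text, and the addresses and dates are illustrative.

```python
from datetime import date

# Hypothetical domain-to-country mapping; not taken from the study.
DOMAIN_COUNTRY = {"ca.example.com": "CA", "tr.example.com": "TR"}


def same_location(owner_email, originator_email, domain_country=DOMAIN_COUNTRY):
    """Return 1 if both addresses map to the same country, else 0."""
    owner = domain_country[owner_email.rsplit("@", 1)[1].lower()]
    originator = domain_country[originator_email.rsplit("@", 1)[1].lower()]
    return int(owner == originator)


def fix_days(opened_on, closed_on):
    """Number of days between the opened and closed dates of an issue."""
    return (closed_on - opened_on).days


print(same_location("dev@ca.example.com", "qa@tr.example.com"))  # 0
print(fix_days(date(2009, 5, 5), date(2009, 6, 6)))              # 32
```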
2.3 Analysis of The Factors
Basic descriptive statistics of the factors can be found in
Table 1. Median values of factors for reopened and not-
reopened issues are signiﬁcantly diﬀerent (Mann-Whitney
U Test P < 0.05) in 10 out of 19 cases. This shows that dis-
tributions of reopened issues are shifted towards less/ more
activity in each of 10 factors separately. For instance, re-
opened issues cause signiﬁcantly more edits on the source
code (8th factor in Table 1) compared to other issues.
Descriptive statistics of factors related to issue-code rela-
tion dimension are particularly interesting. In the extreme
cases, an issue ﬁx can change up to 460 methods or ﬁles with
up to 92 kLOC. We believe that such a pattern highlights rel-
ative complexities of addressed issues or architectural com-
plexities of the software. A ﬁx in a complex software will
likely involve changes at a lot of interdependent modules.
Project issue ﬁx count distribution is similar to the Pareto-
Law trends observed in the developer activity distribution
in open source projects .
Spearman rank correlation coeﬃcients among the factors
we considered are visualised in Figure 2. In the correla-
tion visualisation, bolder colors indicate higher correlations.
If the shape of an ellipse bends towards right, it indicates
positive correlation, whereas negative correlation if its shape
bends towards left. From the ﬁgure, it can be observed there
are relatively higher correlations among the factors within
the same dimension, while there are relatively lower
correlations between the factors from different dimensions. The
high correlations are especially apparent among the factors
from the issue-code relation (LOC, CC) and issue proximity
(degree, betweenness, pagerank) dimensions.
2.4 Logistic Regression Models
Univariate Logistic Regression
Logistic regression is the standard way to model binary
outcomes $y_i = 0, 1$ [1, 10] and therefore it is suitable for our
problem. Logistic regression has frequently been used for
classification problems such as defect prediction in the
software engineering literature [22, 24]. The basic
probability model of logistic regression can be stated as:
$$\Pr(y_i = 1) = \operatorname{logit}^{-1}(X_i \beta) \qquad (1)$$
$\Pr(y_i = 1)$ is the probability of outcome $y_i = 1$, $X_i$ is a
vector of independent parameters for the instance and $\beta$
is the vector of regression coefficients.
One advantage of logistic regression in binary classiﬁca-
tion when compared to methods like Naive Bayes is that its
regression coeﬃcients and other parameters (odds ratio) are
easily interpretable and highly explicative. Assuming the lo-
gistic regression model is true, one can check the signiﬁcance
of various regression coeﬃcients of diﬀerent input variables
to understand their explanatory power. In addition, the
relation between the odds, $\Pr(y_i = 1)/\Pr(y_i = 0)$, and the
individual factors can be analysed with the following formula:
$$\frac{\Pr(y_i = 1)}{\Pr(y_i = 0)} = e^{X_i \beta} \qquad (2)$$
From this formula, the effect of various factors on the
probability of a certain outcome $y_i$ can be understood by
analysing the effects of their changes.
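The two formulas can be illustrated numerically with a short sketch. It uses the Same Location coefficient reported in Table 2; the intercept is a made-up value, because the text does not report one.

```python
import math


def inv_logit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds Pr(y = 1) / Pr(y = 0) for a probability p."""
    return p / (1.0 - p)


beta_same_loc = -1.132  # coefficient of Same Location from Table 2
intercept = -2.0        # hypothetical value, not reported in the text

p_diff = inv_logit(intercept)                  # same location = 0
p_same = inv_logit(intercept + beta_same_loc)  # same location = 1

# By formula (2), the ratio of the two odds equals exp(beta).
odds_ratio = odds(p_same) / odds(p_diff)
print(round(odds_ratio, 3), round(math.exp(beta_same_loc), 3))
```

Under these assumptions, the odds of reopening for an issue whose owner and originator share a location are roughly 0.32 times the odds for an issue whose owner and originator do not, regardless of the intercept chosen.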
Exhaustive Search on All Factor Combinations
In order to test the performance of logistic regression with
all possible combinations of N factors, $2^N$ models should be
considered. For large data, various approaches are used to
reduce the number of considered combinations. However, we were
able to test all possible combinations of models (524,288 in
total, i.e. $2^{19}$) in a couple of hours, since our dataset was
relatively small.
In order to find the “best” model, a performance criterion
should be optimized. In the literature, a likelihood-based
information criterion such as AIC (Akaike Information
Criterion) or BIC (Bayesian Information Criterion) is often used
to maximize the likelihood function while penalizing
overfitting. Instead of a maximum likelihood based performance
measure, we use the Area Under the receiver operating
characteristic Curve (AUC) as the performance measure.
We believe that AUC represents the predictive performance
of the model more clearly than a likelihood-based measure.
AUC is commonly used to compare the performance of various
classification models.
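The exhaustive search can be sketched as follows. The AUC is computed with the rank-based (Mann-Whitney) formulation, and `fit_and_score` stands for a caller-supplied function that fits a model on a factor subset and returns one score per issue. That function and the toy data are hypothetical and are not the authors' implementation.

```python
from itertools import combinations


def auc(scores, labels):
    """Probability that a random positive is scored above a random negative
    (ties count one half); this equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_subset(factors, labels, fit_and_score):
    """Evaluate every non-empty subset of `factors` and return the
    (auc, subset) pair with the highest AUC."""
    best = (0.0, ())
    for k in range(1, len(factors) + 1):
        for subset in combinations(factors, k):
            candidate = (auc(fit_and_score(subset), labels), subset)
            best = max(best, candidate)
    return best


# Toy stand-in for model fitting: the score is the sum of the chosen columns.
data = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1]}
labels = [0, 0, 1, 1]


def toy_score(subset):
    return [sum(data[f][i] for f in subset) for i in range(len(labels))]


print(best_subset(["a", "b"], labels, toy_score))  # (1.0, ('a',))
```

The number of evaluated subsets grows as 2^N, which is why this approach is only practical for a small number of factors and a small dataset.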
Multivariate Logistic Regression
As a second model, we built a logistic regression model using
the set of factors with the best predictive power. We observed
the predictive power of this model by drawing its ROC curve
and reported the factors included in the model.
3. RESULTS OF LOGISTIC REGRESSION
3.1 Univariate Statistical Model
In order to check the importance of factors individually,
we checked the significance of univariate logistic
regression models. In Table 2, factors with significant
correlations are presented. Five out of 19 factors were found to
have significant correlation coefficients. One of these factors
is related to the issue report (same location), three are related
to developer activity (dev first last fix, dev total edits,
dev methods edited) and one is related to the issue proximity
network (betweenness centrality). We have interpreted our
hypotheses by building a univariate regression model to pre-
dict reopened issues with 19 factors respectively. Regression
coeﬃcients and their signiﬁcance regions reported in Table
2 are used to validate if a factor is signiﬁcant for predict-
ing reopened issues. In summary, hypotheses related with
developer activity in terms of frequent issue ﬁxes and meth-
ods edited (H2,H3), issue proximity network (H6) and ge-
ographical locations of issue owner and originator (H7) are
validated and these ﬁndings are summarized below in bold
and italic. Other hypotheses could not be validated with
univariate analysis, but we have also checked their signiﬁ-
cance using multivariate analysis of factors in later sections.
Developer activity: We found that reopened issues
have a significantly negative relation with developers
who have not fixed an issue for a long time (coefficient:
-2.6947136, p < 0.05). Furthermore, reopened issues have
a positive relation with developers who edit relatively more
methods (coefficients: 0.0012903, 0.0019286, p < 0.05). These
results validate our second and third hypotheses (H2, H3).
However, we could not validate H1, which defines a relationship
between developer activity in terms of previously fixed issues
and issue reopening.
Reopened issues are ﬁxed by developers who actively ﬁx
issues and edit many methods.
Issue proximity network: Regarding issue relations in
terms of shared methods, we have validated H6 with the
highest coefficient (35.8880) on betweenness centrality.
Reopened issues are linked with other issues in terms of
methods edited during issue ﬁxes.
Issue report: Issues whose owner and originator are located
in the same location are significantly less likely to be
reopened (coefficient: -1.1320, p < 0.05). This also
validates H7, since being in different geographical locations
may lengthen the communication process and cause issue
reopening. However, we could not validate the relationship
between fix days and reopened issues (H8).
Reopened issues are often reported and ﬁxed by people
from diﬀerent geographical locations.
Table 1: Descriptive Statistics of The Considered Factors. The first values in the cells are for all the issues. Values in parentheses are: α for
not reopened issues, β for reopened issues. ♣: Factor has significantly different medians for reopened and not reopened issues.
Factor Max 75 % Median 25% Minimum Mean
Issue Report Symptom - - - - - -
Phase Found - - - - - -
Same Location - - - - - -
Fix Days 3745(3745α
39(40α30β) 0(0α, 4β) 169(168α,
# Fixed Issues ♣79(79α58β) 22(21.25α
11 (11α13β) 4(4α4β) 1(1α, 1β) 15.5 (15.38α
Rank in # Fixed Is-
5(6α4β) 1(1α, 1β) 21.49(21.31α,
First and Last Fix in
0(0α, 0β) 388(391α,
Total # Edits ♣1347(1347α
0(0α, 0β) 149(144α,
# Unique Methods
0(0α, 0β) 99(95α,
Rank in # Methods
4(4.75α1β) 0(0α, 0β) 21(22α, 20β)
Max. CC ♣2103(2087α
1(1α, 4β) 303.3(307.2α,
Sum CC ♣2103(2087α
1(1α, 4β) 303.3(307.2α,
Max. LOC ♣22644(22644α
Sum LOC ♣92931(92931α
# Methods Changed 460(460α86β) 9(9α10β) 3(3α5β) 1(1α1.25β) 1(1α, 1β) 11.26(11.25α,
Issue Proximity Network
31(31α26β) 10(10α11β) 3(3α4β) 0(0α, 0β) 20.32(20.02α,
0(0α, 0β) 0.002(0.002α,
0(0α, 0β) 0.0001(0.0001α,
Degree Centrality ♣0.18(0.12α
0(0α, 0β) 0.02(0.02α,
Table 2: Coefficients for Univariate Regression
Dimension                Factor                   Coefficient  Standard Deviation  Z Value  Pr(>|z|)  Significance
Issue Report             Symptom                  -            -                   -        2
                         Phase Found              -            -                   -        2
                         Same Location            -1.13200     0.27000             -4.160   3.14e-05  ***
                         Fix Days                 0.00018      0.00061             0.296    0.7700
Developer Activity       Dev fix count            0.01100      0.00770             1.460    0.1500
                         Dev rank fix count       0.00640      0.00560             1.130    0.2600
                         Dev first last fix       -2.69000     0.17000             -16.222  <2e-16    ***
                         Dev total edits          0.00130      0.00043             3.030    0.0024    **
                         Dev methods edited       0.00190      0.00060             3.200    0.0014    **
                         Dev rank methods count   -0.00250     0.00600             -0.420   0.6700
Issue-Code Relation      Max. CC                  -0.00031     0.00031             -0.995   0.3200
                         Sum CC                   -7.08e-05    1.69e-04            -0.420   0.6800
                         Max. LOC                 -3.30e-05    2.89e-05            -1.140   0.2500
                         Sum LOC                  -4.96e-06    1.42e-05            -0.350   0.7300
                         Methods changed          0.00022      0.00460             0.047    0.9600
Issue Proximity Network  Degree centrality        7.45000      4.63000             1.610    0.1100
                         Degree                   0.00760      0.00469             1.609    0.1080
                         Betweenness centrality   35.89000     12.01700            2.990    0.0028    **
                         Pagerank                 21.16000     11.80000            2.130    0.2200
Issue-code relations: Univariate analyses do not show
a signiﬁcant relation between code metrics and reopened is-
sues. Hence, we could not validate H4,H5, but observed
predictive power of code metrics in multivariate regression
3.2 The Best Factor Combinations
We ranked all possible combinations of the factors ($2^{19}$
possible models) to predict reopened issues based on our
performance measure, AUC. We counted the occurrence of
each factor in the best 100 models. The best 100 models had an
AUC difference of 0.05 between the best performing and the
worst performing model. All top performing models consist
of 8 to 12 factors.
In Table 3, number of occurrences of each factor with
#Occur > 20 is presented. Out of 19 factors, 4 occurred in
all top performing models, namely, betweenness centrality,
maximum cyclomatic complexity, sum of cyclomatic com-
plexity and maximum LOC. Furthermore, 3 factors occurred
in more than 90% of the top performing models namely ﬁx
count, ﬁx count rank and sum of LOC.
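The counting step behind Table 3 can be sketched as below, assuming the exhaustive search produced a list of (AUC, factor subset) pairs. The three models in the example are illustrative only.

```python
from collections import Counter


def factor_occurrences(ranked_models, top_n=100):
    """ranked_models: list of (auc, factors) pairs in any order.
    Returns how often each factor appears in the top_n models by AUC."""
    top = sorted(ranked_models, key=lambda m: m[0], reverse=True)[:top_n]
    return Counter(f for _, factors in top for f in factors)


models = [
    (0.81, ("same_loc", "betweenness")),
    (0.80, ("betweenness",)),
    (0.60, ("fix_days",)),
]
print(factor_occurrences(models, top_n=2))
```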
When compared with the significance of factors in
univariate regression analysis, code based measures performed
surprisingly well in the top models. Even though we could
not validate hypotheses H4 and H5 during univariate
analysis, the cyclomatic complexity and LOC measures of methods
related with reopened issues provide significant benefits to the
predictive model. On the other hand, betweenness is significant
both in the best models and in the univariate regression
3.3 Multivariate Statistical Model
In Figure 3, the AUC for the model that performed the best
(in predicting reopened issues) during our exhaustive search
is presented. In addition to the 0.81 AUC, the top model
had 0.88 recall and 0.82 precision. If this were a prediction
scenario, we could conclude that the model had significant
potential. However, some of the factors, such as those
related to the code-issue relation, have limited applicability in
a prediction scenario because they are only available post-
Table 3: Number of Occurrences of Factors In Top
Performing Model Combinations
Factor Name                            Occurrences in top 100 models
Same Location                          100
Betweenness Centrality                 100
Maximum Cyclomatic Complexity          100
Sum of Cyclomatic Complexity           100
Maximum LOC                            100
# Fixed Issues                         94
Sum of LOC                             94
Rank in Fixed Issue Count              91
Unique Methods Changed                 59
Total # Edits                          44
Degree Centrality                      24
Duration Between First and Last Fix    24
The factors in Table 3 with the highest number of occur-
rences in top 100 models were present in the top model. The
model with the highest AUC contains the following factors:
Same Location - Betweenness Centrality - Maximum Cyclomatic
Complexity - Sum of Cyclomatic Complexity - Maximum LOC - #
Fixed Issues - # Sum of LOC - Rank in Fixed Issue Count - Unique
Methods Changed - Degree - Duration Between First and Last Fix -
Total # Edits
4.1 Interpretation of Results with QA Team
In order to interpret our ﬁndings, we held a meeting with
the QA Team in the company and asked free format ques-
tions about our analysis. Responses are summarized below.
Betweenness centrality: Why do you think that
reopened issues are often located at the centre of
issue proximity network?
Reopened issues are generally the ones that developers postpone
fixing or cancel, since a) they may decide that it is not
very critical for the customer and b) fixing it may be risky
or complex so that they may want to delay fixing this issue.
Figure 3: ROC curve of the multivariate regression
that uses 12 factors which performed the best in
model testing - AUC: 0.81 (Recall: 0.88 and Precision: 0.82)
The fact that reopened issues are at the center of this network
supports our anecdotal evidence of the issues' complexity.
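To make the notion of centrality in the issue proximity network concrete, the sketch below builds a small network and computes betweenness centrality with Brandes' algorithm. It assumes that two issues are linked whenever their fixes changed at least one common method; the issue identifiers and method names are hypothetical, and the scores are left unnormalised, which may differ from the scaling used in the study.

```python
from collections import deque
from itertools import combinations

def build_proximity_network(fix_methods):
    """Link two issues when their fixes changed at least one common method."""
    graph = {issue: set() for issue in fix_methods}
    for a, b in combinations(fix_methods, 2):
        if fix_methods[a] & fix_methods[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search recording shortest-path counts and predecessors.
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to the source.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}

# Hypothetical fixes: issue id -> set of methods changed by its fix.
fixes = {"I1": {"a"}, "I2": {"a", "b"}, "I3": {"b", "c"}, "I4": {"c"}}
network = build_proximity_network(fixes)
centrality = betweenness_centrality(network)
```

In this toy network the issues form a chain I1, I2, I3, I4, so the two middle issues lie on every shortest path between the ends and receive the highest scores.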
Rank of developers in terms of method changes:
Do you think that developers who edit many meth-
ods should have a positive or negative impact on
Usually, the amount of code changed by a developer is positively
correlated with the amount of issues fixed by this developer.
These top “code-altering” developers may decide to
postpone or cancel some of these issues assigned to them, until
they reduce their workload, since fixing a new issue may
also introduce others due to large amount of changes.
4.2 Threats to Validity
In this section, we discuss possible threats to the validity of
our study. We have used a large scale enterprise product to
conduct a case study. Even though drawing general conclusions
from an empirical study is very difficult, results should
be transferable to other researchers with well-designed and
controlled experiments. In this study, we propose a set
of metrics representing four main dimensions to investigate
their effects on reopened issues. Our methodology consists
of investigating their individual effects on reopened issues,
as well as pairwise metric relations to find the best set of
metrics predicting reopened issues. Using the same methodology,
results can be replicated and refuted on new datasets.
In this case study, we were able to extract 2287 resolved
issues of which 1046 were matched with the code base and
developers. At first glance, this seems to be a small set;
however, we traced all issues submitted since 2004 and filtered
them based on our requirements. While reducing the
dataset, we also considered the fact that the ratio of reopened
issues over other issues should be similar to the ratio of the
initial dataset. When we linked issues with developer activities,
this ratio was 7%, whereas in the final set, this ratio
was reduced to 6%, a minor change. We have also
worked closely with the QA team in the company to validate
We use logistic regression during univariate and multivariate
analyses, since coefficients in logistic regression are
easily interpretable with their significance regions. We did
not consider using other algorithms and comparing their performance
with regression, since our aim was to understand
the explanatory power of metrics on reopened issues, rather
than building the best predictive model for reopened issues.
However, we did consider all factor combinations to select the
best metric set during our experiments.
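As an illustration of how a logistic regression coefficient is read together with its significance, the sketch below computes the Wald z statistic and the two-sided p-value from a coefficient and its standard error, and converts a coefficient into an odds ratio. The function names are hypothetical, and the normal approximation is an assumption about how the reported z and p values were obtained.

```python
import math

def wald_test(coefficient, std_error):
    """Return the Wald z statistic and its two-sided p-value (normal approximation)."""
    z = coefficient / std_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds of reopening when the factor grows by delta."""
    return math.exp(coefficient * delta)

# Betweenness centrality row of the univariate analysis: coefficient and std. error.
z, p_value = wald_test(35.89, 12.017)
```

For the betweenness centrality row this gives a z value close to 2.99 and a p-value close to 0.0028, in line with the values reported in the univariate table. A positive coefficient means that the odds of reopening increase with the factor.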
5. RELATED WORK
Issue management systems of large open source software
projects have become available on the internet since the early
1990s. Issue management systems of some applications with
commercial licences are publicly accessible since the owners
of these applications would like to make their issue handling
process transparent to their users. Research on issue reposi-
tory data has started in parallel with the public availability
of past software issue management data.
There are two recent papers closely related to our research.
Shihab et al. [19] analysed the work habits, bug report and bug fix
dimensions for the Eclipse project to find the factors that contributed
to bug reopening and built a reopened bug prediction
model using decision trees. Shihab et al. found that the
comment text, description text, time to resolve the bug and
the component in which the bug was found were the leading factors
that caused bug reopening for Eclipse [19]. On the other
hand, Zimmermann et al. [25] analysed the Windows Vista and
Windows 7 issue repositories and conducted a survey of 394 developers
in order to identify the important factors that cause bug reopens [25].
Zimmermann et al. built a logistic regression model in order
to identify the factors that may cause issue reopening.
In their research, Zimmermann et al. used organizational
and process related factors in addition to factors directly
extractable from the issue report. In their logistic regression
model, nearly all the factors they observed were found
to be significant, including factors related to location,
work habits and bug report characteristics.
Our work differs from the research by Shihab et al.
and Zimmermann et al. in two respects: 1) the factors and
the dataset we analysed are different, and 2) we analysed the
effect of the combinations of various factors in addition to
individual factors on issue reopening.
Other notable recent areas of research include automated
issue triage, factors that change the quality of issue reports,
detection of duplicate issues and estimation of issue fix durations
[3, 6, 21].
One important related research topic about issues is automated
bug triage, the procedure of processing an issue
report and assigning the right issue to the right developer.
This problem is especially important for large software with
millions of users. Anvik et al. found that 300 daily reported
issues make it impossible for developers to triage issues effectively
for Mozilla based on an interview with an anonymous
developer [3, 12, 21].
Text mining methods have been used in several studies to
find the most relevant developer to handle a bug in automated
bug triage models [2–4, 9, 20]. Bakir et al. proposed
a model that forwarded auto-generated software fault data
directly to the relevant developers by mining the patterns in
the faults [5]. The benefit of automated bug triage is often
measured by the percentage of actual owners estimated by the model and
the decrease in issue reassignments or bug tosses. While bug
triage studies claim that bug tossing is time consuming, ,
Guo et al. observed that issue reassignment is beneﬁcial for
Effective issue reporting is important for reporters as well as
for developers. In an exploratory study on Windows Vista,
Guo et al. [12] identified the characteristics of bugs that
get fixed. Bettenburg et al. [6] also analysed the
components of a bug report (severity, stack traces, builds,
screenshots) that make a bug more likely to be resolved.
Estimation of issue resolution times is another research area
that helps to plan developers’ efforts efficiently. Some studies on open
source projects can be found in [11, 21].
In this paper, we analysed the effects of 19 factors from
four different dimensions on the probability of issue reopening
for a large-scale software developed in geographically
distributed locations. In our study, we have found that a
subset of these factors are important for issue reopening.
The predictive power of best factor combinations is high
with AUC ≈ 0.81 in the best performing models.
RQ: Which factors lead to issues getting reopened?
In order to find the factors that were most important for
issue reopening, we built univariate and best-subset logistic
regression models. In both the univariate models and the
best-subset models, we checked the importance of the
factors we considered.
Dimensions of developer activity (in terms of the time
between first and last issue fixes and the number of methods
edited during issue fixes), issue proximity network (in
terms of common methods changed during issue fixes) and
geographical locations of issue owners and originators are
found to be important for issue reopening. In the top ranking
logistic regression models based on their predictive power,
factors from all dimensions were prominent.
In previous research on this topic, nearly all of the considered
factors were found to be significant [19], [25]. On
the contrary, in our analysis we found that a subset of our
considered factors are significantly more important in issue
reopening. The best logistic regression model in terms of
predictive power contains 12 factors out of 19 (see Section 3.3
for the full list).
Implications of the Results To The Industry
Issue reopening can lead to unanticipated resource allocation,
leading to projects running over budget and late. Therefore,
it is important to proactively identify issues that can be
reopened and take corrective actions. Quantitative findings
of our study suggest that issue complexity and developers'
workload play an important role in triggering issue reopening.
This information can aid managers in deriving concrete
corrective actions (e.g., ensuring the existence of deep code
review and reducing developers' workload).
Table 4: A list of phases in which issues were found
Phase                  Issues reported during this phase (%)
Functional testing     19.9
Regression testing     8.5
System testing         6.8
Nightly build          5.0
Performance testing    3.9
Unit testing           2.7
Beta testing           1.3
As indicated in our results, issue reopening may have
many reasons and may not be modelled by a small set of
factors. In addition, some of the causes of issue reopening,
such as design problems, may be outside the scope of the
issue management process. Identifying the important factors
that may lead to issue reopening may be the first step to
lead companies to understand these underlying causes and
take necessary actions.
Every model is a simplification of reality and has its limitations.
We attempted to model the 3 aspects of software
development (people, process, product) when choosing the
factors. New factors can be proposed in future studies
related to these aspects. As usual, in this case study, one
possible future work would be testing our conclusions on new
datasets. Another area to consider would be analysing the
causality relations between the considered factors and the
probability of issue reopening.
Table 4 presents the majority of the phases in which issues were
found and reported in the company and the percentage of their
occurrence. We have listed 10 phases which account for 85%
of all issue reports, and defined other phases as “Others” due
to privacy issues.
This research is supported in part by Turkish State Planning
Organization (DPT) under the project number 2007K120610
and partially supported by TEKES under Cloud-SW project
in Finland. We would like to thank IBM Canada Lab –
Toronto site for making their development data available
for research and strategic help during all phases of this re-
search. The opinions expressed in this paper are those of
the authors and not necessarily of IBM Corporation.
[1] E. Alpaydin. Introduction to Machine Learning (Adaptive
Computation and Machine Learning). The MIT Press,
[2] J. Anvik, L. Hiew, and G. C. Murphy. Who should fix this
bug? In Proceedings of the International Conference on
Software Engineering, pages 361–370, Shanghai, China,
[3] J. Anvik and G. Murphy. Determining implementation
expertise from bug reports. In Mining Software
Repositories, 2007. ICSE Workshops MSR’07. Fourth
International Workshop on, pages 1–8. IEEE, 2007.
[4] J. Anvik and G. C. Murphy. Reducing the effort of bug
report triage. ACM Transactions on Software Engineering
and Methodology, 20(3):1–35, Aug. 2011.
[5] A. Bakir, E. Kocaguneli, A. Tosun, A. Bener, and
B. Turhan. Xiruxe: An Intelligent Fault Tracking Tool.
AIPR09, Orlando, 2009.
[6] N. Bettenburg and A. Hassan. Studying the Impact of
Social Structures on Software Quality. In 2010 IEEE 18th
International Conference on Program Comprehension,
pages 124–133. IEEE, 2010.
[7] L. Briand, W. Melo, and J. Wust. Assessing the
applicability of fault-proneness models across
object-oriented software projects. Software Engineering,
IEEE Transactions on, 28(7):706–720, 2002.
[8] B. Caglayan, A. Tosun, A. Miranskyy, A. Bener, and
N. Ruffolo. Usage of multiple prediction models based on
defect categories. In Proceedings of the 6th International
Conference on Predictive Models in Software Engineering,
pages 1–9. ACM, 2010.
[9] D. Cubranic and G. Murphy. Automatic bug triage using
text categorization. In Proceedings of the Sixteenth
International Conference on Software Engineering
Knowledge Engineering, pages 1–6. Citeseer, 2004.
[10] A. Gelman and J. Hill. Data Analysis Using Regression
And Multilevel/Hierarchical Models. Analytical Methods
for Social Research. Cambridge University Press, 2007.
[11] E. Giger, M. Pinzger, and H. Gall. Predicting the Fix Time
of Bugs. In RSSE ’10 Proceedings of the 2nd International
Workshop on Recommendation Systems for Software
Engineering, pages 52–56, 2010.
[12] P. Guo, T. Zimmermann, N. Nagappan, and B. Murphy.
Characterizing and predicting which bugs get fixed: An
empirical study of Microsoft Windows. In Software
Engineering, 2010 ACM/IEEE 32nd International
Conference on, volume 1, pages 495–504. IEEE, 2010.
[13] P. Guo, T. Zimmermann, N. Nagappan, and B. Murphy.
Not my bug! and other reasons for software bug report
reassignments. In Proceedings of the ACM 2011 conference
on Computer supported cooperative work, pages 395–404.
[14] J. Herbsleb and A. Mockus. An empirical study of speed
and communication in globally distributed software
development. IEEE Transactions on Software Engineering,
29(6):481–494, June 2003.
[15] S. Koch. Effort modeling and programmer participation in
open source software projects. Information Economics and
Policy, 20(4):345–355, Dec. 2008.
[16] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch.
Benchmarking classification models for software defect
prediction: A proposed framework and novel findings.
Software Engineering, IEEE Transactions on,
[17] C. d. Mazancourt and V. Calcagno. glmulti: An R package
for easy automated model selection with (generalized)
linear models. Journal of Statistical Software, 34(i12), 2010.
[18] A. T. Misirli, B. Caglayan, A. V. Miranskyy, A. Bener, and
N. Ruffolo. Different strokes for different folks: a case study
on software metrics for different defect categories. In
Proceedings of the 2nd International Workshop on
Emerging Trends in Software Metrics, WETSoM ’11, pages
45–51, New York, NY, USA, 2011. ACM.
[19] E. Shihab, A. Ihara, Y. Kamei, W. M. Ibrahim, M. Ohira,
B. Adams, A. E. Hassan, and K.-i. Matsumoto. Predicting
Re-opened Bugs: A Case Study on the Eclipse Project.
2010 17th Working Conference on Reverse Engineering,
pages 249–258, Oct. 2010.
[20] A. Tamrawi, T. Nguyen, and J. Al-Kofahi. Fuzzy set-based
automatic bug triaging: NIER track. Proceedings of the
33rd International Conference on Software Engineering,
pages 884–887, 2011.
[21] C. Weiss, R. Premraj, T. Zimmermann, and A. Zeller. How
long will it take to fix this Bug? In Fourth International
Workshop on Mining Software Repositories, 2007. ICSE
Workshops MSR’07, number 2, 2007.
[22] E. J. Weyuker, T. J. Ostrand, and R. M. Bell. Using
developer information as a factor for fault prediction. In
Proceedings of the Third International Workshop on
Predictor Models in Software Engineering. IEEE Computer
Society, May 2007.
[23] S. Zaman, B. Adams, and A. E. Hassan. Security Versus
Performance Bugs: A Case Study on Firefox. Design,
pages 93–102, 2011.
[24] T. Zimmermann and N. Nagappan. Predicting defects with
program dependencies. 2009 3rd International Symposium
on Empirical Software Engineering and Measurement,
pages 435–438, Oct. 2009.
[25] T. Zimmermann, N. Nagappan, P. Guo, and B. Murphy.
Characterizing and predicting which bugs get reopened. In
Proceedings of the 34th International Conference on
Software Engineering [ACCEPTED], 2012.