Factors Characterizing Reopened Issues: A Case Study
Bora Caglayan1, Ayse Tosun Misirli2, Andriy Miranskyy3, Burak Turhan4, Ayse Bener5
Bogazici University1,2, IBM Canada Ltd.3, University of Oulu4, Ryerson University5
Department of Computer Engineering, Istanbul, Turkey1,2
IBM Toronto Software Laboratory, Toronto, ON Canada3
Department of Information Processing Science, Oulu, Finland4
Ted Rogers School of Information Technology Management, Toronto, ON Canada5
{bora.caglayan, ayse.tosun}@boun.edu.tr, andriy@ca.ibm.com, burak.turhan@oulu.fi, ayse.bener@ryerson.ca
ABSTRACT
Background: Reopened issues may cause problems in managing software maintenance effort. In order to take actions that will reduce the likelihood of issue reopening, the possible causes of bug reopening should be analysed.
Aims: In this paper, we investigate potential factors that
may cause issue reopening.
Method: We have extracted issue activity data from a large release of an enterprise software product. We consider four dimensions, namely developer activity, the issue proximity network, static code metrics of the source code changed to fix an issue, and issue reports and fixes, as possible factors that may cause issue reopening. We have done exploratory analysis on the data. We build logistic regression models on the data in order to identify key factors leading to issue reopening. We have also conducted a survey regarding these factors with the QA Team of the product and interpreted the results.
Results: Our results indicate that centrality in the issue
proximity network and developer activity are important fac-
tors in issue reopening. We have also interpreted our results
with the QA Team to point out potential implications for
practitioners.
Conclusions: Quantitative findings of our study suggest that issue complexity and developers' workload play an important role in triggering issue reopening.
Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics—process metrics,
complexity measures, performance measures
General Terms
Measurement, Experimentation
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
PROMISE ’12, September 21–22, 2012, Lund, Sweden
Copyright 2012 ACM 978-1-4503-1241-7/12/09 ...$15.00.
Keywords
software maintenance, issue management, issue repository,
issue reopening.
1. INTRODUCTION
Issue management is a key activity within software projects, especially during the maintenance phase. For popular software, a steady influx of new issue reports is filtered, prioritized, assigned and handled by the maintainers daily. Managing the issue handling process effectively is an important element for the long-term stability of the software product.
In an idealized issue management scenario, the issue report and the records of issue activity are moved to the archives indefinitely after an issue is closed. Reopened issues are a group of issues that are exceptions to this idealized scenario. These issues go through the issue handling procedures at least one more time after they are archived. Understanding reopened issues is of significant interest to practitioners, since these issues may represent a miscommunication between the issue assigner and the assignee. Furthermore, reopened issues may cause wasted time and effort if they are frequent in issue repositories.
We asked the quality assurance (QA) team of a large-scale company about the possible reasons for reopening and the benefits of investigating these reasons. The opinions of the QA team are as follows:
Reopened issues may have different explanations, but the major one is a back-and-forth discussion between the issue originator and owner on "yes, it's a bug / no, it works as designed". Most of the time, people spend time arguing about the issue, and it is set to "opened" multiple times due to this discussion.

Reopened issues make up less than 10% of all issues, and hence they are not a major pain point. But they also cause waste of resources. Identifying the main reasons for these issues gives us a chance for process improvement. Reducing this ratio down to 5% would also be very beneficial.
We consider an issue as reopened if it has changed from a termination state such as closed, cancelled or postponed to an active state such as assigned or work-in-progress. Identifying the factors that lead to issue reopening is crucial in such a situation in order to take the necessary actions.
In this paper, our research question is to identify the possible factors that may lead to issues getting reopened:

RQ: Which factors lead to issues getting reopened?

In order to answer our research question, we analysed a large-scale enterprise software product and its issue and code repositories. We modelled four dimensions, namely 1) issue-code relation, 2) issue proximity network, 3) issue reports and 4) developer activity, in order to check their individual effects on issues getting reopened.
In the analysis, we used logistic regression to fit a predic-
tive model on the issue data. As a first step, we conducted
univariate regression by using each of the factors from the
four dimensions namely developer activity, issue-code rela-
tion, issue proximity network and the issue report. After
that, we did an exhaustive search for the best factor combi-
nation by optimizing the AUC (area under the receiver op-
erating characteristic curve) for all the possible factor com-
binations. Finally, we showed the model with the highest
success rate among all possible model combinations.
Previously, two independent research groups investigated the factors that may cause issue reopening for Microsoft Windows [25] and the Eclipse project [19]. However, to the best of our knowledge, the factors used in our study related to the issue-code relation and the issue proximity network have not been considered previously. The main contribution of this paper is the analysis of factors that lead to issues getting reopened in a large-scale software product developed in multiple locations. Since some of the measures used in the two previous studies are not extractable for our dataset, our aim is to complement the findings of previous researchers rather than to test their findings on our new dataset.
The rest of the paper is structured as follows: In the Methodology section, we present the dataset, the data extraction process and metrics, and the logistic regression models. In the Results section, we show and interpret the outcomes of the logistic regression models. We then discuss the findings with the QA Team and the threats to the validity of the results, review the related work on understanding the reasons for issue reopening, and conclude with possible future research topics.
2. METHODOLOGY
In this paper, we perform a quantitative analysis on an issue activity database, whose attributes are described in Section 2.1, extract factors that may have a significant influence on reopened issues (Section 2.2), and build a statistical model to interpret them (Section 2.3).
2.1 Dataset
We have used the issue activity database of a large-scale enterprise software product which has a long development history with a 20-year-old code base. The company uses IBM Rational ClearQuest with customised defect forms as the issue management system. In this database, each issue record includes, but is not limited to, the following features:
Originator: The person who opens an issue. Often,
testers (or support personnel) are the originators.
Owner: The person who is assigned to an issue. The owner is often the developer who fixes the issue, especially when the issue is classified as a defect.
State: There are 11 distinct states in the database of
the company: Opened, Assigned, Working, Delivered,
Returned, Integrated, Validated, Rejected, Closed, Post-
poned, Cancelled.
Phase Found: It indicates the phase in the development life cycle in which an issue is reported. A list of phases is summarized in the Appendix with their occurrence rates in the issue database.
Symptom: This is a sign of a problem (e.g., “crash/outage”) experienced by a customer. Some of the common symptoms are Build/Test Failed, Core Dump, Program Defect and Incorrect I/O. We have found that symptoms are significantly correlated with phase found.
A typical life cycle of an issue in our case study can be
seen in Figure 1. Bold arrows show a typical life cycle, while
the dashed arrows numbered with (1), (3), (4) and (5) in-
dicate a reopening. It is often the case that an issue is assigned right from the start and the owner starts working on it immediately (arrow number (2)). The final status of an issue is stored in the State field, but changes of the state are stored in two other fields (old state and new state), together with the person id (generally the owner or originator) who made the modification.
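Given this representation, a small sketch (with assumed field names, not the actual ClearQuest schema) of how an issue can be flagged as reopened from its state-transition records, following the definition used in this paper:

```python
# Termination and active states as defined in the Introduction; flag an issue as
# reopened if any recorded transition moves from a termination state back to an
# active state. Field names are illustrative, not the actual ClearQuest schema.
TERMINATION_STATES = {"Closed", "Cancelled", "Postponed"}
ACTIVE_STATES = {"Opened", "Assigned", "Working"}

def is_reopened(transitions):
    """transitions: chronological list of (old_state, new_state) pairs."""
    return any(old in TERMINATION_STATES and new in ACTIVE_STATES
               for old, new in transitions)

# Example: an issue that was closed and then assigned again counts as reopened.
print(is_reopened([("Opened", "Assigned"), ("Assigned", "Closed"),
                   ("Closed", "Assigned")]))  # True
```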
Figure 1: Life cycle of an issue in our case study
In our case study, the issue activity database contains 3645 unique issue reports, with the earliest record opened in January 2005. We filtered only closed issues from this database and obtained 2287 issues in total. Of these issues, 219 (approximately 9%) are classified as reopened. This dataset is further filtered as we incorporate factors from the code base.
2.2 Factors Affecting Issue Reopening
We have previously extracted code, network and churn
metrics from the code base of the same product 16 months
prior to release date at the method level. Our data extrac-
tion methodology and prediction models built with these
metric sets are described in [8] and [18]. In this paper, we
identify four dimensions about 1) developers who fix these
issues, 2) size and complexity of methods edited during issue
fixes, 3) the relationship between issues and 4) other factors
about issue reports and fix activities. For each of these fac-
tors, we define hypotheses. We also define abbreviations for all factors to ease their usage in tables and figures.
Of the 2287 closed issues in the database, 1318 issue records are matched with developer activities; among these, there are 88 (7%) reopened issues. Furthermore, when code and network metrics are extracted, these issues are mapped to the source code at the method level. After issue-code mapping, the final dataset includes 1046 issues, of which 62 (6%) are reopened.
2.2.1 Developer Activity
Previously, Guo et al. [12] and Zimmermann et al. [25] defined the bug opener's reputation as the ratio of the “total number of previous bugs opened and gotten fixed” over the “total number of bugs he/she opened”. In a case study on Firefox performance and security bugs, the authors analysed “who fixes these bugs” by measuring the expertise of developers in terms of the number of bugs previously fixed by the developer and the experience in days, i.e., the number of days from the first fix by the given developer to the latest bug's fix date [23]. Similar to these
approaches, we extracted 6 different metrics representing the
development activities of issue owners, i.e., developers who
owned and fixed issues.
We analyse developers’ defect fixing activity to verify the
following hypotheses:
H1: Issues owned by developers who fix few issues are more likely to be reopened.
H2: Issues owned by developers who have not fixed an issue for a long time are more likely to be reopened.
H3: Issues owned by developers who edit a large number of methods to fix an issue are more likely to be reopened.
# Fixed Issues (dev fix count): To test H1, we computed the number of previously fixed issues of the developer who owns the current issue, up to the current issue's opening date. For example, suppose john.black@XXX.com owns issue ID #120, which was opened in May 2009. Then, we calculated the number of issues fixed by john.black up to May 2009. Therefore, even though two issues are fixed by the same developer, this metric's value may differ between these issues if their opening dates are different.
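A minimal sketch (with hypothetical column names; pandas is an assumption, not necessarily the tooling used in the study) of how this time-windowed count can be computed per issue:

```python
# For each issue, count how many issues its owner had already fixed (closed)
# before the issue's opening date. Column names are illustrative.
import pandas as pd

def dev_fix_count(issues: pd.DataFrame) -> pd.Series:
    """issues needs columns: 'owner', 'opened_on', 'closed_on' (datetime64)."""
    counts = []
    for _, row in issues.iterrows():
        prior = issues[(issues["owner"] == row["owner"]) &
                       (issues["closed_on"] < row["opened_on"])]
        counts.append(len(prior))
    return pd.Series(counts, index=issues.index, name="dev_fix_count")
```

The per-date developer ranks described next can be derived from these counts by ranking developers at each issue's opening date.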
Rank in terms of # Fixed Issues (dev rank fix count):
This metric is calculated using Fixed Issue Count and ranks
of developers who owned and fixed issues. For each issue’s
opening date, # fixed issues and ranks by developers are
re-calculated. For example, suppose john.black@XXX.com fixed issue ID #120, which was opened on May 5th, 2009 and closed on June 6th, 2009, and he also fixed issue ID #133, which was opened on July 5th, 2009 and closed on October 6th, 2009. In this case, we computed the number of previously fixed issues for john.black twice, both for May 2009 (k) and for July 2009 (k+1, by adding issue #120). The ranks of this developer are also calculated based on other developers' issue fix performance in May 2009 and July 2009. So, john.black can be at the first rank in May 2009 with k fixed issues, but at the third rank in July 2009 with (k+1) fixed issues.
Duration between the First and Last Fix (dev first
last fix): This metric computes the number of days from
the first fix of the developer to the current issue’s fix date.
To test H2, we computed this metric for all developers asso-
ciated with issues in our database. The reason for choosing
this metric is as follows: Reopened issues are critical in the
sense that they require thorough knowledge of the source
code and developers who would fix reopened issues also re-
quire active development history to avoid forgetting possible
bottlenecks in the source code. If the duration between the
first and last fix of a developer is long, then developer may
spend harder time during understanding the main reasons
for the issue in an updated software system and this may
cause issues to be re-opened.
Total # of Edits (dev total edits): To test H3, we extracted the total number of edits a developer has made to methods (i.e., functions) from the development commit logs and associated them with issues. The more edits a developer makes to software methods, the more likely the developer is well informed about the software system and the more likely he/she is an active developer.
# Unique Methods Edited (dev methods edited): To test H3, we have also considered the number of unique methods edited for fixing an issue. The total number of edits is not enough to evaluate whether developers of reopened issues have strong code ownership. If a developer edits the majority of methods in the software, it may indicate his/her ownership of the code. Having strong ownership may also prevent issue reopenings, since it suggests that the issue owner has extensive knowledge of the source code as well as of potential problems.
Rank in terms of # Methods Edited (dev rank methods count): This metric is calculated based on the unique methods edited, and the ranks of developers who owned and fixed an issue are computed. For each issue, its owner's (developer's) rank in terms of the number of methods he/she has edited so far is computed and added as a new metric. The computation of these ranks is similar to the rank in terms of # fixed issues. This metric also completes the general definition of code ownership by capturing both the number of methods edited by a developer and what percentage of edits are done by this developer among all developers (i.e., the developer's rank).
2.2.2 Issue-Code Relation
Shihab et al. [19] considered the fact that re-opened bugs may be harder to fix than others because they require many files, or more complex files, to be changed. We have also considered the complexity of reopened issues in terms of the software methods changed during their fixes and define two hypotheses:
H4: Issues related with many methods are more likely to
be reopened.
H5: Issues related with larger (in terms of lines of code)
and more complex methods are more likely to be reopened.
# Methods Changed (methods changed): This metric is calculated by counting the number of methods changed to fix an issue, by mining commit messages from the version control system and matching each commit with an issue. It is then used to test H4.
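A rough sketch of this commit-to-issue matching (the "#123" message convention and the data shapes are assumptions, not the actual commit format of the studied product):

```python
# Match issue IDs mentioned in commit messages and count the distinct methods
# changed per issue fix.
import re
from collections import defaultdict

ISSUE_ID = re.compile(r"#(\d+)")  # assumes commits reference issues as "#123"

def methods_changed_per_issue(commits):
    """commits: iterable of (message, set_of_methods_touched) pairs."""
    changed = defaultdict(set)
    for message, methods in commits:
        for issue_id in ISSUE_ID.findall(message):
            changed[issue_id].update(methods)
    return {issue: len(methods) for issue, methods in changed.items()}

print(methods_changed_per_issue([
    ("Fix crash in parser, see #120", {"Parser.parse", "Parser.reset"}),
    ("Follow-up cleanup for #120", {"Parser.reset"}),
]))  # {'120': 2}
```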
LOC: The number of methods changed during a fix may not be enough to represent the complexity of an issue. For example, an issue may require changes to 3 methods, but each method may be longer than 100 lines of code (LOC), and hence its fix may be harder than other fixes. Therefore, we have also extracted a size indicator (in terms of lines of code) for the methods changed to fix an issue. If more than one method is changed during a fix, we aggregated their lines of code by taking the maximum and the sum over all methods.
Cyclomatic complexity (CC): As an extension to the LOC measure, we have used McCabe's cyclomatic complexity of a method changed to fix an issue. If more than one method is changed, we aggregated their cyclomatic complexity values by taking the maximum and the sum over all methods. This metric, as well as the LOC changed, is used to test H5.
2.2.3 Issue Proximity Network
Issue proximity network models the relation between is-
sues. It measures the distance between issues in terms of
the number of common methods changed during their fixes.
If an issue is connected with many other issues in terms of the number of common methods changed during a fix, this may increase the probability of the issue being reopened. The reason for this can be explained as follows: Reopened issues may have close connections with many other issues, and therefore they reside at the center of this proximity network. Furthermore, being in the core part also indicates that reopened issues may affect many methods in the source code, which increases the risk of failures afterwards. We define our hypothesis for measuring this dimension as follows:
H6: Issues linked with many other issues are more likely
to be reopened.
Metrics are extracted from the issue proximity network, all of which were used in previous studies [18] to measure caller-callee relations between software modules and their effects on defect proneness. In this paper, we used four network metrics to quantify the complexity of issues and how this complexity (in terms of methods changed) is related to issue reopening.
Degree: This metric is computed by counting the number of direct relations (edges) an issue has. A higher degree means that an issue is connected to many other issues, like a hub in a traffic network.
Degree Centrality: In our previous studies, we have extracted both in-degree and out-degree centrality metrics [8, 18]. In this paper, however, the proximity network is undirected, with edge weights set to the number of methods shared by two issues. Therefore, this metric is calculated as the degree of an issue over the number of issues (degree/N, where N is the number of issues).
Betweenness Centrality: This metric is calculated by counting the number of shortest paths between all issue pairs (i, j) that pass through issue X, over all shortest paths between all issue pairs. It evaluates the location of an issue: being in a popular location may be very critical, since such an issue is associated with many issues and affects many methods in the source code.
Pagerank: This metric measures the relative importance of an issue. It also evaluates centrality among issues by considering the fact that being related to a central issue should carry more weight than being related to a peripheral issue.
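A minimal sketch (using networkx on hypothetical data; the actual tooling used in the study is not stated here) of building the undirected issue proximity network and extracting the four metrics:

```python
# Nodes are issues; an edge connects two issues whose fixes changed at least one
# common method, weighted by the number of shared methods.
from itertools import combinations
import networkx as nx

issue_methods = {            # issue id -> methods changed by its fix (toy data)
    "I1": {"m1", "m2", "m3"},
    "I2": {"m2", "m3"},
    "I3": {"m3", "m4"},
}

G = nx.Graph()
G.add_nodes_from(issue_methods)
for a, b in combinations(issue_methods, 2):
    shared = len(issue_methods[a] & issue_methods[b])
    if shared:
        G.add_edge(a, b, weight=shared)

degree = dict(G.degree())                    # Degree
degree_centrality = nx.degree_centrality(G)  # note: networkx normalizes by N-1,
                                             # while the paper uses degree/N
betweenness = nx.betweenness_centrality(G)   # Betweenness Centrality
pagerank = nx.pagerank(G, weight="weight")   # Pagerank
```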
2.2.4 From Issue Reports
From issue reports, we have extracted 2 categorical metrics, namely Symptom and Phase found. Our objective is to observe whether reopened issues have unique symptoms or whether they are more likely to be reported during a specific phase.
Same Location (same loc): We have also extracted the geographical locations of the owner and originator of issues based on their email addresses' domains, and defined a boolean metric, Same location, to observe the effect of communication across different locations on issue reopenings.
Our hypothesis to test this relation is as follows:
H7: Issues whose owner and originator are from different
locations are more likely to be reopened.
In a study done by Herbsleb and Mockus [14], it was found
that issues reported (and fixed) in distributed teams have
a higher resolution time than issues reported and fixed in
the same location. Zimmermann et al. also investigated
the effects of location differences between assigners and as-
signees on reopened bugs and found that bugs initially as-
signed across teams, buildings or countries are more likely to be reopened [25]. Thus, we have defined Same location and assigned 1 if both the originator and the owner of an issue were located in the same country, and 0 otherwise.

Figure 2: Correlations of the measures. The shape of an ellipse represents the correlation between two variables: bolder colors indicate higher correlations; an ellipse that bends towards the right indicates a positive correlation, whereas one that bends towards the left indicates a negative correlation.
Based on the data, issues were reported from 12 distinct geographical locations. Only 20% of issues had their owner and originator in different locations.
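A small sketch (assumed e-mail/domain format; a real mapping from domains to countries would need a lookup table) of how this boolean metric can be derived:

```python
# Derive a country code from the e-mail domain and compare owner vs. originator.
def country_of(email: str) -> str:
    # e.g. "john.black@xx.example.ca" -> "ca"; generic ".com" domains would need
    # a separate domain-to-country lookup in practice.
    return email.rsplit(".", 1)[-1].lower()

def same_location(originator_email: str, owner_email: str) -> int:
    return int(country_of(originator_email) == country_of(owner_email))

print(same_location("alice@lab.example.ca", "bob@lab.example.de"))  # 0
```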
Fix Days: Reopened issues may have a long life cycle from their opened to closed dates, since they were assigned to the same states (opened/assigned) more than once. We have defined a new metric, namely Fix days, to measure the number of days between an issue's opened and closed dates. The hypothesis to test the effect of this metric is defined as follows:
H8: Issues which take a long time (in days) to fix are more likely to be reopened.
2.3 Analysis of The Factors
Basic descriptive statistics of the factors can be found in Table 1. Median values of the factors for reopened and not-reopened issues are significantly different (Mann-Whitney U test, p < 0.05) in 10 out of 19 cases. This shows that, for each of these 10 factors, the distribution for reopened issues is shifted towards either lower or higher values. For instance, reopened issues cause significantly more edits on the source code (the 8th factor in Table 1) compared to other issues.
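A sketch of this per-factor comparison using SciPy's Mann-Whitney U test (column names are assumptions; the study's actual statistical tooling is not specified here):

```python
# For each factor, compare its distribution in reopened vs. not-reopened issues
# and keep the factors whose distributions differ significantly.
import pandas as pd
from scipy.stats import mannwhitneyu

def significant_factors(df: pd.DataFrame, factors, alpha=0.05):
    """df needs a boolean 'reopened' column plus one numeric column per factor."""
    hits = []
    for factor in factors:
        reopened = df.loc[df["reopened"], factor]
        others = df.loc[~df["reopened"], factor]
        _, p_value = mannwhitneyu(reopened, others, alternative="two-sided")
        if p_value < alpha:
            hits.append((factor, p_value))
    return hits
```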
Descriptive statistics of the factors related to the issue-code relation dimension are particularly interesting. In the extreme cases, an issue fix can change up to 460 methods or files with up to 92 kLOC. We believe that such a pattern highlights the relative complexity of the addressed issues or the architectural complexity of the software. A fix in complex software will likely involve changes to many interdependent modules. The distribution of issue fix counts across developers is similar to the Pareto-law trends observed in developer activity distributions in open source projects [15].
Spearman rank correlation coefficients among the factors we considered are visualised in Figure 2. In the correlation visualisation, bolder colors indicate higher correlations. If the shape of an ellipse bends towards the right, it indicates a positive correlation, whereas a leftward bend indicates a negative correlation. From the figure, it can be observed that correlations are relatively higher among the factors within the same dimension, while they are relatively lower between factors from different dimensions. The high correlations are especially apparent among the factors from the issue-code relation (LOC, CC) and issue proximity (degree, betweenness, pagerank) dimensions.
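The correlation matrix itself is a one-liner with pandas (a sketch; drawing the ellipse-style correlogram of Figure 2 is left to any plotting library):

```python
# Spearman rank correlation matrix among the considered factors.
import pandas as pd

def spearman_matrix(df: pd.DataFrame, factors) -> pd.DataFrame:
    return df[factors].corr(method="spearman")
```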
2.4 Logistic Regression Models
Univariate Logistic Regression
Logistic regression is the standard way to model binary outcomes y_i = 0, 1 [1, 10], and therefore it is suitable for our problem. Logistic regression has frequently been used in classification problems such as defect prediction in the software engineering literature [22, 24]. The basic probability model of logistic regression can be stated as follows:

    Pr(y_i = 1) = logit^{-1}(X_i β)    (1)

where Pr(y_i = 1) is the probability of the outcome y_i = 1, X_i is the vector of independent variables for instance i, and β is the vector of regression coefficients.
One advantage of logistic regression in binary classification, when compared to methods like Naive Bayes, is that its regression coefficients and derived parameters (odds ratios) are easily interpretable and highly explicative. Assuming the logistic regression model is true, one can check the significance of the regression coefficients of the different input variables to understand their explanatory power. In addition, the odds, Pr(y_i = 1)/Pr(y_i = 0), can be related to the individual factors through the following formula:

    Pr(y_i = 1) / Pr(y_i = 0) = e^{X_i β}    (2)

From this formula, the effect of changes in the individual factors on the odds of the outcome y_i can be analysed.
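A minimal sketch (statsmodels, with hypothetical column names) of the per-factor univariate fit used in Section 3.1, reporting the coefficient, its p-value and the odds ratio e^β implied by Equation (2):

```python
# Fit a univariate logistic regression for one factor and report its coefficient,
# p-value and odds ratio. Column names are illustrative.
import numpy as np
import statsmodels.api as sm

def univariate_logit(df, factor, outcome="reopened"):
    X = sm.add_constant(df[[factor]])
    model = sm.Logit(df[outcome].astype(int), X).fit(disp=False)
    beta = model.params[factor]
    return beta, model.pvalues[factor], np.exp(beta)
```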
Exhaustive Search on All Factor Combinations
In order to test the performance of logistic regression with all possible subsets of the N factors, 2^N models should be considered. For large factor sets, various approaches are used to reduce the number of considered combinations. However, we were able to test all possible model combinations (2^19 = 524,288 in total) in a couple of hours, since our dataset was relatively small.
In order to find the “best” model, a performance criterion should be optimized. In the literature, a likelihood-based information criterion such as the AIC (Akaike Information Criterion) or the BIC (Bayesian Information Criterion) is often used to maximize the likelihood function while penalizing overfitting [17]. Instead of a maximum-likelihood-based performance measure, we use the Area Under the receiver operating characteristic Curve (AUC) as the performance measure. We believe that AUC represents the predictive performance of the model more clearly than a likelihood-based measure.
AUC is commonly used to compare the performance of var-
ious classification models [16].
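A sketch of this exhaustive best-subset search (scikit-learn and the column names are assumptions; with 19 factors the loop visits the 2^19 − 1 non-empty subsets):

```python
# Fit a logistic regression for every non-empty factor subset and rank the
# resulting models by AUC on the same data, as in the paper's exploratory setup.
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def best_subsets_by_auc(df, factors, outcome="reopened"):
    y = df[outcome].astype(int)
    results = []
    for k in range(1, len(factors) + 1):
        for subset in combinations(factors, k):
            X = df[list(subset)]
            probs = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
            results.append((roc_auc_score(y, probs), subset))
    return sorted(results, reverse=True)  # highest AUC first
```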
Multivariate Logistic Regression
As a second model, we built a logistic regression model using the set of factors with the best predictive power. We observed the predictive power of this model by drawing its ROC curve and report the factors included in the model.
3. RESULTS OF LOGISTIC REGRESSION
MODELS
3.1 Univariate Statistical Model
In order to check the importance of the factors individually, we have checked the significance of univariate logistic regression models [7]. In Table 2, the factors with significant coefficients are presented. Five out of 19 factors were found to have significant coefficients. One of these factors is related to the issue report (same location), three are related to developer activity (dev first last fix, dev total edits, dev methods edited) and one is related to the issue proximity network (betweenness centrality). We have interpreted our hypotheses by building a univariate regression model to predict reopened issues for each of the 19 factors. The regression coefficients and their significance levels reported in Table 2 are used to validate whether a factor is significant for predicting reopened issues. In summary, the hypotheses related to developer activity in terms of frequent issue fixes and methods edited (H2, H3), the issue proximity network (H6) and the geographical locations of issue owner and originator (H7) are validated, and these findings are highlighted below. The other hypotheses could not be validated with univariate analysis, but we have also checked their significance using multivariate analysis of the factors in later sections.
Developer activity: We have found that reopened issues have a significantly negative relation with developers who have not fixed an issue for a long time (coefficient: -2.6947136, p < 0.05). Furthermore, reopened issues have a positive relation with developers who edit relatively more methods (coefficients: 0.0012903 and 0.0019286, p < 0.05). These results validate our second and third hypotheses (H2, H3). However, we could not validate H1, which defines a relationship between developer activity in terms of previously fixed issues and issue reopening.
Reopened issues are fixed by developers who actively fix
issues and edit many methods.
Issue proximity network: Regarding issue relations in terms of shared methods, we have validated H6, with the highest coefficient (35.8880) on betweenness centrality.
Reopened issues are linked with other issues in terms of
methods edited during issue fixes.
Issue report: Issues whose owner and originator are located in the same location have a significantly negative impact on issue reopening (coefficient: -1.1320, p < 0.05). This also validates H7, since being in different geographical locations may lengthen the communication process and cause issue reopening. However, we could not validate the relationship between fix days and reopened issues (H8).
Reopened issues are often reported and fixed by people
from different geographical locations.
Table 1: Descriptive statistics of the considered factors. The first value in each cell is for all issues; values in parentheses are (α) for not-reopened issues and (β) for reopened issues. Ten of the factors have significantly different medians for reopened and not-reopened issues (Mann-Whitney U test, p < 0.05; see Section 2.3).

Factor | Max | 75% | Median | 25% | Minimum | Mean
--- Issue Report ---
Symptom | - | - | - | - | - | -
Phase Found | - | - | - | - | - | -
Same Location | - | - | - | - | - | -
Fix Days | 3745 (3745α, 981β) | 240 (241α, 217β) | 119 (121α, 81β) | 39 (40α, 30β) | 0 (0α, 4β) | 169 (168α, 176β)
--- Developer Activity ---
# Fixed Issues | 79 (79α, 58β) | 22 (21.25α, 27.5β) | 11 (11α, 13β) | 4 (4α, 4β) | 1 (1α, 1β) | 15.5 (15.38α, 18.29β)
Rank in # Fixed Issues | 125 (119α, 125β) | 31 (31α, 33.25β) | 14 (14α, 14.5β) | 5 (6α, 4β) | 1 (1α, 1β) | 21.49 (21.31α, 24.45β)
Duration Between First and Last Fix in Days | 3786 (3786α, 833β) | 478 (475.25α, 559β) | 282.5 (279α, 373β) | 124 (124α, 130.75β) | 0 (0α, 0β) | 388 (391α, 345β)
Total # Edits | 1347 (1347α, 1167β) | 180 (177.25α, 241β) | 80 (80α, 115β) | 24 (24α, 19.5β) | 0 (0α, 0β) | 149 (144α, 234β)
# Unique Methods Edited | 864 (864α, 794β) | 113 (110α, 168.75β) | 56 (55α, 68β) | 20 (20α, 17.5β) | 0 (0α, 0β) | 99 (95α, 161β)
Rank in # Methods Edited | 129 (129α, 115β) | 34 (34α, 28.5β) | 14 (15α, 10.5β) | 4 (4.75α, 1β) | 0 (0α, 0β) | 21 (22α, 20β)
--- Issue-Code Relation ---
Max. CC | 2103 (2087α, 2103β) | 275 (309.3α, 131.75β) | 91 (92α, 75.5β) | 37 (37α, 35.25β) | 1 (1α, 4β) | 303.3 (307.2α, 240.9β)
Sum CC | 2103 (2087α, 2103β) | 275 (309.3α, 131.75β) | 91 (92α, 75.5β) | 37 (37α, 35.25β) | 1 (1α, 4β) | 303.3 (307.2α, 240.9β)
Max. LOC | 22644 (22644α, 22567β) | 3077 (3643α, 1609β) | 1152 (1170α, 863.5β) | 468 (466α, 473β) | 28 (28α, 146β) | 3458 (3508α, 2680β)
Sum LOC | 92931 (92931α, 67509β) | 7543 (7730α, 4588β) | 2014 (2023α, 1586β) | 646 (645α, 721β) | 28 (28α, 146β) | 6076 (6103α, 5648β)
# Methods Changed | 460 (460α, 86β) | 9 (9α, 10β) | 3 (3α, 5β) | 1 (1α, 1.25β) | 1 (1α, 1β) | 11.26 (11.25α, 11.42β)
--- Issue Proximity Network ---
Degree | 173 (115α, 173β) | 31 (31α, 26β) | 10 (10α, 11β) | 3 (3α, 4β) | 0 (0α, 0β) | 20.32 (20.02α, 25.13β)
Betweenness Centrality | 0.22 (0.08α, 0.22β) | 0.0009 (0.0009α, 0.002β) | 0.0002 (0.0002α, 0.0006β) | 1.28e-07 (0α, 1.23e-05β) | 0 (0α, 0β) | 0.002 (0.002α, 0.008β)
Pagerank | 0.007 (0.006α, 0.007β) | 0.001 (0.001α, 0.002β) | 0.0008 (0.0008α, 0.0008β) | 0.0004 (0.0004α, 0.0005β) | 0 (0α, 0β) | 0.0001 (0.0001α, 0.001β)
Degree Centrality | 0.18 (0.12α, 0.18β) | 0.03 (0.03α, 0.03β) | 0.01 (0.01α, 0.01β) | 0.003 (0.003α, 0.004β) | 0 (0α, 0β) | 0.02 (0.02α, 0.02β)
Table 2: Coefficients for Univariate Regression (significance: *** p < 0.001, ** p < 0.01)

Dimension | Factor | Coefficient | Standard Deviation | Z Value | Pr(>|z|) | Significance
Issue Report | Symptom | - | - | - | - |
Issue Report | Phase Found | - | - | - | - |
Issue Report | Same Location | -1.13200 | 0.27000 | -4.160 | 3.14e-05 | ***
Issue Report | Fix Days | 0.00018 | 0.00061 | 0.296 | 0.7700 |
Developer Activity | Dev fix count | 0.01100 | 0.00770 | 1.460 | 0.1500 |
Developer Activity | Dev rank fix count | 0.00640 | 0.00560 | 1.130 | 0.2600 |
Developer Activity | Dev first last fix | -2.69000 | 0.17000 | -16.222 | <2e-16 | ***
Developer Activity | Dev total edits | 0.00130 | 0.00043 | 3.030 | 0.0024 | **
Developer Activity | Dev methods edited | 0.00190 | 0.00060 | 3.200 | 0.0014 | **
Developer Activity | Dev rank methods count | -0.00250 | 0.00600 | -0.420 | 0.6700 |
Issue-Code Relation | Max. CC | -0.00031 | 0.00031 | -0.995 | 0.3200 |
Issue-Code Relation | Sum CC | -7.08e-05 | 1.69e-04 | -0.420 | 0.6800 |
Issue-Code Relation | Max. LOC | -3.30e-05 | 2.89e-05 | -1.140 | 0.2500 |
Issue-Code Relation | Sum LOC | -4.96e-06 | 1.42e-05 | -0.350 | 0.7300 |
Issue-Code Relation | Methods changed | 0.00022 | 0.00460 | 0.047 | 0.9600 |
Issue Proximity Network | Degree centrality | 7.45000 | 4.63000 | 1.610 | 0.1100 |
Issue Proximity Network | Degree | 0.00760 | 0.00469 | 1.609 | 0.1080 |
Issue Proximity Network | Betweenness centrality | 35.89000 | 12.01700 | 2.990 | 0.0028 | **
Issue Proximity Network | Pagerank | 21.16000 | 11.80000 | 2.130 | 0.2200 |
Issue-code relations: Univariate analyses do not show a significant relation between code metrics and reopened issues. Hence, we could not validate H4 and H5, but we observed the predictive power of the code metrics in the multivariate regression models.
3.2 The Best Factor Combinations
We ranked all possible combinations of the factors (2^19 possible models) to predict reopened issues based on our performance measure, AUC. We counted the occurrence of each factor in the best 100 models. The best 100 models had an AUC difference of 0.05 between the best-performing and the worst-performing model. All top-performing models consist of 8 to 12 factors.
In Table 3, the number of occurrences of each factor with #Occur > 20 is presented. Out of 19 factors, five occurred in all top-performing models, namely same location, betweenness centrality, maximum cyclomatic complexity, sum of cyclomatic complexity and maximum LOC. Furthermore, three factors occurred in more than 90% of the top-performing models, namely fix count, fix count rank and sum of LOC.
When compared with the significance of the factors in the univariate regression analysis, code-based measures performed surprisingly well in the top models. Even though we could not validate hypotheses H4 and H5 during univariate analysis, the cyclomatic complexity and LOC measures of the methods related to reopened issues bring significant benefits to the predictive model. On the other hand, betweenness is significant both in the best models and in the univariate regression model.
3.3 Multivariate Statistical Model
In Figure 3, the ROC curve of the model that performed the best (in predicting reopened issues) during our exhaustive search is presented. In addition to the 0.81 AUC, the top model had 0.88 recall and 0.82 precision. If this were a prediction scenario, we could conclude that the model had significant potential. However, some of the factors, such as those related to the issue-code relation, have limited applicability in a prediction scenario because they are only available post-mortem.
Table 3: Number of Occurrences of Factors in the Top-Performing Model Combinations

Factor Name | Occurrence in 100 Best Models (%)
Same Location | 100
Betweenness Centrality | 100
Maximum Cyclomatic Complexity | 100
Sum of Cyclomatic Complexity | 100
Maximum LOC | 100
# Fixed Issues | 94
Sum of LOC | 94
Rank in Fixed Issue Count | 91
Unique Methods Changed | 59
Total # Edits | 44
Degree | 36
Degree Centrality | 24
Duration Between First and Last Fix | 24
The factors in Table 3 with the highest number of occurrences in the top 100 models were present in the top model. The model with the highest AUC contains the following factors:
Same Location - Betweenness Centrality - Maximum Cyclomatic Complexity - Sum of Cyclomatic Complexity - Maximum LOC - # Fixed Issues - Sum of LOC - Rank in Fixed Issue Count - Unique Methods Changed - Degree - Duration Between First and Last Fix - Total # Edits
4. DISCUSSION
4.1 Interpretation of Results with QA Team
In order to interpret our findings, we held a meeting with the QA Team in the company and asked free-format questions about our analysis. The responses are summarized below.
Betweenness centrality: Why do you think that reopened issues are often located at the centre of the issue proximity network?
Reopened issues are generally the ones that developers postpone fixing or cancel, since a) they may decide that the issue is not very critical for the customer, and b) fixing it may be risky or complex, so they may want to delay fixing it.
Figure 3: ROC curve of the multivariate regression
that uses 12 factors which performed the best in
model testing - AUC : 0.81 (Recall: 0.88 and Preci-
sion: 0.82)
The fact that reopened issues are at the center of this network supports our anecdotal evidence of these issues' complexity.
Rank of developers in terms of method changes: Do you think that developers who edit many methods should have a positive or negative impact on reopened issues?
Usually, the amount of code changed by a developer is positively correlated with the number of issues fixed by this developer. These top “code-altering” developers may decide to postpone or cancel some of the issues assigned to them until they reduce their workload, since fixing a new issue may also introduce other issues due to the large amount of changes.
4.2 Threats to Validity
In this section, we discuss possible threats to the validity of our study. We have used a large-scale enterprise product to conduct a case study. Even though drawing general conclusions from a single empirical study is very difficult, our results should be transferable to other researchers through well-designed and controlled experiments. In this study, we propose a set of metrics representing four main dimensions to investigate their effects on reopened issues. Our methodology consists of investigating their individual effects on reopened issues, as well as combinations of metrics, to find the best set of metrics predicting reopened issues. Using the same methodology, our results can be replicated or refuted on new datasets.
In this case study, we were able to extract 2287 resolved issues, of which 1046 were matched with the code base and developers. At first glance, this seems to be a small set; however, we traced all issues submitted since 2004 and filtered them based on our requirements. While reducing the dataset, we also considered the fact that the ratio of reopened issues to other issues should be similar to the ratio in the initial dataset. When we linked issues with developer activities, this ratio was 7%, whereas in the final set this ratio was reduced to 6%, a minor change. We have also worked closely with the QA team in the company to validate data quality.
We use logistic regression during univariate and multivariate analyses, since coefficients in logistic regression are easily interpretable together with their significance levels. We did not consider using other algorithms and comparing their performance with regression, since our aim was to understand the explanatory power of metrics on reopened issues rather than to build the best predictive model for reopened issues. However, we did consider all factor combinations to select the best metric set during our experiments.
5. RELATED WORK
Issue management systems of large open source software projects have been available on the internet since the early 1990s. Issue management systems of some commercially licensed applications are also publicly accessible, since the owners of these applications would like to make their issue handling process transparent to their users. Research on issue repository data started in parallel with the public availability of past software issue management data.
There are two recent papers closely related to our research. Shihab et al. [19] analysed the work habits, bug report and bug fix dimensions for the Eclipse project to find the factors that contributed to bug reopening, and built a reopened-bug prediction model using decision trees. Shihab et al. found that the comment text, description text, time to resolve the bug and the component in which the bug was found were the leading factors that caused bug reopening for Eclipse [19]. On the other hand, Zimmermann et al. [25] analysed the Windows Vista and Windows 7 issue repositories and conducted a survey of 394 developers in order to identify the important factors that cause bug reopening [25]. Zimmermann et al. built a logistic regression model in order to identify the factors that may cause issue reopening. In their research, Zimmermann et al. used organizational and process-related factors in addition to factors directly extractable from the issue report. In their logistic regression model, nearly all the factors they observed were found to be significant, including factors related to location, work habits and bug report characteristics.
Our work is different from the research by Shihab et al. and Zimmermann et al. in two aspects: 1) the factors and the dataset we analysed are different, and 2) we analysed the effect of combinations of various factors, in addition to individual factors, on issue reopening.
Other notable recent areas of research include automated
issue triage, factors that change the quality of issue reports,
detection of duplicate issues and estimation of issue fix du-
rations [3, 6, 21].
One important related research topic is automated bug triage, the procedure of processing an issue report and assigning the issue to the right developer. This problem is especially important for large software with millions of users. Based on an interview with an anonymous developer, Anvik et al. found that the 300 issues reported daily for Mozilla make it impossible for developers to triage them effectively [3, 12, 21].
Text mining methods have been used in several studies to
find the most relevant developer to handle a bug in auto-
mated bug triage models [2–4, 9, 20]. Bakir et al. proposed a model that forwards auto-generated software fault data directly to the relevant developers by mining the patterns in the faults [5]. The benefit of automated bug triage is often measured by the percentage of actual owners correctly identified by the model and by the decrease in issue reassignments, or bug tosses. While bug triage studies claim that bug tossing is time consuming [20], Guo et al. observed that issue reassignment is beneficial for communication [13].
Effective issue reporting is important for reporters as well as for developers. In an exploratory study on Windows Vista, Guo et al. [12] identified the characteristics of bugs that get fixed. Bettenburg et al. [6] also analysed the components of a bug report (severity, stack traces, builds, screenshots) that make a bug more likely to be resolved.
Estimation of issue resolution times is another research area that helps plan developers' efforts efficiently. Some studies on open source projects can be found in [11, 21].
6. CONCLUSIONS
In this paper, we analysed the effects of 19 factors from four different dimensions on the probability of issue reopening for a large-scale software product developed in geographically distributed locations. In our study, we have found that a subset of these factors is important for issue reopening. The predictive power of the best factor combinations is high, with an AUC of 0.81 in the best-performing models.
RQ: Which factors lead to issues getting reopened?
In order to find the factors that were most important for issue reopening, we built univariate and best-subset logistic regression models, and in both we checked the importance of the factors we considered.
The dimensions of developer activity (in terms of the time between the first and last issue fixes and the number of methods edited during issue fixes), the issue proximity network (in terms of common methods changed during issue fixes) and the geographical locations of issue owners and originators are found to be important for issue reopening. In the top-ranking logistic regression models, selected based on their predictive power, factors from all dimensions were prominent.
In previous research on this topic, nearly all of the considered factors were found to be significant [19], [25]. In contrast, in our analysis we found that only a subset of the considered factors is significantly more important for issue reopening. The best logistic regression model in terms of predictive power contains 12 factors out of 19 (see Section 3.3 for the full list).
Implications of the Results To The Industry
Issue reopening can lead to unanticipated resource allocation, causing projects to run over budget and late. Therefore, it is important to proactively identify issues that may be reopened and take corrective actions. Quantitative findings of our study suggest that issue complexity and developers' workload play an important role in triggering issue reopening. This information can aid managers in deriving concrete corrective actions (e.g., ensuring the existence of thorough code review and reducing developers' workload).
As indicated in our results, issue reopening may have many reasons and may not be modelled by a small set of factors. In addition, some of the causes of issue reopening, such as design problems, may be outside the scope of the issue management process. Identifying the important factors that may lead to issue reopening may be the first step towards helping companies understand these underlying causes and take the necessary actions.
Future Work
Every model is a simplification of reality and has its limitations. We attempted to model the three aspects of software development (people, process, product) when choosing the factors. New factors related to these aspects can be proposed in future studies. As with any case study, one possible direction for future work is testing our conclusions on new datasets. Another is analysing the causal relations between the considered factors and the probability of issue reopening.
APPENDIX
Table 4 presents the majority of the phases in which issues were found and reported in the company, together with the percentage of their occurrence. We have listed 10 phases, which account for 85% of all issue reports, and grouped the remaining phases as “Others” due to privacy concerns.

Table 4: A list of phases in which issues were found and reported.

Phase | Issues reported during this phase (%)
Customer | 25.5
Functional testing | 19.9
Regression testing | 8.5
Coding | 7.4
System testing | 6.8
Nightly build | 5.0
Performance testing | 3.9
Design | 3.6
Unit testing | 2.7
Beta testing | 1.3
Others | 15.5
Acknowledgment
This research is supported in part by Turkish State Planning
Organization (DPT) under the project number 2007K120610
and partially supported by TEKES under Cloud-SW project
in Finland. We would like to thank IBM Canada Lab –
Toronto site for making their development data available
for research and strategic help during all phases of this re-
search. The opinions expressed in this paper are those of
the authors and not necessarily of IBM Corporation.
7. REFERENCES
[1] E. Alpaydin. Introduction to Machine Learning (Adaptive
Computation and Machine Learning). The MIT Press,
2004.
[2] J. Anvik, L. Hiew, and G. C. Murphy. Who should fix this
bug? In Proceedings of the International Conference on
Software Engineering, pages 361–370, Shanghai, China,
2006.
[3] J. Anvik and G. Murphy. Determining implementation
expertise from bug reports. In Mining Software
Repositories, 2007. ICSE Workshops MSR’07. Fourth
International Workshop on, pages 1–8. IEEE, 2007.
[4] J. Anvik and G. C. Murphy. Reducing the effort of bug
report triage. ACM Transactions on Software Engineering
and Methodology, 20(3):1–35, Aug. 2011.
[5] A. Bakir, E. Kocaguneli, A. Tosun, A. Bener, and
B. Turhan. Xiruxe: An Intelligent Fault Tracking Tool.
AIPR09, Orlando, 2009.
[6] N. Bettenburg and A. Hassan. Studying the Impact of
Social Structures on Software Quality. In 2010 IEEE 18th
International Conference on Program Comprehension,
pages 124–133. IEEE, 2010.
[7] L. Briand, W. Melo, and J. Wust. Assessing the
applicability of fault-proneness models across
object-oriented software projects. Software Engineering,
IEEE Transactions on, 28(7):706–720, 2002.
[8] B. Caglayan, A. Tosun, A. Miranskyy, A. Bener, and
N. Ruffolo. Usage of multiple prediction models based on
defect categories. In Proceedings of the 6th International
Conference on Predictive Models in Software Engineering,
pages 1–9. ACM, 2010.
[9] D. Cubranic and G. Murphy. Automatic bug triage using
text categorization. In Proceedings of the Sixteenth
International Conference on Software Engineering
Knowledge Engineering, pages 1–6. Citeseer, 2004.
[10] A. Gelman and J. Hill. Data Analysis Using Regression
And Multilevel/Hierarchical Models. Analytical Methods
for Social Research. Cambridge University Press, 2007.
[11] E. Giger, M. Pinzger, and H. Gall. Predicting the Fix Time
of Bugs. In RSSE ’10 Proceedings of the 2nd International
Workshop on Recommendation Systems for Software
Engineering, pages 52–56, 2010.
[12] P. Guo, T. Zimmermann, N. Nagappan, and B. Murphy.
Characterizing and predicting which bugs get fixed: An
empirical study of Microsoft Windows. In Software
Engineering, 2010 ACM/IEEE 32nd International
Conference on, volume 1, pages 495–504. IEEE, 2010.
[13] P. Guo, T. Zimmermann, N. Nagappan, and B. Murphy.
Not my bug! and other reasons for software bug report
reassignments. In Proceedings of the ACM 2011 conference
on Computer supported cooperative work, pages 395–404.
ACM, 2011.
[14] J. Herbsleb and A. Mockus. An empirical study of speed
and communication in globally distributed software
development. IEEE Transactions on Software Engineering,
29(6):481–494, June 2003.
[15] S. Koch. Effort modeling and programmer participation in
open source software projects. Information Economics and
Policy, 20(4):345–355, Dec. 2008.
[16] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch.
Benchmarking classification models for software defect
prediction: A proposed framework and novel findings.
Software Engineering, IEEE Transactions on,
34(4):485–496, 2008.
[17] V. Calcagno and C. de Mazancourt. glmulti: An R package for easy automated model selection with (generalized) linear models. Journal of Statistical Software, 34(12), 2010.
[18] A. T. Misirli, B. Caglayan, A. V. Miranskyy, A. Bener, and
N. Ruffolo. Different strokes for different folks: a case study
on software metrics for different defect categories. In
Proceedings of the 2nd International Workshop on
Emerging Trends in Software Metrics, WETSoM ’11, pages
45–51, New York, NY, USA, 2011. ACM.
[19] E. Shihab, A. Ihara, Y. Kamei, W. M. Ibrahim, M. Ohira,
B. Adams, A. E. Hassan, and K.-i. Matsumoto. Predicting
Re-opened Bugs: A Case Study on the Eclipse Project.
2010 17th Working Conference on Reverse Engineering,
pages 249–258, Oct. 2010.
[20] A. Tamrawi, T. Nguyen, and J. Al-Kofahi. Fuzzy set-based
automatic bug triaging: NIER track. Proceedings of the
33rd International Conference on Software Engineering,
pages 884–887, 2011.
[21] C. Weiss, R. Premraj, T. Zimmermann, and A. Zeller. How
long will it take to fix this Bug? In Fourth International
Workshop on Mining Software Repositories, 2007. ICSE
Workshops MSR’07, number 2, 2007.
[22] E. J. Weyuker, T. J. Ostrand, and R. M. Bell. Using
developer information as a factor for fault prediction. In
Proceedings of the Third International Workshop on
Predictor Models in Software Engineering. IEEE Computer
Society, May 2007.
[23] S. Zaman, B. Adams, and A. E. Hassan. Security versus performance bugs: A case study on Firefox. In Proceedings of the 8th Working Conference on Mining Software Repositories (MSR), pages 93–102, 2011.
[24] T. Zimmermann and N. Nagappan. Predicting defects with
program dependencies. 2009 3rd International Symposium
on Empirical Software Engineering and Measurement,
pages 435–438, Oct. 2009.
[25] T. Zimmermann, N. Nagappan, P. Guo, and B. Murphy. Characterizing and predicting which bugs get reopened. In Proceedings of the 34th International Conference on Software Engineering (ICSE), 2012.
... It is the main repository of the quality assurance operations in a software project. We have used the issue management software to map the defects with the underlying source code modules and to analyze the issue handling process of or- ganizations [60], [61]. Defect data may be used to label the defect prone modules for defect prediction or to assess software quality in terms of defect density or defect count. ...
... For example, a graphical representation of commits extracted from a version control system would reveal the distribution of workload among developers, i.e., what percentage of developers actively develop on a daily basis [53], or it would highlight which components of a software system are frequently changed. On the other hand, a statistical test between metrics characterizing issues that are previously fixed and stored in an issue repository may identify the reasons for re-opened bugs [60] or reveal the issue workload among software developers [61] . Depending on the questions that are investigated , we can collect various types of metrics from software repositories; but it is inappropriate to use any statistical technique or visualization approach without considering the data characteristics (e.g. ...
... We concluded that testers may report more bugs than the amount that developers fix before each release, and hence, as more bugs are reported, the number of production defects increase. In another case study with a large scale software development organization, we used Spearman correlation to analyse the relationship between metrics that characterize reopened issues (issues closed and opened again during an issue life cycle) [60]. We found strong statistical relationship between the lines of code changed to fix a reopened issue and the dependencies of a reopened issue to other issues, i.e., the higher the proximity of an issue to the others, the more lines of code is affected during its fix. ...
Chapter
In this chapter, we share our experience and views on software data analytics in practice with a retrospect to our previous work. Over ten years of joint research projects with the industry, we have encountered similar data analytics patterns in diverse organizations and in different problem cases. We discuss these patterns following a 'software analytics' framework: problem identification, data collection, descriptive statistics and decision making. We motivate the discussion by building our arguments and concepts around our experiences of the research process in six different industry research projects in four different organizations.
... Zimmermann et al. [32] investigate the reasons for bug reopening and find that bugs identified by code analysis tools or code review processes are less likely to be re-opened. Caglayan et al. [6] report that developers' activities are important factors that cause bugs to be re-opened. ...
Conference Paper
Full-text available
Background: Bug fixing is one major activity in software maintenance to solve unexpected errors or crashes of software systems. However, a bug fix can also be incomplete and even introduce new bugs. In such cases, extra effort is needed to rework the bug fix. The reworking requires to inspect the problem again, and perform the code change and verification when necessary. Discussions throughout the bug fixing process are important to clarify the reported problem and reach a solution. Aims: In this paper, we explore how discussions during the initial bug fix period (i.e., before the bug reworking occurs) associate with future bug reworking. We focus on two types of "reworked bug fixes": 1) the initial bug fix made in a re-opened bug report; and 2) the initially submitted patch if multiple patches are submitted for a single bug report. Method: We perform a case study using five open source projects (i.e., Linux, Firefox, PDE, Ant and HTTP). The discussions are studied from six perspectives (i.e., duration, number of comments, dispersion, frequency, number of developers and experience of developers). Furthermore, we extract topics of discussions using Latent Dirichlet Allocation (LDA). Results: We find that the occurrence of bug reworking is associated with various perspectives of discussions. Moreover, discussions on some topics (e.g., code inspection and code testing) can decrease the frequency of bug reworking. Conclusions: The discussions during the initial bug fix period may serve as an early indicator of what bug fixes are more likely to be reworked.
... Affects extracted from comments, in addition to other important metrics, can specifically be used to investigate code review quality, to analyze social and technical debt in software development [18, 20] or the bug life cycle [3], and to study the impact of affects on the scheduling of developers. ...
Conference Paper
Full-text available
Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and (recently) investigating developer affectiveness. For the latter, issue tracking systems can be mined to explore developers' emotions, sentiments and politeness---affects for short. However, research on affect detection in software artefacts is still in its early stage due to the lack of manually validated data and tools. In this paper, we contribute to the research on affects in software artefacts by providing a labeling of the emotions present in issue comments. We manually labeled 2,000 issue comments and 4,000 sentences written by developers with emotions such as love, joy, surprise, anger, sadness and fear. Labeled comments and sentences are linked to software artefacts reported in our previously published dataset (containing more than 1K projects, more than 700K issue reports and more than 2 million issue comments). The enriched dataset presented in this paper allows the investigation of the role of affects in software development.
Article
Full-text available
Reopened bugs can degrade the overall quality of a software system since they require unnecessary rework by developers. Moreover, reopened bugs also lead to a loss of trust in the end-users regarding the quality of the software. Thus, predicting bugs that might be reopened could be extremely helpful for software developers to avoid rework. Prior studies on reopened bug prediction focus only on three open source projects (i.e., Apache, Eclipse, and OpenOffice) to generate insights. We observe that one out of the three projects (i.e., Apache) has a data leak issue: the bug status of reopened was included as training data to predict reopened bugs. In addition, prior studies used an outdated prediction model pipeline (i.e., with old techniques for constructing a prediction model) to predict reopened bugs. Therefore, we revisit the reopened bugs study on a large-scale dataset consisting of 47 projects tracked by JIRA, using modern techniques such as SMOTE and permutation importance together with 7 different machine learning models. We study the reopened bugs using a mixed methods approach (i.e., both quantitative and qualitative study). We find that: 1) after using an updated reopened bug prediction model pipeline, only 34% of the projects give an acceptable performance with AUC ≥ 0.7; 2) there are four major reasons for a bug getting reopened, that is, technical (i.e., patch/integration issues), documentation, human (i.e., due to incorrect bug assessment), and reasons not shown in the bug reports; and 3) in projects with an acceptable AUC, 94% of the reopened bugs are due to patch issues (i.e., the usage of an incorrect patch) identified before bug reopening. Our study revisits reopened bugs and provides new insights into developers' bug reopening activities.
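A rough Python sketch of such an updated pipeline (oversampling with SMOTE, a classifier, AUC, and permutation importance) is shown below. The features and labels are synthetic stand-ins, not the JIRA dataset, and the model choice is only one of the seven the authors mention.

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data standing in for per-bug features and reopen labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))            # e.g., fix time, comment count, churn, severity
y = (rng.random(500) < 0.1).astype(int)  # 1 = reopened (rare class)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversample only the training split so no information leaks into the test set.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

model = RandomForestClassifier(random_state=0).fit(X_res, y_res)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC = {auc:.2f}")

# Permutation importance ranks which features drive the predictions.
importance = permutation_importance(model, X_test, y_test, random_state=0)
print(importance.importances_mean)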
Conference Paper
Full-text available
Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and, recently, investigating developers' "affectiveness". In particular, the Jira Issue Tracking System is a proprietary tracking system that has gained tremendous popularity in recent years and offers unique features such as its project management system and the Jira agile kanban board. This paper presents a dataset extracted from the Jira ITS of four popular open source ecosystems (as well as the tools and infrastructure used for extraction): the Apache Software Foundation, Spring, JBoss and CodeHaus communities. Our dataset hosts more than 1K projects, containing more than 700K issue reports and more than 2 million issue comments. Using this data, we have been able to deeply study the communication process among developers, and how this aspect affects the development process. Furthermore, comments posted by developers contain not only technical information, but also valuable information about sentiments and emotions. Since sentiment analysis and human aspects in software engineering have gained increasing importance in recent years, with this repository we would like to encourage further studies in this direction.
Article
Change Requests (CRs) are key elements of software maintenance and evolution. Finding the appropriate developer for a CR is crucial for obtaining the lowest economically feasible fixing time. Nevertheless, assigning CRs is a labor-intensive and time-consuming task. In this paper, we report on a questionnaire-based survey with practitioners to understand the characteristics of CR assignment, and on a semi-automated approach for CR assignment which combines rule-based and machine learning techniques. In accordance with the results of the survey, the proposed approach emphasizes the use of contextual information, essential to effective assignments, and puts the development team in control of the assignment rules, toward making its adoption easier. The assignment rules can be either extracted from the assignment history or created from scratch. An empirical validation was performed through an offline experiment with CRs from a large software project. The results pointed out that the approach is up to 46.5% more accurate than other approaches that rely solely on machine learning techniques. This indicates that a rule-based approach is a viable and simple method to leverage CR assignments.
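The toy sketch below illustrates the hybrid idea as I read it: explicit, team-controlled assignment rules are checked first, and a text classifier learned from the assignment history is the fallback. The rules, team names, and training data are fabricated for illustration and are not from the paper.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Team-maintained rules: keyword in the CR text -> assignee.
rules = {"billing": "payments-team", "login": "auth-team"}

# Fallback classifier trained on past assignments (toy history).
history = ["slow query on the report page", "button misaligned on mobile settings"]
assignees = ["backend-team", "ui-team"]
fallback = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(history, assignees)

def assign(change_request: str) -> str:
    # Rule-based step takes precedence, keeping the team in control.
    for keyword, team in rules.items():
        if keyword in change_request.lower():
            return team
    # Machine-learning fallback when no rule matches.
    return fallback.predict([change_request])[0]

print(assign("login page rejects valid passwords"))    # matched by a rule
print(assign("misaligned button on the mobile page"))  # predicted by the fallback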
Article
The Art and Science of Analyzing Software Data provides valuable information on analysis techniques often used to derive insight from software data. This book shares best practices in the field generated by leading data scientists, collected from their experience training software engineering students and practitioners to master data science. The book covers topics such as the analysis of security data, code reviews, app stores, log files, and user telemetry, among others. It covers a wide variety of techniques such as co-change analysis, text analysis, topic analysis, and concept analysis, as well as advanced topics such as release planning and generation of source code comments. It includes stories from the trenches from expert data scientists illustrating how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions. The book presents best practices, hints, and tips to analyze data and apply tools in data science projects; presents research methods and case studies that have emerged over the past few years to further the understanding of software data; and shares stories from the trenches of successful data science initiatives in industry.
Article
Full-text available
In this chapter, we share our experience and views on software data analytics in practice with a review of our previous work. In more than 10 years of joint research projects with industry, we have encountered similar data analytics patterns in diverse organizations and in different problem cases. We discuss these patterns following a "software analytics" framework: problem identification, data collection, descriptive statistics, and decision making. In the discussion, our arguments and concepts are built around our experiences of the research process in six different industry research projects in four different organizations. Methods: Spearman rank correlation, Pearson correlation, Kolmogorov-Smirnov test, chi-square goodness-of-fit test, t test, Mann-Whitney U test, Kruskal-Wallis analysis of variance, k-nearest neighbor, linear regression, logistic regression, naïve Bayes, neural networks, decision trees, ensembles, nearest-neighbor sampling, feature selection, normalization.
Article
Full-text available
Two important questions concerning the coordination of development effort are which bugs to fix first and how long it takes to fix them. In this paper we investigate empirically the relationships between bug report attributes and the time to fix. The objective is to compute prediction models that can be used to recommend whether a new bug should and will be fixed fast or will take more time for resolution. We examine in detail if attributes of a bug report can be used to build such a recommender system. We use decision tree analysis to compute and 10-fold cross validation to test prediction models. We explore prediction models in a series of empirical studies with bug report data of six systems of the three open source projects Eclipse, Mozilla, and Gnome. Results show that our models perform significantly better than random classification. For example, fast fixed Eclipse Platform bugs were classified correctly with a precision of 0.654 and a recall of 0.692. We also show that the inclusion of post-submission bug report data of up to one month can further improve prediction models.
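A minimal sketch of that setup (a decision tree evaluated with 10-fold cross-validation) might look as follows; the bug-report attributes and labels here are synthetic placeholders rather than the Eclipse, Mozilla, or Gnome data.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic bug-report attributes (e.g., severity, priority, component id).
rng = np.random.default_rng(1)
X = rng.integers(0, 5, size=(300, 3))
y = rng.integers(0, 2, size=300)  # 1 = fixed fast, 0 = fixed slowly

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=10, scoring="accuracy")
print(f"mean accuracy over 10 folds: {scores.mean():.3f}")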
Conference Paper
Full-text available
Assigning a bug to the right developer is key to reducing the cost, time, and effort for developers in a bug fixing process. This assignment process is often referred to as bug triaging. In this paper, we propose Bugzie, a novel approach for automatic bug triaging based on fuzzy set-based modeling of the bug-fixing expertise of developers. Bugzie considers a system to have multiple technical aspects, each associated with technical terms. Then, it uses a fuzzy set to represent the developers who are capable of or competent in fixing the bugs relevant to each term. The membership function of a developer in a fuzzy set is calculated via the terms extracted from the bug reports that (s)he has fixed, and the function is updated as new fixed reports become available. For a new bug report, its terms are extracted and the corresponding fuzzy sets are union'ed. Potential fixers are then recommended based on their membership scores in the union'ed fuzzy set. Our preliminary results show that Bugzie achieves higher accuracy and efficiency than other state-of-the-art approaches.
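The following simplified sketch captures the fuzzy-set intuition as described in the abstract: a developer's membership for a term grows with the fixed reports containing that term, and candidates for a new report are ranked by a fuzzy union (here, the maximum) of the per-term memberships. The developer names, counts, and the exact membership formula are illustrative assumptions, not Bugzie's actual definitions.

from collections import defaultdict

# fixed_terms[developer][term] = number of fixed reports containing that term (made up).
fixed_terms = {
    "alice": {"crash": 8, "login": 2},
    "bob":   {"crash": 1, "render": 6},
}

term_totals = defaultdict(int)
for terms in fixed_terms.values():
    for term, count in terms.items():
        term_totals[term] += count

def membership(dev: str, term: str) -> float:
    # Simple stand-in membership: the developer's share of fixed reports for the term.
    total = term_totals[term]
    return fixed_terms[dev].get(term, 0) / total if total else 0.0

def rank_fixers(new_report_terms):
    # Fuzzy union (max) over the report's terms, then rank developers by score.
    scores = {dev: max(membership(dev, t) for t in new_report_terms)
              for dev in fixed_terms}
    return sorted(scores, key=scores.get, reverse=True)

print(rank_fixers(["crash", "render"]))  # developers ordered by membership score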
Conference Paper
Full-text available
Bug fixing accounts for a large amount of the software maintenance resources. Generally, bugs are reported, fixed, verified and closed. However, in some cases bugs have to be re-opened. Re-opened bugs increase maintenance costs, degrade the overall user-perceived quality of the software and lead to unnecessary rework by busy practitioners. In this paper, we study and predict re-opened bugs through a case study on the Eclipse project. We structure our study along 4 dimensions: (1) the work habits dimension (e.g., the weekday on which the bug was initially closed), (2) the bug report dimension (e.g., the component in which the bug was found), (3) the bug fix dimension (e.g., the amount of time it took to perform the initial fix) and (4) the team dimension (e.g., the experience of the bug fixer). Our case study on the Eclipse Platform 3.0 project shows that the comment and description text, the time it took to fix the bug, and the component the bug was found in are the most important factors in determining whether a bug will be re-opened. Based on these dimensions we create decision trees that predict whether a bug will be re-opened after its closure. Using a combination of our dimensions, we can build explainable prediction models that achieve 62.9% precision and 84.5% recall when predicting whether a bug will be re-opened.
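As an illustration of combining such dimensions into a single explainable model, the sketch below encodes a few invented bug records (one feature per dimension), trains a shallow decision tree, and reports precision and recall. The records and feature names are fabricated and only mirror the kinds of attributes listed above.

from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Invented records: one attribute per dimension (work habits, report, fix, team).
bugs = [
    {"closed_weekday": "Fri", "component": "ui",   "fix_hours": 40, "fixer_exp": 1},
    {"closed_weekday": "Tue", "component": "core", "fix_hours": 2,  "fixer_exp": 9},
    {"closed_weekday": "Fri", "component": "ui",   "fix_hours": 35, "fixer_exp": 2},
    {"closed_weekday": "Mon", "component": "core", "fix_hours": 3,  "fixer_exp": 8},
] * 25
reopened = [1, 0, 1, 0] * 25

X = DictVectorizer(sparse=False).fit_transform(bugs)
X_tr, X_te, y_tr, y_te = train_test_split(X, reopened, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
pred = tree.predict(X_te)
print("precision:", precision_score(y_te, pred), "recall:", recall_score(y_te, pred))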
Code
R package for Data Analysis using multilevel/hierarchical model
Article
A key collaborative hub for many software development projects is the bug report repository. Although its use can improve the software development process in a number of ways, reports added to the repository need to be triaged. A triager determines if a report is meaningful. Meaningful reports are then organized for integration into the project's development process. To assist triagers with their work, this article presents a machine learning approach to create recommenders that assist with a variety of decisions aimed at streamlining the development process. The recommenders created with this approach are accurate; for instance, recommenders created using this approach for deciding which developer to assign a report to have a precision between 70% and 98% over five open source projects. As the configuration of a recommender for a particular project can require substantial effort and be time consuming, we also present an approach to assist the configuration of such recommenders that significantly lowers the cost of putting a recommender in place for a project. We show that recommenders for which developer should fix a bug can be quickly configured with this approach and that the configured recommenders are within 15% precision of hand-tuned developer recommenders.
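The sketch below shows one plausible shape for such a developer recommender: a text classifier over past reports that returns its top-ranked candidates for a new report. The reports, developer names, and model choice are assumptions made for illustration, not the article's actual implementation.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Fabricated triage history: past report summaries and who fixed them.
reports = [
    "null pointer exception in the parser",
    "parser fails on nested comments",
    "toolbar icons blurry on high dpi displays",
    "dialog layout broken after window resize",
]
fixers = ["dev_a", "dev_a", "dev_b", "dev_b"]

recommender = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(reports, fixers)

new_report = "parser crashes with a null pointer on malformed input"
proba = recommender.predict_proba([new_report])[0]

# Recommend the top two candidate developers with their scores.
for idx in np.argsort(proba)[::-1][:2]:
    print(f"{recommender.classes_[idx]}: {proba[idx]:.2f}")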
Conference Paper
Fault localization in the telecommunication sector is a major challenge. Most companies manually try to trace faults back to their origin. Such a process is expensive, time consuming and ineffective. Therefore, in this study we automated the manual fault localization process by designing and implementing an intelligent software tool (Xiruxe) for a local telecommunications company. Xiruxe has a learning-based engine which uses AI algorithms, such as Naïve Bayes, Decision Trees and Multi-Layer Perceptrons, to match keywords and patterns in the fault messages. The initial deployment results show that this intelligent engine can achieve a misclassification rate as low as 1.28%.
Article
Fixing bugs is an important part of the software development process. An underlying aspect is the effectiveness of fixes: if a fair number of fixed bugs are reopened, it could indicate instability in the software system. To the best of our knowledge there has been little prior work on understanding the dynamics of bug reopens. Towards that end, in this paper we characterize when bug reports are reopened, using the Microsoft Windows operating system project as an empirical case study. Our analysis is based on a mixed-methods approach. First, we categorize the primary reasons for reopens based on a survey of 358 Microsoft employees. We then reinforce these results with a large-scale quantitative study of Windows bug reports, focusing on factors related to bug report edits and relationships between the people involved in handling the bug. Finally, we build statistical models to describe the impact of various metrics on reopening bugs, ranging from the reputation of the opener to how the bug was found.
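A bare-bones sketch of fitting such a descriptive model is given below, using logistic regression on synthetic data. The two predictors (opener reputation and number of report edits) echo the factors mentioned in the abstract, but the data, coefficients, and model form are placeholders rather than the study's actual models.

import numpy as np
import statsmodels.api as sm

# Synthetic data: lower opener reputation and more edits raise the reopen odds.
rng = np.random.default_rng(2)
n = 400
opener_reputation = rng.random(n)
report_edits = rng.poisson(3, n)
logits = -1.0 - 2.0 * opener_reputation + 0.3 * report_edits
reopened = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Logistic regression describing how each metric shifts the odds of reopening.
X = sm.add_constant(np.column_stack([opener_reputation, report_edits]))
model = sm.Logit(reopened, X).fit(disp=False)
print(model.summary())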
Article
Defect prediction has evolved with a variety of metric sets and defect types. Researchers have found code, churn, and network metrics to be significant indicators of defects. However, not all metric sets are informative for all defect categories: a single metric type may represent the majority of a defect category. Our previous study showed that defect-category-sensitive prediction models are more successful than general models, since each category has different characteristics in terms of metrics. We extend our previous work and propose specialized prediction models using churn, code, and network metrics with respect to three defect categories. Results show that churn metrics are the best for predicting all defects. The strength of correlation for code and network metrics varies with the defect category: network metrics have higher correlations than code metrics for defects reported during functional testing and in the field, and vice versa for defects reported during system testing.
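A schematic sketch of comparing metric families per defect category is shown below; the CSV file, its column-naming scheme, and the category names are hypothetical stand-ins for whatever export such an analysis would use.

import pandas as pd
from scipy.stats import spearmanr

# Hypothetical export: one row per module, with metric and defect-count columns.
modules = pd.read_csv("module_metrics.csv")

for category in ["functional_test", "system_test", "field"]:
    defects = modules[f"defects_{category}"]
    for family in ["churn", "code", "network"]:
        rho, _ = spearmanr(modules[f"{family}_metric"], defects)
        print(f"{category:>15} vs {family:>7} metrics: rho = {rho:.2f}")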