
# Factors characterizing reopened issues: A case study

Bora Caglayan1, Ayse Tosun Misirli2, Andriy Miranskyy3, Burak Turhan4, Ayse Bener5
Bogazici University1,2, IBM Canada Ltd.3, University of Oulu4, Ryerson University5
Department of Computer Engineering, Istanbul, Turkey1,2
IBM Toronto Software Laboratory, Toronto, ON Canada3
Department of Information Processing Science, Oulu, Finland4
Ted Rogers School of Information Technology Management, Toronto, ON Canada5
{bora.caglayan1, ayse.tosun2}@boun.edu.tr
andriy@ca.ibm.com3, burak.turhan@oulu.fi4, ayse.bener@ryerson.ca5
ABSTRACT

Background: Reopened issues may cause problems in managing software maintenance effort. In order to take actions that will reduce the likelihood of issue reopening, the possible causes of issue reopening should be analysed.

Aims: In this paper, we investigate potential factors that may cause issue reopening.

Method: We have extracted issue activity data from a large release of an enterprise software product. We consider four dimensions, namely developer activity, issue proximity network, static code metrics of the source code changed to fix an issue, and issue reports and fixes, as possible factors that may cause issue reopening. We have performed exploratory analysis on the data and built logistic regression models in order to identify the key factors leading to issue reopening. We have also conducted a survey regarding these factors with the QA Team of the product and interpreted the results.

Results: Our results indicate that centrality in the issue proximity network and developer activity are important factors in issue reopening. We have also interpreted our results with the QA Team to point out potential implications for practitioners.

Conclusions: The quantitative findings of our study suggest that issue complexity and developers' workload play an important role in triggering issue reopening.
Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics—process metrics,
complexity measures, performance measures
General Terms
Measurement, Experimentation
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PROMISE '12, September 21–22, 2012, Lund, Sweden
Copyright 2012 ACM 978-1-4503-1241-7/12/09 ...$15.00.

Keywords
software maintenance, issue management, issue repository, issue reopening

1. INTRODUCTION

Issue management is a key activity within software projects, especially during the maintenance phase. For popular software, a steady influx of new issue reports is filtered, prioritized, assigned and handled by the maintainers daily. Managing the issue handling process effectively is an important element of the long-term stability of the software product.

In an idealized issue management scenario, the issue report and the records of issue activity are moved to the archives indefinitely after an issue is closed. Reopened issues are a group of issues that are exceptions to this idealized scenario: they go through the issue handling procedures at least one more time after they are archived. Understanding reopened issues is of significant interest to practitioners, since these issues may represent a miscommunication between issue assigner and assignee. Furthermore, reopened issues may cause a waste of time and effort if they are frequent in issue repositories.

We asked the quality assurance (QA) team of a large-scale company about the possible reasons for reopening and the benefits of investigating these reasons. The opinions of the QA team are as follows:

> Reopened issues may have different explanations, but the major one is a back-and-forth discussion between issue originator and owner on "yes, it's a bug / no, it works as designed". Most of the time, people spend time arguing on the issue and it is assigned to "opened" multiple times due to this discussion. Reopened issues consist of less than 10% of all issues, and hence they are not a major pain point. But they also cause a waste of resources. Identifying the main reasons for these issues gives us a chance for process improvement. Reducing this ratio down to 5% would also be very beneficial.
We consider an issue as reopened if it has changed from a termination state such as closed, cancelled or postponed to an active state such as assigned or work-in-progress. Identifying the factors that lead to issue reopening is crucial in such a situation in order to take the necessary actions.

In this paper, our research question is identifying the possible factors that may lead to issues getting reopened:

RQ: Which factors lead to issues getting reopened?

In order to answer our research question, we analysed a large-scale enterprise software product and its issue and code repositories. We modelled four dimensions, namely 1) issue-code relation, 2) issue proximity network, 3) issue reports and 4) developer activity, in order to check their individual effects on issues getting reopened.

In the analysis, we used logistic regression to fit a predictive model on the issue data. As a first step, we conducted univariate regression using each of the factors from the four dimensions. After that, we did an exhaustive search for the best factor combination by optimizing the AUC (area under the receiver operating characteristic curve) over all possible factor combinations. Finally, we report the model with the highest success rate among all possible model combinations.

Previously, two independent research groups investigated the factors that may cause issue reopening for Microsoft Windows [25] and Eclipse projects [19]. However, to the best of our knowledge, the factors used in our study related to the issue-code relation and the issue proximity network have not been considered previously. The main contribution of this paper is the analysis of factors that lead to issues getting reopened in a large-scale software product developed in multiple locations.
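The state-based definition above can be checked mechanically against an issue's transition log. A minimal sketch, assuming the state names from the paper; the record layout is a hypothetical illustration, not the actual ClearQuest schema:

```python
# Classify an issue as reopened if any recorded state change moves it
# from a termination state back to an active state.
TERMINATION_STATES = {"Closed", "Cancelled", "Postponed"}
ACTIVE_STATES = {"Opened", "Assigned", "Working"}

def is_reopened(transitions):
    """transitions: iterable of (old_state, new_state) pairs, in order."""
    return any(old in TERMINATION_STATES and new in ACTIVE_STATES
               for old, new in transitions)

# A history that closes and then returns to an active state is reopened.
history = [("Opened", "Assigned"), ("Assigned", "Working"),
           ("Working", "Closed"), ("Closed", "Assigned")]
print(is_reopened(history))
```

The check relies only on the (old state, new state) fields the paper says are stored with each modification, so no extra bookkeeping is needed.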
Since some of the measures used in the two previous studies are not extractable for our dataset, our aim is to complement the findings of previous researchers rather than test their findings on our new dataset.

The rest of the paper is structured as follows: In the Related Work section, we discuss the relevant work on understanding the reasons for issue reopening. In Methodology, we present the dataset, the data extraction process, the metrics and the logistic regression models. In the Results section, we show and interpret the outcomes of the logistic regression models. Finally, a discussion of threats to the validity of the results and conclusions with possible future research topics are presented.

2. METHODOLOGY

In this paper, we perform quantitative analysis on an issue activity database, whose attributes are described in Section 2.1, extract factors that may have significant influence on reopened issues (Section 2.2), and build a statistical model to interpret them (Section 2.3).

2.1 Dataset

We have used the issue activity database of a large-scale enterprise software product which has a long development history with a 20-year-old code base. The company uses IBM Rational ClearQuest with customised defect forms as the issue management system. In this database, each issue record includes, but is not limited to, the following fields:

- Originator: The person who opens an issue. Often, testers (or support personnel) are the originators.
- Owner: The person who is assigned to an issue. The owner is often the developer who fixes an issue, especially when the issue is classified as a defect.
- State: There are 11 distinct states in the database of the company: Opened, Assigned, Working, Delivered, Returned, Integrated, Validated, Rejected, Closed, Postponed, Cancelled.
- Phase Found: The phase in the development life cycle in which an issue is reported. A list of phases is summarized in the Appendix with their occurrence rates in the issue database.
- Symptom: A sign of the problem ("crash/outage") experienced by a customer. Some of the common symptoms are Build/Test Failed, Core Dump, Program Defect, Incorrect I/O. We have found that symptoms are significantly correlated with phase found.

A typical life cycle of an issue in our case study can be seen in Figure 1. Bold arrows show a typical life cycle, while the dashed arrows numbered (1), (3), (4) and (5) indicate a reopening. It is often the case that an issue is assigned right from the start and the owner starts working on it immediately (arrow number (2)). The final status of an issue is stored in the State field, but changes of the state are stored in two other fields (old state and new state), together with the id of the person (generally the owner or the originator) who makes the modification.

Figure 1: Life cycle of an issue in our case study

In our case study, the issue activity database contains 3645 unique issue reports, with the earliest record opened in January 2005. We filtered only closed issues from this database and obtained 2287 issues in total. Of these issues, 219 (approximately 9%) are classified as reopened. This dataset is further filtered as we incorporate factors from the code base.

2.2 Factors Affecting Issue Reopening

We have previously extracted code, network and churn metrics from the code base of the same product at the method level, 16 months prior to the release date. Our data extraction methodology and the prediction models built with these metric sets can be found in [8] and [18]. In this paper, we identify four dimensions concerning 1) the developers who fix these issues, 2) the size and complexity of the methods edited during issue fixes, 3) the relationship between issues and 4) other factors about issue reports and fix activities. For each of these factors, we define hypotheses. We also define abbreviations for all factors to ease their usage in tables and figures.
Of the 2287 closed issues in the database, 1318 issue records are matched with developer activities; among these there are 88 (7%) reopened issues. Furthermore, when code and network metrics are extracted, these issues are mapped to source code at the method level. After issue-code mapping, the final dataset includes 1046 issues, of which 62 (6%) are reopened.

2.2.1 Developer Activity

Previously, Guo et al. [12] and Zimmermann et al. [25] defined the bug opener's reputation as the ratio of the "total number of previous bugs opened and gotten fixed" over the "total number of bugs he/she opened". In a case study on Firefox performance and security bugs, the authors analysed "who fixes these bugs" by measuring the expertise of developers in terms of the number of bugs previously fixed by the developer and experience in days, i.e. the number of days from the first fix by the given developer to the latest bug's fix date [23]. Similar to these approaches, we extracted 6 different metrics representing the development activities of issue owners, i.e., developers who owned and fixed issues. We analyse developers' defect fixing activity to verify the following hypotheses:

H1: Issues owned by developers who fix few issues are more likely to be reopened.
H2: Issues owned by developers who haven't fixed an issue for a long time are more likely to be reopened.
H3: Issues owned by developers who edit a large number of methods to fix an issue are more likely to be reopened.

# Fixed Issues (dev fix count): To test H1, we computed the number of previously fixed issues of the developer who owns the current issue, up to the current issue's opening date. For example, suppose john.black@XXX.com owns issue ID #120, which was opened in May 2009. Then, we calculated the number of issues fixed by john.black before May 2009. Therefore, even though two issues are fixed by the same developer, this metric's value may differ between the issues if their opening dates differ.
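The dev fix count computation described above can be sketched as a simple scan over issue records; the records, owner names and dates below are hypothetical illustrations:

```python
from datetime import date

# Hypothetical issue records: owner plus opened/closed dates.
issues = [
    {"id": 101, "owner": "john.black", "opened": date(2009, 1, 10), "closed": date(2009, 2, 1)},
    {"id": 110, "owner": "john.black", "opened": date(2009, 3, 5),  "closed": date(2009, 4, 2)},
    {"id": 120, "owner": "john.black", "opened": date(2009, 5, 5),  "closed": date(2009, 6, 6)},
]

def dev_fix_count(issue, all_issues):
    """Number of issues the owner had already closed before this issue opened."""
    return sum(1 for other in all_issues
               if other["owner"] == issue["owner"]
               and other["closed"] is not None
               and other["closed"] < issue["opened"])

print(dev_fix_count(issues[2], issues))  # issues 101 and 110 predate May 5, 2009
```

Because the count is taken relative to each issue's opening date, the same developer can have different dev fix count values on different issues, exactly as the text describes.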
Rank in terms of # Fixed Issues (dev rank fix count): This metric is calculated using the fixed issue count and the ranks of the developers who owned and fixed issues. For each issue's opening date, # fixed issues and developer ranks are re-calculated. For example, suppose john.black@XXX.com fixed issue ID #120, which was opened on May 5th, 2009 and closed on June 6th, 2009, and he also fixed issue ID #133, which was opened on July 5th, 2009 and closed on October 6th, 2009. In this case, we computed the previously fixed issues for john.black twice, both for May 2009 (k) and for July 2009 (k+1, by adding issue #120). The rank of this developer is also calculated based on other developers' issue fix performance in May 2009 and July 2009. So, john.black can be at the first rank in May 2009 with k fixed issues, but at the third rank in July 2009 with (k+1) fixed issues.

Duration between the First and Last Fix (dev first last fix): This metric computes the number of days from the first fix of the developer to the current issue's fix date. To test H2, we computed this metric for all developers associated with issues in our database. The reason for choosing this metric is as follows: Reopened issues are critical in the sense that they require thorough knowledge of the source code, and developers who fix them also need an active development history to avoid forgetting possible bottlenecks in the source code. If the duration between the first and last fix of a developer is long, the developer may have a harder time understanding the main reasons for the issue in an updated software system, and this may cause issues to be reopened.

Total # of Edits (dev total edits): To test H3, we extracted the total # of edits a developer has made on a method (i.e. function) from development commit logs and associated them with issues.
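The date-relative ranking described for dev rank fix count can be sketched as follows. The developer names and counts are hypothetical, and the paper does not specify tie-breaking, so this sketch assumes competition ranking (tied counts share a rank):

```python
# Hypothetical per-developer fix counts as of a given issue's opening date.
fix_counts = {"john.black": 11, "jane.doe": 27, "ali.veli": 11, "wei.chen": 4}

def rank_of(dev, counts):
    """1-based rank by fix count; more fixes means a better (lower) rank.
    Developers with equal counts share the same rank (competition ranking)."""
    return 1 + sum(1 for c in counts.values() if c > counts[dev])

print(rank_of("jane.doe", fix_counts))    # most fixes, so rank 1
print(rank_of("john.black", fix_counts))  # tied with ali.veli
```

Recomputing `fix_counts` at each issue's opening date, as the text requires, would reproduce the example where john.black is rank 1 in May and rank 3 in July.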
The more edits a developer has made on software methods, the more likely the developer is well informed about the software system and the more likely he/she is an active developer.

# Unique Methods Edited (dev methods edited): To test H3, we have also considered the unique number of methods edited for fixing an issue. The total # of edits is not enough to evaluate whether the developers of reopened issues have strong code ownership. If a developer edits the majority of methods in the software, it may indicate his/her ownership of the code. Having strong ownership may also prevent issue reopenings, since it suggests that the issue owner has extensive knowledge of the source code as well as of potential problems.

Rank in terms of # Methods Edited (dev rank methods count): This metric is calculated based on the unique methods edited, and the ranks of the developers who owned and fixed an issue are computed. For each issue, its owner's rank in terms of the number of methods he/she has edited so far is computed and added as a new metric. The computation of these ranks is similar to the rank in terms of # fixed issues. This metric also completes the general definition of code ownership by capturing both the number of methods edited by a developer and what percentage of edits are done by this developer among all developers (i.e., the developer's rank).

2.2.2 Issue-Code Relation

Shihab et al. [19] considered the fact that reopened bugs may be harder to fix than others because they require many files, or more complex files, to be changed. We have also considered the complexity of reopened issues in terms of the software methods changed during their fixes and define two hypotheses:

H4: Issues related with many methods are more likely to be reopened.
H5: Issues related with larger (in terms of lines of code) and more complex methods are more likely to be reopened.
# Methods Changed (methods changed): This metric is calculated by counting the number of methods changed for fixing an issue, by mining commit messages from version control systems and matching each commit with an issue. It is used to test H4.

LOC: The number of methods changed during a fix may not be enough to represent the complexity of an issue. For example, an issue may require changes to 3 methods, but each method may be greater than 100 lines of code (LOC), and hence its fix may be harder than other fixes. Therefore, we have also extracted a size indicator (in terms of lines of code) for the methods changed for fixing an issue. If more than one method is changed during a fix, we aggregated their lines of code by taking the maximum and sum values over all methods.

Cyclomatic complexity (CC): As an extension to the LOC measure, we have used McCabe's cyclomatic complexity of a method changed for fixing an issue. If more than one method is changed, we aggregated their cyclomatic complexity values by taking the maximum and sum values over all methods. This metric, as well as LOC, is used to test H5.

2.2.3 Issue Proximity Network

The issue proximity network models the relation between issues. It measures the distance between issues in terms of the number of common methods changed during their fixes. If an issue is connected with many other issues in terms of the number of common methods changed during a fix, it may increase the probability of this issue being reopened. The reason can be explained as follows: Reopened issues may have close connections with many other issues and therefore reside at the center of this proximity network. Being in the core part also indicates that reopened issues may affect many methods in the source code, which increases the risk of failures afterwards. We define our hypothesis for measuring this dimension as follows:

H6: Issues linked with many other issues are more likely to be reopened.
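The max/sum aggregation used for the LOC and CC factors can be illustrated directly; the per-method values below are hypothetical:

```python
# Aggregate method-level size/complexity metrics into issue-level factors,
# taking both the maximum and the sum over all methods changed by the fix.
def aggregate(values):
    return {"max": max(values), "sum": sum(values)}

# Hypothetical fix touching three methods.
loc_per_method = [120, 45, 300]
cc_per_method = [14, 3, 22]

print(aggregate(loc_per_method))  # {'max': 300, 'sum': 465}
print(aggregate(cc_per_method))   # {'max': 22, 'sum': 39}
```

Keeping both aggregates matters: the sum captures the total amount of code touched, while the maximum flags fixes that involve at least one very large or very complex method.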
The metrics are extracted from the issue proximity network; all of them were used in previous studies [18] to measure caller-callee relations between software modules and their effects on defect proneness. In this paper, we use four network metrics to quantify the complexity of issues and how complexity (in terms of methods changed) is related to issue reopening.

Degree: This metric is computed by counting the number of direct relations (edges) an issue has. Having a higher degree means that an issue is connected to many other issues, such as a hub in traffic networks.

Degree Centrality: In our previous studies, we extracted both in-degree and out-degree centrality metrics [8, 18]. However, in this paper, the proximity network is undirected, with edge weights set to the number of methods shared by two issues. Therefore, this metric is calculated as the "degree" of an issue over all issues (degree/N, where N is the number of issues).

Betweenness Centrality: This metric is calculated by counting the number of shortest paths that contain issue X over all shortest paths between all issue pairs i, j. It evaluates the location of an issue: being in a popular location may be very critical, since such an issue is associated with many other issues and affects many methods in the source code.

Pagerank: This metric measures the relative importance of an issue. It also evaluates centrality by considering the fact that the effect of being related to a central issue should be more important than being related to a peripheral issue.

2.2.4 From Issue Reports

From issue reports, we have extracted 2 categorical metrics, namely Symptom and Phase found. Our objective is to observe whether reopened issues have unique symptoms or whether they are more likely to be reported during a specific phase.
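Building the proximity network and the degree-based metrics can be sketched in plain Python. The issue ids and method sets are hypothetical; betweenness centrality and PageRank would in practice come from a graph library such as networkx rather than being hand-rolled:

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical mapping from issue id to the methods changed by its fix.
methods_changed = {
    "A": {"m1", "m2", "m3"},
    "B": {"m2", "m4"},
    "C": {"m3", "m5"},
    "D": {"m6"},
}

# Two issues are linked if their fixes share at least one method;
# the edge weight is the number of shared methods.
edges = {}
for u, v in combinations(methods_changed, 2):
    shared = methods_changed[u] & methods_changed[v]
    if shared:
        edges[(u, v)] = len(shared)

degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree centrality as defined in the text: degree / N over all N issues.
n = len(methods_changed)
degree_centrality = {i: degree[i] / n for i in methods_changed}
print(degree_centrality)
```

Here issue A is linked to both B (shared m2) and C (shared m3), so it sits closer to the center of the network than the isolated issue D.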
Same Location (same loc): We have also extracted the geographical locations of the owner and the originator of issues based on the domains of their email addresses, and defined a boolean metric, Same location, to observe the effect of communication across different locations on issue reopenings. Our hypothesis to test this relation is as follows:

H7: Issues whose owner and originator are from different locations are more likely to be reopened.

In a study by Herbsleb and Mockus [14], it was found that issues reported (and fixed) in distributed teams have a higher resolution time than issues reported and fixed in the same location. Zimmermann et al. also investigated the effects of location differences between assigners and assignees on reopened bugs, and found that bugs initially assigned across teams, buildings or countries are more likely to be reopened [25]. Thus, we have defined Same location and assigned 1 if both the originator and the owner of an issue were located in the same country, and 0 otherwise. Based on the data, issues were reported from 12 distinct geographical locations. Only 20% of issues had their owners and originators in different locations.

Fix Days: Reopened issues may have a long life cycle from their opened to closed dates, since they were assigned to the same states (opened/assigned) more than once. We have defined a new metric, namely Fix days, to measure the number of days between an issue's opened and closed dates. The hypothesis to test the effect of this metric is defined as follows:

H8: Issues which take a long time (in days) to fix are more likely to be reopened.

Figure 2: Correlations of the measures. The shape of an ellipse represents the correlation between two variables. Bolder colors indicate higher correlations. If the shape of an ellipse bends towards the right, it indicates positive correlation, whereas it indicates negative correlation if its shape bends towards the left.
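The two report-level metrics above can be sketched as follows; the email domains and the domain-to-country lookup are hypothetical illustrations of the approach, not the study's actual mapping:

```python
from datetime import date

def same_location(owner_email, originator_email, country_of_domain):
    """1 if owner and originator resolve to the same country, else 0.
    country_of_domain is a lookup from email domain to country."""
    owner = country_of_domain[owner_email.split("@")[1]]
    originator = country_of_domain[originator_email.split("@")[1]]
    return 1 if owner == originator else 0

def fix_days(opened, closed):
    """Number of days between an issue's opened and closed dates."""
    return (closed - opened).days

domains = {"ca.example.com": "Canada", "tr.example.com": "Turkey"}
print(same_location("dev@ca.example.com", "qa@tr.example.com", domains))
print(fix_days(date(2009, 5, 5), date(2009, 6, 6)))
```

Both metrics come straight from fields already present in the issue record, which is why they survive all of the dataset filtering steps.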
2.3 Analysis of the Factors

Basic descriptive statistics of the factors can be found in Table 1. The median values of the factors for reopened and not-reopened issues are significantly different (Mann-Whitney U test, p < 0.05) in 10 out of 19 cases. This shows that the distributions of reopened issues are shifted towards less/more activity in each of these 10 factors separately. For instance, reopened issues cause significantly more edits on the source code (8th factor in Table 1) compared to other issues.

The descriptive statistics of the factors related to the issue-code relation dimension are particularly interesting. In the extreme cases, an issue fix can change up to 460 methods or files with up to 92 kLOC. We believe that such a pattern highlights the relative complexity of the addressed issues or the architectural complexity of the software: a fix in a complex software system will likely involve changes in many interdependent modules. The project's issue fix count distribution is similar to the Pareto-law trends observed in the developer activity distribution in open source projects [15].

Spearman rank correlation coefficients among the factors we considered are visualised in Figure 2. In the correlation visualisation, bolder colors indicate higher correlations. If the shape of an ellipse bends towards the right, it indicates positive correlation, whereas it indicates negative correlation if its shape bends towards the left. From the figure, it can be observed that there are relatively higher correlations among factors within the same dimension, and relatively lower correlations between factors from different dimensions. The high correlations are especially apparent among the factors from the issue-code relation (LOC, CC) and issue proximity (degree, betweenness, pagerank) dimensions.

2.4 Logistic Regression Models

Univariate Logistic Regression

Logistic regression is the standard way to model binary outcomes y_i = 0, 1 [1, 10] and is therefore suitable for our problem.
Logistic regression has frequently been used in classification problems such as defect prediction in the software engineering literature [22, 24]. The basic probability model of logistic regression can be stated as follows:

    Pr(y_i = 1) = logit^{-1}(X_i β)    (1)

Pr(y_i = 1) is the probability of the outcome y_i = 1, X_i is the vector of independent parameters for the instance, and β is the vector of regression coefficients.

One advantage of logistic regression in binary classification, when compared to methods like Naive Bayes, is that its regression coefficients and other parameters (odds ratios) are easily interpretable and highly explicative. Assuming the logistic regression model is true, one can check the significance of the regression coefficients of the different input variables to understand their explanatory power. In addition, the relation between the odds (y_i = 1 / y_i = 0) and the individual factors can be analysed with the following formula:

    Pr(y_i = 1) / Pr(y_i = 0) = e^{X_i β}    (2)

From this formula, the effect of various factors on the probability of a certain outcome y_i can be understood by analysing the effects of their changes.

Exhaustive Search on All Factor Combinations

In order to test the performance of logistic regression with all possible factors (N), 2^N models should be considered. For large data, various approaches are used to reduce the number of considered combinations. However, we were able to test all possible combinations of models (2^19 = 524,288 in total) in a couple of hours, since our dataset was relatively small.

In order to find the "best" model, a performance criterion should be optimized. In the literature, likelihood-based information criteria such as AIC (Akaike information criterion) or BIC (Bayesian information criterion) are often used to maximize the likelihood function while penalizing overfitting [17]. Instead of a maximum-likelihood-based performance measure, we use the area under the receiver operating characteristic curve (AUC) as the performance measure.
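Equations (1) and (2) and the size of the exhaustive search can be checked with a few lines of code; the coefficient vector and factor values are hypothetical:

```python
import math
from itertools import combinations

def inv_logit(z):
    """logit^{-1}(z) = 1 / (1 + e^{-z}): the probability that y_i = 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients beta and factor vector X_i (with intercept term).
beta = [-2.0, 0.8, 1.5]
x_i = [1.0, 0.5, 1.0]

z = sum(b * x for b, x in zip(beta, x_i))
p = inv_logit(z)
odds = p / (1 - p)  # by equation (2), this equals e^{X_i beta}

# Enumerating every subset of N = 19 factors (including the empty model)
# gives the paper's 2^19 = 524,288 candidate models.
n_models = sum(1 for k in range(20) for _ in combinations(range(19), k))
print(p, odds, n_models)
```

The identity `odds == e^z` is what makes each coefficient directly interpretable: increasing a single factor by one unit multiplies the odds of reopening by e raised to that factor's coefficient.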
We believe that AUC represents the predictive performance of the model more directly than a likelihood-based measure. AUC is commonly used to compare the performance of classification models [16].

Multivariate Logistic Regression

As a second model, we built a logistic regression model using the set of factors with the best predictive power. We assessed the predictive power of this model by drawing its ROC curve and report the factors included in the model.

3. RESULTS OF LOGISTIC REGRESSION MODELS

3.1 Univariate Statistical Model

In order to check the importance of the factors individually, we checked the significance of univariate logistic regression models [7]. In Table 2, the factors with significant coefficients are presented. Five out of 19 factors were found to have significant coefficients: one is related to the issue report (same location), three are related to developer activity (dev first last fix, dev total edits, dev methods edited) and one is related to the issue proximity network (betweenness centrality). We have interpreted our hypotheses by building a univariate regression model to predict reopened issues with each of the 19 factors respectively. The regression coefficients and their significance levels reported in Table 2 are used to validate whether a factor is significant for predicting reopened issues. In summary, the hypotheses related to developer activity in terms of frequent issue fixes and methods edited (H2, H3), the issue proximity network (H6) and the geographical locations of issue owner and originator (H7) are validated; these findings are summarized below in bold and italic. The other hypotheses could not be validated with univariate analysis, but we have also checked their significance using multivariate analysis of the factors in later sections.

Developer activity: We have found that reopened issues have a significantly negative relation with developers who have not fixed an issue for a long time (coefficient: -2.6947, p < 0.05).
Furthermore, reopened issues have a positive relation with developers who edit relatively more methods (coefficients: 0.0013, 0.0019, p < 0.05). These results validate our second and third hypotheses (H2, H3). However, we could not validate H1, which posits a relationship between developer activity in terms of previously fixed issues and issue reopening.

Reopened issues are fixed by developers who actively fix issues and edit many methods.

Issue proximity network: Regarding issue relations in terms of shared methods, we have validated H6 with the highest coefficient (35.89) on betweenness centrality.

Reopened issues are linked with other issues in terms of the methods edited during issue fixes.

Issue report: Issues whose owner and originator are located in the same location have a significantly negative impact on issue reopening (coefficient: -1.1320, p < 0.05). This also validates H7, since being in different geographical locations may lengthen the communication process and cause issue reopening. However, we could not validate the relationship between fix days and reopened issues (H8).

Reopened issues are often reported and fixed by people from different geographical locations.

Table 1: Descriptive Statistics of the Considered Factors. The first values in the cells are for all issues.
Values in parantheses are: αfor not reopened issues, βfor reopened issues.: Factor has signiﬁcantly diﬀerent medians for reopened and not reopened issues Factor Max 75 % Median 25% Minimum Mean Issue Report Symptom - - - - - - Phase Found - - - - - - Same Location - - - - - - Fix Days 3745(3745α 981β) 240(241α 217β) 119(121α 81β) 39(40α30β) 0(0α, 4β) 169(168α, 176β) Developer Activity # Fixed Issues 79(79α58β) 22(21.25α 27.5β) 11 (11α13β) 4(4α4β) 1(1α, 1β) 15.5 (15.38α 18.29β) Rank in # Fixed Is- sues 125(119α 125β) 31(31α 33.25β) 14(14α 14.5β) 5(6α4β) 1(1α, 1β) 21.49(21.31α, 24.45β) Duration Between First and Last Fix in Days 3786(3786α 833β) 478(475.25α 559β) 282.5(279α 373β) 124(124α 130.75β) 0(0α, 0β) 388(391α, 345β) Total # Edits 1347(1347α 1167β) 180(177.25α 241β) 80(80α115β) 24(24α 19.5β) 0(0α, 0β) 149(144α, 234β) # Unique Methods Edited 864(864α 794β) 113(110α 168.75β) 56(55α68β) 20(20α 17.5β) 0(0α, 0β) 99(95α, 161β) Rank in # Methods Edited 129(129α 115β) 34(34α 28.5β) 14(15α 10.5β) 4(4.75α1β) 0(0α, 0β) 21(22α, 20β) Issue-Code Relation Max. CC 2103(2087α 2103β) 275(309.3α 131.75β) 91(92α 75.5β) 37(37α 35.25β) 1(1α, 4β) 303.3(307.2α, 240.9β) Sum CC 2103(2087α 2103β) 275(309.3α 131.75β) 91(92α 75.5β) 37(37α 35.25β) 1(1α, 4β) 303.3(307.2α, 240.9β) Max. 
LOC 22644(22644α 22567β) 3077(3643α 1609β) 1152(1170α 863.5β) 468(466α 473β) 28(28α, 146β) 3458(3508α, 2680β) Sum LOC 92931(92931α 67509β) 7543(7730α 4588β) 2014(2023α 1586β) 646(645α 721β) 28(28α, 146β) 6076(6103α, 5648β) # Methods Changed 460(460α86β) 9(9α10β) 3(3α5β) 1(1α1.25β) 1(1α, 1β) 11.26(11.25α, 11.42β) Issue Proximity Network Degree 173(115α 173β) 31(31α26β) 10(10α11β) 3(3α4β) 0(0α, 0β) 20.32(20.02α, 25.13β) Betweenness Cen- trality 0.22(0.08α 0.22β) 0.0009(0.0009α 0.002β) 0.0002(0.0002α 0.0006β) 1.28e07(0α 1.23e05 β) 0(0α, 0β) 0.002(0.002α, 0.008β) Pagerank 0.007(0.006α 0.007β) 0.001(0.001α 0.002β) 0.0008(0.0008α 0.0008β) 0.0004(0.0004α 0.0005β) 0(0α, 0β) 0.0001(0.0001α, 0.001β) Degree Centrality 0.18(0.12α 0.18β) 0.03(0.03α 0.03β) 0.01(0.01α 0.01β) 0.003(0.003α 0.004β) 0(0α, 0β) 0.02(0.02α, 0.02β) 6 Table 2: Coeﬃcients for Univariate Regression Dimension Factor Coeﬃcient Standard Deviation Z Value P r(>|z|) Signiﬁcance Issue Report Symptom - - - 2 Phase Found - - - 2 Same Location -1.13200 0.27000 -4.160 3.14e-05 FFF Fix Days 0.00018 0.00061 0.296 0.7700 Developer Activity Dev ﬁx count 0.01100 0.00770 1.460 0.1500 Dev rank ﬁx count 0.00640 0.00560 1.130 0.2600 Dev ﬁrst last ﬁx -2.69000 0.17000 -16.222 <2e-16 FFF Dev total edits 0.00130 0.00043 3.030 0.0024 FF Dev methods edited 0.00190 0.00060 3.200 0.0014 FF Dev rank methods count -0.00250 0.00600 -0.420 0.6700 Issue-Code Relation Max. CC -0.00031 0.00031 -0.995 0.3200 Sum CC -7.08e-05 1.69e-04 -0.420 0.6800 Max. LOC -3.30e-05 2.89e-05 -1.140 0.2500 Sum LOC -4.96e-06 1.42e-05 -0.350 0.7300 Methods changed 0.00022 0.00460 0.047 0.9600 Issue Proximity Network Degree centrality 7.45000 4.63000 1.610 0.1100 Degree 0.00760 0.00469 1.609 0.1080 Betweenness centrality 35.89000 12.01700 2.990 0.0028 FF Pagerank 21.16000 11.80000 2.130 0.2200 Issue-code relations: Univariate analyses do not show a signiﬁcant relation between code metrics and reopened is- sues. 
Hence, we could not validate H4 and H5, but we did observe predictive power of code metrics in the multivariate regression models.

3.2 The Best Factor Combinations

We ranked all possible combinations of the factors (2^19 possible models) for predicting reopened issues based on our performance measure, AUC, and counted the occurrence of each factor in the best 100 models. The AUC difference between the best- and worst-performing of these 100 models was 0.05, and all of them consist of 8 to 12 factors. Table 3 lists the number of occurrences of each factor with #Occur > 20. Out of 19 factors, 4 occurred in all top-performing models: betweenness centrality, maximum cyclomatic complexity, sum of cyclomatic complexity and maximum LOC. Furthermore, 3 factors occurred in more than 90% of the top-performing models: fix count, fix count rank and sum of LOC. Compared with the significance of factors in the univariate regression analysis, code-based measures performed surprisingly well in the top models. Even though we could not validate hypotheses H4 and H5 during univariate analysis, the cyclomatic complexity and LOC measures of methods related to reopened issues contribute significantly to the predictive model. Betweenness, on the other hand, is significant both in the best models and in the univariate regression model.

3.3 Multivariate Statistical Model

Figure 3 presents the AUC of the model that performed best in predicting reopened issues during our exhaustive search. In addition to an AUC of 0.81, the top model had 0.88 recall and 0.82 precision. If this were a prediction scenario, we could conclude that the model has significant potential. However, some of the factors, such as those related to the issue–code relation, have limited applicability in a prediction scenario because they are only available post-mortem.
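The exhaustive best-subset search can be sketched as follows. A toy scoring function stands in for the AUC of a model fitted on each subset, and the factor "signal" values are illustrative, not the study's:

```python
from itertools import combinations
from collections import Counter

# Hypothetical per-factor signal strengths; in the paper each subset
# is scored by the AUC of a logistic regression fit on that subset.
factors = {"betweenness": 0.30, "max_cc": 0.25, "sum_loc": 0.20,
           "fix_count": 0.15, "degree": 0.05, "pagerank": 0.02}

def score(subset):
    """Toy stand-in for AUC: combined signal with diminishing returns."""
    return sum(factors[f] for f in subset) / (1 + 0.1 * len(subset))

# Enumerate every non-empty factor combination and rank by score.
names = list(factors)
models = []
for r in range(1, len(names) + 1):
    for combo in combinations(names, r):
        models.append((score(combo), combo))
models.sort(reverse=True)

# Count how often each factor appears in the top-ranked subsets.
top = models[:10]
counts = Counter(f for _, combo in top for f in combo)
print(counts.most_common(3))
```

Counting which factors recur in the top-ranked subsets is exactly how Table 3 summarizes the 100 best models; with 19 real factors the enumeration covers 2^19 subsets instead of the 63 shown here.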
Table 3: Number of occurrences of factors in the top-performing model combinations

| Factor Name | Occurrence in 100 Best Models (%) |
| --- | --- |
| Same Location | 100 |
| Betweenness Centrality | 100 |
| Maximum Cyclomatic Complexity | 100 |
| Sum of Cyclomatic Complexity | 100 |
| Maximum LOC | 100 |
| # Fixed Issues | 94 |
| Sum of LOC | 94 |
| Rank in Fixed Issue Count | 91 |
| Unique Methods Changed | 59 |
| Total # Edits | 44 |
| Degree | 36 |
| Degree Centrality | 24 |
| Duration Between First and Last Fix | 24 |

The factors in Table 3 with the highest number of occurrences in the top 100 models were all present in the top model. The model with the highest AUC contains the following factors: Same Location, Betweenness Centrality, Maximum Cyclomatic Complexity, Sum of Cyclomatic Complexity, Maximum LOC, # Fixed Issues, Sum of LOC, Rank in Fixed Issue Count, Unique Methods Changed, Degree, Duration Between First and Last Fix, Total # Edits.

4. DISCUSSION

4.1 Interpretation of Results with QA Team

In order to interpret our findings, we held a meeting with the QA Team in the company and asked free-format questions about our analysis. The responses are summarized below.

Betweenness centrality: Why do you think reopened issues are often located at the centre of the issue proximity network?

Reopened issues are generally the ones that developers postpone fixing or cancel, since a) they may decide the issue is not very critical for the customer, and b) fixing it may be risky or complex, so they may want to delay the fix.

Figure 3: ROC curve of the multivariate regression using the 12 factors that performed best in model testing. AUC: 0.81 (recall: 0.88, precision: 0.82).

The fact that reopened issues are at the centre of this network supports our anecdotal evidence of these issues' complexity.

Rank of developers in terms of method changes: Do you think that developers who edit many methods should have a positive or negative impact on reopened issues?
Usually, the amount of code changed by a developer is positively correlated with the number of issues fixed by that developer. These top "code-altering" developers may decide to postpone or cancel some of the issues assigned to them until they reduce their workload, since fixing a new issue may also introduce other issues due to the large amount of change involved.

4.2 Threats to Validity

In this section, we discuss possible threats to the validity of our study. We conducted a case study on a large-scale enterprise product. Even though drawing general conclusions from a single empirical study is very difficult, our results should be transferable to other researchers through well-designed and controlled experiments. In this study, we propose a set of metrics representing four main dimensions and investigate their effects on reopened issues. Our methodology consists of investigating their individual effects on reopened issues, as well as pairwise metric relations, to find the best set of metrics for predicting reopened issues. Using the same methodology, our results can be replicated or refuted on new datasets.

In this case study, we were able to extract 2287 resolved issues, of which 1046 were matched with the code base and developers. At first glance this seems to be a small set; however, we traced all issues submitted since 2004 and filtered them based on our requirements. While reducing the dataset, we also checked that the ratio of reopened issues to other issues remained similar to that of the initial dataset: when we linked issues with developer activities this ratio was 7%, whereas in the final set it was 6%, a minor change. We also worked closely with the QA team in the company to validate data quality.

We use logistic regression for both the univariate and multivariate analyses, since coefficients in logistic regression are easily interpretable together with their significance levels.
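The sanity check on the reopened ratio during dataset reduction can be sketched as below. The issue records are synthetic stand-ins, since the actual dataset is proprietary:

```python
# Synthetic records standing in for the 2287 resolved issues:
# every 15th issue is "reopened", every 2nd is linkable to code changes.
issues = [{"id": i, "reopened": i % 15 == 0, "linked": i % 2 == 0}
          for i in range(2287)]

def reopened_ratio(sample):
    """Fraction of reopened issues in a collection of issue records."""
    return sum(rec["reopened"] for rec in sample) / len(sample)

# Filter as in the study (keep only issues matched with the code base),
# then verify the class balance is roughly preserved by the filter.
linked = [rec for rec in issues if rec["linked"]]
before, after = reopened_ratio(issues), reopened_ratio(linked)
print(round(before, 3), round(after, 3), abs(before - after) < 0.02)
```

A filter that changed this ratio substantially would bias any model trained on the reduced set toward or against the reopened class.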
We did not consider using other algorithms and comparing their performance with regression, since our aim was to understand the explanatory power of the metrics on reopened issues rather than to build the best predictive model for reopened issues. We did, however, consider all factor combinations to select the best metric set during our experiments.

5. RELATED WORK

Issue management systems of large open source software projects have been available on the internet since the early 1990s. The issue management systems of some commercially licensed applications are also publicly accessible, since the owners of these applications would like to make their issue handling process transparent to their users. Research on issue repository data started in parallel with the public availability of past issue management data.

Two recent papers are closely related to our research. Shihab et al. [19] analysed work-habit, bug-report and bug-fix dimensions for the Eclipse project to find the factors that contributed to bug reopening, and built a reopened-bug prediction model using decision trees. They found that the comment text, description text, time to resolve the bug and the component in which the bug was found were the leading factors behind bug reopening in Eclipse [19]. Zimmermann et al. [25] analysed the Windows Vista and Windows 7 issue repositories and surveyed 394 developers to identify the important factors behind bug reopens [25]. They built a logistic regression model to identify the factors that may cause issue reopening, using organizational and process-related factors in addition to factors directly extractable from the issue report. In their logistic regression model nearly all the observed factors, including those related to location, work habits and bug report characteristics, were found to be significant.

Our work differs from the research by Shihab et al.
and Zimmermann et al. in two respects: 1) the factors and the dataset we analysed are different; 2) we analysed the effect of combinations of various factors, in addition to individual factors, on issue reopening.

Other notable recent areas of research include automated issue triage, factors that affect the quality of issue reports, detection of duplicate issues and estimation of issue fix durations [3, 6, 21].

One important related research topic is automated bug triage: processing an issue report and assigning the issue to the right developer. This problem is especially important for large software systems with millions of users. Based on an interview with an anonymous developer, Anvik et al. found that 300 daily reported issues make it impossible for Mozilla developers to triage issues effectively [3, 12, 21].

Text mining methods have been used in several studies to find the most relevant developer to handle a bug in automated bug triage models [2–4, 9, 20]. Bakir et al. proposed a model that forwards auto-generated software fault data directly to the relevant developers by mining patterns in the faults [5]. The benefit of automated bug triage is often measured by the percentage of actual owners estimated by the model and by the decrease in issue reassignments, or bug tosses. While bug triage studies claim that bug tossing is time-consuming [20], Guo et al. observed that issue reassignment is beneficial for communication [13].

Effective issue reporting is as important for reporters as for developers. In an exploratory study on Windows Vista, Guo et al. [12] identified the characteristics of bugs that get fixed. Bettenburg et al. [6] also analysed the components of a bug report (severity, stack traces, builds, screenshots) that make a bug more likely to be resolved. Estimation of issue resolution times is another research area aimed at planning developers' efforts efficiently; studies on open source projects can be found in [11, 21].

6.
CONCLUSIONS

In this paper, we analysed the effects of 19 factors from four different dimensions on the probability of issue reopening for a large-scale software product developed in geographically distributed locations. We found that a subset of these factors is important for issue reopening, and the predictive power of the best factor combinations is high, with an AUC of 0.81 in the best-performing models.

RQ: Which factors lead to issues getting reopened?

To find the factors that were most important for issue reopening, we built univariate and best-subset logistic regression models and checked the importance of the considered factors in each. Developer activity (in terms of the time between first and last issue fixes and the number of methods edited during issue fixes), the issue proximity network (in terms of common methods changed during issue fixes) and the geographical locations of issue owners and originators were found to be important for issue reopening. In the top-ranking logistic regression models, factors from all dimensions were prominent.

In previous research on this topic, nearly all of the considered factors were found to be significant [19], [25]. In contrast, our analysis shows that a subset of the considered factors is significantly more important for issue reopening. The best logistic regression model in terms of predictive power contains 12 of the 19 factors (see Section 3.3 for the full list).

Implications of the Results for Industry

Issue reopening can lead to unanticipated resource allocation, causing projects to run over budget and late. Therefore, it is important to proactively identify issues that may be reopened and take corrective actions. The quantitative findings of our study suggest that issue complexity and developers' workload play an important role in triggering issue reopening.
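The centrality finding can be made concrete with a toy issue proximity network. The issue IDs and edges below are hypothetical; an edge joins two issues whose fixes changed at least one common method, and betweenness is computed unnormalized by brute-force shortest-path enumeration:

```python
from collections import defaultdict, deque
from itertools import combinations

# Toy issue proximity network (hypothetical issue IDs).
edges = [("I-1", "I-3"), ("I-2", "I-3"), ("I-3", "I-4"), ("I-4", "I-5")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
nodes = sorted(adj)

def shortest_paths(s, t):
    """Enumerate all shortest s-t paths with a breadth-first search."""
    paths, best = [], None
    queue = deque([[s]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # BFS pops paths in nondecreasing length
        node = path[-1]
        if node == t:
            best = len(path)
            paths.append(path)
            continue
        for nxt in adj[node]:
            if nxt not in path:
                queue.append(path + [nxt])
    return paths

# Unnormalized betweenness: for each node, sum over all other pairs the
# fraction of shortest paths between the pair that pass through it.
betweenness = dict.fromkeys(nodes, 0.0)
for s, t in combinations(nodes, 2):
    paths = shortest_paths(s, t)
    for v in nodes:
        if v not in (s, t):
            betweenness[v] += sum(v in p for p in paths) / len(paths)

central = max(betweenness, key=betweenness.get)
print(central, betweenness[central])  # the hub issue sits on most paths
```

In this sketch the hub issue I-3 dominates, mirroring the finding that issues central to the proximity network are the ones most likely to be reopened.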
This information can aid managers in deriving concrete corrective actions (e.g., ensuring deep code reviews and reducing developers' workload).

As indicated in our results, issue reopening may have many causes and may not be modelled by a small set of factors. In addition, some of the causes of issue reopening, such as design problems, may be outside the scope of the issue management process. Identifying the important factors that may lead to issue reopening can be a first step towards companies understanding these underlying causes and taking the necessary actions.

Future Work

Every model is a simplification of reality and has its limitations. We attempted to model the three aspects of software development (people, process, product) when choosing the factors; new factors related to these aspects can be proposed in future studies. As usual for a case study, one possible line of future work is testing our conclusions on new datasets. Another is analysing the causal relations between the considered factors and the probability of issue reopening.

APPENDIX

Table 4 presents the majority of phases in which issues were found and reported in the company, together with the percentage of issue reports in each. We list the 10 phases that account for 85% of all issue reports, and group the remaining phases as "Others" for privacy reasons.

Table 4: Phases in which issues were found and reported

| Phase | Issues reported during this phase (%) |
| --- | --- |
| Customer | 25.5 |
| Functional testing | 19.9 |
| Regression testing | 8.5 |
| Coding | 7.4 |
| System testing | 6.8 |
| Nightly build | 5.0 |
| Performance testing | 3.9 |
| Design | 3.6 |
| Unit testing | 2.7 |
| Beta testing | 1.3 |
| Others | 15.5 |

Acknowledgment

This research is supported in part by the Turkish State Planning Organization (DPT) under project number 2007K120610, and partially supported by TEKES under the Cloud-SW project in Finland.
We would like to thank the IBM Canada Lab – Toronto site for making their development data available for research, and for strategic help during all phases of this research. The opinions expressed in this paper are those of the authors and not necessarily of IBM Corporation.

7. REFERENCES

[1] E. Alpaydin. Introduction to Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2004.
[2] J. Anvik, L. Hiew, and G. C. Murphy. Who should fix this bug? In Proceedings of the International Conference on Software Engineering, pages 361–370, Shanghai, China, 2006.
[3] J. Anvik and G. Murphy. Determining implementation expertise from bug reports. In Fourth International Workshop on Mining Software Repositories, ICSE Workshops MSR '07, pages 1–8. IEEE, 2007.
[4] J. Anvik and G. C. Murphy. Reducing the effort of bug report triage. ACM Transactions on Software Engineering and Methodology, 20(3):1–35, Aug. 2011.
[5] A. Bakir, E. Kocaguneli, A. Tosun, A. Bener, and B. Turhan. Xiruxe: An intelligent fault tracking tool. AIPR09, Orlando, 2009.
[6] N. Bettenburg and A. Hassan. Studying the impact of social structures on software quality. In 2010 IEEE 18th International Conference on Program Comprehension, pages 124–133. IEEE, 2010.
[7] L. Briand, W. Melo, and J. Wust. Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Transactions on Software Engineering, 28(7):706–720, 2002.
[8] B. Caglayan, A. Tosun, A. Miranskyy, A. Bener, and N. Ruffolo. Usage of multiple prediction models based on defect categories. In Proceedings of the 6th International Conference on Predictive Models in Software Engineering, pages 1–9. ACM, 2010.
[9] D. Cubranic and G. Murphy. Automatic bug triage using text categorization. In Proceedings of the Sixteenth International Conference on Software Engineering & Knowledge Engineering, pages 1–6. Citeseer, 2004.
[10] A. Gelman and J. Hill.
Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge University Press, 2007.
[11] E. Giger, M. Pinzger, and H. Gall. Predicting the fix time of bugs. In RSSE '10: Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering, pages 52–56, 2010.
[12] P. Guo, T. Zimmermann, N. Nagappan, and B. Murphy. Characterizing and predicting which bugs get fixed: An empirical study of Microsoft Windows. In 2010 ACM/IEEE 32nd International Conference on Software Engineering, volume 1, pages 495–504. IEEE, 2010.
[13] P. Guo, T. Zimmermann, N. Nagappan, and B. Murphy. Not my bug! and other reasons for software bug report reassignments. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, pages 395–404. ACM, 2011.
[14] J. Herbsleb and A. Mockus. An empirical study of speed and communication in globally distributed software development. IEEE Transactions on Software Engineering, 29(6):481–494, June 2003.
[15] S. Koch. Effort modeling and programmer participation in open source software projects. Information Economics and Policy, 20(4):345–355, Dec. 2008.
[16] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch. Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Transactions on Software Engineering, 34(4):485–496, 2008.
[17] C. de Mazancourt and V. Calcagno. glmulti: An R package for easy automated model selection with (generalized) linear models. Journal of Statistical Software, 34(i12), 2010.
[18] A. T. Misirli, B. Caglayan, A. V. Miranskyy, A. Bener, and N. Ruffolo. Different strokes for different folks: A case study on software metrics for different defect categories. In Proceedings of the 2nd International Workshop on Emerging Trends in Software Metrics, WETSoM '11, pages 45–51, New York, NY, USA, 2011. ACM.
[19] E. Shihab, A. Ihara, Y. Kamei, W. M. Ibrahim, M. Ohira, B. Adams, A. E. Hassan, and K.-i.
Matsumoto. Predicting re-opened bugs: A case study on the Eclipse project. In 2010 17th Working Conference on Reverse Engineering, pages 249–258, Oct. 2010.
[20] A. Tamrawi, T. Nguyen, and J. Al-Kofahi. Fuzzy set-based automatic bug triaging: NIER track. In Proceedings of the 33rd International Conference on Software Engineering, pages 884–887, 2011.
[21] C. Weiss, R. Premraj, T. Zimmermann, and A. Zeller. How long will it take to fix this bug? In Fourth International Workshop on Mining Software Repositories, ICSE Workshops MSR '07, 2007.
[22] E. J. Weyuker, T. J. Ostrand, and R. M. Bell. Using developer information as a factor for fault prediction. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering. IEEE Computer Society, May 2007.
[23] S. Zaman, B. Adams, and A. E. Hassan. Security versus performance bugs: A case study on Firefox. In Proceedings of the 8th Working Conference on Mining Software Repositories, pages 93–102, 2011.
[24] T. Zimmermann and N. Nagappan. Predicting defects with program dependencies. In 2009 3rd International Symposium on Empirical Software Engineering and Measurement, pages 435–438, Oct. 2009.
[25] T. Zimmermann, N. Nagappan, P. Guo, and B. Murphy. Characterizing and predicting which bugs get reopened. In Proceedings of the 34th International Conference on Software Engineering [ACCEPTED], 2012.
For example, a graphical representation of commits extracted from a version control system would reveal the distribution of workload among developers, i.e., what percentage of developers actively develop on a daily basis [53], or it would highlight which components of a software system are frequently changed. On the other hand, a statistical test between metrics characterizing issues that are previously fixed and stored in an issue repository may identify the reasons for re-opened bugs [60] or reveal the issue workload among software developers [61] . Depending on the questions that are investigated , we can collect various types of metrics from software repositories; but it is inappropriate to use any statistical technique or visualization approach without considering the data characteristics (e.g. ... ... We concluded that testers may report more bugs than the amount that developers fix before each release, and hence, as more bugs are reported, the number of production defects increase. In another case study with a large scale software development organization, we used Spearman correlation to analyse the relationship between metrics that characterize reopened issues (issues closed and opened again during an issue life cycle) [60]. We found strong statistical relationship between the lines of code changed to fix a reopened issue and the dependencies of a reopened issue to other issues, i.e., the higher the proximity of an issue to the others, the more lines of code is affected during its fix. ... Chapter In this chapter, we share our experience and views on software data analytics in practice with a retrospect to our previous work. Over ten years of joint research projects with the industry, we have encountered similar data analytics patterns in diverse organizations and in different problem cases. We discuss these patterns following a 'software analytics' framework: problem identification, data collection, descriptive statistics and decision making. 
We motivate the discussion by building our arguments and concepts around our experiences of the research process in six different industry research projects in four different organizations. ... Zimmermann et al. [32] investigate the reasons for bug reopening and find that bugs identified by code analysis tools or code review processes are less likely to be re-opened. Caglayan et al. [6] report that developers' activities are important factors that cause bugs to be re-opened. ... Conference Paper Full-text available Background: Bug fixing is one major activity in software maintenance to solve unexpected errors or crashes of software systems. However, a bug fix can also be incomplete and even introduce new bugs. In such cases, extra effort is needed to rework the bug fix. The reworking requires to inspect the problem again, and perform the code change and verification when necessary. Discussions throughout the bug fixing process are important to clarify the reported problem and reach a solution. Aims: In this paper, we explore how discussions during the initial bug fix period (i.e., before the bug reworking occurs) associate with future bug reworking. We focus on two types of "reworked bug fixes": 1) the initial bug fix made in a re-opened bug report; and 2) the initially submitted patch if multiple patches are submitted for a single bug report. Method: We perform a case study using five open source projects (i.e., Linux, Firefox, PDE, Ant and HTTP). The discussions are studied from six perspectives (i.e., duration, number of comments, dispersion, frequency, number of developers and experience of developers). Furthermore, we extract topics of discussions using Latent Dirichlet Allocation (LDA). Results: We find that the occurrence of bug reworking is associated with various perspectives of discussions. Moreover, discussions on some topics (e.g., code inspection and code testing) can decrease the frequency of bug reworking. 
Conclusions: The discussions during the initial bug fix period may serve as an early indicator of what bug fixes are more likely to be reworked. ... Zimmermann et al. [32] investigate the reasons for bug reopening and find that bugs identified by code analysis tools or code review processes are less likely to be re-opened. Caglayan et al. [6] report that developers' activities are important factors that cause bugs to be re-opened. ... Conference Paper Background: Bug fixing is one major activity in software maintenance to solve unexpected errors or crashes of software systems. However, a bug fix can also be incomplete and even introduce new bugs. In such cases, extra effort is needed to rework the bug fix. The reworking requires to inspect the problem again, and perform the code change and verification when necessary. Discussions throughout the bug fixing process are important to clarify the reported problem and reach a solution. Aims: In this paper, we explore how discussions during the initial bug fix period (i.e., before the bug reworking occurs) associate with future bug reworking. We focus on two types of reworked bug fixes: 1) the initial bug fix made in a re-opened bug report; and 2) the initially submitted patch if multiple patches are submitted for a single bug report. Method: We perform a case study using five open source projects (i.e., Linux, Firefox, PDE, Ant and HTTP). The discussions are studied from six perspectives (i.e., duration, number of comments, dispersion, frequency, number of developers and experience of developers). Furthermore, we extract topics of discussions using Latent Dirichlet Allocation (LDA). Results: We find that the occurrence of bug reworking is associated with various perspectives of discussions. Moreover, discussions on some topics (e.g., code inspection and code testing) can decrease the frequency of bug reworking. 
Conclusions: The discussions during the initial bug fix period may serve as an early indicator of what bug fixes are more likely to be reworked. ... Affects extracted from comments in addition to other important metrics, specifically can be used to investigating code review quality. @BULLET analyze social and technical debt in software develop- ment [18, 20] or bug life cycle [3]. @BULLET study the impact of affects regarding scheduling of developers . ... Conference Paper Full-text available Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and (recently) investigating developer affectiveness. For the latter, issue tracking systems can be mined to explore developers emotions, sentiments and politeness---affects for short. However, research on affect detection in software artefacts is still in its early stage due to the lack of manually validated data and tools. In this paper, we contribute to the research of affects on software artefacts by providing a labeling of emotions present on issue comments. We manually labeled 2,000 issue comments and 4,000 sentences written by developers with emotions such as love, joy, surprise, anger, sadness and fear. Labeled comments and sentences are linked to software artefacts reported in our previously published dataset (containing more than 1K projects, more than 700K issue reports and more than 2 million issue comments). The enriched dataset presented in this paper allows the investigation of the role of affects in software development. Article Full-text available Reopened bugs can degrade the overall quality of a software system since they require unnecessary rework by developers. Moreover, reopened bugs also lead to a loss of trust in the end-users regarding the quality of the software. Thus, predicting bugs that might be reopened could be extremely helpful for software developers to avoid rework. 
Prior studies on reopened bug prediction focus only on three open source projects (i.e., Apache, Eclipse, and OpenOffice) to generate insights. We observe that one out of the three projects (i.e., Apache) has a data leak issue – the bug status of reopened was included as training data to predict reopened bugs. In addition, prior studies used an outdated prediction model pipeline (i.e., with old techniques for constructing a prediction model) to predict reopened bugs. Therefore, we revisit the reopened bugs study on a large scale dataset consisting of 47 projects tracked by JIRA using the modern techniques such as SMOTE, permutation importance together with 7 different machine learning models. We study the reopened bugs using a mixed methods approach (i.e., both quantitative and qualitative study). We find that: 1) After using an updated reopened bug prediction model pipeline, only 34% projects give an acceptable performance with AUC ≥\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\geqslant \$\end{document} 0.7. 2) There are four major reasons for a bug getting reopened, that is, technical (i.e., patch/integration issues), documentation, human (i.e., due to incorrect bug assessment), and reasons not shown in the bug reports. 3) In projects with an acceptable AUC, 94% of the reopened bugs are due to patch issues (i.e., the usage of an incorrect patch) identified before bug reopening. Our study revisits reopened bugs and provides new insights into developer’s bug reopening activities.
Conference Paper
Full-text available
Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and recently investigating developers "affectiveness". In particular, the Jira Issue Tracking System is a proprietary tracking system that has gained a tremendous popularity in the last years and offers unique features like the project management system and the Jira agile kanban board. This paper presents a dataset extracted from the Jira ITS of four popular open source ecosystems (as well as the tools and infrastructure used for extraction) the Apache Software Foundation, Spring, JBoss and CodeHaus communities. Our dataset hosts more than 1K projects, containing more than 700K issue reports and more than 2 million issue comments. Using this data, we have been able to deeply study the communication process among developers, and how this aspect affects the development process. Furthermore, comments posted by developers contain not only technical information, but also valuable information about sentiments and emotions. Since sentiment analysis and human aspects in software engineering are gaining more and more importance in the last years, with this repository we would like to encourage further studies in this direction.
Article
Change Requests (CRs) are key elements to software maintenance and evolution. Finding the appropriate developer to a CR is crucial for obtaining the lowest, economically feasible, fixing time. Nevertheless, assigning CRs is a labor-intensive and time consuming task. In this paper, we report on a questionnaire-based survey with practitioners to understand the characteristics of CR assignment, and on a semi-automated approach for CR assignment which combines rule-based and machine learning techniques. In accordance with the results of the survey, the proposed approach emphasizes the use of contextual information, essential to effective assignments, and puts the development team in control of the assignment rules, toward making its adoption easier. The assignment rules can be either extracted from the assignment history or created from scratch. An empirical validation was performed through an offline experiment with CRs from a large software project. The results pointed out that the approach is up to 46,5% more accurate than other approaches which relying solely on machine learning techniques. This indicates that a rule-based approach is a viable and simple method to leverage CR assignments.
Article
The Art and Science of Analyzing Software Data provides valuable information on analysis techniques often used to derive insight from software data. This book shares best practices in the field generated by leading data scientists, collected from their experience training software engineering students and practitioners to master data science. The book covers topics such as the analysis of security data, code reviews, app stores, log files, and user telemetry, among others. It covers a wide variety of techniques such as co-change analysis, text analysis, topic analysis, and concept analysis, as well as advanced topics such as release planning and generation of source code comments. It includes stories from the trenches from expert data scientists illustrating how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions. Presents best practices, hints, and tips to analyze data and apply tools in data science projects Presents research methods and case studies that have emerged over the past few years to further understanding of software data Shares stories from the trenches of successful data science initiatives in industry.
Article
In this chapter, we share our experience and views on software data analytics in practice with a review of our previous work. In more than 10 years of joint research projects with industry, we have encountered similar data analytics patterns in diverse organizations and in different problem cases. We discuss these patterns following a "software analytics" framework: problem identification, data collection, descriptive statistics, and decision making. In the discussion, our arguments and concepts are built around our experiences of the research process in six different industry research projects in four different organizations. Methods: Spearman rank correlation, Pearson correlation, Kolmogorov-Smirnov test, chi-square goodness-of-fit test, t test, Mann-Whitney U test, Kruskal-Wallis analysis of variance, k-nearest neighbor, linear regression, logistic regression, naïve Bayes, neural networks, decision trees, ensembles, nearest-neighbor sampling, feature selection, normalization.
Article
Two important questions concerning the coordination of development effort are which bugs to fix first and how long it takes to fix them. In this paper we investigate empirically the relationships between bug report attributes and the time to fix. The objective is to compute prediction models that can be used to recommend whether a new bug should and will be fixed fast or will take more time for resolution. We examine in detail if attributes of a bug report can be used to build such a recommender system. We use decision tree analysis to compute and 10-fold cross validation to test prediction models. We explore prediction models in a series of empirical studies with bug report data of six systems of the three open source projects Eclipse, Mozilla, and Gnome. Results show that our models perform significantly better than random classification. For example, fast fixed Eclipse Platform bugs were classified correctly with a precision of 0.654 and a recall of 0.692. We also show that the inclusion of post-submission bug report data of up to one month can further improve prediction models.
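The evaluation protocol above (a tree-based classifier under 10-fold cross-validation, compared against a random baseline) can be sketched in miniature; the single-threshold "stump" below stands in for the full decision trees used in the study, and the feature and data are entirely synthetic:

```python
import random

random.seed(0)  # reproducible synthetic data and fold splits

def stump_accuracy(samples, folds=10):
    """10-fold cross-validation of a one-feature decision stump that
    predicts 'fast fix' when the feature falls below a learned threshold."""
    data = samples[:]
    random.shuffle(data)
    fold_size = len(data) // folds
    accuracies = []
    for i in range(folds):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        # pick the training-set threshold with the best training accuracy
        best_t = max((t for t, _ in train),
                     key=lambda t: sum((x < t) == y for x, y in train))
        accuracies.append(sum((x < best_t) == y for x, y in test) / len(test))
    return sum(accuracies) / folds

# Hypothetical feature (e.g. days until first triage) and fast-fix label
fast = [(random.uniform(0, 5), True) for _ in range(50)]
slow = [(random.uniform(4, 10), False) for _ in range(50)]
print(round(stump_accuracy(fast + slow), 2))  # well above the 0.5 random baseline
```

Held-out accuracy averaged over the 10 folds is the same "better than random classification" comparison the abstract reports.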
Conference Paper
Assigning a bug to the right developer is key to reducing the cost, time, and effort of the bug fixing process. This assignment process is often referred to as bug triaging. In this paper, we propose Bugzie, a novel approach for automatic bug triaging based on fuzzy set-based modeling of the bug-fixing expertise of developers. Bugzie considers a system to have multiple technical aspects, each associated with technical terms. Then, it uses a fuzzy set to represent the developers who are capable of fixing the bugs relevant to each term. The membership function of a developer in a fuzzy set is calculated via the terms extracted from the bug reports that (s)he has fixed, and the function is updated as new fixed reports are available. For a new bug report, its terms are extracted and the corresponding fuzzy sets are union'ed. Potential fixers are recommended based on their membership scores in the union'ed fuzzy set. Our preliminary results show that Bugzie achieves higher accuracy and efficiency than other state-of-the-art approaches.
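The scoring step can be illustrated with a small sketch using the standard max operator for the fuzzy union; the memberships and names below are hypothetical, and Bugzie's actual membership function and update rules may differ:

```python
def recommend_fixers(report_terms, expertise, top=2):
    """Fuzzy union over a report's terms: a developer's score is the
    maximum of their membership values across the report's terms."""
    scores = {}
    for term in report_terms:
        for dev, membership in expertise.get(term, {}).items():
            scores[dev] = max(scores.get(dev, 0.0), membership)
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical memberships, derived from each developer's fixed reports
expertise = {
    "crash":  {"alice": 0.9, "bob": 0.3},
    "render": {"bob": 0.8, "carol": 0.6},
}

print(recommend_fixers(["crash", "render"], expertise))  # ['alice', 'bob']
```

Updating `expertise` incrementally as new fixed reports arrive mirrors the abstract's point that the membership function evolves over time.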
Conference Paper
Bug fixing accounts for a large amount of the software maintenance resources. Generally, bugs are reported, fixed, verified and closed. However, in some cases bugs have to be re-opened. Re-opened bugs increase maintenance costs, degrade the overall user-perceived quality of the software and lead to unnecessary rework by busy practitioners. In this paper, we study and predict re-opened bugs through a case study on the Eclipse project. We structure our study along 4 dimensions: (1) the work habits dimension (e.g., the weekday on which the bug was initially closed), (2) the bug report dimension (e.g., the component in which the bug was found), (3) the bug fix dimension (e.g., the amount of time it took to perform the initial fix), and (4) the team dimension (e.g., the experience of the bug fixer). Our case study on the Eclipse Platform 3.0 project shows that the comment and description text, the time it took to fix the bug, and the component the bug was found in are the most important factors in determining whether a bug will be re-opened. Based on these dimensions we create decision trees that predict whether a bug will be re-opened after its closure. Using a combination of our dimensions, we can build explainable prediction models that can achieve 62.9% precision and 84.5% recall when predicting whether a bug will be re-opened.
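The precision and recall figures quoted above are the standard ones for a binary "will be re-opened" predictor; a minimal sketch with made-up predictions for eight closed bugs:

```python
def precision_recall(predicted, actual):
    """Precision and recall for a binary re-open predictor (True = re-opened)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical model output vs. ground truth for eight closed bugs
predicted = [True, True, False, True, False, False, True, False]
actual    = [True, False, False, True, True, False, True, False]
p, r = precision_recall(predicted, actual)
print(p, r)  # 0.75 0.75
```

High recall with modest precision, as in the abstract's 62.9%/84.5% result, means few re-opens are missed at the cost of some false alarms.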
Code
R package for data analysis using multilevel/hierarchical models
Article
A key collaborative hub for many software development projects is the bug report repository. Although its use can improve the software development process in a number of ways, reports added to the repository need to be triaged. A triager determines if a report is meaningful. Meaningful reports are then organized for integration into the project's development process. To assist triagers with their work, this article presents a machine learning approach to create recommenders that assist with a variety of decisions aimed at streamlining the development process. The recommenders created with this approach are accurate; for instance, recommenders for which developer to assign a report that we have created using this approach have a precision between 70% and 98% over five open source projects. As the configuration of a recommender for a particular project can require substantial effort and be time-consuming, we also present an approach to assist the configuration of such recommenders that significantly lowers the cost of putting a recommender in place for a project. We show that recommenders for which developer should fix a bug can be quickly configured with this approach and that the configured recommenders are within 15% precision of hand-tuned developer recommenders.
Conference Paper
Fault localization in the telecommunications sector is a major challenge. Most companies manually try to trace faults back to their origin. Such a process is expensive, time-consuming and ineffective. Therefore, in this study we automated the manual fault localization process by designing and implementing an intelligent software tool (Xiruxe) for a local telecommunications company. Xiruxe has a learning-based engine which uses powerful AI algorithms, such as Naïve Bayes, Decision Trees and Multi-Layer Perceptrons, to match keywords and patterns in the fault messages. The initial deployment results show that this intelligent engine can achieve a misclassification rate as low as 1.28%.
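One of the listed algorithms, Naïve Bayes over fault-message keywords, can be sketched in a few lines; the fault categories and messages below are invented for illustration and do not reflect Xiruxe's actual data or engine:

```python
from collections import Counter, defaultdict
import math

def train_nb(messages):
    """Train a multinomial naive Bayes model on (text, label) pairs."""
    word_counts, label_counts = defaultdict(Counter), Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest (log) posterior for the text."""
    vocab = {w for c in word_counts.values() for w in c}
    best, best_lp = None, float("-inf")
    for label, n in label_counts.items():
        total = sum(word_counts[label].values())
        lp = math.log(n / sum(label_counts.values()))
        for w in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical labeled fault messages
faults = [("link timeout on node", "network"),
          ("link down at node", "network"),
          ("billing record mismatch", "software"),
          ("billing export failed", "software")]
wc, lc = train_nb(faults)
print(classify("timeout at node", wc, lc))  # network
```

Matching keywords probabilistically rather than with hand-written rules is what lets such an engine keep the misclassification rate low as new fault patterns appear in training data.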
Article
Fixing bugs is an important part of the software development process. An underlying aspect is the effectiveness of fixes: if a fair number of fixed bugs are reopened, it could indicate instability in the software system. To the best of our knowledge there has been little prior work on understanding the dynamics of bug reopens. Towards that end, in this paper, we characterize when bug reports are reopened by using the Microsoft Windows operating system project as an empirical case study. Our analysis is based on a mixed-methods approach. First, we categorize the primary reasons for reopens based on a survey of 358 Microsoft employees. We then reinforce these results with a large-scale quantitative study of Windows bug reports, focusing on factors related to bug report edits and relationships between people involved in handling the bug. Finally, we build statistical models to describe the impact of various metrics on reopening bugs ranging from the reputation of the opener to how the bug was found.
Article
Defect prediction has evolved with a variety of metric sets and defect types. Researchers have found code, churn, and network metrics to be significant indicators of defects. However, not all metric sets may be informative for all defect categories; a single metric type may represent the majority of a defect category. Our previous study showed that defect-category-sensitive prediction models are more successful than general models, since each category has different characteristics in terms of metrics. We extend our previous work and propose specialized prediction models using churn, code, and network metrics with respect to three defect categories. Results show that churn metrics are the best for predicting all defects. The strength of correlation for code and network metrics varies with defect category: network metrics have higher correlations than code metrics for defects reported during functional testing and in the field, and vice versa for defects reported during system testing.
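The per-category correlation comparison described above is typically done with Spearman rank correlation between a metric and defect counts; a minimal tie-free sketch with invented per-module data (the metric values and defect counts are hypothetical):

```python
def spearman(xs, ys):
    """Spearman rank correlation (no ties), e.g. a churn metric vs.
    defect counts within one defect category."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical per-module churn metric vs. field-defect counts
churn   = [120, 30, 200, 15, 80]
defects = [7,   2,  9,   1,  4]
print(spearman(churn, defects))  # 1.0 (identical rank orderings)
```

Computing this correlation separately for churn, code, and network metrics within each defect category is what reveals the category-dependent differences the abstract reports.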