Article

Predicting Bugs' Components via Mining Bug Reports

Computing Research Repository - CORR 10/2010; 7(5). DOI: 10.4304/jsw.7.5.1149-1154
Source: arXiv

ABSTRACT The number of bug reports in complex software is increasing dramatically.
Bugs are currently triaged manually, and bug triage, or assignment, is a
labor-intensive and time-consuming task. Without knowledge of the software's
structure, testers often specify the wrong component for a new bug. Meanwhile,
it is difficult for triagers to determine a bug's component from its
description alone. We found that the components of 28,829 bugs in the Eclipse
bug project had been specified wrongly and modified at least once. As a result,
these bugs had to be reassigned, which delayed the bug-fixing process. The
average time to fix a wrongly specified bug is longer than that for a correctly
specified one. To solve this problem automatically, we use historical fixed bug
reports as a training corpus and build classifiers based on support vector
machines and Naïve Bayes to predict the component of a new bug. The best
prediction accuracy reaches 81.21% on our validation corpus of the Eclipse
project. On average, our predictive model can save triagers and developers
about 54.3 days in repairing a bug.
Keywords: bug reports; bug triage; text classification; predictive model
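The abstract names Naïve Bayes as one of the classifiers that map a bug report's text to a component. The following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing, written from scratch in Python; it is not the authors' implementation, and the SVM variant is not shown. The tokenizer, the names `NaiveBayesComponentClassifier` and `accuracy`, and the toy corpus with its Eclipse-like component labels are all hypothetical. Accuracy is assumed here to mean the fraction of validation reports whose predicted component matches the recorded one.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase the text and keep alphanumeric runs as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesComponentClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def fit(self, reports, components):
        self.class_counts = Counter(components)    # documents per component
        self.word_counts = defaultdict(Counter)    # token counts per component
        self.vocab = set()
        for text, comp in zip(reports, components):
            tokens = tokenize(text)
            self.word_counts[comp].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(components)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for comp, n in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / self.n_docs)
            denom = self.total_words[comp] + v
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[comp][t] + 1) / denom)
            if score > best_score:
                best, best_score = comp, score
        return best


def accuracy(model, reports, components):
    # Fraction of reports whose predicted component equals the true one.
    correct = sum(model.predict(r) == c for r, c in zip(reports, components))
    return correct / len(components)


# Hypothetical toy corpus of (summary, component) pairs, invented for
# illustration and not drawn from the paper's Eclipse data.
TRAIN = [
    ("NullPointerException when stepping through breakpoint in debugger", "Debug"),
    ("Breakpoint not hit during debug launch", "Debug"),
    ("Button widget renders incorrectly on GTK", "SWT"),
    ("Table widget flickers when resizing on Windows", "SWT"),
    ("Editor tab title truncated in workbench", "UI"),
    ("Perspective switch resets workbench layout", "UI"),
]
VALID = [
    ("Widget flickers on GTK", "SWT"),
    ("Workbench layout lost after editor restart", "UI"),
]

model = NaiveBayesComponentClassifier().fit(
    [s for s, _ in TRAIN], [c for _, c in TRAIN]
)
print(model.predict("Debugger ignores breakpoint after hot code replace"))  # Debug
print(accuracy(model, [s for s, _ in VALID], [c for _, c in VALID]))  # 1.0
```

Working in log space avoids floating-point underflow when a report contains many tokens. With a real corpus, the summary and description text of historical fixed reports would play the role of `TRAIN`.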

Available from: Wenjun Wu, Jan 22, 2014
  • ABSTRACT: Large open-source bug tracking systems receive a large number of bug reports daily. Managing these huge numbers of incoming bug reports is a challenging task. Dealing with these reports manually consumes time and resources, which delays the resolution of important bugs that need to be identified and resolved early. Bug triaging is an important process in software maintenance. Some bugs are important and need to be fixed right away, whereas others are minor and their fixes could be postponed until resources are available. Most automatic bug assignment approaches do not take the priority of bug reports into consideration. Assigning bug reports based on their priority may play an important role in enhancing the bug triaging process. In this paper, we present an approach to predict the priority of a reported bug using different machine learning algorithms, namely Naive Bayes, Decision Trees, and Random Forest. We also investigate the effect of using two feature sets on the classification accuracy. We conduct an experimental evaluation using two open-source projects, Eclipse and Firefox. The experimental evaluation shows that the proposed approach is feasible for predicting the priority of bug reports. It also shows that feature-set-2 outperforms feature-set-1. Moreover, both Random Forests and Decision Trees outperform Naive Bayes.
    Proceedings of the 2013 12th International Conference on Machine Learning and Applications - Volume 02; 12/2013
  • ABSTRACT: Software maintenance starts as soon as the first artifacts are delivered and is essential for the success of the software. However, keeping maintenance activities and their related artifacts on track comes at a high cost. In this respect, change request (CR) repositories are fundamental in software maintenance. They facilitate the management of CRs and are also the central point to coordinate activities and communication among stakeholders. However, the benefits of CR repositories do not come without issues, and commonly occurring ones should be dealt with, such as the following: duplicate CRs, the large number of CRs to assign, or poorly described CRs. Such issues have led researchers to an increased interest in investigating CR repositories, by considering different aspects of software development and CR management. In this paper, we performed a systematic mapping study to characterize this research field. We analyzed 142 studies, which we classified in two ways. First, we classified the studies into different topics and grouped them into two dimensions: challenges and opportunities. Second, the challenge topics were classified in accordance with an existing taxonomy for information retrieval models. In addition, we investigated tools and services for CR management, to understand whether and how they addressed the topics identified. Copyright © 2013 John Wiley & Sons, Ltd.
    Journal of Software: Evolution and Process 12/2013; DOI:10.1002/smr.1639 · 1.27 Impact Factor