Conference Paper

Automated Detection of Typed Links in Issue Trackers

... For example, Nicholson and Guo [33] encoded both the textual data (title and description, using TF-IDF or fastText embeddings trained on Wikipedia, StackOverflow, or project-specific documents) and the metadata (issue type, reporter identifier, and assignee identifier, using one-hot encoding). Lüders et al. [20], on the other hand, chose to encode issues using only their title and description (through a general BERT model), since those are universal features across different issue trackers. A pair of artifacts can then be represented as a combination of the features of the individual artifacts, optionally with additional features on the relation between the two artifacts (such as the difference in creation time between the two issues). ...
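The pair-encoding scheme described in this excerpt can be sketched as follows. This is a minimal sketch, not either cited study's implementation: the toy issues, field names, and the `days_apart` relational feature are illustrative assumptions.

```python
# Sketch of the pair encoding described above: TF-IDF for text, one-hot for
# metadata, concatenated per issue, plus a feature on the pair's relation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder

issues = [
    {"text": "login fails with timeout", "type": "Bug"},
    {"text": "add dark mode to settings page", "type": "Feature"},
]

tfidf = TfidfVectorizer().fit([i["text"] for i in issues])
onehot = OneHotEncoder().fit([[i["type"]] for i in issues])

def encode(issue):
    text_vec = tfidf.transform([issue["text"]]).toarray()[0]
    meta_vec = onehot.transform([[issue["type"]]]).toarray()[0]
    return np.concatenate([text_vec, meta_vec])

def encode_pair(a, b, days_apart):
    # A pair = features of both issues plus a relational feature.
    return np.concatenate([encode(a), encode(b), [days_apart]])

pair = encode_pair(issues[0], issues[1], days_apart=3.0)
print(pair.shape)  # text vocab + 2 issue types, twice, plus one relational feature
```

In practice the relational features and metadata fields would be chosen per issue tracker; the concatenation pattern is the transferable part.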
... First of all, the quality of the dataset needs to be carefully scrutinized before it is used for training or testing machine learning models. Previous works have pointed out that the use of link labels in Jira is inconsistent [33,29,20]. For example, multiple terms with slight variations are used as link labels in Jira, such as Depend, Dependency, Dependent, and Depends. ...
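One way to clean up such label variants before training is a simple alias table, sketched below. The aliases and canonical names are illustrative assumptions; a real project would need its own audit of the labels in use.

```python
# Sketch: collapsing inconsistent Jira link-label variants to canonical names.
LABEL_ALIASES = {
    "depend": "Depend", "dependency": "Depend",
    "dependent": "Depend", "depends": "Depend",
    "duplicate": "Duplicate", "duplicates": "Duplicate",
}

def normalize(label: str) -> str:
    key = label.strip().lower()
    return LABEL_ALIASES.get(key, label.strip())  # unknown labels pass through

print([normalize(l) for l in ["Depends", "Dependency", "Duplicate", "Blocks"]])
# ['Depend', 'Depend', 'Duplicate', 'Blocks']
```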
... Methods frequently used to handle class imbalance include class weights and SMOTE [5]. Unfortunately, no decisive improvement was observed in previous studies when applying these methods [33,20]. A closely related consideration is which metrics are appropriate when comparing different methods, or methods under different configurations. ...
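The two imbalance handlers named above can be sketched as follows; the skewed label distribution is a toy assumption, and SMOTE is only indicated in a comment because it lives in the separate imbalanced-learn package.

```python
# Sketch of class weighting for imbalanced link-type labels.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array(["Relate"] * 90 + ["Block"] * 10)   # skewed toy labels
classes = np.unique(y)                            # ["Block", "Relate"]
weights = compute_class_weight("balanced", classes=classes, y=y)
print(dict(zip(classes, weights)))                # rare class gets weight 5.0

# "balanced" weight = n_samples / (n_classes * n_class_samples), so the
# minority class contributes more to the loss during training.

# SMOTE alternative (requires the imbalanced-learn package):
#   from imblearn.over_sampling import SMOTE
#   X_resampled, y_resampled = SMOTE().fit_resample(X, y)
```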
Preprint
Traceability, the ability to trace relevant software artifacts to support reasoning about the quality of the software and its development process, plays a crucial role in requirements and software engineering, particularly for safety-critical systems. In this chapter, we provide a comprehensive overview of the representative tasks in requirement traceability for which natural language processing (NLP) and related techniques have made considerable progress in the past decade. We first present the definition of traceability in the context of requirements and the overall engineering process, as well as other important concepts related to traceability tasks. Then, we discuss two tasks in detail, including trace link recovery and trace link maintenance. We also introduce two other related tasks concerning when trace links are used in practical contexts. For each task, we explain the characteristics of the task, how it can be approached through NLP techniques, and how to design and conduct the experiment to demonstrate the performance of the NLP techniques. We further discuss practical considerations on how to effectively apply NLP techniques and assess their effectiveness regarding the data set collection, the metrics selection, and the role of humans when evaluating the NLP approaches. Overall, this chapter prepares the readers with the fundamental knowledge of designing automated traceability solutions enabled by NLP in practice.
... Our work aims to pave the way for link (type) recommenders that support stakeholders in issue linking and issue management in general. This paper extends our work published at RE22 [24] by adding an in-depth analysis of linking practices, repository structures, and inconsistencies in issue linking to further understand the intricacies of linking in practice. We also extend our machine learning experiments by evaluating four prediction strategies. ...
... We also present the mean, median, and standard deviation per link type, as well as the macro and weighted F1-scores per repository. The values partly differ from those reported in our previous work [24] because we reran the training with a different random initialization for the extended evaluation reported in this paper. Overall, the model achieves a macro F1-score of 0.63 on a multi-class problem with a median of 7 classes per repository. ...
... They went beyond the F1-score and also examined the return on investment (ROI) of both models. In our previous experiment, traditional models achieved an average macro F1-score of only 0.27 [24] on the dataset, compared with 0.63 for the BERT model. However, for the prediction strategy Only Linked, it might be worthwhile to compare the ROI with that of traditional machine learning models. ...
Article
Full-text available
Stakeholders in software projects use issue trackers like JIRA or Bugzilla to capture and manage issues, including requirements, feature requests, and bugs. To ease issue navigation and structure project knowledge, stakeholders manually connect issues via links of certain types that reflect different dependencies, such as Epic-, Block-, Duplicate-, or Relate-links. Based on a large dataset of 16 JIRA repositories, we study the commonalities and differences in linking practices and link types across the repositories. We then investigate how state-of-the-art machine learning models can predict common link types. We observed significant differences across the repositories and link types, depending on how they are used and by whom. Additionally, we observed several inconsistencies, e.g., in how Duplicate links are used. We found that a transformer model trained on titles and descriptions of linked issues significantly outperforms other optimized models, achieving an encouraging average macro F1-score of 0.64 for predicting nine popular link types across all repositories (weighted F1-score of 0.73). For the specific Subtask- and Epic-links, the model achieves top F1-scores of 0.89 and 0.97, respectively. If we restrict the task to predicting the mere existence of links, the average macro F1-score goes up to 0.95. In general, shorter issue texts, possibly indicating more precise issues, seem to improve the prediction accuracy, with a strong negative correlation of -0.73. We found that Relate-links often get confused with the other links, which suggests that they are likely used as default links in unclear cases. Our findings, particularly on the quality and heterogeneity of issue link data, have implications for researching and applying issue link prediction in practice.
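This abstract reports both macro and weighted F1-scores because, with imbalanced link types, the two diverge. A toy sketch of the distinction, with illustrative labels:

```python
# Macro F1 averages per-class F1 equally; weighted F1 weights by class support,
# so a well-predicted majority class lifts the weighted score.
from sklearn.metrics import f1_score

y_true = ["Relate"] * 8 + ["Subtask"] * 2
y_pred = ["Relate"] * 8 + ["Relate", "Subtask"]          # one rare-class miss

macro = f1_score(y_true, y_pred, average="macro")        # classes count equally
weighted = f1_score(y_true, y_pred, average="weighted")  # weighted by support
print(round(macro, 2), round(weighted, 2))               # weighted > macro here
```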
... Besides, pre-trained models like BERT [21] also shed some light on overcoming the lack of issue and commit training data. Lüders et al. [52] adopt BERT to predict the types of links (e.g., duplicate and clone) in issue trackers. T-BERT [18] is first pre-trained on CodeSearchNet [40]. ...
Preprint
Issue-commit links, as a type of software traceability links, play a vital role in various software development and maintenance tasks. However, they are typically deficient, as developers often forget or fail to create tags when making commits. Existing studies have deployed deep learning techniques, including pre-trained models, to improve automatic issue-commit link recovery. Despite their promising performance, we argue that previous approaches have four main problems, hindering them from recovering links in large software projects. To overcome these problems, we propose an efficient and accurate pre-trained framework called EALink for issue-commit link recovery. EALink requires far fewer model parameters than existing pre-trained methods, enabling efficient training and recovery. Moreover, we design various techniques to improve the recovery accuracy of EALink. We construct a large-scale dataset and conduct extensive experiments to demonstrate the power of EALink. Results show that EALink outperforms the state-of-the-art methods by a large margin (15.23%-408.65%) on various evaluation metrics. Meanwhile, its training and inference overhead is orders of magnitude lower than existing methods.
Article
Full-text available
As Industry 4.0 transforms the food industry, the role of software in achieving compliance with food-safety regulations is becoming increasingly critical. Food-safety regulations, like those in many legal domains, have largely been articulated in a technology-independent manner to ensure their longevity and broad applicability. However, this approach leaves a gap between the regulations and the modern systems and software increasingly used to implement them. In this article, we pursue two main goals. First, we conduct a Grounded Theory study of food-safety regulations and develop a conceptual characterization of food-safety concepts that closely relate to systems and software requirements. Second, we examine the effectiveness of two families of large language models (LLMs) – BERT and GPT – in automatically classifying legal provisions based on requirements-related food-safety concepts. Our results show that: (a) when fine-tuned, the accuracy differences between the best-performing models in the BERT and GPT families are relatively small. Nevertheless, the most powerful model in our experiments, GPT-4o, still achieves the highest accuracy, with an average Precision of 89% and an average Recall of 87%; (b) few-shot learning with GPT-4o increases Recall to 97% but decreases Precision to 65%, suggesting a trade-off between fine-tuning and few-shot learning; (c) despite our training examples being drawn exclusively from Canadian regulations, LLM-based classification performs consistently well on test provisions from the US, indicating a degree of generalizability across regulatory jurisdictions; and (d) for our classification task, LLMs significantly outperform simpler baselines constructed using long short-term memory (LSTM) networks and automatic keyword extraction.
Conference Paper
Full-text available
Ubiquitous digitalization has led to the continuous generation of large amounts of digital data, both in organizations and in society at large. In the requirements engineering community, there has been a growing interest in considering digital data as new sources for requirements elicitation, in addition to stakeholders. The volume, dynamics, and variety of data make iterative requirements elicitation increasingly continuous, but also unstructured and complex, which current agile methods are unable to consider and manage in a systematic and efficient manner. There is also the need to support software evolution by enabling a synergy of stakeholder-driven requirements elicitation and management with data-driven approaches. In this study, we propose an extension of agile requirements elicitation by applying situational method engineering. The research is grounded on two studies in the business domains of video games and online banking.
Conference Paper
Full-text available
Nowadays, development teams often rely on tools such as Jira or Bugzilla to manage backlogs of issues to be solved to develop or maintain software. Although issues relate to many different concerns (e.g., bug fixing, new feature development, architecture refactoring), few means are proposed to identify and classify these different kinds of issues, except for non-mandatory labels that can be manually associated with them. This may lead to a lack of issue classification or to issue misclassification that may impact automatic issue management (planning, assignment) or issue-derived metrics. Automatic issue classification thus is a relevant topic for assisting backlog management. This paper proposes a binary classification solution for discriminating bug from non-bug issues. This solution combines natural language processing (TF-IDF) and classification (multi-layer perceptron) techniques, selected after comparing commonly used solutions to classify issues. Moreover, hyper-parameters of the neural network are optimized using a genetic algorithm. The obtained results, as compared to existing works on a commonly used benchmark, show significant improvements on the F1 measure for all datasets.
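A minimal sketch of a TF-IDF + multi-layer perceptron pipeline of the kind this abstract describes, assuming toy issue titles and untuned hyper-parameters (the paper's genetic-algorithm optimization is not reproduced):

```python
# Sketch: TF-IDF features fed to an MLP for bug vs. non-bug classification.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

titles = [
    "crash when saving file", "null pointer exception on login",
    "add export to csv", "refactor payment module",
]
labels = ["bug", "bug", "non-bug", "non-bug"]

clf = make_pipeline(
    TfidfVectorizer(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)
clf.fit(titles, labels)
print(clf.predict(["crash on startup"])[0])
```

With only four training examples the prediction is not meaningful; the sketch only shows the pipeline shape, which scales unchanged to a real labeled backlog.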
Article
Full-text available
In large-scale software development environments, defect reports are maintained through bug tracking systems (BTS) and analyzed by domain experts. Different users may create bug reports in a non-standard manner and may report a particular problem using a particular set of words due to stylistic choices and writing patterns. Therefore, the same defect can be reported with very different descriptions, generating non-trivial duplicates. To avoid redundant work for the development team, an expert needs to look at all new reports while trying to label possible duplicates. However, this approach is neither trivial nor scalable and directly impacts bug fix correction time. Recent efforts to find duplicate bug reports tend to focus on deep neural approaches that consider hybrid representations of bug reports, using both structured and unstructured information. Unfortunately, these approaches ignore that a single bug can have multiple previously identified duplicates and, therefore, multiple textual descriptions, titles, and categorical information. In this work, we propose SiameseQAT, a duplicate bug report detection method that considers information on individual bugs as well as information extracted from bug clusters. SiameseQAT combines context and semantic learning on structured and unstructured features and corpus topic extraction-based features, with a novel loss function called Quintet Loss, which considers the centroid of duplicate clusters and their contextual information. We validated our approach on the well-known open-source software repositories Eclipse, NetBeans, and Open Office, comprising more than 500 thousand bug reports. We evaluated both the retrieval and classification of duplicates, reporting a mean Recall@25 of 85% for retrieval and 84% AUROC for classification, results that were significantly superior to previous works.
Conference Paper
Full-text available
Understanding and keeping the customer happy is a central tenet of requirements engineering. Strategies to gather, analyze, and negotiate requirements are complemented by efforts to manage customer input after products have been deployed. For the latter, support tickets are key in allowing customers to submit their issues, bug reports, and feature requests. Whenever insufficient attention is given to support issues, however, their escalation to management is time-consuming and expensive, especially for large organizations managing hundreds of customers and thousands of support tickets. Our work provides a step towards simplifying the job of support analysts and managers, particularly in predicting the risk of escalating support tickets. In a field study at our large industrial partner, IBM, we used a design science methodology to characterize the support process and data available to IBM analysts in managing escalations. Through iterative cycles of design and evaluation, we translated our understanding of support analysts' expert knowledge of their customers into features of a support ticket model to be implemented into a Machine Learning model to predict support ticket escalations. We trained and evaluated our Machine Learning model on over 2.5 million support tickets and 10,000 escalations, obtaining a recall of 79.9% and an 80.8% reduction in the workload for support analysts looking to identify support tickets at risk of escalation. Further on-site evaluations, through a prototype tool we developed to implement our Machine Learning techniques in practice, showed more efficient weekly support-ticket-management meetings. The features we developed in the Support Ticket Model are designed to serve as a starting place for organizations interested in implementing our model to predict support ticket escalations, and for future researchers to build on to advance research in ...
Conference Paper
Full-text available
Communication about requirements is often handled in issue tracking systems, especially in a distributed setting. As issue tracking systems also contain bug reports or programming tasks, the software feature requests of the users are often difficult to identify. This paper investigates natural language processing and machine learning features to detect software feature requests in natural language data of issue tracking systems. It compares traditional linguistic machine learning features, such as "bag of words", with more advanced features, such as subject-action-object, and evaluates combinations of machine learning features derived from the natural language and features taken from the issue tracking system meta-data. Our investigation shows that some combinations of machine learning features derived from natural language and the issue tracking system meta-data outperform traditional approaches. We show that issues or data fields (e.g. descriptions or comments), which contain software feature requests, can be identified reasonably well, but hardly the exact sentence. Finally, we show that the choice of machine learning algorithms should depend on the goal, e.g. maximization of the detection rate or balance between detection rate and precision. In addition, the paper contributes a double coded gold standard and an open-source implementation to further pursue this topic.
Article
Full-text available
Requirements traceability has long been recognized as an important quality of a well-engineered system. Among stakeholders, traceability is often unpopular due to the unclear benefits. In fact, little evidence exists regarding the expected traceability benefits. There is a need for empirical work that studies the effect of traceability. In this paper, we focus on the four main requirements implementation supporting activities that utilize traceability. For each activity, we propose generalized traceability completeness measures. In a defined process, we selected 24 medium to large-scale open-source projects. For each software project, we quantified the degree to which a studied development activity was enabled by existing traceability with the proposed measures. We analyzed that data in a multi-level Poisson regression analysis. We found that the degree of traceability completeness for three of the studied activities significantly affects software quality, which we quantified as defect rate. Our results provide for the first time empirical evidence that more complete traceability decreases the expected defect rate in the developed software. The strong impact of traceability completeness on the defect rate suggests that traceability is of great practical value for any kind of software development project, even if traceability is not mandated by a standard or regulation.
Conference Paper
Full-text available
The interlinking of commit and issue data has become a de-facto standard in software development. Modern issue tracking systems, such as JIRA, automatically interlink commits and issues by the extraction of identifiers (e.g., issue key) from commit messages. However, the conventions for the use of interlinking methodologies vary between software projects. For example, some projects enforce the use of identifiers for every commit while others have less restrictive conventions. In this work, we introduce a model called PaLiMod to enable the analysis of interlinking characteristics in commit and issue data. We surveyed 15 Apache projects to investigate differences and commonalities between linked and non-linked commits and issues. Based on the gathered information, we created a set of heuristics to interlink the residual of non-linked commits and issues. We present the characteristics of Loners and Phantoms in commit and issue data. The results of our evaluation indicate that the proposed PaLiMod model and heuristics enable an automatic interlinking and can indeed reduce the residual of non-linked commits and issues in software projects.
Conference Paper
Full-text available
In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all bug reports to be misclassified—that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug prediction models, confusing bugs and features: On average, 39% of files marked as defective actually never had a bug. We discuss the impact of this misclassification on earlier studies and recommend manual data validation for future studies.
Article
Full-text available
In software development, bug reports provide crucial information to developers. However, these reports widely differ in their quality. We conducted a survey among developers and users of APACHE, ECLIPSE, and MOZILLA to find out what makes a good bug report. The analysis of the 466 responses revealed an information mismatch between what developers need and what users supply. Most developers consider steps to reproduce, stack traces, and test cases as helpful, which are, at the same time, most difficult to provide for users. Such insight is helpful for designing new bug tracking tools that guide users at collecting and providing more helpful information. Our CUEZILLA prototype is such a tool and measures the quality of new bug reports; it also recommends which elements should be added to improve the quality. We trained CUEZILLA on a sample of 289 bug reports, rated by developers as part of the survey. The participants of our survey also provided 175 comments on hurdles in reporting and resolving bugs. Based on these comments, we discuss several recommendations for better bug tracking systems, which should focus on engaging bug reporters, better tool support, and improved handling of bug duplicates.
Conference Paper
Full-text available
According to recent work, duplicate bug report entries in bug tracking systems impact negatively on software maintenance and evolution productivity due to, among other factors, the increased time spent on report analysis and validation, which in some cases takes over 20 minutes. Therefore, a considerable amount of time is lost mainly with duplicate bug report analysis. This work presents an initial characterization study using data from bug trackers of private and open source projects, in order to understand the possible factors that cause bug report duplication and its impact on software development.
Conference Paper
Full-text available
An open source project typically maintains an open bug repository so that bug reports from all over the world can be gathered. When a new bug report is submitted to the repository, a person, called a triager, examines whether it is a duplicate of an existing bug report. If it is, the triager marks it as DUPLICATE and the bug report is removed from consideration for further work. In the literature, there are approaches exploiting only natural language information to detect duplicate bug reports. In this paper we present a new approach that further involves execution information. In our approach, when a new bug report arrives, its natural language information and execution information are compared with those of the existing bug reports. Then, a small number of existing bug reports are suggested to the triager as the most similar bug reports to the new bug report. Finally, the triager examines the suggested bug reports to determine whether the new bug report duplicates an existing bug report. We calibrated our approach on a subset of the Eclipse bug repository and evaluated our approach on a subset of the Firefox bug repository. The experimental results show that our approach can detect 67%-93% of duplicate bug reports in the Firefox bug repository, compared to 43%-72% using natural language information alone.
Conference Paper
Full-text available
Online feature request management systems are popular tools for gathering stakeholder requirements during system evolution. Deciding which feature requests require attention and how much upfront analysis to perform on them is an important problem in this context: too little upfront analysis may result in inadequate functionalities being developed, costly changes, and wasted development effort; too much upfront analysis is a waste of time and resources. Early predictions about which feature requests are most likely to fail due to insufficient or inadequate upfront analysis could facilitate such decisions. Our objective is to study whether it is possible to make such predictions automatically from the characteristics of the online discussions on feature requests. The paper presents a tool-implemented framework that automatically constructs failure prediction models using machine-learning classification algorithms and compares the performance of the different techniques for the Firefox and Netbeans projects. The comparison relies on a cost-benefit model for assessing the value of additional upfront analysis. In this model, the value of additional upfront analysis depends on its probability of success in preventing failures and on the relative cost of the failures it prevents compared to its own cost. We show that for reasonable estimations of these two parameters automated prediction models provide more value than a set of baselines for some failure types and projects. This suggests that automated failure prediction during online requirements elicitation may be a promising approach for guiding requirements engineering efforts in online settings.
Article
With the increasing number of software bugs, bug fixing plays an important role in software development and maintenance. To improve the efficiency of bug resolution, developers utilize bug reports to resolve given bugs. In particular, bug triagers usually depend on bugs' descriptions to suggest priority levels for reported bugs. However, manual priority assignment is a time-consuming and cumbersome task. To resolve this problem, recent studies have proposed many approaches to automatically predict the priority levels of reported bugs. Unfortunately, these approaches still face two challenges: words' non-consecutive semantics in bug reports and imbalanced data. In this article, we propose a novel approach based on graph convolutional networks (GCN) with a weighted loss function to predict priorities for bug reports. For the first challenge, we build a heterogeneous text graph for bug reports and apply GCN to extract words' semantics in bug reports. For the second challenge, we construct a weighted loss function for the training phase. We conduct priority prediction on four open-source projects, including Mozilla, Eclipse, Netbeans, and GNU compiler collection. Experimental results show that our method outperforms two baseline approaches in terms of the weighted-average F-measure by 13.22%.
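The weighted loss idea above can be sketched independently of the GCN. This is an illustrative NumPy sketch of inverse-frequency weighted cross-entropy, not the paper's implementation; the class counts and batch are toy values.

```python
# Weighted cross-entropy: rare classes contribute larger per-sample losses.
import numpy as np

counts = np.array([70.0, 25.0, 5.0])              # skewed class frequencies
class_w = counts.sum() / (len(counts) * counts)   # inverse-frequency weights

def weighted_ce(probs, targets, w):
    # Mean of -w[y] * log p[y], normalized by the summed sample weights.
    sample_w = w[targets]
    nll = -np.log(probs[np.arange(len(targets)), targets])
    return float((sample_w * nll).sum() / sample_w.sum())

probs = np.full((4, 3), 1 / 3)                    # toy uniform predictions
targets = np.array([0, 1, 2, 2])
loss = weighted_ce(probs, targets, class_w)
print(round(loss, 4))  # log(3) here: with uniform predictions the weights cancel
```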
Conference Paper
With a growing number of software projects, software quality is increasingly crucial. Researchers and engineers in the software engineering field often pay much attention to bug management tasks, such as bug localization, bug triage, and duplicate bug detection. However, few researchers have studied blocking bug prediction. Blocking bugs prevent other bugs from being fixed and usually need more time to be fixed. Thus, developers need to identify blocking bugs and reduce their impact. Previous studies utilized supervised algorithms for this task; however, they did not consider the dependencies among individual classifiers and therefore could not achieve optimal accuracy for blocking bug prediction. In this paper, we propose a new framework, XGBlocker, that comprises two stages. In the first stage, XGBlocker collects more features from bug reports to build an enhanced dataset. In the second stage, XGBlocker exploits the XGBoost technique to construct an effective model for the prediction task. We conduct experiments on four projects with three evaluation metrics. The experimental results show that XGBlocker achieves promising performance compared with baseline methods in most cases. In detail, XGBlocker achieves an F1-score, ER@20%, and AUC of up to 0.808, 0.944, and 0.975, respectively. On average across the four projects, XGBlocker improves F1-score, ER@20%, and AUC over the state-of-the-art method ELBlocker by 17.27%, 12.67%, and 4.85%, respectively.
Conference Paper
Classifying requirements is crucial for automatically handling natural language requirements. The performance of existing automatic classification approaches diminishes when applied to unseen projects because requirements usually vary in wording and style. The main problem is poor generalization. We propose NoRBERT, which fine-tunes BERT, a language model that has proven useful for transfer learning. We apply our approach to different tasks in the domain of requirements classification. We achieve similar or better results (F1-scores of up to 94%) on both seen and unseen projects for classifying functional and non-functional requirements on the PROMISE NFR dataset. NoRBERT outperforms recent approaches at classifying non-functional requirements subclasses. The most frequent classes are classified with an average F1-score of 87%. In an unseen project setup on a relabeled PROMISE NFR dataset, our approach achieves an improvement of 15 percentage points in average F1-score compared to recent approaches. Additionally, we propose to classify functional requirements according to the included concerns, i.e., function, data, and behavior. We labeled the functional requirements in the PROMISE NFR dataset and applied our approach. NoRBERT achieves an F1-score of up to 92%. Overall, NoRBERT improves requirements classification and can be applied to unseen projects with convincing results.
Conference Paper
Duplicate bug reports often exist in bug tracking systems (BTSs). Almost all the existing approaches for automatically detecting duplicate bug reports are based on text similarity. A recent study found that such approaches may become ineffective in detecting duplicates in bug reports submitted after the just-in-time (JIT) retrieval, which is now a built-in feature of modern BTSs (e.g., Bugzilla). This is mainly because the embedded JIT feature suggests possible duplicates in a bug database when a bug reporter types in the new summary field, therefore minimizing the submission of textually similar reports. Although JIT filtering seems effective, a number of bug report duplicates remain undetected. Our hypothesis is that we can detect them using a semantic similarity-based approach. This paper presents HINDBR, a novel deep neural network (DNN) that accurately detects semantically similar duplicate bug reports using a heterogeneous information network (HIN). Instead of matching text similarity alone, HINDBR embeds semantic relations of bug reports into a low-dimensional embedding space where two duplicate bug reports represented by two vectors are close to each other in the latent space. Results show that HINDBR is effective.
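The core idea above is that duplicate reports end up close together in a learned embedding space. A minimal sketch of that comparison step follows; the 4-dimensional vectors and the threshold are toy assumptions, whereas a real HIN embedding would be learned from the bug reports' semantic relations.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy embeddings standing in for learned low-dimensional representations:
report_a = [0.9, 0.1, 0.3, 0.0]
report_b = [0.8, 0.2, 0.4, 0.1]   # close to report_a -> likely duplicate
report_c = [0.0, 0.9, 0.0, 0.7]   # distant -> likely not a duplicate

DUPLICATE_THRESHOLD = 0.8  # illustrative cutoff, not from the paper
```
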
Conference Paper
Background: Requirement engineering is often considered a critical activity in system development projects. The increasing complexity of software, as well as the number and heterogeneity of stakeholders, motivates the development of methods and tools for improving large-scale requirement engineering. Aims: The empirical study presented in this paper aims to identify and understand the characteristics and challenges of a platform, as desired by experts, to support requirement engineering for individual stakeholders, based on the current pain points of their organizations when dealing with a large number of requirements. Method: We conducted a multiple case study with three companies in different domains. We collected data through ten semi-structured interviews with experts from these companies. Results: The main pain point for stakeholders is handling the vast amount of data from different sources. The foreseen platform should leverage such data to manage changes in requirements according to customers' and users' preferences. It should also offer stakeholders an estimate of how long a requirements engineering task will take to complete, along with easier requirements dependency identification and a requirements reuse strategy. Conclusions: The findings provide empirical evidence about how practitioners wish to improve their requirement engineering processes and tools. The insights are a starting point for in-depth investigations into the problems and solutions presented. Practitioners can use the results to improve existing practices and tools or design new ones.
Conference Paper
Issue tracking systems (ITS) are widely used to describe and manage activities in software systems. Within the ITS, software developers create issues and establish typed links among these artifacts. A variety of different link types exists, e.g., that one issue clones another. The rationale behind choosing a specific link type is not always obvious. In this paper, we study link type selection and focus on the relationship between the textual properties of connected issues and the picked link type. We performed a study on seven open-source systems and quantified the usage of typed links. Further, we report preliminary results indicating that, depending on the link type, a link mostly captures textual similarity of issues and thus may provide only limited additional information.
Conference Paper
Context & motivation: Features are important for many software engineering activities, e.g. release planning. Companies document features in Issue Tracking Systems (ITS) and store feature code in Version Control Systems (VCS). Question/Problem: However, companies do not always manage features systematically. This hinders, for example, the prioritization of features for release planning. Principal ideas/results: We want to provide insights into practice regarding feature management. We have developed first ideas on lightweight feature management using tags. We conducted semi-structured interviews with eight experts to gain insight into practice and an early evaluation of our idea. Contribution: The interviews showed that fuzzy feature descriptions, insufficient traceability, and fragmentation of feature knowledge are major problems in practice. The interviews thus confirm the need for a method for managing features across ITS and VCS. We propose our lightweight method for feature management and describe future research regarding our approach.
Conference Paper
Software developers use issues as a means to describe a range of activities to be undertaken on a software system, including features to be added and defects that require fixing. When creating issues, software developers expend manual effort to specify relationships between issues, such as one issue blocking another or one issue being a sub-task of another. In particular, developers use a variety of relationships to express how work is to be broken down on a project. To better understand how software developers use work breakdown relationships between issues, we manually coded a sample of work breakdown relationships from three open source systems. We report on our findings and describe how the recognition of work breakdown relationships opens up new ways to improve software development techniques.
Conference Paper
[Context and motivation] Traces between issues in issue tracking systems connect bug reports to software features, connect competing implementation ideas for a software feature, or identify duplicate issues. However, the trace quality is usually very low. To improve the trace quality between requirements, features, and bugs, information retrieval algorithms for automated trace retrieval can be employed. Prevailing research focuses on structured and well-formed documents, such as natural language requirement descriptions. In contrast, the information in issue tracking systems is often poorly structured and contains digressing discussions or noise, such as code snippets, stack traces, and links. Since noise has a negative impact on algorithms for automated trace retrieval, this paper asks: [Question/Problem] Do information retrieval algorithms for automated traceability perform effectively on issue tracking system data? [Results] This paper presents an extensive evaluation of the performance of five information retrieval algorithms. Furthermore, it investigates different preprocessing stages (e.g. stemming or differentiating code snippets from natural language) and evaluates how to take advantage of an issue's structure (e.g. title, description, and comments) to improve the results. The results show that the algorithms perform poorly without considering the nature of issue tracking data, but can be improved by project-specific preprocessing and term weighting. [Contribution] Our results show how automated trace retrieval on issue tracking system data can be improved. Our manually created gold standard and an open-source implementation based on the OpenTrace platform can be used by other researchers to further pursue this topic.
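The pipeline the abstract describes — strip code-snippet and stack-trace noise, then apply term-weighted retrieval — can be sketched with a small TF-IDF vector space model. The snippet-removal heuristics below are simplistic assumptions; the paper evaluates more elaborate, project-specific preprocessing.

```python
import math
import re
from collections import Counter

def preprocess(text):
    """Strip fenced code snippets and stack-trace-like lines, then tokenize.

    Both heuristics are illustrative assumptions, not the paper's method.
    """
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)  # fenced code
    lines = [ln for ln in text.splitlines()
             if not ln.strip().startswith("at ")]             # Java stack frames
    return re.findall(r"[a-z]+", " ".join(lines).lower())

def tfidf_vectors(docs):
    """Sparse TF-IDF vector (dict term -> weight) for each document."""
    tokenized = [preprocess(d) for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in tokenized
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy issue texts (invented): trace candidates are ranked by cosine similarity.
docs = ["login fails with error", "login button color", "export report to pdf"]
vecs = tfidf_vectors(docs)
```
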
Article
Automatic identification of duplicate bug reports is an important research problem in the mining software repositories field. This paper presents a collection of bug datasets collected, cleaned, and preprocessed for the duplicate bug report identification problem. The datasets were extracted from open-source systems that use Bugzilla as their bug tracking component and contain all the bugs ever submitted. The systems used are Eclipse, OpenOffice, NetBeans, and Mozilla. For each dataset, we store both the initial data and the cleaned data in separate collections in a MongoDB document-oriented database. In addition to the bug data collections downloaded from the bug repositories, each dataset includes a set of all pairs of duplicate bugs together with randomly selected pairs of non-duplicate bugs. Such a dataset is useful as input for classification models and forms a good base to support replications and comparisons by other researchers. We used a subset of this data to predict duplicate bug reports, but the same dataset may also be used to predict bug priorities and severity.
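The pairing scheme the abstract describes — all duplicate pairs plus randomly selected non-duplicate pairs — can be sketched as follows. This is a hypothetical reconstruction of the idea, not the published preprocessing code, and the balanced 1:1 ratio is an assumption.

```python
import random

def build_pair_dataset(duplicate_pairs, all_bug_ids, seed=0):
    """Combine the known duplicate pairs with an equal number of randomly
    sampled non-duplicate pairs, yielding (id_a, id_b, label) triples.

    Illustrative sketch; the published datasets store such pairs in
    MongoDB collections alongside the cleaned bug data.
    """
    rng = random.Random(seed)
    dup_set = {frozenset(p) for p in duplicate_pairs}
    labeled = [(a, b, 1) for a, b in duplicate_pairs]
    # Sample negatives until the classes are balanced:
    while sum(1 for _, _, y in labeled if y == 0) < len(duplicate_pairs):
        a, b = rng.sample(all_bug_ids, 2)
        if frozenset((a, b)) not in dup_set:
            labeled.append((a, b, 0))
    return labeled
```
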
Blocking bug prediction based on XGBoost with enhanced features
  • X Cheng
  • N Liu
  • L Guo
  • Z Xu
  • T Zhang
BERT: Pre-training of deep bidirectional transformers for language understanding
  • J Devlin
  • M.-W Chang
  • K Lee
  • K Toutanova
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  • V Sanh
  • L Debut
  • J Chaumond
  • T Wolf