Conference Paper

An alternative issue tracking dataset of public Jira repositories

... Given the availability of large amounts of existing data from issue trackers such as Jira [29], link type prediction is often approached through learning-based methods. The first step in those methods is to represent the two involved artifacts as a vector for training the classifier and later making inferences. ...
... First of all, the quality of the dataset needs to be carefully scrutinized before being used for training or testing the machine learning models. Previous works have pointed out that the use of link labels in Jira is inconsistent [33,29,20]. For example, multiple terms with slight variations are used as link labels in Jira, such as Depend, Dependency, Dependent, Depends. ...
... While Duplicate is assigned by users explicitly after both involved issues are created, the label clone is automatically created by Jira when its users use the "clone" feature to create new issues from existing ones. Through a qualitative labelling process, Montgomery et al. [29] investigated initial data mined from Jira and created a cleaned dataset with around 1 million issue links. ...
Preprint
Traceability, the ability to trace relevant software artifacts to support reasoning about the quality of the software and its development process, plays a crucial role in requirements and software engineering, particularly for safety-critical systems. In this chapter, we provide a comprehensive overview of the representative tasks in requirement traceability for which natural language processing (NLP) and related techniques have made considerable progress in the past decade. We first present the definition of traceability in the context of requirements and the overall engineering process, as well as other important concepts related to traceability tasks. Then, we discuss two tasks in detail, including trace link recovery and trace link maintenance. We also introduce two other related tasks concerning when trace links are used in practical contexts. For each task, we explain the characteristics of the task, how it can be approached through NLP techniques, and how to design and conduct the experiment to demonstrate the performance of the NLP techniques. We further discuss practical considerations on how to effectively apply NLP techniques and assess their effectiveness regarding the data set collection, the metrics selection, and the role of humans when evaluating the NLP approaches. Overall, this chapter prepares the readers with the fundamental knowledge of designing automated traceability solutions enabled by NLP in practice.
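The context excerpts above note that Jira link labels are used inconsistently (e.g., Depend, Dependency, Dependent, Depends) and that Clone links are created automatically. A minimal sketch of how such labels could be normalized before training or evaluation follows; the alias mapping is illustrative and not the one used in the cited dataset papers.

```python
# Illustrative normalization of inconsistent Jira link-type labels.
# The alias mapping below is an assumption for demonstration, not the
# mapping used in the cited dataset papers.
LINK_TYPE_ALIASES = {
    "depend": "Depend", "dependency": "Depend",
    "dependent": "Depend", "depends": "Depend",
    "duplicate": "Duplicate", "duplicates": "Duplicate",
    "clone": "Clone", "cloners": "Clone",
}

def normalize_link_type(raw_label: str) -> str:
    """Map a raw Jira link label to a canonical link type."""
    return LINK_TYPE_ALIASES.get(raw_label.strip().lower(), raw_label.strip())

if __name__ == "__main__":
    for raw in ["Depends", "Dependency", "duplicates", "Cloners", "Blocks"]:
        print(raw, "->", normalize_link_type(raw))
```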
... Popular ITS in practice are Bugzilla, GitHub Issues, and JIRA, with common features like issue creation and tracking, communication around issues, release planning, issue triaging, and issue dependency management. As a central entity, issues have many properties, including their type [1] (e.g., requirement, user story, feature request, or bug report), priority (e.g., low, high, critical), and assignee. ITS and issue management, in general, have been the focus of software engineering and requirements engineering research for over a decade, with promising automation approaches, for instance, on issue type prediction (and correction) [2][3][4][5][6], priority (and escalation) prediction [7][8][9][10], or triaging and assignment [11,12]. ...
... Issues are often interconnected via links [1,13], which enable stakeholders to capture the relationships between the issues and to structure the overall project knowledge [14]. These links can have different types depending on the issue tracker and the project. ...
... The results reveal insights into link type prediction (errors) and how to design and apply link prediction in practice. In addition to the recently published dataset of JIRA repositories [1], we share our code and analysis scripts to ease replication. The remainder of the paper is structured as follows. ...
Article
Full-text available
Stakeholders in software projects use issue trackers like JIRA or Bugzilla to capture and manage issues, including requirements, feature requests, and bugs. To ease issue navigation and structure project knowledge, stakeholders manually connect issues via links of certain types that reflect different dependencies, such as Epic-, Block-, Duplicate-, or Relate- links. Based on a large dataset of 16 JIRA repositories, we study the commonalities and differences in linking practices and link types across the repositories. We then investigate how state-of-the-art machine learning models can predict common link types. We observed significant differences across the repositories and link types, depending on how they are used and by whom. Additionally, we observed several inconsistencies, e.g., in how Duplicate links are used. We found that a transformer model trained on titles and descriptions of linked issues significantly outperforms other optimized models, achieving an encouraging average macro F1-score of 0.64 for predicting nine popular link types across all repositories (weighted F1-score of 0.73). For the specific Subtask- and Epic- links, the model achieves top F1-scores of 0.89 and 0.97, respectively. If we restrict the task to predict the mere existence of links, the average macro F1-score goes up to 0.95. In general, the shorter issue text, possibly indicating precise issues, seems to improve the prediction accuracy with a strong negative correlation of -0.73. We found that Relate-links often get confused with the other links, which suggests that they are likely used as default links in unclear cases. Our findings, particularly on the quality and heterogeneity of issue link data, have implications for researching and applying issue link prediction in practice.
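As an illustration of the pair-classification setup described in the abstract above, the following sketch encodes the titles and descriptions of two linked issues as a text pair for a transformer classifier. The model name, label set, and example inputs are assumptions for demonstration; the cited study's exact configuration may differ, and the classification head below would first need to be fine-tuned on labeled issue links.

```python
# Sketch: predict an issue-link type from the two issues' titles and
# descriptions with a transformer. Model name, labels, and input
# format are assumptions for illustration only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LINK_TYPES = ["Relate", "Duplicate", "Block", "Subtask", "Epic",
              "Depend", "Incorporate", "Split", "Cause"]  # nine assumed types

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LINK_TYPES))  # untrained head; fine-tune first

def predict_link_type(issue_a: str, issue_b: str) -> str:
    """Encode the issue pair as a text pair and return the predicted type."""
    inputs = tokenizer(issue_a, issue_b, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LINK_TYPES[int(logits.argmax(dim=-1))]

print(predict_link_type(
    "Crash when saving project. Saving a large project crashes the UI.",
    "UI freezes on save. The editor hangs while persisting changes."))
```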
... The most influential attributes were as follows: summary, component, priority, and assignee. The number of reported attributes can be quite large; in the survey paper [41], 25 attributes are listed as relevant to 16 Jira open-source projects. In our studies of diverse projects, we observed many more attributes. ...
... Data processing involves deriving diverse statistics, visualizing them, and presenting the results. MongoDB was also used in other papers on issue tracking, e.g., [41]. ...
... In this paper, we presented illustrative examples related to Jira repositories. These repositories are quite popular among practitioners and researchers [41,51,52]. Most research studies in Section 2 refer to Jira and Bugzilla ITS systems. ...
Article
Full-text available
Software project development and maintenance activities have been reported in various repositories. The data contained in these repositories have been widely used in various studies on specific problems, e.g., predicting bug appearance, allocating issues to developers, and identifying duplicated issues. Developed analysis schemes are usually based on simplified data models while issue report details are neglected. Confronting this problem requires a deep and wide-ranging exploration of software repository contents adapted to their specificities, which differs significantly from classical data mining. This paper is targeted at three aspects: the structural and semantic exploration of repositories, deriving characteristic features in value and time perspectives, and defining the space of project monitoring goals. The considerations presented demonstrate a holistic image of the project development process, which is useful in the assessment of its efficiency and identification of imperfections. The original analysis introduced in this work was verified using open source and some commercial software project repositories.
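One of the context excerpts above mentions that MongoDB was used to store and process issue-tracking data, including the public Jira dataset. A hedged sketch of how such a dump could be queried with pymongo follows; the database name, collection name, and field paths are assumptions about the dump layout and would need to be adapted.

```python
# Sketch: counting issues and links per project in a MongoDB dump of
# Jira data. Database, collection, and field names are assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["jira_dataset"]   # assumed database name
issues = db["issues"]         # assumed collection name

pipeline = [
    {"$group": {"_id": "$fields.project.key",   # assumed field path
                "issue_count": {"$sum": 1},
                "link_count": {"$sum": {"$size": {"$ifNull": ["$fields.issuelinks", []]}}}}},
    {"$sort": {"issue_count": -1}},
    {"$limit": 10},
]
for row in issues.aggregate(pipeline):
    print(row["_id"], row["issue_count"], row["link_count"])
```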
... To measure the stability of Agile systems, we used queueing theory to derive a new metric, the Stability Metric (SM), to classify the performance of queues. We applied this metric to 926 Jira Projects (JPs), i.e., collections of Product Backlog Items (PBIs), drawn from the more than 2.5 million records in the Public Jira Dataset [5]. The distribution of the Stability Metric across all JPs and its relationship with product backlog size and inter-service arrival time were extracted and analyzed. ...
... With the theoretical foundation of the Queuing Theory in mind, the primary research question we are addressing in this paper is: RQ: Are Agile systems stable from a queueing perspective? To answer this question, we use a metric derived from the Queueing theory called the Stability Metric, as described below, and analyze a Public Jira dataset [5] containing 16 public Jira repositories involving 1822 Jira projects to determine if those systems were stable. ...
... Past studies of Agile systems have focused on contextual realism with case studies of real software engineering teams, although these studies are typically not grounded in theory [10]. Montgomery et al. [5] curated and published a dataset of the contents of the 16 public issue tracking systems with 1822 Jira projects and 2.7 million Product Backlog Items (PBIs) all using the Atlassian Jira issue tracking tool, which they suggest is the leading issue-tracking tool for Agile systems. We decided to analyze this dataset to make the results as generalizable as possible. ...
Chapter
Full-text available
Agile systems, like the Kanban and Scrum frameworks, are built on assumptions of sustainability and stability; however, there is little empirical evidence on whether such systems are stable in practice or not. Therefore, in this study we aim to inspect the stability of Agile systems by leveraging the concept of stability described in Queueing Theory. We define a novel metric, the Stability Metric, as a way of assessing queueing systems, especially Agile systems. We inspect 926 Jira projects in 14 organizations with over 1.6 million product backlog items using this metric. The analysis showed that 72.89% of these Jira projects were not stable, and stable systems, on average, had product backlogs 10 times smaller than unstable ones. These results suggest that while the goal of Agile is to create a sustainable, stable way of working, this is not guaranteed, and a better understanding of systems and queues may be required to help design, create, coach, and maintain optimal Agile systems.
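The Stability Metric itself is defined in the cited works; as a rough illustration of the underlying queueing idea, the sketch below estimates the textbook utilization ρ = λ/μ from backlog item creation and resolution times and flags a project as unstable when ρ ≥ 1. The data and the single-server assumption are illustrative only.

```python
# Sketch: estimate queue utilization rho = lambda / mu for one Jira
# project from issue creation timestamps and resolution durations.
# This is the textbook single-server stability condition, not
# necessarily the exact Stability Metric defined in the cited study.
from datetime import datetime, timedelta
from statistics import mean

def utilization(created: list, resolved: list) -> float:
    """lambda = 1 / mean inter-arrival time, mu = 1 / mean service time."""
    created = sorted(created)
    inter_arrivals = [(b - a).total_seconds() for a, b in zip(created, created[1:])]
    arrival_rate = 1.0 / mean(inter_arrivals)                        # lambda
    service_rate = 1.0 / mean(t.total_seconds() for t in resolved)   # mu
    return arrival_rate / service_rate                               # rho; stable if < 1

created = [datetime(2022, 1, 1) + timedelta(days=i) for i in range(10)]
resolved = [timedelta(days=1.5)] * 10  # toy resolution durations
rho = utilization(created, resolved)
print(f"rho = {rho:.2f} ->", "stable" if rho < 1 else "not stable")
```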
... Popular issue trackers in practice are Bugzilla, GitHub Issues, and JIRA, with common features like issue creation and tracking, communication around issues, release planning, issue triaging, and issue dependency management. As a central entity, issues have many properties, including their type [24] (e.g. requirement, feature request, bug report, or task), priority (e.g. ...
... Issues are often interconnected via links [16], [24], which enable stakeholders to capture the relationships between the issues and structure the overall project knowledge. Depending on the issue tracker and the project, these links can have different types. ...
... The results reveal insights on link type confusion as well as on how to design and apply link prediction tools in practice. In addition to the recently published dataset of 15 JIRA repositories [24], we share our code and analysis scripts to ease replication. The remainder of the paper is structured as follows. ...
Preprint
Stakeholders in software projects use issue trackers like JIRA to capture and manage issues, including requirements and bugs. To ease issue navigation and structure project knowledge, stakeholders manually connect issues via links of certain types that reflect different dependencies, such as Epic-, Block-, Duplicate-, or Relate- links. Based on a large dataset of 15 JIRA repositories, we study how well state-of-the-art machine learning models can automatically detect common link types. We found that a pure BERT model trained on titles and descriptions of linked issues significantly outperforms other optimized deep learning models, achieving an encouraging average macro F1-score of 0.64 for detecting 9 popular link types across all repositories (weighted F1-score of 0.73). For the specific Subtask- and Epic- links, the model achieved top F1-scores of 0.89 and 0.97, respectively. Our model does not simply learn the textual similarity of the issues. In general, shorter issue text seems to improve the prediction accuracy with a strong negative correlation of -0.70. We found that Relate-links often get confused with the other links, which suggests that they are likely used as default links in unclear cases. We also observed significant differences across the repositories, depending on how they are used and by whom.
... This work takes a holistic view on issue link types. We report on a study comparing the various types and their usage in 15 well-known public JIRA repositories [27]. By studying link types in JIRA, a widely used ITS in practice, we hope to create awareness about how the other types beyond duplicates are used and to inform more generalizable and reliable link type prediction. ...
... To answer RQ1, we first searched for a large ITS dataset, possibly of an issue tracker that allows customized link types and that is widely used in software development. The public JIRA dataset recently shared by Montgomery et al. [27] meets this requirement. JIRA offers various default link types and allows their customization, e.g., to support different workflows. ...
... We used a dataset of 15 public JIRA repositories [27]. Table 1 summarizes the analyzed repositories in terms of the number of issues, links, and link types. ...
Preprint
Software projects use Issue Tracking Systems (ITS) like JIRA to track issues and organize the workflows around them. Issues are often inter-connected via different links such as the default JIRA link types Duplicate, Relate, Block, or Subtask. While previous research has mostly focused on analyzing and predicting duplication links, this work aims at understanding the various other link types, their prevalence, and characteristics towards a more reliable link type prediction. For this, we studied 607,208 links connecting 698,790 issues in 15 public JIRA repositories. Besides the default types, the custom types Depend, Incorporate, Split, and Cause were also common. We manually grouped all 75 link types used in the repositories into five general categories: General Relation, Duplication, Composition, Temporal / Causal, and Workflow. Comparing the structures of the corresponding graphs, we observed several trends. For instance, Duplication links tend to represent simpler issue graphs, often with two components, and Composition links present the highest share of hierarchical tree structures (97.7%). Surprisingly, General Relation links have a significantly higher transitivity score than Duplication and Temporal / Causal links. Motivated by the differences between the link types and by their popularity, we evaluated the robustness of two state-of-the-art duplicate detection approaches from the literature on the JIRA dataset. We found that current deep-learning approaches confuse Duplication with other links in almost all repositories. On average, the classification accuracy dropped by 6% for one approach and 12% for the other. Extending the training sets with other link types seems to partly solve this issue. We discuss our findings and their implications for research and practice.
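To illustrate the kind of per-category graph analysis described above (connected components, tree-shaped components, transitivity), here is a small networkx sketch. The edge list and category mapping are toy data, not the JIRA dataset, and the metrics shown are only a subset of those in the study.

```python
# Sketch: per-link-category graph metrics (components, transitivity,
# share of tree-shaped components) with networkx. Edge data is a toy
# example, not the JIRA dataset.
import networkx as nx

links = [  # (issue_a, issue_b, category)
    ("A-1", "A-2", "Duplication"), ("A-3", "A-4", "Duplication"),
    ("B-1", "B-2", "Composition"), ("B-2", "B-3", "Composition"),
    ("C-1", "C-2", "General"), ("C-2", "C-3", "General"), ("C-1", "C-3", "General"),
]

for category in {c for *_, c in links}:
    g = nx.Graph([(a, b) for a, b, c in links if c == category])
    components = list(nx.connected_components(g))
    trees = sum(nx.is_tree(g.subgraph(comp)) for comp in components)
    print(category,
          "components:", len(components),
          "tree share:", trees / len(components),
          "transitivity:", round(nx.transitivity(g), 2))
```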
... Through this interactive learning experience, students gain a deeper understanding of agile principles within a 90-minute timeframe [6]. In terms of research resources, a dataset of 16 public Jira repositories with millions of issues, changes, comments, and issue links has been made available [7]. This dataset offers numerous opportunities for investigating issue evolution, linking, cross-project analysis, and cross-tool analysis when combined with other well-studied issue-tracking system datasets [7]. ...
Article
Full-text available
This article introduces an innovative project management tool developed using Visual Studio, C#, and Azure Optical Character Recognition (OCR) service. The tool digitizes scanned text, automatically generating ticket structures and tasks. It facilitates efficient project planning, user roles, task assignment, progress tracking, and customized reporting. Integration with Azure OCR enables automatic conversion of documents to digital formats. The tool streamlines task integration using modern software technologies. This tool empowers users to collaborate, set objectives, and efficiently manage projects.
... For this study, we extract TD issues from the Jira Public dataset [32], which is an out-of-distribution (OOD) dataset, as the model was initially trained on GitHub issues. We investigated generalizability on this OOD Jira dataset to see how well the model can adapt to other project management tools before and after fine-tuning our TD model. ...
... Recall also improved from 0.873 to 0.886, suggesting better identification of true TD issues. Overall accuracy saw a marked increase from 0.800 to 0.855, reflecting enhanced overall classification performance. Similar performance enhancements were observed across other datasets, including 'JIRA' [32] and 'VSCode' [31]. ...
Preprint
Full-text available
Technical Debt (TD) identification in software projects issues is crucial for maintaining code quality, reducing long-term maintenance costs, and improving overall project health. This study advances TD classification using transformer-based models, addressing the critical need for accurate and efficient TD identification in large-scale software development. Our methodology employs multiple binary classifiers for TD and its type, combined through ensemble learning, to enhance accuracy and robustness in detecting various forms of TD. We train and evaluate these models on a comprehensive dataset from GitHub Archive Issues (2015-2024), supplemented with industrial data validation. We demonstrate that in-project fine-tuned transformer models significantly outperform task-specific fine-tuned models in TD classification, highlighting the importance of project-specific context in accurate TD identification. Our research also reveals the superiority of specialized binary classifiers over multi-class models for TD and its type identification, enabling more targeted debt resolution strategies. A comparative analysis shows that the smaller DistilRoBERTa model is more effective than larger language models like GPTs for TD classification tasks, especially after fine-tuning, offering insights into efficient model selection for specific TD detection tasks. The study also assesses generalization capabilities using metrics such as MCC, AUC ROC, Recall, and F1 score, focusing on model effectiveness, fine-tuning impact, and relative performance. By validating our approach on out-of-distribution and real-world industrial datasets, we ensure practical applicability, addressing the diverse nature of software projects.
... Moreover, having a skilled customer or a product owner write good user stories is often far from reality for many software projects. It is thus common to see teams just keeping track of the tasks to be implemented and issues to be resolved [17,18]. The convenience of focusing on recording development and maintenance tasks comes with the risk that the development tasks and the actual needs of the users may drift apart. ...
Conference Paper
A central challenge for ensuring the success of software projects is to assure the convergence of developers' and users' views. While the availability of large amounts of user data from social media, app store reviews, and support channels bears many benefits, it still remains unclear how software development teams can effectively use this data. We present an LLM-powered approach called DeeperMatcher that helps agile teams use crowd-based requirements engineering (CrowdRE) in their issue and task management. We are currently implementing a command-line tool that enables developers to match issues with relevant user reviews. We validated our approach on an existing English dataset from a well-known open-source project. Additionally, to check how well DeeperMatcher works for other languages, we conducted a single-case mechanism experiment alongside developers of a local project that has issues and user feedback in Brazilian Portuguese. Our preliminary analysis indicates that the accuracy of our approach is highly dependent on the text embedding method used. We discuss further refinements needed for reliable crowd-based requirements engineering with multilingual support.
... Moreover, having a skilled customer or a product owner write good user stories is often far from reality for many software projects. It is thus common to see teams just keeping track of the tasks to be implemented and issues to be resolved [17,18]. The convenience of focusing on recording development and maintenance tasks comes with the risk that the development tasks and the actual needs of the users may drift apart. ...
Preprint
Full-text available
A central challenge for ensuring the success of software projects is to assure the convergence of developers' and users' views. While the availability of large amounts of user data from social media, app store reviews, and support channels bears many benefits, it still remains unclear how software development teams can effectively use this data. We present an LLM-powered approach called DeeperMatcher that helps agile teams use crowd-based requirements engineering (CrowdRE) in their issue and task management. We are currently implementing a command-line tool that enables developers to match issues with relevant user reviews. We validated our approach on an existing English dataset from a well-known open-source project. Additionally, to check how well DeeperMatcher works for other languages, we conducted a single-case mechanism experiment alongside developers of a local project that has issues and user feedback in Brazilian Portuguese. Our preliminary analysis indicates that the accuracy of our approach is highly dependent on the text embedding method used. We discuss further refinements needed for reliable crowd-based requirements engineering with multilingual support.
... By focusing on the interactions and communication patterns among team members through the lens of Jira issues, our technique is intended to provide a better understanding of the actual processes employed by the teams. Jira is a widely established issue tracker with integrated tooling, such as project management via Kanban boards [21], [23], [24]. Therefore, any technique capable of mining processes from Jira issues can assist practitioners and researchers in comprehending their development processes. ...
... To this end, the tool needs an interface to the requirements entities, context information about the involved agents, and context information about the organization. The former two are often available in a requirements tracking system like Jira, while a company likely has to generate and provide the latter manually [83]. Once provided with the necessary information, the tool characterizes both entities and context, i.e., quantifies the natural language requirements entities and the elusive factors determining the context. The quantified entities and context serve as input to the impact prediction model as described in Sect. ...
Article
Full-text available
High-quality requirements minimize the risk of propagating defects to later stages of the software development life cycle. Achieving a sufficient level of quality is a major goal of requirements engineering. This requires a clear definition and understanding of requirements quality. Though recent publications make an effort at disentangling the complex concept of quality, the requirements quality research community lacks identity and clear structure which guides advances and puts new findings into a holistic perspective. In this research commentary, we contribute (1) a harmonized requirements quality theory organizing its core concepts, (2) an evaluation of the current state of requirements quality research, and (3) a research roadmap to guide advancements in the field. We show that requirements quality research focuses on normative rules and mostly fails to connect requirements quality to its impact on subsequent software development activities, impeding the relevance of the research. Adherence to the proposed requirements quality theory and following the outlined roadmap will be a step toward amending this gap.
... A first Jira repository dataset was created in 2015, containing more than 700K issue reports and more than 2 million issue comments extracted from the Jira issue tracking system of the Apache Software Foundation, Spring, JBoss and CodeHaus OSS communities [134]. A more recent dataset created in 2022 gathers data from 16 public Jira repositories containing 1822 projects and spanning 2.7 million issues with a combined total of 32 million changes, 9 million comments, and 1 million issue links [122,123]. ...
Chapter
This chapter defines and presents the kinds of software ecosystems that are targeted in this book. The focus is on the development, tooling and analytics aspects of "software ecosystems", i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are developing and maintaining. The technical and social dependencies between these developers and software components form a socio-technical dependency network, and the dynamics of this network change over time. We classify and provide several examples of such ecosystems, many of which will be explored in further detail in the subsequent chapters of the book. The chapter also introduces and clarifies the relevant terms needed to understand and analyse these ecosystems, as well as the techniques and research methods that can be used to analyse different aspects of these ecosystems.
... State bias in an ITS other than Bugzilla: In RQ1, the experimental results show that there is no significant difference in tool performance between the initial states and the latest states. Besides Bugzilla, we leverage the datasets shared by Montgomery et al. [37] and recover the states of issues in Jira (i.e., Hadoop and Spark) to the end of the submission day. We then perform the same experiments as RQ1 for state bias and report the results in Table 11. ...
Article
Full-text available
Many Duplicate Bug Report Detection (DBRD) techniques have been proposed in the research literature. The industry uses some other techniques. Unfortunately, there is insufficient comparison among them, and it is unclear how far we have been. This work fills this gap by comparing the aforementioned techniques. To compare them, we first need a benchmark that can estimate how a tool would perform if applied in a realistic setting today. Thus, we first investigated potential biases that affect the fair comparison of the accuracy of DBRD techniques. Our experiments suggest that data age and issue tracking system choice cause a significant difference. Based on these findings, we prepared a new benchmark. We then used it to evaluate DBRD techniques to estimate better how far we have been. Surprisingly, a simpler technique outperforms recently proposed sophisticated techniques on most projects in our benchmark. In addition, we compared the DBRD techniques proposed in research with those used in Mozilla and VSCode . Surprisingly, we observe that a simple technique already adopted in practice can achieve comparable results as a recently proposed research tool. Our study gives reflections on the current state of DBRD, and we share our insights to benefit future DBRD research.
... State bias in an ITS other than Bugzilla: In RQ1, the experimental results show that there is no significant difference in tool performance between the initial states and the latest states. Besides Bugzilla, we leverage the datasets shared by Montgomery et al. [37] and recover the states of issues in Jira (i.e., Hadoop and Spark) to the end of the submission day. We then perform the same experiments as RQ1 for state bias and report the results in Table 11. ...
Preprint
Many Duplicate Bug Report Detection (DBRD) techniques have been proposed in the research literature. The industry uses some other techniques. Unfortunately, there is insufficient comparison among them, and it is unclear how far we have been. This work fills this gap by comparing the aforementioned techniques. To compare them, we first need a benchmark that can estimate how a tool would perform if applied in a realistic setting today. Thus, we first investigated potential biases that affect the fair comparison of the accuracy of DBRD techniques. Our experiments suggest that data age and issue tracking system choice cause a significant difference. Based on these findings, we prepared a new benchmark. We then used it to evaluate DBRD techniques to estimate better how far we have been. Surprisingly, a simpler technique outperforms recently proposed sophisticated techniques on most projects in our benchmark. In addition, we compared the DBRD techniques proposed in research with those used in Mozilla and VSCode. Surprisingly, we observe that a simple technique already adopted in practice can achieve comparable results as a recently proposed research tool. Our study gives reflections on the current state of DBRD, and we share our insights to benefit future DBRD research.
Chapter
In this chapter, we share an experience report of teaching a master course on empirical research methods at Eindhoven University of Technology in the Netherlands. The course is taught for ten weeks to a mix of students from different study programs and combines both practical assignments with a closed-book exam. We discuss the challenges of teaching a course on research methods and explain how we address these challenges in the course design. Additionally, we share our lessons learned and the challenges we encountered over several iterations of teaching the course.
Preprint
In this chapter, we share an experience report of teaching a master course on empirical research methods at Eindhoven University of Technology in the Netherlands. The course is taught for ten weeks to a mix of students from different study programs and combines both practical assignments with a closed-book exam. We discuss the challenges of teaching a course on research methods and explain how we address these challenges in the course design. Additionally, we share our lessons learned and the do's and don'ts we learned over several iterations of teaching the course.
Conference Paper
The accurate estimation of project effort is a crucial objective in software development processes. Given that the communication of task descriptions typically occurs in natural language texts, Natural Language Processing (NLP) methods within machine learning have the potential to provide rapid and effective means of estimation. The objective of this study was to enhance the accuracy of the effort estimation classification task for software requirements by proposing an NLP-based software requirements effort estimation model for agile software development processes. A semi-supervised noise filtering mechanism based on k-means clustering with tf-idf embeddings was implemented and evaluated. The software requirement documents were represented by FastText embeddings, and then a fastText classifier was used to predict the expected effort of a given requirement text. The effectiveness of the implemented model was evaluated, and it was revealed that the application of noise filtering improved the performance, reaching an accuracy of 96.8%.
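A minimal sketch of the noise-filtering idea described above: cluster tf-idf vectors of requirement texts with k-means and discard the points farthest from their cluster centroid. The distance threshold and toy data are assumptions; the cited study's exact filtering rule and the downstream FastText classifier are not reproduced here.

```python
# Sketch: filter "noisy" requirement texts by clustering tf-idf vectors
# with k-means and dropping the points farthest from their centroid.
# The 90th-percentile threshold is an assumption; the cited study's
# exact filtering rule may differ.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

requirements = [
    "As a user I want to export reports as PDF",
    "Add CSV export for the reporting module",
    "Login page crashes on wrong password",
    "asdf lorem ipsum placeholder text",
]

X = TfidfVectorizer().fit_transform(requirements)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
distances = np.linalg.norm(X.toarray() - kmeans.cluster_centers_[kmeans.labels_], axis=1)
threshold = np.percentile(distances, 90)
kept = [text for text, d in zip(requirements, distances) if d <= threshold]
print(kept)
```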
Chapter
The use of third-party packages is becoming increasingly popular and has led to the emergence of large software package ecosystems with a maze of interdependencies. Since the reliance on these ecosystems enables developers to reduce development effort and increase productivity, it has attracted the interest of researchers: understanding the infrastructure and dynamics of package ecosystems has given rise to approaches for better code reuse, automated updates, and the avoidance of vulnerabilities, to name a few examples. But the reality of these ecosystems also poses challenges to software engineering researchers, such as the following: How do we obtain the complete network of dependencies along with the corresponding versioning information? What are the boundaries of these package ecosystems? How do we consistently detect dependencies that are declared but not used? How do we consistently identify developers within a package ecosystem? How much of the ecosystem do we need to understand to analyze a single component? How well do our approaches generalize across different programming languages and package ecosystems? In this chapter, we review promises and perils of mining the rich data related to software package ecosystems available to software engineering researchers.
Chapter
Knowledge graph representation learning (KGRL) aims to project the entities and relations into a continuous low-dimensional knowledge graph space to be used for knowledge graph completion and detecting new triples. Using textual descriptions for entity representation learning has been a key topic. However, the current work has two major constraints: (1) some entities do not have any associated descriptions; (2) the associated descriptions are usually phrases, and they do not contain enough information. This paper presents a novel KGRL method for learning effective embeddings by generating meaningful descriptive sentences from entities' connections. The experiments using four public datasets and a new proposed dataset show that the New Description-Embodied Knowledge Graph Embedding (NDKGE for short) approach introduced in this paper outperforms most of the existing work in the task of link prediction. The code and datasets of this paper can be obtained from GitHub (https://github.com/MiaoHu-Pro/NDKGE). Keywords: Knowledge graph embedding, Entity description, Constructing new descriptions, Link prediction.
Preprint
Full-text available
Self-admitted technical debt (SATD) is a particular case of Technical Debt (TD) where developers explicitly acknowledge their sub-optimal implementation decisions. Previous studies mine SATD by searching for specific TD-related terms in source code comments. By contrast, in this paper we argue that developers can admit technical debt by other means, e.g., by creating issues in tracking systems and labelling them as referring to TD. We refer to this type of SATD as issue-based SATD or just SATD-I. We study a sample of 286 SATD-I instances collected from five open source projects, including Microsoft Visual Studio and GitLab Community Edition. We show that only 29% of the studied SATD-I instances can be tracked to source code comments. We also show that SATD-I issues take more time to be closed, compared to other issues, although they are not more complex in terms of code churn. Besides, in 45% of the studied issues TD was introduced to ship earlier, and in almost 60% it refers to Design flaws. Finally, we report that most developers pay SATD-I to reduce its costs or interests (66%). Our findings suggest that there is space for designing novel tools to support technical debt management, particularly tools that encourage developers to create and label issues containing TD concerns.
Article
Full-text available
Many community-based open source software (OSS) projects depend on a continuous influx of newcomers for their survival and continuity; yet, newcomers face many barriers to contributing to a project for the first time, leading in many cases to dropouts. In this paper, we provide guidelines for both OSS communities interested in receiving more external contributions, and newcomers who want to contribute to OSS projects. These guidelines are based on our previous work, which characterized barriers encountered by newcomers and proposed tools to support them in overcoming these barriers. Since newcomers are critical for OSS growth and continuity, our work may help increase contributions to OSS projects, as well as promote a more diverse community.
Conference Paper
Full-text available
Communication about requirements is often handled in issue tracking systems, especially in a distributed setting. As issue tracking systems also contain bug reports or programming tasks, the software feature requests of the users are often difficult to identify. This paper investigates natural language processing and machine learning features to detect software feature requests in natural language data of issue tracking systems. It compares traditional linguistic machine learning features, such as "bag of words", with more advanced features, such as subject-action-object, and evaluates combinations of machine learning features derived from the natural language and features taken from the issue tracking system meta-data. Our investigation shows that some combinations of machine learning features derived from natural language and the issue tracking system meta-data outperform traditional approaches. We show that issues or data fields (e.g. descriptions or comments), which contain software feature requests, can be identified reasonably well, but hardly the exact sentence. Finally, we show that the choice of machine learning algorithms should depend on the goal, e.g. maximization of the detection rate or balance between detection rate and precision. In addition, the paper contributes a double coded gold standard and an open-source implementation to further pursue this topic.
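As a rough illustration of combining natural-language features with issue-tracker meta-data for feature-request detection, the sketch below joins bag-of-words text features with a one-hot encoded issue type in a single scikit-learn pipeline. The field names, toy data, and classifier choice are assumptions, not the feature set evaluated in the paper.

```python
# Sketch: combine bag-of-words text features with issue-tracker
# metadata to detect feature requests. Field names and toy data are
# assumptions for illustration only.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

issues = pd.DataFrame({
    "text": ["Please add dark mode to the editor",
             "NullPointerException when opening settings",
             "Support exporting projects as zip archives",
             "App crashes after the latest update"],
    "issue_type": ["Improvement", "Bug", "New Feature", "Bug"],
    "is_feature_request": [1, 0, 1, 0],
})

model = Pipeline([
    ("features", ColumnTransformer([
        ("bow", CountVectorizer(), "text"),
        ("meta", OneHotEncoder(handle_unknown="ignore"), ["issue_type"]),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(issues, issues["is_feature_request"])
print(model.predict(pd.DataFrame({"text": ["Add an option to mute notifications"],
                                  "issue_type": ["Improvement"]})))
```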
Conference Paper
Full-text available
Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and, recently, investigating developers' "affectiveness". In particular, the Jira Issue Tracking System is a proprietary tracking system that has gained tremendous popularity in recent years and offers unique features like the project management system and the Jira agile kanban board. This paper presents a dataset extracted from the Jira ITS of four popular open source ecosystems, the Apache Software Foundation, Spring, JBoss, and CodeHaus communities, as well as the tools and infrastructure used for extraction. Our dataset hosts more than 1K projects, containing more than 700K issue reports and more than 2 million issue comments. Using this data, we have been able to deeply study the communication process among developers, and how this aspect affects the development process. Furthermore, comments posted by developers contain not only technical information, but also valuable information about sentiments and emotions. Since sentiment analysis and human aspects in software engineering are gaining more and more importance in recent years, with this repository we would like to encourage further studies in this direction.
Conference Paper
Full-text available
Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and (recently) investigating developer affectiveness. For the latter, issue tracking systems can be mined to explore developers' emotions, sentiments and politeness (affects for short). However, research on affect detection in software artefacts is still in its early stage due to the lack of manually validated data and tools. In this paper, we contribute to the research of affects on software artefacts by providing a labeling of emotions present in issue comments. We manually labeled 2,000 issue comments and 4,000 sentences written by developers with emotions such as love, joy, surprise, anger, sadness and fear. Labeled comments and sentences are linked to software artefacts reported in our previously published dataset (containing more than 1K projects, more than 700K issue reports and more than 2 million issue comments). The enriched dataset presented in this paper allows the investigation of the role of affects in software development.
Conference Paper
Full-text available
While requirements for open source projects originate from a variety of sources such as mailing lists or blogs, typically, they eventually end up as feature requests in an issue tracking system. When analyzing how these issue trackers are used for requirements evolution, we witnessed a high percentage of duplicates in a number of high-profile projects. Further investigation of six open source projects and their users led us to a number of important observations and a categorization of the root causes of these duplicates. Based on this, we propose a set of improvements for future issue tracking systems.
Article
Full-text available
An understanding of how to manage relationships with customers effectively has become an important topic for both academicians and practitioners in recent years. However, the existing academic literature and the practical applications of customer relationship management (CRM) strategies do not provide a clear indication of what specifically constitutes CRM processes. In this study, the authors (1) conceptualize a construct of the CRM process and its dimensions, (2) operationalize and validate the construct, and (3) empirically investigate the organizational performance consequences of implementing CRM processes. Their research questions are addressed in two cross-sectional studies across four different industries and three countries. The first key outcome is a theoretically sound CRM process measure that outlines three key stages: initiation, maintenance, and termination. The second key result is that the implementation of CRM processes has a moderately positive association with both perceptual and objective company performance.
Conference Paper
Full-text available
In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all bug reports to be misclassified—that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug prediction models, confusing bugs and features: On average, 39% of files marked as defective actually never had a bug. We discuss the impact of this misclassification on earlier studies and recommend manual data validation for future studies.
Article
Full-text available
In software development, bug reports provide crucial information to developers. However, these reports widely differ in their quality. We conducted a survey among developers and users of APACHE, ECLIPSE, and MOZILLA to find out what makes a good bug report. The analysis of the 466 responses revealed an information mismatch between what developers need and what users supply. Most developers consider steps to reproduce, stack traces, and test cases as helpful, which are, at the same time, most difficult to provide for users. Such insight is helpful for designing new bug tracking tools that guide users at collecting and providing more helpful information. Our CUEZILLA prototype is such a tool and measures the quality of new bug reports; it also recommends which elements should be added to improve the quality. We trained CUEZILLA on a sample of 289 bug reports, rated by developers as part of the survey. The participants of our survey also provided 175 comments on hurdles in reporting and resolving bugs. Based on these comments, we discuss several recommendations for better bug tracking systems, which should focus on engaging bug reporters, better tool support, and improved handling of bug duplicates.
Article
Full-text available
Thematic analysis is a poorly demarcated, rarely acknowledged, yet widely used qualitative analytic method within psychology. In this paper, we argue that it offers an accessible and theoretically flexible approach to analysing qualitative data. We outline what thematic analysis is, locating it in relation to other qualitative analytic methods that search for themes or patterns, and in relation to different epistemological and ontological positions. We then provide clear guidelines to those wanting to start thematic analysis, or conduct it in a more deliberate and rigorous way, and consider potential pitfalls in conducting thematic analysis. Finally, we outline the disadvantages and advantages of thematic analysis. We conclude by advocating thematic analysis as a useful and flexible method for qualitative research in and beyond psychology.
Conference Paper
Full-text available
Thematic analysis is an approach that is often used for identifying, analyzing, and reporting patterns (themes) within data in primary qualitative research. 'Thematic synthesis' draws on the principles of thematic analysis and identifies the recurring themes or issues from multiple studies, interprets and explains these themes, and draws conclusions in systematic reviews. This paper conceptualizes the thematic synthesis approach in software engineering as a scientific inquiry involving five steps that parallel those of primary research. The process and outcome associated with each step are described and illustrated with examples from systematic reviews in software engineering.
Conference Paper
Full-text available
The severity of a reported bug is a critical factor in deciding how soon it needs to be fixed. Unfortunately, while clear guidelines exist on how to assign the severity of a bug, it remains an inherent manual process left to the person reporting the bug. In this paper we investigate whether we can accurately predict the severity of a reported bug by analyzing its textual description using text mining algorithms. Based on three cases drawn from the open-source community (Mozilla, Eclipse and GNOME), we conclude that given a training set of sufficient size (approximately 500 reports per severity), it is possible to predict the severity with a reasonable accuracy (both precision and recall vary between 0.65-0.75 with Mozilla and Eclipse; 0.70-0.85 in the case of GNOME).
Conference Paper
Full-text available
An open source project typically maintains an open bug repository so that bug reports from all over the world can be gathered. When a new bug report is submitted to the repository, a person, called a triager, examines whether it is a duplicate of an existing bug report. If it is, the triager marks it as DUPLICATE and the bug report is removed from consideration for further work. In the literature, there are approaches exploiting only natural language information to detect duplicate bug reports. In this paper we present a new approach that further involves execution information. In our approach, when a new bug report arrives, its natural language information and execution information are compared with those of the existing bug reports. Then, a small number of existing bug reports are suggested to the triager as the most similar bug reports to the new bug report. Finally, the triager examines the suggested bug reports to determine whether the new bug report duplicates an existing bug report. We calibrated our approach on a subset of the Eclipse bug repository and evaluated our approach on a subset of the Firefox bug repository. The experimental results show that our approach can detect 67%-93% of duplicate bug reports in the Firefox bug repository, compared to 43%-72% using natural language information alone.
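A simplified sketch of the core idea above, combining textual similarity with execution-information overlap to rank duplicate candidates: tf-idf cosine similarity for the report text and Jaccard overlap for stack frames, mixed with fixed weights. The weights, toy data, and similarity measures are assumptions rather than the calibration used in the cited approach.

```python
# Sketch: rank candidate duplicates by a weighted combination of
# textual similarity (tf-idf cosine) and execution-trace overlap
# (Jaccard over stack frames). The 0.7/0.3 weights are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

existing = [
    {"text": "Crash when saving large file", "frames": {"save", "flushBuffer", "writeDisk"}},
    {"text": "Toolbar icons misaligned on HiDPI", "frames": {"renderToolbar", "scaleIcons"}},
]
new_report = {"text": "Application crashes while saving a big document",
              "frames": {"save", "flushBuffer", "allocBuffer"}}

vectorizer = TfidfVectorizer().fit([r["text"] for r in existing] + [new_report["text"]])
new_vec = vectorizer.transform([new_report["text"]])

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

scores = []
for report in existing:
    text_sim = cosine_similarity(new_vec, vectorizer.transform([report["text"]]))[0, 0]
    exec_sim = jaccard(new_report["frames"], report["frames"])
    scores.append((0.7 * text_sim + 0.3 * exec_sim, report["text"]))

for score, text in sorted(scores, reverse=True):
    print(f"{score:.2f}  {text}")
```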
Conference Paper
Full-text available
Online feature request management systems are popular tools for gathering stakeholder requirements during system evolution. Deciding which feature requests require attention and how much upfront analysis to perform on them is an important problem in this context: too little upfront analysis may result in inadequate functionalities being developed, costly changes, and wasted development effort; too much upfront analysis is a waste of time and resources. Early predictions about which feature requests are most likely to fail due to insufficient or inadequate upfront analysis could facilitate such decisions. Our objective is to study whether it is possible to make such predictions automatically from the characteristics of the online discussions on feature requests. The paper presents a tool-implemented framework that automatically constructs failure prediction models using machine-learning classification algorithms and compares the performance of the different techniques for the Firefox and Netbeans projects. The comparison relies on a cost-benefit model for assessing the value of additional upfront analysis. In this model, the value of additional upfront analysis depends on its probability of success in preventing failures and on the relative cost of the failures it prevents compared to its own cost. We show that for reasonable estimations of these two parameters automated prediction models provide more value than a set of baselines for some failure types and projects. This suggests that automated failure prediction during online requirements elicitation may be a promising approach for guiding requirements engineering efforts in online settings.
Conference Paper
Technical debt is a metaphor introduced by Cunningham to indicate "not quite right code which we postpone making it right". Examples of technical debt are code smells and bug hazards. Several techniques have been proposed to detect different types of technical debt. Among those, Potdar and Shihab defined heuristics to detect instances of self-admitted technical debt in code comments, and used them to perform an empirical study on five software systems to investigate the phenomenon. Still, very little is known about the diffusion and evolution of technical debt in software projects. This paper presents a differentiated replication of the work by Potdar and Shihab. We run a study across 159 software projects to investigate the diffusion and evolution of self-admitted technical debt and its relationship with software quality. The study required the mining of over 600K commits and 2 billion comments as well as a qualitative analysis performed via open coding. Our main findings show that self-admitted technical debt (i) is diffused, with an average of 51 instances per system, (ii) is mostly represented by code (30%), defect, and requirement debt (20% each), (iii) increases over time due to the introduction of new instances that are not fixed by developers, and (iv) even when fixed, it survives a long time (over 1,000 commits on average) in the system.
Conference Paper
Many successful software projects do not follow the commonly assumed best practice of engineering well-formed requirements at project inception. Instead, the requirements are captured less formally, and only fully elaborated once the implementation begins, known as 'just-in-time' requirements. Given the apparent disparity between best practices and actual practices, several questions arise. One concerns the nature of requirements engineering in non-traditional forms. What types of tools and practices are used? Another is formative: what types of problems are encountered in just-in-time requirements, and how might we support organizations in solving those problems? In this paper we conduct separate case studies on the requirements practices of three open-source software projects. Using an individual task as the unit of analysis, we study how the project proceeds from requirement to implementation, in order to understand how each project manages requirements. We then comment on the benefits and problems of just-in-time requirements analysis. This allows us to propose research directions about requirements engineering in just-in-time settings. In particular, we see the need to better understand the context of practice, and the need to properly evaluate the cost of decisions. We propose a taxonomy to describe the requirements practices spectrum from fully formal to just-in-time.
Conference Paper
A critical item of a bug report is the so-called "severity", i.e. the impact the bug has on the successful execution of the software system. Consequently, tool support for the person reporting the bug in the form of a recommender or verification system is desirable. In previous work we made a first step towards such a tool: we demonstrated that text mining can predict the severity of a given bug report with a reasonable accuracy given a training set of sufficient size. In this paper we report on a follow-up study where we compare four well-known text mining algorithms (namely, Naïve Bayes, Naïve Bayes Multinomial, K-Nearest Neighbor, and Support Vector Machines) with respect to accuracy and training set size. We discovered that for the cases under investigation (two open source systems: Eclipse and GNOME) Naïve Bayes Multinomial outperforms the other proposed algorithms.
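A minimal sketch of the severity-prediction setup compared above, using Multinomial Naive Bayes over a bag-of-words representation of bug-report text via scikit-learn. The toy training data and two-class label set are illustrative; the studies use far larger per-severity training sets.

```python
# Sketch: predict bug-report severity from its textual description
# with Multinomial Naive Bayes. The toy training data is illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reports = ["Data loss when the application crashes during save",
           "Typo in the preferences dialog label",
           "Security vulnerability allows remote code execution",
           "Minor visual glitch in the about box"]
severity = ["critical", "trivial", "critical", "trivial"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reports, severity)
print(model.predict(["Crash with data corruption on shutdown"]))
```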
Conference Paper
A bug report is typically assigned to a single developer who is then responsible for fixing the bug. In Mozilla and Eclipse, between 37%-44% of bug reports are "tossed" (reassigned) to other developers, for example because the bug has been assigned by accident or another developer with additional expertise is needed. In any case, tossing increases the time-to-correction for a bug. In this paper, we introduce a graph model based on Markov chains, which captures bug tossing history. This model has several desirable qualities. First, it reveals developer networks which can be used to discover team structures and to find suitable experts for a new task. Second, it helps to better assign developers to bug reports. In our experiments with 445,000 bug reports, our model reduced tossing events by up to 72%. In addition, the model increased the prediction accuracy by up to 23 percentage points compared to traditional bug triaging approaches.
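A small sketch of the idea behind a Markov-chain tossing model: estimate transition probabilities between developers from historical tossing paths and suggest the most likely next assignee. The toy paths are illustrative, and the cited model includes additional refinements beyond this first-order chain.

```python
# Sketch: build a first-order Markov model of bug tossing from
# reassignment histories and suggest the most likely next developer.
# The toy tossing paths are illustrative only.
from collections import Counter, defaultdict

tossing_paths = [
    ["alice", "bob", "carol"],
    ["alice", "carol"],
    ["bob", "carol"],
    ["alice", "bob", "dave"],
]

transitions = defaultdict(Counter)
for path in tossing_paths:
    for src, dst in zip(path, path[1:]):
        transitions[src][dst] += 1

def next_developer(current: str) -> str:
    """Return the developer most often tossed to from `current`."""
    counts = transitions[current]
    total = sum(counts.values())
    probs = {dev: n / total for dev, n in counts.items()}
    return max(probs, key=probs.get)

print(next_developer("alice"))  # most probable next assignee after alice
```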
Customer support ticket escalation prediction using feature engineering
  • Lloyd Montgomery
  • Daniela Damian