Conference Paper

TD Classifier: Automatic Identification of Java Classes with High Technical Debt


Abstract

To date, the identification and quantification of Technical Debt (TD) rely heavily on a few sophisticated tools that check for violations of certain predefined rules, usually through static analysis. Different tools result in divergent TD estimates, calling into question the reliability of findings derived by a single tool. To alleviate this issue, we present a tool that employs machine learning on a dataset built upon the convergence of three widely adopted TD assessment tools to automatically assess the class-level TD for any arbitrary Java project. The proposed tool is able to classify software classes as high-TD or not, by synthesizing source code and repository activity information retrieved by employing four popular open source analyzers. The classification results are combined with proper visualization techniques, to enable the identification of classes that are more likely to be problematic. To demonstrate the proposed tool and evaluate its usefulness, a case study is conducted based on a real-world open-source software project. The proposed tool is expected to facilitate TD management activities and enable further experimentation through its use in an academic or industrial setting.
Video: https://youtu.be/umgXU8u7lIA
Running Instance: http://160.40.52.130:3000/tdclassifier
Source Code: https://gitlab.seis.iti.gr/root/td-classifier.git


... We obtain raw data from JIRA and GitHub, utilizing the Pydriller tool to extract required metrics. These are commonly used data sources and data collection tools among researchers (Abidi et al., 2021;Tsoukalas et al., 2022;Qiu, 2022). Besides, we have provided a detailed data collection process (see Section 3.5.2) and code (Li et al., 2024) to ensure the standardization of our data collection procedure. ...
Preprint
Full-text available
Context: An increasing number of software systems are written in multiple programming languages (PLs); these are called multi-programming-language (MPL) systems. MPL bugs (MPLBs) are bugs whose resolution involves multiple PLs. Despite the high complexity of MPLB resolution, MPLB prediction methods are lacking. Objective: This work aims to construct just-in-time (JIT) MPLB prediction models with selected prediction metrics, analyze the significance of the metrics, and then evaluate the performance of cross-project JIT MPLB prediction. Method: We develop JIT MPLB prediction models with the selected metrics using machine learning algorithms and evaluate the models in within-project and cross-project contexts with our constructed dataset based on 18 Apache MPL projects. Results: Random Forest is appropriate for JIT MPLB prediction. Changed LOC of all files, added LOC of all files, and the current total number of lines across all files of the project are the most crucial metrics in JIT MPLB prediction. The prediction models can be simplified using a few top-ranked metrics. Training on the dataset from multiple projects can yield significantly higher AUC than training on the dataset from a single project for cross-project JIT MPLB prediction. Conclusions: JIT MPLB prediction models can be constructed with the selected set of metrics, which can be reduced to build simplified JIT MPLB prediction models, and cross-project JIT MPLB prediction is feasible.
Article
Full-text available
Background: The life cycle of a technical debt item, from its identification to its payment, is long and may include several activities, such as identification and management. A large body of research addresses different sets of these activities by different means. Specifically, several tools have already tackled technical debt identification. However, only a few studies have empirically assessed those tools. Method: In this article, we carried out multi-method research. We first surveyed the literature for the available technical debt tools and then evaluated two of them, eXcomment and DebtHunter, which aim at the identification of self-admitted technical debt. Results: We found 97 tools employing different approaches to support technical debt life cycle management. Most of them (59%) address only the high-level task of management, instead of actually identifying and paying the debt. Additionally, in our empirical evaluation of tools, our results show that DebtHunter found only 7% of the debt identified by eXcomment. Conversely, eXcomment found 19.9% of the debt found by DebtHunter. Moreover, both tools have low levels of precision and recall. Conclusion: It is hard to find technical debt through comments. Both tools can find indicators of debt items; however, they struggle with precision and recall. In fact, although eXcomment and DebtHunter diverge on the amount of debt identified, they seem to converge with regard to the type of debt present in the system under evaluation.
Article
In recent years, we have witnessed an important increase in research focusing on how machine learning (ML) techniques can be used for software quality assessment and improvement. However, the derived methodologies and tools lack transparency, due to the black-box nature of the employed machine learning models, leading to decreased trust in their results. To address this shortcoming, in this paper we extend the state of the art and practice by building explainable AI models on top of machine learning ones. We use them to interpret the factors (i.e., software metrics) that characterize a module as at risk of having high technical debt (HIGH TD) and to obtain thresholds for metric scores that signal poor maintainability. Finally, we dig further to achieve local interpretation that explains the specific problems of each module, pinpointing specific opportunities for improvement during TD management. To achieve this goal, we have developed project-specific classifiers (characterizing modules as HIGH and NOT-HIGH TD) for 21 open-source projects, and we explain their rationale using the SHapley Additive exPlanation (SHAP) analysis. Based on our analysis, complexity, comments ratio, cohesion, nesting of control flow statements, coupling, refactoring activity, and code churn are the most important reasons for characterizing classes as at risk of HIGH TD. The analysis is complemented with global and local means of interpretation, such as metric thresholds and case-by-case reasoning for characterizing a class as at risk of having HIGH TD. The results of the study are compared against the state of the art and are interpreted from the point of view of both researchers and practitioners.
Article
Full-text available
Technical debt (TD) refers to the phenomenon that developers choose a compromise solution from a short-term benefit perspective during design or architecture selection. TD-related issues, such as code smells, may have a critical impact on important non-functional requirements. Different severity levels of TD issues require different measures to be taken by developers in the future. Existing studies mainly focus on detecting TD in software projects through source code or comments, but usually ignore the severity degree of TD issues. As a matter of fact, it is very important to identify the severity of TD issues and clarify which TD should be prioritized. In this paper, we propose an approach that combines the semantic and structural information of the code snippets to identify their severity at method level. In the approach, we first transform each method affected by TD issues into an abstract syntax tree (AST) and use the paths in the AST to represent its semantic information. Then, we extract different code metrics to measure the size, coupling, and complexity of methods affected by TD issues to represent their structural information. Finally, we build a stacking ensemble model to identify the severity of TD issues by using Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) for the base classifiers and Support Vector Machine (SVM) for the meta-classifier. The evaluation results on the real dataset show that our approach achieves 65.77% in terms of precision, 68.18% in terms of recall, and 65.84% in terms of F1-score on average. In addition, the experimental results also demonstrate that the strategy of combining the semantic and structural information of code snippets is effective in improving the effectiveness of our approach.
Article
Quality improvement can be performed at the: (a) micro-management level: interventions applied at a fine-grained level (e.g., at a class or method level, by applying a refactoring); or (b) macro-management level: interventions applied at a large scale (e.g., at project level, by using a new framework or imposing a quality gate). By considering that the outcome of any activity can be characterized as the product of impact and scale, in this paper we aim at exploring the impact of Technical Debt (TD) Macro-Management, whose scale is by definition larger than that of TD Micro-Management. Because TD artifacts reside at the micro-level, the problem calls for a nested model solution; i.e., modeling the structure of the problem: artifacts have some inherent characteristics (e.g., size and complexity), but obey the same project management rules (e.g., quality gates, CI/CD features, etc.). In this paper, we use the Under-Bagging based Generalized Linear Mixed Models approach to unveil project management activities that are associated with the existence of HIGH_TD artifacts, through an empirical study on 100 open-source projects. The results of the study confirm that micro-management parameters are associated with the probability of a class being classified as HIGH_TD, but the results can be further improved by controlling some project-level parameters. Based on the findings of our nested analysis, we can advise practitioners on macro-technical debt management approaches (such as “control the number of commits per day”, “adopt quality control practices”, and “separate testing and development teams”) that can significantly reduce the probability that software artifacts concentrate HIGH_TD. Although some of these findings are intuitive, this is the first work that delivers empirical quantitative evidence on the relation between TD values and project- or process-level metrics.
Article
Full-text available
There are numerous commercial tools and research prototypes that offer support for measuring technical debt. However, different tools adopt different terms, metrics, and ways to identify and measure technical debt. These tools offer diverse features, and their popularity and community support vary significantly. Therefore, (a) practitioners face difficulties when trying to select a tool matching their needs; and (b) the concept of technical debt and its role in software development is blurred. We attempt to clarify the situation by comparing the features and popularity of technical debt measurement tools, and analyzing the existing empirical evidence on their validity. Our findings can help practitioners find the most suitable tool for their purposes, and help researchers by highlighting the current tool shortcomings.
Article
Full-text available
Software teams are often asked to deliver new features within strict deadlines, leading developers to deliberately or inadvertently serve “not quite right code” that compromises software quality and maintainability. This non-ideal state of software is efficiently captured by the Technical Debt (TD) metaphor, which reflects the additional effort that has to be spent to maintain software. Although several tools are available for assessing TD, each tool essentially checks software against a particular ruleset. The use of different rulesets can often be beneficial as it leads to the identification of a wider set of problems; however, for the common usage scenario where developers or researchers rely on a single tool, diverse estimates of TD and the identification of different mitigation actions limit the credibility and applicability of the findings. The objective of this study is two-fold: First, we evaluate the degree of agreement among leading TD assessment tools. Second, we propose a framework to capture the diversity of the examined tools with the aim of identifying a few “reference assessments” (or class/file profiles) representing characteristic cases of classes/files with respect to their level of TD. By extracting sets of classes/files exhibiting similarity to a selected profile (e.g., that of high TD levels in all employed tools), we establish a basis that can be used either for prioritization of maintenance activities or for training more sophisticated TD identification techniques. The proposed framework is illustrated through a case study on fifty (50) open source projects and two programming languages (Java and JavaScript) employing three leading TD tools.
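The idea of selecting classes that show "high TD levels in all employed tools" can be illustrated with a small sketch. The abstract does not describe how profiles are actually extracted, so the rule used here (a class counts as high-TD for a tool if it ranks in that tool's top fraction, and as consensus high-TD if every tool agrees) is a simplifying assumption. The function names, tool names, and scores are hypothetical.

```python
from typing import Dict, Set


def top_fraction(scores: Dict[str, float], fraction: float) -> Set[str]:
    """Return the classes whose TD score ranks in the top `fraction` for one tool."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank class names from highest to lowest TD score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return set(ranked[:keep])


def consensus_high_td(
    tool_scores: Dict[str, Dict[str, float]], fraction: float = 0.25
) -> Set[str]:
    """Return the classes that every tool places in its top `fraction`."""
    per_tool = [top_fraction(scores, fraction) for scores in tool_scores.values()]
    return set.intersection(*per_tool) if per_tool else set()


# Hypothetical TD scores from three unnamed tools for four classes.
example_scores = {
    "tool_a": {"A": 90.0, "B": 80.0, "C": 10.0, "D": 5.0},
    "tool_b": {"A": 70.0, "B": 20.0, "C": 60.0, "D": 1.0},
    "tool_c": {"A": 50.0, "B": 40.0, "C": 45.0, "D": 2.0},
}
print(consensus_high_td(example_scores, fraction=0.5))
```

With these illustrative scores, only class `A` is in the top half for all three tools, so it is the only consensus high-TD class. A set built this way could serve as labelled training data for a classifier, which is the use the abstract suggests.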
Conference Paper
Full-text available
Quality has a price. But non-quality is even more expensive. Knowing the cost and consequences of software assets, being able to understand and control the development process of a service, or quickly evaluating the quality of external developments are of primary importance for every company relying on software. Standards and tools have tried with varying degrees of success to address these concerns, but there are many difficulties to be overcome: the diversity of software projects, the measurement process – from goals and metrics selection to data presentation, or the user's understanding of the reports. These are situations where the SQuORE business intelligence tool introduces a novel decision-based approach to software projects quality assessment by providing a more reliable, more intuitive, and more context-aware view on quality. This in turn allows all actors of the project to share a common vision of the project progress and performance, which then allows efficient enhancing of the product and process. This position paper presents how SQuORE solves the quality dilemma, and showcases two real-life examples of industrial projects: a unit testing improvement program, and a fully-featured software project management model.
Article
Full-text available
This article characterizes technical debt across 700 business applications, comprising 357 MLOC. These applications were analyzed against more than 1,200 rules of good architectural and coding practice. The authors present a formula with adjustable parameters for estimating the principal of technical debt from structural quality data.
Article
Technical Debt (TD) is a successful metaphor in conveying the consequences of software inefficiencies and their elimination to both technical and non-technical stakeholders, primarily due to its monetary nature. The identification and quantification of TD rely heavily on the use of a small handful of sophisticated tools that check for violations of certain predefined rules, usually through static analysis. Different tools result in divergent TD estimates, calling into question the reliability of findings derived by a single tool. To alleviate this issue, we use 18 metrics pertaining to source code, repository activity, issue tracking, refactorings, duplication and commenting rates of each class as features for statistical and Machine Learning models, so as to classify them as High-TD or not. As a benchmark we exploit 18,857 classes obtained from 25 Java projects, whose high levels of TD have been confirmed by three leading tools. The findings indicate that it is feasible to identify TD issues with sufficient accuracy and reasonable effort: a subset of superior classifiers achieved an F2-measure score of approximately 0.79 with an associated Module Inspection ratio of approximately 0.10. Based on the results, a tool prototype for automatically assessing the TD of Java projects has been implemented.
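For readers unfamiliar with the F2-measure reported above, the following minimal sketch computes the general F-beta score from confusion-matrix counts using the standard definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). With beta = 2, recall is weighted more heavily than precision. The counts in the example are illustrative only and are not taken from the study.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Compute the F-beta score from true positives, false positives and false negatives."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts: 80 High-TD classes found, 40 false alarms, 20 missed.
f2 = f_beta(tp=80, fp=40, fn=20, beta=2.0)
f1 = f_beta(tp=80, fp=40, fn=20, beta=1.0)
print(f"F2 = {f2:.3f}, F1 = {f1:.3f}")
```

In this example recall (0.80) exceeds precision (about 0.67), so F2 comes out higher than F1. That is the intended behaviour of a metric that penalizes missed High-TD classes more than false alarms.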
Conference Paper
Software repositories contain historical and valuable information about the overall development of software systems. Mining software repositories (MSR) is nowadays considered one of the most interesting growing fields within software engineering. MSR focuses on extracting and analyzing data available in software repositories to uncover interesting, useful, and actionable information about the system. Even though MSR plays an important role in software engineering research, few tools have been created and made public to support developers in extracting information from Git repositories. In this paper, we present PyDriller, a Python Framework that eases the process of mining Git. We compare our tool against the state-of-the-art Python Framework GitPython, demonstrating that PyDriller can achieve the same results with, on average, 50% less LOC and significantly lower complexity.
URL: https://github.com/ishepard/pydriller
Materials: https://doi.org/10.5281/zenodo.1327363
Pre-print: https://doi.org/10.5281/zenodo.1327411
Article
Journal version of "Architecture Technical Debt: Understanding Causes and a Qualitative Model". Context: A known problem in large software companies is balancing the prioritization of short-term against long-term feature delivery speed. Specifically, Architecture Technical Debt is regarded as sub-optimal architectural solutions taken to deliver fast that might hinder future feature development, which, in turn, would hinder agility. Objective: This paper aims at improving software management by shedding light on the current factors responsible for the accumulation of Architectural Technical Debt and at understanding how it evolves over time. Method: We conducted an exploratory multiple-case embedded case study in 7 sites at 5 large companies. We evaluated the results with additional cross-company interviews and an in-depth, company-specific case study in which we initially evaluate factors and models. Results: We compiled a taxonomy of the factors and their influence in the accumulation of Architectural Technical Debt, and we provide two qualitative models of how the debt is accumulated and refactored over time in the studied companies. We also list a set of exploratory propositions on possible refactoring strategies that can be useful as insights for practitioners and as hypotheses for further research. Conclusion: Several factors cause constant and unavoidable accumulation of Architecture Technical Debt, which leads to development crises. Refactorings are often overlooked in prioritization and they are often triggered by development crises, in a reactive fashion. Some of the factors are manageable, while others are external to the companies. ATD needs to be made visible, in order to postpone the crises according to the strategic goals of the companies. There is a need for practices and automated tools to proactively manage ATD.
Article
It is sometimes useful in an analysis of variance to split the treatments into reasonably homogeneous groups. Multiple comparison procedures are often used for this purpose, but a more direct method is to use the techniques of cluster analysis. This approach is illustrated for several sets of data, and a likelihood ratio test is developed for judging the significance of differences among the resulting groups.
Article
The AUTOSPEC system is an automatic motor specification software system that primarily serves to non-interactively produce bill of materials from sales orders. At the core, the system applies Expert System and coordinated Relational Database technology ...
Maurício Aniche. 2015. Java code metrics calculator (CK). Available in https://github.com/mauricioaniche/ck/.
G. Ann Campbell and Patroklos P. Papapetrou. SonarQube in Action (1st ed.). Manning Publications Co.