Figure 2: Artefacts and Quality Trees

Source publication
Conference Paper
Full-text available
Quality has a price. But non-quality is even more expensive. Knowing the cost and consequences of software assets, being able to understand and control the development process of a service, or quickly evaluating the quality of external developments are of primary importance for every company relying on software. Standards and tools have tried with...

Citations

... As a first step, we developed a "commonly agreed TD knowledge base" [2], i.e., an empirical benchmark of classes that exhibit high levels of TD (these classes are from now on termed "HIGH TD" classes); TD identification is the practice of understanding which modules of a software system suffer from high levels of technical debt [25]. The identification of HIGH TD classes has been performed based on archetypal analysis, pointing to classes for which three widely adopted TD assessment tools (namely SonarQube [12], CAST [15], and Squore [8]) converge and indicate a high chance of containing high levels of TD. Next, to decouple the application of the method from the need to retain licenses and installations of all three tools, we have evaluated the ability of ML algorithms to classify software classes as HIGH TD and NOT-HIGH TD [40] [41]. ...
Article
In recent years, we have witnessed an important increase in research focusing on how machine learning (ML) techniques can be used for software quality assessment and improvement. However, the derived methodologies and tools lack transparency, due to the black-box nature of the employed machine learning models, leading to decreased trust in their results. To address this shortcoming, in this paper we extend the state of the art and practice by building explainable AI models on top of machine learning ones, to interpret the factors (i.e., software metrics) that render a module at risk of having high technical debt (HIGH TD), to obtain thresholds for metric scores that are alerting for poor maintainability, and finally, we dig further to achieve local interpretation that explains the specific problems of each module, pinpointing specific opportunities for improvement during TD management. To achieve this goal, we have developed project-specific classifiers (characterizing modules as HIGH and NOT-HIGH TD) for 21 open-source projects, and we explain their rationale using SHapley Additive exPlanation (SHAP) analysis. Based on our analysis, complexity, comments ratio, cohesion, nesting of control flow statements, coupling, refactoring activity, and code churn are the most important reasons for characterizing classes as at HIGH TD risk. The analysis is complemented with global and local means of interpretation, such as metric thresholds and case-by-case reasoning for characterizing a class as at risk of having HIGH TD. The results of the study are compared against the state of the art and are interpreted from the point of view of both researchers and practitioners.
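To make the SHAP step described above concrete, here is a minimal sketch assuming a class-level table of software metrics. The metric names, synthetic data, and model choice are illustrative stand-ins, not the study's actual pipeline.

```python
# Minimal sketch of SHAP-based global and local interpretation of a
# HIGH TD classifier. Feature names and data are hypothetical.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
metrics = pd.DataFrame({
    "complexity": rng.integers(1, 60, 500),
    "comments_ratio": rng.random(500),
    "lcom": rng.integers(0, 20, 500),        # cohesion proxy
    "max_nesting": rng.integers(0, 8, 500),
    "fan_out": rng.integers(0, 30, 500),     # coupling proxy
    "code_churn": rng.integers(0, 400, 500),
})
# Toy ground truth: high complexity plus churn pushes a class into HIGH TD.
high_td = ((metrics["complexity"] > 35) & (metrics["code_churn"] > 150)).astype(int)

model = GradientBoostingClassifier().fit(metrics, high_td)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(metrics)

# Global view: which metrics drive HIGH TD predictions overall.
shap.summary_plot(shap_values, metrics)

# Local view: why one specific class was flagged (case-by-case reasoning).
i = 0
print(dict(zip(metrics.columns, shap_values[i].round(3))))
```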
... On top of that, they face a decision-making problem: "which tool shall I trust for TD identification/quantification?". As a first step to alleviate the aforementioned problems, in a previous study [5] we analyzed the TD assessment of three widely-adopted tools, namely SonarQube [6], CAST [7], and Squore [8], with the goal of evaluating the degree of agreement among them and building an empirical benchmark (TD Benchmarker) of classes/files sharing similar levels of TD. The outcome of that study (apart from the benchmark per se) confirmed that different tools end up with diverse assessments of TD, but to some extent they converge on the identification of classes that exhibit high levels of TD. ...
... To facilitate replication, we provide an experimental package containing the dataset that was constructed, as well as the scripts used for data collection, data preparation, and classification-model construction. This material can be found online (https://sites.google.com/view/ml-td-identification/home). Moreover, the source code of the 25 examined projects is publicly available on GitHub, so that the same data can be obtained. ...
... We believe that this tool will enable further feature experimentation through its use in academic or industrial settings and will pave the way for more data-driven TD management tools. ...
Article
Technical Debt (TD) is a successful metaphor in conveying the consequences of software inefficiencies and their elimination to both technical and non-technical stakeholders, primarily due to its monetary nature. The identification and quantification of TD rely heavily on the use of a small handful of sophisticated tools that check for violations of certain predefined rules, usually through static analysis. Different tools result in divergent TD estimates, calling into question the reliability of findings derived by a single tool. To alleviate this issue, we use 18 metrics pertaining to source code, repository activity, issue tracking, refactorings, duplication, and commenting rates of each class as features for statistical and Machine Learning models, so as to classify them as High-TD or not. As a benchmark we exploit 18,857 classes obtained from 25 Java projects, whose high levels of TD have been confirmed by three leading tools. The findings indicate that it is feasible to identify TD issues with sufficient accuracy and reasonable effort: a subset of superior classifiers achieved an F2-measure score of approximately 0.79 with an associated Module Inspection ratio of approximately 0.10. Based on the results, a tool prototype for automatically assessing the TD of Java projects has been implemented.
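A hedged sketch of the evaluation scheme just described: classify classes as High-TD or not, then score with the F2-measure and a Module Inspection ratio, read here as the share of classes a reviewer would have to inspect. The data and model are synthetic stand-ins, not the paper's benchmark.

```python
# Classify classes as High-TD or not, then report F2 and inspection ratio.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.random((2000, 18))                 # 18 metrics per class, as in the study
y = (X[:, 0] + X[:, 1] > 1.2).astype(int)  # toy High-TD label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
pred = RandomForestClassifier(random_state=1).fit(X_tr, y_tr).predict(X_te)

f2 = fbeta_score(y_te, pred, beta=2)       # recall-weighted, as reported
inspection_ratio = pred.mean()             # fraction of classes flagged for review
print(f"F2={f2:.2f}  inspection ratio={inspection_ratio:.2f}")
```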
... Given the aforementioned challenges, in our recent research work [12] we empirically evaluated statistical and Machine Learning (ML) algorithms for their ability to classify software classes as High/Not-High TD. As ground truth for the development of the proposed classification framework, we considered a "commonly agreed TD knowledge base" [1], i.e., an empirical benchmark of classes that exhibit high levels of TD, based on the convergence of three widely-adopted TD assessment tools, namely SonarQube [6], CAST [8], and Squore [4]. As model features we considered a wide range of software factors, spanning from code metrics to repository activity, retrieved by employing four popular open source tools, namely PyDriller [11], CK [2], PMD's Copy/Paste Detector (CPD), and cloc. ...
... As a proof of concept, the proposed approach described in Section 2.1 has been implemented in the form of a web tool. A running instance of the tool is available online, enabling its adoption by developers in practice and, in turn, its further quantitative and qualitative evaluation by the community. The tool is implemented as a web application, including both a backend and its associated frontend. ...
Conference Paper
To date, the identification and quantification of Technical Debt (TD) rely heavily on a few sophisticated tools that check for violations of certain predefined rules, usually through static analysis. Different tools result in divergent TD estimates, calling into question the reliability of findings derived by a single tool. To alleviate this issue, we present a tool that employs machine learning on a dataset built upon the convergence of three widely-adopted TD assessment tools to automatically assess the class-level TD for any arbitrary Java project. The proposed tool is able to classify software classes as high-TD or not by synthesizing source code and repository activity information retrieved by employing four popular open source analyzers. The classification results are combined with proper visualization techniques to enable the identification of classes that are more likely to be problematic. To demonstrate the proposed tool and evaluate its usefulness, a case study is conducted based on a real-world open-source software project. The proposed tool is expected to facilitate TD management activities and enable further experimentation through its use in an academic or industrial setting. Video: https://youtu.be/umgXU8u7lIA Running Instance: http://160.40.52.130:3000/tdclassifier Source Code: https://gitlab.seis.iti.gr/root/td-classifier.git
... An earlier (unpublished) project to implement ECSS metrics used SQuORE (cf. [2]). Yet only a few of its generic data providers fitted space processes; the rest would have had to be developed anew for AENEAS. ...
Conference Paper
Full-text available
"You software guys are too much like the weavers in the story about the Emperor and his new clothes. When I go out to check on a software development the answers I get sound like, 'We're fantastically busy weaving this magic cloth. Just wait a while and it'll look terrific.' But there's nothing I can see or touch, no numbers I can relate to, no way to pick up signals that things aren't really all that great. And there are too many people I know who have come out at the end wearing a bunch of expensive rags or nothing at all." Space projects, and development of software embedded in these systems, are complex, sometimes costing hundreds of millions of Euros and involving several tiers of suppliers. An important means of improving mutual understanding is to increase transparency of the development status between customers and suppliers. We raise the problem of transparency in complex projects to the reader's attention, and, relying on results of a small survey of practitioners, propose to use ECSS software metrics/KPIs as a mitigation. We present our metrication infrastructure, and describe issues to be considered when implementing an early metrication programme in a real-world, industry space project. https://www.drprause.de/files/PROFES2018-EmperorNewClothes.pdf
... Several tools have been created to automate the computation of metrics [6], [7], [8]. Among these tools, we selected Understand [9], a multi-platform code-analysis tool that supports a large number of programming languages, including C, C++, C#, Java, Python, Objective-C, PHP, and JavaScript. ...
... The platform offers a dashboard in the form of a web interface and allows several configurations, covering all of the tool plugins it includes. A similar platform is the SQUORE platform (Baldassari, 2013). ...
Article
Full-text available
The subjectivity that underlies the notion of quality does not allow the design and development of a universally accepted mechanism for software quality assessment. This is why contemporary research is now focused on seeking mechanisms able to produce software quality models that can be easily adjusted to custom user needs. In this context, we introduce QATCH, an integrated framework that applies static analysis to benchmark repositories in order to generate software quality models tailored to stakeholder specifications. Fuzzy multi-criteria decision-making is employed in order to model the uncertainty imposed by experts' judgments. These judgments can be expressed as linguistic values, which makes the process more intuitive. Furthermore, a robust software quality model, the base model, is generated by the system and used in the experiments for QATCH system verification. The paper provides an extensive analysis of QATCH and thoroughly discusses its validity and added value in the field of software quality through a number of individual experiments.
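As a rough illustration of the fuzzy multi-criteria idea (not QATCH's actual code): experts give linguistic judgments on how much each quality property matters; one common treatment maps them to triangular fuzzy numbers, defuzzifies them into crisp weights, and scores a product as a weighted sum of normalized measurements. All names and numbers below are hypothetical.

```python
# Sketch: linguistic expert judgments -> fuzzy weights -> quality score.
import numpy as np

LINGUISTIC = {                      # triangular fuzzy numbers (l, m, u)
    "low":      (0.0, 0.1, 0.3),
    "moderate": (0.2, 0.5, 0.8),
    "high":     (0.7, 0.9, 1.0),
}

def defuzzify(tfn):
    """Centroid of a triangular fuzzy number."""
    return sum(tfn) / 3

judgments = {"maintainability": "high", "reliability": "moderate", "security": "high"}
weights = np.array([defuzzify(LINGUISTIC[j]) for j in judgments.values()])
weights /= weights.sum()            # normalize so the weights sum to 1

scores = np.array([0.72, 0.55, 0.81])   # normalized per-property measurements
print(f"overall quality: {weights @ scores:.2f}")
```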
... extension have been analysed. Code metrics are computed using SQuORE [2], and rule violations are extracted from SQuORE, Checkstyle [9,3] and PMD [1,4]. ...
... To ensure consistency among all artefact measures we rely on SQuORE, a professional tool for software project quality evaluation and business intelligence [2]. It features a parser, which builds a tree of artefacts (application, files, functions), and an engine that associates measures with each node and aggregates data to upper levels. ...
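The artefact tree the excerpt describes can be pictured with a short sketch: measures attached to leaf nodes (functions) and folded upward to files and the application. This is an illustration of the concept, not SQuORE's implementation; all names here are invented.

```python
# Sketch of an artefact tree with bottom-up metric aggregation.
from dataclasses import dataclass, field

@dataclass
class Artefact:
    name: str
    measures: dict = field(default_factory=dict)   # e.g. {"sloc": 40, "cc": 7}
    children: list = field(default_factory=list)

    def aggregate(self, metric, op=sum):
        """Fold a metric from the leaves up to this node."""
        values = [child.aggregate(metric, op) for child in self.children]
        if not values:                              # leaf: use its own measure
            return self.measures.get(metric, 0)
        self.measures[metric] = op(values)
        return self.measures[metric]

app = Artefact("app", children=[
    Artefact("main.c", children=[
        Artefact("parse()", {"sloc": 40, "cc": 7}),
        Artefact("emit()", {"sloc": 25, "cc": 3}),
    ]),
    Artefact("util.c", children=[Artefact("log()", {"sloc": 10, "cc": 1})]),
])
print(app.aggregate("sloc"))   # 75: function totals rolled up to the application
```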
Conference Paper
Full-text available
Software engineering is a maturing discipline which has seen many drastic advances in recent years. However, some studies still point to the lack of rigorous and mathematically grounded methods needed to raise the field to a new emerging science, with proper and reproducible foundations to build upon. Indeed, mathematicians and statisticians do not necessarily have software engineering knowledge, while software engineers and practitioners do not necessarily have a mathematical background. The Maisqual research project intends to fill the gap between both fields by proposing a controlled and peer-reviewed data set series, ready to use and study. These data sets feature metrics from different repositories, from source code to mail activity and configuration management metadata. Metrics are described and commented, and all the steps followed for their extraction and treatment are documented, with contextual information about the data and its meaning. This article introduces the Apache Ant weekly data set, featuring 636 extracts of the project over 12 years at different levels of artefacts: application, files, functions. By associating community- and process-related information with code extracts, this data set unveils interesting perspectives on the evolution of one of the great success stories of open source.
... the requirement of a default clause in a switch). Such information is provided by analyzers such as Checkstyle, PMD or SQuORE (Baldassari, 2012). ...
Conference Paper
Full-text available
Knowledge extraction from software engineering data is a field that has grown considerably over the past ten years, notably through the mining of software repositories (Mining Software Repositories) and the application of statistical methods (clustering, outlier detection) to aspects of the software development process. This article presents the data mining approach implemented within Polarsys, a working group of the Eclipse Foundation, from the definition of requirements to the proposal of a dedicated quality model and its implementation in a prototype. The main concepts adopted and the lessons learned are also reviewed.
Article
JavaScript (JS) is one of the most popular programming languages for developing client-side applications, mainly due to allowing the adoption of different programming styles, not having strict syntax rules, and supporting a plethora of frameworks. The flexibility that the language provides may accelerate the development of applications, but it can also threaten the quality of the final software product, e.g., by introducing Technical Debt (TD). TD reflects the additional cost of software maintenance activities required to implement new features, incurred due to poorly developed solutions. Being able to forecast the levels of TD in the future can be extremely valuable in managing TD, since it can contribute to informed decision making when designating future repayments and allocating refactoring budget among a company's projects. Despite the popularity of JS and the undoubtful benefits of accurate TD forecasting, in the literature there is only a limited number of tools and methodologies that are able to: (a) forecast TD during software evolution, (b) provide ground-truth TD quantifications to train forecasting models, since the available TD tools are based on different rulesets and none is recognized as a state-of-the-art solution, and (c) take into consideration the language-specific characteristics of JS. As the main contribution of this study, we propose a methodology (along with a supporting tool) that addresses the aforementioned goals based on Backward Stepwise Regression and Auto-Regressive Integrated Moving Average (ARIMA). We evaluate the proposed approach through a case study on 19,636 releases of 105 open-source applications. The results point out that: (a) the proposed model can lead to an accurate prediction of TD, and (b) the number of appearances of the "new" and "eval" keywords, along with the number of "anonymous" and "arrow" functions, are among the features of the JavaScript language that are related to high levels of TD.
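A minimal sketch of the ARIMA forecasting idea, assuming a per-release TD series (e.g., remediation effort) is already available. The series, the ARIMA order, and the horizon are illustrative; the paper pairs ARIMA with Backward Stepwise Regression over JS-specific features, which is not reproduced here.

```python
# Forecast the TD of the next few releases from a historical TD series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
td_per_release = np.cumsum(rng.normal(5, 2, 120))   # toy upward-drifting TD series

model = ARIMA(td_per_release, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=5)                  # TD for the next 5 releases
print(forecast.round(1))
```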