Conference Paper

Merging Smell Detectors: Evidence on the Agreement of Multiple Tools


Abstract

Technical Debt estimation relies heavily on static analysis tools that look for violations of pre-defined rules. Technical Debt principal is largely attributed to the presence of low-level code smells, unavoidably tying the effort for fixing the problems to mere coding inefficiencies. At the same time, despite their simple definitions, the detection of most code smells is non-trivial and subjective, rendering the assessment of Technical Debt principal dubious. To this end, we have revisited the literature on tool-backed code smell detection approaches and developed an Eclipse plugin that incorporates six code smell detection approaches. The combined application of multiple smell detectors can increase the certainty of identifying actual code smells that matter to the development team. We also conduct a case study to investigate the agreement among the employed code smell detectors. To our surprise, the level of agreement is quite low even for relatively simple code smells, threatening the validity of existing TD analysis tools and calling for increased attention to the precise specification of code- and design-level issues.


... This can occur due to the use of different metrics and thresholds in the detection strategies. There are also solutions that detect smells via the intersection of the results of multiple tools, such that if at least 50% of them report a smell, the evaluated element is considered problematic, similar to a voting system [3,9]. ...
... To this end, if at least two tools classified a class as problematic, it was democratically considered a God Class. The intersection of the results of code smell detection tools has been used in recent studies [3,9]. The goal of this step was to evaluate whether smell detection through the voting system is more effective than detection by the tools individually (RQ2). 1 https://github.com/mauricioaniche/ck ...
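The voting scheme described in these snippets can be sketched in a few lines. The tool names, class names, and detection results below are invented for illustration; the threshold (at least half of the tools must agree) follows the 50% rule mentioned above.

```python
# Minimal sketch of majority voting over several smell detectors: an element
# is flagged only when at least `threshold` of the tools report the smell.
# Tool names and per-class verdicts are hypothetical, not real tool output.

def majority_vote(detections: dict[str, bool], threshold: float = 0.5) -> bool:
    """Return True when at least `threshold` of the tools flag the element."""
    votes = sum(detections.values())
    return votes >= threshold * len(detections)

# Per-class verdicts from three (hypothetical) detectors.
reports = {
    "org.example.OrderManager": {"ToolA": True, "ToolB": True, "ToolC": False},
    "org.example.Invoice": {"ToolA": False, "ToolB": False, "ToolC": True},
}

flagged = [cls for cls, votes in reports.items() if majority_vote(votes)]
# Only OrderManager reaches the 2-of-3 majority.
```

With a threshold of 0.5 and three tools, this reduces to the "at least two tools" rule used in the study above.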
... Ichtsis et al. [9] propose a plugin that automatically identifies the intersection of the results of six smell detection tools. ...
Conference Paper
Full-text available
There are many different tools and techniques for detecting code smells. Some studies indicate that smell detection approaches can vary significantly even when applied in the same context. Aiming to assess and understand whether any form of automatic code smell detection provides results closer to reality, this paper presents a quantitative investigation involving three code smell detection tools and a voting system that combines their use. All of these detection approaches were compared against an oracle proposed by experts for classifying a class as a God Class. Using descriptive statistics, the results indicate that the JDeodorant tool achieves better recall in God Class detection when used individually, and that the voting system does not yield significant gains.
... As an example, we demonstrate this process on the Apache Pinot project for a specific commit. 4 As refactoring candidates, we used five refactoring opportunities obtained through the Smell Detector Merger (Ichtsis et al. 2022) tool that validates the existence of a smell based on the intersection of multiple tools. By following the steps, we described above, we end up with the coloured RCs shown in Table 9. ...
... We explore the RCs whose metric scores exceed by 2-times the mean score of samples, and for those we upgrade the coloring assignment (e.g., from ORANGE to RED). Classification models can be built, so that refactoring suggestion tools can prioritize the extracted opportunities. In this direction, we plan to further work on the current dataset to train and validate such models, and then integrate them in the Smell Detector Merger (Ichtsis et al. 2022) to equip it with prioritization functionality. Finally, we aim at an empirical validation of the usability and effectiveness of the proposed approach and tool in an industrial setting. ...
Article
Full-text available
Refactoring is the most prominent way of repaying Technical Debt and improving software maintainability. Despite the acknowledgement of refactorings as a state-of-practice technique (both by industry and academia), refactoring-based quality optimizations are debatable due to three important concerns: (a) the impact of a refactoring on quality is not always positive; (b) the list of available refactoring candidates is usually vast, restricting developers from applying all suggestions; and (c) there is no empirical evidence on which parameters are related to positive refactoring impact on quality. To alleviate these concerns, we reuse a benchmark (constructed in a previous study) of real-world refactorings having either a positive or negative impact on quality; and we explore the parameters (structural characteristics of classes) affecting the impact of the refactoring. Based on the findings, we propose a metrics-based approach for guiding practitioners on how to prioritize refactoring candidates. The results of the study suggest that classes with high coupling and large size should be given priority, since they tend to have a positive impact on technical debt.
... To build our ground dataset, we used a voting method (Ichtsis et al., 2022), in which each instance (a class or method) received three votes from three different detection tools on the presence of a certain smell. Each vote represents if the tool detected the instance. ...
Article
Full-text available
Code smell is a symptom of decisions about the system design or code that may degrade its modularity. For example, they may indicate inheritance misuse, excessive coupling and size. When two or more code smells occur in the same snippet of code, they form a code smell agglomeration. Few studies evaluate how agglomerations may impact code modularity. In this work, we evaluate which aspects of modularity are being hindered by agglomerations. This way, we can support practitioners in improving their code, by refactoring the code involved with code smell agglomeration that was found as harmful to the system modularity. We analyze agglomerations composed of four types of code smells: Large Class, Long Method, Feature Envy, and Refused Bequest. We then conduct a comparison study between 20 systems mined from the Qualita Corpus dataset with 10 systems mined from GitHub. In total, we analyzed 1789 agglomerations in 30 software projects, from both repositories: Qualita Corpus and GitHub. We rely on frequent itemset mining and non-parametric hypothesis testing for our analysis. Agglomerations formed by two or more Feature Envy smells have a significant frequency in the source code for both repositories. Agglomerations formed by different smell types impact the modularity more than classes with only one smell type and classes without smells. For some metrics, when Large Class appears alone, it has a significant and large impact when compared to classes that have two or more method-level smells of the same type. We have identified which agglomerations are more frequent in the source code, and how they may impact the code modularity. Consequently, we provide supporting evidence of which agglomerations developers should refactor to improve the code modularity.
... To build our ground dataset, we used a voting method [48], in which each instance (a class or method) received three votes from three different detection tools on the presence of a certain smell. Each vote represents if the tool detected the instance. ...
Preprint
Full-text available
Context. Code smell is a symptom of decisions about the system design or code that may degrade its modularity. For example, they may indicate inheritance misuse, excessive coupling and size. When two or more code smells occur in the same snippet of code, they form a code smell agglomeration. Objective. Few studies evaluate how agglomerations may impact code modularity. In this work, we evaluate which aspects of modularity are being hindered by agglomerations. This way, we can support practitioners in improving their code, by refactoring the code involved with code smell agglomeration that was found as harmful to the system modularity. Method. We analyze agglomerations composed of four types of code smells: Large Class, Long Method, Feature Envy, and Refused Bequest. We then conduct a comparison study between 20 systems mined from the Qualita Corpus dataset with 10 systems mined from GitHub. In total, we analyzed 1,789 agglomerations in 30 software projects, from both repositories: Qualita Corpus and GitHub. We rely on frequent itemset mining and non-parametric hypothesis testing for our analysis. Results. Agglomerations formed by two or more Feature Envy smells have a significant frequency in the source code for both repositories. Agglomerations formed by different smell types impact the modularity more than classes with only one smell type and classes without smells. For some metrics, when Large Class appears alone, it has a significant and large impact when compared to classes that have two or more method-level smells of the same type. Conclusion. We have identified which agglomerations are more frequent in the source code, and how they may impact the code modularity. Consequently, we provide supporting evidence of which agglomerations developers should refactor to improve the code modularity.
Conference Paper
Full-text available
Code smells are a popular mechanism to identify structural design problems in software systems. Since it is generally not feasible to fix all the smells arising in the code, some of them are often postponed by developers to be resolved in the future. One reason for this decision is that the improvement of the code structure, to achieve modifiability goals, requires extra effort from developers. Therefore, they might not always spend this additional effort, particularly when they are focused on delivering customer-visible features. This postponement of code smells is seen as a source of technical debt. Furthermore, not all code smells may be urgent to fix in the context of the system's modifiability and business goals. While there are a number of tools to detect smells, they do not allow developers to discover the most urgent smells according to their goals. In this article, we present a flexible tool to prioritize technical debt in the form of code smells. The tool is flexible to allow developers to add new smell detection strategies and to prioritize smells, and groups of smells, based on the configuration of their manifold criteria. To illustrate this flexibility, we present an application example of our tool. The results suggest that our tool can be easily extended to be aligned with the developers' goals.
Article
Full-text available
Refactoring is a technique to make a computer program more readable and maintainable. A bad smell is an indication of some setback in the code that requires refactoring to address. Many tools are available for detecting and removing these code smells; they vary greatly in detection methodology and capability. In this work, we studied several code smell detection tools in detail and summarize our analysis in a comparison of their features and working scenarios. We also derive some suggestions on the basis of the variations found in the results of the detection tools.
Article
Full-text available
Code smells are structural characteristics of software that may indicate a code or design problem that makes software hard to evolve and maintain, and may trigger refactoring of code. Recent research is active in defining automatic detection tools to help humans in finding smells when code size becomes unmanageable for manual review. Since the definitions of code smells are informal and subjective, assessing how effective code smell detection tools are is both important and hard to achieve. This paper reviews the current panorama of the tools for automatic code smell detection. It defines research questions about the consistency of their responses, their ability to expose the regions of code most affected by structural decay, and the relevance of their responses with respect to future software evolution. It gives answers to them by analyzing the output of four representative code smell detectors applied to six different versions of GanttProject, an open source system written in Java. The results of these experiments cast light on what current code smell detection tools are able to do and what the relevant areas for further improvement are.
Conference Paper
Full-text available
Identifying refactoring opportunities in software systems is an important activity in today's agile development environments. The concept of code smells has been proposed to characterize different types of design shortcomings in code. Additionally, metric-based detection algorithms claim to identify the "smelly" components automatically. This paper presents results for an empirical study performed in a commercial environment. The study investigates the way professional software developers detect god class code smells, then compares these results to automatic classification. The results show that, even though the subjects perceive detecting god classes as an easy task, the agreement for the classification is low. Misplaced methods are a strong driver for letting subjects identify god classes as such. Earlier proposed metric-based detection approaches performed well compared to the human classification. These results lead to the conclusion that an automated metric-based pre-selection decreases the effort spent on manual code inspections. Automatic detection accompanied by a manual review increases the overall confidence in the results of metric-based classifiers.
Conference Paper
Full-text available
Code duplication is a common problem, and a well-known sign of bad design. As a result of that, in the last decade, the issue of detecting code duplication led to various solutions and tools that can automatically find duplicated blocks of code. However, duplicated fragments rarely remain identical after they are copied; they are oftentimes modified here and there. This adaptation usually "scatters" the duplicated code block into a large amount of small "islands" of duplication, which detected and analyzed separately hide the real magnitude and impact of the duplicated block. In this paper we propose a novel, automated approach for recovering duplication blocks, by composing small isolated fragments of duplication into larger and more relevant duplication chains. We validate both the efficiency and the scalability of the approach by applying it on several well known open-source case-studies and discussing some relevant findings. By recovering such duplication chains, the maintenance engineer is provided with additional cases of duplication that can lead to relevant refactorings, and which are usually missed by other detection methods.
Conference Paper
Bad smells are symptoms that something may be wrong in the system design or code. There are many bad smells defined in the literature and detecting them is far from trivial. Therefore, several tools have been proposed to automate bad smell detection aiming to improve software maintainability. However, we lack a detailed study for summarizing and comparing the wide range of available tools. In this paper, we first present the findings of a systematic literature review of bad smell detection tools. As results of this review, we found 84 tools; 29 of them available online for download. Altogether, these tools aim to detect 61 bad smells by relying on at least six different detection techniques. They also target different programming languages, such as Java, C, C++, and C#. Following up the systematic review, we present a comparative study of four detection tools with respect to two bad smells: Large Class and Long Method. This study relies on two software systems and three metrics for comparison: agreement, recall, and precision. Our findings support that tools provide redundant detection results for the same bad smell. Based on quantitative and qualitative data, we also discuss relevant usability issues and propose guidelines for developers of detection tools.
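The comparative study above evaluates tool pairs using three metrics: agreement, recall, and precision. A small sketch of how such metrics can be computed over two tools' detection sets follows; the entity names and the choice of one tool's output as the reference set are assumptions made for the example.

```python
# Illustrative computation of agreement, precision, and recall between two
# smell detectors, each represented by the set of entities it flags. The
# entity labels and reference choice are hypothetical.

def precision_recall(detected: set[str], reference: set[str]) -> tuple[float, float]:
    """Precision and recall of `detected` against a chosen reference set."""
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

def agreement(tool_a: set[str], tool_b: set[str], universe: set[str]) -> float:
    """Fraction of all analyzed entities on which both tools give the same verdict."""
    same = sum((e in tool_a) == (e in tool_b) for e in universe)
    return same / len(universe)

universe = {"A", "B", "C", "D", "E"}   # all classes analyzed
tool_a = {"A", "B", "C"}               # flagged as Large Class by one tool
tool_b = {"B", "C", "D"}               # flagged by the other tool
p, r = precision_recall(tool_a, reference=tool_b)
```

Note that agreement counts shared negatives as well, so it can be high even when the sets of flagged entities overlap little, which is why the study reports all three metrics.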
Article
Over the past 15 years, researchers presented numerous techniques and tools for mining code smells. It is imperative to classify, compare, and evaluate existing techniques and tools used for the detection of code smells because of their varying features and outcomes. This paper presents an up-to-date review on the state-of-the-art techniques and tools used for mining code smells from the source code of different software applications. We classify selected code smell detection techniques and tools based on their detection methods and analyze the results of the selected techniques. We present our observations and recommendations after our critical analysis of existing code smell techniques and tools. Our recommendations may be used by existing and new tool developers working in the field of code smell detection. The scope of this review is limited to research publications in the area of code smells that focus on detection of code smells as compared with previous reviews that cover all aspects of code smells. Copyright © 2015 John Wiley & Sons, Ltd.
Conference Paper
Almost every expert in Object-Oriented Development stresses the importance of iterative development. As you proceed with the iterative development, you need to add function to the existing code base. If you are really lucky that code base is structured just right to support the new function while still preserving its design integrity. Of course most of the time we are not lucky, the code does not quite fit what we want to do. You could just add the function on top of the code base. But soon this leads to applying patch upon patch making your system more complex than it needs to be. This complexity leads to bugs, and cripples your productivity.
Article
Placement of attributes/methods within classes in an object-oriented system is usually guided by conceptual criteria and aided by appropriate metrics. Moving state and behavior between classes can help reduce coupling and increase cohesion, but it is nontrivial to identify where such refactorings should be applied. In this paper, we propose a methodology for the identification of Move Method refactoring opportunities that constitute a way for solving many common feature envy bad smells. An algorithm that employs the notion of distance between system entities (attributes/methods) and classes extracts a list of behavior-preserving refactorings based on the examination of a set of preconditions. In practice, a software system may exhibit such problems in many different places. Therefore, our approach measures the effect of all refactoring suggestions based on a novel entity placement metric that quantifies how well entities have been placed in system classes. The proposed methodology can be regarded as a semi-automatic approach since the designer will eventually decide whether a suggested refactoring should be applied or not based on conceptual or other design quality criteria. The evaluation of the proposed approach has been performed considering qualitative, metric, conceptual, and efficiency aspects of the suggested refactorings in a number of open-source projects.
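The notion of distance between entities and classes that drives such Move Method suggestions can be sketched with a Jaccard-style set distance; the entity sets, class names, and the omission of the behavior-preserving preconditions below are all simplifications for illustration, not the paper's exact algorithm.

```python
# Hedged sketch: compare a method's accessed entities against the entity sets
# of candidate classes using Jaccard distance, and suggest the closest class.
# All names and sets here are invented for the example.

def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 minus the Jaccard similarity of two entity sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

# Entities (attributes/methods) accessed by a candidate method.
method_entities = {"attr1", "attr2", "m3"}

# Entities belonging to the method's own class and to a candidate target.
class_entities = {
    "Source": {"attr9", "m3"},
    "Target": {"attr1", "attr2", "m3", "m4"},
}

# A Move Method opportunity would be suggested when the method is closer to a
# class other than its own (precondition checks omitted in this sketch).
closest = min(class_entities,
              key=lambda c: jaccard_distance(method_entities, class_entities[c]))
```

Here the method shares three of four entities with "Target" but only one with its own class, so the sketch would propose moving it, subject to the preconditions the paper examines.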
Article
Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the kappa statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific agreement among experts, and that kappa approaches these measures as the number of negative cases grows large. Positive specific agreement, or the equivalent F-measure, may be an appropriate way to quantify interrater reliability and therefore to assess the reliability of a gold standard in these studies.
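The equivalence this abstract states can be checked on toy data: for two annotators, the F-measure obtained by treating either one as the gold standard equals positive specific agreement, 2a / (2a + b + c), where a counts items both mark positive and b, c count the two kinds of disagreement. The annotation vectors below are made up for illustration.

```python
# Verify that pairwise F-measure equals positive specific agreement (PSA)
# for two binary annotators. Annotation vectors are hypothetical.

def f_measure(system: list[int], gold: list[int]) -> float:
    """F1 of `system` against `gold`: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for s, g in zip(system, gold) if s and g)
    fp = sum(1 for s, g in zip(system, gold) if s and not g)
    fn = sum(1 for s, g in zip(system, gold) if not s and g)
    return 2 * tp / (2 * tp + fp + fn)

def positive_specific_agreement(r1: list[int], r2: list[int]) -> float:
    """PSA = 2a / (2a + b + c) over two annotators' binary judgments."""
    a = sum(1 for x, y in zip(r1, r2) if x and y)
    b = sum(1 for x, y in zip(r1, r2) if x and not y)
    c = sum(1 for x, y in zip(r1, r2) if not x and y)
    return 2 * a / (2 * a + b + c)

rater1 = [1, 1, 0, 1, 0, 0, 1]
rater2 = [1, 0, 0, 1, 1, 0, 1]
# Both quantities come out identical: note the true negatives (shared 0s)
# never enter either formula, which is why no negative count is needed.
```

This symmetry also explains why the measure is usable when the number of negative cases is undefined: unlike kappa, neither formula references the true negatives.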
Eclipse Plug-ins Third Edition
  • Eric Clayberg
  • Dan Rubel