
A fuzzy genetic automatic refactoring approach to improve software maintainability and flexibility


Abstract

The creation of high-quality software is of great importance in the current state of enterprise systems. High-quality software should exhibit certain qualities, including flexibility, maintainability, and a well-designed structure. Correctly adhering to object-oriented principles is a primary approach to making code more flexible. Developers usually try to leverage these principles but often neglect them due to lack of time and the extra costs involved. As a result, they sometimes create confusing, complex, and problematic structures in code known as code smells. Code smells have specific and well-known anti-patterns that, once identified, can be corrected with the help of refactoring techniques. This process can be performed either manually by developers or automatically. In this paper, an automated method for identifying and refactoring a series of code smells in Java programs is introduced. The primary mechanism for performing such automated refactoring is a fuzzy genetic method. In addition, a graph model is used as the core representation scheme, along with corresponding measures such as betweenness, load, in-degree, out-degree, and closeness centrality, to identify code smells in programs. The fuzzy approach is then combined with a genetic algorithm to refactor the code using the graph-related features. The proposed method is evaluated using Freemind, Jag, JGraph, and JUnit as sample projects, and the results are compared against the Fontana dataset, which contains results from IPlasma, FluidTool, Anti-patternScanner, PMD, and Marinescu. It is shown that the proposed approach can identify on average 68.92% of the bad classes, similar to the Fontana dataset, and also refactor 77% of the classes correctly with respect to the coupling measures. This is a noteworthy result among currently existing refactoring mechanisms and among the studies that consider both the identification and the refactoring of bad smells.
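The graph measures named in the abstract (in-degree, out-degree, and closeness centrality) can be made concrete with a small example. The sketch below is not the paper's implementation: the class names, the dependency edges, and the choice of a simple reachability-based closeness variant are all assumptions for illustration.

```python
from collections import deque

# Hypothetical class-dependency graph: an edge A -> B means class A uses class B.
graph = {
    "OrderService": ["Order", "Logger", "Db"],
    "Order": ["Logger"],
    "Logger": [],
    "Db": ["Logger"],
}

def out_degree(g, node):
    # Number of classes this class depends on.
    return len(g[node])

def in_degree(g, node):
    # Number of classes that depend on this class.
    return sum(1 for src in g for dst in g[src] if dst == node)

def closeness(g, node):
    # One common closeness variant: reachable nodes divided by the sum of
    # BFS shortest-path distances to them (0.0 if nothing is reachable).
    dist = {node: 0}
    q = deque([node])
    while q:
        u = q.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    reached = len(dist) - 1
    return reached / sum(dist.values()) if reached else 0.0

print(in_degree(graph, "Logger"))         # -> 3: Logger is used by three classes
print(out_degree(graph, "OrderService"))  # -> 3: a high fan-out candidate
print(closeness(graph, "OrderService"))   # -> 1.0: reaches every class in one hop
```

A class with unusually high in-degree, out-degree, or betweenness on such a graph would then be a candidate for the fuzzy smell-detection rules.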
METHODOLOGIES AND APPLICATION

Raana Saheb Nasagh · Mahnoosh Shahidi · Mehrdad Ashtiani

Published online: 20 November 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Keywords: Code smells · Refactoring · Fuzzy system · Genetic algorithm · Graph modeling
1 Introduction
In recent decades, software programs have played an important role not only in business and scientific domains but also in everyday life. Developing and maintaining high-quality programs is therefore of crucial importance. High-quality software is software that is flexible (Subramaniam and Zulzalil 2012), reusable, reliable, and maintainable (Yamashita 2013). To create programs with the stated characteristics, programmers can adhere to a set of predefined rules and principles, namely the object-oriented design principles. Most of the time, these principles act as a guideline for programmers to create better computer programs. For example, the single responsibility principle (Wampler 2007), low coupling, and high cohesion (Johann et al. 1994) are some of these well-known principles.
Unfortunately, a lack of time and the associated development costs often lead to technical debt (Charette 2005) in projects. As a result, poorly designed structures are introduced into the software's code. These poorly structured and complex pieces of code are known as bad smells.
Communicated by V. Loia.

Corresponding author: Mehrdad Ashtiani (m_ashtiani@iust.ac.ir)
Raana Saheb Nasagh (r_sahebnassagh@comp.iust.ac.ir)
Mahnoosh Shahidi (ma_shahidi@comp.iust.ac.ir)

School of Computer Engineering, Iran University of Science and Technology, Hengam St., Resalat Sq., Tehran 16846-13114, Iran

Soft Computing (2021) 25:4295–4325
https://doi.org/10.1007/s00500-020-05443-0
... Since more databases will place a certain burden on the operation of the system, the reconstruction method is adopted to create a distributed database [24,25]. The reconstruction method re-establishes a distributed database from the overall design, including the database on each site, according to the implementation environment and user requirements of the system [26] and the design ideas and methods of distributed databases, from a unified point of view, as shown in Figure 5. ...
Article
In order to improve the accuracy and response speed of sports fitness monitoring results and make the monitoring results more comprehensive, a new cloud storage-oriented sports fitness monitoring system is designed. Based on cloud storage technology, the overall framework of the sports fitness monitoring system is established; the function of the hardware module of the monitoring system is analyzed, and distributed database is established. The ray-casting image feature scanning method was used to collect the physical condition monitoring image and generate a high-quality human body target frame to realize the physical condition monitoring. Based on the monitoring data, the fitness method recommendation method is designed according to the user’s physical condition. The experimental results show that the monitoring results of the proposed system have higher accuracy, faster system response speed, and higher comprehensiveness of the monitoring results, which verifies the application value of the proposed system.
... UML diagrams (Baqais and Alshayeb, 2020). Specifically, several works focus on techniques for improving security (Abid et al., 2020) and maintainability metrics (Nasagh et al., 2020), improving performance through removing unused features (Bruce et al., 2020), reasoning about refactoring activities over long periods of time (Brito et al., 2020) as well as improving software accessibility (Paiva et al., 2020) and achieving business-oriented goals (Ivers et al., 2020). ...
Article
The JavaScript language did not specify, until ECMAScript 6 (ES6), native features for streamlining encapsulation and modularity. The developer community filled the gap with a proliferation of design patterns and module formats, with an impact on code reusability, portability, and the complexity of build configurations. This work studies the automated refactoring of legacy ES5 code to ES6 modules with fine-grained reuse of module contents through the named import/export language constructs. The focus is on reducing the coupling of refactored modules by destructuring exported module objects into fine-grained module features and enhancing module dependencies by leveraging the ES6 syntax. We employ static analysis to construct a model of a JavaScript project, the Module Dependence Graph (MDG), that represents modules and their dependencies. On the basis of the MDG we specify the refactoring procedure for module migration to ES6. A prototype implementation has been empirically evaluated on 19 open-source projects. Results highlight the relevance of the refactoring to a developer intent for fine-grained reuse. The analysis of the refactored code shows an increase in the number of reusable elements per project and a reduction in the coupling of refactored modules. The soundness of the refactoring is empirically validated through code inspection and execution of the projects' test suites.
Article
Refactoring is extensively recognized for enhancing the internal structure of object-oriented software while preserving its external behavior. However, determining refactoring opportunities is challenging for designers and researchers alike. In recent years, machine learning algorithms have shown great potential for resolving this issue. This study proposes a deep neural network-based fitness function (DNNFF) to address the software refactoring problem. It suggests an effective learning technique that automatically extracts features from the trained models and predicts code clones to recommend which category to refactor. Software engineers automatically assess the recommended refactoring solutions using genetic algorithms (GA) for a minimum number of iterations. A deep neural network (DNN) utilizes these training instances to assess the refactoring solutions for the remaining iterations. The refactoring process primarily depends on software designers' skills and perceptions. The simulation findings demonstrate that the suggested DNNFF model achieves a code change score of 98.7%, an automatic refactoring score of 97.3%, a defect correlation ratio of 96.9%, a refactoring precision ratio of 95.9%, and a flaw detection ratio of 94.4%, and reduces execution time by 10.2% compared to other existing methods.
Preprint
Advanced AI technologies are serving humankind in a number of ways, from healthcare to manufacturing. Advanced automated machines are quite expensive, but the end output is supposed to be of the highest possible quality. Depending on the agility of requirements, these automation technologies can change dramatically. The likelihood of making changes to automation software is extremely high, so it must be updated regularly. If maintainability is not taken into account, it will have an impact on the entire system and increase maintenance costs. Many companies use different programming paradigms in developing advanced automated machines based on client requirements. Therefore, it is essential to estimate the maintainability of heterogeneous software. As a result of the lack of widespread consensus on software maintainability prediction (SPM) methodologies, individuals and businesses are left perplexed when it comes to determining the appropriate model for estimating the maintainability of software, which serves as the inspiration for this research. A structured methodology was designed, the datasets were preprocessed, and the maintainability index (MI) range was found for all the datasets except for UIMS and QUES, for which the metric CHANGE is used. To remove the uncertainty among the aforementioned techniques, a popular multiple-criteria decision-making model, namely the technique for order preference by similarity to ideal solution (TOPSIS), is used in this work. TOPSIS revealed that GARF outperforms the other considered techniques in predicting the maintainability of heterogeneous automated software.
Chapter
Roundabouts are effective intersection designs, which are rapidly gaining attention and popularity among traffic engineers. This is due to the roundabout's capacity to handle the mobility of a substantial number of vehicles. The ever-increasing demand for more traffic capacity can be satisfied either by significant capital investment in infrastructure or by creating more capacity through intelligent signalization. Appropriate traffic signal timing is critical to smoothing traffic flow. Inappropriate traffic signal timing not only causes delays and inconvenience to drivers but also increases environmental pollution. Thus, it is important to investigate different signal timings to ensure that the implemented plan will have a positive impact on the network's performance. The optimization of roundabouts' signal timing is a relatively new area of research. The problem is difficult to model realistically and computationally challenging. Due to the flow nature of the traffic problem, wider areas of traffic must be regulated simultaneously in a network. This can be achieved via either field testing or a reliable simulation tool. Microscopic simulation allows a safer and cheaper evaluation of many more alternative signal timings compared to field testing, although the development, calibration, and validation of simulation models for traffic networks are challenging. We present a model to evaluate the performance of a network of signalized roundabouts, with which the combination of different traffic volume and cycle length scenarios can be intelligently studied. We also provide information on the development, calibration, and validation of the model, as well as a real-life implementation on a network of roundabouts in Izmir, Turkey. Keywords: Microsimulation · Traffic signal timing · Traffic signal control · Stochastic simulation model · Network of signalized roundabouts · Cycle length
Chapter
Most real-world problems can be pictured as a set of connections and interactions between various entities. Together, these entities create a complex phenomenon investigated in the form of complex networks. Each of the entities in the network plays a particular role in defining the structure and analyzing the studied problem. Several measures of centrality have been proposed in the literature to estimate the contribution and quantify the relevance of network entities. The most influential nodes are defined either locally, via the measurement of their connections with their directly related neighbors, or globally, via the measurement of the importance of their neighbors or their contribution to the fast propagation of information along shortest paths. Due to the incompleteness of real-world data, crisp representations do not adequately describe the problem. Therefore, fuzzy graphs have been proposed to give more realistic representations by taking into account the uncertainties present in the data. This paper surveys the state of the art in fuzzy centrality measures, with a focus on studies of urban traffic networks.
Conference Paper
Code smells represent poor implementation choices performed by developers when enhancing source code. Their negative impact on source code maintainability and comprehensibility has been widely shown in the past, and several techniques to automatically detect them have been devised. Most of these techniques are based on heuristics: they compute a set of code metrics and combine them by creating detection rules. While they have reasonable accuracy, a recent trend is the use of machine learning, where code metrics are used as predictors of the smelliness of code artefacts. Despite the recent advances in the field, there is still a noticeable lack of knowledge about whether machine learning can actually be more accurate than traditional heuristic-based approaches. To fill this gap, in this paper we propose a large-scale study to empirically compare the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection. We consider five code smell types and compare machine learning models with DECOR, a state-of-the-art heuristic-based approach. Key findings emphasize the need for further research aimed at improving the effectiveness of both machine learning and heuristic approaches for code smell detection: while DECOR generally achieves better performance than a machine learning baseline, its precision is still too low to make it usable in practice.
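As a rough illustration of the heuristic, rule-based side of this comparison: a metric-based detection rule combines metric thresholds into a boolean predicate. The metric names and cutoffs below are assumptions chosen for the sketch, not DECOR's actual published rules.

```python
def is_god_class(m):
    # Hypothetical God Class rule: complex, data-grabbing, and poorly cohesive.
    # wmc = weighted methods per class, atfd = access to foreign data,
    # tcc = tight class cohesion. Thresholds are illustrative assumptions.
    return m["wmc"] > 47 and m["atfd"] > 5 and m["tcc"] < 1 / 3

suspect = {"wmc": 60, "atfd": 8, "tcc": 0.10}  # complex, low-cohesion class
clean = {"wmc": 10, "atfd": 1, "tcc": 0.80}    # small, cohesive class

print(is_god_class(suspect))  # -> True
print(is_god_class(clean))    # -> False
```

A machine-learning detector would instead feed the same metrics into a trained classifier rather than a hand-written predicate.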
Book
This book puts forward a new method for solving the text document (TD) clustering problem, which is established in two main stages: (i) A new feature selection method based on a particle swarm optimization algorithm with a novel weighting scheme is proposed, as well as a detailed dimension reduction technique, in order to obtain a new subset of more informative features with a low-dimensional space. This new subset is subsequently used to improve the performance of the text clustering (TC) algorithm and reduce its computation time. The k-means clustering algorithm is used to evaluate the effectiveness of the obtained subsets. (ii) Four krill herd algorithms (KHAs), namely, the (a) basic KHA, (b) modified KHA, (c) hybrid KHA, and (d) multi-objective hybrid KHA, are proposed to solve the TC problem; each algorithm represents an incremental improvement on its predecessor. For the evaluation process, seven benchmark text datasets with different characterizations and complexities are used. Text document (TD) clustering is a new trend in text mining in which the TDs are separated into several coherent clusters, where all documents in the same cluster are similar. The findings presented here confirm that the proposed methods and algorithms delivered the best results in comparison with other, similar methods found in the literature.
Article
In this paper, a novel text clustering method, an improved krill herd algorithm with a hybrid function called MMKHA, is proposed as an efficient clustering method to obtain promising and precise results in this domain. Krill herd is a new swarm-based optimization algorithm that imitates the behavior of a group of live krill. The potential of this algorithm is high because it performs better than other optimization methods; it balances the processes of exploration and exploitation by complementing the strength of local nearby searching with global wide-range searching. Text clustering is the process of grouping significant amounts of text documents into coherent clusters in which documents in the same cluster are relevant. For the purpose of the experiments, six versions are thoroughly investigated to determine the best version for solving the text clustering problem. Eight benchmark text datasets, available at the Laboratory of Computational Intelligence (LABIC), are used for the evaluation process. Seven evaluation measures are utilized to validate the proposed algorithms, namely, ASDC, accuracy, precision, recall, F-measure, purity, and entropy. The proposed algorithms are compared with other successful algorithms published in the literature. The results prove that the proposed improved krill herd algorithm with a hybrid function achieved almost all the best results for all datasets in comparison with the other comparative algorithms.
Article
Code smell refers to any symptom introduced in the design or implementation phases of a program's source code. Such a code smell can potentially cause deeper and more serious problems during software maintenance. Existing approaches to detecting bad smells use detection rules or standards built from a combination of different object-oriented metrics. Although a variety of detection tools have been developed, they still have limitations and constraints in their capabilities. In this paper, a code smell detection system is presented with a neural network model that captures the relationship between bad smells and object-oriented metrics, taking a corpus of Java projects as the experimental dataset. The most well-known object-oriented metrics are considered to identify the presence of bad smells. The code smell detection system uses twenty Java projects which are shared by many users in GitHub repositories. The dataset of these Java projects is partitioned into mutually exclusive training and test sets. The training dataset is used to learn the network model which predicts smelly classes in this study. The optimized network model is then evaluated on the test dataset. The experimental results show that as the model is trained with more data, the prediction outcomes improve. In addition, the accuracy of the model increases when it is run with more epochs and more hidden layers. Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.
Article
Constrained clustering or semi-supervised clustering has received a lot of attention due to its flexibility of incorporating minimal supervision of domain experts or side information to help improve clustering results of classic unsupervised clustering techniques. In the domain of software remodularisation, classic unsupervised software clustering techniques have proven to be useful to aid in recovering a high-level abstraction of the software design of poorly documented or designed software systems. However, there is a lack of work that integrates constrained clustering for the same purpose to help improve the modularity of software systems. Nevertheless, due to time and budget constraints, it is laborious and unrealistic for domain experts who have prior knowledge about the software to review each and every software artifact and provide supervision on an on-demand basis. We aim to fill this research gap by proposing an automated approach to derive clustering constraints from the implicit structure of software system based on graph theory analysis of the analysed software. Evaluations conducted on 40 open-source object-oriented software systems show that the proposed approach can serve as an alternative solution to derive clustering constraints in situations where domain experts are non-existent, thus helping to improve the overall accuracy of clustering results.
Article
Declarative rules are frequently used in model refactoring in order to detect refactoring opportunities and to apply the appropriate ones. However, a large number of rules is required to obtain a complete specification of refactoring opportunities. Companies usually have accumulated examples of refactorings from past maintenance experiences. Based on these observations, we treat the model refactoring problem as a multi-objective problem, suggesting refactoring sequences that aim to maximize both structural and textual similarity between a given model (the model to be refactored) and a set of poorly designed models in the base of examples (models that have undergone some refactorings), and to minimize the structural similarity between the given model and a set of well-designed models in the base of examples (models that do not need any refactoring). To this end, we use the Non-dominated Sorting Genetic Algorithm (NSGA-II) to find a set of representative Pareto-optimal solutions that present the best trade-off between structural and textual similarities of models. The validation results, based on 8 real-world models taken from open-source projects, confirm the effectiveness of our approach, yielding refactoring recommendations with an average correctness of over 80%. In addition, our approach outperforms 5 state-of-the-art refactoring approaches.
Article
Class cohesion has an immediate impact on the maintainability, modifiability, and understandability of software. Here, a new cohesion metric based on complex networks (CBCN) for measuring the connectivity of class members was developed, relying mainly on calculating the class average clustering coefficient from graphs representing the connectivity patterns of the various class members. In addition, the CBCN metric was theoretically validated against four properties of class cohesion theory (non-negativity and normalization, null and maximum values, monotonicity, and cohesive modules). In a data comparison with seventeen existing typical class cohesion metrics, the CBCN metric proved superior. Applying the CBCN metric to three open-source software systems to calculate class average clustering coefficients, we found that understanding, modifying, and maintaining classes is likely to be less difficult in some systems than in others. The three open-source systems have power-law distributions of the class average clustering coefficient, which enables further understanding of the complex-network-based cohesion metric.
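The average clustering coefficient at the heart of CBCN is simple to compute. In the sketch below, the member-connectivity graph (methods m1, m2 and fields f1, f2, with edges for access/call relations) is entirely made up for illustration; the paper's exact graph construction may differ.

```python
def clustering_coefficient(adj, node):
    # Fraction of pairs of `node`'s neighbours that are themselves connected.
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:] if v in adj[u])
    return 2 * links / (k * (k - 1))

# Hypothetical connectivity graph of one class's members (undirected, stored
# symmetrically): an edge means "accesses" or "calls".
adj = {
    "m1": ["f1", "f2", "m2"],
    "m2": ["f1", "m1"],
    "f1": ["m1", "m2"],
    "f2": ["m1"],
}

# CBCN-style class score: average clustering coefficient over all members.
avg = sum(clustering_coefficient(adj, n) for n in adj) / len(adj)
print(round(avg, 4))  # -> 0.5833
```

Higher averages indicate that a class's members form tightly knit groups, i.e. higher cohesion.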
Article
Context: During its lifecycle, a software system undergoes repeated modifications to quickly fulfill new requirements, but its underlying design is not properly adjusted after each update. This leads to the emergence of bad smells. Refactoring provides a de facto behavior-preserving approach to eliminate these anomalies. However, manually determining and performing useful refactorings is a formidable challenge, as stated in the literature. Therefore, framing object-oriented automated refactoring as a search-based technique has been proposed. However, the literature shows that search-based refactoring of component-based software has not yet received proper attention. Objective: This paper presents a genetic algorithm-based approach for the automated refactoring of component-based software. This approach consists of detecting component-relevant bad smells and eliminating these bad smells by searching for the best sequence of refactorings using a genetic algorithm. Method: Our approach consists of four steps. The first step includes studying the literature related to component-relevant bad smells and formulating bad smell detection rules. The second step involves proposing a catalog of component-relevant refactorings. The third step consists of constructing a source code model by extracting facts from the source code of a component-based software. The final step seeks to identify the best sequence of refactorings to apply to reduce the presence of bad smells in the source code model using a genetic algorithm. The latter uses bad smell detection rules as a fitness function and the catalog of refactorings as a means to explore the search space. Results: As a case study, we conducted experiments on an unbiased set of four real-world component-based applications. The results indicate that our approach is able to efficiently reduce the total number of bad smells by more than one half, which is an acceptable value compared to the recent literature. 
Moreover, we determined that our approach is also accurate in refactoring only components suffering from bad smells while leaving the remaining components untouched whenever possible. Furthermore, a statistical analysis shows that our genetic algorithm outperforms random search and local search in terms of efficiency and accuracy on almost all the systems investigated in this work. Conclusion: This paper presents a search-based approach for the automated refactoring of component-based software. To the best of our knowledge, our approach is the first to focus on component-based refactoring, whereas the state-of-the-art approaches focus only on object-oriented refactoring.
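The genetic search described above can be sketched compactly. Everything in this toy version is an assumption for illustration: the refactoring catalogue, the fixed "smells removed" scores standing in for a detection-rule-based fitness function, and the GA parameters.

```python
import random

random.seed(42)  # deterministic toy run

# Hypothetical catalogue: each refactoring removes some number of bad smells.
SMELLS_REMOVED = {"move_method": 2, "extract_class": 3,
                  "inline_class": 1, "move_field": 1}
CATALOGUE = list(SMELLS_REMOVED)

def fitness(seq):
    # Toy stand-in for rule-based smell detection: distinct refactorings count once.
    return sum(SMELLS_REMOVED[r] for r in set(seq))

def evolve(pop_size=20, seq_len=3, gens=30):
    # A candidate solution is a fixed-length sequence of refactoring names.
    pop = [[random.choice(CATALOGUE) for _ in range(seq_len)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, seq_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:           # mutation
                child[random.randrange(seq_len)] = random.choice(CATALOGUE)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

With three slots and distinct-only scoring, the best attainable fitness here is 6 (extract_class plus move_method plus one of the others); the elitist loop typically converges to it within a few generations.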
Article
In this study, we describe a system-level multiple refactoring algorithm, which can automatically identify move method, move field, and extract class refactoring opportunities according to the principle of "high cohesion and low coupling." The algorithm works by merging and splitting related classes to obtain the optimal functionality distribution at the system level. Furthermore, we present a weighted clustering algorithm for regrouping the entities in a system based on merged method-level networks. Using a series of preprocessing steps and preconditions, the "bad smells" introduced by cohesion and coupling problems can be removed from both non-inheritance and inheritance hierarchies without changing the code's behavior. We rank the refactoring suggestions based on the anticipated benefits that they bring to the system. Based on comparisons with related research and assessment of the refactoring results using quality metrics and empirical evaluation, we show that the proposed approach performs well in different systems and is beneficial from the perspective of the original developers. Finally, an open-source tool is implemented to support the proposed approach.