Table 2 - uploaded by Chun Yong Chong
Ordering of relationships in UML class diagram proposed by (Dazhou et al., 2004)


Source publication
Article
Full-text available
Constrained clustering or semi-supervised clustering has received a lot of attention due to its flexibility of incorporating minimal supervision of domain experts or side information to help improve clustering results of classic unsupervised clustering techniques. In the domain of software remodularisation, classic unsupervised software clustering...

Contexts in source publication

Context 1
... implies that the complexity of relationships has direct implications for measuring the complexity of classes. In order to measure the complexity of relationships, UML class relationships are ordered on an ordinal scale, as shown in Table 2. In the work by Dazhou, Baowen, Jianjiang, and Chu (2004), the authors argue that since the complexities of different relationships are relative to each other, arbitrary values of 1-10 can be assigned to H1-H10, respectively. ...
Context 2
... on this ordinal scale, one can compare the complexities of different kinds of relationships in a UML class diagram. Empirical testing using real open-source software has been demonstrated in (Chong et al., 2013) based on the ranking in Table 2. ...
Context 3
... a bidirectional relationship, the weight will be calculated based on the average of both directions. By referring to Table 2, one can identify the relative complexity of relationship R and measure the weight of the relationship R between classes D_i and D_j using the proposed equation formulated in Equation (1). ...
Context 4
... first operand of Equation (1) denotes the complexity of relationship R, while the second operand denotes the complexity of the terminus class linked by R, which will be discussed later. H_R indicates the relative complexity of relationship R (by referring to Table 2). ...
Context 5
... this research, Visual Paradigm is chosen to perform the transformation because it is capable of preserving the directionality of method calls using a built-in function called Impact Analysis. 3. Analyse the complexity of UML relationships based on the ranking shown in Table 2. 4. Convert the UML class diagram into its respective weighted complex network using the method proposed in (Chong & Lee, 2015a). ...
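Taken together, the contexts above describe an ordinal ranking H1-H10 and a relationship weight built from Equation (1). The following is a minimal, hypothetical sketch: the use of a sum to combine Equation (1)'s two operands and the function names are assumptions for illustration, while the 1-10 values and the averaging for bidirectional relationships follow the text.

```python
# Hypothetical sketch of the weighting scheme described in the contexts above.
# The 1-10 values follow the ordinal ranking H1-H10; summing the two operands
# of Equation (1) is an assumption -- the actual operator is the one
# formulated in the paper.
relationship_complexity = {f"H{i}": i for i in range(1, 11)}

def edge_weight(h_label: str, terminus_complexity: float) -> float:
    # First operand: relative complexity H_R of relationship R (Table 2).
    # Second operand: complexity of the terminus class linked by R.
    return relationship_complexity[h_label] + terminus_complexity

def bidirectional_weight(w_ij: float, w_ji: float) -> float:
    # For a bidirectional relationship, the weight is the average
    # of both directions, as stated in the text.
    return (w_ij + w_ji) / 2.0
```

Under this assumed operator, `edge_weight("H9", 1.5)` yields 10.5, and averaging the two directional weights 4.0 and 6.0 gives 5.0.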

Similar publications

Preprint
Full-text available
Researchers have demonstrated various techniques for fingerprinting and identifying devices. Previous approaches have identified devices from their network traffic or transmitted signals while relying on software or operating system specific artifacts (e.g., predictability of protocol header fields) or characteristics of the underlying protocol (e....
Chapter
Full-text available
Psychotherapy, unanimously described as a particular organized and systematic relationship between a patient and a therapist, is a real complex system. The interaction between the numerous variables belonging to the patient, the therapist and the context in which the therapeutic couple is inserted, presents auto-poietic characteristics and generate...
Preprint
Full-text available
We provide an up-to-date view of the structure of the energy landscape of the low autocorrelation binary sequences problem, a typical representative of the $NP$-hard class. To study the landscape features of interest we use the local optima network methodology through exhaustive extraction of the optima graphs for problem sizes up to $24$. Several...
Preprint
Full-text available
The higher-order interactions in complex networks are interactions between two or more nodes, i.e., beyond edges. The critical higher-order interactions are the ones that play critical roles in information diffusion and primarily affect network structure and functions. Identifying critical higher-order interactions is a significant research problem...
Article
Full-text available
A fundamental question is whether groups of nodes of a complex network can possibly display long-term cluster-synchronized behavior. While this question has been addressed for the restricted classes of unweighted and labeled graphs, it remains an open problem for the more general class of weighted networks. The emergence of coordinated motion of no...

Citations

... Sarhan et al. [5] systematically reported the state-of-the-art empirical contributions in software module clustering. Chong and Lee [6] also presented a method to integrate the concept of graph theory analysis to automatically derive clustering constraints from the implicit structure of software systems. ...
... Basically, modularisation is a design principle whereby a complex system is composed of smaller subsystems that are able to work independently [6,7]. Furthermore, the module view is the most common way to understand software system architecture [8]. ...
Article
Full-text available
In software engineering, a software development process, also known as the software development life cycle (SDLC), involves several distinct activities for developing, testing, maintaining, and evolving a software system. Within the stages of the SDLC, software maintenance occupies most of the total cost of the software's life. However, after extended maintenance activities, software quality always degrades due to increasing size and complexity. To solve this problem, software modularisation using clustering is an intuitive way to modularise and classify code into small pieces. A multi-pattern clustering (MPC) algorithm for software modularisation is proposed in this study. The proposed MPC algorithm can be divided into five steps: (1) preprocessing, (2) file labelling, (3) collection of chain dependencies, (4) hierarchical agglomerative clustering, and (5) modification of the clustering result. The performance of the proposed MPC algorithm is compared to selected clustering techniques using three open-source and one closed-source software programs. Experimental results show that the modularisation quality of the proposed MPC algorithm is nearly 1.6 times better than that of the expert decomposition. Additionally, compared to other software clustering algorithms, the proposed MPC algorithm has, on average, a 13% enhancement in producing results similar to human thinking. Consequently, the proposed MPC algorithm is suitable for human comprehension while producing better module quality compared to other clustering algorithms.
... Subsequently, the clustering results produced by the class-level clustering algorithm will be completely different from a method-level clustering algorithm, although both results might be equally feasible. Furthermore, comparing software clustering algorithms within the same level of granularity is also not straightforward, due to different fitness functions and cluster validity metrics employed by different algorithms (Chong and Lee, 2017; Chong et al., 2013). Even if we were to compare the effectiveness of the clustering algorithms from the same family (i.e., agglomerative hierarchical clustering), there are still different ways to configure them (i.e. ...
... Most studies which introduce new clustering algorithms often only evaluate their approach on a specific set of problem instances (Maqbool and Babri, 2007; Chong and Lee, 2017; Shtern and Tzerpos, 2012). Different from existing studies, this work aims to provide a better understanding of which software/code features (i.e., lines of code, number of methods, coupling between objects, depth of inheritance) are related to the performance of clustering algorithms, and whether the software/code features can be used to select the most suitable clustering algorithm. ...
... To evaluate the performance of each hierarchical clustering algorithm against the reference model, we use the MoJoFM metric proposed in the work by Tzerpos and Holt (1999) and Wen and Tzerpos (2003). The MoJo family of metrics has been widely used in the domain of software clustering to evaluate the performance of different clustering algorithms (Maqbool and Babri, 2007; Chong and Lee, 2017; Beck et al., 2016; Naseem et al., 2019). Hence, in the remainder of this paper, the term performance of a clustering algorithm refers to the MoJoFM value computed when comparing the produced clustering results against the ground truth. ...
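For reference, the MoJoFM metric mentioned above is defined in terms of mno(A, B), the minimum number of Move and Join operations needed to transform partition A into partition B:

```latex
\mathrm{MoJoFM}(A,B) = \left(1 - \frac{\mathrm{mno}(A,B)}{\max_{\forall A'} \mathrm{mno}(A',B)}\right) \times 100\%
```

Higher values indicate closer agreement; 100% means the produced clustering is identical to the reference decomposition.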
Article
Maintenance of existing software requires a large amount of time for comprehending the source code. The architecture of a software system, however, may not be clear to maintainers if up-to-date documentation is not available. Software clustering is often used as a remodularisation and architecture recovery technique to help recover a semantic representation of the software design. Due to the diverse domains, structure, and behaviour of software systems, the suitability of different clustering algorithms for different software systems has not been investigated thoroughly. Research that introduces new clustering techniques usually validates the approaches on a specific domain, which might limit their generalisability. If the chosen test subjects could only represent a narrow perspective of the whole picture, researchers might risk not being able to address the external validity of their findings. This work aims to fill this gap by introducing a new approach, Explaining Software Clustering for Remodularisation (E-SC4R), to evaluate the effectiveness of different software clustering approaches. This work focuses on hierarchical clustering and Bunch clustering algorithms and provides information about their suitability according to the features of the software, which, as a consequence, enables the selection of the most suitable algorithm and configuration that can achieve the best MoJoFM value from our existing pool of choices for a particular software system. The E-SC4R framework is tested on 30 open-source software systems with varying sizes and domains, and demonstrates that it can characterise both the strengths and weaknesses of the analysed software clustering algorithms using software features extracted from the code. The proposed approach also provides a better understanding of the algorithms' behaviour by showing a 2D representation of the effectiveness of clustering techniques on the feature space generated through the application of dimensionality reduction techniques.
... Subsequently, the clustering results produced by the class-level clustering algorithm will be completely different from a method-level clustering algorithm, although both results might be equally feasible. Furthermore, comparing software clustering algorithms within the same level of granularity is also not straightforward, due to different fitness functions and cluster validity metrics employed by different algorithms [9,16]. Even if we were to compare the effectiveness of the clustering algorithms from the same family (i.e., agglomerative hierarchical clustering), there are still different ways to configure them (i.e. ...
... Most studies which introduce new clustering algorithms often only evaluate their approach on a specific set of problem instances [14,16,17]. Different from existing studies, this work aims to provide a better understanding of which software/code features (i.e., lines of code, number of methods, etc.) are related to the performance of clustering algorithms, and whether the software/code features can be used to select the most suitable clustering algorithm. ...
... Cutting the dendrogram tree at a higher distance value always yields a smaller number of clusters. However, this decision involves a tradeoff with respect to relaxing the constraint of cohesion in the cluster memberships [9,16,42]. As such, in this work, we attempt to determine the optimal total number of clusters by dividing the total number of classes by the following divisors: 5, 7, 10, 20, and 25. ...
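The divisor heuristic described in the context above can be sketched as follows (the helper name and rounding behaviour are illustrative assumptions, not taken from the paper's implementation):

```python
# Candidate numbers of clusters, obtained by dividing the total number
# of classes in the system by the fixed divisors 5, 7, 10, 20, and 25,
# as described in the context above.
def candidate_cluster_counts(total_classes, divisors=(5, 7, 10, 20, 25)):
    # Guard against degenerate cases: always propose at least one cluster.
    return [max(1, round(total_classes / d)) for d in divisors]
```

For a system of 100 classes this yields candidate cuts of 20, 14, 10, 5, and 4 clusters, at which the dendrogram can then be cut.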
Preprint
Full-text available
Maintenance of existing software requires a large amount of time for comprehending the source code. The architecture of a software system, however, may not be clear to maintainers if up-to-date documentation is not available. Software clustering is often used as a remodularisation and architecture recovery technique to help recover a semantic representation of the software design. Due to the diverse domains, structure, and behaviour of software systems, the suitability of different clustering algorithms for different software systems has not been investigated thoroughly. Research that introduces new clustering techniques usually validates the approaches on a specific domain, which might limit their generalisability. If the chosen test subjects could only represent a narrow perspective of the whole picture, researchers might risk not being able to address the external validity of their findings. This work aims to fill this gap by introducing a new approach, Explaining Software Clustering for Remodularisation, to evaluate the effectiveness of different software clustering approaches. This work focuses on hierarchical clustering and Bunch clustering algorithms and provides information about their suitability according to the features of the software, which, as a consequence, enables the selection of the most optimal algorithm and configuration from our existing pool of choices for a particular software system. The proposed framework is tested on 30 open-source software systems with varying sizes and domains, and demonstrates that it can characterise both the strengths and weaknesses of the analysed software clustering algorithms using software features extracted from the code. The proposed approach also provides a better understanding of the algorithms' behaviour through the application of dimensionality reduction techniques.
... In other words, a node with special properties, like high in-degree centrality in the system, can potentially be mapped to a class with an attribute such as the level of reusability of a class. This attribute can be mapped to a bad smell such as shotgun surgery or the lazy class (Chong and Lee 2017; Jenkins and Kirk 2007). The main difference between the proposed approach and similar existing approaches is that the refactoring process is executed immediately after bad-smell identification. ...
... This cohesion metric can represent badly written code and can be used to find bad smells. Chong and Lee used weighted complex networks and graph theory concepts to automatically derive the clustering constraints (Chong and Lee 2017). These methods can also be used for code refactoring. ...
... Clustering approaches (Gu et al. 2017; Chong and Lee 2017) are effective in finding dependencies and improving cohesion and coupling. ...
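As a concrete illustration of the centrality-based smell indicators discussed in the contexts above, the following sketch computes normalised in-degree centrality over a class-dependency edge list; the class names and the edge list are invented for the example.

```python
from collections import Counter

def in_degree_centrality(edges):
    # edges: iterable of (source_class, target_class) dependency pairs.
    nodes = {n for e in edges for n in e}
    indeg = Counter(dst for _, dst in edges)
    # Normalise by (n - 1), as in the standard centrality definition.
    denom = max(len(nodes) - 1, 1)
    return {n: indeg[n] / denom for n in nodes}

# Hypothetical dependency edges between classes.
deps = [("OrderService", "Logger"), ("UserService", "Logger"),
        ("ReportJob", "Logger"), ("OrderService", "UserService")]
central = in_degree_centrality(deps)
# "Logger" ends up with the highest in-degree centrality -- a heavily
# reused class that would be a candidate for closer smell inspection.
```

A class whose centrality stands far above the rest maps to attributes such as high reusability, which the cited work relates to smells like shotgun surgery.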
Article
Full-text available
The creation of high-quality software is of great importance in the current state of enterprise systems. High-quality software should contain certain features, including flexibility, maintainability, and a well-designed structure. Correctly adhering to object-oriented principles is a primary approach to making code more flexible. Developers usually try to leverage these principles, but many times neglect them due to a lack of time and the extra costs involved. Therefore, they sometimes create confusing, complex, and problematic structures in code known as code smells. Code smells have specific and well-known anti-patterns that can be corrected, after their identification, with the help of refactoring techniques. This process can be performed either manually by the developers or automatically. In this paper, an automated method for identifying and refactoring a series of code smells in Java programs is introduced. The primary mechanism used for performing such automated refactoring is a fuzzy genetic method. Besides, a graph model is used as the core representation scheme, along with corresponding measures such as betweenness, load, in-degree, out-degree, and closeness centrality, to identify the code smells in the programs. Then, the applied fuzzy approach is combined with the genetic algorithm to refactor the code using the graph-related features. The proposed method is evaluated using Freemind, Jag, JGraph, and JUnit as sample projects, and the results are compared against the Fontana dataset, which contains results from IPlasma, FluidTool, Anti-patternScanner, PMD, and Maeinescu. It is shown that the proposed approach can identify on average 68.92% of the bad classes similarly to the Fontana dataset and also refactor 77% of the classes correctly with respect to the coupling measures. This is a noteworthy result among the currently existing refactoring mechanisms and also among the studies that consider both the identification and the refactoring of bad smells.
... The maintainability and reliability of these systems are then carefully evaluated based on the networks. Later, they [64] provided an approach which can help practitioners automatically derive clustering constraints from the implicit structure of software systems based on graph theory. In [65], C. R. Myers discussed the relationships of several network topological measurements to software engineering practices. ...
Article
Full-text available
In the current decade, software systems have been more intensively employed in every aspect of our lives. However, it is disappointing that the quality of software is far from satisfactory. More importantly, the complexity and size of today’s software systems are increasing dramatically, which means that the number of required modifications also increases exponentially. Therefore, it is necessary to understand how function-level modifications impact the distribution of software bugs. In addition, other factors such as function’s structural characteristics as well as attributes of functions themselves may also serve as informative indicators for software fault prediction. In this paper, we perform statistical methods and logistic regression to analyze the possible factors which are related to the distribution of software bugs. We demonstrate our study from the following five perspectives: 1) the distribution of bugs in time and space; 2) the distribution of function-level modifications in time and space; 3) the relationship between function-level modifications and functions’ fault-proneness; 4) the relationship between functional attributes and functions’ fault-proneness; and 5) the relationship between software structural characteristics and functions’ fault-proneness.
... However, the ways to represent software-based complex networks are generally not standardized across multiple studies due to the fact that different studies might be addressing some specific issues at different levels of granularity, i.e. package level [5], class level [6,7], or code level [8]. While most of the existing studies focus on utilizing source code as the main source of information to form a software-based complex network, there is a lack of studies that attempt to harness the data and metadata that are available on source code management systems (SCMS). ...
Chapter
Full-text available
Various studies have successfully utilized graph theory analysis as a way to gain a high-level abstraction view of software systems, such as constructing the call graph to visualize the dependencies among software components. The level of granularity and information shown by the graph usually depends on the input, such as variable, method, class, package, or a combination of multiple levels. However, there are very limited studies that have investigated how software evolution and change history can be used as a basis to model a software-based complex network. It is a common understanding that stable and well-designed source code will have fewer updates throughout a software development lifecycle; it is only badly designed code that tends to get updated due to broken dependencies, high coupling, or dependencies with other classes. This paper puts forward an approach to model a commit-change-based weighted complex network from historical software change and evolution data captured from GitHub repositories, with the aim of identifying potential fault-prone classes. Four well-established graph centrality metrics were used as proxy metrics to discover fault-prone classes. Experiments on ten open-source projects discovered that when all centrality metrics are used together, they can yield reasonably good precision when compared against the ground truth.
... Some tools are available to support Class Responsibility Assignment (CRA), which can be used for analysing and designing OOP; they were designed to provide a cognitive toolkit for designers and developers [4]. Designing object-oriented software (OOS) is a complex process, and its initial steps include analysing the class candidates and allocating the responsibilities of a system to them [5]. This type of initial design is applied in more advanced object-oriented mechanisms like interfaces, design patterns, inheritance, or even architectural styles [6]. ...
Article
Object-Oriented Programming (OOP) provides a way to reuse programs by pre-implementing functionalities in software. Developing object-oriented software, which is important in computer programming, is difficult. For OOP, modularization mainly depends on the class. There are many methods for assigning responsibilities, but most of them rely on humans for decision making. In this research, a back-propagation neural network (BPNN) was used to provide a solution to the object-oriented design of software. The Cinema Booking System (CBS) was taken as the input documentation, and Formal Concept Analysis (FCA) then found the relationships of the elements in a lattice manner; after that, the relationships were established with each other. The results showed that the proposed system outperformed the existing system as well as the design made manually.
... A complex network approach was used to study software dependency network evolution [13,14]. Chong and Lee used weighted complex network with graph theory analysis to automate the derivation of clustering constraints from object-oriented software [15]. Joblin et al. investigated the evolutionary trends of developer coordination using a network approach [16]. ...
Article
Full-text available
The phenomenon of local worlds (also known as communities) exists in numerous real-life networks, for example, computer networks and social networks. We proposed the Weighted Multi-Local-World (WMLW) network evolving model, taking into account (1) the dense links between nodes in a local world, (2) the sparse links between nodes from different local worlds, and (3) the different importance of intra-local-world links and inter-local-world links. In topology evolution, new links between existing local worlds and new local worlds are added to the network, while new nodes and links are added to existing local worlds. In the weighting mechanism, the weight of links within a local world and the weight of links between different local worlds are given different meanings. It is theoretically proven that the strength distribution of the network generated by the WMLW model follows a power-law distribution. Simulations show the correctness of the theoretical results. Meanwhile, the degree distribution also follows a power-law distribution. Analysis and simulation results show that the proposed WMLW model can be used to model the evolution of class diagrams of software systems.
Article
Programmers strive to design programs that are flexible, updateable, and maintainable. However, several factors such as lack of time, high costs, and workload lead to the creation of software with inadequacies known as anti-patterns. To identify and refactor software anti-patterns, many research studies have been conducted using machine learning. Even though some of the previous works were very accurate in identifying anti-patterns, a method that takes into account the relationships between different structures is still needed. Furthermore, a practical method is needed that is trained according to the characteristics of each program. This method should be able to identify anti-patterns and perform the necessary refactorings. This paper proposes a framework based on probabilistic graphical models for identifying and refactoring anti-patterns. A graphical model is created by extracting the class properties from the source code. As a final step, a Bayesian network is trained, which determines whether anti-patterns are present or not based on the characteristics of neighboring classes. To evaluate the proposed approach, the model is trained on six different anti-patterns and six different Java programs. The proposed model has identified these anti-patterns with a mean accuracy of 85.16 percent and a mean recall of 79%. Additionally, this model has been used to introduce several methods for refactoring, and it has been shown that these refactoring methods will ultimately create a system with less coupling and higher cohesion.
Preprint
Full-text available
Marketplaces for distributing software products and services have been gaining increasing popularity. GitHub, which is best known for its version control functionality through Git, launched its own marketplace in 2017. GitHub Marketplace hosts third-party apps and actions to automate workflows in software teams. Currently, this marketplace hosts 440 Apps and 7,878 Actions across 32 different categories. Overall, 419 third-party developers released their apps on this platform, which 111 distinct customers adopted. The popularity and accessibility of GitHub projects have made this platform and the projects hosted on it one of the most frequent subjects for experimentation in software engineering research. A simple Google Scholar search shows that 24,100 research papers have discussed GitHub within the software engineering field since 2017, but none have looked into the marketplace. The GitHub Marketplace provides a unique source of information on the tools used by practitioners in the Open Source Software (OSS) ecosystem for automating their projects' workflows. In this study, we (i) mine and provide a descriptive overview of the GitHub Marketplace, (ii) perform a systematic mapping of research studies in automation for open source software, and (iii) compare the state of the art with the state of the practice on automation tools. We conclude the paper by discussing the potential of the GitHub Marketplace for knowledge mobilization and collaboration within the field. This is the first study on the GitHub Marketplace in the field.