Article

A reverse‐engineering approach to subsystem structure identification

Authors: Hausi A. Müller, Mehmet A. Orgun, Scott R. Tilley, James S. Uhl

Abstract

Reverse-engineering is the process of extracting system abstractions and design information out of existing software systems. This process involves the identification of software artefacts in a particular subject system, the exploration of how these artefacts interact with one another, and their aggregation to form more abstract system representations that facilitate program understanding.

This paper describes our approach to creating higher-level abstract representations of a subject system, which involves the identification of related components and dependencies, the construction of layered subsystem structures, and the computation of exact interfaces among subsystems. We show how top-down decompositions of a subject system can be (re)constructed via bottom-up subsystem composition. This process involves identifying groups of building blocks (e.g., variables, procedures, modules, and subsystems) using composition operations based on software engineering principles such as low coupling and high cohesion. The result is an architecture of layered subsystem structures.

The structures are manipulated and recorded using the Rigi system, which consists of a distributed graph editor and a parsing system with a central repository. The editor provides graph filters and clustering operations to build and explore subsystem hierarchies interactively. The paper concludes with a detailed, step-by-step analysis of a 30-module software system using Rigi.
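To make the composition idea concrete, here is a minimal Python sketch of scoring a candidate grouping by cohesion (dependencies internal to the group) against coupling (dependencies crossing the group boundary). The module names, the dependency graph, and the greedy one-step merge rule are illustrative assumptions, not Rigi's actual algorithm; a real bottom-up composition would repeat such merges until a subsystem hierarchy of the desired depth emerges.

```python
# Minimal sketch of coupling/cohesion-guided composition (illustrative only,
# not Rigi's implementation). Module names and the merge rule are hypothetical.
from itertools import combinations

# Dependency graph: module -> set of modules it uses.
DEPS = {
    "parser": {"lexer", "ast"},
    "lexer": {"ast"},
    "ast": set(),
    "ui": {"parser", "render"},
    "render": {"ast"},
}

def cohesion(group):
    """Number of dependency edges internal to the group."""
    return sum(1 for a in group for b in DEPS[a] if b in group)

def coupling(group):
    """Number of dependency edges crossing the group boundary."""
    crossing = sum(1 for a in group for b in DEPS[a] if b not in group)
    crossing += sum(1 for a in DEPS if a not in group
                    for b in DEPS[a] if b in group)
    return crossing

# Score every pair of modules and pick the merge with the best
# cohesion-to-coupling trade-off -- one step of bottom-up composition.
best = max(combinations(DEPS, 2),
           key=lambda g: cohesion(g) - coupling(g))
print("merge candidate:", best)
```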


... Because our empirical validation is performed in the context of iOS applications and Apple's flavour of MVC, which can be viewed as a linear layered architectural pattern, we describe in what follows several approaches that take into account the particularities of layer-based architectures. Similar to the general systems case, the approaches developed for analysing layer-style projects take into account either the structural features of the codebase ([20], [21], [22], [23], [24], [25], [26], [27]) or hybrid (structural and lexical) features ([28], [29], [30], [31], [32]). ...
... Müller [20] identified various building blocks (e.g., variables, procedures, modules and subsystems) by using composition operations that respect two principles: low coupling and high cohesion. The Lattix tool [21] extracts inter-module dependencies among code modules through conventional static analysis. ...
... In addition to the type of features involved, these reconstruction methods can be analysed against other criteria as well. For instance, some approaches are based on simple static analysis [20], [21], [23], on clustering algorithms such as k-means [22], [28] or hierarchical clustering [32], or on iterative search algorithms when the reconstruction problem is transformed into an optimisation one [23], [25], [26], [29], [27]. When the clustering is solved by an iterative search algorithm, the optimisation objectives emphasise the high cohesion and low coupling of the cluster elements, principles encapsulated in the modularisation metric [33], and aim to reduce cyclic dependencies. ...
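The modularisation metric referred to here rewards partitions whose clusters keep most edges internal. One widely used formulation is the cluster-factor style of Bunch's TurboMQ; the sketch below computes it for a made-up dependency graph and partition, as an illustration rather than the exact metric of [33].

```python
# Sketch of a modularisation quality (MQ) style metric rewarding high
# cohesion and low coupling, in the spirit of Bunch's cluster factor.
# The partition and edge list below are made-up examples.

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

def mq(edges, partition):
    clusters = set(partition.values())
    intra = {c: 0 for c in clusters}   # edges inside cluster c
    inter = {c: 0 for c in clusters}   # edges crossing c's boundary
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        if cu == cv:
            intra[cu] += 1
        else:
            inter[cu] += 1
            inter[cv] += 1
    total = 0.0
    for c in clusters:
        if intra[c]:
            # Cluster factor: internal edges weighed against boundary edges.
            total += 2 * intra[c] / (2 * intra[c] + inter[c])
    return total

print(f"MQ = {mq(edges, partition):.3f}")  # higher is better
```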
... To summarize our findings in terms of principles, we have adopted the guidelines of Systematic Literature Reviews [169] for finding and aggregating the data found in the documents included in the review (i.e., primary studies). The presented principles come from the following 83 primary studies: [2,8,13,15,16,18,19,20,21,22,24,25,26,27,28,30,31,33,37,39,40,41,42,43,47,57,60,63,64,71,72,74,86,89,94,95,96,98,100,101,102,103,104,105,106,107,109,110,111,113,114,115,116,117,118,121,123,124,125,126,127,128,129,130,131,133,134,135,136,137,138,139,140,141,142,144,145,151,152,153,156,162,171]. These studies include: 1) peer-reviewed papers that we retrieved either from well-known databases or from the manual search, and that we retained because they explicitly discuss some layering principles; 2) technical reports; 3) and a couple of software engineering books focusing on the layered pattern. ...
... (transverse) ones. These are called omnipresent components in [8,18,47]. They are also called library components (e.g., [18]). ...
... Very few approaches support more than three out of the six catalogued layering rules. The few that diverge from that trend include [8,47,57,64,121,129,137]. Surprisingly, most of the approaches (e.g., [47,57,129]) that cover most of the catalogued rules, are the ones supported by the Rigi tool that was developed more than two decades ago by Müller et al. [47]. ...
Preprint
Full-text available
Architectural reconstruction is a reverse engineering activity aiming at recovering the missing design decisions of a system. It can help identify the components, within a legacy software application, according to the application's architectural pattern. It is useful for identifying architectural technical debt. We are interested in identifying layers within a layered application, since the layered pattern is one of the most used patterns to structure large systems. Earlier component reconstruction work focusing on that pattern relied on generic component identification criteria, such as cohesion and coupling. Recent work has identified architectural-pattern-specific criteria to identify components within that pattern. However, the architectural-pattern-specific criteria that the layered pattern embodies are loosely defined. In this paper, we present a first systematic literature review (SLR) aiming at inventorying such criteria for layers within legacy applications and grouping them under four principles that embody the fundamental design principles underlying the architectural pattern. We identify six such criteria in the form of design rules. We also perform a second systematic literature review to synthesize the literature on software architecture reconstruction in the light of these criteria. We report those principles, the rules they encompass, their representation, and their usage in software architecture reconstruction.
... The discovery and identification of sub-system structures by using (k,2) partite graphs was proposed by Muller and Uhl (Muller and Uhl, 1990). Their reverse engineering approach consists of the following steps (Muller et al., 1993): ...
... Many techniques for architecture recovery, even when they are not wholly based on clustering, employ clustering at some stage to derive and present results. For example, Rigi (Muller et al., 1993) uses clustering to form cohesive sub-systems, Dali uses Rigi's visualization features, Harris et al. use clustering of files to support their bottom-up recovery process (Harris et al., 1995), and Sartipi et al. (Sartipi and Kontogiannis, 2003) use an approach based on association rule mining and clustering to recover a system's architecture. Thus, clustering is an underlying technique within many architecture recovery approaches. ...
... Alteration distance measures the distance between a changed module and an affected module, and is zero if both modules are controlled by one sub-system. Muller and Uhl (Muller et al., 1993) suggested the use of automated techniques to produce alternative clusterings, which can be reviewed by a human expert for suitability. ...
... Our focus in this paper is the recovery of layered architectures, as the layered style is a widely used pattern to structure large software systems. Some approaches were proposed to reconstruct layered architectures (e.g., [9], [10], [11], [12], [13], [14], [15]). However, most of these approaches propose greedy algorithms that partition elements of the analyzed system into layers using some heuristics or some particular criterion (e.g., the number of fan-in and fan-out dependencies of a module [12,13]). ...
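To give a flavour of the fan-in/fan-out criterion mentioned above, the sketch below assigns modules to coarse layer bands from their dependency counts. The example graph and the simple balance rule are assumptions for illustration; the cited approaches use more elaborate heuristics.

```python
# Hedged sketch of a fan-in/fan-out layering heuristic: modules with high
# fan-out and low fan-in gravitate to upper layers, the reverse to lower
# layers. Graph and thresholds are illustrative assumptions.

DEPS = {  # module -> modules it calls
    "gui": {"logic", "util"},
    "logic": {"data", "util"},
    "data": {"util"},
    "util": set(),
}

fan_out = {m: len(t) for m, t in DEPS.items()}
fan_in = {m: 0 for m in DEPS}
for targets in DEPS.values():
    for t in targets:
        fan_in[t] += 1

for module in DEPS:
    balance = fan_out[module] - fan_in[module]
    layer = "upper" if balance > 0 else "lower" if balance < 0 else "middle"
    print(f"{module:6s} fan-in={fan_in[module]} fan-out={fan_out[module]} -> {layer}")
```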
... The work in this paper is related to the approaches proposed to recover layered architectures (e.g., [9], [10], [11], [12], [13], [14], [15]). Most of these approaches rely on some criterion or heuristics (e.g. ...
... [10], [11], [12], [13], [14]) in the process. Muller et al. [9] propose an approach aiming at supporting users in discovering, restructuring and analyzing subsystem structures using a reverse engineering tool. The proposed process involves the identification of the layered subsystem structures. ...
Conference Paper
Full-text available
Software architecture recovery is a bottom-up process that aims at building high-level views that support the understanding of existing software applications. Many approaches have been proposed to support architecture recovery using various techniques. However, very few approaches are driven by the architectural styles that were used to build the systems under analysis. In this paper, we address the problem of recovering layered views of existing software systems. We reexamine the layered style to extract a set of fundamental principles which encompass a set of constraints that a layered system must conform to at design time and during its evolution. These constraints are used to guide the recovery process of layered architectures. In particular, we translate the problem of recovering the layered architecture into a quadratic assignment problem (QAP) based on these constraints, and we solve the QAP using a heuristic search algorithm. In this paper, we introduce the QAP formulation of the layering recovery and we present and discuss the results of the experimentation with the approach on four open source software systems.
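To give a feel for this assignment framing, here is a toy sketch that assigns modules to layers, penalises back-calls and skip-calls, and improves the assignment by naive hill climbing. The penalty weights, example dependencies, and solver are invented stand-ins, not the paper's actual QAP formulation or heuristic search.

```python
# Illustrative sketch of layering recovery as an assignment problem: place
# each module in one of K layers so that back-calls (upward dependencies)
# and skip-calls (dependencies jumping over a layer) are minimised.
import random

DEPS = [("gui", "logic"), ("gui", "util"), ("logic", "data"),
        ("data", "util"), ("util", "logic")]
MODULES = sorted({m for d in DEPS for m in d})
K = 3  # number of layers, chosen up front

def cost(assign):
    c = 0
    for src, dst in DEPS:
        diff = assign[src] - assign[dst]  # layers count downward: 0 = top
        if diff > 0:
            c += 2          # back-call: dst sits above src
        elif diff < -1:
            c += 1          # skip-call: jumps over a layer
    return c

random.seed(0)
assign = {m: random.randrange(K) for m in MODULES}
improved = True
while improved:                      # naive hill climbing
    improved = False
    for m in MODULES:
        for layer in range(K):
            trial = dict(assign, **{m: layer})
            if cost(trial) < cost(assign):
                assign, improved = trial, True
print(assign, "cost:", cost(assign))
```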
... Omnipresent classes represent crosscutting concerns, utility functionalities or elementary domain concepts such as entities. Their basic attributes are that they are called by a vast number of classes in the system (Maqbool and Babri, 2007; Mancoridis et al., 1999; Mitchell and Mancoridis, 2006; Muller and Uhl, 1990; Muller et al., 1993; Luo et al., 2005; Wen and Tzerpos, 2005; Zhang et al., 2010), directly or indirectly, and they are usually located in the bottom architectural layers. However, they are not very important from an architectural point of view in SAR activities, since they do not necessarily represent architecturally significant decisions (Lungu et al., 2014). ...
... One of our contributions is that we can assist in the exclusion of noise as a preparatory procedure prior to modularization or feature location processes and not only to SAR techniques. In general, any approach attempting to comprehend a system for the identification of higher-level constructs, such as architecturally significant program elements or program elements related to a feature, should first remove elements that represent noise in the system (Maqbool and Babri, 2007; Muller and Uhl, 1990; Muller et al., 1993). ...
... They argue that omnipresent components should be removed with their incident edges, since they obscure the system structure. In a following work (Muller et al., 1993), the authors suggest that further inspection of the items identified as omnipresent is required, since some central components of the system are also identified as omnipresent. Mancoridis et al. (1999) cluster software systems to create the system decomposition and as a preprocessing step they identify omnipresent modules. ...
Article
Software systems’ concrete architecture often drifts from the intended architecture throughout their evolution. Program comprehension activities, like software architecture recovery, become very demanding, especially for large and complex systems due to the existence of noise, which is created by omnipresent and utility classes that obscure the system structure. Omnipresent classes represent crosscutting concerns, utilities or elementary domain concepts. The identification and filtering of noise is a necessary preprocessing step before attempting program comprehension techniques, especially for undocumented systems. In this paper, we propose an automated methodology for noise identification. Our methodology is based on the notion that noisy classes are widely used in a system, directly or indirectly. We combine classes’ usage significance with their participation in the system’s subgraphs, in order to identify the classes that are persistently used. Usage significance is measured according to Component Rank, a well-established metric in the literature, which ranks software artifacts according to their usage significance. The experimental results show that the proposed methodology successfully captures classes that produce noise and improves the results of existing algorithms for software systems’ architectural decomposition.
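A minimal sketch of the fan-in-style noise detection these excerpts describe (not the paper's Component Rank method): flag classes whose incoming-use count sits far above the average. The data and the mean-plus-one-standard-deviation cutoff are illustrative assumptions.

```python
# Sketch of fan-in-based omnipresent-class detection: a class used by far
# more classes than average is flagged as noise. Data and the cutoff rule
# (mean + 1 stdev) are illustrative assumptions, not the paper's method.
from statistics import mean, stdev

uses = {  # class -> classes that use it (fan-in sets), made-up data
    "Logger":  {"A", "B", "C", "D", "E", "F", "G", "H", "I"},
    "Utils":   {"A", "B", "C", "D", "E", "F", "G", "H"},
    "Order":   {"A", "B"},
    "Invoice": {"B"},
    "Report":  {"C"},
}

fan_in = {cls: len(users) for cls, users in uses.items()}
cutoff = mean(fan_in.values()) + stdev(fan_in.values())
omnipresent = [cls for cls, fi in fan_in.items() if fi > cutoff]
print("cutoff:", round(cutoff, 1), "omnipresent:", omnipresent)
```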
... Approaches that target the recovery of layered architectures: Some approaches targeted the recovery of layered architectures (e.g., [8], [9], [10], [11], [23], [24]). For instance, Müller et al. [23] proposed an approach that supports users in discovering, restructuring and analyzing subsystem structures. The proposed process involves the identification of the layered subsystem structures, which are obtained through the clustering of the system's entities into building blocks using composition operations based on principles such as low coupling and high cohesion. ...
... There are numerous examples of approaches that permit a more light-weight form of feedback from the user. Users might be permitted to tune the parameters of the algorithm [10], [11], [13], [20], [21], [22]. Other approaches allow more fine-grained control over the algorithm's decisions [11], [13], that is, by providing hints to constrain the range of solutions that it can produce. ...
... Rigi [20] provides a modularisation editor that exemplifies the parameter-tuning approach. It allows the user to manually reorganise the system, but also provides component identification through the application of graph-based heuristics. ...
Article
Full-text available
Remodularising the components of a software system is challenging: sound design principles (e.g., coupling and cohesion) need to be balanced against developer intuition of which entities conceptually belong together. Despite this, automated approaches to remodularisation tend to ignore domain knowledge, leading to results that can be nonsensical to developers. Nevertheless, supplying such knowledge is a potentially burdensome task to perform manually. A lot of information may need to be specified, particularly for large systems. Addressing these concerns, we propose the SUMO (SUpervised reMOdularisation) approach. SUMO is a technique that aims to leverage a small subset of domain knowledge about a system to produce a remodularisation that will be acceptable to a developer. With SUMO, developers refine a modularisation by iteratively supplying corrections. These corrections constrain the type of remodularisation eventually required, enabling SUMO to dramatically reduce the solution space. This in turn reduces the amount of feedback the developer needs to supply. We perform a comprehensive systematic evaluation using 100 real world subject systems. Our results show that SUMO guarantees convergence on a target remodularisation with a tractable amount of user interaction.
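The constraint-pruning idea reads naturally as pairwise corrections. Below is a hedged sketch, under the assumption that corrections take a must-link/cannot-link form: each correction removes candidate modularisations from the search space. The entities, the two-module limit, and the brute-force enumeration are toy choices, not SUMO's actual mechanics.

```python
# Toy sketch: developer corrections as pairwise constraints that prune the
# space of acceptable modularisations. Data and the exhaustive check are
# illustrative; the real tool is far more efficient.
from itertools import product

entities = ["a", "b", "c", "d"]
must_link = [("a", "b")]          # developer: keep together
cannot_link = [("a", "c")]        # developer: keep apart

def consistent(assign):
    return (all(assign[x] == assign[y] for x, y in must_link) and
            all(assign[x] != assign[y] for x, y in cannot_link))

# Enumerate assignments of entities to at most 2 modules and count survivors.
survivors = [dict(zip(entities, labels))
             for labels in product(range(2), repeat=len(entities))
             if consistent(dict(zip(entities, labels)))]
print(len(survivors), "of", 2 ** len(entities), "candidate modularisations remain")
```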
... This technique seems potentially suitable for application in round-trip engineering. Systä [10], [12] presents the experimental Shimba tool, which uses SCED and Rigi [8] to combine information obtained from static and dynamic analysis and to create various views for analyzing Java software. The event trace information included in the scenario diagrams is used to build high-level dynamic views by applying two techniques: state diagrams and pattern matching. ...
Article
By studying existing solutions for round-trip engineering in object-oriented modeling, we derived a set of specific requirements, both functional and non-functional, to support round-trip engineering of behavioral model elements. In this paper we propose a use-case- and test-driven approach to round-trip engineering of a complete UML model, as prescribed by the required elements of the Unified Process template. The URCA method was devised to satisfy the stated requirements to the maximum extent possible. Experimental results gained with a prototype that implements the core technique are also given, along with a discussion of some questions requiring further investigation.
... During the past two decades, many tools have been developed to provide solutions for static code analysis and/or software structure visualizations. For example, Müller et al. developed the Rigi system, which is able to identify system blocks and visualize their hierarchies in an interactive manner [2]. Later the SHriMP toolkit developed by Storey and Müller was incorporated into the Rigi system to support the visualization of system architectures at multiple levels of abstraction [4]. ...
Article
Full-text available
This study introduces a web-based visual analytic framework to better understand the software structures of large-scale environmental models. The framework integrates data management, software structures analysis, and web-based visualizations. A system for the Community Land Model (CLM) is developed to demonstrate the capability of the proposed framework. It consists of three major components: (1) a Fortran-syntax analysis tool that decomposes CLM source code into simpler forms; (2) an application tier that further analyzes and converts the preprocessed data into meaningful software structural information; (3) a web-based front end that is developed using state-of-the-art web technologies and visualization toolkit (e.g., D3.js). The framework provides users with easy access to the internal structures of complex environmental models. Currently, the prototype system is being used by CLM modelers and field scientists to tackle different environmental research problems.
... Poor design, unstructured programming methods, and crisis-driven maintenance can contribute to poor code quality, which in turn affects understanding of the system properties. Program understanding (Tilley, 1998; Muller et al., 1993; Tilley et al., 1998) is a relatively young and evolving field concerned with identifying artifacts and their relationships, and with understanding their structure and semantics. This process is essentially pattern matching at different abstraction levels. ...
Chapter
Full-text available
Tool support for program understanding becomes increasingly important in the software evolution cycle, and it has become an integral part of managing systems evolution and maintenance. Using interactive visual tools to gain insight into large evolving legacy information systems has gained popularity. Although several such tools exist, few of them have the flexibility and retargetability needed for easy deployment outside the contexts they were initially built for. The lack of flexibility and the limits on customizability are management as well as technical problems in software evolution and maintenance. This chapter discusses the requirements of an open architecture for software visualization tools, implementation details of such an architecture, and examples using some specific software system analysis cases. The focus is primarily on reverse engineering, although the proposed tool architecture is equally applicable to forward engineering activities. This material serves software architects and system managers as well as tool designers.
... However, in most cases, there already exists a meaningful modularization of a software system, which is often not perfect and might require updates. Müller et al. [14] recommend not to fully automate the process and Glorie et al. [15] and Rama [16] warn not to ignore the already existing structures. There are some approaches that address the issues by only partly changing the system, for instance, by extracting only specific components [17], [18], [19] or using the current modularization as a starting point for optimization [20]. ...
... They can be categorized into static and dynamic approaches, and into automatic and semiautomatic approaches. Static approaches reverse engineer from source code by creating an intermediate graphical representation of the code [5], [19], [20], [21], while dynamic approaches actually execute the code [22], [23], [24]. Automatic approaches [22,25,26] are completely tool-based, recovering designs and requirements from source code, while semiautomatic techniques employ manual intervention in the process [22,27,28]. ...
Article
Full-text available
The US Air Force operates several systems for which design documentation does not exist. Chief reasons for this lack of system documentation include software having been developed several decades ago, natural evolution of software, and software existing mostly in its binary versions. However, the systems are still being used, and the US Air Force would like to know the actual designs of the systems so that they may be reengineered for future requirements. Any knowledge of such systems lies mostly with their users and managers. A project was commissioned to recover designs for such systems based on knowledge of the systems obtained from stakeholders by interviewing them. In this paper we describe our application of the NFR Approach, where NFR stands for Nonfunctional Requirements, to recover the software design of a middleware system used by the Air Force called the Phoenix system. In our project we interviewed stakeholders of the Phoenix system, applied the NFR Approach to recover design artifacts, and validated the artifacts with the design engineers of the Phoenix system. Our study indicated that there was a high correlation between the recovered design and the actual design of the Phoenix system.
... The ensemble-based approach comes with a small, developer-friendly visual and metadata interface. Rigi [105] is a reverse engineering tool that generates a layered architecture with disjoint modules. It complements the tool described in this chapter nicely, as it focuses on generating a structure, whereas the focus of this work is on maintaining and validating the structure as part of an incremental build process. ...
... We find that the reverse engineering research community has been actively investigating techniques to decompose (partition) the structure of software systems into subsystems (clusters) [26], [5], [20], [15], [29], [4], but regrettably most researchers have limited themselves to that, ignoring the problem of provided/required interface identification. In the context of restructuring an application, the required interfaces of a component, i.e., what the environment should provide in terms of services to the considered component, are defined so that they match the interfaces provided by the used components, i.e., what each component should provide in terms of services to its clients. ...
Conference Paper
Full-text available
Although there are contributions on component-oriented languages, components are mostly implemented using object-oriented (OO) languages. In this perspective, a component corresponds to a set of classes that work together to provide one or more services. Services are grouped together in interfaces that are each implemented by a class. Thus, dependencies between components are defined using the semantics of the enclosed classes, which are mostly structural. This makes it difficult to understand an architecture described with such links. Indeed, at an architectural level, dependencies between components must represent functional aspects. This problem is worse when the components are obtained by re-engineering legacy OO systems. Indeed, in this case the obtained components are mainly based on the consistency of the grouping logic. So, in this paper we propose an approach to identify the interfaces of a component according to its interactions with the other components. To this end, we use formal concept analysis. The evaluation of the proposed approach via an empirical study showed that the identified interfaces overall correspond to the different functional aspects of the components.
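For readers unfamiliar with formal concept analysis, the sketch below builds a tiny formal context relating client components to the services they call on one analysed component, then enumerates concepts whose intents serve as candidate interfaces. The context data and the naive enumeration are illustrative assumptions; the paper's approach is richer than this.

```python
# Hedged sketch of interface identification with formal concept analysis
# (FCA): objects are client components, attributes are the services they
# call on the analysed component; each concept intent is a candidate
# interface. Context data is invented; the naive enumeration only suits
# tiny examples.
from itertools import combinations

context = {  # client component -> services it uses
    "billing": {"open", "read"},
    "report":  {"open", "read", "stats"},
    "monitor": {"stats"},
}

def intent(clients):
    """Services shared by every client in the set."""
    sets = [context[c] for c in clients]
    return set.intersection(*sets) if sets else set()

def extent(services):
    """Clients using every service in the set."""
    return {c for c, used in context.items() if services <= used}

concepts = set()
clients = list(context)
for r in range(1, len(clients) + 1):
    for subset in combinations(clients, r):
        services = intent(subset)
        concepts.add((frozenset(extent(services)), frozenset(services)))

for ext, itt in sorted(concepts, key=lambda c: -len(c[1])):
    if itt:  # each non-empty intent suggests one interface
        print(sorted(itt), "<- candidate interface, used by", sorted(ext))
```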
... A Lisp-like interface is provided to supplement REFINE's database access facilities. Rigi [121,122,169] is primarily intended as a tool for reverse-engineering and visualization of large, complex software systems. A parser extracts facts about the software under analysis and stores it either as a graph in a GRAS [16] database or in Rigi Standard Form (RSF) in an ASCII text file. ...
... The extracted facts may be in many forms. Researchers have extracted information about function calls, data accesses, and file operations to help reconstruct views of an architecture [28,29,33,44,56,58,63]. ...
... Several attempts to facilitate the comprehension of software systems have emerged, based on the hypothesis that source code is the only reliable source of information available about the system [1], [3], [5], [7]. Current program comprehension techniques are insufficient to enable developers to quickly understand the implementation of a software feature. ...
Article
Full-text available
Abstract. Several approaches to facilitate the comprehension of the behavior of software systems have been proposed. However, there is no widely accepted approach that recovers high-level information about the structure and behavior of complex systems. This work presents an approach to simplify comprehension tasks for object-oriented programs. The approach proposes a technique for reconstructing high-level structural and behavioral diagrams based on the analysis of summarized execution traces. An evaluation of the approach's performance, in terms of precision and recall, is presented on two public third-party systems, among them the Tomcat web server. The results suggest the feasibility of the approach for use on large-scale real systems. Keywords: Summarization; Execution Traces; Architecture Recovery; Dynamic Views. 1 Introduction. Several attempts to facilitate the comprehension of software systems have emerged, based on the hypothesis that source code is the only reliable source of information available about the system [1], [3], [5], [7]. Current program comprehension techniques are insufficient to enable developers to quickly understand the implementation of a software feature. Recent IDEs provide debugging tools that are important for understanding system behavior, but they offer only rudimentary views of the architecture, defined mainly by the project's package organization. There are several techniques to help comprehend systems at a high level of abstraction [9], [11]. These approaches use static information about the code, dynamic program information such as execution traces, or a combination of both [6]. Execution traces have been applied to identify which parts of the code implement specific functionality [10]. However, execution traces are difficult to analyze because, even for small scenarios of real-world systems, a trace can contain hundreds of thousands or millions of method calls.
... Currently there is a great deal of research in the Software Engineering community geared towards a standard exchange format for these artifacts [GXL,HSW00]. Some forms include TA [Hol97], GXL [GXL, HSW00], and RSF [MOTU93]. As well as a syntactic form, the artifacts also have semantics expressed by the constraints imposed by the schema. ...
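Since RSF comes up repeatedly here, a small sketch of the format may help: RSF stores one fact per line as a relation-source-destination triple. The parser below handles only this simple whitespace-separated core, and the sample facts are invented; real RSF files can carry quoting and attribute tuples beyond this sketch.

```python
# Sketch of reading fact triples in the style of Rigi Standard Form (RSF),
# one "relation source destination" tuple per line. Sample facts invented.

rsf = """\
call   main    parse
call   parse   lex
data   parse   token_table
level  main    1
"""

facts = []
for line in rsf.splitlines():
    if line.strip():
        relation, src, dst = line.split()
        facts.append((relation, src, dst))

calls = [(s, d) for rel, s, d in facts if rel == "call"]
print("call graph edges:", calls)
```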
... As a consequence, numerous researchers (starting with Hutchens and Basili's work in the mid-eighties [2]) have developed automated remodularisation techniques that seek to minimise this effort. Although these tools are automated in principle, it is generally expected that the final proposed modularisation is to be refined by an expert software developer [11]. ...
Conference Paper
Full-text available
Current software remodularisation tools only operate on abstractions of a software system. In this paper, we investigate the actual impact of automated remodularisation on source code using a tool that automatically applies remodularisations as refactorings. This shows us that a typical remodularisation (as computed by the Bunch tool) will require changes to thousands of lines of code, spread throughout the system (typically no code files remain untouched). In a typical multi-developer project this presents a serious integration challenge, and could contribute to the low uptake of such tools in an industrial context. We relate these findings with our ongoing research into techniques that produce iterative commit-friendly code changes to address this problem.
... Moreover, the model-to-model transformation is carried out explicitly, in addition to the discovery process, e.g., with text-to-model transformation. Existing reverse engineering approaches are often very flexible w.r.t. the metamodels (e.g., using graphs [11], [12], [13]), or in the simultaneous use of source code and model analysis/transformation (e.g., Moose [14] or MARPLE-DPD [15]). From this point of view, we can consider MDRE a subset of the possible approaches for reverse engineering, where the discovery process and model manipulation are disciplined by means of MDE. ...
Article
This paper explores and describes the state of the art of the model-driven approaches proposed in the literature to support reverse engineering. We conducted a systematic literature review on this topic with the aim of answering three research questions. We focus on the various solutions developed for model-driven reverse engineering, outlining in particular the models they use and the transformations applied to the models. We also consider the tools used for model definition, extraction, and transformation and the level of automation reached by the available tools. The model-driven reverse engineering approaches are also analyzed based on various features such as genericity, extensibility, automation of the reverse engineering process, and coverage of the full or partial source artifacts. We describe in detail and compare fifteen approaches applying model-driven reverse engineering. Based on this analysis, we identify and indicate some hints on choosing a model-driven reverse engineering approach from the available ones, and we outline open issues concerning model-driven reverse engineering approaches.
... An often-used method of representing program model graphs is by showing the nodes and links between them. Examples of this type of representation are found in the Rigi workbench (Müller et al., 1993), SHriMP (Storey et al., 2002), and others (Lutz, 2001). Although this type of graph works well for a relatively small number of elements, when the number of elements and/or their interconnections increases, this type of graph no longer forms a clear and easy-to-interpret representation, and significant work needs to be done to improve its interpretability (Sugiyama et al., 1981). Since the number of elements in one of the projects reaches over 1500, showing the behaviour structure as a graph with nodes and links becomes too cumbersome. ...
Thesis
Full-text available
Software is sometimes considered the most flexible part of any system. The flexibility, however, appears to exist only if the software was created ‘correctly’. When software is not appropriately adaptable in relation to the problem domain, the cost of modifications increases over the software’s lifetime until it eventually becomes too costly to maintain. To reduce the cost of maintenance, several solutions have been proposed, whether by improving the programming language, the design method (methodology), requirements analysis, or other factors. To understand the problem of maintaining flexible software and the increasing complexity when software is not appropriately adaptable, this thesis presents an empirical investigation into software structures in an attempt to determine the influence that language, development method, and team organisation have had on the programs analysed.
... Static techniques analyse a system by examining its source or object code. Static techniques can help in understanding the relationships between classes in a system, and in identifying the system architecture [Müller 1993]. Although software systems written in procedural languages are well suited to analysis with static techniques, aspects of the object-oriented paradigm, such as polymorphism, overloading, and dynamic binding, make it more difficult to gain an understanding of an object-oriented software system using static techniques alone. ...
... A Reverse Engineering Approach to Subsystem Structure Identification [64] restructures the system into a hierarchy of subsystems along with their high-level abstract representation as components. This approach uses composition measures (coupling, cohesion, etc.) and composition relations (the composition dependency graph) to cluster and map the components along with their explicit interfaces. ...
Chapter
Full-text available
This chapter discusses the overall nature of the domain of reverse engineering and the current shortcomings that can be tackled by analysing this domain from a different perspective. Architecture over comprehension is the key idea behind the proposed reverse engineering.
... All of them necessarily have to be reduced as much as possible. Turning to the creation of product design ideas, using technical specifications derived from QFD, the Reverse Engineering method has been applied to detect the functional process and the function parts it requires [6] (shown in Fig. 1). The Morphological Matrix, which concerns the analysis and permutation of possible machine-element solutions, is used for the construction-element development process [7]. ...
Article
Full-text available
A new conceptual design for a small-scale and low-cost plastic recycling machine is generated by combining a melting part and a compression process. Starting from one outstanding requirement, an affordably priced machine that can perform two processes with high accuracy and capacity, several issues related to balancing quality, capacity and machine cost arose during discussion. After applying various design methods such as Quality Function Deployment, Reverse Engineering, the Morphological Matrix and the Pugh Method, a final concept emerged: using an electric oven and a hydraulic system to melt down and compress a plastic tile with dimensions of 300x300x9 mm. The design concept is divided into two parts: mechanical and electrical systems. In the mechanical section, the technical drawing and simulation are made to see how the machine performs under operation. Besides, we examined the forces applied in the moulds to evaluate the strength of the system. In the heating and electricity section, we chose the electrical components, designed the oven parameters and conducted a heating simulation on the mould. In addition, the heating and cooling times were calculated based on the principles of thermodynamics and heat transfer. Furthermore, a manufacturing plan was created to estimate the resources essential to produce a certain number of heat-forming machines. In general, the machine needs to be prototyped to verify its main function and find practical issues. After that, some improvements could be made to enhance efficiency and increase capacity: designing an optimal mould to absorb more heat and reduce post-processing, calculating and designing a more efficient oven, creating a faster locking mechanism, and other improvements toward an automated machine.
... Such documents can be used to analyze a structure and understand how each component of the system works. These components are later used to build a similar system or program that may not have exactly the same structure as the reverse-engineered system [1]. ...
Article
Full-text available
Recently, reverse engineering has become widely recognized as a valuable process for extracting system abstractions and design information from existing software. This study focuses on ForUML, a reverse engineering tool developed to extract UML diagrams from modern object-oriented Fortran programs. Generally, Fortran is used to implement scientific and engineering software in various domains, such as weather forecasting, astrophysics, and engineering design. However, methods for visualizing the existing design of object-oriented Fortran software are lacking. UML diagrams of Fortran software would be beneficial to scientists and engineers in explaining the structure and behavior of their programs at a higher level of abstraction than the source code itself. UML diagrams can enhance discussions within development teams and with the broader scientific community. The first version of ForUML produces only UML class diagrams. Class diagrams provide a useful window into the static structure of a program, including the structure and components of each class and the relationships between classes. However, class diagrams lack the temporal information required to understand class behavior and interactions between classes. UML sequence diagrams provide this important algorithmic information. Therefore, herein, an extension for ForUML to extract UML sequence diagrams from the Fortran code is proposed, and this capability is provided using a widely used open-source platform. This study argues that the proposed extension will enable the visualization of object-oriented Fortran software behavior and algorithmic structure and thereby enhance the development, maintenance practices, decision processes, and communications in scientific and engineering software communities worldwide.
... The ARCH tool by Schwanke (1991) determined clusters using coupling and cohesion measurements. The Rigi system by Müller et al. (1993) pioneered the concepts of isolating omnipresent modules, grouping modules with common clients and suppliers, and grouping modules that had similar names. The last idea was followed up by Anquetil and Lethbridge (1998), who used common patterns in file names as a clustering criterion. ...
Article
Full-text available
Recovering test-to-code traceability links may be required in virtually every phase of development. This task might seem simple for unit tests thanks to two fundamental unit testing guidelines: isolation (unit tests should exercise only a single unit) and separation (they should be placed next to this unit). However, practice shows that recovery may be challenging because the guidelines typically cannot be fully followed. Furthermore, previous works have already demonstrated that fully automatic test-to-code traceability recovery for unit tests is virtually impossible in a general case. In this work, we propose a semi-automatic method for this task, which is based on computing traceability links using static and dynamic approaches, comparing their results and presenting the discrepancies to the user, who will determine the final traceability links based on the differences and contextual information. We define a set of discrepancy patterns, which can help the user in this task. Additional outcomes of analyzing the discrepancies are structural unit testing issues and related refactoring suggestions. For the static test-to-code traceability, we rely on the physical code structure, while for the dynamic, we use code coverage information. In both cases, we compute combined test and code clusters which represent sets of mutually traceable elements. We also present an empirical study of the method involving 8 non-trivial open source Java systems.
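The core of the method above is the comparison of two independently computed link sets. Here is a hedged sketch of that comparison, assuming the static side is a simple test-naming convention and the dynamic side is per-test coverage; both are stand-ins for the paper's cluster-based computation, and the data is invented.

```python
# Illustrative static-vs-dynamic traceability comparison: a naming-based
# guess set against coverage, with disagreements surfaced for the user.

static_links = {           # test class -> unit implied by its name
    "OrderTest": "Order",
    "CartTest": "Cart",
}
covered = {                # test class -> classes it actually executed
    "OrderTest": {"Order", "Cart"},   # exercises more than its named unit
    "CartTest": {"Cart"},
}

for test, unit in static_links.items():
    dynamic = covered.get(test, set())
    if unit not in dynamic:
        print(f"{test}: name points at {unit}, never executed -> suspicious")
    elif dynamic - {unit}:
        print(f"{test}: also covers {sorted(dynamic - {unit})} -> check isolation")
    else:
        print(f"{test}: static and dynamic links agree on {unit}")
```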
... Among the first works that considered the negative impact of omnipresent software modules on software clustering, and that stressed removing them, is that of Muller et al. [MOTU93]. Following this work, different strategies for the detection of omnipresent modules have been proposed. ...
Thesis
Full-text available
Bad designs in software code have a significant impact on the total cost incurred in the development of software. This is because software code with bad designs has poor structure, which decreases its readability, understandability and maintainability. Software restructuring is thus a crucial activity in software development. Cohesion is an important measure in assessing the quality of software. The cohesion of a software module is the degree to which module components belong together. An ill-structured software code is characterized by low cohesion. Software restructuring techniques based on hierarchical agglomerative clustering (HAC) algorithms have been widely used to restructure large modules with low cohesion into smaller modules with high cohesion. These techniques generate clustering trees (or dendrograms) of the modules. The clustering trees are then sliced at different cut-points to obtain the desired restructurings. Choosing the appropriate cut-points is a difficult problem in clustering. This problem is exacerbated in previous HAC techniques, as those techniques generate clustering trees which have a large number of cut-points. Moreover, many of those cut-points return clusters of which only a few lead to a meaningful restructuring. In this thesis, we propose a new hierarchical clustering technique for restructuring software at the function level that generates clustering trees where the number of cut-points is reduced and the quality of the cut-points is improved. To establish this we compare the results of our technique with those of four previous hierarchical clustering algorithms. We also develop an easy-to-use software tool that allows the user to generate clustering trees of functions using five different clustering algorithms, including the algorithm proposed in this thesis. Finally, we give a characterization of clusters returned by cut-points, in the context of software restructuring.
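To illustrate the dendrogram-and-cut-point workflow, here is a toy single-linkage HAC over four functions with an invented dissimilarity matrix; the CUT threshold plays the role of slicing the clustering tree. Linkage choice, distances, and the threshold are all assumptions for demonstration, not the thesis's algorithm.

```python
# Toy single-linkage hierarchical agglomerative clustering (HAC) over
# functions, stopping at a dendrogram cut-point. Data is invented.

dist = {  # symmetric dissimilarity between functions
    ("f", "g"): 0.1, ("f", "h"): 0.8, ("f", "k"): 0.9,
    ("g", "h"): 0.7, ("g", "k"): 0.9, ("h", "k"): 0.2,
}
def d(a, b):
    return 0.0 if a == b else dist.get((a, b), dist.get((b, a)))

clusters = [{"f"}, {"g"}, {"h"}, {"k"}]
CUT = 0.5  # cut-point: stop merging above this dissimilarity

while len(clusters) > 1:
    # Single linkage: cluster distance = closest pair of members.
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda p: min(d(a, b) for a in clusters[p[0]]
                                 for b in clusters[p[1]]))
    best = min(d(a, b) for a in clusters[i] for b in clusters[j])
    if best > CUT:
        break  # slicing the dendrogram at the cut-point
    clusters[i] |= clusters.pop(j)
print("restructured modules:", clusters)
```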
... It is also a common characteristic of legacy software to have poor quality. The software itself is affected by its own technical problems, which may be addressed with different reverse engineering techniques [2], [3], [4], [5], [6]. It is also affected by problems beyond the software architecture and its realization, which reside in reverse engineering why the software was developed and architected in a specific manner to satisfy the system needs. ...
Article
Full-text available
In a complex system, a legacy software component is shaped by various factors beyond its own capability. Lack of the knowledge that shaped the software, as is often the case with legacy software, can prevent appropriate maintenance and development to comply with the system needs. Reverse engineering legacy software to fit the overall system of interest is a daunting task. Existing reverse engineering techniques mostly take a purely technical point of view, within the single discipline of software engineering. Thus, this paper aims for an approach to properly reverse engineer the reasoning behind legacy software developments in a complex system. By jointly applying the CAFCR model and reverse engineering, a roadmap is created to guide incremental developments of legacy software in a complex system, which benefits both the maintenance of the existing implementation and the realization of new functionalities for improved system performance.
... Pattern-based Reverse Engineering of Design Components [12] extracts design components based on the structural descriptions of design patterns. A Reverse Engineering Approach to Subsystem Structure Identification [14] restructures the system into a hierarchy of subsystems along with their high-level abstract representation as components. Washizaki [16] detects reusable parts of object-oriented classes and transforms classes into JavaBeans components. ...
Conference Paper
Full-text available
Current component-directed reverse engineering approaches extract ADL-based components from legacy systems. ADL-based components need to be configured at code level for reuse; they cannot be re-deposited after composition for future reuse; and they cannot provide flexible reusability, as one has to bind all the ports in order to compose them. This paper proposes a solution to these issues by extracting X-MAN components from legacy systems. In this paper, we explain our component model and the mapping from object-oriented code to X-MAN clusters using basic scenarios of our rule base.
Chapter
Nature has always been a source of inspiration for human beings. Large numbers of complex optimization problems have been solved by techniques inspired by nature. Software modularization is one such complex problem encountered by software engineers. It is the process of organizing the modules of a software system into optimal clusters. In this chapter, bio-inspired algorithms such as the bat, artificial bee colony, black hole and firefly algorithms are proposed for software modularization. Hybrids of these algorithms with the crossover and mutation operators of the genetic algorithm are also proposed. All the algorithms, along with their hybrids, are tested on seven benchmark open source software systems. The results show that the hybrids of these algorithms optimize better than the existing genetic and hill-climbing approaches.
Article
Full-text available
In software testing, one of the most important issues is to validate the consistency of the program and its design. Usually, this kind of design information is expressed in software-related documents, such as software requirements and software design documents. Therefore, how to effectively extract the design information from these documents is very important. In this paper, we first introduce a method for extracting the class diagram model and the method call diagram model from the UML model of the design document; secondly, we classify and summarize the inconsistency problems that may exist in the two types of models; thirdly, we design static and dynamic consistency-checking algorithms for these models. Finally, we implement the CCoSaD (Consistency Checking of Software and Design) tool and validate its effectiveness through experiments.
Conference Paper
Full-text available
The polymorphic domain of software reverse engineering has been varying since the 90s for multiple reasons. Some of the primary reasons include the acceptance of new programming languages, the underlying technique of reverse engineering, and the desired output notation of the reverse engineering, which varies with the evolution of software. The purpose of this paper is to provide a trend-based taxonomy of reverse engineering that can classify the differences and similarities in reverse engineering throughout the years.
Article
We propose a novel technique for recovering certain elements of the UML model of a software system. These include relationships between use cases as well as class roles in collaborations that realize each use case, identifying common functionality and thus establishing a hierarchical view of the model. The technique is based on dynamic analysis of the system for selected test cases that cover relevant use cases. The theory of formal concept analysis is applied to obtain a classification of model elements, obtained by a static analysis of the code, in terms of use case realizations.
Conference Paper
High productivity and high flexibility are the demands of the digital manufacturing industry. The current trend in manufacturing came up with the fourth industrial revolution, i.e. Industry 4.0 [1]. The concept is evolving from automated manufacturing systems to intelligent manufacturing systems but is still in its nascent stage. One of the basic components of these systems is the cyber-physical system (CPS) [2], i.e. a mechanism controlled by computer-based algorithms integrated with users over a network. The CPS is a smart system that consists of physical and computational elements; these elements can be distributed into a four-layered architecture made up of a sensing layer, a networking layer, an analyzing layer, and an application layer [3]. The benefits of these systems are that they are time-saving and flexible, feasible even for a demand of one unit placed by an individual customer, and do not require reconfiguration of the manufacturing system. The term CPPS (cyber-physical production system) was coined in Germany, proposing a completely automated system in the realm of Industry 4.0: a manufacturing system based on CPS that comprises physical elements, such as robots, conveyors, sensors, and actuators, and a cyber-layer based on computational elements [4]. The independent elements of a CPPS can cooperate with each other through the Internet of Things (IoT) [5], a concept in which components with unique identities can transfer data to each other over a network without requiring any human-computer interaction (HCI), thus creating smart factories [6]. The Internet can be one such communication protocol in IoT. A similar case of a smart-factory production system is presented in [7]. Though robots and computers take a major share in the CPS, human presence is essential for productivity, whether for supervision or for complicated jobs that robots cannot undertake. The smart-factory concept exists for large production systems; however, very little research exists for manufacturing in the microdomain, which is deemed necessary due to the limitations of macro devices, i.e. their large size, greater power consumption, higher cost impact, higher susceptibility to environmental conditions, and a control loop that is believed to be significantly larger [8]. In this chapter, a smart factory is proposed; a collaboration is envisaged between a human, a cobot, and a multistage micromilling machine. The related concepts are stated below.
Thesis
Reverse engineering is an activity increasingly used in the manufacturing industry which, by reusing already existing components or subsystems, reduces product development time. As currently practised, this activity relies essentially on capturing the shape of the product, by studying its topology and geometry. However, the shape aspect alone does not allow optimal integration into new projects. It is therefore worthwhile to represent the product under other aspects complementary to shape. In our research, we propose a new way of performing reverse engineering, using a set of heterogeneous data that can represent the product. These data are of different types: they may be textual, pictorial, or virtual, and they represent the product under different aspects, such as function, structure, or dynamics (behaviour). To this end, we first propose an approach for integrating these heterogeneous data through a detailed analysis that identifies the relevant information to integrate, and then an approach for linking the analysed data to a product model that integrates different aspects of the product, all with the goal of building a representation that is semantically richer than a simple geometric representation.
Conference Paper
Execution trace analysis is particularly valuable in the context of object-oriented program comprehension. However, coping with object-oriented execution traces is a very difficult task. In particular, because of coupling, current object-oriented systems tend to form a very complicated interwoven lattice of dependencies, known as the "Spaghetti Architecture" phenomenon. Therefore, trace reduction techniques are used to make execution traces more tractable and less difficult. This research focuses on establishing a trace simplification framework that depends on decoupling. Decoupling is very useful in reducing the complexity of execution traces for the comprehension process. In particular, decoupling can truthfully lessen the "Spaghetti Architecture" problem without removing trace components and thereby creating holes and gaps in the trace structure. For this purpose, a set of related challenges is posed and addressed in order to develop the framework properly. The framework is based on a combination of three components: scope filtering, utility detection, and utility decoupling.
Conference Paper
Program source code is one of the main targets of software engineering research. A wide variety of research has been conducted on source code, and many studies have leveraged structural, vocabulary, and method signature similarities to measure the functional sameness of source code. In this research, we conducted an empirical study to ascertain how we should use these three similarities to measure functional sameness. We used two large datasets and measured the three similarities between all the method pairs in the datasets, each of which included approximately 15 million Java method pairs. The relationships between the three similarities were analyzed to determine how we should use each to detect functionally similar code. The results of our study revealed the following. (1) Method names are not always useful for detecting functionally similar code. Only when a small number of methods share a given name are those methods likely to include functionally similar code. (2) Existing file-level, method-level, and block-level clone detection techniques often miss functionally similar code generated by copy-and-paste operations between different projects. (3) When we used structural similarity to detect functionally similar code, we obtained many false positives. However, we can avoid most false positives by using vocabulary similarity in addition to structural similarity. (4) Using vocabulary similarity to detect functionally similar code is not suitable for method pairs in the same file, because such method pairs use many of the same program elements, such as private methods or private fields.
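As a concrete reading of finding (3), the sketch below computes vocabulary similarity as a Jaccard index over identifier sets; the token sets are invented stand-ins for real lexer output, and the study's actual measure may differ in detail.

```python
# Sketch of vocabulary similarity between two methods as the Jaccard index
# over their identifier sets. Token sets are invented examples.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

vocab_m1 = {"total", "price", "item", "tax", "sum"}
vocab_m2 = {"total", "price", "discount", "sum"}

sim = jaccard(vocab_m1, vocab_m2)
print(f"vocabulary similarity: {sim:.2f}")
# Per the findings above, high structural similarity alone yields many false
# positives; additionally requiring vocabulary similarity filters most of them.
```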
Chapter
The goal of our research is to find ways to improve the productivity of software developers who are maintaining very large legacy systems. To help achieve this goal, we have developed a software tool called TkSee, which allows software engineers to explore and understand source code. This tool serves as the infrastructure for various studies of program comprehension, and is used intensively by several software developers inside Mitel Corporation. As researchers, the most important part of our mission is to evaluate the usefulness of the ideas we implement. In this chapter we present a case study in which we obtained insights about usability while trying to evaluate TkSee’s overall usefulness. The intent of this chapter is to highlight the factors that made the evaluation process difficult, and to provide pointers to those who wish to assess the usefulness of complex software products.
Conference Paper
Clustering is a useful technique to group similar data entities based on their features. Clustering uses similarity or distance measures to find groups among entities. Many clustering algorithms and measures have been proposed in the literature. These algorithms and measures have their own strengths and weaknesses in the software clustering domain. To combine the strengths of various algorithms/measures, researchers have integrated more than one algorithm/measure in a single clustering process. This approach, which allows cooperation between actors, is called the Combination of Multiple Clustering (CMC) approach. Although the use of CMC has been explored in various disciplines, little work has been done using CMC in the software domain. In this paper, we explore the idea of Cooperative Clustering (CC), a type of CMC, for software modularization. Our CC combines the strengths of similarity and distance measures in a single clustering process. Software modularization is very important for architectural understanding. Modularization is the breaking down of a software system into modules so that similar entities (e.g. classes or functions) are collected together. We expect high-quality results from CC in terms of authoritativeness, which is very important for architectural understanding.
Technical Report
Full-text available
The ability to represent relational data in the form of drawings or graphs is a powerful tool that allows analysis through visual exploration. Several data presentation problems require the drawing or display of graphs; examples include circuit schematics and software engineering diagrams. Displaying relational data in a meaningful way has always been a challenge for this representation. For example, calculating the optimal placement of electrical components on a chip containing a large number of small connected components requires placing the components so that the number of crossings is as small as possible while the required area of the chip does not become too large. The problem becomes even more complex when several different constraints must be satisfied as well, for example when the number of bends and the total length of the connections must be minimized, as in the design of Very Large Scale Integration (VLSI) chips. A wide variety of fields, each with its own requirements, use automatic graph drawing algorithms to clarify or display the structure of information in a compact and relatively small space. As a result, graph drawing algorithms have been a focus of research over the past couple of decades, yielding better drawing quality and higher drawing performance. Several classes of graph drawing algorithms with different aesthetic criteria have evolved to address the problem of planar drawings. Some of the most flexible algorithms for calculating layouts of simple undirected graphs belong to a class known as force-directed algorithms. Also known as spring embedders, such algorithms calculate the layout of a graph using only information contained within the structure of the graph itself, rather than relying on domain-specific knowledge. The algorithm designed by T.M.J. Fruchterman and E.M. Reingold, as described in [1], models the vertices as atomic particles or celestial bodies exerting attractive and repulsive forces on each other.
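A compact sketch of a Fruchterman-Reingold style layout, as characterized above: attraction along edges, repulsion between all vertex pairs, and a cooling temperature. The constants are common textbook choices, not values from [1].

    import math, random

    def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50):
        k = math.sqrt(width * height / len(nodes))   # ideal pairwise distance
        pos = {v: [random.uniform(0, width), random.uniform(0, height)] for v in nodes}
        t = width / 10.0                             # initial temperature
        for _ in range(iterations):
            disp = {v: [0.0, 0.0] for v in nodes}
            for v in nodes:                          # repulsion between all pairs
                for u in nodes:
                    if u == v:
                        continue
                    dx = pos[v][0] - pos[u][0]
                    dy = pos[v][1] - pos[u][1]
                    d = math.hypot(dx, dy) or 1e-9
                    f = k * k / d                    # repulsive force k^2 / d
                    disp[v][0] += dx / d * f
                    disp[v][1] += dy / d * f
            for u, v in edges:                       # attraction along edges
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = d * d / k                        # attractive force d^2 / k
                disp[v][0] -= dx / d * f; disp[v][1] -= dy / d * f
                disp[u][0] += dx / d * f; disp[u][1] += dy / d * f
            for v in nodes:                          # move, capped by temperature
                d = math.hypot(disp[v][0], disp[v][1]) or 1e-9
                step = min(d, t)
                pos[v][0] += disp[v][0] / d * step
                pos[v][1] += disp[v][1] / d * step
            t *= 0.95                                # cool down
        return pos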
Chapter
In small and medium-sized enterprises (SME), where typically the decision making process is highly centralized, important decisions, such as innovation adoption and implementation, will be strongly influenced by Chief Executive Officers (CEOs). Based on Upper Echelon Theory (UET), which emphasizes CEOs’ limited awareness, this chapter shows that a firm’s open innovation (OI) journey is closely linked with the CEOs’ cognitive characteristics, their past experience and personal network of relationships. The case analysis results suggest that the micro-foundation of OI is deeply rooted in a key individual’s characteristics in innovative SMEs.
Chapter
Software visualization is considered by many researchers to be a useful and powerful technique for helping programmers understand large and complex programs. Consequently, many visualization tools have been developed for exploring software code and documentation. Some software visualization tools use graphical representations for the navigation, analysis and presentation of software information to further understanding [43]. For instance, several software visualization tools show animations with the goal of teaching widely used algorithms and data structures. Another class of tools shows the dynamic execution of programs for debugging, profiling and for understanding run-time behavior. Other software visualization tools mainly focus on showing textual representations, some of which may be pretty printed to increase understanding [4, 15] or use hypertext in an effort to improve the navigability of the software [40].
Article
In this article, a strategy that seeks to assist the arduous cognitive process of understanding a GUI‐based system is presented. To reach this goal, a UML use case model with the most relevant features is obtained. To derive this model, the strategy performs the following steps: extraction and filtering of specific static system information, and a clustering process that inspects this information, including GUI widgets, which are components closely related to the system's problem domain. Although these steps are commonly known and used in the context of reverse engineering, the strategy introduces approaches that are unusual with respect to the proposals found in the available literature. More specifically, the strategy presents (a) a set of metrics that infers the relative importance of a method or a function within the analysed system, (b) a summarization process driven by different features of software systems, and (c) a technique to cluster software artefacts and to map the cluster model onto a use case model. The article also proposes a methodology that allows the achieved results to be compared. The assessment of the approach suggests that the strategy can help the software engineer understand a software system by providing a useful fine‐grained use case model. In the context of software maintenance, the most time‐consuming activities are those the software engineer must execute to understand a system. Reverse engineering provides methods and tools aiming to assist this arduous cognitive process. We propose a reverse engineering strategy that extracts a UML use case model with the most relevant features for the system under study. This model provides a connection between a high‐level abstraction of the system functionalities and the source code artifacts that implement those functionalities.
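A minimal sketch of point (a), assuming a call graph has already been extracted; the fan-in/fan-out weighting is an invented stand-in for the article's actual set of metrics.

    def relative_importance(fan_in: int, fan_out: int, total_calls: int) -> float:
        # Assumed heuristic: methods referenced from many call sites, relative
        # to all calls in the system, sit closer to the problem domain.
        if total_calls == 0:
            return 0.0
        return (2 * fan_in + fan_out) / (3 * total_calls)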
Conference Paper
In the present paper, a reverse engineering tool that represents the design intention of engineers is proposed. Without considering the design intention, it is difficult to reproduce documents that represent the design information of existing software; estimating the intention of engineers could therefore be a key factor in constructing an understandable model of the software structure. Since naming conventions are established in each organization and in product development, a module name may suggest the design intention. Based on this premise, the authors propose a method for estimating the feature categories of modules from the design intention and implement the method in a reverse engineering tool. This paper analyzes two kinds of embedded software products, for a body fat scale and for wireless communication equipment, using the proposed tool and discusses the results.
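A minimal sketch of the premise that module names suggest design intention: split a name into tokens and look them up in an organization-specific table. The hint table below is invented for illustration; real tables would encode each organization's naming conventions.

    CATEGORY_HINTS = {
        "lcd": "display control", "disp": "display control",
        "adc": "sensor input",    "key": "user input",
        "rf": "wireless link",    "tx": "wireless link", "rx": "wireless link",
    }

    def estimate_feature_category(module_name: str) -> str:
        tokens = module_name.lower().replace("-", "_").split("_")
        for tok in tokens:
            if tok in CATEGORY_HINTS:
                return CATEGORY_HINTS[tok]
        return "unknown"

    # e.g. estimate_feature_category("rf_tx_driver") -> "wireless link"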
Article
Full-text available
Reverse-engineering application codes back to the design and specification stage may entail the recreation of lost information for an application, or the extraction of new information. We describe techniques which produce abstractions in object-oriented and functional notations, thus aiding the comprehension of the essential structure and operations of the application, and providing formal design information which may make the code much more maintainable and certainly more respectable.The two types of application considered here are (1) data processing applications written in COBOL - of primary importance owing to their predominance in present computing practice - and (2) scientific applications written in FORTRAN. These two require somewhat different abstraction approaches.
Article
Full-text available
The design and operation of a computer-aided software engineering (CASE) environment called SOFTMAN is described. SOFTMAN is designed to support both forward and reverse engineering of large software systems. In the authors' view, tools and techniques for forward and reverse CASE are complementary and should therefore be integrated into the same environment. SOFTMAN provides support for correctness-assuring forward engineering through the incremental verification and validation of evolving software systems under its management. This applies to both formal and informal software system descriptions. Further, SOFTMAN's support for reverse software engineering provides mechanisms that enable the SOFTMAN-based evolution of pre-existing software systems. As such, the authors describe their approach to both forward and reverse CASE, as well as the tools that SOFTMAN supports and a scenario for their use.
Conference Paper
Full-text available
Software professionals rely on internal documentation as an aid in understanding programs. Unfortunately, the documentation for most programs is usually out-of-date and cannot be trusted. Without it, the only reliable and objective information is the source code itself. Personnel must spend an inordinate amount of time exploring the system by looking at low-level source code to gain an understanding of its functionality. One way of producing accurate documentation for an existing software system is through reverse engineering. This paper outlines a reverse engineering methodology for building subsystem structures out of software building blocks, and describes how documenting a software system with views created by this process can produce numerous benefits. It addresses primarily the needs of the software engineer and technical manager as document users.
Article
Full-text available
EDGE is an editor kernel for the direct and visual manipulation of graphs. The kernel can be adapted quickly to diverse applications based on graphs, such as PERT chart editing, directory browsing, call graph display, logic circuit simulation or configuration visualization. EDGE provides potential solutions to the following general problems faced by any graph editor. (1) Automatic graph layout: how can application-specific layout requirements, individual preferences, and layout stability be integrated with automatic layout algorithms? EDGE solves this problem with a novel algorithm that is based on layout constraints. (2) Graph abstraction: how can users deal with large graphs containing hundreds of nodes and edges, and thousands of edge crossings? EDGE can reduce the apparent complexity with subgraph abstractions and a novel clustering technique called edge concentration. (3) Adaptability: how should the editor kernel be structured to be adaptable to various applications? EDGE uses a special graph representation language for specifying visual appearance and the inheritance mechanism of C++ to achieve extendibility. (4) Persistence of graphs: how can the graph structures produced by the editor be kept in long-term storage, especially if the node and edge data structures have been extended for a particular application? Our approach uses a standardized, external format for graphs and handles extensions with program generator technology: the I/O routines for reading and writing extended node and edge data structures are produced automatically from the declarations of these data structures. This paper describes EDGE and presents details of the above solutions.
Conference Paper
Full-text available
The authors describe a clustering method that uses equivalence relations for identifying subsystem structures. The relations are intended to embody the software engineering principles that concern module interactions, such as low coupling, high strength, small interfaces, and few interfaces. The resulting compositions are (k,2)-partite graphs (a class of layered graphs) rather than strict tree hierarchies. The method is supported by an interactive graph editor.
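As a sketch of clustering by an equivalence relation, the union-find structure below partitions modules into subsystem candidates once some pairs have been declared equivalent; which pairs qualify (e.g., mutual coupling above a threshold) is the modeller's choice and is assumed here.

    class UnionFind:
        def __init__(self, items):
            self.parent = {x: x for x in items}
        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]  # path halving
                x = self.parent[x]
            return x
        def union(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    def subsystem_candidates(modules, equivalent_pairs):
        uf = UnionFind(modules)
        for a, b in equivalent_pairs:
            uf.union(a, b)
        groups = {}
        for m in modules:
            groups.setdefault(uf.find(m), []).append(m)
        return list(groups.values())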
Article
Full-text available
The Documents Integration Facility, an environment based on objects and relationships between objects that was constructed for the development, use, and maintenance of large-scale systems and their life-cycle documents, is presented. DIF helps integrate and manage the documents produced and used throughout the life cycle: requirements specifications, functional specifications, architectural designs (structural specifications), detailed designs, source code, testing information, and user and maintenance manuals. DIF supports information management in large systems where there is much natural-language text. The documentation method used by DIF and DIF's structure are described. How DIF is used is discussed, and the DIF environment is examined. Issues that were encountered in the design of DIF are considered.
Article
Full-text available
Extraction of the structural and, to a lesser degree, functional and dynamic properties of systems composed of modules and subsystems is treated. The process is equivalent to reverse engineering a system-level design description. The approach used is to map the resource exchange among modules and then derive a hierarchical design description using a system-restructuring algorithm. The medium for the design description is a module interconnection language, NuMIL. The performance of the algorithm shows that it is practical.
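A minimal sketch of mapping resource exchange among modules, the graph such a restructuring algorithm would start from; the provides/requires tables are assumed to come from a parser, and the NuMIL restructuring algorithm itself is not reproduced.

    def resource_flow_graph(provides: dict, requires: dict) -> set:
        # provides / requires: module name -> set of resource names.
        providers = {}
        for mod, resources in provides.items():
            for r in resources:
                providers.setdefault(r, set()).add(mod)
        edges = set()
        for mod, resources in requires.items():
            for r in resources:
                for src in providers.get(r, ()):
                    if src != mod:
                        edges.add((src, mod))   # resource flows src -> mod
        return edges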
Article
Full-text available
Reverse engineering is the process of extracting system abstractions and design information out of existing software systems. This information can then be used for subsequent development, maintenance, re-engineering, or reuse purposes. This process involves the identification of software artifacts in a particular subject system, and the aggregation of these artifacts to form more abstract system representations. This paper describes a reverse engineering environment which uses the spatial and visual information inherent in graphical representations of software systems to form the basis of a software interconnection model. This information is displayed and manipulated by the reverse engineer using an interactive graph editor to build subsystem structures out of software building blocks. The spatial component constitutes information about the relative positions of the meaningful parts of a software structure, whereas the visual component contains information about how a software structure...
Article
Full-text available
We introduce and compare two models of cooperation among programmers during software maintenance. Enforced cooperation is the normal mode of operation when the sheer size of the software maintenance effort makes laissez-faire management infeasible. Voluntary cooperation is more common when a small group works together to enhance a small system or modify a small portion of a large system. We describe a tool, Infuse, that provides change management in the context of both models of cooperation. We demonstrate how Infuse automates change propagation and enforces negotiation of conflicts for the enforced model, while providing less restrictive aids for maintaining consistency under the voluntary model.
Article
Program modules and data structures in software systems are interconnected by calls and references. Partitioning these entities into clusters reduces complexity. For very large systems, manual clustering is impractical. A method for performing automatic clustering is described, and a metric to quantify the complexity of the resulting partition is developed.
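A minimal sketch of one way to quantify the complexity of a partition: the fraction of references that cross cluster boundaries. The actual metric in the article may differ; this ratio is an illustrative stand-in that rewards high cohesion and low coupling.

    def partition_complexity(edges, cluster_of) -> float:
        # edges: (caller, callee) pairs; cluster_of: entity -> cluster id.
        edges = list(edges)
        if not edges:
            return 0.0
        inter = sum(1 for a, b in edges if cluster_of[a] != cluster_of[b])
        return inter / len(edges)   # 0 = fully cohesive, 1 = fully coupled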
Article
Module interconnection languages are considered essential tools in the development of large software systems. Current software development practice follows the principle of recursively decomposing larger problems into smaller ones that can be grasped, understood, and handled by specialized and usually independent teams of software engineers. After teams succeed in designing and coding their respective subsystems, they face a different but usually more difficult issue: how to integrate independently developed subsystems or modules into the originally planned complete system. Module interconnection languages (MILs) provide formal grammar constructs for describing the global structure of a software system and for defining the various module interconnection specifications required for its complete assembly. Automatic processing of these formal descriptions results in a verification of system integrity and intermodular compatibility. This paper is a survey of MILs that are specifically designed to support module interconnection and includes brief descriptions of some software development systems that support module interconnection.
Article
The display of a directed graph is a commonly used visual aid for representing relationships. However, some graphs contain so many edges that their display by traditional graph layout algorithms is virtually impossible because of the overwhelming number of crossings. Graphs representing large software systems and their configurations are particularly prone to this problem. Examples of such graphs include: graphs depicting a system's configuration, call graphs, graphs depicting import and export relationships between modules, and graphs depicting the “includes” relation among a system's source files. This paper proposes the elimination of some edges by replacing sets of edges that have the same set of source and target nodes by a special node called an edge concentration node. Reducing the number of edges often has the desirable side effect of reducing the number of crossings. An algorithm that determines a reasonable set of edge concentrations of a graph in O(n^4) operations for each level in the graph is presented, where n is the number of nodes in that level. Several examples from the area of software configuration management are shown to demonstrate the effectiveness of using edge concentrations.
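A greedy sketch of the rewrite itself: sources that share an identical target set are routed through one concentration node, replacing |S| x |T| edges with |S| + |T|. This grouping is far simpler than the paper's per-level O(n^4) algorithm and only illustrates the transformation.

    def concentrate_edges(edges):
        targets_of = {}
        for s, t in edges:                     # each source's full target set
            targets_of.setdefault(s, set()).add(t)
        by_target_set = {}
        for s, ts in targets_of.items():       # group sources by target set
            by_target_set.setdefault(frozenset(ts), []).append(s)
        new_edges, counter = [], 0
        for ts, sources in by_target_set.items():
            if len(sources) > 1 and len(ts) > 1:
                c = "conc%d" % counter         # hypothetical node name
                counter += 1
                new_edges += [(s, c) for s in sources]
                new_edges += [(c, t) for t in ts]
            else:
                new_edges += [(s, t) for s in sources for t in ts]
        return new_edges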
Conference Paper
This paper proposes an expressional loop notation (XLoop) based on the ideas described in [16,17], which makes it practical to express loops as compositions of functions. The primary benefit of XLoop is that it brings the powerful metaphor of expressions ...
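XLoop's own notation is not shown in this listing, so the sketch below uses plain Python to contrast the two styles it mediates between: an imperative loop and the same computation expressed as a composition of reusable pieces.

    from functools import reduce

    def sum_even_squares_loop(xs):
        total = 0
        for x in xs:
            if x % 2 == 0:
                total += x * x
        return total

    def sum_even_squares_composed(xs):
        evens = filter(lambda x: x % 2 == 0, xs)
        squares = map(lambda x: x * x, evens)
        return reduce(lambda a, b: a + b, squares, 0)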
Article
The authors examine the use of cluster analysis as a tool for system modularization. Several clustering techniques are discussed and used on two medium-sized systems and a group of small projects. The small projects are presented because they provide examples of certain types of phenomena. Data bindings between the routines of the system provide the basis for the clustering. It appears that the clustering of data bindings provides a meaningful view of system modularization.
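A minimal sketch of counting data bindings, assuming a parser has already produced which variables each routine writes and reads; the resulting counts are the pairwise strengths a clustering step would consume.

    def data_binding_counts(writes: dict, reads: dict) -> dict:
        # writes / reads: routine name -> set of variable names.
        counts = {}
        for p, written in writes.items():
            for q, read in reads.items():
                if p == q:
                    continue
                shared = written & read    # p sets v, q uses v: a binding
                if shared:
                    counts[(p, q)] = len(shared)
        return counts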
Article
"December 1984." "Also numbered CIS-G553-1." "The research report ... was supported in part by the Defense Advanced Research Projects Agency under contract no. MDA 903-80-C-0432." Thesis (Ph. D.)--Stanford University, 1984. Includes bibliographical references (p. 207-214). Microfiche.
Article
Thesis (M. Sc.)--University of Victoria, 1990. Includes bibliographical references.
Conference Paper
The authors describe Rigi, a model and tool that uses a graph model and abstraction mechanisms to structure and represent the information accumulated during the development process. The objects and relationships of the graph model represent system components and their dependencies. The objects can be arranged in aggregation and generalization hierarchies. Rigi was designed to address three of the most difficult problems in the area of programming-in-the-large: the mastery of the structural complexity of large software systems, the effective presentation of development information, and the definition of procedures for checking and maintaining the completeness, consistency, and traceability of system descriptions. Thus, the major objective of Rigi is to effectively represent and manipulate the building blocks of a software system and their myriad dependencies, thereby aiding the development phases of the project
Conference Paper
The author describes a software tool that provides heuristic modularization advice for improving existing code. A heuristic design similarity measure is defined, based on the Parnas information hiding principle. The measure supports two services: clustering, which identifies groups of related procedures, and maverick analysis, which identifies individual procedures that appear to be in the wrong module. The tool has already provided useful advice in several real programming projects. The tool will soon incorporate an automatic tuning method, which allows the tool to learn from its mistakes, adapting its advice to the architect's preferences. A preliminary experiment demonstrates that the automatically tuned similarity function can assign procedures to modules very accurately.
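A minimal sketch of an information-hiding flavoured similarity: two procedures look related when they share design features (names they reference) that few other procedures use. The rarity weighting is an assumed stand-in for the tool's tuned similarity function.

    import math

    def design_similarity(features_a: set, features_b: set,
                          feature_df: dict, n_procs: int) -> float:
        # feature_df: feature -> number of procedures referencing it.
        shared = features_a & features_b
        # Rare shared features are strong evidence of a shared design secret.
        return sum(math.log(n_procs / feature_df[f]) for f in shared)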
Conference Paper
The majority of software maintenance activity is performed on systems that are poorly documented and were built without the benefit of modern design practices. A discussion is presented of the importance of documentation in the maintenance phase, and a tool based on hypertext technology is proposed that can meet a number of the documentation needs of the maintenance programmer. This tool supports the following features: incremental documentation, casual update, quality assurance, team use, configuration management, integrated source code, integrated automatic documentation, and information hiding
Conference Paper
An empirical study is presented that investigates hierarchical software system descriptions that are based on measures of cohesion and coupling. The study evaluates the effectiveness of the hierarchical descriptions in identifying error-prone system structure. The measurement of cohesion and coupling is based on intrasystem interaction in terms of software data bindings. The measurement of error-proneness is based on software error data collected from high-level system design through system test; some error data from system operation are also included. The data bindings software analysis and supporting tools are described, followed by the data analysis, interpretations of the results, and some conclusions
Conference Paper
The INFUSE change-management facility's methodology is briefly explained and its use of a hierarchy of experimental databases for controlling and coordinating changes is described. The authors then present the algorithm which INFUSE uses to automatically build and maintain this hierarchy. Some results of the application of INFUSE are given; one presented hierarchy is similar to one independently identified by a human expert
Article
The key to applying computer-aided software engineering to the maintenance and enhancement of existing systems lies in applying reverse-engineering approaches. However, there is considerable confusion over the terminology used in both technical and marketplace discussions. The authors define and relate six terms: forward engineering, reverse engineering, redocumentation, design recovery, restructuring, and reengineering. The objective is not to create new terms but to rationalize the terms already in use. The resulting definitions apply to the underlying engineering processes, regardless of the degree of automation applied
Article
This paper examines the use of cluster analysis as a tool for system modularization. Several clustering techniques are discussed and used on two medium-size systems and a group of small projects. The small projects are presented because they provide examples (that will fit into a paper) of certain types of phenomena. Data bindings between the routines of the system provide the basis for the clustering. It appears that the clustering of data bindings provides a meaningful view of system modularization.
Article
A formal model for describing and evaluating visibility control mechanisms is introduced. The model reflects a general view of visibility in which the concepts of requisition of access and provision of access are distinguished. This model provides a means for characterizing and reasoning about the various properties of visibility control mechanisms. Specifically, the notion of preciseness is defined. The utility of the model is illustrated by using it to evaluate and compare the relative strengths and weaknesses, with respect to preciseness, of the visibility control mechanisms found in Algol 60, Ada, Gypsy, and an approach called PIC, which specifically addresses the concerns of visibility control in large software systems.
Article
Software professionals rely on internal documentation as an aid in understanding programs. Unfortunately, the documentation for most programs is usually out-of-date and cannot be trusted. Without it, the only reliable and objective information is the source code itself. Personnel must spend an inordinate amount of time exploring the system by looking at low-level source code to gain an understanding of its functionality. One way of producing accurate documentation for an existing software system is through reverse engineering. This paper outlines a reverse engineering methodology for building subsystem structures out of software building blocks, and describes how documenting a software system with views created by this process can produce numerous benefits. It addresses primarily the needs of the software engineer and technical manager as document users. Key words: software documentation, reverse engineering, software maintenance.
Article
Managers of large software systems face enormous challenges when it comes to making informed project-related decisions. They require a high-level understanding of the entire system and in-depth information on selected components. Unfortunately, many software systems are so complex and/or old that such information is not readily available. Reverse engineering---the process of extracting system abstractions and design information from existing software systems---can provide some of this missing information. This paper outlines how a software system can benefit from reverse engineering, and describes how management personnel can use the information provided by this process as an aid in making informed decisions related to large software projects.
Article
We present a formulation of interconnection models and describe the unit and syntactic models --- the primary models used for managing the evolution of large software systems. We discuss various tools that use these models and evaluate how well these models support the management of system evolution. We then introduce the semantic interconnection model. The semantic interconnection model incorporates the advantages of the unit and syntactic interconnection models and provides extremely useful extensions to them. By refining the grain of interconnections to the level of semantics (that is, to the predicates that define aspects of behavior) we provide tools that are better suited to manage the details of evolution in software systems and that provide a better understanding of the implications of changes. We do this by using the semantic interconnection model to formalize the semantics of program construction, the semantics of changes, and the semantics of version equivalence and compatibi...
Article
In current change management tools, the actual changes occur outside the tool. In contrast, Infuse concentrates on the actual change process and provides facilities for both managing and coordinating source changes. Infuse provides facilities for automatically structuring the cooperation among programmers, propagating changes, and determining the consistency of changes, and provides a basis for negotiating the resolution of conflicting changes and for iterating over a set of changes.
Forte, G. (1992) 'Reverse engineering tools', CASE Outlook, March-April, 5-28.
Tilley, S. R., Muller, H. A. and Orgun, M. A. (1992) 'Documenting software systems with views', Proceedings of ACM SIGDOC '92, Ottawa, Ontario, 13-16 October 1992, ACM.
Uhl, J. S. (1989) 'Discovering structure in large software systems', M.Sc. thesis, Department of Computer Science, University of Victoria, Victoria, BC.
Come, B. (1990) 'A workbench for realistic image synthesis', M.Sc. thesis, Department of Computer Science, University of Victoria, Victoria, BC.
Choi, A. C. and Scacchi, W. (1991) 'SOFTMAN: environment for forward and reverse CASE', Information and Software Technology, 33(9), 664-674.
Newbery, F. J. and Tichy, W. F. (1990) 'EDGE: an extendible graph editor', Software-Practice & Experience, 20(1), 63-88.
Muller, H. A. (1990) 'Verifying software quality criteria using an interactive graph editor', Proceedings of the Eighth Annual Pacific Northwest Software Quality Conference, Portland, Oregon, October 29-31, pp. 228-241.
Dunn, D. and Everitt, B. S. (1982) An Introduction to Mathematical Taxonomy, Cambridge University Press.
Arnold, R. S. (1990) 'Tutorial on Software Reengineering', Conference on Software Maintenance 1990, San Diego, California, November 26-29, IEEE Computer Society Press (Order number 2091).
Mata-Montero, M. (1990) Algorithms for Treewidth-Bounded Graphs, Ph.D. dissertation, Department of Computer Science, University of Victoria, Victoria, BC.