Article

Toward Practical Automated Program Understanding

To read the full-text of this research, you can request a copy directly from the author.

Abstract

1 Introduction

It is an open question whether automated program understanding can become a practical, usable tool for reverse engineering or maintaining existing, real-world legacy systems. However, there are clearly several traits that any deployable automated program understanding tool must possess:

1. It must be based on an understanding algorithm that scales in practice to large programs.
2. It must produce an understanding targeted to the specific reverse engineering or maintenance tasks it is being used to support.
3. It must provide mechanisms that allow the programmers who perform reverse engineering or maintenance tasks to update its knowledge base.
4. It must integrate with other, existing tools that support maintenance and reverse engineering.
5. Finally, it must help the end-user achieve tasks more simply and more cheaply than alternative approaches.

This abstract provides an overview of our approach to constructing a program understanding tool that possesses t...


... Quilici's method is representative of other earlier work in this area, including work by Kozaczynski and Ning (1994). This approach (Quilici, 1994, 1995; Chin, 1994, 1995) is based on the construction of an explicit library of programming plan templates, complete with an indexing ability, which can quickly associate a particular instance of recognized source code with program plan templates in the knowledge base. Furthermore, a combination of top-down and bottom-up search strategies is used to implement the matching process. ...
Article
The process of understanding source code written in a high-level programming language involves complex computation. Given a piece of legacy code and a library of program plan templates, understanding the code corresponds to building mappings from parts of the source code to particular program plans. These mappings could be used to assist an expert in reverse engineering legacy code, to facilitate software reuse, or to assist in the translation of the source into another programming language. In this paper we present a model of program understanding using constraint satisfaction. Within this model we intelligently compose a partial global picture of the source program code by transforming knowledge about the problem domain and the program itself into sets of constraints. We then systematically study different search algorithms and empirically evaluate their performance. One advantage of the constraint satisfaction model is its generality; many previous attempts in program understan...
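The mapping from code to plans described above can be illustrated with a minimal sketch, assuming a toy encoding that is not taken from the paper: each plan component is a variable, its domain is the set of program statements of the right kind, and ordering and data-flow requirements are binary constraints checked during plain backtracking search. The `Stmt` record, the "running sum" plan, and every name below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stmt:
    line: int
    kind: str            # hypothetical statement category, e.g. "assign"
    defines: frozenset   # variables written by the statement
    uses: frozenset      # variables read by the statement

# Hypothetical plan template "running sum": component name -> required statement kind.
PLAN_COMPONENTS = {"init": "assign", "loop": "loop", "acc": "accumulate"}

def consistent(assignment):
    """Check the constraints among the components bound so far."""
    init = assignment.get("init")
    loop = assignment.get("loop")
    acc = assignment.get("acc")
    # Ordering constraints: initialization before the loop, loop before the accumulation.
    if init is not None and loop is not None and not init.line < loop.line:
        return False
    if loop is not None and acc is not None and not loop.line < acc.line:
        return False
    # Data-flow constraint: the accumulated variable is the one that was initialized.
    if init is not None and acc is not None and not (init.defines & acc.defines):
        return False
    return True

def solve(components, stmts, assignment=None):
    """Backtracking search yielding every consistent component-to-statement mapping."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(components):
        yield dict(assignment)
        return
    name = next(c for c in components if c not in assignment)
    for stmt in stmts:
        if stmt.kind != components[name]:
            continue  # unary constraint: statement kind must match the component
        assignment[name] = stmt
        if consistent(assignment):
            yield from solve(components, stmts, assignment)
        del assignment[name]

# A four-statement toy program in the hypothetical encoding.
program = [
    Stmt(1, "assign", frozenset({"total"}), frozenset()),
    Stmt(2, "assign", frozenset({"i"}), frozenset()),
    Stmt(3, "loop", frozenset(), frozenset({"i", "n"})),
    Stmt(4, "accumulate", frozenset({"total"}), frozenset({"total", "a", "i"})),
]
matches = list(solve(PLAN_COMPONENTS, program))
```

Here the assignment on line 2 is rejected as the initialization because it does not define the accumulated variable, so a single mapping remains. The search algorithms the paper evaluates are not shown; simple chronological backtracking is used here only to keep the sketch short.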
Conference Paper
Traditionally, software plans are represented explicitly by semantic schemas. However, the semantic content, constraints, and relations of plans are hard to represent explicitly. Moreover, building such a library of plans is laborious and error-prone. Algorithms for recognizing such plans demand exact matching, in which the semantic denotation is itself explicit. We therefore present a novel approach that applies neural networks to the representation and recognition of plans, using asymmetric Hebbian plasticity and non-linear auto-regressive models with exogenous inputs (NARX) to learn and recognize plans. The semantics of plans are represented implicitly and in an error-tolerant way. The recognition procedure is also error-tolerant because it tends to match fuzzily, as humans do. Models and relevant limitations are illustrated and analyzed in this article.
Article
Most current models of program understanding are unlikely to scale up successfully. Top-down approaches require advance knowledge of what the program is supposed to do, which is rarely available with aging software systems. Bottom-up approaches require complete matching of the program against a library of programming plans, which is impractical with the large plan libraries needed to understand programs that contain many domain-specific plans. This paper presents a hybrid approach to program understanding that uses an indexed, hierarchical organization of the plan library to limit the number of candidate plans considered during program understanding. This approach is based on observations made from studying student programmers' attempts to perform bottom-up understanding on geometrically-oriented C functions.
Article
Program understanding can be seen as the process of understanding abstract concepts in the program code. Thus, automated recognition of abstract concepts may greatly assist the human understanding process. This paper describes an approach to automated concept recognition and its implementation. In the approach, we use a concept model and a library of concept recognition rules to describe what the concepts are and how to recognize them from lower-level concepts. Programming language knowledge as well as domain knowledge are used to aid the recognition of abstract concepts.
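As a rough illustration of recognizing abstract concepts from lower-level ones, the sketch below applies a small rule library by forward chaining until no new concept can be derived. It is a simplification under stated assumptions: the rule format (a concept plus the set of sub-concepts it requires) and all concept names are hypothetical, and the constraints among sub-concepts that a real recognition rule would carry are omitted.

```python
# Hypothetical rule library: (higher-level concept, lower-level concepts it requires).
RULES = [
    ("read-record-loop", {"loop", "read-statement", "end-of-file-test"}),
    ("accumulate-total", {"loop", "add-to-variable"}),
    ("summarize-file", {"read-record-loop", "accumulate-total"}),
]

def recognize(base_concepts, rules=RULES):
    """Return the concepts derivable from base_concepts by repeated rule application."""
    known = set(base_concepts)
    changed = True
    while changed:
        changed = False
        for concept, required in rules:
            # A rule fires once all of its required sub-concepts are known.
            if concept not in known and required <= known:
                known.add(concept)
                changed = True
    return known - set(base_concepts)

derived = recognize({"loop", "read-statement", "end-of-file-test", "add-to-variable"})
```

In this toy run, the two mid-level concepts are recognized first, which in turn lets the rule for `summarize-file` fire on the next pass.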
Conference Paper
The large size and high percentage of domain-specific code in most legacy systems make it unlikely that automated understanding tools will be able to completely understand them. Yet automated tools can clearly recognize portions of the design. That suggests exploring environments in which programmer and system work together to understand legacy software. This paper describes such an environment that supports programmer and system cooperating to extract an object-oriented design from legacy software systems. It combines an automated program understanding component that recognizes standard implementations of domain-independent plans with a structured notebook that the programmer uses to link object-oriented design primitives to arbitrary source code fragments. This jointly extracted information is used to support conceptual queries about the program's code and design.
Conference Paper
A hybrid approach to program understanding is presented. It uses an indexed, hierarchical organization of the plan library to limit the number of candidate plans considered during program understanding. This approach is based on observations made from studying the attempts of student programmers to perform bottom-up understanding on geometrically oriented C functions and relies on a highly organized plan library, where each plan has indexing, specialization, and implication links to other plans. It uses an algorithm that takes advantage of these indices to suggest general candidate plans to match top-down against the code, specializations to refine these general plans once they are recognized, and implications to recognize other, related plans without performing further matching.
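The three kinds of links can be sketched as follows. This is a minimal illustration, assuming that code has already been reduced to a set of feature names and that matching a plan means checking that its required features are present; the plan names, feature names, and data layout are hypothetical and are not the paper's representation.

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    name: str
    required: frozenset                                   # features that must all be present
    specializations: list = field(default_factory=list)   # more specific plans to try next
    implies: list = field(default_factory=list)           # plans accepted without matching

# Hypothetical plan library for geometric code.
LIBRARY = {
    "compute-distance": Plan(
        "compute-distance",
        frozenset({"sqrt-call", "sum-of-squares"}),
        specializations=["point-to-point-distance"],
        implies=["uses-coordinates"],
    ),
    "point-to-point-distance": Plan(
        "point-to-point-distance",
        frozenset({"sqrt-call", "sum-of-squares", "coordinate-difference"}),
    ),
    "uses-coordinates": Plan("uses-coordinates", frozenset()),
}

# Indexing links: a code feature suggests general candidate plans.
INDEX = {"sqrt-call": ["compute-distance"]}

def understand(features):
    """Return recognized plan names, in the order they were accepted."""
    recognized = []

    def accept(name):
        if name in recognized:
            return
        recognized.append(name)
        plan = LIBRARY[name]
        for implied in plan.implies:
            accept(implied)                      # implication: no further matching
        for spec in plan.specializations:
            if LIBRARY[spec].required <= features:
                accept(spec)                     # specialization: refine the general plan

    for feature in sorted(features):
        for candidate in INDEX.get(feature, []):
            # Only indexed candidates are matched top-down against the code.
            if LIBRARY[candidate].required <= features:
                accept(candidate)
    return recognized

result = understand({"sqrt-call", "sum-of-squares", "coordinate-difference", "loop"})
```

Only plans reachable through the index, a specialization link, or an implication link are ever considered, which is how this organization limits the number of candidate plans compared with matching the code against the whole library.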
Conference Paper
The development of a tool for modularizing large common business-oriented language (COBOL) programs is described. The motivation for modularizing these programs is discussed, together with a manual modularization process. The business motivation for building a tool to automate the manual process is indicated. An enabling technology and its use in the development of the tool are discussed. Experience to date in alpha-testing the tool is reported.
Article
The automated recognition of abstract high-level conceptual information or concepts, which can greatly aid the understanding of programs and therefore support many software maintenance and reengineering activities, is considered. An approach to automated concept recognition and its application to maintenance-related program transformations are described. A unique characteristic of this approach is that transformations of code can be expressed as transformations of abstract concepts. This significantly elevates the level of transformation specifications