Article

Applying Plan Recognition Algorithms To Program Understanding

Authors:
  • Alex Quilici
  • Qiang Yang
  • Steven Woods

Abstract

Program understanding is often viewed as the task of extracting plans and design goals from program source. As such, it is natural to try to apply standard AI plan recognition techniques to the program understanding problem. Yet program understanding researchers have quietly, but consistently, avoided the use of these plan recognition algorithms. This paper shows that treating program understanding as plan recognition is too simplistic and that traditional AI search algorithms for plan recognition are not suitable, as is, for program understanding. In particular, we show (1) that the program understanding task differs significantly from the typical general plan recognition task along several key dimensions, (2) that the program understanding task has properties that make it particularly amenable to constraint satisfaction techniques, and (3) that augmenting AI plan recognition algorithms with these techniques can lead to effective solutions for the program understanding problem.
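The abstract's central claim, that plan matching maps naturally onto constraint satisfaction, can be made concrete with a toy sketch. The events, plan components, and constraints below are invented for illustration; this is not the authors' implementation, only the general shape of the encoding: plan components become CSP variables, candidate source events become their domains, and structural relations become constraints.

```python
# A minimal sketch (invented event and plan names; not the authors'
# implementation) of casting plan matching as constraint satisfaction.

# Hypothetical "events" extracted from source: (id, kind, line).
events = [(1, "loop", 10), (2, "compare", 12),
          (3, "swap", 13), (4, "compare", 20)]

# Plan components (variables) and the event kind each may bind to.
variables = {"outer-loop": "loop", "test": "compare", "exchange": "swap"}

def after(a, b):                      # event a occurs after event b
    return a[2] > b[2]

# Binary constraints between plan components.
constraints = [("test", "outer-loop", after), ("exchange", "test", after)]

domains = {v: [e for e in events if e[1] == kind]
           for v, kind in variables.items()}

def consistent(assign):
    return all(f(assign[x], assign[y]) for x, y, f in constraints
               if x in assign and y in assign)

def search(assign, order):
    if not order:
        return dict(assign)
    var = order[0]
    for event in domains[var]:
        assign[var] = event
        if consistent(assign):
            found = search(assign, order[1:])
            if found:
                return found
        del assign[var]
    return None

print(search({}, list(variables)))  # one consistent binding, or None
```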

... The other method works piecemeal in a way that uses subsets of the activity sequence to eliminate infeasible plans before attempting to recognize the entire sequence. This second method was suggested by Quilici et al. (1998) but first tested empirically here. In contrast to the greedy approach, the constraint satisfaction approach is complete, in the sense that if all of the recipes for solving a given TinkerPlots problem exist, and the student solved the problem, the algorithm is guaranteed to find the plan that explains the student's interaction. ...
... ture actions from observation sequences. They used Dynamic Bayesian Networks to compute a posterior distribution over possible goals given players' actions in the game. They are able to capture agents' mistakes, but infer the likelihood of a single goal or action, rather than recognizing a hierarchical plan representing the entire action sequence. Quilici et al. (1998) proposed an algorithm for implementing plan recognition as a constraint satisfaction problem but did not evaluate it on real data. We augment this work in several ways. First, by implementing this algorithm on ecologically realistic data, that of adults and middle school students using pedagogical software. Second, by describing alternat ...
... By eliminating branches from the plan tree for the desired complex action C, this pruning process narrows the search space of possible expanded recipes for root action C. This algorithm was suggested by Quilici et al. (1998). The CSPprune method calls the CSPbrute algorithm once per distinct complex sub-action. ...
Article
This paper describes a challenging plan recognition problem that arises in environments in which agents engage widely in exploratory behavior, and presents new algorithms for effective plan recognition in such settings. In exploratory domains, agents' actions map onto logs of behavior that include switching between activities, extraneous actions, and mistakes. Flexible pedagogical software, such as the application considered in this paper for statistics education, is a paradigmatic example of such domains, but many other settings exhibit similar characteristics. The paper establishes the task of plan recognition in exploratory domains to be NP-hard and compares several approaches for recognizing plans in these domains, including new heuristic methods that vary the extent to which they employ backtracking, as well as a reduction to constraint-satisfaction problems. The algorithms were empirically evaluated on people's interaction with flexible, open-ended statistics education software used in schools. Data was collected from adults using the software in a lab setting as well as middle school students using the software in the classroom. The constraint satisfaction approaches were complete, but were an order of magnitude slower than the heuristic approaches. In addition, the heuristic approaches were able to perform within 4% of the constraint satisfaction approaches on student data from the classroom, which reflects the intended user population of the software. These results demonstrate that the heuristic approaches offer a good balance between performance and computation time when recognizing people's activities in the pedagogical domain of interest.
... Most previous approaches to design pattern identification are limited by their performance. Some approaches use a Prolog-like unification mechanism [26] or constraint programming [23], which have poor performance because of the combinatorial explosion of possible occurrences, i.e., the possible combinations of entities in a program that form micro-architectures similar to a design motif. Other approaches based on metrics [1, 14] show a promising increase in performance but are still too slow to be included in maintainers' day-to-day design recovery tasks. ...
... Most of the approaches use structural matching between micro-architectures and design motifs. Different structural matching techniques are used: rule inference [20, 26], queries [6, 18], fuzzy reasoning nets [16], constraint programming [13, 23]. For example, in his precursor work, Wuyts [26] introduces the SOUL environment. ...
... The main problem of such a structural approach is the inherent combinatorial complexity of identifying subsets of entities matching design motifs, which corresponds to a problem of subgraph isomorphism [8]. Approaches based on constraint programming [23] also face a combinatorial complexity, although explanations [17] help in reducing this complexity through user interactions [13]. Antoniol et al. introduce an alternative approach, in which they reduce the search space using metrics [1]. ...
Conference Paper
Design patterns are important in software maintenance because they help in designing, in understanding, and in re-engineering programs. The identification of occurrences of a design pattern consists in identifying, in a program, classes whose structure and organisation match - strictly or approximately - the structure and organisation of classes as suggested by the design pattern. We express the problem of design pattern identification with operations on finite sets of bit-vectors. We use the inherent parallelism of bit-wise operations to derive an efficient bit-vector algorithm that finds exact and approximate occurrences of design patterns in a program. We apply our algorithm on three small-to-medium size programs, JHotDraw, Juzzle, and QuickUML, with the Abstract Factory and Composite design patterns and compare its performance and results with two existing constraint-based approaches.
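How bit-wise parallelism can filter candidate classes is easy to see in miniature. The sketch below is an invented toy, not the paper's algorithm: each class's relations are packed into an integer mask so that exact and approximate role candidates are found with a single AND per class.

```python
# Toy bit-vector filtering (invented relations and classes; not the
# paper's actual algorithm).

RELATIONS = ["inherits", "aggregates", "instantiates"]
BIT = {r: 1 << i for i, r in enumerate(RELATIONS)}

# Toy program facts: class -> relations it participates in.
program = {
    "ShapeFactory": {"instantiates"},
    "Circle": {"inherits"},
    "Canvas": {"aggregates", "instantiates"},
}

def mask(relations):
    m = 0
    for r in relations:
        m |= BIT[r]
    return m

class_masks = {c: mask(rs) for c, rs in program.items()}

# A role requires a set of relations; a class is a candidate if its
# mask contains all required bits (exact) or all but one (approximate).
def candidates(required, approximate=False):
    need = mask(required)
    out = []
    for c, m in class_masks.items():
        missing = bin(need & ~m).count("1")
        if missing == 0 or (approximate and missing == 1):
            out.append(c)
    return out

print(candidates({"instantiates"}))                  # exact: factory-like
print(candidates({"aggregates", "inherits"}, True))  # approximate match
```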
... The following approaches use different data and different representation and detection techniques. For example, Quilici et al. [QYW97] established a relationship between plan recognition and program comprehension. Plan recognition makes use of structural events and actions to determine "the best unified context which causally explains a set of perceived events as they are observed." ...
... To the best of our knowledge, in previous work in which CSP is used to identify design patterns, constraints are only defined among variables of the same type, for example [QYW97] or [GAA01]. In our approach, constraints can be defined among ... The set of constraints used in our approach to express the relations between variables, which can be combined to form more complex constraints, includes: ...
Article
Master's thesis (M.Sc. in computer science), Département d'informatique et de recherche opérationnelle, Faculté des arts et des sciences, presented to the Faculté des études supérieures. April 2008. © Janice Ka-Yee Ng, 2008.
... They allow motifs to be described with precision because a relation can be expressed between Messages, Classifiers, or between a Message and a Classifier. To the best of our knowledge, in previous work, for example [31] or [16], constraints were only defined among variables of the same type. ...
... These works use different data and different representation and detection techniques. For example, Quilici et al. [31] introduced the use of constraint programming to describe motifs as constraint satisfaction problems, while improving both the descriptions and the performance. Guéhéneuc et al. [18] drew inspiration from these works and introduced the use of explanation-based constraint programming and of a dedicated metamodel to describe both motifs and programs, including binary class relationships [15], to improve both the representation and the handling of variants. ...
Article
Full-text available
Design patterns are considered to be a simple and elegant way to solve problems in object-oriented software systems, because their application leads to a well-structured object-oriented design, and hence, are considered to ease software comprehension and maintenance. However, due to the complexity of large object-oriented software systems nowadays, it is impossible to recover manually the design patterns applied during the design and implementation of a system, which, in turn, impedes its comprehension. In the past few years, the structure and organization among classes were the predominant means of identifying design patterns in object-oriented software systems. In this paper, we show how to describe behavioral and creational design patterns as collaborations among objects and how these representations allow the identification of behavioral and creational design patterns using dynamic analysis and constraint programming.
... To make good decisions in a social context, humans often need to recognize the plan underlying the behavior of others, and make predictions based on this recognition. This process, when carried out by software agents or robots, is known as plan recognition, or agent modeling [6,11,19,27,35]. One of the key tasks in agent modeling is behavior classification, in which a stream of observations is categorized into pre-determined classes. ...
... One of the key tasks in agent modeling is behavior classification, in which a stream of observations is categorized into pre-determined classes. The focus here is on recognizing patterns (possibly, multiple patterns) in the stream that would allow its classification. This is in contrast to other agent modeling tasks, where the entire sequence of observed actions is to be recognized and matched against the plan library (e.g., to predict goals [22], or identify the sequence of actions that compose a plan [10,11,19,27,30,31,35,40,42]). To carry out the classification, activity recognition algorithms rely on a plan library that encodes the patterns to be matched against the incoming observations. ...
Article
Full-text available
To make good decisions in a social context, humans often need to recognize the plan underlying the behavior of others, and make predictions based on this recognition. This process, when carried out by software agents or robots, is known as plan recognition, or agent modeling. Most existing techniques for plan recognition assume the availability of carefully hand-crafted plan libraries, which encode the a-priori known behavioral repertoire of the observed agents; during run-time, plan recognition algorithms match the observed behavior of the agents against the plan libraries, and matches are reported as hypotheses. Unfortunately, techniques for automatically acquiring plan libraries from observations, e.g., by learning or data-mining, are only beginning to emerge. We present an approach for automatically creating the model of an agent's behavior based on the observation and analysis of its atomic behaviors. In this approach, observations of an agent's behavior are transformed into a sequence of atomic behaviors (events). This stream is analyzed in order to get the corresponding behavior model, represented by a distribution of relevant events. Once the model has been created, the proposed approach presents a method using a statistical test for classifying an observed behavior. Therefore, in this research, the problem of behavior classification is examined as a problem of learning to characterize the behavior of an agent in terms of sequences of atomic behaviors. The experimental results of this paper show that a system based on our approach can efficiently recognize different behaviors in different domains, in particular UNIX command-line data and RoboCup soccer simulation.
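The classification step this abstract describes (a behavior model as an event distribution plus a statistical test) might look roughly like the following sketch. The event alphabet, profiles, and the choice of a chi-square statistic are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: agents modeled as event-frequency distributions,
# classified with a chi-square goodness-of-fit statistic.

from collections import Counter

def model(sequences):
    """Relative frequency of each atomic event across training streams."""
    counts = Counter(e for seq in sequences for e in seq)
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}

def chi_square(observed_seq, model_dist, alphabet):
    n = len(observed_seq)
    obs = Counter(observed_seq)
    stat = 0.0
    for e in alphabet:
        expected = n * model_dist.get(e, 1e-6)  # smooth unseen events
        stat += (obs.get(e, 0) - expected) ** 2 / expected
    return stat

alphabet = ["ls", "cd", "grep", "make"]   # invented event alphabet
profiles = {
    "developer": model([["cd", "make", "make", "grep"], ["make", "grep"]]),
    "explorer":  model([["ls", "cd", "ls", "cd"], ["ls", "ls", "cd"]]),
}

stream = ["make", "grep", "make"]
best = min(profiles, key=lambda a: chi_square(stream, profiles[a], alphabet))
print(best)  # lowest test statistic -> most plausible behavior model
```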
... In these traditional approaches, a machine cannot do anything beyond the predesigned representation. For example, the traditional approaches of algorithm recognition are unable to recognize algorithms whose programming plans or templates [7-10] are not defined in the library of algorithm templates. In contrast, the autonomous development paradigm, called autonomous mental development (AMD), enables machines to develop their minds autonomously when they interact with their environments [1, 14]. ...
Article
Full-text available
A developmental model of algorithmic concepts is proposed here for program comprehension. Unlike traditional approaches, which cannot do anything beyond their predesigned representation, this model can develop its internal representation autonomously from chaos into algorithmic concepts by mimicking concept formation in the brain in an uncontrolled environment consisting of program source codes from the Internet. The developed concepts can be employed to identify what algorithm a program performs. The accuracy of such identification reached 97.15% in a given experiment.
... Several other approaches have been proposed, using different representations of both the design patterns and the systems in which to detect their occurrences, and various algorithms with different trade-offs between simplicity, performance, precision, and recall. Algorithms used for the first time to detect design patterns include logic programming [60], constraint programming [48], queries [34], fuzzy networks [31], graph transformations [2]. Some dedicated algorithms have also been introduced, e.g., [1, 9, 36, 56]. ...
Article
Full-text available
On the one hand, design patterns are solutions to recurring design problems, aimed at increasing reuse, flexibility, and maintainability. However, much prior work found that some patterns, such as the Observer and Singleton, are correlated with large code structures and argued that they are more likely to be fault prone. On the other hand, anti-patterns describe poor solutions to design and implementation problems that highlight weaknesses in the design of software systems and that may slow down maintenance and increase the risk of faults. They have been found to negatively impact change and fault-proneness. Classes participating in design patterns and anti-patterns have dependencies with other classes, e.g., static and co-change dependencies, that may propagate problems to other classes. We investigate the impact of such dependencies in object-oriented systems by studying the relations between the presence of static and co-change dependencies and (1) the fault-proneness, (2) the types of changes, and (3) the types of faults that these classes exhibit. We analyze six design patterns and 10 anti-patterns in 39 releases of ArgoUML, JFreeChart, and XercesJ, and investigate to what extent classes having dependencies with design patterns or anti-patterns have higher odds of faults than other classes. We show that in almost all releases of the three systems, classes having dependencies with anti-patterns are more fault-prone than others while this is not always true for classes with dependencies with design patterns. We also observe that structural changes are the most common changes impacting classes having dependencies with anti-patterns. Software developers could use this knowledge about the impact of design pattern and anti-pattern dependencies to better focus their testing and reviewing activities towards the most risky classes and to propagate changes adequately.
... Biggerstaff and others (1994) presented reverse engineering as the assignment of concepts to program locations, as have others (Rajlich 2009; Duala-Ekoko and Robillard 2007). Still others see reverse engineering activities as the recognition of "plans" intended by the developer of the software (Quilici and Woods 1998; Allemang 1991; Soloway and Ehrlich 1984). ...
... Biggerstaff (1994) and others (Rajlich, 2009; Duala-Ekoko and Robillard, 2007) have presented reverse engineering as the assignment of concepts to program locations. Still others see reverse engineering activities as the recognition of "plans" intended by the developer of the software (Quilici and Woods, 1998; Allemang, 1991; Soloway and Ehrlich, 1984). ...
Article
Full-text available
People perform reverse engineering to discover vulnerabilities, to understand how attackers could exploit vulnerabilities, and to determine ways in which vulnerabilities might be mitigated. People reverse engineer executable programs to determine the structure, function, and behavior of software from unknown provenance that may not be trustworthy or safe to use. Reverse engineering also allows the investigation of malicious code to understand how it works and how to circumvent self-protection and stealth techniques used by malware authors. Finally, reverse engineering can help engineers determine how to interface with legacy software that only exists in executable form. Although each of these applications of reverse engineering provides part of an organization's defensive knowledge of their information systems, there has been relatively little work in understanding the human factors involved with reverse engineering software from executable code. Consequently, reverse engineering work remains a highly specialized skill, and many reverse engineering tools are difficult for analysts to use. To better understand the human factors considerations of reverse engineering executable software, we conducted semi-structured interviews with five nationally-renowned subject matter expert reverse engineers and analyzed the verbal data from the interviews using two analysis approaches. We used thematic analysis techniques borrowed from educational psychology to investigate themes from the interview responses, first at the idea level, then at the sentence level. We decomposed the responses into a set of main goals that we describe in this paper.
... Other, mostly generative approaches employ a more abstract specification of the legacy target platform (e.g., [KWDE98,MCAH95,Jar95,PP94]). They have proven useful for fully-automatic activities like program pattern recognition [QYW98], schema transformation [MCAH95], and code restructuring [SV98]. However, a CARE tool should also facilitate the customization of DBRE heuristics (e.g., [SLGC94]) and the performed process (e.g., [HK94]). ...
Article
Full-text available
Software evolution and maintenance problems might be caused by all kinds of new or changed requirements. However, McCabe (McC98) has identified a number of requirements which are currently of special importance because they are responsible for significant mass changes in today's business software. Among these central requirements are the Year-2000 problem (Mar97a), the Euro-conversion problem (Gro98), and the ability to compete on a global, electronic market. The primary concern of all these requirements is the issue of how business data should adequately be represented in software systems. The addressed problems range from simple questions, e.g., for the number of digits that are necessary to represent a date (Year-2000 problem), up to complex architectural decisions, e.g., how to federate data maintained by diverse (formerly autonomous) information systems and integrate these systems with the Web to facilitate electronic commerce. If a legacy software system (LSS) has to be adapted to one of these requirements, a conceptual documentation of its data structure (DS) is thus often a necessary prerequisite to achieve the maintenance goal. Moreover, a conceptual DS is an excellent starting point for the migration to modern programming languages, as they are usually data-oriented (GK93). This is because the conceptual DS reflects major business rules but is fairly independent from procedural application code.
... Other techniques proposed in the literature include plan recognition techniques [28, 29] used to build a hierarchy of concepts (abstractions such as complex data structures or specific functionality). ...
... ng Microsoft Visual Studio and ad hoc algorithms are implemented as queries over the intermediate code generated during the compilation. Other query-based approaches include (Ciupke 1999; Keller et al. 1999). Queries have the potential to be extremely fast (Beyer et al. 2005) but so far have been used only to specify motifs in a non-systematic way. Quilici et al. (1997) used constraint programming to identify design motifs. Their approach consists of translating the problem of design-motif identification into a problem of constraint satisfaction. Design motifs are described as constraint systems for which the classes of a program form the domains of the variables. The resolution of the constraint syste ...
Article
Full-text available
The identification of occurrences of design patterns in programs can help maintainers to understand the program design and implementation. It can also help them to make informed changes. Current identification approaches are limited to complete occurrences, are time- and resource-consuming, and lead to many false positives. We propose to combine a structural and a numerical approach to improve the identification of complete and incomplete occurrences of design patterns. We develop a structural approach using explanation-based constraint programming and we enhance this approach using experimentally built numerical signatures. We show that the use of numerical signatures improves the identification of complete and incomplete occurrences in terms of performance and precision.
... Most program understanding algorithms taking this approach use a library of programming plans together with multi-heuristic strategies to find instances of those plans in the source program. This point has been emphasized in earlier research such as [6-9]. ...
Article
Full-text available
Problem statement: Understanding a computer program is a complex cognitive activity and a difficult task, especially for novice programmers. Object-oriented languages have recently become widely used in education and industry. In programming education it is important to have software that can aid programmers or students in coding, but the available program understanding systems that use the plan-based approach were developed for non-object-oriented programming languages. A review of existing systems also showed that none of the plan formalisms used targets an object-oriented language. Specifically, a problem arises because the existing systems are not usable for teaching programming. The absence of a plan-based program understanding system for object-oriented languages was the main reason this research was carried out. Approach: The program understanding system, named CONCEIVER++, was developed using the Unified Approach (UA). The UA process for developing and testing the system is iterative development with continuous testing; the process iterates until the system is satisfactory. Quality assurance was checked using black-box testing strategies. Results: The object-oriented program understanding system was successfully implemented and tested on an example of Java programming code. A binary search tree for the control flow graph and a linked list for plans were generated, and the meaning (semantics) of the program code was produced. Black-box testing showed that every line of code of the example program was recognized and that the output was correct. Conclusion: The understanding modules of CONCEIVER++, namely the code/CFG processor, the plan processor and the recognition engine, were tested. All lines of code (or nodes) were recognized and given the correct meaning by the developed modules.
... Most of the approaches use structural matching between groups of classes—micro-architectures—and design motifs. Different techniques are used: rule inference [18, 27], queries [7, 16], fuzzy reasoning nets [14], constraint programming [11, 22]. For example, Wuyts [27] developed the SOUL environment in which design motifs are described as Prolog predicates and program entities as facts (classes, methods, fields...). ...
Conference Paper
Full-text available
Design patterns describe good solutions to common and recurring problems in program design. The solutions are design motifs which software engineers imitate and introduce in the architecture of their program. It is important to identify the design motifs used in a program architecture to understand solved design problems and to make informed changes to the program. The identification of micro-architectures similar to design motifs is difficult because of the large search space, i.e., the many possible combinations of classes. We propose an experimental study of classes playing roles in design motifs using metrics and a machine learning algorithm to fingerprint design motifs roles. Fingerprints are sets of metric values characterising classes playing a given role. We devise fingerprints experimentally using a repository of micro-architectures similar to design motifs. We show that fingerprints help in reducing the search space of micro-architectures similar to design motifs efficiently using the Composite design motif and the JHotDraw framework.
... Recogniser [20] (2); Gold: HB-CAS [9], [10], [11] (4); Johnson: PROUST [17], [18] (3); Rich, Waters: Programmer's Apprentice [34], [35], [25], [26], [27], [28] (4); Chin, Quilici: DECODE [6] (3); Woods et al.: PU-CSP [22], [39], [40], [23], [43], [24], [41], ...
Conference Paper
Full-text available
Program and system comprehension are vital parts of the software maintenance process. We discuss the need for both perspectives and describe two methods that may be integrated to provide a smooth transition in understanding from the system level to the program level. Results from a qualitative survey of expert industrial software maintainers, their information needs and requirements when comprehending software are initially presented. We then review existing software tools which facilitate system level and program comprehension. Two successful methods from the fields of data mining and concept assignment are discussed, each addressing some of these requirements. We also describe how these methods can be coupled to produce a broader software comprehension method which partly satisfies all the requirements. Future directions including the closer integration of the techniques are also identified.
... Automatic program understanding systems aim to simulate the analyst's behavior by mapping the professional's knowledge onto the source code [22]. The final objective is to extract programming plans and design goals from the source code [12]. Programming plans are algorithmic structures that programmers have applied repeatedly during the implementation of the system, and these structures are dispersed throughout the source code (delocalized programming plans) [16]. ...
Article
The construction of plan libraries as repositories for automatic understanding systems is almost a mystery. So far, there have been no reports describing a technique for analyzing programs in order to construct libraries of programming plans. This paper presents a technique based on the automatic comparison of slices. It permits the analyst to focus attention on meaningful code for the design of program plans. The results obtained so far confirm the feasibility of the approach, and strengthen the practical application of plan libraries in the maintenance and re-engineering process.
... Program understanding is often viewed as the process of finding program plans, which represent a certain meaning, in source code [15]. Taking a closer look at approaches in that field, we see that most of them begin by analyzing some source-code instructions. ...
Article
Full-text available
One of the hardest tasks to be fulfilled during the analysis of legacy systems is how to determine the precise semantics of program components. Investigating the internal data and control structures is difficult due to the huge number of possible implementation variants for the same problem. To facilitate the task we propose to use components kept and described in a repository of reusable concepts as reference points. This becomes possible when behavior sampling is used as the classification/retrieval strategy. By matching the results of isolated components from a legacy system against already-executed components in a repository, one can tackle the problem of classifying legacy components without considering their internal structure. As a side effect, the population of the reuse repository is increased. In this paper we propose a model to reuse the knowledge contained in a behavior-based reuse repository for analyzing, classifying and understanding isolated executable components from a le...
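A miniature version of behavior sampling, with an invented repository and invented sample inputs (the paper does not specify its matching strategy at this level of detail):

```python
# Hedged sketch: classify an unknown component by executing it on
# sample inputs and matching its input/output behavior against
# already-executed reference components (toy repository).

import math

# Reference components from a hypothetical reuse repository.
repository = {
    "square-root": lambda x: math.sqrt(x),
    "square": lambda x: x * x,
    "identity": lambda x: x,
}

def classify(candidate, samples, tol=1e-9):
    """Return repository entries indistinguishable from `candidate`
    on the given sample inputs."""
    matches = []
    for name, ref in repository.items():
        if all(abs(candidate(s) - ref(s)) <= tol for s in samples):
            matches.append(name)
    return matches

# An "unknown" component, standing in for one isolated from a legacy system.
def legacy(x):
    return x ** 0.5

print(classify(legacy, samples=[0.0, 1.0, 2.0, 9.0]))  # ['square-root']
```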
... Generally, program understanding is the process of acquiring knowledge about a computer program. Specifically, program understanding is the process of recognizing program plans and extracting design goals from the source code [QYW98, KN94, Will92]. Program plans are abstract representations of clichés, or particular code patterns. ...
Article
Full-text available
Currently, programming instructors continually face the problem of helping to debug students' programs. Although there currently exist a number of debuggers and debugging tools on various platforms, most of these projects or products are crafted around the needs of software maintenance, not the perspective of teaching programming. Moreover, most debuggers are too general, meant for experts, and not user-friendly. We propose a new knowledge-based automated debugger to be used as a user-friendly tool by students to debug their own programs. Stereotyped code (clichés) and bug clichés will be stored as a library of plans in the knowledge base. Recognition of correct code or bugs is based on pattern matching and constraint satisfaction. Given a syntax-error-free program and its specification, this debugger, called Adil (Automated Debugger in Learning system), will be able to locate, pinpoint and explain logical errors in programs. If there are no errors, it will be able to explain the meaning of the program. Adil is based on the design of the Conceiver, an automated program understanding system developed at Universiti Kebangsaan Malaysia. Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop on Automated Debugging (AADEBUG 2000), August 2000, Munich. cs.SE/0010035
Article
A method of automatic plan extraction based on suffix trees is proposed to support the maintenance of software plan repositories for program comprehension, which are otherwise maintained manually by domain and computer science experts. The source code is transformed into token strings after lexical and syntax analysis. The output token strings are used to construct a suffix tree; by traversing this tree, a set of candidate software plans is obtained. Filters are applied to the set to obtain a much smaller one. A dynamic pattern matching algorithm is finally applied to merge suitable candidate plans and name them. The time and space requirements are linear in the number of nodes in the suffix tree, so the method can be applied to large-scale software. Test results of applying the method to software of different sizes show that it can extract valid plans from source code.
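The pipeline described (tokenize, build a suffix structure, harvest repeated substrings, filter, merge) can be sketched as follows. A sorted suffix list stands in for the suffix tree here (simpler, but O(n^2 log n) rather than linear), and the token stream is invented:

```python
# Hedged sketch of repeated-substring plan candidates; a sorted suffix
# list approximates what a suffix tree would expose.

def candidate_plans(tokens, min_len=3):
    """Longest common prefixes of adjacent sorted suffixes are the
    repeated token substrings; short repeats are filtered out."""
    suffixes = sorted(range(len(tokens)), key=lambda i: tokens[i:])
    seen = set()
    for a, b in zip(suffixes, suffixes[1:]):
        lcp = 0
        while (a + lcp < len(tokens) and b + lcp < len(tokens)
               and tokens[a + lcp] == tokens[b + lcp]):
            lcp += 1
        if lcp >= min_len:                  # filter: drop short repeats
            seen.add(tuple(tokens[a:a + lcp]))
    return seen

# Token string after lexical/syntax analysis (hypothetical).
code = ("for id lt id do id assign id plus num done "
        "for id lt id do id assign id plus num done print id").split()
print(candidate_plans(code))
```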
Article
Aiming at the problem of constructing huge numbers of projected databases in the PrefixSpan algorithm, this paper proposes an improved PrefixSpan algorithm for mining sequential patterns, called BLSPM (based on bi-level sequential pattern mining). The algorithm uses duplicated projection and pruning of certain specific sequential patterns to reduce the scale of the projected databases and the time spent scanning them; thus the efficiency of the algorithm is greatly raised while all needed sequential patterns are still obtained. Experimental results show that BLSPM is more efficient than PrefixSpan on large databases.
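For context, a minimal PrefixSpan works as below. This is the standard algorithm the paper improves on, not the BLSPM variant itself, whose pruning rules the abstract does not fully specify.

```python
# Minimal PrefixSpan over sequences of single items: recursively grow
# frequent prefixes by scanning projected databases.

def prefixspan(db, min_support, prefix=None, out=None):
    prefix, out = prefix or [], out if out is not None else []
    # Count items occurring in the projected database.
    counts = {}
    for seq in db:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1
    for item, support in sorted(counts.items()):
        if support < min_support:
            continue
        pattern = prefix + [item]
        out.append((pattern, support))
        # Project: keep the suffix after the first occurrence of `item`.
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        prefixspan(projected, min_support, pattern, out)
    return out

db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
for pattern, support in prefixspan(db, min_support=2):
    print(pattern, support)   # e.g. ['a'] 3, ['a', 'c'] 3, ['b'] 2, ...
```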
Article
This research presents an empirical study of the program comprehension and debugging processes of novice programmers. We provide empirical evidence that increased exposure to a large number of quality code modifications and adaptations in formal teaching is a viable technique for novices to learn program debugging, but not program comprehension. The empirical study is based on case studies at a Malaysian university among first-degree Information Technology students taking Java Programming, an elective programming course. We designed a quasi-experiment with a non-randomized, quota-sampled control group and a pre-test/post-test. The experiment looks into program comprehension and debugging constructs at the micro level. Code segments in the Java programming language of 5-25 lines of code are given to the students to comprehend or debug manually, with pen and paper, within a specific timeframe; these form part of the normal assessment tests for the course. The pre-test involves correct code while the post-test involves both correct and bugged (logical and run-time) code. A control group of 80 students and a treated group of 24 students form the non-randomized quota samples.
Thesis
Full-text available
This dissertation presents a theory of how reverse engineers make sense of executable programs. The theory describes the process of sensemaking in reverse engineering as a goal-directed, planning-based search activity, in which the reverse engineer interacts with an executable program using reverse engineering tools in order to construct a mental model and working understanding of the functionality of the program. This theory is developed through a case study, semi-structured interviews with expert reverse engineers, and observations of reverse engineers performing a reverse engineering task. The theory of sensemaking in reverse engineering is a step toward building autonomy into analysis tools so they will be able to discover vulnerabilities in complex software-based systems and analyze executable programs to determine whether those programs contain undocumented malicious functionality and should not be trusted.
Article
To improve the accuracy of information retrieval (IR) based program comprehension methods, a new two-stage method was proposed, consisting of an IR stage and a probabilistic finite-state automata (PFA) recognition stage. This method uses PFAs to address the imprecision of applying IR directly to program comprehension. Meanwhile, applying IR makes it possible to construct many simple PFAs rather than one big complex one, greatly improving the scalability of recognition. PFAs are learned from clusters generated by latent semantic analysis (LSA) in the training stage. In the recognition stage, a source code segment is processed lexically and then used as an IR query to retrieve n candidate plans. The corresponding PFAs of those plans are found, and the PFA with maximum probability is chosen. Finally, the code segment is marked with the same semantics as the resulting PFA.
Article
In this paper we propose a new plan recognition method that works from observations of incomplete action sequences by regarding them as prefixes in a probabilistic context-free grammar (PCFG). In previous work that uses a PCFG for plan recognition, the PCFG receives a sentence, i.e. an observation of a complete action sequence, and recognizes the plan behind it. However, when we deal with real plan recognition problems such as Web access log analysis, we often cannot obtain complete sequences of actions, and the traditional PCFG approach is not applicable. To overcome this difficulty, we extend the probability computation of PCFGs to prefix probability computation, though it requires an infinite sum of probabilities. We applied the proposed method to infer the intended goals of Web site visitors from online, partial observations of their actions. We also compared the performance of plan recognition from observations of initial sequences of visitors' actions with that from full observations.
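The prefix probability the abstract refers to can be written as an infinite sum over all continuations of the observed prefix (the notation below is assumed for illustration, not taken from the paper):

```latex
% Prefix probability of a partial action sequence w_1 ... w_k under a
% PCFG G: the sum over all continuations v that complete a sentence.
P_{\mathrm{pre}}(w_1 \cdots w_k \mid G)
  \;=\; \sum_{v \in \Sigma^{*}} P(w_1 \cdots w_k \, v \mid G)
```

In standard treatments (e.g., Jelinek and Lafferty's algorithm for initial-substring probabilities), this infinite sum is evaluated in closed form by solving linear systems derived from the grammar rather than by enumeration.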
Article
Full-text available
Domestic and industrial robots, intelligent software agents, virtual-world avatars, and other artificial entities are being created and deployed in our society for various routine and hazardous tasks, as well as for entertainment and companionship. Over the past ten years or so, primarily in response to the growing security threats and financial fraud, it has become necessary to accurately authenticate the identities of human beings using biometrics. For similar reasons, it may become essential to determine the identities of nonbiological entities. Trust and security issues associated with the large-scale deployment of military soldier-robots [55], robot museum guides [22], software office assistants [24], humanlike biped robots [67], office robots [5], domestic and industrial androids [93], [76], bots [85], robots with humanlike faces [60], virtual-world avatars [109], and thousands of other man-made entities require the development of methods for a decentralized, affordable, automatic, fast, secure, reliable, and accurate means of authenticating these artificial agents. The approach has to be decentralized to allow authority-free authentication, important for open-source and collaborative societies. To address these concerns, we proposed [117], [120], [119], [38] the concept of artimetrics, a field of study that identifies, classifies, and authenticates robots, software, and virtual reality agents. In this article, unless otherwise qualified, the term robot refers to both embodied robots (industrial, mobile, tele, personal, military, and service) and virtual robots or avatars, focusing specifically on those that have a human morphology.
Article
Full-text available
This chapter extends the behavior-based intrusion detection approach to a new domain: game networks. Specifically, our research shows that a behavioral biometric signature can be generated based on the strategy used by an individual to play a game. We wrote software capable of automatically extracting behavioral profiles for each player in a game of poker. Once a behavioral signature is generated for a player, it is continuously compared against the player's current actions. Any significant deviations in behavior are reported to the game server administrator as potential security breaches. In this chapter, we report our experimental results with user verification and identification, as well as our approach to the generation of synthetic poker data and potential approaches to spoofing the developed system. We also propose utilizing techniques developed for behavior-based recognition of humans for the identification and verification of intelligent game bots. Our experimental results demonstrate the feasibility of such a methodology.
Article
The C programming language is a basic required course for the vast majority of non-computer-science engineering students, but it is difficult to learn and the pass rate is low. The idea of using PBL teaching methods is presented based on research into teaching non-computer-science students; this may improve the effectiveness of C language teaching and promote students' learning initiative. Practical teaching shows that PBL is well suited to C programming lessons, in the lab course as well as in the classroom. The method effectively improves students' ability to understand programs, to analyze and design algorithms, and to write programs.
Article
Full-text available
Homeland security requires technologies capable of positive and reliable identification of humans for law enforcement, government, and commercial applications. As artificially intelligent agents improve in their abilities and become a part of our everyday life, the possibility of using such programs for undermining homeland security increases. Virtual assistants, shopping bots, and game playing programs are used daily by millions of people. We propose applying statistical behavior modeling techniques developed by us for recognition of humans to the identification and verification of intelligent and potentially malicious software agents. Our experimental results demonstrate feasibility of such methods for both artificial agent verification and even for recognition purposes.
Article
Architectural migration is the restructuring of a software system and/or its data to a new architecture, usually as provided by a new platform or software technology. The paper studies the practical problems of replacing "old" with "new" technology while preserving the features and, to the extent possible, the architecture of an "old" (legacy) application. We propose a reengineering process of constructing a "mapping" between feature implementations on the old and the new platforms, and support that process with a tool.
Conference Paper
Full-text available
Applying program comprehension techniques may render software maintenance and evolution easier. Understanding a software system typically requires a combination of static and dynamic analysis techniques. The aim of this workshop is to bring together researchers and practitioners working in the area of program comprehension with an emphasis on dynamic analysis. We are interested in investigating how dynamic analysis techniques are used or can be used to enable better comprehension of a software system. The objective is to compare existing techniques, identify common case studies and possible symbioses for existing solutions. Building upon three previous editions of the workshop, we aim to set up a forum for exchanging experiences, discussing solutions, and exploring new ideas.
Article
The plan matching problem is to determine whether a program plan is present in a program. This problem has been shown to be NP-hard, which makes it an open question whether plan matching algorithms can be developed that scale sufficiently well to be useful in practice. This paper discusses experiments in the scalability of a series of constraint-based program plan matching algorithms we have developed. These empirical studies have led to significant improvements in the scalability of our plan matching algorithm, and they suggest that this algorithm can be successfully applied to large, real-world programs.
Conference Paper
Intelligent bots are quickly becoming a part of our everyday life. Virtual assistants, shopping bots, and game playing programs are used daily by millions of people. As such programs become closer in their abilities and intelligence to human beings, the need will arise to verify and recognize such artificially intelligent software, just as it is often necessary to authenticate and confirm the identity of people. We propose applying techniques developed for behavior-based recognition of humans to the identification and verification of intelligent game bots. Our experimental results demonstrate the feasibility of such methods for both game bot verification and even for recognition purposes.
Conference Paper
We propose a novel technique for recovering certain elements of the UML model of a software system. These include relationships between use cases as well as class roles in collaborations that realize each use case, identifying common functionality and thus establishing a hierarchical view of the model. The technique is based on dynamic analysis of the system for the selected test cases that cover relevant use cases. The theory of formal concept analysis is applied to obtain classification of model elements, obtained by a static analysis of code, in terms of use case realizations.
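The formal concept analysis step can be illustrated with a toy context; the model elements and use-case names below are invented, and the enumeration-by-intersection approach is the textbook construction, not necessarily the paper's:

```python
# Hedged FCA sketch: concepts are maximal (objects, attributes) pairs,
# computed by intersecting the attribute sets of object groups.

from itertools import combinations

# Context: model element -> use-case realizations it participates in.
context = {
    "Account": {"UC-withdraw", "UC-deposit"},
    "Teller": {"UC-withdraw", "UC-deposit", "UC-audit"},
    "Logger": {"UC-audit"},
}

def extent(attrs):
    """All objects possessing every attribute in `attrs`."""
    return {o for o, a in context.items() if attrs <= a}

def concepts():
    found = set()
    objects = list(context)
    for r in range(len(objects) + 1):
        for group in combinations(objects, r):
            # Intent shared by the group; empty group -> all attributes.
            intent = (set.intersection(*(context[o] for o in group))
                      if group else set.union(*context.values()))
            found.add((frozenset(extent(intent)), frozenset(intent)))
    return found

for ext, intent in sorted(concepts(), key=lambda c: len(c[0])):
    print(sorted(ext), "<->", sorted(intent))
```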
Conference Paper
Full-text available
This paper presents a plan recognition algorithm for inferring student behavior using virtual science laboratories. The algorithm extends existing plan recognition technology and was integrated with an existing educational application for chemistry. Automatic recognition of students' activities in virtual laboratories can provide important information to teachers as well as serve as the basis for intelligent tutoring. Student use of virtual laboratories presents several challenges: Students may repeat activities indefinitely, interleave between activities, and engage in exploratory behavior using trial-and-error. The plan recognition algorithm uses a recursive grammar that heuristically generates plans on the fly, taking into account chemical reactions and effects to determine students' intended high-level actions. The algorithm was evaluated empirically on data obtained from college students using virtual laboratory software for teaching chemistry. Results show that the algorithm was able to (1) infer the plans used by students to construct their models; (2) recognize such key processes as titration and dilution when they occurred in students' work; (3) identify partial solutions; (4) isolate sequences of actions that were part of a single error.
Conference Paper
In this paper, the Action Language formalism has been used to reason about narratives in a multi-agent framework. The actions have been given a semantic frame representation. Hypothetical situations have been dealt with using different states for world knowledge and agents' knowledge. A notion of plan recognition has been proposed to answer causal queries. Finally, an algorithm has been proposed for automatically translating a given narrative into the representation, and causal query entailment has been shown.
Article
When kept up-to-date, formal specifications can act as valid artifacts for maintenance tasks. However, their linguistic density and size impede comprehension, reuse, and change activities. Techniques such as specification slicing and chunking help in reducing the number of relevant lines of text to be considered, but they expect the point of change to be known a priori. This contribution presents a process model for concept location within formal Z-specifications. It also considers those situations when the location is not even roughly known. The identification is comparable to the identification of regions with high cohesion. The approach is based on the idea of first transforming the specification to an augmented graph and, secondly, on the generation of spatial clusters. Copyright © 2008 John Wiley & Sons, Ltd.
Article
Full-text available
Design patterns are important in software maintenance because they help in understanding and re-engineering systems. They propose design motifs, solutions to recurring design problems. The identification of occurrences of design motifs in large systems consists of identifying classes whose structure and organization match exactly or approximately the structure and organization of classes as suggested by the motif. We adapt two classical approximate string matching algorithms based on automata simulation and bit-vector processing to efficiently identify exact and approximate occurrences of motifs. We then carry out two case studies to show the performance, precision, and recall of our algorithms. In the first case study, we assess the performance of our algorithms on seven medium-to-large systems. In the second case study, we compare our approach with three existing approaches (an explanation-based constraint approach, a metric-enhanced explanation-based constraint approach, and a similarity scoring approach) by applying the algorithms on three small-to-medium size systems, JHotDraw, Juzzle, and QuickUML. Our studies show that approximate string matching based on bit-vector processing provides efficient algorithms to identify design motifs.
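For flat strings, approximate matching with bit-vectors is the classic bitap (shift-and) family of algorithms. The sketch below handles up to k substitutions (Hamming distance) on plain strings; the paper adapts such algorithms to motif structures rather than raw text.

```python
# Bitap (shift-and) with up to k character substitutions.

def bitap(text, pattern, k):
    """Yield end positions where `pattern` occurs in `text` with at
    most k substitutions (Hamming distance)."""
    m = len(pattern)
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    R = [0] * (k + 1)                 # R[d]: matches with <= d errors
    for pos, c in enumerate(text):
        mask = masks.get(c, 0)
        prev = R[0]
        R[0] = ((R[0] << 1) | 1) & mask
        for d in range(1, k + 1):
            cur = R[d]
            # Exact step on this row, or one extra substitution on row d-1.
            R[d] = (((cur << 1) | 1) & mask) | ((prev << 1) | 1)
            prev = cur
        if R[k] & (1 << (m - 1)):
            yield pos

print(list(bitap("xxabrucadabraxx", "abracadabra", k=1)))  # [12]
```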
Article
Full-text available
We propose a novel method for recovering certain elements of the UML model of a software system. These include relationships between use cases as well as class roles in collaborations that realize each use case, identifying common functionality and thus establishing a hierarchical view of the model. The method is based on dynamic analysis of the system for the selected test cases that cover relevant use cases. The theory of formal concept analysis is applied to obtain classification of model elements, obtained by a static analysis of code, in terms of use case realizations.
Conference Paper
We introduce QBO, or Query by Outlines, a tool specially developed to help explore programs. It relies on a previously implemented system able to automatically construct outlines (F. Balmas, 1997; 1998): every linear loop identified in a program is conceptualized according to the kind of computations it performs. QBO proposes an outline storage mechanism together with a query algorithm that enables outlines to be efficiently retrieved. QBO eases the exploration of programs, so program management, clone detection or plan recognition can be envisaged at lower cost; as outlines are already computed and indexed, only high-level constructs have to be checked. Therefore, answering queries is a rather fast process. We sketch our outlining model, present our query tool and discuss how query by outlines may help explore programs.
Article
The process of understanding a source code in a high-level programming language involves complex computation. Given a piece of legacy code and a library of program plan templates, understanding the code corresponds to building mappings from parts of the source code to particular program plans. These mappings could be used to assist an expert in reverse engineering legacy code, to facilitate software reuse, or to assist in the translation of the source into another programming language. In this paper we present a model of program understanding using constraint satisfaction. Within this model we intelligently compose a partial global picture of the source program code by transforming knowledge about the problem domain and the program itself into sets of constraints. We then systematically study different search algorithms and empirically evaluate their performance. One advantage of the constraint satisfaction model is its generality; many previous attempts in program understan...
Article
Full-text available
We propose an efficient interval partitioning algorithm to solve the continuous Constraint Satisfaction Problem (CSP). The method comprises a new dynamic tree search management system that also invokes local search in selected subintervals. This approach is compared with two classical tree search techniques and three other interval methods. We study some challenging kinematics problems for testing the algorithm. The goal in solving kinematics problems is to identify all real solutions of the system of equations defining the problem. In other words, it is desired to find all object positions and orientations that satisfy a coupled nonlinear system of equations. The kinematics benchmarks used here arise in industrial applications.
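The core branch-and-prune idea behind interval partitioning can be shown on a one-variable toy problem. The interval evaluation below is hand-written for f(x) = x^2 - 2, a stand-in; real kinematics systems are multivariate and would use a general interval arithmetic library.

```python
# Branch-and-prune over intervals (standard technique, not the paper's
# algorithm): boxes whose interval evaluation excludes zero are
# discarded, the rest are bisected. Finds all real roots of
# f(x) = x^2 - 2 on [-10, 10].

def f_range(lo, hi):
    """Exact range of f(x) = x*x - 2 over the interval [lo, hi]."""
    hi_sq = max(lo * lo, hi * hi)
    lo_sq = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
    return lo_sq - 2.0, hi_sq - 2.0

def solve(lo, hi, eps=1e-7):
    roots, boxes = [], [(lo, hi)]
    while boxes:
        a, b = boxes.pop()
        rlo, rhi = f_range(a, b)
        if rlo > 0.0 or rhi < 0.0:
            continue                     # prune: f cannot vanish here
        if b - a < eps:
            roots.append((a + b) / 2.0)  # small box that may hold a root
        else:
            mid = (a + b) / 2.0
            boxes += [(a, mid), (mid, b)]
    return sorted(roots)

print(solve(-10.0, 10.0))  # approximately [-1.4142136, 1.4142136]
```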
Article
Full-text available
We propose a new CSP formalism that incorporates hard constraints and preferences so that the two are easily distinguished both conceptually and for purposes of problem solving. Preferences are represented as a lexicographic order over variables and domain values, respectively. Constraints are treated in the usual manner. Therefore, these problems can be solved with ordinary CSP algorithms, with the proviso that complete algorithms cannot terminate search after finding a feasible solution, except in the important case of heuristics that follow the preference order (lexical order). We discuss the relation of this problem representation to other formalisms that have been applied to preferences, including soft constraint formalisms and CP-nets. We show how algorithm selection can be guided by work on phase transitions, which serve as a useful marker for a reversal in relative efficiency of lexical ordering and ordinary CSP heuristics due to reduction in number of feasible solutions. We also consider branch and bound algorithms and their anytime properties. Finally, we consider partitioning strategies that take advantage of the implicit ordering of assignments in these problems to reduce the search space.
Article
Full-text available
A large variety of problems in Artificial Intelligence and other areas of computer science can be viewed as a special case of the constraint satisfaction problem. Some examples are machine vision, belief maintenance, scheduling, temporal reasoning, graph problems, floor plan design, planning genetic experiments, and the satisfiability problem. A number of different approaches have been developed for solving these problems. Some of them use constraint propagation to simplify the original problem. Others use backtracking to directly search for possible solutions. Some are a combination of these two techniques. This paper presents a brief overview of many of these approaches in a tutorial fashion.
Article
In order to deal with over-constrained Constraint Satisfaction Problems, various extensions of the CSP framework have been considered by taking into account costs, uncertainties, preferences, priorities... Each extension uses a specific mathematical operator (+, max, ...) to aggregate constraint violations. In this paper, we consider a simple algebraic framework, related to Partial Constraint Satisfaction, which subsumes most of these proposals, and use it to characterize existing proposals in terms of rationality and computational complexity. We exhibit simple relationships between these proposals, try to extend some traditional CSP algorithms, and prove that some of these extensions may be computationally expensive.
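The shared algebraic structure can be stated in one line; the notation below is assumed for illustration and follows the valued-CSP style rather than this paper's exact symbols:

```latex
% Each constraint c assigns a violation cost \varphi_c(A) to an
% assignment A; a single operator \oplus aggregates the costs.
% \oplus = + gives additive (Max-CSP-style) costs; \oplus = \max gives
% fuzzy/possibilistic CSPs.
\operatorname{cost}(A) \;=\; \bigoplus_{c \in C} \varphi_c(A),
\qquad \oplus \in \{+,\; \max,\; \dots\}
```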
Article
The problem of recognizing an agent's plans arises in many contexts in work in artificial intelligence. The plan recognition techniques suggested in the literature are rarely formally justified. We view plan recognition as a special kind of non-monotonic reasoning, and demonstrate how formal techniques developed for such reasoning -- namely, circumscription and minimal entailment -- can be used in plan recognition. The first half of this paper reviews a broad range of work in artificial intelligence and philosophy which relates to plan recognition. A formal treatment of a simple case of plan recognition follows, and the paper concludes with proposals for future extensions of this work.
Book
This book discusses intention-based analysis, a technique for identifying and diagnosing bugs in novice programs. The technique involves determining a program's intended function and structure and comparing them to its actual structure. An implementation of this technique, PROUST, analyses bugs in novice Pascal programs.
Article
The large size and high percentage of domain-specific code in most legacy systems makes it unlikely that automated tools will be able to extract a complete underlying design. Yet automated tools can clearly recognize portions of the design. This suggests exploring environments in which programmer and system work together to understand legacy software. DECODE is such an environment. It supports programmer and system co-operation to extract design information from legacy software systems. DECODE's automated program understanding component recognizes standard implementations of domain-independent plans to produce an initial knowledge base of object-oriented design elements. DECODE's structured notebook component provides the user with a graphical view of the initial understanding, which the user can extend by linking arbitrary source code fragments to either existing or new design elements, and then uses this design information to support conceptual queries about the program's code and design.
Article
This paper describes the main algorithms for solving constraint satisfaction problems, and includes the corresponding encodings (the main difference with respect to [3] is that, in their paper, they provide a more informal description of the algorithms). We consider three main algorithmic approaches: search, inference and hybrid methods. Search methods can be divided into systematic and non-systematic. We present backtracking as an example of systematic search, and local search as an example of non-systematic search. Inference methods can be divided into complete and incomplete. We describe adaptive consistency as an example of complete inference, and several local consistency algorithms as examples of incomplete inference. We also present some examples of hybrid methods which combine search and inference.
Article
Constraint satisfaction problems can be solved by network consistency algorithms that eliminate local inconsistencies before constructing global solutions. We describe a new algorithm that is useful when the variable domains can be structured hierarchically into recursive subsets with common properties and common relationships to subsets of the domain values for related variables. The algorithm, HAC, uses a technique known as hierarchical arc consistency. Its performance is analyzed theoretically and the conditions under which it is an improvement are outlined. The use of HAC in a program for understanding sketch maps, Mapsee3, is briefly discussed and experimental results consistent with the theory are reported.
Article
A constraint satisfaction problem involves finding values for variables subject to constraints on which combinations of values are allowed. In some cases it may be impossible or impractical to solve these problems completely. We may seek to partially solve the problem, in particular by satisfying a maximal number of constraints. Standard backtracking and local consistency techniques for solving constraint satisfaction problems can be adapted to cope with, and take advantage of, the differences between partial and complete constraint satisfaction. Extensive experimentation on maximal satisfaction problems illuminates the relative and absolute effectiveness of these methods. A general model of partial constraint satisfaction is proposed.
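One standard adaptation of backtracking to partial satisfaction is branch and bound over the number of violated constraints. The sketch below (Python, invented names; branch and bound is only one of the adapted techniques the abstract alludes to) counts violations instead of failing on the first one, and keeps the complete assignment with the fewest:

# Branch-and-bound search for maximal constraint satisfaction (max-CSP).
# `constraints` maps ordered variable pairs (x, y) to predicates.

def max_csp(variables, domains, constraints):
    best = {"cost": float("inf"), "assignment": None}

    def violations(assignment):
        # Count constraints whose endpoints are both assigned and violated.
        return sum(1 for (x, y), pred in constraints.items()
                   if x in assignment and y in assignment
                   and not pred(assignment[x], assignment[y]))

    def search(assignment, unassigned):
        cost = violations(assignment)
        if cost >= best["cost"]:
            return  # bound: this branch cannot beat the best found so far
        if not unassigned:
            best["cost"], best["assignment"] = cost, dict(assignment)
            return
        var, rest = unassigned[0], unassigned[1:]
        for value in domains[var]:
            assignment[var] = value
            search(assignment, rest)
            del assignment[var]

    search({}, list(variables))
    return best["assignment"], best["cost"]

The pruning is sound because extending an assignment can only add violations, so the running count is a lower bound on the cost of any completion.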
Article
The recognition of familiar computational structures in a program can help an experienced programmer to understand a program. Automating this recognition process will facilitate many tasks that require program understanding, e.g., maintenance, translation, and debugging. This paper describes a prototype recognition system which demonstrates the feasibility of automating program recognition. The prototype system automatically identifies occurrences of stereotyped algorithmic fragments and data structures, called clichés, in programs. It does so even though the clichés may be expressed in a wide range of syntactic forms and may be in the midst of unfamiliar code. Based on the known behaviors of these clichés and the relationships between them, the system generates a hierarchical description of a plausible design of the program. It does this systematically and exhaustively, using a parsing technique. This work is built on two previous advances: a graphical, programming-language-independent representation for programs, called the Plan Calculus, and an efficient graph parsing algorithm.
Article
In recent years, many new backtracking algorithms for solving constraint satisfaction problems have been proposed. The algorithms are usually evaluated by empirical testing. This method, however, has its limitations. Our paper adopts a different, purely theoretical approach, which is based on characterizations of the sets of search tree nodes visited by the backtracking algorithms. A notion of inconsistency between instantiations and variables is introduced, and is shown to be a useful tool for characterizing such well-known concepts as backtrack, backjump, and domain annihilation. The characterizations enable us to: (a) prove the correctness of the algorithms, and (b) partially order the algorithms according to two standard performance measures: the number of nodes visited, and the number of consistency checks performed. Among other results, we prove the correctness of Backjumping and Conflict-Directed Backjumping, and show that Forward Checking never visits more nodes than Backjumping. Our approach leads us also to propose a modification to two hybrid backtracking algorithms, Backmarking with Backjumping (BMJ) and Backmarking with Conflict-Directed Backjumping (BM-CBJ), so that they always perform fewer consistency checks than the original algorithms.
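Forward checking, one of the algorithms analyzed above, is compact enough to sketch directly (Python, illustrative names, not code from the paper); note how an emptied future domain, the "domain annihilation" the abstract mentions, forces an immediate retreat:

# Forward checking (FC) sketch for a binary CSP: after assigning a value,
# prune inconsistent values from every future variable's domain; if some
# future domain becomes empty, undo the pruning and try another value.
# `constraints` maps ordered pairs (x, y) to a predicate pred(vx, vy).

def forward_check(variables, domains, constraints):
    domains = {v: list(vals) for v, vals in domains.items()}  # local copy

    def search(assignment):
        if len(assignment) == len(variables):
            return dict(assignment)
        var = next(v for v in variables if v not in assignment)
        for value in list(domains[var]):
            pruned = []  # (future_var, removed_value) pairs, for undoing
            annihilated = False
            for future in variables:
                if future == var or future in assignment:
                    continue
                pred = constraints.get((var, future))
                if pred is None:
                    continue
                for fv in list(domains[future]):
                    if not pred(value, fv):
                        domains[future].remove(fv)
                        pruned.append((future, fv))
                if not domains[future]:
                    annihilated = True  # this value cannot lead to a solution
                    break
            if not annihilated:
                assignment[var] = value
                result = search(assignment)
                if result is not None:
                    return result
                del assignment[var]
            for future, fv in pruned:
                domains[future].append(fv)  # restore pruned values
        return None

    return search({})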
Article
In reasoning tasks involving the maintenance of consistent databases (so-called "constraint networks"), it is customary to enforce local consistency conditions in order to simplify the subsequent construction of a globally coherent model of the data. In this paper we present a relationship between the sizes of the variables' domains, the constraints' arity and the level of local consistency sufficient to ensure global consistency. Based on these parameters a new tractability classification of constraint networks is presented. We also show, based on this relationship, that any relation on bi-valued variables which is not representable by a network of binary constraints cannot be represented by networks with any number of hidden variables.
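The abstract does not quote the bound itself; as best I recall the standard statement of this kind of result (treat the exact expression below as my assumption, not a quotation from the paper), with d the maximum domain size and r the maximum constraint arity, strong k-consistency ensures global consistency whenever

k \ge d(r - 1) + 1

so in the binary case (r = 2), strong (d + 1)-consistency suffices.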
Article
Artificial intelligence tasks which can be formulated as constraint satisfaction problems, with which this paper is for the most part concerned, are usually solved by backtracking. By examining the thrashing behavior that nearly always accompanies backtracking, identifying three of its causes and proposing remedies for them, we are led to a class of algorithms which can profitably be used to eliminate local (node, arc and path) inconsistencies before any attempt is made to construct a complete solution. A more general paradigm for attacking these tasks is the alternation of constraint manipulation and case analysis, producing an OR problem graph which may be searched in any of the usual ways. Many authors, particularly Montanari and Waltz, have contributed to the development of these ideas; a secondary aim of this paper is to trace that history. The primary aim is to provide an accessible, unified framework within which to present the algorithms, including a new path consistency algorithm, to discuss their relationships and the many applications, both realized and potential, of network consistency algorithms.
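The arc consistency idea at the heart of this paper is usually presented today as AC-3; a compact sketch (Python, illustrative names, using a plain queue rather than any particular optimization) is:

# AC-3-style arc consistency: repeatedly delete values that have no
# support on some constraint arc, until a fixed point is reached.
from collections import deque

def ac3(variables, domains, constraints):
    # `constraints` maps ordered pairs (x, y) to a predicate pred(vx, vy).
    queue = deque(constraints.keys())
    while queue:
        x, y = queue.popleft()
        pred = constraints[(x, y)]
        revised = False
        for vx in list(domains[x]):
            if not any(pred(vx, vy) for vy in domains[y]):
                domains[x].remove(vx)  # vx has no support in y's domain
                revised = True
        if revised:
            if not domains[x]:
                return False  # a domain was annihilated: no solution
            # x's domain shrank, so arcs pointing into x must be rechecked.
            queue.extend((z, w) for (z, w) in constraints if w == x and z != y)
    return True  # every arc is consistent (search may still be needed)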
Conference Paper
This paper presents a strengthened algorithm for temporal reasoning during plan recognition, which improves on a straightforward application of Allen's reasoning algorithm. This is made possible by viewing plans as both hierarchical structures and temporal networks. As a result, we can show how to use as constraints the temporal relations explicitly given in input to improve the results of plan recognition. We also discuss how to combine the given constraints with those prestored in the system's plan library to make more specific the temporal constraints indicated in the plans being recognised.
Conference Paper
This paper addresses the question of whether the reverse engineering of legacy systems is doomed to failure. Our position is that the answer is highly dependent on the specific goals of the reverse engineering process. We argue that while most reverse engineering efforts may well fail to achieve the traditional goal of automatically extracted complete specifications suitable for forward engineering, they are likely to succeed on the more modest goal of automatically extracting partial specifications that can be augmented by system-assisted human understanders.
Conference Paper
Different program understanding algorithms often use different representational frameworks and take advantage of numerous heuristic tricks. This situation makes it difficult to compare these approaches and their performance. This paper addresses this problem by proposing constraint satisfaction as a general framework for describing program understanding algorithms, demonstrating how to transform a complex existing program understanding algorithm into an instance of a constraint satisfaction problem, and showing how this facilitates better understanding of its performance.
Conference Paper
We propose a new CSP formalism that incorporates hard constraints and preferences so that the two are easily distinguished both conceptually and for purposes of problem solving. Preferences are represented as a lexicographic order over variables and domain values, respectively. Constraints are treated in the usual manner. Therefore, these problems can be solved with ordinary CSP algorithms, with the proviso that complete algorithms cannot terminate search after finding a feasible solution, except in the important case of heuristics that follow the preference order (lexical order). We discuss the relation of this problem representation to other formalisms that have been applied to preferences, including soft constraint formalisms and CP-nets. We show how algorithm selection can be guided by work on phase transitions, which serve as a useful marker for a reversal in relative efficiency of lexical ordering and ordinary CSP heuristics due to reduction in number of feasible solutions. We also consider branch and bound algorithms and their anytime properties. Finally, we consider partitioning strategies that take advantage of the implicit ordering of assignments in these problems to reduce the search space.
Article
Recognizing the plan underlying a query aids in the generation of an appropriate response. In this paper, we address the problem of how to generate cooperative responses when the user's plan is ambiguous. We show that it is not always necessary to resolve the ambiguity, and provide a procedure that estimates whether the ambiguity matters to the task of formulating a response. The procedure makes use of the critiquing of possible plans and identifies plans with the same fault. We illustrate the process of critiquing with examples. If the ambiguity does matter, we propose to resolve the ambiguity by entering into a clarification dialogue with the user and provide a procedure that performs this task. Together, these procedures allow a question-answering system to take advantage of the interactive and collaborative nature of dialogue in order to recognize plans and resolve ambiguity. This work therefore presents a view of generation in advice-giving contexts which is different from the straightforward model of a passive selection of responses to questions asked by users. We also report on a trial implementation in a course-advising domain, which provides insights on the practicality of the procedures and directions for future research.
Article
Most current models of program understanding are unlikely to scale up successfully. Top-down approaches require advance knowledge of what the program is supposed to do, which is rarely available with aging software systems. Bottom-up approaches require complete matching of the program against a library of programming plans, which is impractical with the large plan libraries needed to understand programs that contain many domain-specific plans. This paper presents a hybrid approach to program understanding that uses an indexed, hierarchical organization of the plan library to limit the number of candidate plans considered during program understanding. This approach is based on observations made from studying student programmers attempting to perform bottom-up understanding on geometrically-oriented C functions.
Article
Program understanding can be seen as the process of understanding abstract concepts in the program code. Thus, automated recognition of abstract concepts may greatly assist the human understanding process. This paper describes an approach to automated concept recognition and its implementation. In the approach, we use a concept model and a library of concept recognition rules to describe what the concepts are and how to recognize them from lower-level concepts. Programming language knowledge as well as domain knowledge are used to aid the recognition of abstract concepts.
Article
It might be said that there are five basic tree search algorithms for the constraint satisfaction problem (csp), namely, naive backtracking (BT), backjumping (BJ), conflict-directed backjumping (CBJ), backmarking (BM), and forward checking (FC). In broad terms, BT, BJ, and CBJ describe different styles of backward move (backtracking), whereas BT, BM, and FC describe different styles of forward move (labeling of variables). This paper presents an approach that allows base algorithms to be combined, giving us new hybrids. The base algorithms are described explicitly, in terms of a forward move and a backward move. It is then shown that the forward move of one algorithm may be combined with the backward move of another, giving a new hybrid. In total, four hybrids are presented: backmarking with backjumping (BMJ), backmarking with conflict-directed backjumping (BM-CBJ), forward checking with backjumping (FC-BJ), and forward checking with conflict-directed backjumping (FC-CBJ). The performances of the nine algorithms (BT, BJ, CBJ, BM, BMJ, BM-CBJ, FC, FC-BJ, FC-CBJ) are compared empirically, using 450 instances of the ZEBRA problem, and it is shown that FC-CBJ is by far the best of the algorithms examined.
Article
"May 1991." Thesis (Ph. D.)--University of Texas, 1991. Includes bibliographical references. Supported in part by NSF. Supported in part by NASA. Supported in part by the Texas Advanced Research Program.
Conference Paper
The process of understanding source code in a high-level programming language involves complex computation. Given a piece of legacy code and a library of program plan templates, understanding the code corresponds to building mappings from parts of the source code to particular program plans. These mappings could be used to assist an expert in reverse engineering legacy code, to facilitate software reuse, or to assist in the translation of the source into another programming language. In this paper, we present a model of program understanding using constraint satisfaction. Within this model, we intelligently compose a partial global picture of the source program code by transforming knowledge about the problem domain and the program itself into sets of constraints. We then systematically study different search algorithms and empirically evaluate their performance. One advantage of the constraint satisfaction model is its generality; many previous attempts in program understanding can now be cast under the same spectrum of heuristics, and thus be readily compared. Another advantage is the improvement in search efficiency using various heuristic techniques in constraint satisfaction.
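To make the mapping concrete, here is a hypothetical illustration (Python; the plan, the program fragment, and all names are invented for this sketch and are not from the paper) of casting plan recognition against code as a CSP: each component of a "read-process loop" plan is a variable, the domains are candidate source statements, and the constraints encode the ordering the plan requires.

# Invented example: recognize a "read-process loop" plan in a toy
# statement list. Each statement is (position, kind, text).
statements = [
    (0, "read",    "fgets(buf, N, fp)"),
    (1, "test",    "while (!feof(fp))"),
    (2, "process", "total += atoi(buf)"),
    (3, "read",    "fgets(buf, N, fp)"),
]

# One CSP variable per plan component; domains are the statements whose
# kind could fill that component.
variables = ["loop_test", "loop_read", "loop_body"]
domains = {
    "loop_test": [s for s in statements if s[1] == "test"],
    "loop_read": [s for s in statements if s[1] == "read"],
    "loop_body": [s for s in statements if s[1] == "process"],
}

# Ordering constraints the plan imposes, stored in both directions so a
# search may check them regardless of assignment order.
constraints = {
    ("loop_test", "loop_body"): lambda t, b: t[0] < b[0],
    ("loop_body", "loop_test"): lambda b, t: t[0] < b[0],
    ("loop_body", "loop_read"): lambda b, r: b[0] < r[0],
    ("loop_read", "loop_body"): lambda r, b: b[0] < r[0],
}

# Any CSP search, e.g. the backtracking sketch shown earlier on this page,
# can now look for a mapping from plan components to statements:
#   backtrack({}, variables, domains, constraints)
#   -> {'loop_test': (1, ...), 'loop_read': (3, ...), 'loop_body': (2, ...)}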
Conference Paper
Program understanding is the process of making sense of complex source code. This process has been considered computationally difficult and conceptually complex. So far no formal complexity results have been presented, and conceptual models differ from one researcher to the next. We formally prove that program understanding is NP-hard. Furthermore, we show that even a much simpler subproblem remains NP-hard. However, we do not despair at this result, but rather offer an attractive problem solving model for the program understanding problem. Our model is built on a framework for solving constraint satisfaction problems, or CSPs, which are known to have interesting heuristic solutions. Specifically, we can represent and heuristically address previous and new heuristic approaches to the program understanding problem with both existing and specially designed constraint propagation and search algorithms.
Conference Paper
The large size and high percentage of domain-specific code in most legacy systems make it unlikely that automated understanding tools will be able to completely understand them. Yet automated tools can clearly recognize portions of the design. That suggests exploring environments in which programmer and system work together to understand legacy software. This paper describes such an environment that supports programmer and system cooperating to extract an object-oriented design from legacy software systems. It combines an automated program understanding component that recognizes standard implementations of domain-independent plans with a structured notebook that the programmer uses to link object-oriented design primitives to arbitrary source code fragments. This jointly extracted information is used to support conceptual queries about the program's code and design.
Conference Paper
A hybrid approach to program understanding is presented. It uses an indexed, hierarchical organization of the plan library to limit the number of candidate plans considered during program understanding. This approach is based on observations made from studying the attempts of student programmers to perform bottom-up understanding on geometrically oriented C functions and relies on a highly organized plan library, where each plan has indexing, specialization, and implication links to other plans. It uses an algorithm that takes advantage of these indices to suggest general candidate plans to match top-down against the code, specializations to refine these general plans once they are recognized, and implications to recognize other, related plans without performing further matching.
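A minimal sketch of the indexing idea follows (Python; the feature names and plan names are invented, and the real system's indexing, specialization, and implication links are far richer than a flat dictionary):

# Invented sketch: index plans by a cheap syntactic feature so that only
# a few candidate plans are matched in detail against the code.
plan_library = {
    "loop+file_read":    ["read-process-loop", "copy-file"],
    "loop+swap":         ["bubble-sort", "selection-sort"],
    "recursion+compare": ["binary-search"],
}

def candidate_plans(observed_features):
    # Only plans reachable through an observed index feature are
    # considered for expensive top-down matching.
    return {plan
            for feature in observed_features
            if feature in plan_library
            for plan in plan_library[feature]}

print(candidate_plans(["loop+file_read"]))
# -> {'read-process-loop', 'copy-file'}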
Conference Paper
The development of a tool for modularizing large common business-oriented language (COBOL) programs is described. The motivation for modularizing these programs is discussed, together with a manual modularization process. The business motivation for building a tool to automate the manual process is indicated. An enabling technology and its use in the development of the tool are discussed. Experience to date in alpha-testing the tool is reported
Conference Paper
The author presents a practical method for automatic control concept recognition in large, unstructured imperative programs. Control concepts are abstract notions about interactions between control flow, data flow, and computation, e.g., read-process loops. They are recognized by comparing a language-independent abstract program representation against standard implementation plans. Recognition is efficient and scalable because the program representation is hierarchically decomposed by propers (single-entry/single-exit control flow subgraphs). A recognition experiment using the UNPROG program understander shows the method's performance, the role of proper decomposition, and the ability to use standard implementations in a sample of programs. The paper also describes how recognized control concepts are used to perform COBOL restructuring with a quality not possible with existing syntactic methods.
Article
There have been many proposals for adding sound implementations of numeric processing to Prolog. This paper describes an approach to numeric constraint processing which has been implemented in Echidna, a new constraint logic programming (CLP) language. Echidna uses consistency algorithms which can actively process a wider variety of numeric constraints than most other CLP systems, including constraints containing some common non-linear functions. A unique feature of Echidna is that it implements domains for real-valued variables with hierarchical data structures and exploits this structure using a hierarchical arc consistency algorithm specialized for numeric constraints. This gives Echidna two advantages over other systems. First, the union of disjoint intervals can be represented directly. Other approaches require trying each disjoint interval in turn during backtrack search. Second, the hierarchical structure facilitates varying the precision of constraint processing. Consequently...
Article
The n-queens problem is a classical combinatorial problem in the artificial intelligence (AI) area. Since the problem has a simple and regular structure, it has been widely used as a testbed to develop and benchmark new AI search problem-solving strategies. Recently, this problem has found practical applications in VLSI testing and traffic control. Due to its inherent complexity, even the very efficient AI search algorithms developed so far can only find a solution for the n-queens problem with n up to about 100. In this paper we present a new, probabilistic local search algorithm which is based on a gradient-based heuristic. This efficient algorithm is capable of finding a solution for extremely large n-queens problems. We give execution statistics for this algorithm with n up to 500,000.
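The paper's gradient-based probabilistic search is specific to the authors, but a generic min-conflicts local search in the same spirit is easy to sketch (Python; an illustrative variant, not the authors' exact algorithm):

# Min-conflicts local repair for n-queens: one queen per column; repeatedly
# pick an attacked queen and move it to a least-conflicted row.
import random

def conflicts(queens, col, row):
    # Number of other queens attacking square (col, row).
    return sum(1 for c, r in enumerate(queens)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def n_queens(n, max_steps=100_000):
    queens = [random.randrange(n) for _ in range(n)]  # queens[c] = row of column c
    for _ in range(max_steps):
        attacked = [c for c in range(n) if conflicts(queens, c, queens[c])]
        if not attacked:
            return queens  # no queen is attacked: a solution
        col = random.choice(attacked)
        # Gradient step: move the queen to a least-conflicted row,
        # breaking ties randomly to avoid cycling.
        scores = [conflicts(queens, col, r) for r in range(n)]
        best = min(scores)
        queens[col] = random.choice([r for r in range(n) if scores[r] == best])
    return None  # give up after max_steps; restart in practice

print(n_queens(8))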
Article
The large size and high percentage of domain-specific code in most legacy systems make it unlikely that automated understanding tools will be able to completely understand them. Yet automated tools can clearly recognize portions of the design. That suggests exploring environments in which programmer and system work together to understand legacy software. This paper describes such an environment that supports programmer and system cooperating to extract an object-oriented design from legacy software systems. It combines an automated program understanding component that recognizes standard implementations of domain-independent plans with a structured notebook that the programmer uses to link object-oriented design primitives to arbitrary source code fragments. This jointly extracted information is used to support conceptual queries about the program's code and design.
Article
The process of understanding source code in a high-level programming language is a complex cognitive task. The provision of helpful decision aid subsystems would be of great benefit to software maintainers. Given a library of program plan templates, generating a partial understanding of a piece of software source code can be shown to correspond to the construction of mappings between segments of the source code and particular program plans represented in a library of domain source programs (plans). These mappings can be used as part of the larger task of reverse engineering source code, to facilitate many software engineering tasks such as software reuse, and for program maintenance. We present a novel model of program understanding using constraint satisfaction. The model composes a partial global picture of source program code by transforming knowledge about the problem domain and the program structure into constraints. These constraints facilitate the efficient construction of ma...
DECODE: A cooperative program understanding environment
  • D Chin
  • A Quilici
Chin, D. and Quilici, A. 1996. DECODE: A cooperative program understanding environment. Journal of Software Maintenance, 8:3–34.
A theoretical evaluation of selected backtracking algorithms
  • G Kondrak
  • P Van Beek
Kondrak, G. and van Beek, P. 1995. A theoretical evaluation of selected backtracking algorithms. Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, CA, pp. 541-547.
Hierarchical arc consistency: exploiting structured domains in constraint satisfaction problems
  • A K Mackworth
  • J Mulder
  • W Havens
Mackworth, A.K., Mulder, J. and Havens, W. 1985. Hierarchical arc consistency: exploiting structured domains in constraint satisfaction problems. Computational Intelligence, 1:188-196.
A polynomial time algorithm for the n-queens problem
  • R Sosic
  • J Gu
Sosic, R. and Gu, J. 1990. A polynomial time algorithm for the n-queens problem. SIGART, 1.
Temporal reasoning during plan recognition
  • F Song
  • R Cohen
Song, F. and Cohen, R. 1991. Temporal reasoning during plan recognition. In Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, pp. 247-252.