Article

Reverse Engineering and Design Recovery: A Taxonomy

Authors:
Elliot J. Chikofsky and James H. Cross II

Abstract

The key to applying computer-aided software engineering to the maintenance and enhancement of existing systems lies in applying reverse-engineering approaches. However, there is considerable confusion over the terminology used in both technical and marketplace discussions. The authors define and relate six terms: forward engineering, reverse engineering, redocumentation, design recovery, restructuring, and reengineering. The objective is not to create new terms but to rationalize the terms already in use. The resulting definitions apply to the underlying engineering processes, regardless of the degree of automation applied.


... In this vein, we propose a taxonomy based on the learning outcomes dimension and the teaching process dimension (see Figure 1). In addition, this study also combines the literature on Project-Based Learning (PBL; e.g., Barak & Dori, 2005; Sasson et al., 2018; Usher & Barak, 2018), Reverse Engineering (RE; e.g., Chikofsky & Cross, 1990; López et al., 2019; Rekoff, 1985), Scientific Inquiry (SI; e.g., American Association for the Advancement of Science [AAAS], 1990; Cakir et al., 2011; NRC, 1996), and Troubleshooting/Debugging (TD; e.g., Zhong & Li, 2019; Yerushalmi, 2014) to describe the theory, model, and application of these four teaching models in STEM education. The findings conclude with implications and suggestions for future research, and provide guidance for STEM education in primary and secondary schools. ...
... There is no universal definition of RE, as each author defines it within their own research fields. For example, Chikofsky and Cross (1990) defined RE as "the process of analyzing a subject system to identify the system's components and their interconnections, and to create representations of a system in another form at a higher level of abstraction." Abella et al. (1994) described RE as "the basic concept of producing a part based on an original or physical model without the use of an engineering drawing." ...
... The term "reverse engineering" originates from hardware analysis where it's common to decipher a design from a finished product (Chikofsky & Cross, 1990). A RE model is described that is based on the premise that a complex hardware system can be characterized as a hierarchical structure (Rekoff, 1985). ...
Article
Full-text available
Many countries and regions have reached a consensus to promote science, technology, engineering, and mathematics (STEM) education in the past decade. A body of studies have demonstrated that the design and organization of interdisciplinary teaching activities are important for the effective implementation of STEM education. So far, however, little attention has been paid to the taxonomy of teaching models in STEM education. This paper aims to propose a taxonomy based on the dimensions of learning outcomes (i.e., product-oriented and knowledge-oriented) and teaching process (i.e., forward teaching model and reverse teaching model) in STEM education. Through the intersection of the above two dimensions, four teaching models in STEM education have emerged, including project-based learning (PBL), reverse engineering (RE), scientific inquiry (SI), and troubleshooting/debugging (T/D). In addition, four cases are introduced to explain how these four teaching models operate in STEM education. The implications of this work for future research are also discussed. It is promising that the study will be valuable to enrich the current research, and shed light on the theory and practice of STEM education.
... Software reverse engineering is a backward process that includes extracting design artifacts, decomposing requirements, and recapturing or recreating the design [8]. One of the subareas of reverse engineering is design recovery. ...
... Design recovery in software reverse engineering focuses on observing the system to identify domain knowledge and system abstraction. The observation is useful to provide detailed information about what software does, how it performs, and why it must be performed [8]. ...
... Reverse engineering helps the software developer to understand the system. By understanding the existing system, which often lacks documentation, the software developer can identify areas of improvement in the system [1,8]. ...
Article
The Use Case Diagram (UCD) is a visual form of system design that helps software developers comprehend system behavior. Maintaining and updating a system can be difficult when there is no visualization of its behavior or no software requirement specification document. Reverse engineering is an approach used to extract software requirement specifications from existing systems. Research in reverse engineering has produced various techniques, but the processes involved are not fully understood. This study analyzes the University Community Services Information System (UCSIS) as the existing system in three processes: identifying the system domain process, elaborating system features by implementing the event table, and constructing the use case realization. The results showed that a UCD could be generated through the reverse engineering process on the existing system. Furthermore, a new feature for system improvement can also be detected using this method. It is expected that the reverse engineering approach in this study can be used as guidance for the software development team in extracting the use case diagram from existing systems.
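As a small illustration of the event-table step mentioned above, the following Python sketch records each business event with its actor and system response and derives candidate use case names from it; the actors, events, and naming rule are invented for illustration and are not taken from the UCSIS study.

events = [
    {"actor": "Lecturer", "event": "submits community service proposal",
     "response": "system stores the proposal and notifies a reviewer"},
    {"actor": "Reviewer", "event": "evaluates a submitted proposal",
     "response": "system records the review result"},
]

def derive_use_cases(event_table):
    # Each event row yields an (actor, candidate use case) pair for the diagram.
    return [(row["actor"], row["event"].capitalize()) for row in event_table]

for actor, use_case in derive_use_cases(events):
    print(f"{actor} -> ({use_case})")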
... A seminal paper by Chikofsky and Cross defines Reverse Engineering as the process of identifying the components of a system and their interrelationships, and generating representations of the system in another form or at a higher level of abstraction [6]. They clarify that the reverse engineering process is an examination, not a process of construction, change, or replication. ...
...
#include <arpa/inet.h>
#include <dirent.h>
#define SIZE 1024

void del_in_dir(char *dirname) {
    DIR *fol = opendir(dirname);
    if (fol == NULL) return;
    struct dirent *next_f;
... malware-based motivation presented in Section 2.2. ...
Preprint
Large language models (such as OpenAI's Codex) have demonstrated impressive zero-shot multi-task capabilities in the software domain, including code explanation. In this work, we examine if this ability can be used to help with reverse engineering. Specifically, we investigate prompting Codex to identify the purpose, capabilities, and important variable names or values from code, even when the code is produced through decompilation. Alongside an examination of the model's responses in answering open-ended questions, we devise a true/false quiz framework to characterize the performance of the language model. We present an extensive quantitative analysis of the measured performance of the language model on a set of program purpose identification and information extraction tasks: of the 136,260 questions we posed, it answered 72,754 correctly. A key takeaway is that while promising, LLMs are not yet ready for zero-shot reverse engineering.
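To make the described true/false quiz idea concrete, here is a minimal Python sketch of how such probing could be set up; the prompt wording and the query_model placeholder are assumptions for illustration, not the authors' actual harness or any specific model API.

def build_quiz_prompt(decompiled_code: str, statement: str) -> str:
    # Combine the decompiled function with a single true/false statement.
    return (
        "Here is a decompiled C function:\n\n"
        f"{decompiled_code}\n\n"
        f"True or false: {statement}\n"
        "Answer with exactly one word: True or False."
    )

def quiz_accuracy(decompiled_code: str, quiz, query_model) -> float:
    # quiz: list of (statement, expected_bool) pairs.
    # query_model: any text-in/text-out call to a language model (placeholder).
    correct = 0
    for statement, expected in quiz:
        answer = query_model(build_quiz_prompt(decompiled_code, statement))
        if answer.strip().lower().startswith(str(expected).lower()):
            correct += 1
    return correct / len(quiz)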
... A module's properties remain the same while new software functions are developed and utilized. Reverse engineering (RE) was first used in the study of hardware, and its potential for use in programming systems was observed later [2]. Reverse engineering tools appeared by the end of the 1980s, and organizations then started to use them to speed up the process of updating different software [3]. ...
... Reverse engineering tools appeared by the end of the 1980s, and organizations then started to use them to speed up the process of updating different software [3]. RE can be described as the method of investigating a previously implemented programming system to reveal its design or extract knowledge from that software [2]. These tools can be used to correct (e.g. ...
Chapter
Full-text available
Based on different scenarios (professional guidelines), the practice of Software Reverse Engineering (SRE) is used to analyse the compiled instructions of a system and extract information regarding the design and implementation of either part of, or the whole, software application. Business rules are implemented as lines of code, whereas the actual source code is hidden and only the binary form of the code is available. Technologies used for reverse engineering include CVF, V7, CFC, 14D, RTR, and B#. These instruments are used for a better understanding of the program algorithm, logic, and program specifics such as Windows API functions, assembler-language programming, and network interaction principles. The tools discussed do not disturb the code consistency or the basic structure of the software. The present research provides a comparative analysis of various tools to establish which reverse engineering tool is better, and based on what characteristics.
... They define reverse engineering as follows: "Reverse engineering is the process of analyzing a subject system to identify the system's components and their interrelationships and create representations of the system in another form or at a higher level of abstraction." They further explain that reverse engineering attempts to examine the system, not change or replicate it [4]. ...
... Re-engineering uses reverse engineering to understand and decipher the system; the engineer can then start applying forward engineering to modify an existing working system to meet new system requirements or scalability mandated by new ventures [4][5][6][7][8]. ...
Preprint
Full-text available
Mobile malware is scaling up in numbers and in degree of sophistication. The analysis of mobile malware is challenging for several reasons; e.g., mobile apps are context aware and use device resources, which can pose greater risks for their security. In this work, a novel approach for the analysis of mobile malware is explored. The mobile malware is analyzed using commercially available reverse engineering tools. The complete cycle of reverse engineering is narrated for a benchmark mobile malware sample, starting from the apk file and ending with Java code. The reverse-engineered code of malware is of huge importance for extracting malware patterns, especially from sophisticated malware such as spyware and Trojans. The results show that the tools can be used to extract the complete source code of the malware, which can potentially be used for dynamic analysis.
... Reverse engineering can be applied using static analysis or dynamic analysis. During static analysis the install-time behaviour of an app is analysed, while in dynamic analysis the run-time behaviour of the app is analysed (Chikofsky & Cross, 1990). Elliot J. Chikofsky describes reverse engineering as a tool used to extract design information or knowledge from anything made by man and use it to reproduce something new and effective (Chikofsky & Cross, 1990). ...
... During static analysis the install-time behaviour of an app is analysed, while in dynamic analysis the run-time behaviour of the app is analysed (Chikofsky & Cross, 1990). Elliot J. Chikofsky describes reverse engineering as a tool used to extract design information or knowledge from anything made by man and use it to reproduce something new and effective (Chikofsky & Cross, 1990). Siegfried Rasthofer discussed that purely automatic analysis is usually not enough for detecting privacy leaks and malware apps (Rasthofer et al., 2016). ...
Article
The frequency of malware attacks involving Android apps is increasing day by day. Current studies have revealed startling facts about data harvesting incidents, where users' personal data is at stake. To preserve the privacy of users, a permission-induced risk interface, MalApp, is proposed to identify privacy violations arising from permissions granted during app installation. It comprises a multi-fold process that performs static analysis based on an app's category. First, the concept of reverse engineering is applied to extract app permissions and construct a Boolean-valued permission matrix. Second, ranking of permissions is done to identify the risky permissions across categories. Third, machine learning and ensembling techniques have been incorporated to test the efficacy of the proposed approach on a data set of 404 benign and 409 malicious apps. The empirical studies have identified that the proposed algorithm gives a best-case malware detection rate of 98.33%. The highlight of the interface is that any app can be classified as benign or malicious even before running it, using static analysis.
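The general pipeline described above can be illustrated with a small, hedged sketch: a Boolean permission matrix is built from extracted permission sets and fed to RandomForest and SVM classifiers via scikit-learn. The permission names, apps, and labels below are toy data, not the MalApp data set or implementation.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

PERMISSIONS = ["INTERNET", "READ_SMS", "SEND_SMS", "READ_CONTACTS", "CAMERA"]

def permission_matrix(apps):
    # apps: one set of requested permissions per APK -> Boolean feature matrix.
    return np.array([[int(p in perms) for p in PERMISSIONS] for perms in apps])

apps = [{"INTERNET"}, {"INTERNET", "READ_SMS", "SEND_SMS"},
        {"CAMERA"}, {"READ_SMS", "READ_CONTACTS", "SEND_SMS"}]
labels = np.array([0, 1, 0, 1])  # 0 = benign, 1 = malicious (toy labels)

X = permission_matrix(apps)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=0, stratify=labels)

for clf in (RandomForestClassifier(random_state=0), SVC()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, "test accuracy:", clf.score(X_test, y_test))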
... Recovering the design of a system from its implementation is an act of reverse engineering (Chikofsky and Cross II 1990). One possible approach to achieve this is by analyzing the dynamic behavior of the system at runtime, and noting the interactions between its components (Cornelissen et al. 2009). ...
Article
Full-text available
Understanding program code is a complicated endeavor. As a result, studying code comprehension is also hard. The prevailing approach for such studies is to use controlled experiments, where the difference between treatments sheds light on factors which affect comprehension. But it is hard to conduct controlled experiments with human developers, and we also need to find a way to operationalize what “comprehension” actually means. In addition, myriad different factors can influence the outcome, and seemingly small nuances may be detrimental to the study’s validity. In order to promote the development and use of sound experimental methodology, we discuss both considerations which need to be applied and potential problems that might occur, with regard to the experimental subjects, the code they work on, the tasks they are asked to perform, and the metrics for their performance. A common thread is that decisions that were taken in an effort to avoid one threat to validity may pose a larger threat than the one they removed.
... To overcome this problem, we propose an approach called Reverse Algorithmic Design (RAD), which automatically translates a design manually produced in a CAD/BIM tool into a parametric AD representation of itself. RAD borrowed its name from reverse engineering [11], as both intend to infer a process that replicates an existing design. Note that the inferred process may or may not match the original one, as is often the case with RAD, which starts from a design that, in almost all cases, was not the result of an AD process. ...
Chapter
Algorithmic Design (AD) is an approach that uses algorithms to represent designs. AD allows for a flexible exploration of complex designs, which helps not only the designer but also optimization methods that autonomously search for better-performing solutions. Despite its advantages, AD is still not widely used. This is owed in part to the large amount of time, effort, and expertise required for the development of an AD program, a problem that grows with the complexity of the design. To overcome this issue, this paper proposes Reverse Algorithmic Design (RAD), which infers AD programs from existing CAD or BIM models. RAD comprises two main steps: the automatic generation of an initial low-level AD program from a CAD/BIM model, followed by a semi-autonomous refactoring step that improves the generated program. The benefits of the RAD approach are demonstrated with its application in two use-case scenarios.
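A toy sketch of the two RAD steps, under the assumption that the CAD/BIM model has been exported as a flat list of boxes: first emit a low-level, one-call-per-element AD program, then "refactor" an evenly spaced row of identical boxes into a loop. The element format and the emitted pseudo-AD code are illustrative only, not the RAD implementation.

def low_level_program(boxes):
    # boxes: list of (x, y, z, w, d, h) tuples -> one constructor call per element.
    body = "\n".join(f"    box(({x}, {y}, {z}), ({w}, {d}, {h}))"
                     for x, y, z, w, d, h in boxes)
    return "def model():\n" + body

def refactored_program(boxes):
    # If all boxes share size and y/z position and are evenly spaced along x, emit a loop.
    xs = [b[0] for b in boxes]
    dx = xs[1] - xs[0]
    regular = (len({b[3:] for b in boxes}) == 1
               and len({b[1:3] for b in boxes}) == 1
               and all(abs((xs[i + 1] - xs[i]) - dx) < 1e-9 for i in range(len(xs) - 1)))
    if not regular:
        return low_level_program(boxes)
    x0, y, z, w, d, h = boxes[0]
    return (f"def model():\n"
            f"    for i in range({len(boxes)}):\n"
            f"        box(({x0} + i * {dx}, {y}, {z}), ({w}, {d}, {h}))")

boxes = [(i * 2.0, 0.0, 0.0, 1.0, 1.0, 3.0) for i in range(5)]
print(refactored_program(boxes))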
... Reverse Engineering (RE) is the process of analysing an existing system in order to determine the components of the system and their relationships with each other (Chikofsky & Cross, 1990; West et al., 2015). RE is also used for purposes such as developing different products based on an existing part or product, or for prototyping (Lee & Woo, 1998). ...
Article
Full-text available
This study aims to evaluate the teaching of the programming process carried out through scenarios related to daily life within the framework of Reverse Engineering and the Theory of Didactical Situations. The sample of the study consists of 15 prospective computer and instructional technology education teachers. Quantitative and qualitative data were collected simultaneously and analysed independently in this study, which was designed according to an embedded mixed method. In the study, the prospective teachers’ perceptions of self-efficacy, holistic and analytical thinking skills in problem-solving, academic achievements, and data collected for reverse engineering processes were evaluated in accordance with the effectiveness of programming education. After applications based on Reverse Engineering and the Theory of Didactical Situations, it was determined that the prospective teachers’ perceptions of self-efficacy towards programming and their academic achievement increased, and that their analytical and holistic thinking skills when solving problems were also positively affected.
... Negotiation of risks and benefits is justified with data and other sources of evidence, including published research literature and technical reports (see Figure 2e). Engineering inquiry number 6, REV, is an approach that focuses on understanding an existing system or artifact by learning from, redesigning, modernizing, or fixing it (Chikofsky & Cross, 1990; Crismond & Adams, 2012; Otto & Wood, 1998). In cases of obsolete products that are no longer manufactured or maintained, REV allows renovation and redesign (Helle & Lemu, 2021). ...
Article
Full-text available
Background Understanding the nature of engineering is important for shaping engineering education, especially precollege education. While much research has established the pedagogical benefits of teaching engineering in kindergarten through 12th grade (K–12), the philosophical foundations of engineering remain under-examined. Purpose This conceptual paper introduces the honeycomb of engineering framework, which offers an epistemologically justified theoretical position and a pedagogical lens that can be used to examine ways engineering concepts and practices are taught in precollege education. Scope/Method The honeycomb of engineering was developed as a descriptive framework by examining existing literature over a wide range of related disciplines such as the philosophy of engineering and technology, as well as design thinking and practice. The pedagogical translation of the framework was then developed to examine published precollege engineering curricula. Results The framework categorizes the multiple goals of engineering using an ontological classification of engineering inquiries anchored in the central practice of negotiating risks and benefits (i.e., trade-offs). This framework also illustrates the adaptability of design methodology in guiding six inquiries: (1) user-centered design, (2) design-build-test, (3) engineering science, (4) optimization, (5) engineering analysis, and (6) reverse engineering. The published curricula represented these inquiries with varying degrees, with design-build-test lessons seeing the most representation followed by user-centered design. Conclusions The honeycomb of engineering framework delineates variations in engineering education based on an epistemological explanation. The pedagogical translations offer guidance to educators, researchers, and curriculum designers for differentiating curricular aims and learning outcomes resulting from participation in different engineering inquiries.
... E. J. Chikofsky and J.H. Cross in [29] define the term "Reverse engineering" as the process of analyzing the target system in order to determine its components and their interaction, and then creating some high-level description of this system. The authors also define the term "Reengineering" as a re-creation of the target system in some new form. ...
... There are different procedural models of an implementation strategy. A distinction is made between forward engineering and reengineering, whereby reengineering refers to implementation strategies that serve to transform a legacy system into a new target environment, i.e., a reuse of an existing system [24,25]. Within forward engineering, a system is either functionally further developed on the basis of a system specification, newly created, or acquired [22]. ...
Article
Full-text available
In the course of digitalization, the concept of the digital twin (DT) has become increasingly important in recent years. Initial concept approaches focus on use cases in industrial production, although the concept has diverse potential for the asset-intensive and increasingly important German rail transport sector. Because research on the DT is still at an early stage, there is little empirical evidence on how this promising concept can be implemented in a target-oriented manner. In a qualitative-explorative research design, this work focuses on answering the question of how a generic implementation strategy (GIS) for DTs in logistics systems of German rail transport can be designed. The central result of this work is a validated GIS which has a sufficient level of detail for the targeted implementation of DTs. Due to its socio-technical system focus, this process model enables user-oriented development and consideration of the complex framework conditions of German rail transport for the first time. The prospective use of the GIS as well as the utilization of the derived recommendations for action contribute to realizing DTs. Consequently, this work is to be regarded as an important instrument for achieving the operational optimization and intelligent management of rail transport assets.
... BlockHDFS is created to make HDFS more secure by logging the relevant metadata into the blockchain. Accessing HDFS by API is a normal practice, but it can be attacked in many ways, such as Reverse Engineering [35], Man-in-the-middle attacks [36], User Spoofing [37], and Session Replays [38]. One concrete attack is DemonBot, which gives access to Remote Code Execution (RCE) in HDFS. ...
Article
Full-text available
Hadoop Distributed File System (HDFS) is one of the widely used distributed file systems in Big Data analysis for frameworks such as Hadoop. It is used to manage a large volume of data with low-cost commodity hardware. However, vulnerabilities in HDFS can be exploited for nefarious activities, as security appears to have been inconsistent and not a priority in the original design. Therefore, there is a need for security for files in Hadoop and a trusted way to check the authenticity of such files. This paper aims to improve the security of HDFS using a blockchain-enabled approach called BlockHDFS. Specifically, the proposed BlockHDFS uses the enterprise-level Hyperledger Fabric platform to capitalize on files' metadata for building trusted data security and traceability in HDFS.
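As a rough illustration of the kind of file metadata such an approach could log (an assumption for illustration, not the BlockHDFS implementation), the sketch below computes a path, size, timestamp, and SHA-256 digest for a file; the resulting JSON record would be submitted as a ledger transaction rather than printed.

import hashlib, json, os, time

def file_metadata_record(path: str) -> dict:
    # Hash the file in chunks so large files do not need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return {
        "path": path,
        "size": os.path.getsize(path),
        "sha256": digest.hexdigest(),
        "timestamp": int(time.time()),
    }

# In a ledger-backed design this record would be submitted as a transaction;
# here it is only printed for illustration.
print(json.dumps(file_metadata_record(__file__), indent=2))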
... In this work, we propose an approach based on reverse engineering (Chikofsky & Cross, 1990;Duffy & Malloy, 2005) and source-to-source transformation (Kulkarni, Chavan and Hardikar, 2015) that extend ...
Thesis
The heterogeneity of crop modeling platforms in terms of implementation language, design pattern, and software architecture constraints limits the reuse of model components outside the platform in which they have been developed. Our objective is to propose a reuse approach based on a high level of abstraction of model components. To this end, we have identified some concepts that made it possible to define a component specification language and a minimal domain language for the description of algorithms regardless of platform specificities. A transformation system based on these concepts allowed us to generate platform-compliant components seamlessly. We have shown that a unified description of model components with shared concepts lifts platform constraints and increases the reusability of components.
Article
Full-text available
Material recycling covers the sustainable use of substances, sustainable production, additive production, powder reuse, knowledge management, reuse of water, and so on. It is the process of taking old items and finding new uses for them. Sometimes items can be reused by others; clothes can often be donated and given a second life. Reuse is better than recycling because it saves the energy that goes into disposing of and recycling materials. This significantly reduces waste and pollution because it reduces the need for raw materials and saves both forest and water supplies. When we do not recycle, reuse, and reduce, we are destroying natural habitats. As it is, our planet cannot cope with the current rate of destruction. If we fail to reuse what we already have, we end up in the sticky situation of running out of resources. By reducing our waste, we are conserving our resources. Resources such as aluminum, petroleum, and wood are all used to make new products such as cans, plastic bags, and paper packaging. Less energy is used to recycle materials than to create new ones.
Chapter
Full-text available
The reliable operation of systems with both timing and energy requirements is a fundamental challenge in the area of safety-critical embedded systems. In order to provide guarantees for the execution of tasks within given resource budgets, these systems demand bounds on the worst-case execution time (WCET) and the worst-case energy consumption (WCEC). While static WCET analysis techniques are well established in the software development process of real-time systems nowadays, these program analysis techniques are not directly applicable to the fundamentally different behavior of energy consumption and the determination of the WCEC. Besides the missing approaches for WCEC bounds, the domain of worst-case analyses generally faces the problem that the accuracy and validity of reported analysis bounds are unknown: Since the actual worst-case resource consumption of existing benchmark programs cannot be automatically determined, a comprehensive validation of these program analysis tools is not possible. This summary of my dissertation addresses these problems by first describing a novel program analysis approach for WCEC bounds, which accounts for temporarily power-consuming devices, scheduling with fixed real-time priorities, synchronous task activations, and asynchronous interrupt service routines. Regarding the fundamental problem of validating worst-case tools, this dissertation presents a technique for automatically generating benchmark programs. The generator combines program patterns so that the worst-case resource consumption is available along with the generated benchmark. Knowledge about the actual worst-case resource demand then serves as the baseline for evaluating and validating program analysis tools. The fact that the benchmark generator helped to reveal previously undiscovered software bugs in a widespread WCET tool for safety-critical systems underlines the relevance of such a structured testing technique.
Chapter
Source code is the most up-to-date source among all available software artifacts. The majority of existing software redocumentation approaches rely on source code to extract the necessary information for program comprehension in order to support software maintenance tasks. However, performing Extract, Transform and Load (ETL) from the source code using a parser is becoming a challenging task. The traditional approach is no longer able to handle the ETL efficiently, especially for large source code, because of its limited analysis efficiency. This paper proposes using a distributed data processing technique to extract legacy source code components and generate detailed design or technical software documentation at the source code level to support program understanding. The objective of this paper is to apply the distributed data processing technique to the parser by using the Hadoop Distributed File System and Apache Spark. Legacy Java source code is used as a case study to apply our proposed approach to extract the source code components and generate the technical software documentation.
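A hedged PySpark sketch of the idea follows: Java sources are read in parallel from a distributed file system and class and method names are extracted with simple regular expressions. The paths and patterns are illustrative assumptions; the chapter's actual parser and extraction rules are more elaborate.

import re
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("LegacySourceETL").getOrCreate()

# wholeTextFiles yields (path, content) pairs; the HDFS path is an assumption.
sources = spark.sparkContext.wholeTextFiles("hdfs:///legacy/src/*.java")

CLASS_RE = re.compile(r"\bclass\s+(\w+)")
METHOD_RE = re.compile(r"\b(?:public|protected|private)\s+[\w<>\[\]]+\s+(\w+)\s*\(")

def extract(path_and_content):
    path, content = path_and_content
    return {"file": path,
            "classes": CLASS_RE.findall(content),
            "methods": METHOD_RE.findall(content)}

for record in sources.map(extract).collect():
    print(record)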
Article
Full-text available
Technological analysis consists of examining an object in order to understand how it works. In this article, we study how secondary school students proceed when they carry out a technological analysis in a science and technology class. Observations and interviews conducted with twelve students show that they proceed in at least three different ways, which we have called "sequential dissection" (disassembling the whole object, then trying to understand how it works), "dissection by systems" (partially disassembling the object to understand how its systems fit together), and "spiral dissection" (disassembling one part at a time while trying to understand its function). Keywords: technological analysis, mechanical dissection, science and technology
Chapter
Reverse engineering has an important role in product design and manufacturing. Reverse engineering represents a concept that generates the required design data from existing components. It also describes a process in which product development goes in the reverse order compared to the conventional product development process. That means that reverse engineering uses the existing product as the starting point rather than the conventional technical drawing. This paper gives a description of some postulates of reverse engineering for the mechanical design of the tool responsible for the bicycle inner tube assembly process. This tool, the valve applicator, obtains proper valve positioning and application on the inner tube profile. The valve applicator must have the exact geometry of the valve: the valve must fit perfectly inside its structure. How the valve is positioned and applied to the tube profile is essential for the proper and safe usage of the product. Since the technical documentation of the valves is not available, the reverse engineering methodology must be applied. Keywords: Reverse engineering, Product development, Mechanical design, Inner tube, Tool-valve applicator
Chapter
Digital systems based on small finite-state controllers are especially vulnerable to power analysis attacks. Naïve finite state machine (FSM) encoding schemes to mitigate power attacks can lead to an increase in power. This chapter presents a method for exploring the trade-off between side-channel security and power consumption during the encoding of finite-state controllers. Based on a graded security metric provided by the user, this method can generate constrained encodings so as to reduce the information leakage through the power side-channel while resulting in low-power implementations. FSMs are automatically restructured to improve security, and multiple power models are considered while generating a secure encoding. Experimental results using a large number of benchmarks show a graded increase in encoding length (up to 40% for the original FSMs and 40–70% for the restructured FSMs) depending on the level of security chosen. An average power reduction of up to 40% is observed in power-constrained restructured FSMs, and a 4–20% reduction compared with the minimal encoding strategy.
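The power side of this trade-off can be approximated with a very small model: the transition-frequency-weighted Hamming distance between state codes is a common proxy for switching activity. The sketch below compares a minimal binary encoding with a longer one-hot encoding on a toy FSM; it is an illustration of the idea, not the chapter's tool, power models, or security metric.

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def switching_cost(encoding: dict, transitions: dict) -> float:
    # encoding: state -> bit string; transitions: (src, dst) -> relative frequency.
    return sum(freq * hamming(encoding[s], encoding[t])
               for (s, t), freq in transitions.items())

states = ["IDLE", "LOAD", "RUN", "DONE"]
transitions = {("IDLE", "LOAD"): 0.4, ("LOAD", "RUN"): 0.3,
               ("RUN", "RUN"): 0.2, ("RUN", "DONE"): 0.1}

binary = dict(zip(states, ["00", "01", "10", "11"]))            # minimal-length encoding
one_hot = dict(zip(states, ["0001", "0010", "0100", "1000"]))   # longer encoding, different activity profile

print("binary :", switching_cost(binary, transitions))
print("one-hot:", switching_cost(one_hot, transitions))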
Article
Much software, whether beneficent or malevolent, is distributed only as binaries, sans source code. Absent source code, understanding binaries' behavior can be quite challenging, especially when they are compiled under higher levels of compiler optimization. These optimizations can transform comprehensible, "natural" source constructions into something entirely unrecognizable. Reverse engineering binaries, especially those suspected of being malevolent or guilty of intellectual property theft, is an important and time-consuming task. There is a great deal of interest in tools to "decompile" binaries back into more natural source code to aid reverse engineering. Decompilation involves several desirable steps, including recreating source-language constructions, variable names, and perhaps even comments. One central step in creating binaries is optimizing function calls, using steps such as inlining. Recovering these (possibly inlined) function calls from optimized binaries is an essential task that most state-of-the-art decompiler tools try to do but do not perform very well. In this paper, we evaluate a supervised learning approach to the problem of recovering optimized function calls. We leverage open-source software and develop an automated labeling scheme to generate a reasonably large dataset of binaries labeled with actual function usages. We augment this large but limited labeled dataset with a pre-training step, which learns the decompiled code statistics from a much larger unlabeled dataset. Thus augmented, our learned labeling model can be combined with an existing decompilation tool, Ghidra, to achieve substantially improved performance in function call recovery, especially at higher levels of optimization.
Chapter
Full-text available
The demanded fast innovation cycles of the ongoing digital transformation create an unstable environment in which the demands of heterogeneous professional communities need to be addressed. Moreover, the information systems infrastructure of these professional communities has a strong influence on their practices. However, the evolution of the web as infrastructure is shaped by an interplay of new technologies and innovative applications. It is characterized by contrasts, such as centralized versus peer-to-peer architectures and a large number of end users versus a small number of developers. Therefore, our aim is to stabilize these dichotomies apparent in the web by means of an agile information systems development methodology. The DevOps approach promotes stronger cooperation between development and operations teams. Our DevOpsUse methodology additionally fosters a stronger involvement of end-user communities in software development by including them in the process of infrastructuring, that is, the appropriation of infrastructure during its usage. The developed DevOpsUse methodology and support tools have been successfully validated by the transitions between three generations of technologies: near real-time peer-to-peer web architectures, edge computing, and the Internet of Things. In particular, we were able to demonstrate our methodology’s capabilities through longitudinal studies in several large-scale international digitalization projects. Beyond web information systems, the framework and its open-source tools are applicable in further areas like Industry 4.0. Its broad adaptability testifies that DevOpsUse has the potential to unlock capabilities for sustainable innovation.
Chapter
Full-text available
During pair programming (PP), two software developers work closely together on a technical task on one computer. Practitioners expect a number of benefits, such as faster progress, higher quality, and knowledge transfer. Much of prior research focused on directly measurable effects from laboratory settings, but could not explain the large variations observed. My research follows the Grounded Theory Methodology and is aimed at understanding how PP actually works by uncovering the underlying mechanisms to ultimately formulate practical advice for developers. The main findings from my qualitative analysis of recordings of 27 industrial PP sessions are: Task-specific knowledge about the software system is crucial for pair programming. Pairs first make sure they have a shared understanding of the relevant parts before they acquire lacking knowledge together. The transfer of general software development knowledge plays a rather small role and only occurs after the pair dealt with its need for system knowledge. Pairs who maintain a shared understanding may have short, but highly productive Focus Phases; if their Togetherness gets too low, however, a Breakdown of the pair process may occur.
Chapter
Full-text available
Debugging is one of the most expensive and challenging phases in the software development life-cycle. One important cost factor in the debugging process is the time required to analyze failures and find underlying faults. Two types of techniques that can help developers to reduce this analysis time are Failure Clustering and Automated Fault Localization. Although there is a plethora of these techniques in the literature, there are still some gaps that prevent their operationalization in real-world contexts. Besides, the abundance of these techniques confuses the developers in selecting a suitable method for their specific domain. In order to help developers in reducing analysis time, we propose methodologies and techniques that can be used standalone or in a form of a tool-chain. Utilizing this tool-chain, developers (1) know which data they need for further analysis, (2) are able to group failures based on their root causes, and (3) are able to find more information about the root causes of each failing group. Our tool-chain was initially developed based on state-of-the-art failure diagnosis techniques. We implemented and evaluated existing techniques. We built on and improved them where the results were promising and proposed new solutions where needed. The overarching goal of this study has been the applicability of techniques in practice.
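For the failure clustering part, a minimal sketch of the general idea is shown below: failing-test stack traces are vectorized with TF-IDF and grouped with agglomerative clustering so that each group can be analyzed for a shared root cause. The traces and the choice of two clusters are toy assumptions, not the tool-chain described in the chapter.

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

traces = [
    "NullPointerException at OrderService.total at CartController.checkout",
    "NullPointerException at OrderService.total at ApiController.submit",
    "TimeoutException at PaymentGateway.call at CartController.checkout",
]

# Vectorize the traces and group them; traces failing for the same reason tend
# to share frames and therefore land in the same cluster.
X = TfidfVectorizer().fit_transform(traces).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

for label, trace in zip(labels, traces):
    print(f"cluster {label}: {trace}")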
Chapter
Full-text available
Structuring control flow is an essential task almost every programmer faces on a daily basis. At the same time, the control flow of software applications is becoming increasingly complicated, motivating languages to include more and more features like asynchronous programming and generators. Effect handlers are a promising alternative since they can express many of these features as libraries. To bring effect handlers closer to the software engineering practice, we present capability passing as an implementation technique for effect handlers. Capability passing provides the basis for the integration of effect handlers into mainstream object-oriented programming languages and thereby unlocks novel modularization strategies. It also enables programmers to apply lexical reasoning about effects and gives rise to a new form of effect polymorphism. Finally, it paves the path for efficient compilation strategies of control effects.
Chapter
This chapter introduces the EU data protection legislation. EU law regulates data protection both on the level of the primary law in Article 8 CFR and Article 16 TFEU, and on the level of the secondary law with the second generation of EU data protection legislation, i.e. the GDPR, DPR-EU, LED and ePD. The primary law has been reverse-engineered from the provisions of the first generation of secondary EU data protection legislation. The current secondary EU data protection legislation meanwhile is an evolution of the first generation. It introduces some new principles and instruments and further substantiates others that were previously established. This is particularly true for the principles of the EU data protection legislation, which form its basis. However, the principles expressly included in the relevant provisions are not representative of all of the general rules of the legislation and will therefore have to be enhanced, which will be achieved in Chap. 5. Keywords: Lawfulness, Fairness, Transparency, Purpose Limitation, Data Minimisation, Accuracy, Storage Limitation, Integrity, Confidentiality, Availability, Accountability, Necessity, Data Protection Principles, Special Categories, Sensitive Data, Supervisory Authority, Data Protection Authority, Third Country Transfers, Data Protection Impact Assessment, Data Subject Rights
Chapter
Design patterns are techniques in software design for addressing frequently occurring issues in a given context. Understanding the design patterns used in a design helps developers dive deeper into it. Hence, mapping design patterns is necessary and valuable for software designers to extract important information during the re-engineering process. Along with the detection of design patterns, it is also desirable to recognize design patterns directly from the source code. In this paper we present an approach for design pattern detection and recognition using machine learning techniques and a metrics-based training dataset.
Article
Reverse engineering of file systems is indispensable for tool testing, accurate evidence acquisition, and correct interpretation of data structures by law enforcement in criminal investigations. This position paper examines emerging techno-legal challenges from the practice of reverse engineering for law enforcement purposes. We demonstrate that this new context creates uncertainties about the legality of tools and methods used for evidence acquisition and the compliance of law enforcement with obligations to protect intellectual property and confidential information. Further identified are gaps between legal provisions and practice related to disclosure and peer-review of sensitive digital forensic methodology, trade secrets in investigations, and governmental vulnerability disclosure. It is demonstrated that reverse engineering of file systems is insufficiently addressed by legislators, which results in a lack of file system interpretation and validation information for law enforcement and their dependence on tools. Outlined are recommendations for further developments of digital forensic regulation.
Thesis
This thesis analyses the legality of "reverse engineering" under the Copyright Designs and Patents Act 1988 and suggests a new way to consider the legality of reverse engineering. The thesis submits that current copyright law misconceives the principle of software engineering and, as a result, fails to differentiate the process of reverse engineering from that of forward engineering. Such a conceptual misunderstanding creates commercial problems in the software industries and other industries which rely on the use of software technology. As a means of solving the problems, this thesis suggests that these two processes should be separated from each other so that the legal status of each one is considered on its own merits. The thesis proposes that "reverse engineering", and the steps leading to the completion of a finished product, i.e., "forward engineering", should not be an infringement of copyright. Rather, infringement should be determined by a comparison of the finished product with the original product. Comparison of the two products will create a more open creative environment for the exploitation of ideas, and stimulate greater encouragement of competition in the software market. Also examined is the impact of the framework proposed in this thesis on three main related areas of law, namely the law of confidence, patent law and competition law. The thesis shows that the proposed framework will not have an adverse impact on these areas. The author believes that this thesis is the first to introduce a new way of perceiving and solving the problems of the legality of reverse engineering as well as analysing the impact of the proposed framework on related areas of law.
Article
It is increasingly common for domestic and family violence to have an element of technology-facilitated abuse (TFA). As a result, technology-based responses have emerged to address TFA. Using observations from several empirical research projects into TFA, it will be shown that technology-based responses are necessary without being sufficient, and that they have persistent limitations that need to be recognized. Relatedly, it will be argued that there should be an ongoing emphasis on the development of human resources as a support for those experiencing TFA, particularly the use of professional DV support workers.
Article
The use of command and control (C2) servers in cyberattacks has risen considerably, and attackers frequently employ the domain generation algorithm (DGA) technique to conceal their C2 servers. Various machine learning models have been suggested for binary identification of domain names as either benign or DGA domains. Existing techniques are inefficient, have real-time detection issues, and are very sensitive to data size; therefore, they can be circumvented by attackers. The main problem this article addresses is how to automatically detect DGA in a way that does not rely solely on reverse engineering, is not strongly affected by data size, and allows detection of DGA domains in real time. This paper presents the DTFS-DGA model, which combines neural network models with traditional machine learning models and maintains its performance even if the data size changes, to detect DGA in real time. The model uses 15 linguistic and network features, together with the features extracted by a long short-term memory network and a convolutional neural network, to classify domain names using random forests and support vector machines. The comprehensive experimental findings confirm the suggested model's accuracy; to be precise, the model achieves an average accuracy of 99.8% for the classification.
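To illustrate the "traditional" feature side of such a detector (the LSTM/CNN features are omitted here), the sketch below computes a few linguistic features per domain name and trains a RandomForest; the feature set, domains, and labels are simplified assumptions, not the DTFS-DGA implementation.

import math
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def entropy(s: str) -> float:
    # Shannon entropy of the character distribution.
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def features(domain: str):
    # A few cheap linguistic features: name length, character entropy,
    # digit ratio, and vowel ratio.
    name = domain.split(".")[0]
    digits = sum(ch.isdigit() for ch in name)
    vowels = sum(ch in "aeiou" for ch in name)
    return [len(name), entropy(name), digits / len(name), vowels / len(name)]

domains = ["google", "wikipedia", "xjw3kq9zpt", "qmz81vkd2r"]
labels = [0, 0, 1, 1]  # 0 = benign, 1 = DGA-generated (toy labels)

clf = RandomForestClassifier(random_state=0).fit([features(d) for d in domains], labels)
print(clf.predict([features("openstreetmap"), features("k2j9xqv7mz")]))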
Chapter
Cyberattacks have grown to a much greater extent over the decades. According to statistics, in the year 2009, 12.4 million attacks were recorded, and recently, in 2018, the number rose to 812.67 million known cases. Admittedly, these are only the known cases, and there are many which go unrecorded. They range from small cyberattacks to large ransomware attacks, or to combinations of several sophisticated cyberattacks consisting of advanced exploitation techniques and persistence capabilities for long-term intrusion campaigns. However, the common element among all the cyberattacks that have happened so far has been the use of malware. To mitigate these attacks, we have to understand the basic structure of malware, its working features, and its effects upon the target. This paper provides an in-depth overview of malware types by analyzing malware via a process called malware analysis, and other related processes, depending on the type of malware. Malware analysis is the process conducted just after digital forensics and incident response (DFIR); it is the process of detecting malware and malware infection in a particular host or network. After obtaining any sort of suspicious file or malware, malware analysis is performed. Although this paper does not focus on DFIR, it does focus on malware and different well-known mechanisms of malware.
Chapter
Full-text available
Software requirements specifications (SRS) serve as an important source of information for a variety of roles involved in software engineering (SE) projects. This situation poses a challenge to requirements engineers: Different information needs have to be addressed, which are strongly dependent on the particular role(s) that SRS stakeholders have within a project. This chapter summarizes the contributions of a thesis that aimed to address and reduce role-specific defects in SRS that negatively influence the efficient usage and acceptance of these documents. To achieve this goal, we collected empirical data about role-specific information needs in a series of empirical studies that served as a baseline for a secondary analysis toward the definition of role-specific views. Moreover, we realized a proof-of-concept implementation that is capable of generating role-specific views on SRS. The results of a case study revealed that role-specific views have the potential to efficiently support SRS consumers during the analysis of a given SRS. Besides conducting further empirical studies in industry, future work aims to foster cross-disciplinary collaboration and requirements communication, especially in agile teams. Thereby, we are exploring synergy potential with best practices from non-SE disciplines.
Article
Anonymous exchange of data is in strong demand in many scenarios. With the development of IoT and wireless networks, plenty of smart devices are interconnected through wireless technologies such as 5G and Wi-Fi, making it possible to use them for information exchange. The authors find a P2P network model for secure and anonymous communication, which is a typical Crowds system and whose operating mechanism fits the resource limitations of IoT devices. Based on this network model, the authors design a lightweight communication scheme for a remote-control system in this work, using two kinds of Virtual-Spaces to achieve identity announcement and data exchange. The authors implemented a prototype system of the scheme and tested it over Freenet, showing that the scheme can effectively resist the impact of flow analysis on the anonymity of communication while ensuring communication data security. By analyzing the scheme's performance, the authors believe that the scheme is practical and suitable for scenarios that are not time-sensitive but require high anonymity.
Chapter
Full-text available
We explain what properties the jury looked for to identify strong contributions and why they are important. They are formulated as seven pieces of advice: (1) Be in scope, (2) Enumerate your assumptions, (3) Delineate your contribution, (4) Honestly discuss limitations, (5) Show usefulness and practical applicability, (6) Have a well-prepared nutshell, and (7) Be timeless.
Chapter
Full-text available
Differential software testing is important for software quality assurance as it aims to automatically generate test inputs that reveal behavioral differences in software. Detecting regression bugs in software evolution, analyzing side-channels in programs, maximizing the execution cost of a program over multiple executions, and evaluating the robustness of neural networks are instances of differential software analysis to generate diverging executions of program paths. The key challenge thereby is to simultaneously reason about multiple program paths, often across program variants, in an efficient way. Existing work in differential testing is often not (specifically) directed to reveal a different behavior or is limited to a subset of the search space. This work proposes the concept of Hybrid Differential Software Testing (HyDiff) as a hybrid analysis technique to generate difference revealing inputs. HyDiff consists of two components that operate in a parallel setup: (1) a search-based technique that inexpensively generates inputs and (2) a systematic exploration technique to also exercise deeper program behaviors. HyDiff's search-based component uses differential fuzzing directed by differential heuristics. HyDiff's systematic exploration component is based on differential dynamic symbolic execution that allows it to incorporate concrete inputs in its analysis. HyDiff is evaluated experimentally with applications specific to differential testing. The results show that HyDiff is effective in all considered categories and outperforms its components in isolation.
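The core idea of difference-revealing input generation can be shown with a minimal harness (not HyDiff itself): random inputs are fed to two program variants and every input on which their outputs diverge is reported. The two toy versions below are invented for illustration.

import random

def version_a(x: int) -> int:
    return abs(x) % 10

def version_b(x: int) -> int:
    # A "regressed" variant with a corner-case bug for negative inputs.
    return (x if x >= 0 else -x - 1) % 10

def differential_fuzz(n_inputs: int = 10_000, seed: int = 0):
    # Yield every generated input on which the two versions observably diverge.
    rng = random.Random(seed)
    for _ in range(n_inputs):
        x = rng.randint(-1_000, 1_000)
        if version_a(x) != version_b(x):
            yield x, version_a(x), version_b(x)

for divergence in list(differential_fuzz())[:3]:
    print("diverging input:", divergence)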
Chapter
Full-text available
Modern software architectures are becoming increasingly complex and interdependent. The days of exclusive in-house software development by companies are over. A key force contributing to this shift is the abundant use of open source frameworks, components, and libraries in software development. Over 90% of all software products include open source components. Being efficient, robust, and affordable, they often cover the non-differentiating product requirements companies have. However, the uncontrolled use of open source software in products comes with legal, engineering, and business risks stemming from incorrect software licensing, copyright issues, and supply chain vulnerabilities. While recognized by a handful of companies, this topic remains largely ignored by the industry and little studied by the academia. To address this relevant and novel topic, we undertook a 3-year research project into open source governance in companies, which resulted in a doctoral dissertation. The key results of our work include a theory of industry best practices, where we captured how more than 20 experts from 15 companies worldwide govern their corporate use of open source software. Acknowledging the broad industry relevance of our topic, we developed a handbook for open source governance that enabled practitioners from various domains to apply our findings in their companies. We conducted three evaluation case studies, where more than 40 employees at three Germany-based multinational companies applied our proposed best practices. This chapter presents the highlights of building and implementing the open source governance handbook.
Chapter
Full-text available
Modern embedded software systems are becoming more and more complex. Engineering embedded systems raise specific challenges that are rarely present in other software engineering disciplines due to the systems’ steady interactions with their environment. Research and industry often describe embedded systems as component and connector models (C&C). C&C models describe the logical architecture by focusing on software features and their logical communications. In C&C models, hierarchical decomposed components encapsulate features, and connectors model the data flow between components via typed ports. As extra-functional properties, for example, safety and security, are also key features of embedded systems, C&C models are mostly enriched with them. However, the process to develop, understand, validate, and maintain large C&C models for complex embedded software is onerous, time consuming, and cost intensive. Hence, the aim of this chapter is to support the automotive software engineer with: (i) automatic consistency checks of large C&C models, (ii) automatic verification of C&C models against design decisions, (iii) tracing and navigating between design and implementation models, (iv) finding structural inconsistencies during model evolution, (v) presenting a flexible approach to define different extra-functional properties for C&C models, and (vi) providing a framework to formalize constraints on C&C models for extra-functional properties for automatic consistency checks.
Chapter
Full-text available
This is the introductory chapter of the book on the Ernst Denert Software Engineering Award 2020. It provides an overview of the 11 nominated PhD theses, the work of the award winner, and the structure of the book.
Chapter
Full-text available
Legacy systems are business-critical software systems whose failure can have a significant impact on the business. Yet, their maintenance and adaptation to changed requirements consume a considerable amount of the total software development costs. Frequently, domain experts and developers involved in the original development are not available anymore, making it difficult to adapt a legacy system without introducing bugs or unwanted behavior. This results in a dilemma: businesses are reluctant to change a working system, while at the same time struggling with its high maintenance costs. We propose the concept of Structured Software Reengineering, replacing the ad hoc forward engineering part of a reengineering process with the application of behavior-preserving, proven-correct transformations improving nonfunctional program properties. Such transformations preserve valuable business logic while improving properties such as maintainability, performance, or portability to new platforms. Manually encoding and proving such transformations for industrial programming languages, for example, in interactive proof assistants, is a major challenge requiring deep expert knowledge. Existing frameworks for automatically proving transformation rules have limited expressiveness and are restricted to particular target applications such as compilation or peep-hole optimizations. We present Abstract Execution, a specification and verification framework for statement-based program transformation rules on JAVA programs building on symbolic execution. Abstract Execution supports universal quantification over statements or expressions and addresses properties about the (big-step) behavior of programs. Since this class of properties is useful for a plethora of applications, Abstract Execution bridges the gap between expressiveness and automation. In many cases, fully automatic proofs are possible. We explain REFINITY, a workbench for modeling and proving statement-level JAVA transformation rules, and discuss our applications of Abstract Execution to code refactoring, cost analysis of program transformations, and transformations reshaping programs for the application of parallel design patterns.
Chapter
Full-text available
Recent advances in mobile connectivity, as well as increased computational power and storage in sensor devices, have given rise to a new family of software architectures with challenges for data and communication paths as well as architectural reconfigurability at runtime. Established in 2012, Fog Computing describes one of these software architectures. It lacks a commonly accepted definition, which manifests itself in missing support for mobile applications and for dynamically changing runtime configurations. The dissertation "Dynamically Scalable Fog Architectures" provides a framework that formalizes Fog Computing and adds support for dynamic and scalable Fog Architectures. The framework, called xFog (eXtension for Fog Computing), models Fog Architectures based on set theory and graphs. It consists of three parts: xFogCore, xFogPlus, and xFogStar. xFogCore establishes the set-theoretical foundations. xFogPlus enables dynamic and scalable Fog Architectures by allowing new components or layers to be added dynamically. Additionally, xFogPlus provides a View concept which allows stakeholders to focus on different levels of abstraction. These formalizations establish the foundation for new concepts in the area of Fog Computing. One such concept, xFogStar, provides a workflow to find the best service configuration based on quality-of-service parameters. The xFog framework has been applied in eight case studies to investigate the applicability of dynamic Fog Components, scalable Fog Architectures, and service provider selection at runtime. The case studies, covering application domains ranging from smart environments, health, and metrology to gaming, successfully demonstrated the feasibility of the formalizations provided by xFog, the dynamic change of Fog Architectures by adding new components and layers at runtime, and the applicability of a workflow to establish the best service configuration.
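The following minimal Java sketch is included only as an illustration of the set-and-graph view mentioned above: a layered fog architecture is modeled as sets of nodes plus a connection graph, and a component is added at runtime. The types FogNode, FogLayer, and FogTopology are hypothetical and do not reflect the actual xFog API.

import java.util.*;

// Minimal sketch: layers as sets of nodes, plus an undirected communication graph.
class FogNode {
    final String id;
    FogNode(String id) { this.id = id; }
}

class FogLayer {
    final String name;
    final Set<FogNode> nodes = new HashSet<>();  // a layer is a set of nodes
    FogLayer(String name) { this.name = name; }
}

class FogTopology {
    final List<FogLayer> layers = new ArrayList<>();          // ordered: cloud ... edge
    final Map<FogNode, Set<FogNode>> edges = new HashMap<>(); // communication graph

    // Dynamically add a node to a layer at runtime.
    void addNode(FogLayer layer, FogNode node) {
        layer.nodes.add(node);
        edges.putIfAbsent(node, new HashSet<>());
    }

    void connect(FogNode a, FogNode b) {
        edges.get(a).add(b);
        edges.get(b).add(a);
    }
}

public class FogDemo {
    public static void main(String[] args) {
        FogTopology t = new FogTopology();
        FogLayer cloud = new FogLayer("cloud"), edge = new FogLayer("edge");
        t.layers.add(cloud); t.layers.add(edge);

        FogNode server = new FogNode("server-1"), sensor = new FogNode("sensor-7");
        t.addNode(cloud, server);
        t.addNode(edge, sensor);       // new component added at runtime
        t.connect(server, sensor);

        System.out.println("edge layer size: " + edge.nodes.size()); // 1
    }
}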
Chapter
The development of real-time embedded systems is usually preceded by an important design phase to ensure that functional and behavioural constraints are met. However, the modification of some systems, especially Unmanned Air Vehicles that need to be frequently customised, is typically done in an ad hoc way. Indeed, the design information may not be available, which may affect the proper functioning of the system. This paper proposes a framework that helps reverse engineer a Modifiable Off-The-Shelf (MOTS) embedded system in order to ease its modification. In other words, our objective is to point out where modifications have to happen, and to allow smooth use of third-party analysis and/or architecture exploration tools to re-analyse non-functional properties (safety, performance, etc.) with respect to the customisation. The framework extracts functional chains from the source code and represents them visually as a model-based design using model-driven engineering techniques.
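As a rough illustration of the functional-chain idea, the sketch below recovers a chain as a path through a call graph using a depth-first search. The call graph and function names are hand-written assumptions for this example; a real framework would extract them from the source code.

import java.util.*;

// Illustrative sketch: a "functional chain" recovered as a path through a call graph.
public class FunctionalChain {

    static List<String> findChain(Map<String, List<String>> callGraph,
                                  String from, String to) {
        Deque<String> path = new ArrayDeque<>();
        return dfs(callGraph, from, to, path, new HashSet<>())
                ? new ArrayList<>(path) : List.of();
    }

    private static boolean dfs(Map<String, List<String>> g, String current, String target,
                               Deque<String> path, Set<String> visited) {
        path.addLast(current);
        if (current.equals(target)) return true;
        visited.add(current);
        for (String callee : g.getOrDefault(current, List.of())) {
            if (!visited.contains(callee) && dfs(g, callee, target, path, visited)) return true;
        }
        path.removeLast();  // backtrack: this call does not lead to the target
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> callGraph = Map.of(
                "readSensor",   List.of("filterSignal"),
                "filterSignal", List.of("updateState"),
                "updateState",  List.of("commandActuator"));
        // Chain from sensor input to actuator output:
        System.out.println(findChain(callGraph, "readSensor", "commandActuator"));
    }
}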
Article
Full-text available
Software reverse engineering (SRE) plays a crucial role in contemporary software environments. Software developers may implement a system first and then use SRE tools to generate design content such as Unified Modeling Language (UML) diagrams. The SRE literature mostly focuses on how precisely the conversion reflects the system; there is, however, little or no research that further examines the quality of the converted results. Therefore, this paper presents an online knowledge-based ontological SRE system, OntRECoh, for quality evaluation of converted UML structural models. OntRECoh features a domain-specific knowledge base that focuses on cohesion design and a rule-based inference engine that rates the cohesion scores of a tested system written in Java and provides improvement recommendations through its Web-based interface. Furthermore, in the evaluation, OntRECoh measures both the static and the dynamic cohesion of the tested system, addressing not only the design aspects but also the implementation aspects, so that the evaluation is more comprehensive and holistic in the SRE context.
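To illustrate what a static cohesion measurement can look like, the sketch below computes an LCOM-style (Lack of Cohesion in Methods) score from a hard-coded map of methods to the fields they use. This is a generic metric chosen purely for illustration; it is not OntRECoh's knowledge base, rule set, or scoring scheme, and a real SRE tool would recover the method-to-field usage from code or bytecode.

import java.util.*;

// Minimal sketch of a static cohesion metric in the spirit of LCOM.
public class CohesionSketch {

    // LCOM1-style score: pairs of methods sharing no field minus pairs sharing a field,
    // floored at 0. Lower values indicate a more cohesive class.
    static int lcom(Map<String, Set<String>> fieldsUsedByMethod) {
        List<Set<String>> uses = new ArrayList<>(fieldsUsedByMethod.values());
        int disjoint = 0, sharing = 0;
        for (int i = 0; i < uses.size(); i++) {
            for (int j = i + 1; j < uses.size(); j++) {
                Set<String> common = new HashSet<>(uses.get(i));
                common.retainAll(uses.get(j));
                if (common.isEmpty()) disjoint++; else sharing++;
            }
        }
        return Math.max(0, disjoint - sharing);
    }

    public static void main(String[] args) {
        // Hypothetical class with three methods and the fields each one touches.
        Map<String, Set<String>> accountClass = Map.of(
                "deposit",    Set.of("balance"),
                "withdraw",   Set.of("balance"),
                "printOwner", Set.of("ownerName"));
        System.out.println("LCOM = " + lcom(accountClass)); // 2 disjoint - 1 sharing = 1
    }
}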
Article
Full-text available
Software maintenance and harvesting reusable components from software both require that an analyst reconstruct the software's design. Design recovery recreates design abstractions from a combination of code, existing design documentation (if available), personal experience, and general knowledge about problem and application domains. The author shows how to extend the automated assistance available to the software engineer for this process. He explains the concept of design recovery, proposes an architecture to implement the concept, illustrates how the architecture operates, describes progress toward implementing it, and compares this work with other similar work such as reverse engineering and program understanding. Much of the discussion is based on a model-based design recovery system called Desire.
A suggested approach to the process of reverse engineering is presented from a tutorial viewpoint. Reverse engineering is considered the process of developing a set of specifications for a complex hardware system by an orderly examination of specimens of that system. It is assumed that the specifications are being developed for the purpose of making a clone of the original hardware system, a circumstance which requires the most comprehensive form of specifications. Document formats are described for recording information and data in the reverse engineering process; these formats are synergistic with the method recommended for guiding that process, in that the suggested documentation directs the reverse engineering process, which in turn uncovers the information recorded in the documents. A reverse engineering method is described that is based on the premise that a complex hardware system can be characterized as a hierarchical structure. The key feature of this method is that, when applied to a specific level of the hardware structural hierarchy, it uncovers the internal particulars of an item at that level in terms of the elements located in the immediately subordinate level and how those elements are interconnected.
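The hierarchical view described above can be pictured with the following small Java sketch, in which each item is specified in terms of the elements one level below it and their interconnections. The example system and class names are invented for illustration and are not taken from the paper.

import java.util.*;

// Illustrative sketch: each item is described by its immediately subordinate elements
// and the interconnections between them, level by level.
class Item {
    final String name;
    final List<Item> elements = new ArrayList<>();             // subordinate level
    final List<String[]> interconnections = new ArrayList<>(); // pairs of element names
    Item(String name) { this.name = name; }
}

public class HierarchySpec {
    // Emit one "specification" block per item, descending one level at a time.
    static void specify(Item item, int level) {
        System.out.println("Level " + level + " item: " + item.name);
        for (Item e : item.elements) System.out.println("  element: " + e.name);
        for (String[] c : item.interconnections)
            System.out.println("  connection: " + c[0] + " <-> " + c[1]);
        for (Item e : item.elements) specify(e, level + 1);
    }

    public static void main(String[] args) {
        Item radio = new Item("radio");
        Item rf = new Item("RF front end"), demod = new Item("demodulator");
        radio.elements.add(rf);
        radio.elements.add(demod);
        radio.interconnections.add(new String[]{"RF front end", "demodulator"});
        specify(radio, 0);
    }
}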
"Application Reengineering," Guide Pub. GPP-208, Guide Int'l Corp., Chicago, 1989.
M.G. Rekoff Jr., "On Reverse Engineering."
T.J. Biggerstaff, "Design Recovery for Maintenance and Reuse," Computer, July 1989, pp. 36-49.
Chikofsky is an associate editor-in-chief of IEEE Software, vice chairman for membership of the Computer Society's Technical Committee on Software Engineering, president of the International Workshop on CASE, and author of a book on CASE in the Technology Series for IEEE Computer Society Press. He is a senior member of the IEEE.
James H. Cross II is an assistant professor of computer science and engineering at Auburn University.