Article
PDF available

On the Criteria To Be Used in Decomposing Systems into Modules

Authors:
  • David L. Parnas (Middle Road Software)

Abstract

This paper discusses modularization as a mechanism for improving the flexibility and comprehensibility of a system while allowing the shortening of its development time. The effectiveness of a "modularization" is dependent upon the criteria used in dividing the system into modules. A system design problem is presented and both a conventional and unconventional decomposition are described. It is shown that the unconventional decompositions have distinct advantages for the goals outlined. The criteria used in arriving at the decompositions are discussed. The unconventional decomposition, if implemented with the conventional assumption that a module consists of one or more subroutines, will be less efficient in most cases. An alternative approach to implementation which does not have this effect is sketched.
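The design problem used in the paper is a KWIC (key word in context) index program. As a minimal sketch of the information-hiding idea behind the unconventional decomposition (the Python class name and internal representation below are illustrative, not taken from the paper):

    # Hedged sketch, not the paper's own code: an information-hiding module in the
    # spirit of the unconventional decomposition. Callers use only the functions
    # below; the internal representation (here a list of word lists) could be
    # replaced, e.g. by a packed character array, without changing any caller.

    class LineStorage:
        """Holds lines of words; hides how they are stored."""
        def __init__(self):
            self._lines = []            # hidden design decision

        def set_line(self, words):
            self._lines.append(list(words))

        def word(self, line_no, word_no):
            return self._lines[line_no][word_no]

        def line_count(self):
            return len(self._lines)

    # A conventional decomposition would instead pass the raw storage format
    # (e.g. a shared character array plus index tables) between subroutines,
    # so a change of format would ripple through every processing step.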
... As mentioned above, code is the interface to many data science tools, and SE is the discipline of organizing interfaces methodically. For this paper, we define SE as the discipline of managing the complexity of code and data with interfaces as one of its primary tools [Par72]. While many SE practices focus on enterprise software and do not trivially apply to all components of DSSs, it is our conviction that SE methodologies must play a more prominent role in future data science projects. ...
... Well-established code development principles are a critical component of SE. One key principle is the separation of concerns, which splits the software into different components, each handling a single isolated concern and possessing a simple, complexity-hiding interface [Par72]. These components are, in turn, formed by connecting lower-level isolated components. ...
Preprint
In this perspective, we argue that despite the democratization of powerful tools for data science and machine learning over the last decade, developing the code for a trustworthy and effective data science system (DSS) is getting harder. Perverse incentives and a lack of widespread software engineering (SE) skills are among the many root causes we identify that naturally give rise to the current systemic crisis in the reproducibility of DSSs. We analyze why SE and the building of large complex systems are, in general, hard. Based on these insights, we identify how SE addresses those difficulties and how we can apply and generalize SE methods to construct DSSs that are fit for purpose. We advocate two key development philosophies, namely that one should incrementally grow -- not biphasically plan and build -- DSSs, and that one should always employ two types of feedback loops during development: one that tests the code's correctness and another that evaluates the code's efficacy.
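As a hedged illustration of the separation-of-concerns principle quoted in the citation contexts above (each concern behind a simple, complexity-hiding interface, composed from lower-level parts), the following sketch uses invented component names; it is not code from either paper:

    # Each class isolates one concern behind a small interface; the Pipeline
    # only composes them. CsvSource, MeanModel, and Pipeline are illustrative.

    class CsvSource:
        """Data-access concern: callers never see file-handling details."""
        def __init__(self, path):
            self._path = path
        def rows(self):
            with open(self._path) as f:
                return [float(line.strip()) for line in f if line.strip()]

    class MeanModel:
        """Modelling concern: consumes numbers, hides the statistic used."""
        def fit(self, values):
            self._estimate = sum(values) / len(values) if values else 0.0
            return self
        def predict(self):
            return self._estimate

    class Pipeline:
        """Composition: wires the isolated concerns together."""
        def __init__(self, source, model):
            self._source, self._model = source, model
        def run(self):
            return self._model.fit(self._source.rows()).predict()

    # Usage sketch: Pipeline(CsvSource("values.csv"), MeanModel()).run()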
... The reuse of pre-trained NLP models closely resembles software reuse before the notion of modular programming. With finer-grained reuse [64,65], parts of the software can be reused and replaced. Recently, for image-based classification problems, Pan and Rajan [62,63] have proposed an approach that decomposes a monolithic model into modules to enable reusability and replaceability of the decomposed modules. ...
Preprint
Full-text available
In NLP, reusing pre-trained models instead of training from scratch has gained popularity; however, NLP models are mostly black boxes, very large, and often require significant resources. To ease reuse, models trained on large corpora are made available, and developers apply them to different problems. In contrast, for traditional DL problems developers mostly build their models from scratch, which gives them control over the choice of algorithms, data processing, model structure, hyperparameter tuning, etc. In NLP, because pre-trained models are reused, developers have little to no control over such design decisions; they either apply tuning or transfer learning to pre-trained models to meet their requirements. NLP models and their corresponding datasets are also significantly larger than traditional DL models and require heavy computation. These factors often lead to bugs in systems that reuse pre-trained models. While bugs in traditional DL software have been studied intensively, the extensive reuse and black-box structure of NLP models motivate us to ask: what types of bugs occur when reusing NLP models, what are their root causes, and how do they affect the system? To answer these questions, we studied the bugs reported while reusing 11 popular NLP models. We mined 9,214 issues from GitHub repositories and identified 984 bugs. We created a taxonomy of bug types, root causes, and impacts. Our observations led to several findings, including that limited access to model internals results in a lack of robustness, that lack of input validation leads to the propagation of algorithmic and data bias, and that high resource consumption causes more crashes. Our observations suggest several bug patterns, which should greatly facilitate further efforts to reduce bugs in pre-trained models and code reuse.
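One recurring finding above is that missing input validation lets bad inputs propagate through reused black-box models. A minimal, hypothetical sketch of a validating wrapper; the model object, limit, and checks are illustrative assumptions, not taken from the study:

    # `pretrained_model` stands for any reused black-box NLP model exposing a
    # predict(text) method; the wrapper enforces checks that reusing code often skips.

    MAX_CHARS = 10_000   # illustrative limit, not from the study

    class ValidatedModel:
        def __init__(self, pretrained_model):
            self._model = pretrained_model

        def predict(self, text):
            if not isinstance(text, str):
                raise TypeError("expected a string, got %r" % type(text))
            if not text.strip():
                raise ValueError("empty input would yield a meaningless prediction")
            if len(text) > MAX_CHARS:
                raise ValueError("input too long; truncate or chunk it explicitly")
            return self._model.predict(text)   # delegate to the black box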
... This creates value for firms because it enables efficient transactions (Baldwin, 2008), allows for independent technological innovations, and conserves scarce cognitive resources (Colfer and Baldwin, 2016). Furthermore, firms can use modularity strategically to hide information (Parnas, 1972) and protect intellectual property (Baldwin and Henkel, 2015). As this practice can permit firms to produce innovative products at a low cost, modularity can create value for customers. ...
Article
Full-text available
How will the technological shift from internal combustion engine vehicles (ICEVs) to battery electric vehicles (BEVs) change the architecture of the automotive industry? To explore this question, we systematically compare the technological structure of ICEVs and BEVs using data from large incumbent automobile companies and start-ups. While our analysis based on technical descriptions and design structure matrices suggests that the power train of BEVs is structurally simpler than that of ICEVs, BEVs are slightly less modular than ICEVs. We discuss important strategic implications of this finding for incumbent firms and start-ups.
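The article's comparison rests on design structure matrices (DSMs). A small, hypothetical sketch of one metric such a comparison can use; the matrix, module assignment, and the specific measure are illustrative, not the article's actual method:

    def cross_module_share(dsm, modules):
        """Fraction of component dependencies that cross module boundaries.
        dsm[i][j] == 1 means component i depends on component j."""
        total = crossing = 0
        n = len(dsm)
        for i in range(n):
            for j in range(n):
                if i != j and dsm[i][j]:
                    total += 1
                    if modules[i] != modules[j]:
                        crossing += 1
        return crossing / total if total else 0.0

    # Example: 4 components, two modules {0, 1} and {2, 3}.
    dsm = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 0, 0, 1],
           [1, 0, 1, 0]]
    print(cross_module_share(dsm, [0, 0, 1, 1]))   # ~0.33; higher means less modular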
Article
Full-text available
It is a great honor to have a special issue of Industrial and Corporate Change dedicated to The Power of Modularity Today: 20 Years of “Design Rules.” In this retrospective article, I will give some background on how Design Rules, Volume 1: The Power of Modularity came to be written and list what I believe were its major contributions. I go on to describe the book’s shortcomings and gaps, which prevented the widespread adoption of the tools and methods the book proposed. I then explain how the perception of the gaps influenced my thinking in Design Rules, Volume 2: How Technology Shapes Organizations.
Article
This paper elaborates on how design rules emerge and evolve as firms’ micro-level choices of product and organization architectures coevolve with changes in product markets and an industry’s competitive and cooperative dynamics. We suggest that the design rules a firm adopts will vary according to firms’ strategic choices of product and organization architectures that they believe are or may become feasible in a given industry. Building on the mirroring hypothesis that product designs a firm adopts will influence the organization designs it uses, we develop a model that identifies key relationships that influence firms’ strategic choices of product and organization architectures and associated design rules. We then elaborate on key interactions between firm-level architectural choices and the architecture-enabled competitive and cooperative dynamics that obtain in an industry. Our model identifies strategically important aspects of open- and closed-system architectures and modular and nonmodular architectures that impact industry structures, interfirm interactions, and resulting industry dynamics. Drawing on these analyses, we suggest how firms’ strategic choices of architectures are influenced by their assessments of (i) the potential for capturing value through both gains from specialization and gains from trade that firms believe will be enabled by their architectural choices and (ii) both ex ante and ex post transaction costs implied by their architecture decisions. We conclude by suggesting how the perspective on firm’s strategic architectural decisions we develop here enables new approaches to understanding evolutions of both product markets and industry structures for serving product markets.
Article
Concurrent design facilities hold the promise of shorter design cycles with efficient cross-disciplinary integration. However, when an atypical design problem is encountered, the standard organization may be a poor fit to solve it, resulting in problems during the design process. This study examines the extent to which different types of novelty in design problems lead to poor fit with a standard organization, with implications for design process performance. We use an empirical study of a NASA concurrent design team to identify common perturbations in design problems, then a computational simulation to examine their effect on fit. The findings suggest that perturbations localized to one or a few designers are manageable within standard structures, but those with diffuse impacts may generate difficult-to-predict issues in the design process. These results suggest when concurrent design facilities can accommodate novel design problems and when they may need to adapt their design approaches.
Preprint
Full-text available
Separation of concerns allows important benefits to be achieved in all phases of the software development life cycle. It is thus possible to take advantage of this technique, with a consequent improvement in the understanding of the models. However, conflicts of different types may arise when concerns are composed to form a single system, precisely because they are managed separately. This problem is aggravated by the number of people needed to handle large projects. This article focuses on the composition of concerns in structural models, in which each concern is realized by an individual class diagram. The same classes are present in several diagrams, although with different members and relationships, owing to the specific nature of the concern they realize. In systems of medium to high complexity, more than one analyst must intervene and, because of their differing working styles, many other conflicts can arise when all diagrams are composed into a single one. In this paper, we report an experience in which four systems analysts elaborated six class diagrams belonging to a single system, and we describe the conflicts that occurred after the composition of the diagrams. After analyzing and classifying the conflicts, we drew up a set of modeling agreements and recommendations to reduce them. The models were then rebuilt and composed again, with a significant decrease in the number of conflicts detected after the second composition.
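A rough sketch of the kind of conflict that arises when concern-specific class diagrams are composed; the class, views, and merge rule are invented for illustration and are not the authors' tooling:

    def merge_class(name, views):
        """views: list of dicts mapping attribute name -> declared type."""
        merged, conflicts = {}, []
        for view in views:
            for attr, typ in view.items():
                if attr in merged and merged[attr] != typ:
                    conflicts.append((name, attr, merged[attr], typ))
                else:
                    merged[attr] = typ
        return merged, conflicts

    # Two analysts model the same class for different concerns.
    billing_view = {"id": "int", "balance": "Decimal"}
    shipping_view = {"id": "UUID", "address": "Address"}
    merged, conflicts = merge_class("Customer", [billing_view, shipping_view])
    print(conflicts)   # [('Customer', 'id', 'int', 'UUID')] -> needs a modeling agreement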
Preprint
Full-text available
Software weaknesses that create attack surfaces for adversarial exploits, such as lateral SQL injection (LSQLi) attacks, are usually introduced during the design phase of software development. Security design patterns are sometimes applied to tackle these weaknesses. However, due to the stealthy nature of lateral-based attacks, employing traditional security patterns to address these threats is insufficient. Hence, we present SEAL, a secure design that spans the architectural, design, and implementation abstraction levels to delegate security strategies for tackling LSQLi attacks. We evaluated SEAL using a case-study application, assuming the role of an adversary and injecting several attack vectors aimed at compromising the confidentiality and integrity of its database. Our evaluation demonstrated SEAL's capacity to address LSQLi attacks.
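SEAL itself is not reproduced here. As background only, the following sketch shows the generic design-level rule such approaches build on: confine SQL construction to a single data-access interface and bind user input as parameters rather than concatenating it into the statement (sqlite3 is used purely for illustration):

    import sqlite3

    class AccountRepository:
        """All SQL lives behind this interface; callers never build queries."""
        def __init__(self, conn):
            self._conn = conn

        def find_by_owner(self, owner):
            # Parameter binding: `owner` can never alter the statement's structure.
            return self._conn.execute(
                "SELECT id, balance FROM accounts WHERE owner = ?", (owner,)
            ).fetchall()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER, owner TEXT, balance REAL)")
    conn.execute("INSERT INTO accounts VALUES (1, 'alice', 10.0)")
    repo = AccountRepository(conn)
    print(repo.find_by_owner("alice"))          # [(1, 10.0)]
    print(repo.find_by_owner("x' OR '1'='1"))   # [] -- the injection attempt is inert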
Conference Paper
A programmer using existing programming languages typically codes a problem by (1) defining it, then (2) analyzing the processing requirements, and (3) on the basis of these requirements, choosing a data representation, and finally, (4) coding the problem. Almost always, difficulties arise because necessary processing not envisioned in the analysis phase makes the chosen data representation inappropriate because of a lack of space, efficiency, ease of use or some combination of these. The decision is then made to either live with these difficulties or change the data representation. Unfortunately, changing the data representation usually involves making extensive changes to the code already written. Furthermore, there is no assurance that this dilemma will not recur with the new data representation.
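One remedy for this dilemma, in the spirit of the information-hiding criterion discussed on this page, is to put the chosen representation behind access functions so it can change without rippling through already-written code. A minimal sketch under that assumption; the class and its first representation are illustrative, not from this paper:

    class SymbolTable:
        """Callers use insert/lookup only; the representation stays private."""
        def __init__(self):
            self._pairs = []                      # version 1: a simple list of pairs

        def insert(self, name, value):
            self._pairs.append((name, value))

        def lookup(self, name):
            for n, v in reversed(self._pairs):
                if n == name:
                    return v
            raise KeyError(name)

    # If the list proves too slow or too large, only this class changes (e.g. to a
    # dict or a hash table); every caller of insert()/lookup() keeps working unchanged.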
Article
A multiprogramming system is described in which all activities are divided over a number of sequential processes. These sequential processes are placed at various hierarchical levels, in each of which one or more independent abstractions have been implemented. The hierarchical structure proved to be vital for the verification of the logical soundness of the design and the correctness of its implementation.
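A hedged sketch of the layered structure described here; the levels and their responsibilities are invented for illustration and are not the system's actual levels. Each level is implemented only in terms of the level below it, which is what makes level-by-level reasoning about correctness tractable:

    class Level0Clock:                       # lowest level: a bare counter
        def __init__(self):
            self._ticks = 0
        def tick(self):
            self._ticks += 1
            return self._ticks

    class Level1Scheduler:                   # uses only Level0Clock
        def __init__(self, clock):
            self._clock, self._queue = clock, []
        def submit(self, name):
            self._queue.append((self._clock.tick(), name))
        def next_process(self):
            return self._queue.pop(0)[1] if self._queue else None

    class Level2Console:                     # uses only Level1Scheduler
        def __init__(self, scheduler):
            self._scheduler = scheduler
        def run_one(self):
            return self._scheduler.next_process()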
Article
The book, suitable for a second course in computer programming at the graduate level, is intended for undergraduates and graduates interested in the design of programming languages and the implementation of language processors, as well as for those who use computers and face the need to develop data structures appropriate to their problems. Areas covered include Markov algorithms and primitive elements of programming, the ALGOL language, a general view of data structures, and the extendability of languages through definitions. (Author)