Conference Paper

Exploring the Relationship Between Architecture Coupling and Software Vulnerabilities

Abstract

Employing software metrics, such as size and complexity, for predicting defects has been given a lot of attention over the years and proven very useful. However, the few studies looking at software architecture and vulnerabilities are limited in scope and findings. We explore the relationship between software vulnerabilities and component metrics (like code churn and cyclomatic complexity), as well as architecture coupling metrics (direct, indirect, and cyclic coupling). Our case is based on the Google Chromium project, an open source project that has not been studied for this topic yet. Our findings show a strong relationship between vulnerabilities and both component level metrics and architecture coupling metrics. 68% of the files associated with a vulnerability are cyclically coupled, compared to 43% of the non-vulnerable files. Our best regression model is a combination of low commenting, high code churn, high direct fan-out within the main cyclic group, and high direct fan-in outside of the main cyclic group.
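The coupling measures named in the abstract are straightforward to compute once a file-level dependency graph is available. The sketch below is only an illustration of those measures (direct fan-in/fan-out and membership in the largest cyclic group), using networkx and invented file names; it is not the authors' tooling or dataset.

```python
# A minimal sketch, assuming a hypothetical file-level "depends on" graph.
import networkx as nx

# A -> B means file A includes/calls file B (edges are invented for illustration).
edges = [
    ("render.cc", "dom.cc"), ("dom.cc", "layout.cc"), ("layout.cc", "render.cc"),  # a cycle
    ("net.cc", "dom.cc"), ("ipc.cc", "net.cc"), ("util.cc", "ipc.cc"),
]
G = nx.DiGraph(edges)

# Approximate the "main cyclic group" as the largest strongly connected component:
# every file in it can reach every other file in it and be reached back.
sccs = list(nx.strongly_connected_components(G))
main_cyclic_group = max(sccs, key=len)

for f in sorted(G.nodes):
    print(
        f,
        "fan-out:", G.out_degree(f),        # direct dependencies of f
        "fan-in:", G.in_degree(f),          # files directly depending on f
        "cyclic:", f in main_cyclic_group,  # cyclically coupled, per this proxy
    )
```

Treating the largest strongly connected component as the main cyclic group mirrors the intuition behind cyclic coupling: such files can all reach each other through dependency paths.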

... Hence, various metrics are proposed to measure software complexity, including lines of code (LOC), McCabe's Cyclomatic Complexity (MCC) [74], Halstead's metrics [75], function points [76], and architecture coupling measures [77]. Each of these static code attributes tries to measure complexity from different perspectives. ...
... Despite being useful, these metrics cannot capture more fundamental aspects of complexity, including architectural complexity. For instance, previous work proposes a limited set of architecture coupling metrics (direct, indirect, and cyclic coupling) on the Google Chromium project [77]. Their findings show a strong relationship between existing Chromium vulnerabilities and architecture coupling metrics. ...
Thesis
This dissertation proposes novel approaches and mechanisms for application compartmentalization and isolation to reduce their ever-growing attack surface. Our approach is motivated by the key observation that while hardware vendors compete to provide security features, notably memory safety and privilege separation, existing systems software like commodity OSs fail to utilize such features to improve application security and privacy properly. By proposing a novel principled approach to privilege separation and isolation, this work enables application security to be designed and enforced within and across different isolation boundaries yet remain flexible in the face of diverse threats and changing hardware requirements. We begin by analyzing the effectiveness of existing systems in mitigating ever-increasing threats. We explore their efficacy where diverse compartments, such as processes, sandboxes, or trusted execution environments (TEEs)/enclaves, are involved. We call such computing environments hetero-compartment, which are becoming the future of modern applications. This thesis focuses on resolving three fundamental limitations of the state-of-the-art compartmentalization techniques. The most important one is the inability to scale, extend, and monitor compartments beyond a fixed security model, a single privilege layer (e.g., userspace), or address space boundaries in hetero-compartment environments. Second is the lack of flexible isolation and secure resource sharing. Finally, the third key limitation is ineffective hardware utilization, which leads to significant overhead and weak security, particularly in resource-constrained devices. We propose dispersed compartments as a fundamentally new approach for building applications by encapsulating arbitrary isolation boundaries across privilege levels. Dispersed compartments provide a unified model for extensible and auditable compartmentalization. To enable such system-wide privilege separation, we introduce two key concepts; first, dispersed monitoring to check extensible security policies. Second, dispersed enforcement to enforce isolation and security policies across various privilege boundaries while reducing the trusted computing base (TCB) through deprivileging the host kernel on-demand. Furthermore, we present SIRIUS, our implementation of these security primitives on commodity hardware by focusing on ARM and x86-64 platforms. SIRIUS includes new security extensions and abstractions within the underlying OSs, firmware, and TEE stacks. Moreover, it provides a novel userspace API to reduce application modifications during compartmentalization. Finally, we show the significant security and performance benefits of SIRIUS through microbenchmarks, compartmentalizing real-world applications, and investigating major attack vectors.
... Furthermore, function fusion introduces additional complexity into the coding process. To perform the same tasks, developers must write extra code, which, according to Lagerström et al. (2017), could lead to undesirable consequences in the long term. Complex code tends to give rise to security vulnerabilities, bugs, and errors, potentially resulting in higher costs or security risks. ...
Thesis
Full-text available
Serverless functions have emerged as a prominent paradigm in software deployment, providing automated resource scaling, resulting in demand-based operational expenses. One of the most significant challenges associated with serverless functions is the cold start delay, preventing organisations with latency-critical web applications from adopting a serverless technology. Existing research on the cold start problem primarily focuses on mitigating the delay by modifying and optimising serverless platform technologies. However, these solutions have predominantly yielded modest reductions in time delay. Consequently, the purpose of this study is to establish conditions and circumstances under which the cold start issue can be addressed through the type of approach presented in this study. Through a design science research methodology, a software artefact named AdaptiveServerless Invocation Predictor (ASIP) was developed to mitigate the cold start issue through monitoring web application user traffic in real-time. Based on the user traffic, ASIP preemptively pre-initialises serverless functions likely to be invoked, to avoid cold start occurrences. ASIP was tested against a realistic workload generated by test participants. Evaluation of ASIP was performed through analysing the reduction in time delay achieved and comparing this against existing cold start mitigation strategies. The results indicate that predicting serverless function invocations based on real-time traffic analysis is a viable approach, as a tangible reduction in response time was achieved. Conclusively, the cold start mitigation strategy assessed and presented in this study may not provide a sufficiently significant mitigation effect relative to the required implementation effort and operational expenses. However, the study has generated valuable insights regarding circumstantial factors concerning cold start mitigation. Consequently, this study provides a proof of concept for a more sophisticated version of the mitigation strategy developed in this study, with greater potential to provide a significant delay reduction without requiring substantial computational resources.
... Research literature is quite rich in the derivation of new coupling metrics. These metrics have been used in various disciplines like in SFP, design patterns (Antoniol, Fiutem & Cristoforetti, 1998), re-modularization (Abreu, Pereira & Sousa, 2000), assessing software quality (Briand et al., 2000), maintenance cost (Akaikine, 2010), productivity (Sturtevant, 2013), software vulnerabilities (Lagerström et al., 2017), reusability (Hristov et al., 2012), changeability (Rongviriyapanish et al., 2016;Parashar & Chhabra, 2016;Kumar, Rath & Sureka, 2017), and reliability (Yadav & Khan, 2012). ...
Article
Full-text available
Software is a complex entity, and its development needs careful planning and a high amount of time and cost. To assess the quality of a program, software measures are very helpful. Amongst the existing measures, coupling is an important design measure, which computes the degree of interdependence among the entities of a software system. Higher coupling leads to cognitive complexity and thus a higher probability of fault occurrence. Timely prediction of fault-prone modules assists in saving the time and cost of testing. This paper aims to capture important aspects of coupling and then assess the effectiveness of these aspects in determining fault-prone entities in the software system. We propose two coupling metrics, i.e., Vovel-in and Vovel-out, that capture the level of coupling and the volume of information flow. We empirically evaluate the effectiveness of the Vovel metrics in determining the fault-prone classes using five projects, i.e., Eclipse JDT, Equinox framework, Apache Lucene, Mylyn, and Eclipse PDE UI. Model building is done using univariate logistic regression, and later the Spearman correlation coefficient is computed with the existing coupling metrics to assess the coverage of unique information. Finally, the least correlated metrics are used for building multivariate logistic regression with and without the use of Vovel metrics, to assess the effectiveness of the Vovel metrics. The results show the proposed metrics significantly improve the prediction of fault-prone classes. Moreover, the proposed metrics cover a significant amount of unique information which is not covered by the existing well-known coupling metrics, i.e., CBO, RFC, Fan-in, and Fan-out. This paper empirically evaluates the impact of coupling metrics, and more specifically the importance of the level and volume of coupling, in software fault prediction. The results advocate the prudent addition of the proposed metrics due to their unique information coverage and significant predictive ability.
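As a rough illustration of the evaluation loop described above (univariate logistic regression per metric, then Spearman correlation against existing coupling metrics), the following sketch uses synthetic data and a placeholder column standing in for a Vovel-style metric; the actual Vovel definitions are not reproduced here.

```python
# Sketch only: synthetic metrics and labels, not the study's projects or metric formulas.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
metrics = {
    "cbo": rng.poisson(5, n).astype(float),
    "fan_in": rng.poisson(3, n).astype(float),
    "fan_out": rng.poisson(4, n).astype(float),
}
# Placeholder for a proposed level-and-volume coupling metric (hypothetical).
metrics["vovel_out"] = metrics["fan_out"] * rng.gamma(2.0, 1.0, n)
faulty = (metrics["fan_out"] + rng.normal(0, 2, n) > 5).astype(int)  # synthetic label

# Univariate logistic regression per metric, scored with AUC.
for name, x in metrics.items():
    clf = LogisticRegression().fit(x.reshape(-1, 1), faulty)
    auc = roc_auc_score(faulty, clf.predict_proba(x.reshape(-1, 1))[:, 1])
    print(f"{name}: univariate AUC = {auc:.2f}")

# Spearman correlation against an existing coupling metric to gauge unique information.
rho, p = spearmanr(metrics["vovel_out"], metrics["fan_out"])
print(f"spearman(vovel_out, fan_out) = {rho:.2f} (p={p:.3f})")
```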
... For this study, we select the Chromium OS project for the following five reasons: (i) it is one of the most popular OSS projects, (ii) it is a large-scale, mature project containing more than 41.7 million Source Lines of Code (SLOC) [25], (iii) it has been conducting tool-based code reviews for almost a decade, (iv) it maintains security advisories 2 to provide regular updates on identified security vulnerabilities, and (v) it has been subject to prior studies on security vulnerabilities [14], [30], [38], [39], [43]. ...
Preprint
Peer code review has been found to be effective in identifying security vulnerabilities. However, despite practicing mandatory code reviews, many Open Source Software (OSS) projects still encounter a large number of post-release security vulnerabilities, as some security defects escape those reviews. Therefore, a project manager may wonder if there was any weakness or inconsistency during a code review that missed a security vulnerability. Answers to this question may help a manager pinpoint areas of concern and take measures to improve the effectiveness of his/her project's code reviews in identifying security defects. Therefore, this study aims to identify the factors that differentiate code reviews that successfully identified security defects from those that missed such defects. With this goal, we conduct a case-control study of the Chromium OS project. Using multi-stage semi-automated approaches, we build a dataset of 516 code reviews that successfully identified security defects and 374 code reviews where security defects escaped. The results of our empirical study suggest that there are significant differences between the categories of security defects that are identified and those that are missed during code reviews. A logistic regression model fitted on our dataset achieved an AUC score of 0.91 and identified nine code review attributes that influence the identification of security defects. While time to complete a review, the number of mutual reviews between two developers, and whether the review is for a bug fix have positive impacts on vulnerability identification, opposite effects are observed from the number of directories under review, the number of total reviews by a developer, and the total number of prior commits for the file under review.
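A minimal sketch of the kind of model the abstract reports, fitted on synthetic code-review attributes whose names mirror the abstract (review time, mutual reviews, bug-fix flag, directories under review); the coefficients and data are invented, so the resulting AUC is illustrative only.

```python
# Hedged sketch: synthetic case-control data, not the Chromium OS review dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 890  # roughly 516 "identified" + 374 "escaped" reviews, as in the abstract
X = np.column_stack([
    rng.exponential(48, n),   # hours to complete the review
    rng.poisson(3, n),        # mutual reviews between author and reviewer
    rng.integers(0, 2, n),    # is the change a bug fix?
    rng.poisson(4, n),        # number of directories under review
])
# Synthetic outcome loosely following the reported directions of effect.
logit = 0.01 * X[:, 0] + 0.3 * X[:, 1] + 0.8 * X[:, 2] - 0.25 * X[:, 3] - 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 2))
```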
... There is also an increasing focus on investigating the relationship between the design phase (e.g. [17]), developer activities (e.g. [30]), TD (e.g. ...
Preprint
Full-text available
Technical debt (TD), its impact on development, and its consequences such as defects and vulnerabilities are of common interest and great importance to software researchers and practitioners. Although there exist many studies investigating TD, the majority of them focus on identifying and detecting TD from a single stage of development. There are also studies that analyze vulnerabilities focusing on some phases of the life cycle. Moreover, several approaches have investigated the relationship between TD and vulnerabilities; however, the generalizability and validity of the findings are limited due to small datasets. In this study, we aim to identify TD through multiple phases of development, and to automatically measure it through data and text mining techniques to form a comprehensive feature model. We plan to utilize neural network based classifiers that will incorporate evolutionary changes in TD measures into predicting vulnerabilities. Our approach will be empirically assessed on open source and industrial projects.
... strategy to prioritize software feature production and tested on a software system at Ericsson [5] [11]. Besides, the method can also be used to explore the relationship between software vulnerabilities and component metrics, as well as architecture coupling metrics [23]. ...
Conference Paper
Digitization has increased exposure and opened up for more cyber threats and attacks. To proactively handle this issue, enterprise modeling needs to include threat management during the design phase that considers antagonists, attack vectors, and damage domains. Agile methods are commonly adopted to efficiently develop and manage software and systems. This paper proposes to use an enterprise architecture repository to analyze not only shipped components but the overall architecture, to improve the traditional designs represented by legacy systems in the situated IT-landscape. It shows how the hidden structure method (with Design Structure Matrices) can be used to evaluate the enterprise architecture, and how it can contribute to agile development. Our case study uses an architectural descriptive language called ArchiMate for architecture modeling and shows how to predict the ripple effect in a damaging domain if an attacker’s malicious components are operating within the network.
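A toy illustration of the ripple-effect idea mentioned above: given a DSM-style dependency matrix, find every component that transitively depends on a hypothetically compromised one. The component names and the matrix are invented; this is not the paper's ArchiMate or securiCAD tooling.

```python
# A minimal sketch, assuming an invented enterprise-architecture DSM.
import numpy as np

components = ["web portal", "auth service", "billing", "db", "message bus"]
# dsm[i][j] = 1 means component i depends on component j.
dsm = np.array([
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
])

def ripple(dsm, start):
    """Indices of components that depend, directly or transitively, on `start`,
    i.e. the components a compromise of `start` could ripple out to."""
    n = len(dsm)
    hit, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for i in range(n):
            if dsm[i][node] and i not in hit:   # i depends on an affected node
                hit.add(i)
                frontier.append(i)
    return hit

affected = ripple(dsm, components.index("db"))
print("compromising 'db' can ripple to:", sorted(components[i] for i in affected))
```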
... This study also examined the effects of several data balancing approaches on software vulnerability prediction. Lagerström et al. [4] examined the effects of different software metrics on vulnerability detection. Component-level and architecture-level metrics were found to be effective in vulnerability detection. ...
Conference Paper
Full-text available
Software vulnerability prediction aims to detect vulnerabilities in the source code before the software is deployed into the operational environment. The accurate prediction of vulnerabilities helps to allocate more testing resources to the vulnerability-prone modules. From the machine learning perspective, this problem is a binary classification task which classifies software modules into vulnerability-prone and non-vulnerability-prone categories. Several machine learning models have been built for addressing the software vulnerability prediction problem, but the performance of the state-of-the-art models is not yet at an acceptable level. In this study, we aim to improve the performance of software vulnerability prediction models by using Extreme Learning Machines (ELM) algorithms, which have not been investigated for this problem. Before we apply ELM algorithms to three selected public datasets, we use data balancing algorithms to balance the data points which belong to the two classes. We discuss our initial experimental results and provide the lessons learned. In particular, we observed that ELM algorithms have a high potential to be used for addressing the software vulnerability prediction problem.
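A compact sketch of the pipeline described above, under the assumption of random oversampling for balancing and a bare-bones ELM (fixed random hidden layer, ridge-regularized readout); the data is synthetic and none of the study's datasets are used.

```python
# Illustration only: random oversampling + a minimal Extreme Learning Machine.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 6))                       # 6 software metrics per module
y = (X[:, 0] + 0.5 * X[:, 1] > 1.4).astype(int)     # rare "vulnerability-prone" label

# Random oversampling: duplicate minority-class rows until both classes match in size.
minority = np.flatnonzero(y == 1)
extra = rng.choice(minority, size=(y == 0).sum() - minority.size, replace=True)
Xb, yb = np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

# Minimal ELM: fixed random hidden layer, ridge-regularized linear readout.
W_in = rng.normal(size=(6, 50))
b_in = rng.normal(size=50)
H = np.tanh(Xb @ W_in + b_in)
W_out = np.linalg.solve(H.T @ H + 1e-2 * np.eye(50), H.T @ yb)

# Score the original (unbalanced) data with the same hidden-layer weights.
pred = (np.tanh(X @ W_in + b_in) @ W_out > 0.5).astype(int)
print("training-set recall on the minority class:",
      round((pred[y == 1] == 1).mean(), 2))
```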
... Ozkaya [79] shows that metrics derived from DSMs can be used to assess the value released by "re-factoring" designs with poor architectural properties. And Lagerström et al. [80] connect DSM-based complexity measures with known vulnerabilities in Google Chrome. ...
Article
Full-text available
Recent contributions to information systems theory suggest that the primary role of a firm’s information technology (IT) architecture is to facilitate, and therefore ensure, the continued alignment of a firm’s IT investments with a constantly changing business environment. Despite these advances, we lack robust methods with which to operationalize enterprise IT architecture in a way that allows us to analyze performance in terms of the ability to adapt and evolve over time. We develop a methodology for analyzing enterprise IT architecture based on “Design Structure Matrices” (DSMs), which capture the coupling between all components in the architecture. Our method addresses the limitations of prior work, in that it i) captures the architecture “in-use” as opposed to high level plans or conceptual models; ii) identifies discrete layers in the architecture associated with different technologies; iii) reveals the “flow of control” within the architecture; and iv) generates measures that can be used to analyze performance. We apply our methodology to a dataset from a large pharmaceutical firm. We show that measures of coupling derived from an IT architecture DSM predict IT modifiability – defined as the cost to change software applications. Specifically, applications that are tightly coupled cost significantly more to change.
... Lagerström et al. [21] ... We did not encounter a paper in the literature that evaluates the effect of data balancing techniques on SVP models in detail; also, in the source manuscripts, these results are presented in this way. ...
Article
Full-text available
Software vulnerabilities form an increasing security risk for software systems, that might be exploited to attack and harm the system. Some of the security vulnerabilities can be detected by static analysis tools and penetration testing, but usually, these suffer from relatively high false positive rates. Software vulnerability prediction (SVP) models can be used to categorize software components into vulnerable and neutral components before the software testing phase and likewise increase the efficiency and effectiveness of the overall verification process. The performance of a vulnerability prediction model is usually affected by the adopted classification algorithm, the adopted features, and data balancing approaches. In this study, we empirically investigate the effect of these factors on the performance of SVP models. Our experiments consist of four data balancing methods, seven classification algorithms, and three feature types. The experimental results show that data balancing methods are effective for highly unbalanced datasets, text‐based features are more useful, and ensemble‐based classifiers provide mostly better results. For smaller datasets, Random Forest algorithm provides the best performance and for the larger datasets, RusboostTree achieves better performance.
... This can be achieved by automatic modeling (Holm et al., 2014;Välja et al., 2015) and using reference architecture models Vernotte et al., 2018), however, neither of these has been tested within the vehicle domain. To enhance the precision, it could also be useful to couple this approach with databases containing known vulnerabilities (Johnson et al., 2016b;Lagerström et al., 2017a). ...
Conference Paper
Modern vehicles are dependent on software, and are often connected to the Internet or other external services, which makes them vulnerable to various attacks. To improve security for Internet facing systems, holistic threat modeling is becoming a common way to proactively make decisions and design for security. One approach that has not been commonly implemented is to enhance the threat models with probabilistic attack simulations. That is, incorporating security intelligence, attack types, vulnerabilities, and countermeasures to get objective security metrics and risk assessments. This combination has been shown efficient in other disciplines, e.g. energy and banking. However, it has so far been fairly unexplored in the vehicle domain. This position paper reviews previous research in the field, and implements a vehicle threat model using a tool called securiCAD, based on which future research requirements for connected vehicle attack simulations are also derived. The main findings are: 1) not much work has been done in the combined area of connected vehicles and threat modeling with attack simulations, 2) initial tests show that the approach is useful, 3) more research in vehicle specific attacks and countermeasures is needed in order to provide more accurate simulation results, and 4) a more tailored metamodel is needed for the vehicle domain.
... This can be achieved by automatic modeling (Holm et al., 2014;Välja et al., 2015) and using reference architecture models Vernotte et al., 2018), however, neither of these has been tested within the vehicle domain. To enhance the precision, it could also be useful to couple this approach with databases containing known vulnerabilities (Johnson et al., 2016b;Lagerström et al., 2017a). ...
Preprint
Modern vehicles contain more than a hundred Electronic Control Units (ECUs) that communicate over different in-vehicle networks, and they are often connected to the Internet, which makes them vulnerable to various cyber-attacks. Besides, data collected by the connected vehicles is directly connected to the vehicular network. Thus, big vehicular data are collected, which are valuable and generate insights into driver behavior. Previously, a probabilistic modeling and simulation language named vehicleLang was presented to analyze the security of connected vehicles. However, the privacy issues of vehicular data have not been addressed. To fill in the gap, this work presents a privacy specification for vehicles based on vehicleLang, which uses the Meta Attack Language (MAL) to assess the security of connected vehicles in a formal way, with a special focus on the privacy aspect. To evaluate this work, test cases are also presented.
... These exploited software vulnerabilities lead to a loss of confidentiality, integrity, and availability, which translates into a loss of time and money. Several studies indicated that the main cause of vulnerabilities is software complexity [1]. Software practitioners hypothesize that ill-designed and maintained systems tend to be vulnerable. ...
Article
Full-text available
Software vulnerabilities might be exploited, which can eventually lead to a loss of confidentiality, integrity, and availability, translating into a loss of time and money. Although several studies have indicated that complexity in software is the main cause of vulnerabilities, the argument remains poorly supported. Moreover, some studies have already related complexity to vulnerabilities and found that this cannot be generalized. In this work, we explored which factors contribute most to making software vulnerable. Several feature selection techniques were applied to find the contribution of each feature. Five classifiers are used in this study to predict the vulnerable classes. The dataset is collected from twelve Java applications, which are analyzed with respect to complexity, code coverage, and security. The studied applications vary in their characteristics, such as the number of lines of code, the classes used, and application size. The result indicates that complexity in all its components (size, depth of inheritance, etc.) can be utilized in predicting vulnerabilities.
... • Quality: Metrics such as cohesion, code churn, code complexity, architectural coupling, and developer activity metrics have been linked to vulnerability [8,19,30]. Thus, quality metrics can indicate the presence of vulnerability. In addition, Williams et al. [34] show that 26% of library downloads have known vulnerabilities. ...
Conference Paper
Identifying security issues before attackers do has become a critical concern for software development teams and software users. While methods for finding programming errors (e.g. fuzzers¹, static code analysis [3] and vulnerability prediction models like Scandariato et al. [10]) are valuable, identifying security issues related to the lack of secure design principles and to poor development processes could help ensure that programming errors are avoided before they are committed to source code. Typical approaches (e.g. [4, 6-8]) to identifying security-related messages in software development project repositories use text mining based on pre-selected sets of standard security-related keywords, for instance: authentication, ssl, encryption, availability, or password. We hypothesize that these standard keywords may not capture the entire spectrum of security-related issues in a project, and that additional project-specific and/or domain-specific vocabulary may be needed to develop an accurate picture of a project's security. For instance, Arnold et al. [1], in a review of bug-fix patches on Linux kernel version 2.6.24, identified a commit (commit message: "Fix ->vm_file accounting, mmap_region() may do do_munmap()"²) with serious security consequences that was mis-classified as a non-security bug. While no typical security keyword is mentioned, memory mapping ('mmap') in the domain of kernel development has significance from a security perspective, parallel to buffer overflows in languages like C/C++. Whether memory or currency is at stake, identifying changes to assets that the software manages is potentially security-related. The goal of this research is to support researchers and practitioners in identifying security issues in software development project artifacts by defining and evaluating a systematic scheme for identifying project-specific security vocabularies that can be used for keyword-based classification. We derive three research questions from our goal: • RQ1: How does the vocabulary of security issues vary between software development projects? • RQ2: How well do project-specific security vocabularies identify messages related to publicly reported vulnerabilities? • RQ3: How well do existing security keywords identify project-specific security-related messages and messages related to publicly reported vulnerabilities? To address these research questions, we collected developer email, bug tracking, commit message, and CVE record project artifacts from three open source projects: Dolibarr, Apache Camel, and Apache Derby. We manually classified 5400 messages from the three projects' commit messages, bug trackers, and emails, and linked the messages to each project's public vulnerability records. Adapting techniques from Bachmann and Bernstein [2], Schermann et al. [11], and Guzzi [5], we analyzed each project's security vocabulary and the vocabulary's relationship to the project's vulnerabilities. We trained two classifiers (Model.A and Model.B) on samples of the project data, and used the classifiers to predict security-related messages in the manually-classified project oracles. Our contributions include: • A systematic scheme for linking CVE records to related messages in software development project artifacts • An empirical evaluation of project-specific security vocabulary similarities and differences between project artifacts and between projects To summarize our findings on RQ1, we present tables of our qualitative and quantitative results.
We tabulated counts of words found in security-related messages. Traditional security keywords (e.g. password, encryption) are present, particularly in the explicit column, but each project also contains terms describing entities unique to the project, for example 'endpoint' (Camel), 'blob' (short for 'Binary Large Object'), 'clob' ('Character Large Object'), 'deadlock' (Derby), and 'invoice', 'order' for Dolibarr. The presence of these terms in security-related issues suggests that they are assets worthy of careful attention during the development life cycle. Table 1 lists the statistics for security-related messages from the three projects, broken down by security class and security property. Explicit security-related messages (messages referencing security properties) are in the minority in each project. Implicit messages represent the majority of security-related messages in each project. In Table 2, we present the results of the classifiers built using the various project and literature security vocabularies to predict security-related messages in the oracle and CVE datasets. We have marked in bold the highest result for each performance measure for each dataset. Both Models A and B have a high performance across the projects when predicting for the oracle dataset of the project for which they were built. Further, the project-specific models have higher performance than the literature-based models (Ray.vocab [9] and Pletea.vocab [7]) on the project oracle datasets. Model performance is not sustained and is inconsistent when applied to other projects' datasets. To summarize our findings on RQ2, Table 3 presents performance results for the project vocabulary models on the CVE datasets for each project. We have marked in bold the highest result for each performance measure for each dataset. Results for Model.A show a high recall for Derby and Camel and a worse than average recall for Dolibarr. However, in Model.B, the recall is above 60% for Dolibarr and over 85% for both Derby and Camel. We reason that the low precision is due to our approach of labeling only CVE-related messages as security-related, with the rest of the messages labeled as not security-related. The Dolibarr results are further complicated by the low proportion of security-related messages compared with the other two projects (as reported in Table 1). To summarize our findings on RQ3, Table 2 and Table 3 present the classifier performance results for two sets of keywords, Ray.vocab and Pletea.vocab, drawn from the literature. In each case, the project vocabulary model had the highest recall, precision and F-Score on the project's oracle dataset. With regard to the CVE dataset, the project vocabulary model has the highest recall. However, the overall performance, as measured by F-Score, varied by dataset, with the Ray and Pletea keywords scoring higher than the project vocabulary model. The low precision for the classifier built on the project's vocabularies follows the explanation provided under RQ2. Our results suggest that domain vocabulary models show recalls that outperform standard security terms across our datasets. Our conjecture, supported in our data, is that augmenting standard security keywords with a project's security vocabulary yields a more accurate security picture. In future work, we aim to refine vocabulary selection to improve classifier performance, and to define tools implementing the approach in this paper to aid practitioners and researchers in identifying software project security issues.
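The keyword-based classification scheme is easy to illustrate: combine standard security keywords with a (hypothetical) project-specific vocabulary and compare recall and precision on a handful of hand-made messages. The keywords and messages below are invented for the example, not drawn from Dolibarr, Camel, or Derby.

```python
# Toy sketch of keyword-based classification with and without project-specific terms.
STANDARD = {"password", "encryption", "authentication", "ssl", "overflow"}
PROJECT = {"mmap", "invoice", "deadlock", "blob"}   # hypothetical project/domain terms

messages = [  # (message text, is it actually security-related?)
    ("fix mmap accounting in region handling", True),
    ("update invoice rounding for tax rates", True),
    ("add ssl certificate pinning", True),
    ("bump version number for release", False),
    ("refactor blob streaming for large files", True),
    ("fix typo in README", False),
]

def classify(msg, vocab):
    return any(word in vocab for word in msg.lower().split())

for name, vocab in [("standard only", STANDARD), ("standard + project", STANDARD | PROJECT)]:
    flagged = [classify(m, vocab) for m, _ in messages]
    tp = sum(f and lbl for f, (_, lbl) in zip(flagged, messages))
    recall = tp / sum(lbl for _, lbl in messages)
    precision = tp / max(sum(flagged), 1)
    print(f"{name}: recall={recall:.2f} precision={precision:.2f}")
```

On this toy data the augmented vocabulary recovers the implicit, asset-specific messages that the standard keywords miss, which is the effect the abstract argues for.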
... Files, directories, and lines follow the same philosophy as classes. As for file complexity, complexity is always the enemy of security [32]. Regarding duplicated blocks and critical violations, these are patterns found to be unhealthy for software systems, especially for security and privacy. ...
Article
Full-text available
As developers face an ever-increasing pressure to engineer secure software, researchers are building an understanding of security-sensitive bugs (i.e. vulnerabilities). Research into mining software repositories has greatly increased our understanding of software quality via empirical study of bugs. Conceptually, however, vulnerabilities differ from bugs: they represent an abuse of functionality as opposed to insufficient functionality commonly associated with traditional, non-security bugs. We performed an in-depth analysis of the Chromium project to empirically examine the relationship between bugs and vulnerabilities. We mined 374,686 bugs and 703 post-release vulnerabilities over five Chromium releases that span six years of development. We used logistic regression analysis, ranking analysis, bug type classifications, developer experience, and vulnerability severity metrics to examine the overarching question: are bugs and vulnerabilities in the same files? While we found statistically significant correlations between pre-release bugs and post-release vulnerabilities, we found the association to be weak. Number of features, source lines of code, and pre-release security bugs are, in general, more closely associated with post-release vulnerabilities than any of our non-security bug categories. In further analysis, we examined sub-types of bugs, such as stability-related bugs, and the associations did not improve. Even the files with the most severe vulnerabilities (by measure of CVSS or bounty payouts) did not show strong correlations with number of bugs. These results indicate that bugs and vulnerabilities are empirically dissimilar groups, motivating the need for security engineering research to target vulnerabilities specifically.
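The file-level association test at the heart of the study can be sketched as follows with synthetic counts: correlate pre-release bug counts with post-release vulnerability counts across files and observe that statistical significance does not imply a strong association. This is not the Chromium dataset.

```python
# Sketch with invented per-file counts, illustrating a weak but significant association.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_files = 5000
bugs = rng.poisson(2.0, n_files)                 # pre-release bugs per file
vulns = rng.poisson(0.01 + 0.005 * bugs)         # rare, loosely tied to bug counts

rho, p = spearmanr(bugs, vulns)
print(f"spearman rho = {rho:.3f}, p = {p:.1e}")  # significant, yet rho is small
print("files with >=1 vulnerability:", int((vulns > 0).sum()), "of", n_files)
```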
Article
Full-text available
Information technology is quickly spreading across critical infrastructures and software has become an inevitable part of industries and organisations. At the same time, many cyberthreats are the result of poor software coding. Stuxnet, which was the most powerful cyber-weapon used against industrial control systems, exploited zero-day vulnerabilities in Microsoft Windows.1 The US Department of Homeland Security (DHS) also announced that software vulnerabilities are among the three most common cyber-security vulnerabilities in Industrial Control Systems (ICSs).2 Therefore, improving software security has an important role in increasing the security level of computer-based systems. Software vulnerability prediction is a tedious task, so automating vulnerability prediction would save a lot of time and resources. One recently used methodology in vulnerability prediction is based on automatic fault prediction using software metrics. Here, Sara Moshtari, Ashkan Sami and Mahdi Azimi of Shiraz University, Iran build on previous studies by providing more complete vulnerability information. They show what can be achieved using different classification techniques and more complete vulnerability information.
Article
Full-text available
NVD is one of the most popular databases used by researchers to conduct empirical research on data sets of vulnerabilities. Our recent analysis of Chrome vulnerability data reported by NVD has revealed an abnormal phenomenon in the data, where almost all vulnerabilities originated from the first versions. This inspired our experiment to validate the reliability of the NVD vulnerable-version data. In this experiment, we verify, for each version of Chrome that NVD claims is vulnerable, whether it is actually vulnerable. The experiment revealed several errors in the vulnerability data of Chrome. Furthermore, we have also analyzed how these errors might impact the conclusions of an empirical study on foundational vulnerabilities. Our results show that different conclusions could be obtained due to the data errors.
Article
Full-text available
Security metrics and vulnerability prediction for software have gained a lot of interest from the community. Many software security metrics have been proposed, e.g., complexity metrics and cohesion and coupling metrics. In this paper, we propose a novel code metric based on dependency graphs to predict vulnerable components. To validate the efficiency of the proposed metric, we build a prediction model which targets the JavaScript Engine of Firefox. In this experiment, our prediction model obtained a very good result in terms of accuracy and recall rates. This empirical result is good evidence that dependency graphs are also a good option for early indication of vulnerabilities.
Article
Full-text available
Background: The accurate prediction of where faults are likely to occur in code can help direct test effort, reduce costs and improve the quality of software. Objective: We investigate how the context of models, the independent variables used and the modelling techniques applied, influence the performance of fault prediction models. Method: We used a systematic literature review to identify 208 fault prediction studies published from January 2000 to December 2010. We synthesise the quantitative and qualitative results of 36 studies which report sufficient contextual and methodological information according to the criteria we develop and apply. Results: The models that perform well tend to be based on simple modelling techniques such as Naïve Bayes or Logistic Regression. Combinations of independent variables have been used by models that perform well. Feature selection has been applied to these combinations when models are performing particularly well. Conclusion: The methodology used to build models seems to be influential to predictive performance. Although there are a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and which report their context, methodology and performance comprehensively.
Conference Paper
Full-text available
We introduce Vulture, a new approach and tool to predict vulnerable components in large software systems. Vulture relates a software project's version archive to its vulnerability database to find those components that had vulnerabilities in the past. It then analyzes the import structure of software components and uses a support vector machine to learn and predict which imports are most important for a component to be vulnerable. We evaluated Vulture on the C++ codebase of Mozilla and found that Vulture correctly identifies about two thirds of all vulnerable components. This allows developers and project managers to focus their testing and inspection efforts: "We should look at nsXPInstallManager more closely, because it is likely to contain yet unknown vulnerabilities."
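A minimal sketch of the Vulture idea as summarized above: represent each component by its imports and train an SVM on past vulnerability history. The component names, import lists, and labels below are invented.

```python
# Hedged illustration: one-hot import features + a linear SVM, on made-up components.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

components = {
    "nsXPInstallManager": "nsIFile nsIURI nsIZipReader nsIPrincipal",
    "nsHTMLParser":       "nsIContent nsIAtom nsIURI",
    "nsImageDecoder":     "nsIFile nsIInputStream",
    "nsPrefService":      "nsIFile nsIObserver",
}
had_vulnerability = [1, 1, 0, 0]   # hypothetical history from a vulnerability database

X = CountVectorizer().fit_transform(components.values())  # imports -> feature columns
model = LinearSVC().fit(X, had_vulnerability)
print(dict(zip(components, model.predict(X))))            # in-sample sanity check
```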
Conference Paper
Full-text available
While there are many software metrics measuring the architecture of a system and its quality, few are able to assess architectural change qualitatively. Given the sheer size and complexity of current software systems, modifying the architecture of a system can have severe, unintended consequences. We present a method to measure architectural change by way of structural distance and show its strong relationship to defect incidence. We show the validity and potential of the approach in an exploratory analysis of the history and evolution of the Spring Framework. Using other, public datasets, we corroborate the results of our analysis.
Article
Full-text available
It is well known that soft computing techniques can be very well deployed for software engineering applications. Among these, fuzzy and neural models are widely used to estimate lines of code, effort, software maintainability, software understandability, etc. This paper proposes to carry out a sensitivity analysis of the two models and shows which one is better. This is done with the help of a case study where the two models are used to measure software maintainability.
Article
This paper evaluates a metric suite to predict vulnerable Java classes based on how much the design of an application has changed over time. It refers to this concept as design churn in analogy with code churn. Based on a validation on 10 Android applications, it shows that several design churn metrics are in fact significantly associated with vulnerabilities. When used to build a prediction model, the metrics yield an average precision of 0.71 and an average recall of 0.27.
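By analogy with code churn, a design-churn measure could be sketched as the number of dependency edges added or removed for each class between two versions; the classes and dependencies below are invented, and the paper's exact metric suite may differ.

```python
# Hedged sketch of a design-churn style measure on invented class-dependency sets.
v1 = {
    "PaymentActivity": {"HttpClient", "Logger"},
    "LoginActivity":   {"HttpClient", "SessionStore"},
}
v2 = {
    "PaymentActivity": {"HttpClient", "Logger", "CryptoUtil"},   # added a dependency
    "LoginActivity":   {"SessionStore", "OAuthHelper"},          # swapped one out
}

def design_churn(old, new):
    churn = {}
    for cls in set(old) | set(new):
        before, after = old.get(cls, set()), new.get(cls, set())
        churn[cls] = len(before ^ after)     # edges added + edges removed
    return churn

# e.g. {'PaymentActivity': 1, 'LoginActivity': 2} (key order may vary)
print(design_churn(v1, v2))
```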
Conference Paper
In this paper, we present a case where we employ the Hidden Structure method for product feature prioritization at Ericsson. The method extends the more common Design Structure Matrix (DSM) approach that has been used in technology management (e.g. project management and systems engineering) for quite some time in order to model complex systems and processes. The hidden structure method focuses on analyzing a DSM based on coupling and modularity theory, and it has been used in a number of software architecture and software portfolio cases. In previous work by the authors, the method was tested on organization transformation at Ericsson; however, this is the first time it has been employed in the domain of product feature prioritization. Today, at Ericsson, features are prioritized based on a business case approach where each feature is handled in isolation from other features and the main focus is customer or market-based requirements. By employing the hidden structure method we show that features are heavily dependent on each other in a complex network, and thus they should not be treated as isolated islands. These dependencies need to be considered when prioritizing features in order to save time and money, as well as increase end customer satisfaction.
Article
Technical Debt is created when design decisions that are expedient in the short term increase the costs of maintaining and adapting a system in the future. An important component of technical debt relates to decisions about system architecture. As systems grow and evolve, their architectures can degrade, increasing maintenance costs and reducing developer productivity. This raises the question of if and when it might be appropriate to redesign (“refactor”) a system, to reduce what has been called “architectural debt”. Unfortunately, we lack robust data by which to evaluate the relationship between architectural design choices and system maintenance costs, and hence to predict the value that might be released through such refactoring efforts. We address this gap by analyzing the relationship between system architecture and maintenance costs for two software systems of similar size, but with very different structures; one has a “Hierarchical” design, the other has a “Core-Periphery” design. We measure the level of system coupling for the 20,000+ components in each system, and use these measures to predict maintenance efforts, or “defect-related activity.” We show that in both systems, the tightly-coupled Core or Central components cost significantly more to maintain than loosely-coupled Peripheral components. In essence, a small number of components generate a large proportion of system costs. However, we find major differences in the potential benefits available from refactoring these systems, related to their differing designs. Our results generate insight into how architectural debt can be assessed by understanding patterns of coupling among components in a system.
Conference Paper
We test a method that was designed and used previously to reveal the hidden internal architectural structure of software systems. The focus of this paper is to test if it can also uncover new facts about the components and their relationships in an enterprise architecture, i.e., if the method can reveal the hidden external structure between architectural components. Our test uses data from a biopharmaceutical company. In total, we analyzed 407 components and 1,157 dependencies. Results show that the enterprise structure can be classified as a core-periphery architecture with a propagation cost of 23%, core size of 32%, and architecture flow through of 67%. We also found that business components can be classified as control elements, infrastructure components as shared, and software applications as belonging to the core. These findings suggest that the method could be effective in uncovering the hidden structure of an enterprise architecture.
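Two of the hidden-structure measures reported above, propagation cost and core size, can be sketched on a tiny invented DSM: propagation cost as the density of the transitive-closure (visibility) matrix, and core size as the share of components in the largest cyclic group. This is an illustration of the general measures, not the paper's tooling or data, so the numbers will not match the reported 23% and 32%.

```python
# Minimal sketch on a 7-component invented DSM.
import numpy as np

# dsm[i][j] = 1 means component i depends on component j.
dsm = np.array([
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],   # components 0, 1, 2 form a cycle
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
], dtype=int)
n = len(dsm)

# Visibility matrix: transitive reachability, including each component itself.
vis = np.eye(n, dtype=int)
for _ in range(n):
    vis = ((vis + vis @ dsm) > 0).astype(int)
propagation_cost = vis.sum() / (n * n)

# Core: largest group of mutually reachable components (i sees j and j sees i).
mutual = vis * vis.T
core_size = mutual.sum(axis=1).max() / n

print(f"propagation cost = {propagation_cost:.0%}, core size = {core_size:.0%}")
```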
Article
In this paper, we test a Design Structure Matrix (DSM) based method for visualizing and measuring software portfolio architectures, and use our measures to predict the costs of architectural change. Our data is drawn from a biopharmaceutical company, comprising 407 architectural components with 1,157 dependencies between them. We show that the architecture of this system can be classified as a "core-periphery" system, meaning it contains a single large dominant cluster of interconnected components (the "Core") representing 32% of the system. We find that the classification of software applications within this architecture, as being either Core or Peripheral, is a significant predictor of the costs of architectural change. In regression tests, we show that this measure has greater predictive power than prior measures of coupling used in the literature.
Conference Paper
Building secure software is difficult, time-consuming, and expensive. Prediction models that identify vulnerability prone software components can be used to focus security efforts, thus helping to reduce the time and effort required to secure software. Several kinds of vulnerability prediction models have been proposed over the course of the past decade. However, these models were evaluated with differing methodologies and datasets, making it difficult to determine the relative strengths and weaknesses of different modeling techniques. In this paper, we provide a high-quality, public dataset, containing 223 vulnerabilities found in three web applications, to help address this issue. We used this dataset to compare vulnerability prediction models based on text mining with models using software metrics as predictors. We found that text mining models had higher recall than software metrics based models for all three applications.
Conference Paper
EA initiatives usually span the entire enterprise at a high level. While a typical development organization (e.g., a business unit within a larger enterprise) often has detailed models describing its product, the enterprise architecture at the business-unit level is handled in an ad hoc or detached way. However, research shows that there is a tight link between the product architecture and its developing organization. In this paper we have studied an organization within Ericsson, which focuses on the development of large software and hardware products. We have applied the hidden structure method, which is based on the Design Structure Matrix approach, to analyze organizational transformations. The to-be scenarios are possible alternatives in trying to become more agile and lean. Our analysis shows that one scenario likely increases the complexity of developing the product, while the other two suggestions are both promising to-be scenarios.
Article
This paper reports results of an empirical study that aimed to demonstrate the link between software product design structure and engineers' effort to perform a code modification in the context of a corrective maintenance task. First, this paper reviews the current state of the art in engineering economics of the maintenance phase of the software lifecycle. Secondly, a measure of software product complexity suitable for assessing the maintainability of a software system is developed. This measure is used to analyze the design structure change that happened between two versions of a mature software product. The product selected for this study underwent a significant re-design between the two studied versions. Thirdly, an experiment is designed to measure the effort engineers spend designing a code modification associated with a corrective change request. These effort measurements are used to demonstrate the effect of product design complexity on engineers' productivity. It is asserted in the paper that engineers' productivity improvements have a significant economic value and can be used to justify investments into the re-design of an existing software product.
Article
Many modern systems are so large that no one truly understands how they work. It is well known in the engineering community that architectural patterns (including hierarchies, modules, and abstraction layers) should be used in design because they play an important role in controlling complexity. These patterns make a system easier to evolve and keep its separate portions within the bounds of human understanding so that distributed teams can operate independently while jointly fashioning a coherent whole. This study set out to measure the link between architectural complexity (the complexity that arises within a system due to a lack or breakdown of hierarchy or modularity) and a variety of costs incurred by a development organization. A study was conducted within a successful software firm. Measures of architectural complexity were taken from eight versions of their product using techniques recently developed by MacCormack, Baldwin, and Rusnak. Significant cost drivers including defect density, developer productivity, and staff turnover were measured as well. The link between cost and complexity was explored using a variety of statistical techniques. Within this research setting, we found that differences in architectural complexity could account for 50% drops in productivity, three-fold increases in defect density, and order-of-magnitude increases in staff turnover. Using the techniques developed in this thesis, it should be possible for firms to estimate the financial cost of their complexity by assigning a monetary value to the decreased productivity, increased defect density, and increased turnover it causes. As a result, it should be possible for firms to more accurately estimate the potential dollar-value of refactoring efforts aimed at improving architecture.
Conference Paper
Vulnerability prediction models (VPM) are believed to hold promise for providing software engineers guidance on where to prioritize precious verification resources to search for vulnerabilities. However, while Microsoft product teams have adopted defect prediction models, they have not adopted vulnerability prediction models (VPMs). The goal of this research is to measure whether vulnerability prediction models built using standard recommendations perform well enough to provide actionable results for engineering resource allocation. We define 'actionable' in terms of the inspection effort required to evaluate model results. We replicated a VPM for two releases of the Windows Operating System, varying model granularity and statistical learners. We reproduced binary-level prediction precision (~0.75) and recall (~0.2). However, binaries often exceed 1 million lines of code, too large to practically inspect, and engineers expressed preference for source file level predictions. Our source file level models yield precision below 0.5 and recall below 0.2. We suggest that VPMs must be refined to achieve actionable performance, possibly through security-specific metrics.
Conference Paper
In an empirical study of 3241 Red Hat packages, we show that software vulnerabilities correlate with dependencies between packages. With formal concept analysis and statistical hypothesis testing, we identify dependencies that decrease the risk of vulnerabilities ("beauties") or increase the risk ("beasts"). Using support vector machines on dependency data, our prediction models successfully and consistently catch about two thirds of vulnerable packages (median recall of 0.65). When our models predict a package as vulnerable, it is correct more than eight times out of ten (median precision of 0.83). Our findings help developers to choose new dependencies wisely and make them aware of risky dependencies.
Article
This paper examines the impact of architectural decisions on the level of defects in a product. We view products as collections of components linked together to work as an integrated whole. Previous work has established modularity (how decoupled a component is from other product components) as a critical determinant of defects, and we confirm its importance. Yet our study also provides empirical evidence for a relation between product quality and cyclicality (the extent to which a component depends on itself via other product components). We find cyclicality to be a determinant of quality that is distinct from, and no less important than, modularity. Extending this main result, we show how the cyclicality–quality relation is affected by the centrality of a component in a cycle and the distribution of a cycle across product modules. These findings, which are based on analysis of open source software development projects, have implications for the study and design of complex systems.
Article
This paper provides a systematic review of previous software fault prediction studies with a specific focus on metrics, methods, and datasets. The review uses 74 software fault prediction papers in 11 journals and several conference proceedings. According to the review results, the usage percentage of public datasets increased significantly and the usage percentage of machine learning algorithms increased slightly since 2005. In addition, method-level metrics are still the most dominant metrics in the fault prediction research area and machine learning algorithms are still the most popular methods for fault prediction. Researchers working in the software fault prediction area should continue to use public datasets and machine learning algorithms to build better fault predictors. The usage percentage of class-level metrics is below acceptable levels, and they should be used much more than they are now in order to predict faults earlier, in the design phase of the software life cycle.
Article
Software security failures are common and the problem is growing. A vulnerability is a weakness in the software that, when exploited, causes a security failure. It is difficult to detect vulnerabilities until they manifest themselves as security failures in the operational stage of software, because security concerns are often not addressed or known sufficiently early during the software development life cycle. Numerous studies have shown that complexity, coupling, and cohesion (CCC) related structural metrics are important indicators of the quality of software architecture, and software architecture is one of the most important and early design decisions that influences the final quality of the software system. Although these metrics have been successfully employed to indicate software faults in general, there are no systematic guidelines on how to use these metrics to predict vulnerabilities in software. If CCC metrics can be used to indicate vulnerabilities, these metrics could aid in the conception of a more secure architecture, leading to more secure design and code and eventually better software. In this paper, we present a framework to automatically predict vulnerabilities based on CCC metrics. To empirically validate the framework and prediction accuracy, we conduct a large empirical study on fifty-two releases of Mozilla Firefox developed over a period of four years. To build vulnerability predictors, we consider four alternative data mining and statistical techniques – C4.5 Decision Tree, Random Forests, Logistic Regression, and Naïve-Bayes – and compare their prediction performances. We are able to correctly predict the majority of the vulnerability-prone files in Mozilla Firefox, with tolerable false positive rates. Moreover, the predictors built from the past releases can reliably predict the likelihood of having vulnerabilities in the future releases. The experimental results indicate that structural information from the non-security realm, such as complexity, coupling, and cohesion, is useful in vulnerability prediction.
Conference Paper
Software complexity is often hypothesized to be the enemy of software security. We performed statistical analysis on nine code complexity metrics from the JavaScript Engine in the Mozilla application framework to investigate if this hypothesis is true. Our initial results show that the nine complexity measures have weak correlation (ρ=0.30 at best) with security problems for Mozilla JavaScript Engine. The study should be replicated on more products with design and code-level metrics. It may be necessary to create new complexity metrics to embody the type of complexity that leads to security problems.
Conference Paper
In software development, resources for quality assurance are limited by time and by cost. In order to allocate resources effectively, managers need to rely on their experience backed by code complexity metrics. But often dependencies exist between various pieces of code over which managers may have little knowledge. These dependencies can be construed as a low-level graph of the entire system. In this paper, we propose to use network analysis on these dependency graphs. This allows managers to identify central program units that are more likely to face defects. In our evaluation on Windows Server 2003, we found that the recall for models built from network measures is 10 percentage points higher than for models built from complexity metrics. In addition, network measures could identify 60% of the binaries that the Windows developers considered as critical, twice as many as identified by complexity metrics.
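A sketch of the network-analysis idea on a small invented dependency graph; betweenness centrality flags components that many dependency paths pass through:

```python
import networkx as nx

# Invented binary-level dependency edges (depender -> dependee).
deps = [("ui.dll", "core.dll"), ("net.dll", "core.dll"),
        ("core.dll", "crypto.dll"), ("app.exe", "ui.dll"),
        ("app.exe", "net.dll"), ("svc.exe", "core.dll")]
g = nx.DiGraph(deps)

# Rank components by betweenness centrality and flag the most central ones for extra QA.
central = nx.betweenness_centrality(g)
for binary, score in sorted(central.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{binary}: centrality {score:.2f}")
```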
Conference Paper
How do design decisions impact the quality of the resulting software? In an empirical study of 52 ECLIPSE plug-ins, we found that the software design, as well as past failure history, can be used to build models which accurately predict failure-prone components in new programs. Our prediction only requires usage relationships between components, which are typically defined in the design phase; thus, designers can easily explore and assess design alternatives in terms of predicted quality. In the ECLIPSE study, 90% of the 5% most failure-prone components, as predicted by our model from design data, turned out to actually produce failures later; a random guess would have predicted only 33%.
Conference Paper
Many factors are believed to increase the vulnerability of a software system; for example, the more widely deployed or popular a software system is, the more likely it is to be attacked. Early identification of defects has been a widely investigated topic in software engineering research. Early identification of software vulnerabilities can help mitigate these attacks to a large degree by focusing security verification efforts on these components. Predicting vulnerabilities is complicated by the fact that vulnerabilities are, most often, few in number and introduce significant bias by creating a sparse dataset in the population. As a result, vulnerability prediction can be thought of as, proverbially, "searching for a needle in a haystack." In this paper, we present a large-scale empirical study on Windows Vista, where we empirically evaluate the efficacy of classical metrics like complexity, churn, coverage, dependency measures, and the organizational structure of the company to predict vulnerabilities, and assess how well these software measures correlate with vulnerabilities. We observed in our experiments that classical software measures predict vulnerabilities with high precision but low recall values. The actual dependencies, however, predict vulnerabilities with lower precision but higher recall values.
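A worked illustration of the precision/recall trade-off summarized above, using invented confusion-matrix counts rather than the Windows Vista numbers:

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Classical metrics: few false alarms, but most vulnerable binaries are missed.
print(precision_recall(tp=20, fp=5, fn=80))    # ~0.80 precision, 0.20 recall
# Dependency-based model: more false alarms, but far fewer missed vulnerable binaries.
print(precision_recall(tp=70, fp=60, fn=30))   # ~0.54 precision, 0.70 recall
```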
Article
Security inspection and testing require experts in security who think like an attacker. Security experts need to know code locations on which to focus their testing and inspection efforts. Since vulnerabilities are rare occurrences, locating vulnerable code locations can be a challenging task. We investigated whether software metrics obtained from source code and development history are discriminative and predictive of vulnerable code locations. If so, security experts can use this prediction to prioritize security inspection and testing efforts. The metrics we investigated fall into three categories: complexity, code churn, and developer activity metrics. We performed two empirical case studies on large, widely used open-source projects: the Mozilla Firefox web browser and the Red Hat Enterprise Linux kernel. The results indicate that 24 of the 28 metrics collected are discriminative of vulnerabilities for both projects. The models using all three types of metrics together predicted over 80 percent of the known vulnerable files with less than 25 percent false positives for both projects. Compared to a random selection of files for inspection and testing, these models would have reduced the number of files and the number of lines of code to inspect or test by over 71 and 28 percent, respectively, for both projects.
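A sketch of how a single metric's discriminative power might be checked, in the spirit of this study: comparing code churn between vulnerable and neutral files with a Mann-Whitney U test. All values are illustrative, not project data.

```python
from scipy.stats import mannwhitneyu

# Hypothetical churned-lines counts for vulnerable vs. neutral files.
churn_vulnerable = [420, 310, 950, 500, 270, 610]
churn_neutral = [40, 15, 120, 60, 10, 95, 30, 55]

stat, p_value = mannwhitneyu(churn_vulnerable, churn_neutral, alternative="greater")
print(f"U = {stat}, p = {p_value:.4f}")
# A small p-value suggests the churn distributions differ, i.e. the metric is discriminative.
```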
Article
A variety of academic work argues a relationship exists between the structure of a development organization and the design of the products that this organization produces. Specifically, products are often said to "mirror" the architectures of the organizations from which they come. This dynamic occurs because an organization's problem solving routines and normal patterns of communication tend to constrain the space of designs within which it searches for new solutions. Such a link, if confirmed empirically, would be important, given that product architecture has been shown to be an important predictor of product performance, product variety, process flexibility and industry evolution. We explore this relationship in the software industry by use of a technique called Design Structure Matrices (DSMs), which allows us to visualize the architectures of different software products and to calculate metrics to compare their levels of modularity. Our research takes advantage of a natural experiment in this industry, where products exist that fulfill the same function, but that have been developed using very different organizational modes - specifically, open source versus closed source development. We use DSMs to analyze a sample of matched-pair products - products that perform the same function but that have been developed via these contrasting modes of organization. Our results reveal significant differences in modularity, consistent with a view that larger, more distributed teams tend to develop products with more modular architectures. Furthermore, the differences between systems are substantial - the pairs we examine vary by a factor of eight, in terms of the potential for a design change to propagate to other system components. We conclude by highlighting some implications of this result for both innovating managers, as well as researchers in the field. We also assess how future work in this area might proceed, based upon these first steps in measuring "design."
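A sketch of the change-propagation measurement behind such DSM comparisons: compute the transitive closure (visibility matrix) of an invented 4x4 dependency matrix and report the fraction of component pairs a change could reach. This is a simplified version of the propagation-cost idea, not the authors' exact procedure.

```python
import numpy as np

# Invented DSM: entry (i, j) is True if component i depends directly on component j.
dsm = np.array([[0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 0]], dtype=bool)

# Visibility matrix: reachability over paths of any length (Warshall-style closure).
visibility = dsm.copy()
n = len(dsm)
for k in range(n):
    visibility |= visibility[:, [k]] & visibility[[k], :]

# Share of ordered component pairs that a design change could propagate to.
propagation_cost = visibility.sum() / (n * n)
print(f"Propagation cost: {propagation_cost:.2f}")
```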
Conference Paper
Software systems evolve over time due to changes in requirements, optimization of code, fixes for security and reliability bugs, etc. Code churn, which measures the changes made to a component over a period of time, quantifies the extent of this change. We present a technique for early prediction of system defect density using a set of relative code churn measures that relate the amount of churn to other variables such as component size and the temporal extent of churn. Using statistical regression models, we show that while absolute measures of code churn are poor predictors of defect density, our set of relative measures of code churn is highly predictive of defect density. A case study performed on Windows Server 2003 indicates the validity of the relative code churn measures as early indicators of system defect density. Furthermore, our code churn metric suite is able to discriminate between fault-prone and not fault-prone binaries with an accuracy of 89.0 percent.
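A sketch of the relative-churn idea: normalizing absolute churned lines by component size and by the number of changes. The component records and measure names below are invented illustrations, not the paper's full metric suite.

```python
# Hypothetical per-component change history.
components = [
    {"name": "net", "churned_loc": 4200, "total_loc": 21000, "changes": 150},
    {"name": "gpu", "churned_loc": 300,  "total_loc": 900,   "changes": 12},
]

for c in components:
    churn_per_loc = c["churned_loc"] / c["total_loc"]      # relative churn: churned LOC / total LOC
    churn_per_change = c["churned_loc"] / c["changes"]     # churned LOC per change
    print(f'{c["name"]}: churn/LOC={churn_per_loc:.2f}, churn/change={churn_per_change:.1f}')
```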
Article
Some software design metrics are evaluated using data from a communications system. The design metrics investigated were based on the information flow metrics proposed by S. Henry and D. Kafura (1981), and the problems they encountered are discussed. The slightly simpler metrics used in this study are described. The ability of the design metrics to identify change-prone, error-prone, and complex programs is contrasted with that of simple code metrics. Although one of the design metrics (informational fan-out) was able to identify change-prone, fault-prone, and complex programs, code metrics (i.e., lines of code and number of branches) were better. In this context, "better" means correctly identifying a larger proportion of change-prone, error-prone, and/or complex programs, while maintaining a relatively low false identification rate (i.e., incorrectly identifying a program which did not in fact exhibit any undesirable features).
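A sketch of fan-in/fan-out counting on a tiny invented call graph, in the spirit of the information-flow metrics discussed above; the squared fan-in times fan-out product follows Henry and Kafura's weighting (their full metric also multiplies by module length):

```python
# Invented call graph: (caller, callee) pairs.
calls = [("main", "parse"), ("main", "send"), ("parse", "validate"),
         ("send", "encode"), ("send", "log"), ("parse", "log")]

modules = {m for edge in calls for m in edge}
fan_out = {m: sum(1 for src, _ in calls if src == m) for m in modules}
fan_in = {m: sum(1 for _, dst in calls if dst == m) for m in modules}

for m in sorted(modules):
    # Henry and Kafura weight a module by (fan-in * fan-out)^2.
    weight = (fan_in[m] * fan_out[m]) ** 2
    print(f"{m}: fan-in={fan_in[m]}, fan-out={fan_out[m]}, weight={weight}")
```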
Article
One of the most important problems faced by software developers and users is the prediction of the size of a programming system and its development effort. As an alternative to "size," one might deal with a measure of the "function" that the software is to perform. Albrecht [1] has developed a methodology to estimate the amount of the "function" the software is to perform, in terms of the data it is to use (absorb) and to generate (produce). The "function" is quantified as "function points," essentially a weighted sum of the numbers of "inputs," "outputs," "master files," and "inquiries" provided to, or generated by, the software. This paper demonstrates the equivalence between Albrecht's external input/output data flow representation of a program (the "function points" metric) and Halstead's [2] "software science" or "software linguistics" model of a program, as well as the "soft content" variation of Halstead's model suggested by Gaffney [7].
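A worked illustration of an unadjusted function-point count as a weighted sum of inputs, outputs, master files, and inquiries; the counts and the (commonly cited average) weights are invented for the example:

```python
# Hypothetical counts of the four element types named in the abstract.
counts = {"inputs": 12, "outputs": 8, "master_files": 5, "inquiries": 10}
# Commonly cited average-complexity weights; treat these as example values.
weights = {"inputs": 4, "outputs": 5, "master_files": 10, "inquiries": 4}

function_points = sum(counts[k] * weights[k] for k in counts)
print(f"Unadjusted function points: {function_points}")   # 12*4 + 8*5 + 5*10 + 10*4 = 178
```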
Article
This paper describes a graph-theoretic complexity measure and illustrates how it can be used to manage and control program complexity. The paper first explains how the graph-theory concepts apply and gives an intuitive explanation of the graph concepts in programming terms. The control graphs of several actual Fortran programs are then presented to illustrate the correlation between intuitive complexity and the graph-theoretic complexity. Several properties of the graph-theoretic complexity are then proved which show, for example, that complexity is independent of physical size (adding or subtracting functional statements leaves complexity unchanged) and complexity depends only on the decision structure of a program.
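A sketch of the graph-theoretic measure itself: cyclomatic complexity of a control-flow graph is V(G) = E - N + 2P, computed here for an invented graph with a single if/else branch:

```python
# Invented control-flow graph edges for a program with one if/else decision.
edges = [("entry", "if"), ("if", "then"), ("if", "else"),
         ("then", "exit"), ("else", "exit")]
nodes = {n for edge in edges for n in edge}

E, N, P = len(edges), len(nodes), 1          # P = number of connected components
cyclomatic = E - N + 2 * P
print(f"V(G) = {E} - {N} + 2*{P} = {cyclomatic}")   # 5 - 5 + 2 = 2
```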
Article
This paper presents the results of a study in which we empirically investigated the suite of object-oriented (OO) design metrics introduced in (Chidamber and Kemerer, 1994). More specifically, our goal is to assess these metrics as predictors of fault-prone classes and, therefore, determine whether they can be used as early quality indicators. This study is complementary to the work described in (Li and Henry, 1993), where the same suite of metrics had been used to assess frequencies of maintenance changes to classes. To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements. All eight projects were developed using a sequential life cycle model, a well-known OO analysis/design method and the C++ programming language. Based on empirical and quantitative analysis, the advantages and drawbacks of these OO metrics are discussed. Several of Chidamber and Kemerer's OO metrics appear to be useful to predict class fault-proneness during the early phases of the life cycle. Also, on our data set, they are better predictors than "traditional" code metrics, which can only be collected at a later phase of the software development process.
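A sketch of two of the Chidamber and Kemerer metrics referred to above, depth of inheritance tree (DIT) and number of children (NOC), computed over a tiny invented class hierarchy:

```python
# Invented class hierarchy: class name -> parent class (None for the root).
hierarchy = {"Shape": None, "Polygon": "Shape", "Circle": "Shape",
             "Triangle": "Polygon", "Square": "Polygon"}

def dit(cls):
    # Depth of inheritance tree: number of ancestors up to the root.
    depth = 0
    while hierarchy[cls] is not None:
        cls = hierarchy[cls]
        depth += 1
    return depth

# Number of children: direct subclasses of each class.
noc = {c: sum(1 for parent in hierarchy.values() if parent == c) for c in hierarchy}

for c in hierarchy:
    print(f"{c}: DIT={dit(c)}, NOC={noc[c]}")
```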
Working Paper
MacCormack, A., Lagerström, R., Dreyfus, D., Baldwin, C.: Building the agile enterprise: IT architecture, modularity and the cost of IT change. Harvard Business School Working Paper, No. 15-060 (2015), revised August 2016