Preprint

Tracking Down Software Cluster Bombs: A Health State Analysis of the Free Open-source Software (FOSS) Ecosystem


Abstract

More than once, computer history has shown that critical software vulnerabilities can have a large and widely publicized impact on affected components. In the free and open-source software (FOSS) ecosystem, most software is distributed via package repositories, so keeping track of critical dependencies in a software system has become crucial for maintaining good security practices. Especially in light of new legal requirements such as the European Cyber Resilience Act, software projects need to keep a transparent track record with a Software Bill of Materials (SBOM) and maintain a good health state. This study summarizes the current state of available FOSS package repositories and addresses the challenge of finding problematic spots in a software ecosystem. These parts are analyzed in more detail to quantify the health state of the FOSS ecosystem. The results show that the FOSS ecosystem contains well-maintained projects, but also high-impact projects that are vulnerable to supply chain attacks. This study proposes a method for a health state analysis and identifies missing elements, e.g. interfaces, for future research.


Chapter
Full-text available
With the increasing sophistication and sheer number of cyberattacks, more and more companies come to the conclusion that they have to strengthen their cybersecurity posture. At the same time, well-educated information technology (IT) security personnel are scarce. Cybersecurity as a service (CSaaS) is one possible solution to this problem: outsourcing security functions to managed security service providers (MSSPs). This chapter gives an overview of common CSaaS functions and their providers. Moreover, it provides guidance, especially for small- and medium-sized businesses, on asking the appropriate questions when selecting a specific MSSP.
Conference Paper
Full-text available
Built on top of UDP, the relatively new QUIC protocol serves as the baseline for modern web protocol stacks. Equipped with a rich feature set, the protocol is defined by a 151-page IETF standard complemented by several additional documents. To enable fast updates and feature iteration, most QUIC implementations are realized as user-space libraries, leading to a large and fragmented ecosystem. This work addresses the research question of whether a complex standard with a large number of different implementations leads to an insecure ecosystem. The relevant RFC documents were studied and "Security Consideration" items describing conceptual problems were extracted. During the research, 13 popular production-ready QUIC implementations were compared by evaluating 10 security considerations from RFC 9000. While related studies mostly focused on the functional part of QUIC, this study confirms that available QUIC implementations are not yet mature enough from a security point of view.
Article
Full-text available
Due to their increasing complexity, today's software systems are frequently built by leveraging reusable code in the form of libraries and packages. Software ecosystems (e.g., npm) are the primary enablers of this code reuse, providing developers with a platform to share their own and use others' code. These ecosystems evolve rapidly: developers add new packages every day to solve new problems or provide alternative solutions, causing obsolete packages to decline in their importance to the community. Developers should avoid depending on packages in decline, as these packages are reused less over time and may become less frequently maintained. However, current popularity metrics (e.g., stars and downloads) are not fit to provide this information to developers, because their semantics do not aptly capture shifts in community interest. In this paper, we propose a scalable approach that uses a package's centrality in the ecosystem to identify packages in decline. We evaluate our approach on the npm ecosystem and show that the trends of centrality over time can correctly distinguish packages in decline with an ROC-AUC of 0.9. The approach captures 87% of the packages in decline, on average 18 months before the trend is shown in currently used package popularity metrics. We implement this approach in a tool that can be used to augment the npms metrics and help developers avoid packages in decline when reusing packages from npm.
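The core idea above, tracking a package's centrality trend across ecosystem snapshots and flagging a downward slope early, can be sketched in a few lines. All values and the threshold below are invented for illustration; this is not the paper's actual model:

```python
# Hypothetical sketch: flag a package as "in decline" when the
# least-squares slope of its centrality history is clearly negative.

def slope(values):
    """Least-squares slope of values over equally spaced time steps."""
    n = len(values)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

def in_decline(centrality_history, threshold=-0.01):
    # threshold is an arbitrary illustrative cut-off
    return slope(centrality_history) < threshold

healthy = [0.40, 0.41, 0.43, 0.42, 0.44]    # centrality holding steady
declining = [0.40, 0.35, 0.31, 0.24, 0.20]  # community interest shifting away

assert not in_decline(healthy)
assert in_decline(declining)
```

The point of using a trend rather than a raw popularity number is that absolute stars or downloads can stay high long after the community has moved on, while centrality reflects how much the rest of the ecosystem still builds on the package.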
Article
Full-text available
With the growing availability and prevalence of internet-capable devices, the complexity of networks and associated connection management increases. Depending on the use case, different approaches in handling connectivity have emerged over the years, tackling diverse challenges in each distinct area. Exposing centralized web-services facilitates reachability; distributing information in a peer-to-peer fashion offers availability; and segregating virtual private sub-networks promotes confidentiality. A common challenge herein lies in connection establishment, particularly in discovering, and securely connecting to peers. However, unifying different aspects, including the usability, scalability, and security of this process in a single framework, remains a challenge. In this paper, we present the Stream Exchange Protocol (SEP) collection, which provides a set of building blocks for secure, lightweight, and decentralized connection establishment. These building blocks use unique identities that enable both the identification and authentication of single communication partners. By utilizing federated directories as decentralized databases, peers are able to reliably share authentic data, such as current network locations and available endpoints. Overall, this collection of building blocks is universally applicable, easy to use, and protected by state-of-the-art security mechanisms by design. We demonstrate the capabilities and versatility of the SEP collection by providing three tools that utilize our building blocks: a decentralized file sharing application, a point-to-point network tunnel using the SEP trust model, and an application that utilizes our decentralized discovery mechanism for authentic and asynchronous data distribution.
Conference Paper
Full-text available
Recently, Google's Open Source team presented the criticality score [1], a metric to assess the "influence and importance" of a project in an ecosystem from project-specific signals, e.g., number of dependents, commit frequency, etc. The community showed mixed reactions towards the score, doubting whether it can accurately identify critical projects. We share the community's doubts, and we hypothesize that a combination of PageRank (PR) and Truck Factor (TF) can more accurately identify critical projects than Google's current Criticality Score (CS). To verify our hypothesis, we conduct an experiment in which we compute the PR of thousands of projects from various ecosystems, such as Maven (Java), NPM (JavaScript), and PyPI (Python), compute the TFs of the projects with the highest PR in the respective ecosystems, and compare these to the scores provided by the Google project. Unlike Google's CS, our approach identifies projects such as six and idna from PyPI, com.typesafe:config from Maven, or tap from NPM as critical projects with a high number of transitive dependents (highest PR) and a small number of core developers (each possessing a TF of one).
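As a rough illustration of why PageRank surfaces such projects, here is a minimal pure-Python power-iteration sketch over a tiny hypothetical dependency graph. The graph, package names, and parameters are invented; this is not the authors' pipeline or data:

```python
# Toy PageRank via power iteration. An edge (a, b) means "a depends
# on b", so rank flows towards dependencies and foundational packages
# with many transitive dependents accumulate the highest score.

def pagerank(edges, damping=0.85, iterations=50):
    nodes = sorted({n for e in edges for n in e})
    out = {n: [dst for src, dst in edges if src == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes  # dangling node: spread rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# "app1" and "app2" depend on "lib", which depends on "base".
deps = [("app1", "lib"), ("app2", "lib"), ("lib", "base")]
scores = pagerank(deps)
assert max(scores, key=scores.get) == "base"
```

A package like base here has only one direct dependent, yet tops the ranking because everything else depends on it transitively; combined with a Truck Factor of one, such a package is exactly the kind of critical single point of failure the paper is after.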
Article
Full-text available
Context: GitHub hosts an impressive number of high-quality OSS projects. However, selecting the "right tool for the job" is a challenging task, because we do not have precise information about those high-quality projects. Objective: In this paper, we propose a data-driven approach to measure the level of maintenance activity of GitHub projects. Our goal is to alert users about the risks of using unmaintained projects and possibly motivate other developers to assume the maintenance of such projects. Method: We train machine learning models to define a metric that expresses the level of maintenance activity of GitHub projects. Next, we analyze the historical evolution of 2,927 active projects over a time frame of one year. Results: Of the 2,927 active projects, 16% became unmaintained within one year. We also found that Objective-C projects tend to have lower maintenance activity than projects implemented in other languages. Finally, software tools, such as compilers and editors, have the highest maintenance activity over time. Conclusions: A metric of the level of maintenance activity of GitHub projects can help developers to select open source projects.
Article
Full-text available
Nearly every popular programming language comes with one or more package managers. The software packages distributed by such package managers form large software ecosystems. These packaging ecosystems contain a large number of package releases that are updated regularly and that have many dependencies to other package releases. While packaging ecosystems are extremely useful for their respective communities of developers, they face challenges related to their scale, complexity, and rate of evolution. Typical problems are backward incompatible package updates, and the risk of (transitively) depending on packages that have become obsolete or inactive. This manuscript uses the libraries.io dataset to carry out a quantitative empirical analysis of the similarities and differences between the evolution of package dependency networks for seven packaging ecosystems of varying sizes and ages: Cargo for Rust, CPAN for Perl, CRAN for R, npm for JavaScript, NuGet for the .NET platform, Packagist for PHP, and RubyGems for Ruby. We propose novel metrics to capture the growth, changeability, reusability, and fragility of these dependency networks, and use these metrics to analyse and compare their evolution. We observe that the dependency networks tend to grow over time, both in size and in number of package updates, while a minority of packages are responsible for most of the package updates. The majority of packages depend on other packages, but only a small proportion of packages accounts for most of the reverse dependencies. We observe a high proportion of fragile packages due to a high and increasing number of transitive dependencies. These findings are instrumental for assessing the quality of a package dependency network, and improving it through dependency management tools and imposed policies.
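The fragility notion above, a high number of transitive dependencies, can be made concrete with a minimal graph traversal over a toy dependency map. The package names and structure below are invented for illustration, not taken from the libraries.io dataset:

```python
# Collect the full transitive dependency set of a package by walking
# the dependency map depth-first: every package reached this way can
# break or compromise the root package.

def transitive_deps(pkg, deps):
    seen = set()
    stack = list(deps.get(pkg, []))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(deps.get(d, []))
    return seen

deps = {
    "webapp": ["http", "json"],
    "http": ["sockets", "tls"],
    "tls": ["crypto"],
}
# Two direct dependencies fan out into five transitive ones.
assert transitive_deps("webapp", deps) == {"http", "json", "sockets", "tls", "crypto"}
```

Even this toy example shows the asymmetry the study measures: a package with few direct dependencies can still sit on top of a much larger transitive closure, and every node in that closure is a potential source of breakage.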
Conference Paper
Full-text available
NetworkX is a Python language package for exploration and analysis of networks and network algorithms. The core package provides data structures for representing many types of networks, or graphs, including simple graphs, directed graphs, and graphs with parallel edges and self-loops. The nodes in NetworkX graphs can be any (hashable) Python object and edges can contain arbitrary data; this flexibility makes NetworkX ideal for representing networks found in many different scientific fields. In addition to the basic data structures, many graph algorithms are implemented for calculating network properties and structure measures: shortest paths, betweenness centrality, clustering, degree distribution, and many more. NetworkX can read and write various graph formats for easy exchange with existing data, and generators for many classic graphs and popular graph models, such as the Erdős-Rényi, Small World, and Barabási-Albert models, are included. The ease of use and flexibility of the Python programming language, together with the connection to the SciPy tools, make NetworkX a powerful tool for scientific computations. We discuss some of our recent work studying synchronization of coupled oscillators to demonstrate how NetworkX enables research in the field of computational networks.
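A brief usage sketch of the features named in the abstract (hashable nodes, edge data, shortest paths, betweenness centrality); the graph is invented for illustration and the networkx package must be installed:

```python
import networkx as nx

# Nodes can be any hashable Python object; edges can carry arbitrary data.
G = nx.Graph()
G.add_edge("router", "server", weight=1.0)
G.add_edge("router", "laptop", weight=0.5)
G.add_edge("laptop", "phone", weight=0.2)

# Shortest paths and a structure measure from the core algorithm set.
path = nx.shortest_path(G, "server", "phone")
centrality = nx.betweenness_centrality(G)

assert path == ["server", "router", "laptop", "phone"]
# The intermediate nodes carry all shortest paths between the endpoints.
assert max(centrality, key=centrality.get) in ("router", "laptop")
```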
Chapter
Real networks are heterogeneous structures, with edges unevenly distributed among nodes, presenting community structure, motifs, transitivity, rich clubs, and other kinds of topological patterns. Consequently, the roles played by nodes in a network can differ greatly. For example, some nodes may be connectors between parts of the network, others may be central or peripheral, etc. The objective of this chapter is to describe how we can find the most important nodes in networks. The idea is to define a centrality measure for each node in the network, sort the nodes according to their centralities, and focus our attention on the first-ranked nodes, which can be considered the most relevant ones with respect to this centrality measure.
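The chapter's recipe, compute a centrality value per node and then rank, can be sketched with the simplest measure, degree centrality, on a small invented "hub and spoke" graph:

```python
def degree_centrality(adjacency):
    n = len(adjacency)
    # Normalise by the maximum possible degree, n - 1.
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}

# "a" is the connector between the {b, c} and {d, e} parts of the graph.
graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"a", "d"},
}
centrality = degree_centrality(graph)
ranked = sorted(centrality, key=centrality.get, reverse=True)
assert ranked[0] == "a"  # the connector node tops the ranking
```

Other measures from the chapter (closeness, betweenness, eigenvector centrality) plug into the same sort-and-inspect pattern; only the per-node score changes.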
Article
The next industrial revolution is said to be paved by the use of novel Internet of Things (IoT) technology. One important aspect of the modern IoT infrastructures is decentralised communication, often called Peer-to-Peer (P2P). In the context of industrial communication, P2P contributes to resilience and improved stability for industrial components. Current industrial facilities, however, still rely on centralised networking schemes which are considered to be mandatory to comply with security standards. In order to succeed, introduced industrial P2P technology must maintain the current level of protection and also consider possible new threats. The presented work starts with a short analysis of well-established industrial communication infrastructures and how these could benefit from decentralised structures. Subsequently, previously undefined Information Technology (IT) security requirements are derived from the new cloud based decentralised industrial automation model architecture presented in this paper. To meet those requirements, state-of-the-art communication schemes and their open source implementations are presented and assessed for their usability in the context of industrial IoT. Finally, derived building blocks for industrial IoT P2P security are presented which are qualified to comply with the stated industrial IoT security requirements.
Conference Paper
Nearly every popular programming language comes with one or more open source software packaging ecosystems, containing a large collection of interdependent software packages developed in that programming language. Such packaging ecosystems are extremely useful for their respective software development communities. We present an empirical analysis of how the dependency graphs of three large packaging ecosystems (npm, CRAN and RubyGems) evolve over time. We study how the existing package dependencies impact the resilience of the three ecosystems over time and to which extent these ecosystems suffer from issues related to package dependency updates. We analyse specific solutions that each ecosystem has put into place and argue that none of these solutions is perfect, motivating the need for better tools to deal with package dependency update problems.
Conference Paper
Maintaining package-based Linux distributions and managing their evolution has always been a challenge. Since all packages that form a distribution interact with each other, each distribution exhibits complicated dependency relationships. Current package managers only provide a local view of these relationships, and distribution maintainers still lack a global view of the package dependency graph. In this paper we present research that targets to bridge this gap: we propose a graph method to establish the entire distribution's package dependency relationships and analyze the resulting graph with relevant properties. We implement our method on Ubuntu Kylin 14.04. The experiments illustrate that our graph approach is effective for understanding the whole distribution and can assist high-quality maintenance and effective evolution.
Article
Existing package and system configuration management tools suffer from an imperative model, where system administration actions such as upgrading packages or changes to system configuration files are stateful: they destructively update the state of the system. This leads to many problems, such as the inability to roll back changes easily, to run multiple versions of a package side-by-side, to reproduce a configuration deterministically on another machine, or to reliably upgrade a system. In this paper we show that we can overcome these problems by moving to a purely functional system configuration model. This means that all static parts of a system (such as software packages, configuration files and system startup scripts) are built by pure functions and are immutable, stored in a way analogous to a heap in a purely functional language. We have implemented this model in NixOS, a non-trivial Linux distribution that uses the Nix package manager to build the entire system configuration from a purely functional specification.
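The purely functional idea can be sketched as content-addressed store paths: the location of a built artifact is a pure function of everything that went into building it. The following is an illustrative Python sketch of that principle, not Nix's actual hashing scheme or path layout:

```python
import hashlib

def store_path(name, version, build_inputs):
    """Derive an immutable store path from the full build recipe."""
    recipe = f"{name}-{version}:" + ",".join(sorted(build_inputs))
    digest = hashlib.sha256(recipe.encode()).hexdigest()[:12]
    return f"/store/{digest}-{name}-{version}"

p1 = store_path("hello", "2.12", ["gcc-13", "glibc-2.38"])
p2 = store_path("hello", "2.12", ["gcc-13", "glibc-2.38"])
p3 = store_path("hello", "2.12", ["gcc-14", "glibc-2.38"])

assert p1 == p2  # deterministic: same inputs, same path
assert p1 != p3  # changed compiler -> new path; the old build is untouched
```

Because an "upgrade" only adds new paths and never overwrites old ones, rollbacks, side-by-side versions, and deterministic reproduction on another machine all fall out of the model for free.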
Article
For the purpose of evaluating status in a manner free from the deficiencies of popularity contest procedures, this paper presents a new method of computation which takes into account who chooses as well as how many choose. It is necessary to introduce, in this connection, the concept of attenuation in influence transmitted through intermediaries.
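The two ingredients named in the abstract, counting who chooses rather than only how many, and attenuating influence passed through intermediaries, can be sketched as a truncated series (this is an illustrative pure-Python sketch on a made-up choice graph, not the paper's original matrix formulation):

```python
# Status of node n: sum over path lengths k of a**k times the number
# of choice paths of length k ending at n, truncated at max_len.

def katz_status(edges, nodes, a=0.6, max_len=10):
    status = {n: 0.0 for n in nodes}
    # paths[n]: number of choice paths of the current length ending at n,
    # starting with direct choices (length-1 paths = in-degree).
    paths = {n: sum(1 for _, dst in edges if dst == n) for n in nodes}
    weight = a
    for _ in range(max_len):
        for n in nodes:
            status[n] += weight * paths[n]
        # Extend every path by one more choice, attenuating its weight.
        paths = {n: sum(paths[src] for src, dst in edges if dst == n) for n in nodes}
        weight *= a
    return status

# "a" and "b" choose "c"; "c" in turn chooses "d".
edges = [("a", "c"), ("b", "c"), ("c", "d")]
status = katz_status(edges, ["a", "b", "c", "d"])
# "d" is chosen by only one node, but that node is itself well chosen,
# so with attenuation factor 0.6 its status exceeds "c"'s.
assert status["d"] > status["c"]
```

This is exactly the difference from a popularity contest: a raw in-degree count would rank c (two direct choices) above d (one), while the attenuated measure rewards d for being chosen by a well-chosen node.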
Article
The intuitive background for measures of structural centrality in social networks is reviewed and existing measures are evaluated in terms of their consistency with intuitions and their interpretability. Three distinct intuitive conceptions of centrality are uncovered and existing measures are refined to embody these conceptions. Three measures are developed for each concept: one absolute and one relative measure of the centrality of positions in a network, and one reflecting the degree of centralization of the entire network. The implications of these measures for the experimental study of small groups are examined.
E. Rescorla, The Transport Layer Security (TLS) Protocol Version 1.3, RFC 8446 (Aug. 2018). doi:10.17487/RFC8446. URL https://www.rfc-editor.org/info/rfc8446
ISO Central Secretary, OpenChain Specification, Standard ISO/IEC 5230:2020, International Organization for Standardization, Geneva, CH (2020). URL https://www.iso.org/standard/81039.html
ISO Central Secretary, OpenChain Security Assurance Specification, Standard ISO/IEC 18974:2023, International Organization for Standardization, Geneva, CH (2023). URL https://www.iso.org/standard/86450.html
ISO Central Secretary, SPDX® Specification V2.2.1, Standard ISO/IEC 5962:2021, International Organization for Standardization, Geneva, CH (2021). URL https://www.iso.org/standard/81870.html
E. Landau, Zur relativen Wertbemessung der Turnierresultate, Deutsches Wochenschach 11 (1895) 366-369.
R. W. Shirey, Internet Security Glossary, Version 2, RFC 4949 (Aug. 2007). doi:10.17487/RFC4949. URL https://www.rfc-editor.org/info/rfc4949
ISO Central Secretary, Information security, cybersecurity and privacy protection - Guidance on managing information security risks, Standard ISO/IEC 27005:2022, International Organization for Standardization, Geneva, CH (2022). URL https://www.iso.org/standard/80585.html