Fig 6 - uploaded by James D. Herbsleb
Source publication
This paper presents two case studies of the development and maintenance of major OSS projects, i.e., the Apache server and Mozilla. We address key questions about their development processes, and about the software that is the result of those processes. We first studied the Apache project, and based on our results, framed a number of hypotheses tha...
Context in source publication
Context 1
... large numbers of people participate somewhat equally in these activities, or did a small number of people do most of the work? Figure 6 shows cumulative distribution contributions (as for Apache in Figure 1). The developer participation does not appear to vary as much as in the Apache project. ...
Similar publications
In shared memory multiprocessor architectures, threads can be used to implement parallelism. POSIX threads (pthreads) is a low-level bare-bones programming interface for working with OS threads. Therefore, we have extremely fine-grained control over thread management (create/join/etc), mutexes, and so on. On the other hand, openMP, as a shared-memo...
In this paper, we propose an architecture based on three layers (infrastructure, server hardware and operating system, and business applications) for a unified communications solutions exclusively sustained on open source technologies and open standards. Additionally, we discuss the importance of open standards and open source in a fully integrated...
Citations
... Prior research has extensively studied how developers contribute to open source software (OSS) projects, revealing that a small group of core developers is typically responsible for the majority of contributions while a larger group makes occasional contributions [12], [13]. Core developers have traditionally been identified through commit-based metrics, specifically those who produce 80% of changes [14]. ...
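The commit-based criterion mentioned in this excerpt (core developers are those who produce 80% of changes) can be illustrated with a minimal sketch. The function name `core_developers`, the input dictionary, and the tie-breaking rule are assumptions made for illustration; the cited studies may count changes differently.

```python
def core_developers(changes_by_dev, threshold=0.8):
    """Return the smallest top-ranked group of developers whose combined
    changes reach `threshold` of all changes.

    `changes_by_dev` is a hypothetical mapping of developer name to the
    number of changes (e.g. commits) attributed to that developer.
    """
    total = sum(changes_by_dev.values())
    if total == 0:
        return []

    # Rank by change count, highest first; break ties by name for determinism.
    ranked = sorted(changes_by_dev.items(), key=lambda kv: (-kv[1], kv[0]))

    core, running = [], 0
    for dev, count in ranked:
        if running >= threshold * total:
            break  # the developers collected so far already cover the threshold
        core.append(dev)
        running += count
    return core


# Hypothetical counts, not data from any of the cited papers.
example = {"alice": 60, "bob": 30, "carol": 5, "dave": 5}
print(core_developers(example))
```

With these made-up counts, two of the four developers cover 90% of the changes, so they form the core group under the 80% rule.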
... Our work addresses several gaps in the existing literature. While previous studies focused on traditional application-level software projects [12], [13], we examine a domain-specific programming language repository, finding an even higher contribution concentration (86% from top 1%). We extend Joblin et al.'s [15] network approach by incorporating temporal engagement metrics, and enhance Nakakoji et al.'s [16] model by demonstrating how contributors can rise to prominence through specialized paths. ...
... This decline poses a significant threat to the long-term development and sustainability of OSS projects. Reduced contributor engagement can hinder maintenance and progress, making it imperative for project maintainers to develop strategies that encourage sustained participation [9][10][11]. This challenge is further amplified in larger and more complex projects, where the loss of key contributors can disrupt the overall collaboration network. ...
Open-source software (OSS) projects rely on collaborative contributions, yet sustaining long-term engagement remains a challenge. This study examines how contributor activity frequency and network centrality impact sustained contributions in 672 GitHub repositories. Using k-core decomposition and model checking, we find that contributors with higher k-core values manage more pull requests. Additionally, they show greater resilience in continuing to contribute after pull request rejections, a behavior rarely seen in non-core contributors. Moreover, the likelihood of long-term engagement increases significantly when contributors’ pull requests are processed swiftly, with delayed or rejected pull requests resulting in higher disengagement, particularly among peripheral contributors. We also observe that as project networks grow more complex over time, core contributors become essential in maintaining project sustainability. These findings offer new insights into fostering core contributor retention and underscore the need for efficient governance and PR management practices in OSS projects.
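As a rough illustration of the k-core decomposition named in this abstract, the sketch below computes a core number for every node of an undirected graph by repeatedly peeling off the lowest-degree node. The graph, the node names, and the function `core_numbers` are hypothetical; the abstract does not say how the study builds its contributor network, so this shows only the general technique.

```python
def core_numbers(adjacency):
    """Compute the core number of each node in an undirected graph.

    `adjacency` maps each node to the set of its neighbours and is assumed
    to be symmetric (if b is in adjacency[a], then a is in adjacency[b]).
    A node has core number k if it belongs to the k-core (a maximal subgraph
    in which every node has at least k neighbours) but not to the (k+1)-core.
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Hypothetical network: three contributors who all interact with each other,
# plus one peripheral contributor connected only to "a".
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(core_numbers(graph))
```

This simple version scans all remaining nodes at each step, which is quadratic in the number of nodes; bucket-based implementations are faster on large networks.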
... Although there is not a definite agreement on the impacts of the practices in OSS projects, it is widely accepted that some unique features lead to high-quality software. Mockus, Fielding and Herbsleb (2002) conducted an empirical study to disclose general development patterns in OSS. Two projects, Apache and Mozilla, were examined, and some development activities were claimed to be successful accordingly. ...
Software quality assurance has been a heated topic for several decades, but relatively few analyses were performed on open source software (OSS). As OSS has become very popular in our daily life, many researchers have been keen on the quality practices in this area. Although quality management presents distinct patterns compared with those in closed-source software development, some widely used OSS products have been implemented. Therefore, quality assurance of OSS projects has attracted increased research focus. In this paper, a survey is conducted to reveal the general quality practices in open source communities. Exploratory analysis has been carried out to disclose those quality related activities. The results are compared with those from closed-source environments and the distinguishing features of quality assurance in OSS projects have been confirmed. Moreover, this study suggests potential directions for OSS developers to follow.
... This approach considers a developer's responsibility to be proportional to the number of commits they made to files belonging to a given module. According to this heuristic, it is important to analyze the files that were changed, since a single commit can affect several modules [17]. ...
With the growth of Internet of Things (IoT) systems, demand for Real-Time Operating Systems (RTOS) has increased. The number of projects focused on developing this type of system has grown proportionally. Among these projects are those produced by free/libre and open source software (FLOSS) communities. Because production in these communities involves many developers, a common practice includes modularization and collective code ownership. This study therefore aimed to analyze the source code and understand collective code ownership with respect to modularity and its relationship to developers. A sample of nine GitHub projects with more than one thousand stars is used. Tools were presented to perform the calculations and analyze whether or not a developer can be considered a code owner. The algorithm is based on the Simple Matrix and the Degree of Authorship (DOA) method. The notebooks, in turn, turned the results into information with the aid of charts. From these analyses it was possible to determine the number of contributors and of responsible developers in each project. On average, the number of responsible developers per module ranged from 15.83% to 46.56%; the number of modules per responsible developer, which ranged from two to twenty-seven, showed that one responsible developer can own several modules; and finally, for the number of responsible developers per module, it was possible to conclude that on average a module can have one or two developers as owners.
... It contrasts with the hypothesis of customary computer programming in more than one way [1]. In today's scenario, OSS products are accessible without any threat and their applications can be used on most operating systems or devices, e.g., mobile, handheld devices, PC, etc. [2]. Therefore, it is required to study and investigate the possible defect distribution in OSS [3]. ...
... In the market, a lot of companies provide open-source software with good characteristics and functionality, e.g., Launchpad, GitHub, Mozilla Browser, Source-Forge, Code-Plex, Google Code, etc. These companies provide not only open-source software for users but also a platform for developers and many projects are being developed on these websites [2,4]. Any software system, no matter how secure or well-written its code is, has some potential for faults. ...
Software reliability models are used to predict the trustworthiness of software systems over time, based on the feedback obtained from testing efforts. From existing literature, a few models have been recognized for open-source software. In this paper, a software reliability growth model for open-source software has been discussed that incorporates generalized extended inverse Weibull distribution as a testing effort function. The sine–cosine algorithm has been used to estimate unknown parameters. The proposed work has been justified with numerical examples using real software failure data. For the comparative study, statistical methods, i.e., mean square error, R-squared, Theil statistic (TS), and graphical representation of the results have been used. The outcomes of the proposed model show better goodness of fit and predict significantly compared to the existing models. The suggested study may be extended by considering uncertainty factor in the future.
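Two of the goodness-of-fit measures named in this abstract, mean square error and R-squared, can be sketched in their common textbook forms. The function names and the sample numbers below are illustrative assumptions, not the paper's data, and the paper may use a variant (for example, an MSE denominator adjusted for the number of model parameters).

```python
def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observations are equal")
    return 1 - ss_res / ss_tot


# Hypothetical cumulative failure counts and model predictions.
observed = [5.0, 9.0, 14.0, 18.0, 20.0]
predicted = [4.5, 9.5, 13.0, 18.5, 20.5]
print(mean_squared_error(observed, predicted), r_squared(observed, predicted))
```

A lower mean square error and an R-squared closer to 1 both indicate that the fitted model tracks the observed failure data more closely.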
... In this section, we provide an overview of the distribution of maintenance effort between maintainers and contributors [4,10,28]. ...
Background: Open Source Software (OSS) fuels our global digital infrastructure but is commonly maintained by small groups of people whose time and labor represent a depletable resource. For the OSS projects to stay sustainable, i.e., viable and maintained over time without interruption or weakening, maintenance labor requires an underlying infrastructure to be supported and secured. Aims: Using the construct of human infrastructure, our study aims to investigate how maintenance labor can be supported and secured to enable the creation and maintenance of sustainable OSS projects, viewed from the maintainers' perspective. Method: In our exploration, we interviewed ten maintainers from nine well-adopted OSS projects. We coded the data in two steps using investigator-triangulation. Results: We constructed a framework of infrastructure design that provides insight for OSS projects in the design of their human infrastructure. The framework specifically highlights the importance of human factors, e.g., securing a work-life balance and proactively managing social pressure, toxicity, and diversity. We also note both differences and overlaps in how the infrastructure needs to support and secure maintenance labor from maintainers and the wider OSS community, respectively. Funding is specifically highlighted as an important enabler for both types of resources. Conclusions: The study contributes to the qualitative understanding of the importance, sensitivity, and risk for depletion of the maintenance labor required to build and maintain healthy OSS projects. Human infrastructure is pivotal in ensuring that maintenance labor is sustainable, and by extension the OSS projects on which we all depend.
... Several studies, for example, Jensen and Scacchi (2007); Crowston and Howison (2005), have shown that among hundreds of contributors, only a small group of exceptional coders (elite or core developers) contribute the majority of code and manage project growth. In Mockus et al.'s (2002) analysis of the Apache community projects, for example, they discovered that the top 15 contributors (out of 388 overall) contributed more than 83% of change requests and 66% of issue reports. Understanding elite developers is crucial for analyzing the community's health and sustainability (Wang et al. 2020). ...
... Sustaining a high level of activity can be achieved through the continuous addition of features and the resolution of bugs. Mockus et al. (2002) posited that a substantial developer group is necessary for addressing bugs in FOSS projects beyond the core team. These findings have been corroborated by Champion and Hill (2021). ...
Context
Free and Open Source Software (FOSS) communities’ ability to stay viable and productive over time is pivotal for society as they maintain the building blocks that digital infrastructure, products, and services depend on. Sustainability may, however, be characterized from multiple aspects, and less is known about how these aspects interplay and impact community outputs, and software quality specifically.
Objective
This study, therefore, aims to empirically explore how the different aspects of FOSS sustainability impact software quality.
Method
16 sustainability metrics across four categories were sampled and applied to a set of 217 OSS projects sourced from the Apache Software Foundation Incubator program. The impact of a decline in the sustainability metrics was analyzed against eight software quality metrics using Bayesian data analysis, which incorporates probability distributions to represent the regression coefficients and intercepts.
Results
Findings suggest that selected sustainability metrics do not significantly affect defect density or code coverage. However, a positive impact of community age was observed on specific code quality metrics, such as risk complexity, number of very large files, and code duplication percentage. Interestingly, findings show that even when communities are experiencing sustainability, certain code quality metrics are negatively impacted.
Conclusion
Findings imply that code quality practices are not consistently linked to sustainability, and defect management and prevention may be prioritized over the former. Results suggest that growth, resulting in a more complex and large codebase, combined with a probable lack of understanding of code quality standards, may explain the degradation in certain aspects of code quality.
... Information Systems (IS) researchers tended to focus on OSS as a product, its qualities [80,104], and how organizations could leverage open source communities to build products [4,73]. Meanwhile, SE researchers initially focused on the development process, asking questions surrounding concepts that were of particular interest to software engineers [72]. More recently, SE researchers have focused on a wide range of other aspects of OSS, such as peer review practices [87] and corporate participation in OSS projects [126]. ...
This chapter seeks to support software engineering (SE) researchers and educators in teaching the importance of theory as well as the theorizing process. Drawing on insights from other fields, the chapter presents 12 intermediate products of theorizing and what they mean in an SE context. These intermediate products serve different roles: some are theory products to frame research studies, some are theory generators, and others are components of theory. Whereas the SE domain doesn't have many theories of its own, these intermediate products of theorizing can be found widely. The chapter aims to help readers to recognize these intermediate products, their role, and how they can help in the theorizing process within SE research. To illustrate their utility, the chapter then applies the set of intermediate theorizing products to the software architecture research field. The chapter ends with a suggested structure for a 12-week course on theorizing in SE which can be readily adapted by educators.
... Overall, our analysis reveals that various aspects of development activity on the HF Hub (e.g., interactions in model, dataset, and space repositories; collaboration in model repositories; and model adoption in spaces) exhibit right-skewed, Pareto distributions, which is a well-documented pattern in OSS development [24][25][26][27][28]. While the open model development life-cycle involves unique practices which differ from OSS development [22], such as model training and fine-tuning, the observed similarities in the overall patterns of activity suggests that future research on open competition by lowering entry barriers and widening access to state-of-the-art AI [44]. ...
... On the other hand, open models can pose risks of harm by both well-intended and malicious actors, including the creation of deepfakes [48][49][50], disinformation [51,52], and malware [53,54]. A study by 25 experts concluded that open models have five distinctive properties that present both benefits and risks: broader access, greater customisability, local adaptation and inference ability, the inability to rescind model access, and the inability to monitor or moderate model usage [18]. ...
... Activity distributions follow power law patterns, with a small fraction of repositories accounting for most interactions (e.g., <1% for 80% of likes, 10% for 80% of discussions, 30% for 80% of commits, <1% for 80% of downloads). Similarly, the collaboration networks exhibit right-skewed centrality distributions, indicating that influence is concentrated amongst few developers, congruent with prior observations that OSS development patterns generally follow Pareto distributions [24][25][26][27][28]. Influence also flows across the HF Hub, with likes per model having strong correlations with their usage in spaces (correlation coefficient = 0.66, p < 0.001). ...
Open model developers have emerged as key actors in the political economy of artificial intelligence (AI), but we still have a limited understanding of collaborative practices in the open AI ecosystem. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models. First, various types of activity across 348,181 model, 65,761 dataset, and 156,642 space repositories exhibit right-skewed distributions. Activity is extremely imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads. Furthermore, licenses matter: there are statistically significant differences in collaboration patterns in model repositories with permissive, restrictive, and no licenses. Second, we analyse a snapshot of the social network structure of collaboration in model repositories, finding that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers (89%). Upon removing these isolates from the network, collaboration is characterised by high reciprocity regardless of developers’ network positions. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models, developed by a handful of companies, are widely used on the HF Hub. Overall, the findings show that various types of activity across the HF Hub are characterised by Pareto distributions, congruent with open source software development patterns on platforms like GitHub. We conclude with recommendations for researchers and practitioners to advance our understanding of open AI development.
... Native software is more expensive and time-consuming to develop, but it offers a higher level of customization and control that is tailored to the specific needs of a platform or environment. On the other hand, open-source frameworks enable cost reduction and faster deployment through the reuse of existing components and the contribution of a large community [17], [18], and [19]. Maintenance and security in open-source frameworks rely heavily on the robustness of the framework and community engagement, as emphasized by L. Zhao et al. [20] and J. P. Johnson [21]. ...
Over the last decade, health information systems (HIS) have undergone significant changes, particularly in embracing flexible frameworks for ongoing development. This evolution underscores the necessity for information technology (IT) infrastructures that rapidly align with clinical processes. The paper investigates the transition of hospital information systems to comprehensive strategies, examining the benefits and challenges involved. It also examines the increasing demands on health data management systems (HDMS) for patient care and biomedical research. The focus is on how the integration of the Internet of Things (IoT) and open-source enterprise resource planning (ERP) systems, such as Odoo, impacts health information management. The study evaluates the effectiveness and implications of combining these technologies. It provides examples of these integrated systems in action, particularly in resource-limited settings, and evaluates their potential to improve care. The document provides a comprehensive review of the current status and evolution of HIS and HDMS, highlighting the importance of integrating IoT and other cutting-edge technologies in healthcare. This is a crucial aspect for developing countries, where these advancements can significantly improve healthcare outcomes.