Article

The Cathedral and the Bazaar

Authors:
Eric S. Raymond

Abstract

I anatomize a successful open-source project, fetchmail, that was run as a deliberate test of some theories about software engineering suggested by the history of Linux. I discuss these theories in terms of two fundamentally different development styles, the "cathedral" model, representing most of the commercial world, versus the "bazaar" model of the Linux world. I show that these models derive from opposing assumptions about the nature of the software-debugging task. I then make a sustained argument from the Linux experience for the proposition that "Given enough eyeballs, all bugs are shallow," suggest productive analogies with other self-correcting systems of selfish agents, and conclude with some exploration of the implications of this insight for the future of software.

... Projects that manage to grow an active and vibrant community are commonly characterized as a bazaar, as proposed in Eric Raymond's well-known The Cathedral and the Bazaar essay (Raymond 1999), with lightweight development and communication processes, fluid management, and openness to contributions from everybody. The cathedral, in contrast, is a model that follows a traditional setting commonly found within commercial development organizations (and in some OSS projects), which, contextualized in an OSS project, implies that a few individuals undertake planning and development rather detached from the community (Capiluppi and Michlmayr 2007). ...
... In this study, our overarching research goal is to investigate how development is organized in public sector OSS projects, and by extension where on this spectrum they may be positioned relative to the bazaar development model (Raymond 1999; Capiluppi and Michlmayr 2007), or whether new metaphors may be needed. We conjecture that, after more than 20 years of initiatives around OSS in the public sector, development in public sector OSS projects is to a large extent organized in ways that diverge from the commonly adopted bazaar model. ...
... We conjecture that, after more than 20 years of initiatives around OSS in the public sector, development in public sector OSS projects is to a large extent organized in ways that diverge from the commonly adopted bazaar model. By extension, we assume that the extent to which the onion model applies is fundamentally different for public sector OSS projects (Nakakoji et al. 2002) compared to bazaar OSS projects (Raymond 1999), as exemplified by those originally investigated by Mockus et al. (2002). ...
Article
Full-text available
Context The adoption of Open Source Software (OSS) in Public Sector Organizations (PSOs) is on the rise, driven by benefits such as enhanced interoperability and transparency. However, PSOs encounter challenges stemming from limited technical capabilities and regulatory constraints in public procurement. Objective This study, based on a registered report, explores the organizational aspects of development in public sector OSS projects, i.e., projects initiated, developed, and governed by PSOs. We conjecture that the development diverges significantly from the commonly adopted bazaar model, wherein development is carried out collaboratively within a broader community. Method A purposefully sampled set of six public sector OSS projects was investigated using mixed-methods and compared with previously reported cases of bazaar OSS projects. Results Among the cases, we note that most (80%) of development efforts typically involve a small group of developers (<15) and rely on formalised processes. Developers are commonly procured from national and local service suppliers. Projects are planned top-down by involved PSOs with funding and contributions to development enabled through centralized or decentralized sponsorship. Projects with a centralized sponsorship have one or a few main PSOs funding the major part of the development. Decentralized sponsorship implies multiple PSOs being mutually dependent on each other to pool the necessary resources for the development. All OSS are reported as being of high quality despite limited size and contributions from their communities. Conclusions Findings suggest that public sector OSS projects deviate from the typical bazaar model, highlighting the need for tailored approaches to address challenges and solutions specific to their context.
... The first prominent account of the open-source movement was made by Eric Raymond in 1999. In his work [36], Raymond performed a comparative analysis between traditional top-down work organization (i.e. 'the cathedral') and the open-source bottom-up way (i.e. ...
... The success of the open-source movement has also been documented with an effort to characterize and quantify the 'bazaar' described by Raymond [36]. One aspect is modularity: investigating the modular network of the Debian Linux dependency ecosystem, Maillart et al. [44] found that the statistical mechanisms of in-degree use of software packages follow the multiplicative stochastic process of proportional growth (also known as preferential attachment), with continuous arrival and disappearance of packages. ...
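The proportional-growth mechanism referenced above lends itself to a compact illustration. The following Python sketch is a toy model only, with arbitrary parameters that are not taken from Maillart et al.: new dependency edges attach to existing packages in proportion to their current in-degree, which is enough to produce a few heavily depended-upon packages and a long tail.

```python
import random

def simulate_proportional_growth(steps=10_000, arrival_rate=0.05, seed=42):
    """Toy model: each step adds one dependency edge whose target is chosen
    with probability proportional to the target's current in-degree (+1 so
    that fresh packages can still be picked); new packages arrive occasionally."""
    random.seed(seed)
    in_degree = [0]  # start with a single package
    for _ in range(steps):
        if random.random() < arrival_rate:   # a new package enters the ecosystem
            in_degree.append(0)
        weights = [d + 1 for d in in_degree]
        target = random.choices(range(len(in_degree)), weights=weights)[0]
        in_degree[target] += 1
    return sorted(in_degree, reverse=True)

if __name__ == "__main__":
    print("ten most-depended-upon packages (in-degree):",
          simulate_proportional_growth()[:10])
```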
... By studying bug bounty programs, a special form of peer production close to crowd-sourcing for cybersecurity and one for which it is possible to measure productivity against incentives, it was shown that individual productivity decreases as people concentrate on searching for bugs in a single piece of software, even if they have been successful at finding some in that same software in the past [45]. The authors found that indeed 'given enough eyeballs, all bugs are shallow', as proposed by Raymond in his seminal paper [36], but with some subtle limitations on the extent of individual contributions, possibly including cognitive load [46]. It was advocated that contributors should rotate to generate a diversity of perspectives and sustain performance [47], which happens to bring support and economic justification for peer production [37]. ...
Article
Full-text available
This article explores the role of hackathons for good in building a community of software and hardware developers focused on addressing global sustainable development goal (SDG) challenges. We theorize this movement as computational diplomacy: a decentralized, participatory process for digital governance that leverages collective intelligence to tackle major global issues. Analysing Devpost and GitHub data reveals that 30% of hackathons since 2010 have addressed SDG topics, employing diverse technologies to create innovative solutions. Hackathons serve as crucial kairos moments, sparking innovation bursts that drive both immediate project outcomes and long-term production. We propose that these events harness the neurobiological basis of human cooperation and empathy, fostering a collective sense of purpose and reducing interpersonal prejudice. This bottom–up approach to digital governance integrates software development, human collective intelligence and collective action, creating a dynamic model for transformative change. By leveraging kairos moments, computational diplomacy promotes a more inclusive and effective model for digital multilateral governance of the future. This article is part of the theme issue ‘Co-creating the future: participatory cities and digital governance’.
... Any individual open-source component is developed by a maintainer team. With their approval, outsiders may be permitted to contribute code [13]. Beyond this direct incorporation of external contributions, each such project often depends on others as components, recursively. ...
... A key activity in open-source projects is expanding the actor pool by introducing new maintainers and contributors into projects [13]. As a software package gains popularity, interest from potential contributors increases [16]- [18], often resulting in onboarding new maintainers and merging change requests from new contributors. ...
Preprint
Many critical information technology and cyber-physical systems rely on a supply chain of open-source software projects. OSS project maintainers often integrate contributions from external actors. While maintainers can assess the correctness of a change request, assessing a change request's cybersecurity implications is challenging. To help maintainers make this decision, we propose that the open-source ecosystem should incorporate Actor Reputation Metrics (ARMS). This capability would enable OSS maintainers to assess a prospective contributor's cybersecurity reputation. To support the future instantiation of ARMS, we identify seven generic security signals from industry standards; map concrete metrics from prior work and available security tools, describe study designs to refine and assess the utility of ARMS, and finally weigh its pros and cons.
... Firm involvement may discourage volunteers from contributing to a project (Birkinbine, 2020) and crowd out other contributors, reducing project sustainability (Zhang et al., 2022b). Indeed, most research on OSS frames it as a decentralized peer production community of contributors whose intrinsic motivations tend a creative flame (Benkler, 2002; Bonaccorsi and Rossi, 2004; Lakhani and Wolf, 2005; Raymond, 1999). In other words, the collective and networked efforts of individuals (Dabbish et al., 2012) and their diverse motivations (Gerosa et al., 2021), ranging from signaling (El-Komboz and Goldbeck, 2024; Lerner and Tirole, 2002; Riehle, 2015) to "scratching an itch" (Raymond, 1999), make OSS successful. ...
... Indeed, most research on OSS frames it as a decentralized peer production community of contributors whose intrinsic motivations tend a creative flame (Benkler, 2002; Bonaccorsi and Rossi, 2004; Lakhani and Wolf, 2005; Raymond, 1999). In other words, the collective and networked efforts of individuals (Dabbish et al., 2012) and their diverse motivations (Gerosa et al., 2021), ranging from signaling (El-Komboz and Goldbeck, 2024; Lerner and Tirole, 2002; Riehle, 2015) to "scratching an itch" (Raymond, 1999), make OSS successful. A recent study of the GitHub Sponsors program, which enables OSS developers to collect funds from the crowd (Conti et al., 2023), nicely highlights the complexity of motivations and incentives in OSS: once crowdfunded, developers tend to focus on existing projects and are less likely to start new ones. ...
Preprint
Full-text available
Firms are intensifying their involvement with open source software (OSS), going beyond contributing to individual projects and releasing their own core technologies as OSS. These technologies, from web frameworks to programming languages, are the foundations of large and growing ecosystems. Yet we know little about how these anchor sponsors shape the behavior of OSS contributors. We examine Mozilla Corporation's role as incubator and anchor sponsor in the Rust programming language ecosystem, leveraging data on nearly 30,000 developers and 40,000 OSS projects from 2015 to 2022. When Mozilla abruptly exited Rust in August 2020, event-study models estimate a negative impact on ecosystem activity: a 9\% immediate drop in weekly commits and a 0.6 percentage point decline in trend. We observe an asymmetry in the shock's effects: former Mozilla developers and close collaborators continued contributing relatively quickly, whereas more distant developers showed reduced or ceased activity even six months later. An agent-based model of an OSS ecosystem with an anchor sponsor replicates these patterns. We also find a marked slowdown in new developers and projects entering Rust post-shock. Our results suggest that Mozilla served as a critical signal of Rust's quality and stability. Once withdrawn, newcomers and less-embedded developers were the most discouraged, raising concerns about long-term ecosystem sustainability.
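The event-study design summarized here, an immediate level drop in weekly commits plus a break in trend after the sponsor's exit, maps onto a simple interrupted time-series regression. The sketch below is an illustration on synthetic data with made-up coefficients; it is not the authors' specification or the Rust/Mozilla dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
weeks = np.arange(104)                      # two years of weekly observations
event_week = 52                             # hypothetical week of the anchor sponsor's exit
post = (weeks >= event_week).astype(int)

# Synthetic commit counts: baseline trend, then a level drop and a trend break after the event.
commits = (1000 + 2.0 * weeks
           - 90 * post
           - 6.0 * post * (weeks - event_week)
           + rng.normal(0, 20, weeks.size))

df = pd.DataFrame({
    "commits": commits,
    "week": weeks,
    "post": post,
    "weeks_since_event": post * (weeks - event_week),
})

# 'post' captures the immediate drop; 'weeks_since_event' captures the change in trend.
model = smf.ols("commits ~ week + post + weeks_since_event", data=df).fit()
print(model.params[["post", "weeks_since_event"]])
```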
... User-developers are encouraged to voluntarily improve the design of the code itself, under the requirement that their adaptations be re-shared under the same license [2]. Software released with a FOSS license thus establishes a gift economy [3], which has been well documented to foster rapid innovation [4,5]. Free and open-source innovation rests on widely used FOSS licenses [6], which have consistently [7] proven to be massively successful [8]. ...
... The real secret of FOSS innovation, however, is what Eric Raymond called Linus' Law, which states that "many eyes make all bugs shallow" and is based on the idea that a diverse set of technically-qualified perspectives improves the quality of a software product [4]. Linus' Law can be applied to any technological innovation arena, but it does demand technically-qualified eyes. ...
Article
Full-text available
Objective: Open source technological development has proven that innovation scales with the population of well-educated participants. Thus, to increase national innovation the number of citizens that acquire university educations should be maximized. An approach to this challenge is to offer free university education with open-source virtual classes. To quantify potential savings, this study investigates the viability of offering a free first year of university education in the U.S. Methods: This study calculates total savings per year after accounting for the investments in a first-year education nationally. Results: The results found that the development cost for the 16 courses needed to make the transition to sophomore year possible for most American university students, and to ensure they are maintained indefinitely with an endowment, is $160 million. To proctor the exams twice a year, a block grant to each high school in the U.S. would be $6,400, and the total proctoring cost would be about $171 million. Together, conservatively, to serve all high-school graduating seniors the total cost is $331 million, which is far less than 1% of the U.S. Department of Education annual budget. Savings, again conservatively counting only current American university students on tuition alone, would be $46.3 billion/year, and as the students could live at home, they would save an additional $17.4 billion/year. The savings minus the costs to provide an openly accessible free freshman year of education for the entire U.S. public is $63.4 billion annually. Conclusion: This approach would increase the university-educated population and increase national innovation rates, but future work is needed.
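The bottom-line figure in this abstract follows from straightforward arithmetic on the numbers it reports; the short sketch below simply re-derives it.

```python
# Figures as reported in the abstract (USD).
course_development_and_endowment = 160e6   # 16 courses, maintained indefinitely via endowment
proctoring_total = 171e6                   # ~$6,400 block grant per U.S. high school, twice a year
total_cost = course_development_and_endowment + proctoring_total        # ~$331 million

tuition_savings = 46.3e9                   # current university students, tuition only
living_at_home_savings = 17.4e9            # additional savings from living at home
net_annual_savings = tuition_savings + living_at_home_savings - total_cost

print(f"total cost: ${total_cost / 1e6:.0f} million")                   # 331 million
print(f"net annual savings: ${net_annual_savings / 1e9:.1f} billion")   # ~63.4 billion
```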
... This principle largely stems from open-source development, where developers can work on freely chosen and decentralized tasks they may fulfill however they want, based on their interests and competencies (von Krogh et al., 2012). A knock-on effect of greater autonomy is that it delegitimizes centralized forms of control (Raymond, 1999;Turco, 2016), which leads to the emergence of tensions between structure and fluidity, or centralized authority and decentralization (Heracleous et al., 2017;O'Mahony & Ferraro, 2007). Members' willingness to participate in an open project is, however, an overlooked factor in open innovation or open strategy scholarship (Smith et al., 2018). ...
... Full autonomy implies that actors hold the ability to self-manage their degree of contribution and commitment to the open initiative and to join or leave the organization whenever they want. It thus requires the organization to heed the expectations of its participants, as disregarding them may impede the inclusive and transparent qualities of the organizing (Hautz et al., 2017; Reischauer & Ringel, 2023; Ringel, 2019), which are associated with improved organizational performance (Chesbrough, 2003; Janssen et al., 2012; Raymond, 1999). ...
Article
Full-text available
Existing research highlights the imperative of addressing inherent tensions when implementing organizational openness, requiring actors to navigate explicit or implicit emergent closure mechanisms. However, certain literature warns against the absolute conception of openness prevalent in academic and practical spheres. This article thus explores what occurs in organizations that eschew closure mechanisms in favor of openness. I draw on an ethnographic inquiry of Managers du 21e siècle, a non-profit organization championing openness as a pivotal organizational tenet, whose existence has come under threat amidst escalating crises. The metaphor of organizational necrosis serves to highlight that an extremist pursuit of open principles can hamper action by fostering depersonalization (to align with extremist open values) and triggering disempowerment (through strategies that deflect conflicts of value). My first contribution emphasizes the detrimental repercussions of an extremist openness paradigm on organizational sustainability. The second explores how medical metaphors can assist in grasping organizational decline.
... The industrial production of software is today permeated to its core by Free/Libre Open Source Software (FLOSS) (Raymond, 1999; Stallman, 2004; Lerner and Schankerman, 2013). There is hardly any software currently in use and on the market that does not include some component produced as open source, rely on a tool of this nature, or have been developed entirely in the open. ...
... • Open: Much as Raymond (2000) evokes the bazaar model when analyzing free software communities (emphasizing their openness, collaboration, and decentralization), citizen innovation operates under logics equivalent to open source. It documents processes and results with the aim of sharing them, so that anyone, from other contexts, can access them, replicate them, or even join in what is being researched and produced. ...
... The "publish early, publish often" principle (i.e., release early, release often (Raymond, 2000)) is less of a problem for open hardware in academia, where an emphasis on publication means things are more complete when published. A limitation of the open hardware approach in academic scientific hardware is that open hardware metrics have not been ratified in the tenure process. ...
Article
Full-text available
The development of scientific hardware has followed a trend since the industrial revolution, which focused on centralized manufacturing of proprietary products to benefit from economies of scale. Unfortunately, scientific hardware, which favors custom, highly sophisticated products for often small, specialized markets, is not a particularly good fit for this model. The results of using the proprietary model of scientific hardware development are a series of challenges including the following: (1) slow innovation and limited novelty, (2) black box syndrome, (3) reduced technology transfer, (4) vendor lock-in, (5) no lateral scaling, and (6) high economic costs. This work evaluates the potential for the free and open-source hardware development model to overcome these challenges and finds the open hardware approach contributes to (1) faster innovation, (2) increased transparency, (3) rapid and widespread technology transfer, (4) enhanced competition, (5) peer production and scalability, and (6) lower economic costs. Although there are some limitations to the open hardware approach, their impact is small compared to the benefits of avoiding the standard proprietary model. Funding the development of open hardware for scientific research is a clear way to enhance impact while garnering a high return on investment for funders.
... On the other hand, open source projects often attain higher software quality when continuously reviewed by independent developers. This scrutiny not only leads to early detection of bugs and security issues but also fosters a culture of proactive improvement [Ra99]. However, even if the source code is secure, this is not sufficient: Most users and companies using FOSS software do not download the source code and compile it themselves. ...
Preprint
Full-text available
Supply chain attacks have emerged as a prominent cybersecurity threat in recent years. Reproducible and bootstrappable builds have the potential to reduce such attacks significantly. In combination with independent, exhaustive and periodic source code audits, these measures can effectively eradicate compromises in the building process. In this paper we introduce both concepts, analyze the achievements over the last ten years, and explain the remaining challenges. We contribute to the reproducible builds effort by setting up a rebuilder and verifier instance to test the reproducibility of Arch Linux packages. Using the results from this instance, we uncover an unnoticed and security-relevant packaging issue affecting 16 packages related to Certbot, the recommended software to install TLS certificates from Let's Encrypt, making them unreproducible. Additionally, we find the root cause of unreproducibility in the source code of fwupd, critical software used to update device firmware on Linux devices, and submit an upstream patch to fix it.
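In essence, verifying a reproducible build means rebuilding a package in a controlled environment and checking that the result is bit-for-bit identical to the officially distributed artifact. The sketch below shows only that final comparison step, with hypothetical file paths; it is not the rebuilder and verifier infrastructure described in the paper.

```python
import hashlib
from pathlib import Path

def sha256sum(path: Path) -> str:
    """Hash an artifact in chunks so large packages do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_reproducible(official: Path, rebuilt: Path) -> bool:
    """A package is bit-for-bit reproducible if the independent rebuild
    hashes to the same value as the officially distributed artifact."""
    return sha256sum(official) == sha256sum(rebuilt)

if __name__ == "__main__":
    # Hypothetical paths: a distributed package and a local rebuild of it.
    official = Path("dist/certbot-2.9.0-1-any.pkg.tar.zst")
    rebuilt = Path("rebuild/certbot-2.9.0-1-any.pkg.tar.zst")
    print("reproducible" if is_reproducible(official, rebuilt) else "mismatch")
```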
... Another advantage of having open-source implementations is software security. According to Linus's law, "given enough eyeballs, all bugs are shallow" (Raymond 1999). That is, when all the source code for a project is made open to professionals worldwide, it is more likely that security checks will discover any flaws. ...
Article
Full-text available
Causality and eXplainable Artificial Intelligence (XAI) have developed as separate fields in computer science, even though the underlying concepts of causation and explanation share common ancient roots. This is further enforced by the lack of review works jointly covering these two fields. In this paper, we investigate the literature to try to understand how and to what extent causality and XAI are intertwined. More precisely, we seek to uncover what kinds of relationships exist between the two concepts and how one can benefit from them, for instance, in building trust in AI systems. As a result, three main perspectives are identified. In the first one, the lack of causality is seen as one of the major limitations of current AI and XAI approaches, and the “optimal” form of explanations is investigated. The second is a pragmatic perspective and considers XAI as a tool to foster scientific exploration for causal inquiry, via the identification of pursue‐worthy experimental manipulations. Finally, the third perspective supports the idea that causality is propaedeutic to XAI in three possible manners: exploiting concepts borrowed from causality to support or improve XAI, utilizing counterfactuals for explainability, and considering accessing a causal model as explaining itself. To complement our analysis, we also provide relevant software solutions used to automate causal tasks. We believe our work provides a unified view of the two fields of causality and XAI by highlighting potential domain bridges and uncovering possible limitations.
... With the release of the latest version, OpenTOPAS v4.0.0, the TOPAS collaboration has adopted a fully open-source software (OSS) model, with the entirety of the codebase freely available on the collaboration's GitHub page (https://github.com/OpenTOPAS). Motivated in part by Linus's law, which holds that 'given enough eyeballs, all bugs are shallow' (Raymond 1999), the collaboration encourages contributions from the broader user/scientific community. Consequently, this collaborative development aspect, brought about by the shift to OSS, amplifies the importance of rigorous regression testing to ensure the integrity and reliability of the code as new features are introduced. ...
Article
Full-text available
Objective. To develop a regression testing system for TOPAS-nBio: a wrapper of Geant4-DNA, and the radiobiological extension of TOPAS—a Monte Carlo code for the simulation of radiation transport. This regression testing system will be made publicly available on the TOPAS-nBio GitHub page. Approach. A set of seven regression tests were chosen to evaluate the suite of capabilities of TOPAS-nBio from both a physical and chemical point of view. Three different versions of the code were compared: TOPAS-nBio-v2.0 (the previous version), TOPAS-nBio-v3.0 (the current public release), and TOPAS-nBio-v4.0 (the current developer version, planned for future release). The main aspects compared for each test were the differences in execution times, variations from other versions of TOPAS-nBio, and agreement with measurements/in silico data. Main results. Execution times of nBio-v3.0 for all physics tests were faster than those of nBio-v2.0 due to the use of a new Geant4 version. Mean point-to-point differences between TOPAS-nBio versions across all tests fell largely within 5%. The exceptions were the radiolytic yields (G values) of H2 and H2O2, which differed moderately (16% and 10% respectively) when going from nBio-v3.0 to nBio-v4.0. In all cases a good agreement with other experimental/simulated data was obtained. Significance. From a developer point of view, this regression testing system is essential as it allows a more rigorous reporting of the consequences of new version releases on quantities such as the LET or G values of chemical species. Furthermore, it enables us to test ‘pushes’ made to the codebase by collaborators and contributors. From an end-user point of view, users of the software are now able to easily evaluate how changes in the source code, made for their specific application, would affect the results of known quantities.
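The "mean point-to-point difference" criterion used above to compare code versions can be stated compactly. The sketch below is a generic illustration with made-up values; it is not the collaboration's published regression-testing harness.

```python
import numpy as np

def mean_point_to_point_diff(reference, candidate) -> float:
    """Mean relative difference (in percent) between two versions' outputs,
    evaluated point by point on the same grid."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    return 100.0 * np.mean(np.abs(candidate - reference) / np.abs(reference))

# Hypothetical outputs of the same test under two code versions.
v3 = np.array([0.45, 0.62, 3.20])
v4 = np.array([0.47, 0.60, 3.10])

diff = mean_point_to_point_diff(v3, v4)
print(f"mean point-to-point difference: {diff:.1f}% "
      f"-> {'within 5% tolerance' if diff < 5.0 else 'flag for review'}")
```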
... Examples such as the emergence of cryptocurrencies, initially marginal, show how disorder can radically transform the global economy. Likewise, hacker movements demonstrate how probing systems for vulnerabilities can lead to security improvements and technological advances (Raymond, 2001). ...
Preprint
Full-text available
Abstract: Systems theory, conceived by Ludwig von Bertalanffy (Bertalanffy, 1968), establishes a key framework for understanding the complexity of interconnected systems in various disciplines. In the digital age, this theory comes into tension with the concept of freedom, an ideal that oscillates between individual autonomy and systemic regulation. In this essay, we explore the ability of digital networks, artificial intelligence, and technology platforms to expand or restrict freedom, depending on their design and governance. We analyze the notions of negative and positive freedom proposed by Isaiah Berlin and Manuel Castells' perspective on the dialectic between control and resistance in the information society (Berlin, 2002) (Castells, 1996). In addition, we introduce Edgar Morin's complex thinking as a tool to understand the interaction between order and disorder in digital systems (Morin, 1996). Finally, we address the ethical and practical challenges that arise when balancing systemic stability with individual autonomy, concluding that freedom in the digital age is neither static nor absolute, but a dynamic process that requires constant negotiation between constraints and emerging possibilities.
... Cohen's research reveals that when code authors review their own work, they manage to identify only approximately 50% of the defects that an external reviewer would have found. Even in pre-MCR days, Linus's Law held that, given enough eyeballs, all bugs are shallow (Raymond 1999). Furthermore, peer code review remains crucial despite its significant time costs. ...
Article
Full-text available
Context In collaborative software development, the peer code review process proves beneficial only when the reviewers provide useful comments. Objective This paper investigates the usefulness of Code Review Comments (CR comments) through textual feature-based and featureless approaches. Method We select three available datasets from both open-source and commercial projects. Additionally, we introduce new features from software and non-software domains. Moreover, we experiment with the presence of jargon, voice, and codes in CR comments and classify the usefulness of CR comments through featurization, bag-of-words, and transfer learning techniques. Results Our models outperform the baseline by achieving state-of-the-art performance. Furthermore, the results demonstrate that the large commercial LLM GPT-4o and the non-commercial, naive featureless approach, Bag-of-Words with TF-IDF, are more effective for predicting the usefulness of CR comments. Conclusion The significant improvement in predicting usefulness solely from CR comments spurs further research on this task. Our analyses portray the similarities and differences of domains, projects, datasets, models, and features for predicting the usefulness of CR comments.
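A bag-of-words baseline with TF-IDF of the kind mentioned in the Results can be assembled in a few lines of scikit-learn. The sketch below is illustrative only: the comments and usefulness labels are invented, and the configuration is not the authors' exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training examples: review comments labeled useful (1) or not useful (0).
comments = [
    "Consider extracting this block into a helper to avoid duplication.",
    "LGTM",
    "This loop re-reads the file on every iteration; cache the contents.",
    "nit: trailing whitespace",
]
labels = [1, 0, 1, 0]

# TF-IDF features over word unigrams and bigrams feeding a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
model.fit(comments, labels)

print(model.predict(["Please add a unit test covering the empty-input case."]))
```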
... The majority of these novel methodologies lack both code availability and incorporation into open-source libraries. In addition, SSLearn is presented as a collaborative platform that adheres to the principles of free software [30]. It encourages the incorporation of new methods into its library. ...
... Outside of the workplace, the open-source software movement [17] provides a platform for developers to DIY tools. Open-source software was initially an opportunity for developers to "scratch a personal itch" by creating their own projects [85]. An early study identified that key motivators for contributing to open-source included the intellectual stimulation of writing code, and that the code was needed either for work or non-work purposes [56]. ...
Preprint
Full-text available
Existing commercial and in-house software development tools are often inaccessible to Blind and Low Vision Software Professionals (BLVSPs), hindering their participation and career growth at work. Building on existing research on Do-It-Yourself (DIY) Assistive Technologies and customized tools made by programmers, we shed light on the currently unexplored intersection of how DIY tools built and used by BLVSPs support accessible software development. Through semi-structured interviews with 30 BLVSPs, we found that such tools serve many different purposes and are driven by motivations such as desiring to maintain a professional image and a sense of dignity at work. These tools had significant impacts on workplace accessibility and revealed a need for a more centralized community for sharing tools, tips, and tricks. Based on our findings, we introduce the "Double Hacker Dilemma" and highlight a need for developing more effective peer and organizational platforms that support DIY tool sharing.
... Governance von Open-Source-Software im öffentlichen Sektor: Make, Buy or Contribute? The characteristic bazaar [12,13] as a mode of governance, however, seems to be difficult for public institutions, even though earlier decisions in favor of open licensing and publication turned out positively [14]. Given this phenomenon, this contribution answers the question of why public institutions do (or do not) choose the bazaar as a form of governance in OSS projects. ...
Conference Paper
Full-text available
The nature of OSS projects in the public sector varies widely: while some projects live the frequently proclaimed open and collaborative bazaar, others remain locked behind a shop window. This contribution examines why public agencies decide for or against opening up their software projects and provides insight into selected OSS projects of the public administration in Germany. Uncertainty about a lack of project participation is identified as the decisive factor why open source is often used only as a transparency initiative and not as a method of a collaborative innovation strategy.
... Higher transparency models are also more credible because they are more understandable (Craig et al., 2002). As with all open-source projects, allowing external parties to review the code can also help identify bugs in the code and accelerate development compared to a closed process (Raymond, 1999). As noted above, the next generation model is designed to be modular and flexible. ...
Preprint
Full-text available
Given the rapid pace of energy system development, the time has come to reimagine the U.S. Government's capability to model the long-term evolution of the domestic and global energy system. As a primary custodian of these capabilities, the U.S. Energy Information Administration (EIA) is embarking on the development of a long-term, modular, flexible, transparent, and robust modeling framework that can capture the key dynamics driving the energy system and economy under a wide range of future scenarios. This new capability will leverage the current state of the art in modeling to produce critical insight for researchers, decision makers, and the public. We describe the evolving demands on energy-economy modeling, the capacity and limitations of existing models, and the key features we see as necessary for addressing these demands in our new framework, which is under active development.
... Cohen's research reveals that when code authors review their own work, they manage to identify only approximately 50% of the defects that an external reviewer would have found. Even in pre-MCR days, Linus's Law held that, given enough eyeballs, all bugs are shallow (Raymond, 1999). Furthermore, peer code review remains crucial despite its significant time costs. ...
Preprint
Context: In collaborative software development, the peer code review process proves beneficial only when the reviewers provide useful comments. Objective: This paper investigates the usefulness of Code Review Comments (CR comments) through textual feature-based and featureless approaches. Method: We select three available datasets from both open-source and commercial projects. Additionally, we introduce new features from software and non-software domains. Moreover, we experiment with the presence of jargon, voice, and codes in CR comments and classify the usefulness of CR comments through featurization, bag-of-words, and transfer learning techniques. Results: Our models outperform the baseline by achieving state-of-the-art performance. Furthermore, the results demonstrate that either the large commercial LLM GPT-4o or the non-commercial, naive featureless approach, Bag-of-Words with TF-IDF, is more effective for predicting the usefulness of CR comments. Conclusion: The significant improvement in predicting usefulness solely from CR comments spurs further research on this task. Our analyses portray the similarities and differences of domains, projects, datasets, models, and features for predicting the usefulness of CR comments.
... This latter model is referred to as "the cathedral", with a rigid development structure that may or may not meet user expectations. It contrasts with "the bazaar" model of open source, where user needs drive the development, often in a haphazard fashion [1]. The majority of open source contributors were volunteers, whose motivations ranged from intrinsic (e.g., altruism or community identification) to extrinsic (e.g., career prospects) [2]. ...
Preprint
Full-text available
Open source software is becoming crucial in the design and testing of quantum algorithms. Many of the tools are backed by major commercial vendors with the goal to make it easier to develop quantum software: this mirrors how well-funded open machine learning frameworks enabled the development of complex models and their execution on equally complex hardware. We review a wide range of open source software for quantum computing, covering all stages of the quantum toolchain from quantum hardware interfaces through quantum compilers to implementations of quantum algorithms, as well as all quantum computing paradigms, including quantum annealing, and discrete and continuous-variable gate-model quantum computing. The evaluation of each project covers characteristics such as documentation, licence, the choice of programming language, compliance with norms of software engineering, and the culture of the project. We find that while the diversity of projects is mesmerizing, only a few attract external developers and even many commercially backed frameworks have shortcomings in software engineering. Based on these observations, we highlight the best practices that could foster a more active community around quantum computing software that welcomes newcomers to the field, but also ensures high-quality, well-documented code.
... This browser quickly rose to prominence, becoming one of the dominant web browsers alongside Microsoft Internet Explorer. In 1998, prior to the acquisition of Netscape by AOL, Netscape Communications announced its decision to release the source code of its browser, inspired by Eric S. Raymond's influential work "The Cathedral and the Bazaar," published eight months earlier (Raymond, 2000). ...
... The Linux kernel is a project with over three decades of history that maintains a robust development ecosystem. Its characteristics have inspired various software development models; the best remembered was described by [Raymond 1999] in his essay "The Cathedral and The Bazaar." Raymond discussed his observations on Linux kernel development and the lessons from applying some "bazaar" practices to fetchmail, his own project. ...
Conference Paper
Software development has evolved over decades, transitioning from traditional models such as the waterfall approach and the unified process to more flexible methodologies like agile methods and the collaborative development strategies of Free/Libre/Open Source Software (FLOSS) projects. Alongside this trend, the global distribution of software development work has increased. This phenomenon is particularly evident in the development of FLOSS projects, where contributors from various regions worldwide collaborate asynchronously on projects. In this context, the organization of interactions among developers can significantly influence a project's success or failure. An example is the Linux kernel community, which has been actively discussing the models and workload of project maintainers – a topic that has received limited attention in the scientific literature. This study investigated new maintenance methods used in the Linux kernel project. With over 30 years of development, the Linux kernel has become a benchmark for FLOSS development. We discuss how the maintainers' workload is addressed in academic literature and by practitioners in the Linux kernel community. To achieve this, we conducted a multivocal literature review to examine the evolution of maintenance models over the years.
... Notably, from the description, it can be seen that in our framework, humans (users, communities, societies) are always the first consideration, underlying all the components and processes. In other words, Bitcoin's P2P "Bazaar" [38] can accommodate diverse voices and ideas, demonstrating the infinite power of the community. This fundamentally differs from the Ethereum model, which centers on smart contracts, with humans merely being appendages. ...
Preprint
Full-text available
Recently, the blockchain industry has drifted from its original vision of a peer-to-peer electronic cash system, as outlined in Bitcoin's whitepaper. Innovation has stagnated, and speculative activities dominate, with Ethereum's model at the core of these issues. Ethereum has led to the rise of centralization, recreating the intermediaries that blockchains were meant to eliminate. This article critically examines Ethereum's missteps, analyzes its pseudo-decentralization across participation, ownership, and distribution, and contrasts it with Bitcoin's architecture. We propose the "Common Lightning Initiative" as a roadmap to a true P2P value network and discuss concepts like BTCFi, the P2P economy, and Web5, envisioning Bitcoin as the backbone of a future that integrates the best of Web2 and Web3. Rediscovering Bitcoin's roots and leveraging technologies like the Lightning Network can reclaim the P2P vision and pave the way for a Web5 future.
... The open-source movement has played a crucial role in AI democratization [10]. Platforms like TensorFlow [11], PyTorch [12], and scikit-learn [13] have made powerful AI tools available to anyone with an internet connection, fostering innovation and collaboration globally. ...
Article
Full-text available
The democratization of artificial intelligence (AI) involves extending access to AI technologies beyond specialized technical experts to a broader spectrum of users and organizations. This paper provides an overview of AI’s historical context and evolution, emphasizing the concept of AI democratization. Current trends shaping AI democratization are analyzed, highlighting key challenges and opportunities. The roles of pivotal stakeholders, including technology firms, educational entities, and governmental bodies, are examined in facilitating widespread AI adoption. A comprehensive framework elucidates the components, drivers, challenges, and strategies crucial to AI democratization. This framework is subsequently applied in the context of scenario analyses, offering insights into potential outcomes and implications. The paper concludes with recommendations for future research directions and strategic actions to foster responsible and inclusive AI development globally.
Conference Paper
Full-text available
In the rapidly evolving Web3 world, non-fungible token (NFT) communities are reshaping the formation, distribution, and activation of social capital in ways distinct from traditional models. However, despite their growing impact on societal prosperity, a comprehensive understanding of social capital dynamics within Web3 NFT communities remains limited. This study explores the Mfers community, a key example within Web3 NFT ecosystems. By analyzing social media and blockchain data and using a Delphi method-based human-large language model (LLM) collaboration, we uncovered unique social capital patterns across six dimensions. Our findings highlight a compelling blend of decentralization, inclusion, trust, and empowerment but also raise critical questions about wealth inequality, content quality, and ethical challenges. Based on the findings, we discussed the uniqueness of social capital in Web3 NFT communities, the tension between technical and power decentralization, and the multidimensional nature of societal prosperity. We also suggested directions for future research on decentralized online communities in the CSCW field. This study provides a systematic perspective on social capital in Web3 NFT communities and introduces an innovative human-LLM collaborative analysis, offering insights into the design and governance of benign decentralized online communities.
Article
Full-text available
Abstract Open source software (OSS) has established itself as an integral part of modern software development and is becoming increasingly important in the startup context as well. This study examines the OSS activities of German unicorns through a quantitative content analysis of their public Git repositories. The goal is to identify relationships between OSS activities and company success. The results show that nearly all of the companies examined operate public Git repositories. However, the distribution of OSS activities is asymmetric: a few companies are very active, while the majority show comparatively little activity. An Open Source Activity Index (OSAI) enables a standardized comparison of the companies with respect to their OSS activities. The data suggest that while open source is an important strategic instrument in the German unicorn ecosystem, company success depends on a variety of other factors and is not primarily determined by the intensity of OSS activities. The study contributes to research on digital entrepreneurship and provides practical implications for startup founders, investors, and policymakers regarding the strategic use of open source to foster innovation.
Article
Full-text available
Under what conditions are user-generated digital content platforms responsive to pressures from users, businesses, and states? I propose that digital platforms show different levels of responsiveness to users, businesses, and states over time. Early in a platform’s life, the platform is highly sensitive to the demands of users, who have an opportunity to directly shape the institutional characteristics of the platform through the threat of user revolt. The unique power of its users stems from the network logic that underpins the value of the platform. As a platform grows and the size and centrality of its network increase, it becomes more sensitive to pressures by businesses (through boycotts) and the state (through regulation). At the same time, the power of users lessens as collective action problems become more severe and exit threats become less credible. The threat of user revolts has a temporal significance: Unless users alter the institutional architecture of the platform and lock in pro-user institutional characteristics early, the threat of user revolts becomes less consequential as the platform grows. Comparative case studies of Facebook, Wikipedia, Digg, and Reddit provide support for the theory.
Article
Today, the operating system Linux is widely used in diverse environments, as its kernel can be configured flexibly. In many configurable systems, managing such variability can be facilitated in all development phases with product-line analyses. These analyses often require knowledge about the system's features and their dependencies, which are documented in a feature model. Despite their potential, product-line analyses are rarely applied to the Linux kernel in practice, as its feature model still challenges scalability and accuracy of analyses. Unfortunately, these challenges also severely limit our knowledge about two fundamental metrics of the kernel's configurability, namely its number of features and configurations. We identify four key limitations in the literature related to the scalability, accuracy, and influence factors of these metrics, and, by extension, other product-line analyses: (1) Analysis results for the Linux kernel are not comparable, because relevant information is not reported; (2) there is no consensus on how to define features in Linux, which leads to flawed analysis results; (3) only a few versions of the Linux kernel have ever been analyzed, none of which are recent; and (4) the kernel is perceived as complex, although we lack empirical evidence that supports this claim. In this paper, we address these limitations with a comprehensive, empirical study of the Linux kernel's configurability, which spans its feature model's entire history from 2002 to 2024. We address the above limitations as follows: (1) We characterize parameters that are relevant when reporting analysis results; (2) we propose and evaluate a novel definition of features in Linux as a standardization effort; (3) we contribute torte, a tool that analyzes arbitrary versions of the Linux kernel's feature model; and (4) we investigate the current and possible future configurability of the kernel on more than 3,000 feature-model versions. Based on our results, we highlight eleven major insights into the Linux kernel's configurability and make seven actionable recommendations for researchers and practitioners.
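For a rough sense of what counting the kernel's features involves, one can scan a source tree's Kconfig files for configuration-symbol declarations. The sketch below is a deliberately naive illustration: it sidesteps the definitional subtleties the paper addresses and is unrelated to the torte tool it contributes.

```python
import re
from pathlib import Path

# Kconfig feature declarations look like "config SYMBOL" or "menuconfig SYMBOL".
DECLARATION = re.compile(r"^\s*(?:menu)?config\s+([A-Za-z0-9_]+)\s*$")

def count_kconfig_symbols(kernel_tree: Path) -> int:
    """Count distinct config symbols declared across all Kconfig* files in a kernel source tree."""
    symbols = set()
    for kconfig in kernel_tree.rglob("Kconfig*"):
        if not kconfig.is_file():
            continue
        for line in kconfig.read_text(errors="ignore").splitlines():
            match = DECLARATION.match(line)
            if match:
                symbols.add(match.group(1))
    return len(symbols)

if __name__ == "__main__":
    # Hypothetical path to an unpacked Linux source tree.
    print(count_kconfig_symbols(Path("linux-6.8")))
```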
Article
Changing software is essential to add needed functionality and to fix problems, but changes may introduce defects that lead to outages. This motivates one of the oldest software quality control techniques: a temporary prevention of non-critical changes to the codebase, known as a code freeze. Despite its widespread use in practice, the research literature on it is scant. Historically, code freezes were used as a way to improve software quality by preventing changes during periods before software releases, but code freezes significantly slow down development. To address this shortcoming we develop and evaluate a family of code un-freeze (permitting changes) strategies tailored to different occasions and products at Meta. They are designed to un-freeze the maximum amount of code without compromising quality. The three primary dimensions of un-freezing involve a) the exact timing of (and reasoning behind) the code freezes, b) the parts of the organization or the codebase to which the code freeze is applied, and c) the method of screening code diffs during the code freeze, with the aim of allowing low-risk diffs and preventing only the riskiest ones. To operationalize the drivers of outages, we consider the entire network of interdependencies among different parts of the source code, the engineers that modify the code, code complexity, and the coordination dependencies and authors' expertise. Since the code freeze is a balancing act between reducing outages and allowing software development to proceed unimpeded, the performance of the various approaches to code un-freeze is evaluated based on the fraction of flagged/gated changes, to measure overhead, and the fraction of all outage-causing changes contained within the flagged set of changes, to measure the ability of the code un-freeze to delay (or prevent) outages. We found that by taking into account the risk posed by modifying individual files and the properties of the change, we could un-freeze two and 2.5 times more changes, respectively. The change-level model is used by Meta in production. For example, during the winter 2023 code freeze, we see that only 16% of changes are gated. Although 42% more changes landed (were integrated into the codebase) compared to the prior year, there was a 52% decrease in outages. This reduction meant less impact on users and less strain on engineers during the holiday period. The risk model has been enormously effective at allowing low risk changes to proceed while gating high risk changes and reducing outages.
Chapter
As sensors, algorithms, and autonomous systems permeate the battlefield, modern warfare experiences datafication, eroding the establishment's power to delineate the boundaries of conflict, thereby welcoming new stakeholders. The advent of generative AI marks a transition from the Military-Industrial Complex to a Pentagon-Silicon Valley Alliance, infusing a new ethos of rapid iteration into traditional combat that facilitates the acceleration of warfare. This paradigm shift, driven by data and AI, prioritizes maximal impact and instant reprisal, fostering a state of perpetual war that portends an unsettling future for humanity.
Chapter
Once a software development project is identified, whether it is a “new idea” for a startup or an application for an organization or user groups, planning must be done on how to execute the project.
Article
Full-text available
Whether a committed reader or not, it is clear that the English language has evolved throughout the years into many different nuances. What is considered acceptable in terms of written form has been adjusted to match cultural, regional, and on a macroscopic scale, temporal changes. This paper explores these changes through innovative analysis of semantics, lexicology, syntax, and context, (to be referred to as Linguistic Patterns), analyzing data and deriving conclusions. It is apparent that there are noticeable differences in the corpora of a 16th-century author to a contemporary one. The most accessible examples of such changes are spotted in written works, whether poetry, books, or documents, as they provide valuable insight into the analysis of change. The methodology incorporates pre-existing and specifically trained models specialized in data analysis to understand and compile these changes. This study showcases the evolution of the English language interpreted by Machine Learning (ML) models and methods such as Natural Language Understanding (NLU). By feeding data, specifically written works, into such models with the foresight of expecting a wide range of differing results and analyzing the changes through the scope of time, this study showcases the change of Linguistic Patterns. The decision between model preference and proficiency is made by comparing the quality of data outputs, and systematically evaluating different model archetypes, such as Generative Pre-training Transformers (GPT) or Bidirectional Encoder Representations from Transformers (BERT). The evaluation of changes in linguistic patterns is quantifiable through statistical measures, embeddings and syntactic parsing scores. Through these steps, this study derives that the English language has experienced a robust alteration in its core, from the elimination of now-considered archaic lexicology, differences in structural and contextual cues as well as notable evolution in semantics. These findings can be utilized in historical linguistic analysis and education, as well as improving Natural Language Understanding.
Article
Full-text available
Users are an important source of innovation. Scholars suggest that established firms can gain product‐related insights by working with user communities and studies documenting various ways of working with users, as well as managers' interest in doing so. However, the link between working with user communities for product development purposes and its value for firms is not established. Coupling the use of event study methodology and regression analysis, I examine stock market reactions to corporate announcements stating that the firm is contributing software code to the community. I find that when firms state that generating insights from users regarding new and improved features and functionality is a motivation for contributing code, the market's reaction to the announcement is greater than for announcements that do not state this goal. Additional analysis provides evidence supporting the hypothesis that firms can and do benefit by working with user communities and achieve increased R&D efficiency, which leads to greater firm value.
Article
Full-text available
Globally distributed software development has been a mainstream paradigm in developing modern software systems. We have witnessed a fast-growing population of software developers from areas where English is not a native language in the last several decades. Given that English is still the de facto working language in most global software engineering teams, we need to gain more knowledge about the experiences of developers who are non-native English speakers. We conducted an empirical study to fill this research gap. In this study, we interviewed 27 Chinese developers in commercial software development and open source global software development teams and applied Bourdieu’s capital-field-habitus framework in an abductive data analysis process. Our study reveals four types of capital (language, social, symbolic, and economic) involved in their experiences and examines the interrelations among them. We found that non-native speakers’ insufficient language capital played an essential role in prohibiting them from accessing and accumulating other capital, thus reproducing the sustained and systematic disadvantaged positions of non-native English speakers in GSD teams. We further discussed the theoretical and practical implications of the study.
Article
Full-text available
This study presents a bibliometric analysis of the literature on sustainable software development and its contribution to green technology. By analyzing data from Google Scholar, this research identifies key trends, influential authors, and significant publications that have shaped the field from 1971 to 2024. The analysis reveals the increasing integration of sustainability principles within software engineering practices, emphasizing the critical role of green technologies in reducing the environmental impact of software systems. The study highlights the evolution of research focus towards energy-efficient coding, green data centers, and sustainable software frameworks, reflecting the growing importance of technology in achieving global sustainability goals. The findings also underscore the value of interdisciplinary collaboration in advancing the field, as evidenced by the diverse connections between technological, environmental, and economic themes. This research provides a comprehensive overview of the current state of sustainable software development and offers insights into future research directions that can further enhance the contribution of software engineering to environmental sustainability.
Article
Full-text available
The systems view of society as an organism originated in the spatial expansion of transportation and communication infrastructures. Building on the material basis of this idea, this article explores the infrastructural implications of the Internet as a communication system. It first examines the relationship between the systems concept and modern transportation and communication networks, and on this basis argues for taking infrastructure as the lens for exploring the systemic implications of the Internet. Drawing on the work of anthropologists Geoffrey Bowker and Susan Leigh Star, the article treats cables as "boundary objects" that reveal the social implications of the network, and investigates the key role of this connective infrastructure in integrating communities and geographic space. In terms of community connection, cables, through unified communication protocols, allowed the network in its research-and-development stage to integrate communities with strategic, scientific, and commercial ambitions, thereby embodying its character as a socio-technical institution. In terms of geographic connection, the laying and severing of submarine cables carry multiple geopolitical implications, including how network cables enable global surveillance systems, and how the high-tech corridors and science parks driven by transoceanic cable connections concretize digital capitalism's outward-facing connections and its inward logic of capital accumulation through plunder and exclusion.