Preprint

Abstract

Context: Over the last decades, open-source software has pervaded the software industry and has become one of the key pillars of software engineering. The incomparable growth of open source reflected that pervasion: Prior work described open source as a whole as growing linearly, polynomially, or even exponentially. Objective: In this study, we explore the long-term growth of open source and seek to corroborate previous findings by replicating earlier studies that measured the growth of open-source projects. Method: We replicate four existing measurements of the growth of open source on a sample of 172,833 open-source projects, using Open Hub as the measurement system: We analyzed lines of code, commits, new projects, and the number of open-source contributors over the last 30 years in the known open-source universe. Results: We found the growth of open source to be exhausted: After initial exponential growth, all measurements show a monotonic downward trend since their peak in 2013. None of the existing growth models could stand the test of time. Conclusion: Our results raise more questions about the growth of open source and the representativeness of Open Hub as a proxy for describing open source. We discuss multiple interpretations of our observations and encourage further research using alternative data sets.
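As an illustration of the kind of model comparison the abstract describes, the following minimal sketch fits a linear and an exponential growth model to a yearly series and then extrapolates the exponential fit past the peak. Everything in it is an assumption made for illustration: the series is synthetic (it is not Open Hub data), the names `fit_line` and `aic_least_squares` are hypothetical helpers, and the exponential model is fitted by a log-linear regression, so the AIC comparison on the original scale is only approximate. It is not the authors' method.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def aic_least_squares(ys, preds, k):
    """AIC up to an additive constant, assuming Gaussian errors: n*ln(RSS/n) + 2k."""
    n = len(ys)
    rss = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return n * math.log(rss / n) + 2 * k


# Synthetic (made-up) yearly series: exponential growth up to a peak year,
# then a steady decline, with a small deterministic +/-5% wobble.
PEAK = 12
years = list(range(20))
series = []
for t in years:
    if t <= PEAK:
        base = 100.0 * math.exp(0.3 * t)
    else:
        base = 100.0 * math.exp(0.3 * PEAK) * 0.85 ** (t - PEAK)
    series.append(base * (1.0 + 0.05 * (-1) ** t))

# Fit both models on the pre-peak window only.
early_x = years[: PEAK + 1]
early_y = series[: PEAK + 1]

a, b = fit_line(early_x, early_y)
linear_pred = [a + b * x for x in early_x]

# Exponential model y = c * exp(r*x), fitted as ln(y) = ln(c) + r*x.
log_c, r = fit_line(early_x, [math.log(y) for y in early_y])
exp_pred = [math.exp(log_c + r * x) for x in early_x]

aic_linear = aic_least_squares(early_y, linear_pred, k=2)
aic_exponential = aic_least_squares(early_y, exp_pred, k=2)

# Extrapolate the pre-peak exponential fit to the final year.
extrapolated = math.exp(log_c + r * years[-1])
observed = series[-1]

print("AIC linear (pre-peak):     ", aic_linear)
print("AIC exponential (pre-peak):", aic_exponential)
print("Extrapolated vs. observed in final year:", extrapolated, observed)
```

On data shaped like this, a model that fits the early window well can still overshoot badly once growth reverses, which is the sense in which a growth model may fail to "stand the test of time".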

Article
Full text available here: https://ieeexplore.ieee.org/document/8477174 Free/Libre and Open Source Software (FLOSS) communities are composed, in part, of volunteers, many of whom contribute infrequently. However, these infrequent volunteers contribute to the sustainability of FLOSS projects, and should ideally be encouraged to continue participating, even if they cannot be persuaded to contribute regularly. Infrequent contributions are part of a trend which has been widely observed in other sectors of volunteering, where it has been termed “episodic volunteering” (EV). Previous FLOSS research has focused on the Onion model, differentiating core and peripheral developers, with the latter considered as a homogeneous group. We argue this is too simplistic, given the size of the periphery group and the myriad of valuable activities they perform beyond coding. Our exploratory qualitative survey of 13 FLOSS communities investigated what episodic volunteering looks like in a FLOSS context. EV is widespread in FLOSS communities, although not specifically managed. We suggest several recommendations for managing EV based on a framework drawn from the volunteering literature. Also, episodic volunteers make a wide range of value-added contributions other than code, and they should neither be expected nor coerced into becoming habitual volunteers.
Conference Paper
Inner source (IS) is the use of open source software development (SD) practices and the establishment of an open source-like culture within an organization. IS enables and requires developers to collaborate more than traditional SD methods such as plan-driven or agile development. To better understand IS, researchers and practitioners need to measure IS collaboration. However, there is no method yet for doing so. In this paper, we present a method for measuring IS collaboration by measuring the patch-flow within an organization. Patch-flow is the flow of code contributions across organizational boundaries such as project, organizational unit, or profit center boundaries. We evaluate our patch-flow measurement method using case study research with a software developing multi-industry company. By applying the method in the case organization, we evaluate its relevance and viability and discuss its usefulness. We found that about half (47.9%) of all code contributions constitute patch-flow between organizational units, almost all (42.2%) being between organizational units working on different products. Such significant patch-flow indicates high relevance of the patch-flow phenomenon and hence the method presented in this paper. Our patch-flow measurement method is the first of its kind to measure and quantify IS collaboration. It can serve as a base for further quantitative analyses of IS collaboration.
Article
A variety of research methods and techniques are available to SE researchers, and while several overviews exist, there is neither consistency in the research methods covered nor in the terminology used. Furthermore, research is sometimes critically reviewed for characteristics inherent to the methods. We adopt a taxonomy from the social sciences, termed here the ABC framework for SE research, which offers a holistic view of eight archetypal research strategies. ABC refers to the research goal which strives for generalizability over Actors (A), precise measurement of their Behavior (B), in a realistic Context (C). The ABC framework uses two dimensions widely considered to be key in research design: the level of obtrusiveness of the research, and generalizability of research findings. We discuss metaphors for each strategy and their inherent limitations and potential strengths. We illustrate these research strategies in two key SE domains: global software engineering and requirements engineering, and apply the framework on a sample of 75 articles. Finally, we discuss six ways in which the framework can advance SE research.
Conference Paper
Studies about diversity in Software Engineering (SE) are important to understand the disparity occurring nowadays at information technology workplaces. The goal of this work is to analyze the characteristics of diversity in SE and how to adapt SE practices when we have teams with diversity characteristics. We collected data by conducting a Systematic Literature Review (SLR) and semi-structured interviews aiming to identify what impacts of diversity can be observed in software development teams. We found that there are several challenges and barriers encountered in the work environment, and that inclusion and diversity can affect the software development teams positively.
Article
The products of Open Source Software (OSS) projects are widely used even in commercial mission-critical and high-availability systems. This is because both the quality of these software products is high enough for these applications and the support of software could fulfill the requirement. In general, when one wants to adopt OSS as a part of computer systems, it is required to examine the functional requirement (FR) for the OSS as well as nonfunctional requirement (NFR). In the previous paper, we focused on NFR of OSS and proposed an evaluation method based on the maturity model of OSS community. Based on the model, we tried to evaluate four major OSS communities. For the evaluation, we used human knowledge of targeted OSS community. However it was not clear how to evaluate individual OSS project in OSS community. In this paper, we focused on continuity of OSS project, as it is one of the most important factors for users to make a decision. In order to evaluate continuity, we proposed a growth model of OSS project, which is based on the size and activity of OSS Project. We evaluated the growth model using information retrieved from OSS communities from both OSS community sites and source code repositories.
Article
Numerous open source software projects are based on volunteer collaboration and require a continuous influx of newcomers for their continuity. Newcomers face barriers that can lead them to give up. These barriers hinder both developers willing to make a single contribution and those willing to become a project member. Objective: This study aims to identify and classify the barriers that newcomers face when contributing to open source software projects. Method: We conducted a systematic literature review of papers reporting empirical evidence regarding the barriers that newcomers face when contributing to open source software (OSS) projects. We retrieved 291 studies by querying 4 digital libraries. Twenty studies were identified as primary. We performed a backward snowballing approach, and searched for other papers published by the authors of the selected papers to identify potential studies. Then, we used a coding approach inspired by open coding and axial coding procedures from Grounded Theory to categorize the barriers reported by the selected studies. Results: We identified 20 studies providing empirical evidence of barriers faced by newcomers to OSS projects while making a contribution. From the analysis, we identified 15 different barriers, which we grouped into five categories: social interaction, newcomers’ previous knowledge, finding a way to start, documentation, and technical hurdles. We also classified the problems with regard to their origin: newcomers, community, or product. Conclusion: The results are useful to researchers and OSS practitioners willing to investigate or to implement tools to support newcomers. We mapped technical and non-technical barriers that hinder newcomers’ first contributions. The most evidenced barriers are related to socialization, appearing in 75% (15 out of 20) of the studies analyzed, with a high focus on interactions in mailing lists (receiving answers and socialization with other members).
There is a lack of in-depth studies on technical issues, such as code issues. We also noticed that the majority of the studies relied on historical data gathered from software repositories and that there was a lack of experiments and qualitative studies in this area.
Conference Paper
A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project in work. We use statistical methods to derive a model of the probabilistic distribution of commit sizes in open source projects and we show that the model is applicable to different project sizes. We use both graphical as well as statistical methods to validate the goodness of fit of our model. By measuring and modeling a fundamental dimension of programming we help improve software development tools and our understanding of software development.
Article
A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project in work. An author's commit frequency describes how often that author commits. Knowing the distribution of all commit frequencies is a fundamental part of understanding software development processes. This paper presents a detailed quantitative analysis of commit frequencies in open-source software development. The analysis is based on a large sample of open source projects, and presents the overall distribution of commit frequencies. We analyze the data to show the differences between authors and projects by project size; we also include a comparison of successful and non-successful projects, and we derive an activity indicator from these analyses. By measuring a fundamental dimension of programming we help improve software development tools and our understanding of software development. We also validate some fundamental assumptions about software development.
Conference Paper
During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-quality, interconnected data. The GHTorrent project has been collecting data for all public projects available on GitHub for more than a year. In this paper, we present the dataset details and construction process and outline the challenges and research opportunities emerging from it.
Conference Paper
Because of the distributed and collaborative nature of free / open source software (FOSS) projects, the development effort invested in a project is usually unknown, even after the software has been released. However, this information is becoming of major interest, especially, but not only, because of the growth in the number of companies for which FOSS has become relevant for their business strategy. In this paper we present a novel approach to estimate effort by considering data from source code management repositories. We apply our model to the OpenStack project, a FOSS project with more than 1,000 authors, in which several tens of companies cooperate. Based on data from its repositories and together with the input from a survey answered by more than 100 developers, we show that the model offers a simple, but sound way of obtaining software development estimations with bounded margins of error.
Preprint
Committing is an important activity in revision control for open-source software (OSS). Recent research has explored the statistical laws of this activity, but few of those papers conduct empirical investigations of the commit interval (i.e. the waiting time between two consecutive commits). In this paper, we investigated software developers' collective and individual commit behavior in terms of the distribution of commit intervals, and found that 1) the data sets of project-level commit intervals within both the lifecycle and each release of the projects analyzed roughly follow power-law distributions; and 2) lifecycle- and release-level collective commit intervals on class files can also be best fitted with power laws. These findings reveal some general (collective) collaborative development patterns of OSS projects, e.g., most of the waiting times between two consecutive commits to a central repository are short, and only a few of them are long. We then outline the implications of our findings for OSS research and practice, which could provide insight into bug detection and program refactoring based on software developers' historical commit behavior.
Conference Paper
Many open source projects have long become commercial. This paper shows just how much of open source software development is paid work and how much has remained volunteer work. Using a conservative approach, we find that about 50% of all open source software development has been paid work for many years now and that many small projects are fully paid for by companies. However, we also find that any non-trivial project balances the amount of paid developer with volunteer work, and we suggest that the ratio of volunteer to paid work can serve as an indicator for the health of open source projects and aid the management of the respective communities. Index Terms: Open source software, empirical software engineering, volunteer open source, paid open source.
Conference Paper
The manner of development is an important factor for the success of open-source software (OSS). By mining the information in developers' commits, researchers in the software engineering community can investigate evolutionary aspects of OSS projects and analyze developers' behaviors and collaboration. In this paper we conducted statistical analyses on commit activity for four OSS projects, and found that (1) the commit size in terms of new definitions roughly follows a power-law distribution, and exhibits self-similarity in the temporal dimension; (2) there are five common zones for the distribution of commit activity across various releases in terms of our indicator, and there exists an interesting "deadline effect" in the last zone (i.e. the so-called rushing deadline); and (3) developers do prefer to fix bugs in the stage of rushing deadline, perhaps due to deadline pressure. These findings may provide new insight into schedule planning, resource allocation and quality assurance of OSS projects.
Article
For real-world software to remain satisfactory to its stakeholders requires its continual enhancement and adaptation. Acceptance of this phenomenon, termed software evolution, as intrinsic to real world software has led to an increasing interest in disciplined and systematic planning, management and improvement of the evolution process. Almost all of the previous work on software evolution has been concerned with the evolution of large scale real-world software systems developed within a single company using traditional management techniques, or with the large scale open source software systems (LSOSSS). However, there is to our knowledge little or no work that has considered small scale open source software systems (SSOSSS). This paper presents an analysis of the evolution behavior of two small size open source software systems, the Barcode Library and Zlib. Surprisingly, unlike large scale open source software systems, the evolution behavior of these small size open source software systems appears to follow Lehman's laws for software evolution.
Article
Empirical research on Free/Libre/Open Source Software (FLOSS) has shown that developers tend to cluster around two main roles: “core” contributors differ from “peripheral” developers in terms of a larger number of responsibilities and a higher productivity pattern. A further, cross-cutting characterization of developers could be achieved by associating developers with “time slots”, and different patterns of activity and effort could be associated to such slots. Such analysis, if replicated, could be used not only to compare different FLOSS communities, and to evaluate their stability and maturity, but also to determine within projects, how the effort is distributed in a given period, and to estimate future needs with respect to key points in the software life-cycle (e.g., major releases). This study analyses the activity patterns within the Linux kernel project, at first focusing on the overall distribution of effort and activity within weeks and days; then, dividing each day into three 8-hour time slots, and focusing on effort and activity around major releases. Such analyses have the objective of evaluating effort, productivity and types of activity globally and around major releases. They enable a comparison of these releases and patterns of effort and activities with traditional software products and processes, and in turn, the identification of company-driven projects (i.e., working mainly during office hours) among FLOSS endeavors. The results of this research show that, overall, the effort within the Linux kernel community is constant (albeit at different levels) throughout the week, signalling the need of updated estimation models, different from those used in traditional 9am–5pm, Monday to Friday commercial companies. 
It also becomes evident that the activity before a release is vastly different from after a release, and that the changes show an increase in code complexity in specific time slots (notably in the late night hours), which will later require additional maintenance efforts.
Conference Paper
Centralized Version Control Systems have been used by many open source projects for a long time. However, in recent years several widely-known projects have migrated their repositories to Distributed Version Control Systems, such as Mercurial, Bazaar, and Git. Such systems have technical features that allow contributors to work in new ways, as various different workflows are possible. We plan to study this migration process to assess how developers' organization and their contributions are affected. As a first step, we present an analysis of the Mozilla repositories, which migrated from CVS to Mercurial in 2007. This analysis reveals both expected and unexpected aspects of the contributors' activities.
Article
Software evolution research has recently focused on new development paradigms, studying whether laws found in more classic development environments also apply. Previous works have pointed out that at least some laws seem not to be valid for these new environments and even Lehman has labeled those (up to the moment few) cases as anomalies and has suggested that further research is needed to clarify this issue. In this line, we consider in this paper a large set of libre (free, open source) software systems featuring a large community of users and developers. In particular, we analyze a number of projects found in literature up to now, including the Linux kernel. For comparison, we include other libre software kernels from the BSD family, and for completeness we consider a wider range of libre software applications. In the case of Linux and the other operating system kernels we have studied growth patterns also at the subsystem level. We have observed in the studied sample that super-linearity occurs only exceptionally, that many of the systems follow a linear growth pattern and that smooth growth is not that common. These results differ from the ones found generally in classical software evolution studies. Other behaviors and patterns give also a hint that development in the libre software world could follow different laws than those known, at least in some cases.
Article
This paper presents a theory-guided examination of the (changing) nature of volunteering through the lens of sociological modernization theories. Existing accounts of qualitative changes in motivational bases and patterns of volunteering are interpreted against the background of broader, modernization-driven social-structural transformations. It is argued that volunteer involvement should be qualified as a biographically embedded reality, and a new analytical framework of collective and reflexive styles of volunteering is constructed along the lines of the ideal-typical biographical models that are delineated by modernization theorists. Styles of volunteering are understood as essentially multidimensional, multiform, and multilevel in nature. Both structural-behavioral and motivational-attitudinal volunteering features are explored along the lines of six different dimensions: the biographical frame of reference, the motivational structure, the course and intensity of commitment, the organizational environment, the choice of (field of) activity, and the relation to paid work.
Conference Paper
Software repositories such as source control systems, defect tracking systems, or archived communications between project personnel are used to help manage the progress of software projects. Software practitioners and researchers are beginning to recognize ...
Conference Paper
With the growing economic importance of open source, we need to improve our understanding of how open source software development processes work. The analysis of code contributions to open source projects is an important part of such research. In this paper we analyze the size of code contributions to more than 9,000 open source projects. We review the total distribution and distinguish three categories of code contributions using a size-based heuristic: single focused commits, aggregate team contributions, and repository refactorings. We find that both the overall distribution and the individual categories follow a power law. We also suggest that distinguishing these commit categories by size will benefit future analyses.
Conference Paper
Information contained in versioning system commits has been frequently used to support software evolution research. Concomitantly, some researchers have tried to relate commits to certain activities, e.g., large commits are more likely to be originated from code management activities, while small ones are related to development activities. However, these characterizations are vague, because there is no consistent definition of what is a small or a large commit. In this paper, we study the nature of commits in two dimensions. First, we define the size of commits in terms of number of files, and then we classify commits based on the content of their comments. To perform this study, we use the history log of nine large open source projects.
Article
Among various forms of innovation in industry structures and business models an increasing number of companies have shown interest in aligning themselves to an open source software model as a means to capture intellectual energy, productive software processes and relevant technical skills. This is evident both within small and niche businesses, but also within the largest companies – a phenomenon known as open-sourcing. This paper presents findings from a field study of open-sourcing of software development within two large, global technology companies. It reports on the ways in which open-sourcing is accommodated within the corporate context, and assesses the innovative strategies managers use as they engage with this phenomenon and seek to work co-operatively with open source communities. The analysis focuses on three primary areas that emerge from the data and which are seen to require particular attention in such organizations; license and IPR regime; community approach; and a modified development process.
Conference Paper
Some free software and open source projects have been extremely successful in the past. The success of a project is often related to the number of developers it can attract: a larger community of developers (the bazaar) identifies and corrects more software defects and adds more features via a peer-review process. In this paper two free software projects (Wine and Arla) are empirically explored in order to characterize their software lifecycle, development processes and communities. Both the projects show a phase where the number of active developers and the actual work performed on the system is constant, or does not grow: we argued that this phase corresponds to the one termed cathedral in the literature. One of the two projects (Wine) shows also a second phase: a sudden growing amount of developers corresponds to a similar growing output produced: we termed this as the bazaar phase, and we also argued that this phase was not achieved for the other system. A further analysis revealed that the transition between cathedral and bazaar was a phase by itself in Wine, achieved by creating a growing amount of new modules, which attracted new developers.
Conference Paper
Software development is undergoing a major change away from a fully closed software process towards a process that incorporates open source software in products and services. Just how significant is that change? To answer this question we need to look at the overall growth of open source as well as its growth rate. In this paper, we quantitatively analyze the growth of more than 5000 active and popular open source software projects. We show that the total amount of source code as well as the total number of open source projects is growing at an exponential rate. Previous research showed linear and quadratic growth in lines of source code of individual open source projects. Our work shows that open source is expanding into new domains and applications at an exponential rate.
Article
It is long known within the mathematical literature that the coefficient of determination R² is an inadequate measure for the goodness of fit in nonlinear models. Nevertheless, it is still frequently used within pharmacological and biochemical literature for the analysis and interpretation of nonlinear fitting to data. The intensive simulation approach undermines previous observations and emphasizes the extremely low performance of R² as a basis for model validity and performance when applied to pharmacological/biochemical nonlinear data. In fact, with the 'true' model having up to 500 times more strength of evidence based on Akaike weights, this was only reflected in the third to fifth decimal place of R². In addition, even the bias-corrected adjusted R² exhibited an extreme bias to higher parametrized models. The bias-corrected AICc and also BIC performed significantly better in this respect. Researchers and reviewers should be aware that R² is inappropriate when used for demonstrating the performance or validity of a certain nonlinear model. It should ideally be removed from scientific literature dealing with nonlinear model fitting or at least be supplemented with other methods such as AIC or BIC or used in context to other models in question.
Article
Most research on open source software communities has focused on those that are community founded. More recently, firms have founded their own open source communities. How do sponsored open source communities differ from their autonomous counterparts? With comparative examination of 12 open source projects initiated by corporate sponsors, we identify three design parameters that together help form a participation architecture—the opportunity structure extended to potential external contributors. In exploring sponsors' community design decisions, we found that sponsored open source projects were more likely to offer transparency than they were accessibility and that this had implications for their communities' growth. We contribute theoretical constructs that offer a common basis of comparison for the future study of open source projects and illustrate how the tension between control and growth affects open source community design and creation.
Conference Paper
The research examines the version histories of nine open source software systems to uncover trends and characteristics of how developers commit source code to version control systems (e.g., subversion). The goal is to characterize what a typical or normal commit looks like with respect to the number of files, number of lines, and number of hunks committed together. The results of these three characteristics are presented and the commits are categorized from extra small to extra large. The findings show that approximately 75% of commits are quite small for the systems examined along all three characteristics. Additionally, the commit messages are examined along with the characteristics. The most common words are extracted from the commit messages and correlated with the size categories of the commits. It is observed that sized categories can be indicative of the types of maintenance activities being performed.
Conference Paper
Most empirical studies about Open Source (OS) projects or products are vertical and usually deal with the flagship, successful projects. There is a substantial lack of horizontal studies to shed light on the whole population of projects, including failures. This paper presents a horizontal study aimed at characterizing OS projects. We analyze a sample of around 400 projects from a popular OS project repository. Each project is characterized by a number of attributes. We analyze these attributes statically and over time. The main results show that few projects are capable of attracting a meaningful community of developers. The majority of projects are developed by few people (in many cases one) and evolve at a very slow pace.
Article
We draw on the concept of episodic volunteering (EV) from the general volunteering literature to identify practices for managing EV in free/libre/open source software (FLOSS) communities. Infrequent but ongoing participation is widespread, but the practices that community managers are using to manage EV, and their concerns about EV, have not been previously documented. We conducted a policy Delphi study involving 24 FLOSS community managers from 22 different communities. Our panel identified 16 concerns related to managing EV in FLOSS, which we ranked by prevalence. We also describe 65 practices for managing EV in FLOSS. Almost three-quarters of these practices are used by at least three community managers. We report these practices using a systematic presentation that includes context, relationships between practices, and concerns that they address. These findings provide a coherent framework that can help FLOSS community managers to better manage episodic contributors.
Conference Paper
Code forges are third party software repositories that also provide various tools and facilities for distributed software development teams to use, including source code control systems, mailing lists and communication forums, bug tracking systems, web hosting space, and so on. The main contributions of this paper are to present some new data sets relating to the technology adoption lifecycles of a group of six free, libre, and open source software (FLOSS) code forges, and to compare the lifecycles of the forges to each other and to the model presented by classical Diffusion of Innovation (DoI) theory. We find that the observed adoption patterns of code forges rarely follow the DoI model, especially as larger code forges are beset by spam and abuse. The only forge exhibiting a DoI-like lifecycle was a smaller, community-managed, special-purpose forge whose demise was planned in advance. The results of this study will be useful in explaining adoption trajectories, both to practitioners building collaborative FLOSS ecosystems and to researchers who study the evolution and adoption of socio-technical systems.
Article
Large public data sets on software evolution promise great value to both researchers and practitioners, in particular for software (development) analytics. To realise this value, the data quality of such data sets needs to be studied and improved. Despite these data sets being of a secondary nature, i.e., they were not collected by the people using them, data quality is often taken for granted, casting doubt on conclusions drawn from those data. This paper reports on an initial investigation of the quality of the software evolution data available on Ohloh, and further describes steps taken to cleanse the data set. Our goal is that other researchers, practitioners, and parties responsible for data sets such as Ohloh use the outcomes of the validation and cleansing steps to improve the quality of data sets in the public domain.
Article
Google's monolithic repository provides a common source of truth for tens of thousands of developers around the world.
Article
The life cycle for the development of traditional commercial software is well established and discussed in detail in various texts and research papers. For Open Source Software (OSS), however, the development life cycle has not been discussed in much detail, as no standardized life cycle approach exists for OSS development. Different researchers and developers have proposed various life cycles for the development of OSS based on their own development experience, needs, or applications. The main focus of this paper is on reviewing and then comparing the existing Open Source Software Life Cycles (OSSLC) proposed by various researchers and practitioners. It gives a precise view of which development approaches are actually used for OSS and how the development of open source software starts and proceeds.
Article
There is currently a vast array of open source projects available on the web, and although they are searchable by name or description in the search engines, there is no way to search for projects by how well they perform on a given set of quality attributes (e.g. usability or maintainability). With OpenHub, we present a scalable and extensible architecture for the static and runtime analysis of open source repositories written in Python, presenting the architecture and pinpointing future possibilities with it.
Article
We review the empirical research on Free/Libre and Open-Source Software (FLOSS) development and assess the state of the literature. We develop a framework for organizing the literature based on the input-mediator-output-input (IMOI) model from the small groups literature. We present a quantitative summary of articles selected for the review and then discuss findings of this literature categorized into issues pertaining to inputs (e.g., member characteristics, technology use, and project characteristics), processes (software development practices, social processes, and firm involvement practices), emergent states (e.g., social states and task-related states), and outputs (e.g. team performance, FLOSS implementation, and project evolution). Based on this review, we suggest topics for future research, as well as identify methodological and theoretical issues for future inquiry in this area, including issues relating to sampling and the need for more longitudinal studies.
Article
In this paper, the evolution of a large sample of open source software systems will be analysed. The evolution of commercial systems has been an issue that has long been a centre of research, thus a coherent theoretical framework of software evolution has been developed and empirically tested, most notably the laws of software evolution. In exploring the evolutionary behaviour of open source systems, these results can serve as a point of reference, allowing to assess if differences exist, or which aspects of open and collaborative development styles have an impact on evolutionary behaviour. The data collection method relying on a large software repository and the respective source code control systems is described, and an overview on the collected data on several thousand projects is given. The evolutionary behaviour is explored using both a linear and a quadratic model, with the quadratic model being shown as better suited. The most interesting fact is that while in the mean the growth rate is linear or decreasing over time according to the laws of software evolution, a significant percentage of projects is able to sustain super-linear growth. There is a positive relationship between the size of a project, the number of participants, and the inequality in the distribution of work within the development team with the presence of super-linear growth patterns. On the other hand, there is evidence for a group of projects of moderate size which shows decreasing growth rates, while small projects in general exhibit linear growth. Copyright © 2007 John Wiley & Sons, Ltd.
Article
Recently, an exciting approach to solving complex problems has evolved out of computer science, called Open Source programming. In open source software development settings, programmers freely share their intellectual property, their readable programming source code, over the Internet. Some open source endeavors have resulted in very complex, high-quality software products, of which the best-known are the Linux operating system and the Apache Web server. A great advantage of an Internet-based open source approach is its potential to achieve global collective action toward developing robust solutions to complex programming problems. This paper argues that open source has potential application beyond computer programming. Open source principles could potentially be applied to almost any intellectual endeavor, and may be a very important innovation toward harnessing global collaboration toward solving complex public policy and management problems. Little research has been published outlining the details of how successful open source programming endeavors are achieved, such as how projects are initiated and organized over time, what rules for participation have been established, and how the methods for maintaining versions of new submissions have been managed. The institutional designs and management of open source projects could be critical for ensuring participants' willingness to collaborate and for recruiting new team members. This paper, and the research program it describes, attempt to address this gap. It provides a summary of the "life cycle" of open source programming projects based on existing literature that is largely focused on high-profile open source projects like Linux and Apache Web Server. It then provides interim results from an on-going study of the institutional designs of open source programming projects. It concludes by presenting some examples of non-programming projects that are beginning to apply open source or licensing principles in areas outside of programming and by presenting an example of how these principles might be applied to complex problems beyond programming in the realm of environmental policy and management.
Article
Our recent work has addressed how and why software systems evolve over time, with a particular emphasis on software architecture and open source software systems [2, 3, 6]. In this position paper, we present a short summary of two recent projects. First, we have performed a case study on the evolution of the Linux kernel [3], as well as some other open source software (OSS) systems. We have found that several OSS systems appear not to obey some of "Lehman's laws" of software evolution [5, 7], and that Linux in particular is continuing to grow at a geometric rate. Currently, we are working on a detailed study of the evolution of one of the subsystems of the Linux kernel: the SCSI drivers subsystem. We have found that cloning, which is usually considered to be an indicator of lazy development and poor process, is quite common and is even considered to be a useful practice. Second, we are developing a tool called Beagle to aid software maintainers in understanding how large systems have...
Estimating the number of active and stable floss projects
  • Carlo Daffara
Carlo Daffara. 2007. Estimating the number of active and stable floss projects. https://robertogaloppini.net/2007/08/23/estimating-the-number-ofactive-and-stable-floss-projects/
The Open Source Survey
  • Github
Github. 2017. The Open Source Survey. http://opensourcesurvey.org/2017/ Accessed 11 Nov 2018.
What makes a project open source? Migrating from organic to synthetic communities
  • Siobhan O'Mahony
  • Joel West
Siobhan O'Mahony and Joel West. 2005. What makes a project open source? Migrating from organic to synthetic communities. Academy of Management conference, Technology and Innovation Management division, Honolulu, August 2005 (2005), 39.
Preliminary Results from an Empirical Study on the Growth of Open Source and Commercial Software Products
  • Giancarlo Succi
  • James Paulson
  • Armin Eberlein
Giancarlo Succi, James Paulson, and Armin Eberlein. 2001. Preliminary Results from an Empirical Study on the Growth of Open Source and Commercial Software Products. In Proceedings of the Workshop on Economics-Driven Software Engineering Research, EDSER 3 (2001), 14-15. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.8888
Organizational Structure of Open Source Projects: A Life Cycle Approach
  • Donald E Wynn
Donald E. Wynn. 2004. Organizational Structure of Open Source Projects: A Life Cycle Approach. SAIS 2004 Proceedings 47, 6 (may 2004). https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1046&context=sais2004
FLOSSPOLS: Skills Survey Interim Report
  • Rishab Ghosh
  • Rüdiger Glott
Rishab Ghosh and Rüdiger Glott. 2005. FLOSSPOLS: Skills Survey Interim Report. Technical Report. http://flosspols.merit.unu.edu/deliverables/D10HTML/FLOSSPOLS-D10-skills%20survey_interim_report-revision-FINAL.html
The Rise and Fall of an Online Project. Is Bureaucracy Killing Efficiency in Open Knowledge Production?
  • Nicolas Jullien
  • Kevin Crowston
  • Felipe Ortega
Nicolas Jullien, Kevin Crowston, and Felipe Ortega. 2015. The Rise and Fall of an Online Project. Is Bureaucracy Killing Efficiency in Open Knowledge Production? In Proceedings of the 11th International Symposium on Open Collaboration (OpenSym 2015). ACM, New York, NY, USA.
Growth and Duplication of Public Source Code over Time
  • Guillaume Rousseau
  • Roberto Di Cosmo
  • Stefano Zacchiroli
Guillaume Rousseau, Roberto Di Cosmo, and Stefano Zacchiroli. 2019. Growth and Duplication of Public Source Code over Time: Provenance Tracking at Scale. arXiv:1906.08076v1.