How Effective is Continuous Integration in
Indicating Single-Statement Bugs?
Jasmine Latendresse, Rabe Abdalkareem∗∗, Diego Elias Costa, and Emad Shihab
Concordia University, Montreal, Canada
Email: {j_latend,diego.costa,eshihab}@encs.concordia.ca
∗∗Queen’s University, Kingston, Canada
Email: abdrabe@gmail.com
Abstract—Continuous Integration (CI) is the process of auto-
matically compiling, building, and testing code changes in the
hope of catching bugs as they are introduced into the code base.
With bug fixing being a core and increasingly costly task in
software development, the community has adopted CI to mitigate
this issue and improve the quality of their software products.
However, little is known about how
effective CI is at detecting simple, single-statement bugs.
In this paper, we analyze the effectiveness of CI in 14 popular
open source Java-based projects to warn about 318 single-
statement bugs (SStuBs). We analyze the build status at the
commits that introduce SStuBs and before the SStuBs were fixed.
We then investigate how often CI indicates the presence of these
bugs, through test failure. Our results show that only 2% of
the commits that introduced SStuBs have builds with failed tests
and 7.5% of builds before the fix reported test failures. Upon
close manual inspection, we found that none of the failed builds
actually captured SStuBs, indicating that CI is not the right
medium to capture the SStuBs we studied. Our results suggest
that developers should either not rely on CI to catch SStuBs or
increase their CI pipeline coverage to detect single-statement bugs.
I. INTRODUCTION
Continuous integration (CI) is commonly used in many
industry and open-source projects [5, 11, 3]. Online CI ser-
vices, such as Travis CI, continuously integrate code changes
by automating compilation, building, and testing [3, 1]. With
CI, incremental changes brought to the code base are more
atomic, which makes bug detection simpler and quicker. In
addition, early bug detection and reporting significantly reduce
maintenance overhead, since they allow developers to fix faults
and make critical decisions earlier in the project's
lifecycle, which leads to fewer unintended consequences [2,
5, 12].
Recently, Karampatsis and Sutton distinguished a new type
of software bugs called single-statement bugs (SStuBs) [6].
SStuBs are bugs in which the associated fixing commit
contains only single-statement changes, excluding stylistic
changes and differences in comments. At first glance, SStuBs
tend to be easy to introduce because they can be caused by
simple modifications such as changing a variable name or
arguments in a function. However, SStuBs still find their way
into software projects, even in the presence of a CI pipeline.
While prior work on CI focused on studying its usage and
benefits (e.g., [5, 7]) and examining the reasons for failing
builds (e.g., [2, 5]), no prior work answers the question: how
effective is CI in indicating single-statement bugs (SStuBs)?
Therefore, the main goal of our work is to empirically
investigate the effectiveness of CI in identifying and reporting
SStuBs, through failing tests. We begin by examining the
ManySStuBs4J dataset [6] and selecting 14 open-source Java
projects that contain a significant number of single-statement
bugs. We then analyze the selected projects to identify the
commits that introduce SStuBs in these projects. Finally, we
link these commits to their build results on Travis CI [4]
to examine how effective CI is in identifying SStuBs. We
formulate our study in the following two research questions:
RQ1: How many CI builds fail when the SStuBs are in-
troduced? How many CI builds fail just before SStuBs are
fixed? We find that only 2% of the commits that introduced
SStuBs have builds that report a failure. Similarly, only 7.5%
of commits preceding the SStuBs fix commit show any signs
of build failure. In fact, the majority of SStuBs (50.5%) we
investigate have a long time-span, living in the code for more
than one month.
RQ2: From the CI builds that do fail, how many fail
due to the SStuBs? Of the 23 failed builds that we manually
inspected to determine the failure root cause, none failed due
to tests covering SStuBs. Instead, builds failed for external
reasons, such as dependency errors [10].
Our results show that CI is not effective in capturing SStuBs;
hence, developers should not depend on it for this purpose.
II. CASE STUDY DESIGN
The goal of our study is to investigate the effectiveness
of CI in identifying SStuBs. To that aim, we first identify
the commits that introduce these SStuBs (i.e., bug-inducing
commits) and the commits that precede the SStuBs’ fix. Then,
we examine the CI build results that were triggered by these
commits. To do so, we triangulate three different datasets.
First, we use the ManySStuBs4J dataset to identify commits that fix
SStuBs [6]. Then, we generate a dataset using the Commit
Guru tool to identify commits that introduce SStuBs [9]. Fi-
nally, we use the TravisTorrent dataset to extract the build results [4].
In the following sections, we discuss the steps used to filter a
set of 14 open source projects from the ManySStuBs4J dataset
and the methodology used to address our research questions.
An overview of our approach is shown in Figure 1.
Figure 1: Overview of our approach (Data Filtering: selecting projects with CI data, filtering projects based on bug type; Data Analysis: finding buggy commits associated to the ManySStuBs4J fix commits, associating buggy commits with CI builds, finding build status and test logs, computing the lifespan of the bugs; yielding 318 bugs from 14 studied projects).
Table I: Descriptive statistics of the 14 selected Java projects.
Descriptive Statistics     Avg      Min     Median   Max
Project Age (years)        11       8       10       20
KLOC                       434      30      380      1,898
# of commits               9,670    1,700   9,863    25,448
# of stars                 11,307   5,600   9,600    40,100
Travis CI Usage (years)    6.6      5       7        10
# Bug types                10.6     7       10       15
A. Data Filtering
The ManySStuBs4J dataset [6] is composed of fixes to simple
Java bugs. In this paper, we study the small version of the
dataset that contains 25,539 SStuBs fixes mined from 100
popular open-source Java projects. While 100 projects were
reported by the ManySStuBs4J dataset, we only found 84
projects in the set. Because of time and resource limitations,
we decide to filter projects based on the availability of CI data
and the diversity of bug types, which results in a studied set
of 14 Java projects. Next, we describe our filtering steps.
DF 1: Selecting projects with CI data. The goal of our
study is to investigate the effectiveness of CI at indicating
SStuBs. Hence, we must select projects with publicly available
CI data. We select projects that have adopted Travis CI, a
popular CI service provider [8]. For this, we cross-reference
the TravisTorrent dataset [4], a public dataset of Travis CI data, and
the ManySStuBs4J dataset based on the repository names.
The repository name is composed of the owner id and the
project name, and is unique in both datasets. We find that
34 projects from the ManySStuBs4J dataset have available CI
data in TravisTorrent.
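For illustration, this cross-referencing step can be sketched in Python as follows; the file names and column names (e.g., projectName and gh_project_name) are assumptions for illustration and may differ from the released datasets.

import pandas as pd

# Hypothetical file names and column names; adjust them to the released datasets.
sstubs = pd.read_json("sstubs.json")          # ManySStuBs4J (small version)
travis = pd.read_csv("travistorrent.csv.gz")  # TravisTorrent build records

# The repository name ("owner/project") is assumed to be unique in both datasets.
sstub_projects = set(sstubs["projectName"].unique())
travis_projects = set(travis["gh_project_name"].unique())

# Projects with both SStuB fixes and Travis CI build data (34 in our study).
projects_with_ci = sorted(sstub_projects & travis_projects)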
DF 2: Filtering projects based on bug type. The
ManySStuBs4J dataset presents 16 distinct bug types.
To cover a wider variety of SStuB categories, we decide to
select the 15 projects with the highest diversity in bug types.
This results in a set of 15 Java projects containing both
SStuBs data and their CI information, for a total of 1,284
bug-fix commits. Table I presents the descriptive statistics of
the selected projects, showing that the projects we investigate
are mature (median of 10 years) and popular Java projects.
Some of the selected projects include Junit4, Apache Flink,
and Google Guice.
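Continuing the sketch above, the diversity-based selection can be expressed as follows, again assuming a bugType column in the ManySStuBs4J data:

# Number of distinct SStuB categories per project (among projects with CI data),
# keeping the 15 projects with the highest bug-type diversity.
diversity = (
    sstubs[sstubs["projectName"].isin(projects_with_ci)]
    .groupby("projectName")["bugType"]
    .nunique()
    .sort_values(ascending=False)
)
selected_projects = diversity.head(15).index.tolist()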
B. Data Analysis
Our study focuses on analyzing the effectiveness of CI
at indicating the bug, either when it was first introduced
(the earliest chance for CI to capture the SStuB) or right
before the fix (the latest chance for CI to capture the SStuB).
The ManySStuBs4J dataset only contains the commits that
introduced the fixes. Hence, to address our research questions,
we need data about the commits that introduced the bugs fixed
by the commits presented in the ManySStuBs4J dataset. The
analysis steps are described below.
DA 1: Finding the bug-inducing commits associated to
the ManySStuBs4J fix commits. For this, we use the tool
Commit Guru [9] to trace back the bug-inducing commit with
the fix commit hash. Commit Guru is a tool that, among other features,
implements the SZZ algorithm to identify commits that are
more likely to introduce bugs into a project. The repository
Google/guava failed to be analyzed by Commit Guru due to
a faulty commit modifying all repository files. This leaves us
with a set of 14 projects for our analysis. Then, we map each
corrective commit to the commit it fixes, which corresponds
to the bug-inducing commit. We find 318 distinct SStuB
bug-inducing and fix commit pairings.
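The pairing step can be sketched as follows, under the assumption that the Commit Guru export has one row per commit with a commit_hash column and a fixes column listing the corrective commit(s) that SZZ linked to that bug-inducing commit; the actual export schema may differ.

import ast

commit_guru = pd.read_csv("commitguru_export.csv")

# Map each fix-commit hash to the bug-inducing commit(s) that SZZ linked to it.
fix_to_inducing = {}
for _, row in commit_guru.dropna(subset=["fixes"]).iterrows():
    for fix_hash in ast.literal_eval(row["fixes"]):  # e.g. "['abc123', 'def456']"
        fix_to_inducing.setdefault(fix_hash, set()).add(row["commit_hash"])

# Attach the bug-inducing commit(s) to every SStuB fix commit.
sstubs["inducing_commits"] = sstubs["fixCommitSHA1"].map(fix_to_inducing)
pairs = sstubs.dropna(subset=["inducing_commits"])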
DA 2: Associating buggy commits with CI builds. In
this step, we want to obtain CI data for the builds at bug
introduction. In other words, we need to see if bug-inducing
commits either triggered a build or are part of a push that
triggered a build. For this, we query the TravisTorrent dataset
for the bug-inducing commit hash. We find that 50 bugs are
associated to a Travis CI build.
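This lookup can be sketched as follows, assuming TravisTorrent exposes the triggering commit in a git_trigger_commit column and all commits covered by a build in git_all_built_commits; the column names are illustrative.

def builds_for_commit(commit_hash, travis):
    """Return TravisTorrent rows whose build was triggered by, or includes, the commit."""
    triggered = travis["git_trigger_commit"] == commit_hash
    included = travis["git_all_built_commits"].fillna("").str.contains(commit_hash, regex=False)
    return travis[triggered | included]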
DA 3: Finding build status and test logs. In this step, we
are interested in obtaining the build status and test logs for the
builds that are associated to the aforementioned bug-inducing
commits. Precisely, we want to obtain the build status and test
logs for (a) builds associated with the bug-inducing commits,
(b) builds preceding the build associated with the fix commits,
and (c) builds associated with the commits that introduced the
fix as illustrated in Figure 2. For phase (a), we proceed as
mentioned in DA 1. For phase (b), we query the TravisTorrent
dataset for the build that corresponds to the previous build
associated to a fix commit hash. We obtain 57 builds that
occurred before the fix is introduced. For phase (c), we query
the TravisTorrent dataset for the fix commit hash and find 366
associated builds. The results of phase (c) are used to compare
the number of tests run in the CI builds after the fix commit
with those of phase (b).
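The lookup of the build preceding the fix can be sketched as follows, assuming a per-project build number column (tr_build_number) and reusing the builds_for_commit helper sketched in DA 2; names are placeholders.

def build_before_fix(fix_commit, project, travis):
    """Return the last build of the project that ran before the fix commit's first build."""
    project_builds = travis[travis["gh_project_name"] == project].sort_values("tr_build_number")
    fix_builds = builds_for_commit(fix_commit, project_builds)
    if fix_builds.empty:
        return None
    first_fix_build = fix_builds["tr_build_number"].min()
    earlier = project_builds[project_builds["tr_build_number"] < first_fix_build]
    return earlier.iloc[-1] if not earlier.empty else None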
Figure 2: Illustration of our analysis method (timeline from the bug-inducing commit to the bug-fixing commit; CI builds are examined at the bug-inducing commit (A) and at the pre bug-fixing commit (B), and the bug lifespan covers the period in between).
DA 4: Computing the lifespan of the bugs. To determine
how long a SStuB remains in the code base, we query the
Commit Guru set for the commit hash associated to the bug
introduction and the author date associated to that commit. We
repeat this process for the commit hash associated to the fix.
Finally, we calculate the difference in author dates (between
introduction and fix) to determine how long the SStuB lived
in the code. Using this process, we are able to determine the
lifespan for a total of 318 SStuBs.
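A minimal sketch of the lifespan computation, assuming the Commit Guru export provides the author date of each commit in an author_date column:

# Author dates of all commits analyzed by Commit Guru (column name is an assumption).
dates = pd.to_datetime(commit_guru.set_index("commit_hash")["author_date"], errors="coerce")

def lifespan_days(inducing_hash, fix_hash):
    """Days the SStuB lived in the code: fix author date minus inducing author date."""
    return (dates[fix_hash] - dates[inducing_hash]).days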
III. CASE STUDY RESULTS
In this section, we present the results of our two research
questions. For each research question, we present its motiva-
tion, the approach to answer the question, and the results.
RQ1: How many CI builds fail when the SStuBs are intro-
duced? How many CI builds fail just before SStuBs are fixed?
Motivation: Prior research shows that CI is effective at
catching bugs (e.g., [5, 11]). While easy to fix, SStuBs can
linger in the code for quite some time if not captured by
automated tests. In this research question, we want to evaluate
the effectiveness of CI at catching SStuBs to help developers fix
them as soon as possible.
Approach: CI can help identify bugs when they are first
introduced (ideal case), or later in the development, once new
tests are added to the CI test suite. We illustrate this timeline
in Figure 2. We approach this problem by analysing the related
build status in two points: at the time when the bug was
introduced in the code (stage A in the figure) and right before
the bug was fixed by developers, by analyzing the build status
related to the commit that preceded the fix-commit (stage B).
The rationale is that the bug-inducing commit is the earliest
point at which CI can indicate the presence of SStuBs, and the commit
that precedes the fix is the last chance for CI to do so.
Then, we report how many commits have
triggered the CI to fail, in stages A and B, by computing the
proportion of builds that have finished with the status “passed”,
“failed”, and “errored”.
While we are interested in evaluating CI effectiveness at
capturing SStuBs, CI builds occur at the commit level. A
single commit can introduce and/or fix multiple SStuBs, and
we find that the 318 SStuBs were introduced and fixed by
240 distinct commit pairs. For instance, we find that 65
commits each introduced 2 different SStuBs in our dataset. This
means that all our analysis of CI is based on these 240 distinct
commit pairs, as a CI build runs at the commit level.
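The reported proportions then amount to counting build statuses over the 240 distinct commit pairs; a minimal sketch, assuming each stage has been resolved to a set of build rows with a tr_status of "passed", "failed", or "errored":

def status_breakdown(builds, total_commits):
    """Count and proportion of passed/failed/errored builds over all studied commits."""
    counts = builds["tr_status"].value_counts()
    return {
        status: (int(counts.get(status, 0)), counts.get(status, 0) / total_commits)
        for status in ("passed", "failed", "errored")
    }

# e.g. status_breakdown(stage_a_builds, 240) and status_breakdown(stage_b_builds, 240)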
Results: Of the 240 SStuB-related builds, only 2.0% (5) fail
when the SStuB is introduced. Table II shows the results of
our analysis. In the column “Bug-inducing commits (A)”, we
show the CI build statuses. From the 64 bug-inducing commits
Table II: Coverage of CI on 318 single-statement bugs that were introduced and fixed on 240 distinct commits.
                          Bug-inducing commits (A)   Pre bug-fixing commit (B)
# CI builds               64 (26.6%)                 159 (66.2%)
# CI passed builds        41 (17.0%)                 127 (52.9%)
# CI failed builds        5 (2.0%)                   18 (7.5%)
# CI builds with errors   18 (5.6%)                  14 (5.8%)
Figure 3: Distribution of the lifespan of 318 SStuBs in our
dataset.
that have their code tested by the projects’ test suite, we find
that 41 pass all their tests and only 5 (2%) of them have some
failed tests. In this result, we also notice that 18 builds yield
an error, which can be caused by a myriad of reasons, such as
project build errors, server timeouts, failures in the environment,
etc.
Moreover, we notice that of the 240 bug-inducing commits,
only 64 (26.6%) have CI build data associated with them.
This indicates that the vast majority, 176 bug-inducing
commits, are not tested by the projects' CI pipeline.
Of the 240 SStuB-related builds, only 7.5% (18) fail just
before the fixing commit. Column "Pre bug-fixing commit
(B)" in Table II shows the results of the CI build status just
before the fix. From the table, we notice that the proportion
of commits that have triggered a build substantially increases
to 159 (66.2%). From these 159 builds, we find that the vast
majority (127) pass all their tests. Only 18 builds show failed
tests, and 14 builds yield errors.
Overall, only a minute fraction of SStuBs have an associated
CI build with failed tests. While the proportion at the time
of the fix (stage B) is higher than when introducing the bug
(stage A), our results indicate that SStuBs can live in the code
base without affecting the build status of the CI pipeline. To
investigate how long each SStuB has lived in the project's
code, we compute the lifespan of each bug. Figure 3 presents
the distribution of the SStuBs' lifespan, binned into six period
categories. Note that the majority of SStuBs (50.32%) have a
lifespan longer than one month.
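The binning behind Figure 3 can be reproduced along the following lines; the exact bin edges of the figure are not stated in the text, so the ones below are an assumption.

# Hypothetical bin edges (in days); adjust to the categories shown in Figure 3.
bins = [0, 1, 7, 30, 90, 180, float("inf")]
labels = ["<1 day", "1-7 days", "1-4 weeks", "1-3 months", "3-6 months", ">6 months"]

# commit_pairs is assumed to hold (inducing_hash, fix_hash) tuples from DA 1.
lifespans = pd.Series([lifespan_days(i, f) for i, f in commit_pairs])
distribution = pd.cut(lifespans, bins=bins, labels=labels, right=False).value_counts(normalize=True)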
Implications: Surprisingly, the majority of the builds did not
fail when SStuBs were introduced to the code base, nor at
the builds preceding the fix commit. In fact, most of the studied
SStuBs stayed hidden in the code for more than a month, with
22% of them staying in the code for at least 6 months. This
indicates some level of inadequacy of CI pipelines in finding
Table III: Why CI builds fail, for the 23 failed builds.
                                      Bug-inducing commits (A)   Pre bug-fixing commit (B)
Total of CI fails                     5                          18
Failed tests related to SStuBs        0                          0
Failed tests unrelated to SStuBs      1                          8
Build failed without running tests    3                          5
Build failed with all passing tests   1                          5
this type of bug. One reason for the surprisingly long
lifespan of these 318 SStuBs is that these bugs may introduce
failures in non-essential parts of the software project. Another
explanation is that such bugs were not initially bugs when
the code was first modified (in the so-called bug-inducing
commits), but they later became bugs in the system due to
some other concurrent code change.
The majority of bug-inducing commits (74%) did not
trigger any CI build, and only 5 out of 64 builds failed
their tests when SStuBs were introduced in the code. Builds
that precede the bug fix are more frequent (66%), but only
18 out of 159 builds showed any failed tests. The majority
of SStuBs stay in the code for more than a month.
RQ2: From the CI builds that do fail, how many fail due to
the SStuBs?
Motivation: The observations made in RQ1 suggest that CI is
not very effective in detecting SStuBs early on. In this research
question, we are interested in finding how many CI builds
actually captured the SStuBs.
Approach: In this question, we focus on manually analysing
the builds that have reported failed tests in RQ1. We analyze
the 5 builds that have failed in phase A and the 18 builds
that have failed in phase B, as illustrated in Figure 2. We
manually inspect the test results and test logs provided by
Travis-CI data. Once we identify the failed tests, we resort
to code analysis to investigate if the test failures are linked
to the part of the code where the SStuB is located. For failed builds,
we manually inspect the build and test logs to verify their
relationship. It is important to note that for this step, we do
not apply formal analysis, since identifying whether the build
failure is related to the tests is straightforward.
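Although this inspection was performed manually, a first pass over a downloaded build log can be sketched as a scan for Maven Surefire failure summaries and failed-test lines; this is a heuristic sketch, as log formats vary across projects and build tools.

import re

# Maven Surefire summary lines, e.g. "Tests run: 12, Failures: 1, Errors: 0, Skipped: 0".
SUREFIRE_SUMMARY = re.compile(r"Tests run: \d+, Failures: (\d+), Errors: (\d+)")
# Heuristic pattern for individual failed-test lines; real log output differs per project.
FAILED_TEST = re.compile(r"^\s*(\S+Test\.\w+).*(?:FAILURE|ERROR)!?\s*$", re.MULTILINE)

def failing_tests(log_text):
    """Return test identifiers that appear to have failed in a downloaded build log."""
    totals = SUREFIRE_SUMMARY.findall(log_text)
    if not any(int(failures) + int(errors) > 0 for failures, errors in totals):
        return []
    return sorted(set(FAILED_TEST.findall(log_text)))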
Results: None of the CI builds that fail, do so because
of SStuBs. Table III shows the results of RQ2. We observe
that of the 23 builds that failed in our previous analysis, none
were associated with the SStuBs in the code. From the 5 builds
that fail during bug-
inducing commits (A), 1 build fails due to tests completely
unrelated to the SStuBs, 3 other builds fail without even
running tests and 1 build fails due to external reasons. For
example, upon manual analysis of the test logs and source
code, we find a failing test associated with the following message:
"Failure Encountered too many errors talking to a worker
node", which is unrelated to the SStuB in question.
From the 18 builds that fail right before the bug-fix commit
(B), we find that 8 builds have failed tests. However, none of
the failed tests execute the code where the SStuBs are located.
Furthermore, 5 builds fail without running any tests, due to
external reasons such as dependency errors1, and 5 other builds
fail even though all their tests pass, also because of dependency
errors2.
Implications: Our results indicate that CI was not effective in
finding any of the 318 SStuBs we investigated. We find that,
while 9 of the 23 builds failed due to failing tests, the tests did
not cover the SStuBs' code location. Moreover, the majority of
the failures were not caused by the test suite but by external
failures such as dependency errors. Hence, our observations suggest
that it is more common for builds to fail because of external
failures than SStuBs.
From the 23 failed builds, none are caused by a test affected
by SStuBs. Most of the failed builds (14 out of 23) are
caused by external failures, such as dependency errors, and
did not even execute the test suite.
IV. THREATS TO VALIDITY
There are a few important limitations to our work that need to
be considered when interpreting our findings. First, to identify
the commits that introduce the SStuBs, we use the Commit
Guru tool which is based on the SZZ algorithm [9]. Hence,
we are limited by the accuracy of Commit Guru. In some
cases, we may have missed instances of bug-inducing commits. To help
alleviate this issue, we manually investigated some of the
identified commits, and in all cases, we found that Commit
Guru had identified the correct commits. Second, our study
focuses on only 14 open-source Java-based projects and a
subset of SStuBs, and our results are unlikely to generalize beyond this set
of projects. Finally, our study uses the TravisTorrent dataset,
thus, our results are limited to the correctness and quality of
the available Travis CI build data.
V. CONCLUSIONS
In this paper, we analyze the effectiveness of CI on 14
popular open source Java-based projects to warn about single-
statement bugs. To do so, we analyze the status of CI builds in
two stages: when the SStuBs were introduced in the code and
before the SStuBs were fixed. We then investigate how often
CI indicates the presence of 318 SStuBs, through test failure.
Our findings show that CI was ineffective at indicating the
presence of SStuBs, with no build failure caused by SStuBs.
In fact, the majority of the studied SStuBs stay in the code
for more than a month, which further corroborates our
assessment of the inadequacy of CI in capturing SStuBs. These
results should, however, be considered preliminary. Future
work should focus on enlarging our dataset and examining
other programming languages and CI services. Another direc-
tion could be to collect qualitative data from projects to assess
the quality of the CI pipelines. It would also be interesting to
build tools that determine the effectiveness of the CI pipeline.
1https://travis-ci.org/github/graylog2/graylog2-server/builds/31332558
2https://travis-ci.org/github/prestodb/presto/builds/36184662
REFERENCES
[1] R. Abdalkareem, S. Mujahid, and E. Shihab. “A Ma-
chine Learning Approach to Improve the Detection of
CI Skip Commits”. In: IEEE Transactions on Software
Engineering (2020), pp. 1–1. DOI: 10.1109/TSE.2020.2967380.
[2] Rabe Abdalkareem et al. “Which Commits Can Be CI
Skipped?” In: IEEE Transactions on Software Engineer-
ing (2019), pp. 1–1. DOI: 10.1109/TSE.2019.2897300.
[3] Moritz Beller, Georgios Gousios, and Andy Zaidman.
“Oops, My Tests Broke the Build: An Explorative
Analysis of Travis CI with GitHub”. In: Proceedings of
the 14th International Conference on Mining Software
Repositories. MSR ’17. IEEE Press, 2017, pp. 356–367.
[4] Moritz Beller, Georgios Gousios, and Andy Zaidman.
“TravisTorrent: Synthesizing Travis CI and GitHub for
Full-Stack Research on Continuous Integration”. In:
Proceedings of the 14th working conference on mining
software repositories. MSR ’17. 2017.
[5] Michael Hilton et al. “Usage, Costs, and Benefits of
Continuous Integration in Open-Source Projects”. In:
Proceedings of the 31st IEEE/ACM International Con-
ference on Automated Software Engineering. ASE 2016.
2016, pp. 426–437.
[6] Rafael-Michael Karampatsis and Charles Sutton.
“How Often Do Single-Statement Bugs Occur? The
ManySStuBs4J Dataset”. In: Proceedings of the Inter-
national Conference on Mining Software Repositories
(MSR 2020). 2020.
[7] Courtney Miller et al. “Why do People Give Up
FLOSSing? A Study of Contributor Disengagement in
Open Source”. In: International Conference on Open
Source Systems. OSS. Springer, 2019, pp. 116–129.
DOI: https://doi.org/10.1007/978-3-030-20883-7_11.
[8] Johannes Nicolai. GitHub welcomes all CI tools - The
GitHub Blog. https://github.blog/2017-11-07-github-welcomes-all-ci-tools/.
(accessed on 01/26/2021). Nov. 2017.
[9] Christoffer Rosen, Ben Grawi, and Emad Shihab.
“Commit Guru: Analytics and Risk Prediction of Soft-
ware Commits”. In: Proceedings of the 2015 10th
Joint Meeting on Foundations of Software Engineering.
ESEC/FSE 2015. 2015, pp. 966–969.
[10] Dain Sundstrom. Build #2954 - prestodb/presto - Travis
CI. https://travis-ci.org/github/prestodb/presto/builds/41060068.
(accessed on 02/24/2021). 2020.
[11] Bogdan Vasilescu et al. “Quality and productivity out-
comes relating to continuous integration in GitHub”.
In: Proceedings of the 2015 10th Joint Meeting on
Foundations of Software Engineering. FSE ’15. ACM.
2015, pp. 805–816.
[12] Carmine Vassallo et al. “Automated Reporting of Anti-
Patterns and Decay in Continuous Integration”. In:
Proceedings of the 41st International Conference on
Software Engineering. ICSE ’19. IEEE Press, 2019,
pp. 105–115.