Content available from Empirical Software Engineering
https://doi.org/10.1007/s10664-022-10171-0
Predicting health indicators for open source projects
(using hyperparameter optimization)
Tianpei Xia¹ · Wei Fu¹ · Rui Shu¹ · Rishabh Agrawal¹ · Tim Menzies¹
Accepted: 17 March 2022
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
Abstract
Software developed on public platforms is a source of data that can be used to make
predictions about those projects. While individual development activity may be random and
hard to predict, development behavior at the project level can be predicted with good
accuracy when large groups of developers work together on software projects. To demonstrate
this, we use 64,181 months of data from 1,159 GitHub projects to make various predictions
about the recent status of those projects (as of April 2020). We find that traditional
estimation algorithms make many mistakes: k-nearest neighbors (KNN), support vector
regression (SVR), random forest (RFT), linear regression (LNR), and regression trees
(CART) all have high error rates. However, that error rate can be greatly reduced using
hyperparameter optimization. To the best of our knowledge, this is the largest study yet
conducted using recent data to predict multiple health indicators of open-source projects.
To facilitate open science (and replications and extensions of this work), all our
materials are available
online at https://github.com/arennax/Health_Indicator_Prediction.
Keywords Hyperparameter optimization · Project health · Machine learning
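To make the central claim of the abstract concrete, the following is a minimal sketch of hyperparameter optimization applied to one of the learners named above (KNN). It is an illustration under stated assumptions, not the method used in this study: the optimizer here is a plain grid search, the data are synthetic, the untuned baseline `k = 1` is an arbitrary choice, and all function names are hypothetical.

```python
import random


def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_abs_error(train, held_out, k):
    """Mean absolute error of k-NN predictions on a held-out set."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in held_out) / len(held_out)


def tune_k(train, valid, candidates):
    """Grid search: return the candidate k with the lowest validation error."""
    return min(candidates, key=lambda k: mean_abs_error(train, valid, k))


def make_data(n, rng):
    """Synthetic data (an assumption): a noisy linear trend standing in for an indicator."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2 * x + rng.gauss(0, 1)))
    return data


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train = make_data(100, rng)
valid = make_data(50, rng)

default_k = 1                     # arbitrary untuned baseline for this sketch
candidates = [1, 2, 3, 5, 8, 13]  # hypothetical search space; includes the baseline

best_k = tune_k(train, valid, candidates)
default_valid_error = mean_abs_error(train, valid, default_k)
tuned_valid_error = mean_abs_error(train, valid, best_k)

print("baseline k:", default_k, "validation MAE:", round(default_valid_error, 3))
print("tuned k:", best_k, "validation MAE:", round(tuned_valid_error, 3))
```

Because the baseline value is one of the candidates, the tuned validation error can never exceed the baseline validation error. Whether that gain carries over to unseen data is an empirical question, which is what a study of this kind has to measure.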
Communicated by: Federica Sarro
Tim Menzies
timm@ieee.org
Tianpei Xia
txia4@ncsu.edu
Wei Fu
fuwei.ee@gmail.com
Rui Shu
rshu@ncsu.edu
Rishabh Agrawal
agrawa3@ncsu.edu
¹ Department of Computer Science, North Carolina State University, Raleigh, NC, USA
Published online: 22 June 2022
Empirical Software Engineering (2022) 27: 122