Wei Fu’s research while affiliated with North Carolina State University and other institutions

Publications (2)


Transcript of survey questions
Differential evolution. Pseudocode based on Storn’s algorithm (Storn and Price 1997)
Trends in the number of contributors across 14 real-time OS projects since 2018. Zephyr (shown in purple) has outpaced similar projects. (Data source: The Apache Software Foundation)
Importance of seven indicators to project health (based on the survey)
The hyperparameters to be tuned in CART

Predicting health indicators for open source projects (using hyperparameter optimization)
Article · Empirical Software Engineering · June 2022
382 Reads · 18 Citations
Tianpei Xia · Wei Fu · Rui Shu · [...]

Software developed on public platforms is a source of data that can be used to make predictions about those projects. While individual development activity may be random and hard to predict, development behavior at the project level can be predicted with good accuracy when large groups of developers work together on software projects. To demonstrate this, we use 64,181 months of data from 1,159 GitHub projects to make various predictions about the recent status of those projects (as of April 2020). We find that traditional estimation algorithms make many mistakes. Algorithms like k-nearest neighbors (KNN), support vector regression (SVR), random forest (RFT), linear regression (LNR), and regression trees (CART) have high error rates. However, these error rates can be greatly reduced using hyperparameter optimization. To the best of our knowledge, this is the largest study yet conducted that uses recent data to predict multiple health indicators of open-source projects. To facilitate open science (and replications and extensions of this work), all our materials are available online at https://github.com/arennax/Health_Indicator_Prediction.
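The abstract reports "high error rates" without defining the measure on this page. One common choice in this line of work is the magnitude of relative error (MRE), summarized by its median across predictions. The sketch below assumes that metric; the article's exact definition may differ, and the function names `mre` and `median_mre` and the example numbers are hypothetical.

```python
import statistics


def mre(actual, predicted):
    """Magnitude of relative error for one prediction (assumed metric): |actual - predicted| / |actual|."""
    if actual == 0:
        raise ValueError("MRE is undefined when the actual value is 0")
    return abs(actual - predicted) / abs(actual)


def median_mre(actuals, predictions):
    """Median MRE over paired actual/predicted values; 0.5 reads as '50% wrong'."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("need equally long, non-empty sequences")
    return statistics.median(mre(a, p) for a, p in zip(actuals, predictions))


# Hypothetical example: per-prediction MREs are 0.5, 0.1 and 0.0, so the median is 0.1 (10% error).
print(median_mre([10, 20, 40], [15, 18, 40]))  # → 0.1
```

Using the median rather than the mean keeps a few wildly wrong predictions (for example, when the actual value is small) from dominating the summary.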

Predicting Project Health for Open Source Projects (using the DECART Hyperparameter Optimizer)

June 2020 · 57 Reads

Software developed on public platforms is a source of data that can be used to make predictions about those projects. While the activity of a single developer may be random and hard to predict, when large groups of developers work together on software projects, the resulting behavior can be predicted with good accuracy. To demonstrate this, we use 78,455 months of data from 1,628 GitHub projects to make various predictions about the current status of those projects (as of April 2020). We find that traditional estimation algorithms make many mistakes. Algorithms like k-nearest neighbors (KNN), support vector regression (SVR), random forest (RFT), linear regression (LNR), and regression trees (CART) have high error rates (median errors usually above 50%, sometimes above 130%). However, these error rates can be greatly reduced using the DECART hyperparameter optimizer. DECART is a differential evolution (DE) algorithm that tunes the CART data mining system to the particulars of a specific project. To the best of our knowledge, this is the largest study yet conducted that uses the most recent data to predict multiple health indicators of open-source projects. Further, due to our use of hyperparameter optimization, it may be the most successful: our predictions have less than 10% error (median value), which is much smaller than the errors seen in related work. Our results are a compelling argument for open-source development. Companies that only build in-house proprietary products may be cutting themselves off from the information needed to reason about those projects.
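The idea behind DECART, differential evolution searching over CART's hyperparameters for one specific project, can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a tiny pure-Python CART-style regression tree (`RegressionTree`, hypothetical) stands in for a real CART library; only two hyperparameters, `max_depth` and `min_samples_leaf`, are tuned over hypothetical ranges in `BOUNDS`; the DE settings (`np_=10`, `f=0.75`, `cr=0.3`, `generations=10`) are illustrative; the objective is the median magnitude of relative error (MRE) on a held-out split, an assumed metric; and the data are synthetic.

```python
import random
import statistics

def mre(actual, predicted):
    # Magnitude of relative error for one prediction (assumed metric); assumes actual != 0.
    return abs(actual - predicted) / abs(actual)

def _sse(rows):
    # Sum of squared errors of the targets around their mean (CART's regression split criterion).
    mean = statistics.fmean(y for _, y in rows)
    return sum((y - mean) ** 2 for _, y in rows)

class RegressionTree:
    """Tiny CART-style regression tree (hypothetical stand-in for a real CART library)."""

    def __init__(self, max_depth, min_samples_leaf):
        self.max_depth = max_depth
        self.min_samples_leaf = min_samples_leaf
        self.root = None

    def fit(self, xs, ys):
        self.root = self._grow(list(zip(xs, ys)), 0)
        return self

    def _grow(self, rows, depth):
        leaf = statistics.fmean(y for _, y in rows)
        if depth >= self.max_depth or len(rows) < 2 * self.min_samples_leaf:
            return leaf
        best = None
        for feat in range(len(rows[0][0])):
            ordered = sorted(rows, key=lambda r: r[0][feat])
            # Every cut point that leaves at least min_samples_leaf rows on each side.
            for i in range(self.min_samples_leaf, len(ordered) - self.min_samples_leaf + 1):
                lo_x, hi_x = ordered[i - 1][0][feat], ordered[i][0][feat]
                if lo_x == hi_x:
                    continue  # cannot separate equal feature values
                cost = _sse(ordered[:i]) + _sse(ordered[i:])
                if best is None or cost < best[0]:
                    best = (cost, feat, (lo_x + hi_x) / 2, ordered[:i], ordered[i:])
        if best is None:
            return leaf  # no valid split: this node becomes a leaf
        _, feat, threshold, left, right = best
        return (feat, threshold, self._grow(left, depth + 1), self._grow(right, depth + 1))

    def predict_one(self, x):
        node = self.root
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            feat, threshold, left, right = node
            node = left if x[feat] <= threshold else right
        return node

# Hypothetical search space: (max_depth, min_samples_leaf), encoded as real numbers for DE.
BOUNDS = [(1.0, 12.0), (1.0, 10.0)]

def decode(vec):
    return int(round(vec[0])), int(round(vec[1]))

def score(vec, train, valid):
    # Objective to minimise: median MRE of the configured tree on held-out data.
    depth, leaf = decode(vec)
    tree = RegressionTree(depth, leaf).fit(*train)
    xs, ys = valid
    return statistics.median(mre(y, tree.predict_one(x)) for x, y in zip(xs, ys))

def de_tune(train, valid, np_=10, f=0.75, cr=0.3, generations=10, seed=0):
    """DE/rand/1/bin over BOUNDS (illustrative settings); returns (hyperparameters, median MRE)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(np_)]
    fitness = [score(p, train, valid) for p in pop]
    for _ in range(generations):
        for i in range(np_):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            forced = rng.randrange(len(BOUNDS))  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(BOUNDS):
                if d == forced or rng.random() < cr:
                    value = a[d] + f * (b[d] - c[d])  # mutation: a + F * (b - c)
                else:
                    value = pop[i][d]  # inherited from the current candidate
                trial.append(min(max(value, lo), hi))  # clip to the search bounds
            trial_fitness = score(trial, train, valid)
            if trial_fitness <= fitness[i]:  # greedy selection, applied in place
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(np_), key=fitness.__getitem__)
    return decode(pop[best]), fitness[best]

if __name__ == "__main__":
    # Synthetic data (not from the articles): y = 20 + 5*x0 + 2*x1.
    xs = [[i % 7, i % 5] for i in range(70)]
    ys = [20 + 5 * x0 + 2 * x1 for x0, x1 in xs]
    train, valid = (xs[:50], ys[:50]), (xs[50:], ys[50:])
    params, err = de_tune(train, valid)
    print("best (max_depth, min_samples_leaf):", params, "median MRE:", round(err, 3))
```

DE operates on real-valued vectors, so `decode` rounds each gene to an integer hyperparameter and clipping keeps mutants inside the search space. Replacing a candidate as soon as its trial wins, rather than building a separate next generation, is a common simplification of Storn and Price's scheme.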

Citations (1)


... Understanding when a software project is in a healthy state remains a critical yet unsolved challenge in software development. While repositories provide extensive data about project activities, from code changes to community interactions, current approaches struggle to convert this wealth of information into actionable insights about project health [1,2]. This gap affects both practitioners managing projects and researchers studying software development. ...

Introducing Repository Stability
Reference: Predicting health indicators for open source projects (using hyperparameter optimization) · Empirical Software Engineering