Preprint

Story Points Changes in Agile Iterative Development


Abstract

Story Points (SP) are an effort unit used to represent the relative effort of a work item. In Agile software development, SP allow a development team to estimate their delivery capacity and facilitate sprint planning activities. Although Agile embraces change, SP changes after sprint planning may negatively impact the sprint plan. To minimize this impact, there is a need to better understand SP changes and an automated approach to predict them. Hence, to better understand SP changes, we examine the prevalence, accuracy, and impact of information changes on SP changes. Through analyses based on 19,349 work items spread across seven open-source projects, we find that, on average, 10% of the work items have SP changes. These work items typically have their SP value increased by 58%-100% relative to the initial SP value when they were assigned to a sprint. We also find that the unchanged SP reflect the development time better than the changed SP. Our qualitative analysis shows that work items with changed SP often have information changes related to updating the scope of work. Our empirical results suggest that SP and the scope of work should be reviewed prior to or during sprint planning to achieve a reliable sprint plan. Yet, reviewing all work items in the product (or sprint) backlog could be a tedious task. Therefore, we develop a classifier to predict whether a work item will have SP changes after being assigned to a sprint. Our classifier achieves an AUC of 0.69-0.8, which is significantly better than the baselines. Our results suggest that, to better manage and prepare for the unreliability in SP estimation, teams can leverage our insights and the classifier during sprint planning. To facilitate future studies, we provide the replication package and the datasets, which are available online.
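For illustration, the sketch below shows how such an SP-change classifier and its AUC evaluation could be set up. It is a minimal example, not the authors' implementation: the input file, feature columns, label column, and choice of a random forest model are all assumptions made for the sketch.

# Hypothetical sketch of an SP-change classifier; file name, feature
# columns, label column, and model choice are assumptions, not the
# approach described in the paper.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumed CSV of work items with illustrative features and a binary
# label marking whether the SP value changed after sprint assignment.
df = pd.read_csv("work_items.csv")
features = ["initial_sp", "description_length", "num_comments", "num_linked_issues"]
X, y = df[features], df["sp_changed"]

# Hold out a stratified test split and evaluate with AUC, the metric
# reported in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.2f}")

In practice, such a classifier would be run during sprint planning to flag work items whose SP are likely to change, so that only those items need a closer review of their scope of work.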