Article

A Radial Basis Function Neural Network for Predicting the Effort of Software Projects Individually Developed in Laboratory Learning Environments

Abstract and Figures

Prediction techniques have been applied to dependent variables related to Higher Education students, such as dropout, grades, course selection, and satisfaction. In this research, we propose a technique for predicting the effort of software projects individually developed by graduate students. Depending on its complexity, a software project can be developed across teams, by a single team, or even by an individual. Teaching and training in development effort prediction for software projects is a concern in both academia and industry because underprediction causes cost overruns, whereas overprediction often involves missed financial opportunities. Effort prediction techniques for individually developed projects have mainly been based on expert judgment or on mathematical models. This research proposes the application of a mathematical model termed Radial Basis Function Neural Network (RBFNN). The hypothesis to be tested is the following: the effort prediction accuracy of an RBFNN is statistically better than that obtained from a Multiple Linear Regression (MLR). The projects were developed by following a disciplined development process in controlled environments. The RBFNN and MLR were trained on a data set of 328 projects developed by 82 students between 2005 and 2010; the models were then tested on a data set of 116 projects developed by 29 students between 2011 and the first semester of 2012. Results suggest that an RBFNN having as independent variables new and changed code, reused code, and programming language experience of students can be used at the 95.0% confidence level for predicting the development effort of individual projects when they have been developed following a disciplined process in academic environments.

... The data used in this study were obtained from a controlled experiment where students developed their projects following a process specifically designed to develop software projects in a disciplined manner in academic environments [33]. In our study, all students belonged to an HE graduate programme related to computer science. ...
... The data set of software projects used in our study was obtained by following a development process specifically designed to be applied by HE graduate students [33]. This process has been used to predict development effort [33] and productivity [22][23][24]. ...
Article
Productivity prediction of a software engineer is necessary to determine whether corrective actions are needed and to identify improvement options to produce better results. It can be performed at abstraction levels such as organization, team project, individual project, or task. Software engineering education and training has focused its efforts at the individual level. In this study, we propose the application of a data mining technique named support vector regression (SVR) to predict the productivity of individuals (i.e., graduate students). Its prediction accuracy was compared to that of a statistical regression model, and to those of two neural networks. After applying a Wilcoxon statistical test, results suggest that an SVR with linear kernel, using new and changed lines of code and programming language experience as independent variables, could be used for predicting the individual productivity of a Higher Education graduate student when software projects coded in either Java or C++ have been developed by following a disciplined process specifically proposed for academic environments.
Article
The financial market is often unpredictable and extremely susceptible to political, economic and other factors. Achieving accurate predictions of financial time series is very important for scientific research and financial enterprise management. This article therefore takes the application of an improved RBF neural network (NN) algorithm to financial time series forecasting as its research object, and explores how to use the improved RBF NN algorithm to predict stock market prices, with a view to helping stock investors reduce investment risk and increase returns. The article uses the stock market prices of three listed companies in May 2019 as data samples, including 72 training samples and 21 test samples. The three stocks were predicted using the improved RBF NN algorithm. The experimental results show that the prediction errors of the improved RBF NN algorithm for the three stocks are 2.14%, 0.69% and 1.47%, while the traditional RBF NN algorithm’s prediction errors for the same stocks are 5.74%, 2.38% and 11.37%. This shows that the improved algorithm is significantly more accurate and more effective than the traditional algorithm. Therefore, the application of the improved RBF NN algorithm in financial time series prediction will be more extensive in the future.
Article
Software verification and validation (V & V) is one of the significant areas of software engineering for developing high quality software. It is also becoming part of the curriculum of universities' software and computer engineering departments. This paper reports the experience of teaching undergraduate software engineering students and discusses the main problems encountered during the course, along with suggestions to overcome these problems. This study covers the topics generally included in a software verification and validation course, including static verification and validation. It is found that prior knowledge of software quality concepts and good programming skills can help students to achieve success in this course. Further, team work can be chosen as a strategy, since it facilitates students' understanding and motivates them to study. It is observed that students were more successful in white box testing than in black box testing.
Article
Capstone projects are a common part of engineering education. In a capstone project, learning takes place mainly through solving problems during the project. Therefore, understanding what problems the capstone project teams encounter increases understanding of what the students can learn. We collected problems encountered by eleven capstone project teams in a software development project course at Aalto University. Each team used a root cause analysis method twice during their project to identify the problems and their cause-and-effect relationships. The number of identified problems was 103–247 per team. We analysed the problems qualitatively and summarized them under the following four main topics: system functionality, system quality, communication and taking responsibility. The problems created opportunities for learning about software engineering. However, in some teams the problems worsened so much that they created educationally detrimental situations. For example, learning a new programming language from scratch is valuable for a student, but it may start taking too much attention from many other educational aspects of the project. We give suggestions for mitigating the educationally detrimental situations in capstone projects. The suggestions include an iterative development process, team formation practicalities, reasonable project topics, customer education, instructions on selecting and adopting crucial tools, emphasizing learning, and adding control to ensure the use of the desired working practices. Our results help the teachers of similar courses in evaluating the potential that their courses have for reaching specific educational goals and in improving their courses by decreasing educationally detrimental situations.
Article
An early warning system can help to identify at-risk students, or predict student learning performance by analyzing learning portfolios recorded in a learning management system (LMS). Although previous studies have shown the applicability of determining learner behaviors from an LMS, most investigated datasets are not assembled from online learning courses or from whole learning activities undertaken on courses that can be analyzed to evaluate students' academic achievement. Previous studies generally focus on the construction of predictors for learner performance evaluation after a course has ended, and neglect the practical value of an "early warning" system to predict at-risk students while a course is in progress. We collected the complete learning activities of an online undergraduate course and applied data-mining techniques to develop an early warning system. Our results showed that time-dependent variables extracted from the LMS are critical factors for online learning. After students have used an LMS for a period of time, our early warning system effectively characterizes their current learning performance. Data-mining techniques are useful in the construction of early warning systems; based on our experimental results, classification and regression tree (CART), supplemented by AdaBoost, is the best classifier for the evaluation of learning performance investigated by this study.
Article
The prediction of software development effort has been focused mostly on the accuracy comparison of algorithmic models rather than on the suitability of the approach for building software effort prediction systems. Several estimation techniques have been developed to predict effort. In this paper the main focus is on investigating the accuracy of effort prediction using an RBFN network, which can be used for function approximation. The use of an RBFN to estimate software development effort requires the determination of its architecture parameters according to the characteristics of COCOMO, especially the number of input neurons, the number of hidden neurons, centers ci, width σ1 and weights wi. In terms of learning, the RBFN network is much faster than other networks because the learning process in this network has two stages, and both stages can be made efficient by appropriate learning algorithms. The proposed network is empirically validated using the COCOMO'81 dataset, which is used to train and test the designed RBFN network, and it is found that the RBFN designed with the K-means clustering algorithm performs better in terms of cost estimation accuracy.
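The two-stage learning described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the network from the paper: it uses a single scalar input rather than the COCOMO inputs, a single shared width `SIGMA` chosen by hand, a bias term, and a least-squares solve for the output weights (the abstract does not say which algorithm is used for that stage). The `sizes` and `efforts` values are made up for illustration.

```python
import math

def kmeans_1d(xs, k, iters=50):
    """Stage 1: place k centers with plain k-means on scalar inputs."""
    xs_sorted = sorted(xs)
    # Deterministic start: spread the initial centers evenly over the sorted data.
    centers = [xs_sorted[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers

def hidden_layer(x, centers, sigma):
    """Gaussian activation of each hidden neuron, plus a constant bias input."""
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers] + [1.0]

def solve(a, b):
    """Solve the small linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def fit_rbfn(xs, ys, k, sigma, ridge=1e-8):
    """Stage 1 (centers) followed by stage 2 (output weights by least squares)."""
    centers = kmeans_1d(xs, k)
    phi = [hidden_layer(x, centers, sigma) for x in xs]
    n = k + 1
    # Normal equations (Phi^T Phi + ridge * I) w = Phi^T y; the tiny ridge term
    # is an assumption added here only to keep the system well conditioned.
    a = [[sum(p[i] * p[j] for p in phi) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(n)]
    return centers, solve(a, b)

def predict(x, centers, weights, sigma):
    return sum(w * h for w, h in zip(weights, hidden_layer(x, centers, sigma)))

# Hypothetical data: project size versus effort.
sizes = [10, 20, 30, 40, 50, 60, 70, 80]
efforts = [15, 24, 38, 47, 66, 85, 92, 120]
SIGMA = 20.0  # hand-picked width, an assumption
centers, weights = fit_rbfn(sizes, efforts, k=3, sigma=SIGMA)
print(centers)
print(round(predict(45, centers, weights, SIGMA), 1))
```

Because the centers are fixed before the weights are fitted, the second stage is an ordinary linear problem, which is one way to read the abstract's claim that both stages can be made efficient.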
Article
Context: Software engineering has a problem in that when we empirically evaluate competing prediction systems we obtain conflicting results. Objective: To reduce the inconsistency amongst validation study results and provide a more formal foundation to interpret results with a particular focus on continuous prediction systems. Method: A new framework is proposed for evaluating competing prediction systems based upon (1) an unbiased statistic, Standardised Accuracy, (2) testing the result likelihood relative to the baseline technique of random ‘predictions’, that is guessing, and (3) calculation of effect sizes. Results: Previously published empirical evaluations of prediction systems are re-examined and the original conclusions shown to be unsafe. Additionally, even the strongest results are shown to have no more than a medium effect size relative to random guessing. Conclusions: Biased accuracy statistics such as MMRE are deprecated. By contrast this new empirical validation framework leads to meaningful results. Such steps will assist in performing future meta-analyses and in providing more robust and usable recommendations to practitioners.
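The abstract names Standardised Accuracy and the random-guessing baseline but gives no formula. The sketch below assumes the commonly cited form SA = 1 - MAR / MAR_P0, where MAR is the mean absolute residual of the model and MAR_P0 is the mean absolute residual of guessing, that is, predicting each case with the actual value of another, randomly chosen case. As a further assumption, the baseline is computed here as an exact expectation over all ordered pairs rather than by repeated simulation. The data values are made up.

```python
from itertools import permutations

def mean_absolute_residual(actual, predicted):
    """MAR: average of |actual - predicted| over all cases."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def random_guess_mar(actual):
    """Expected MAR of guessing with the actual value of another case.

    Averaging |y_i - y_j| over every ordered pair (i != j) gives the
    expectation exactly, so no random sampling is needed.
    """
    pairs = list(permutations(actual, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def standardised_accuracy(actual, predicted):
    """SA = 1 - MAR / MAR_P0; 0 means no better than guessing, 1 is perfect."""
    return 1.0 - mean_absolute_residual(actual, predicted) / random_guess_mar(actual)

# Hypothetical efforts (actual) and model predictions.
actual = [10, 20, 30]
predicted = [12, 18, 33]
sa = standardised_accuracy(actual, predicted)
print(round(sa, 3))
```

A value near zero, or a negative value, would indicate a model that does no better than guessing, which is the comparison the framework asks for before any two prediction systems are ranked against each other.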
Article
Statistical and genetic programming techniques have been used to predict the software development effort of large software projects. In this paper, a genetic programming model was used for predicting the effort required in individually developed projects. Accuracy obtained from a genetic programming model was compared against one generated from the application of a statistical regression model. A sample of 219 projects developed by 71 practitioners was used for generating the two models, whereas another sample of 130 projects developed by 38 practitioners was used for validating them. The models used two kinds of lines of code as well as programming language experience as independent variables. Accuracy results from the model obtained with genetic programming suggest that it could be used to predict the software development effort of individual projects when these projects have been developed in a disciplined manner within a development-controlled environment.
Article
Software estimation is a tedious and daunting task in project management and software development. Software estimators are notorious in predicting software effort and they have been struggling in the past decades to provide new models to enhance software estimation. The most critical part of software estimation is when estimation is required in the early stages of the software life cycle, where the problem to be solved has not yet been completely revealed. This paper presents a novel log-linear regression model based on the use case point model (UCP) to calculate the software effort based on use case diagrams. A fuzzy logic approach is used to calibrate the productivity factor in the regression model. Moreover, a multilayer perceptron (MLP) neural network model was developed to predict software effort based on the software size and team productivity. Experiments show that the proposed approach outperforms the original UCP model. Furthermore, a comparison between the MLP and log-linear regression models was conducted based on the size of the projects. Results demonstrate that the MLP model can surpass the regression model when small projects are used, but the log-linear regression model gives better results when estimating larger projects.
Article
In Iran, admission to medical school is based solely on the results of the highly competitive, nationwide Konkoor examination. This paper examines the predictive validity of Konkoor scores, alone and in combination with high school grade point averages (hsGPAs), for the academic performance of public medical school students in Iran. This study followed the cohort of 2003 matriculants at public medical schools in Iran from entrance through internship. The predictor variables were Konkoor total and subsection scores and hsGPAs. The outcome variables were (1) Comprehensive Basic Sciences Exam (CBSE) scores; (2) Comprehensive Pre-Internship Exam (CPIE) scores; and (3) medical school grade point averages (msGPAs) for the courses taken before internship. Pearson correlation and regression analyses were used to assess the relationships between the selection criteria and academic performance. There were 2126 matriculants (1374 women and 752 men) in 2003. Among the outcome variables, the CBSE had the strongest association with the Konkoor total score (r = 0.473), followed by msGPA (r = 0.339) and the CPIE (r = 0.326). While adding hsGPAs to the Konkoor total score almost doubled the power to predict msGPAs (R2 = 0.225), it did not have a substantial effect on CBSE or CPIE prediction. The Konkoor alone, and even in combination with hsGPA, is a relatively poor predictor of medical students' academic performance, and its predictive validity declines over the academic years of medical school. Care should be taken to develop comprehensive admissions criteria, covering both cognitive and non-cognitive factors, to identify the best applicants to become "good doctors" in the future. The findings of this study can be helpful for policy makers in the medical education field.
Article
Building computerized mechanisms that will accurately, immediately and continually recognize a learner’s affective state and activate an appropriate response based on integrated pedagogical models is becoming one of the main aims of artificial intelligence in education. The goal of this paper is to demonstrate how the various kinds of evidence could be combined so as to optimize inferences about affective states during an online self-assessment test. A formula-based method has been developed for the prediction of students’ mood, and it was tested using data emanated from experiments made with 153 high school students from three different regions of a European country. The same set of data is analyzed developing a neural network method. Furthermore, the formula-based method is used as an input parameter selection module for the neural network method. The results vindicate to a great degree the formula-based method’s assumptions about student’s mood and indicate that neural networks and conventional algorithmic methods should not be in competition but complement each other for the development of affect recognition systems. Moreover, it becomes apparent that neural networks can provide an alternative for and improvements over tutoring systems’ affect recognition methods.
Article
Discipline is an essential prerequisite for the development of large and complex software-intensive systems. However, discipline is also important on the level of individual development activities. A major challenge for teaching disciplined software development is to enable students to experience the benefits of discipline and to overcome the gap between real professional scenarios and scenarios used in software engineering university courses. Students often do not have the chance to internalize what disciplined software development means at both the individual and collaborative level. Therefore, students often feel overwhelmed by the complexity of disciplined development and later on tend to avoid applying the underlying principles. The Personal Software Process (PSP) and the Team Software Process (TSP) are tools designed to help software engineers control, manage, and improve the way they work at both the individual and collaborative level. Both tools have been considered effective means for introducing discipline into the conscience of professional developers. In this paper, we address the meaning of disciplined software development, its benefits, and the challenges of teaching it. We present a quantitative study that demonstrates the benefits of disciplined software development on the individual level and provides further experience and recommendations with PSP and TSP as teaching tools.
Conference Paper
Background: Many cost estimation papers are based on finding a "new" estimation method, trying out the method on one or two past datasets and "proving" that the new method is better than linear regression. Aim: This paper aims to explain why this approach to model comparison is often invalid and to suggest that the PROMISE repository may be making things worse. Method: We identify some of the theoretical problems with studies that compare different estimation models. We review some of the commonly used datasets from the viewpoint of the reliability of the data and the validity of the proposed linear regression models. Discussion points: It is invalid to select one or two datasets to "prove" the validity of a new technique because we cannot be sure that, of the many published datasets, those chosen are the only ones that favour the new technique. When new models are compared with regression models, researchers need to understand how to use regression analysis appropriately. The use of linear regression presupposes: a linear relationship between dependent and independent variables, no significant outliers, no significant skewness, no relationship between the variance of the dependent variable and the magnitude of the variable. If all these conditions are not true, standard statistical practice is to use a robust regression or transform the data. The logarithmic transformation is appropriate in many cases, and for the Desharnais dataset gives better results than the regression model presented in the PROMISE repository. Conclusions: Simplistic studies comparing data intensive methods with linear regression will be scientifically valueless, if the regression techniques are applied incorrectly. They are also suspect if only a small number of datasets are used and the selection of those datasets is not scientifically justified.
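The logarithmic transformation mentioned above can be illustrated with a short Python sketch. It fits ln(effort) = a + b ln(size) by ordinary least squares and back-transforms the prediction. The function names and the data are hypothetical; the data are generated from an exact power law (effort = 2 * size^1.2) so that the recovered coefficients can be checked by eye, and they are not taken from the Desharnais or any other published dataset.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_log_log(sizes, efforts):
    """Fit the regression on log-transformed size and effort."""
    return fit_line([math.log(s) for s in sizes],
                    [math.log(e) for e in efforts])

def predict_effort(size, a, b):
    """Back-transform: effort = exp(a) * size ** b."""
    return math.exp(a + b * math.log(size))

# Synthetic data following effort = 2 * size ** 1.2 (an assumption for illustration).
sizes = [50, 100, 200, 400, 800]
efforts = [2 * s ** 1.2 for s in sizes]
a, b = fit_log_log(sizes, efforts)
print(round(math.exp(a), 3), round(b, 3))
print(round(predict_effort(300, a, b), 1))
```

On real, noisy data the residuals of the transformed model would still need to be checked against the assumptions listed in the abstract (linearity, outliers, skewness, constant variance) before comparing it with any other technique.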
Article
To get a better prediction of the costs, schedule, and risks of a software project, it is necessary to have a more accurate prediction of its development effort. Among the main prediction techniques are those based on mathematical models, such as statistical regressions or machine learning (ML). Studies applying ML models to development effort prediction have mainly based their conclusions on practices with the following weaknesses: (1) using an accuracy criterion which leads to asymmetry, (2) applying a validation method that causes conclusion instability by randomly selecting the samples for training and testing the models, (3) omitting the explanation of how the parameters for the neural networks were determined, (4) generating conclusions from models that were not trained and tested on mutually exclusive data sets, (5) omitting an analysis of the dependence, variance and normality of data for selecting the suitable statistical test for comparing the accuracies among models, and (6) reporting results without showing a statistically significant difference. In this study, these six issues are addressed when comparing the prediction accuracy of a Radial Basis Function Neural Network (RBFNN) with that of a statistical regression (the model most frequently compared with ML models), a feedforward multilayer perceptron (MLP, the most commonly used in the effort prediction of software projects), and a general regression neural network (GRNN, an RBFNN variant). The hypothesis tested is the following: the accuracy of effort prediction for the RBFNN is statistically better than the accuracy obtained from a simple linear regression (SLR), MLP and GRNN when adjusted function points data, obtained from software projects, is used as the independent variable. Samples obtained from the International Software Benchmarking Standards Group (ISBSG) Release 11 related to new and enhanced projects were used. The models were trained and tested using a leave-one-out cross-validation method.
The models were evaluated using absolute residuals and a Friedman statistical test. The results showed that there was a statistically significant difference in accuracy among the four models for new projects, but not for enhanced projects. Regarding new projects, the accuracy of the RBFNN was better than that of the SLR at the 99% confidence level, whereas the MLP and GRNN were better than the SLR at the 90% confidence level.
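The validation scheme described above, leave-one-out cross-validation scored by absolute residuals, can be sketched in Python for the simple linear regression baseline. This is an illustration only: the function names are hypothetical, the data are made up rather than drawn from ISBSG, and the Friedman test that the study applies to the residuals is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def loocv_absolute_residuals(xs, ys):
    """Hold out each project once, train on the rest, record |actual - predicted|."""
    residuals = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        residuals.append(abs(ys[i] - (intercept + slope * xs[i])))
    return residuals

# Hypothetical adjusted function points and effort values.
function_points = [100, 150, 220, 300, 410, 520]
effort_hours = [900, 1400, 1800, 2900, 3500, 4800]
residuals = loocv_absolute_residuals(function_points, effort_hours)
print([round(r, 1) for r in residuals])
print(round(sum(residuals) / len(residuals), 1))  # mean absolute residual
```

Each project is predicted by a model that never saw it, so the training and test data are mutually exclusive and the split involves no random selection, which addresses weaknesses (2) and (4) in the list above. Running the same loop for each competing model yields one column of absolute residuals per model, which is the input a Friedman test would compare.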
Article
Expert-based effort prediction in software projects can be taught, beginning with the practices learned in an academic environment in courses designed to encourage them. However, the length of such courses is a major concern for both industry and academia. Industry has to work without its employees while they are taking such a course, and academic institutions find it hard to fit the course into an already tight schedule. In this research, the set of Personal Software Process (PSP) practices is reordered and the practices are distributed among fewer assignments, in an attempt to address these concerns. This study involved 148 practitioners taking graduate courses who developed 1,036 software course assignments. The hypothesis on which it is based is the following: When the activities in the original PSP set are reordered into fewer assignments, the result is expert-based effort prediction that is statistically significantly better.
Article
The implementation of teaching performance assessments has prompted a range of concerns. Some educators question whether these assessments provide information beyond what university supervisors gain through their formative evaluations and classroom observations of candidates. This research examines the relationship between supervisors’ predictions and candidates’ performance on a summative assessment based on a capstone teaching event, the Performance Assessment for California Teachers. The study, based on records for 337 teacher candidates over a 2-year period, specifically addresses the following questions: To what extent do university supervisors predict candidates’ total scores? On which questions and categories of the assessment do supervisors most accurately predict their candidates’ scores? Do supervisors predict scores more accurately for high- and low-performing candidates? The findings indicate that university supervisors’ perspectives did not always correspond with outcomes on the performance assessment, particularly for high and low performers.
Article
Student satisfaction is important in the evaluation of distance education courses as it is related to the quality of online programs and student performance. Interaction is a critical indicator of student satisfaction; however, its impact has not been tested in the context of other critical student- and class-level predictors. In this study, we tested a regression model for student satisfaction involving student characteristics (three types of interaction, Internet self-efficacy, and self-regulated learning) and class-level predictors (course category and academic program). Data were collected in a sample of 221 graduate and undergraduate students responding to an online survey. The regression model was tested using hierarchical linear modeling (HLM). Learner–instructor interaction and learner–content interaction were significant predictors of student satisfaction but learner–learner interaction was not. Learner–content interaction was the strongest predictor. Academic program category moderated the effect of learner–content interaction on student satisfaction. The effect of learner–content interaction on student satisfaction was stronger in Instructional Technology and Learning Sciences than in psychology, physical education or family, consumer, and human development. In sum, the results suggest that improvements in learner–content interaction yield most promise in enhancing student satisfaction and that learner–learner interaction may be negligible in online course settings.
Article
Parametric cost estimation models need to be continuously calibrated and improved to assure more accurate software estimates and reflect changing software development contexts. Local calibration by tuning a subset of model parameters is a frequent practice when software organizations adopt parametric estimation models to increase model usability and accuracy. However, there is a lack of understanding about the cumulative effects of such local calibration practices on the evolution of general parametric models over time.
Article
Predicting student academic performance has long been an important research topic in many academic disciplines. The present study is the first study that develops and compares four types of mathematical models to predict student academic performance in engineering dynamics – a high-enrollment, high-impact, and core course that many engineering undergraduates are required to take. The four types of mathematical models include the multiple linear regression model, the multilayer perceptron network model, the radial basis function network model, and the support vector machine model. The inputs (i.e., predictor variables) of the models include student's cumulative GPA, grades earned in four pre-requisite courses (statics, calculus I, calculus II, and physics), and scores on three dynamics mid-term exams (i.e., the exams given to students during the semester and before the final exam). The output of the models is students' scores on the dynamics final comprehensive exam. A total of 2907 data points were collected from 323 undergraduates in four semesters. Based on the four types of mathematical models and six different combinations of predictor variables, a total of 24 predictive mathematical models were developed from the present study. The analysis reveals that the type of mathematical model has only a slight effect on the average prediction accuracy (APA, which indicates on average how well a model predicts the final exam scores of all students in the dynamics course) and on the percentage of accurate predictions (PAP, which is calculated as the number of accurate predictions divided by the total number of predictions). The combination of predictor variables has only a slight effect on the APA, but a profound effect on the PAP. In general, the support vector machine models have the highest PAP as compared to the other three types of mathematical models.
The research findings from the present study imply that if the goal of the instructor is to predict the average academic performance of his/her dynamics class as a whole, the instructor should choose the simplest mathematical model, which is the multiple linear regression model, with student's cumulative GPA as the only predictor variable. Adding more predictor variables does not help improve the average prediction accuracy of any mathematical model. However, if the goal of the instructor is to predict the academic performance of individual students, the instructor should use the support vector machine model with the first six predictor variables as the inputs of the model, because this particular predictor combination increases the percentage of accurate predictions, and most importantly, allows sufficient time for the instructor to implement subsequent educational interventions to improve student learning.
Article
Project courses are an important component of some software engineering curricula. They are capstone projects where teams of students experience the various practices for developing software. Instructors play the roles of coaches in guiding the students during the various phases of their project. Nowadays, software development processes fall into two major paradigms. The Disciplined software process paradigm defines best practices and their relationships on the basis of roles, activities and artifacts. The Agile process paradigm, which is based on values of simplicity, communication, and feedback, uses simple practices to enable a team to tune the practices to their unique situation. The two process paradigms have great value in general and one is likely to be more efficient than the other in any specific development project. However, it could be interesting to find out how each of these process paradigms performs in learning environments. To achieve this we conducted an observational study in an academic environment. Six teams of four students developed their own versions of a software product based on the same requirements. Three teams used a Disciplined process and three teams used an Agile process. This study is based on four observations: the quality of the implementation of the requirement, the total project effort, the process activity effort and the product size. The data to support each of these observations are presented. In this study, however, the Disciplined paradigm provides less project implementation with a better realization of quality. This study indicates that the more efficient approach for capstone projects for inexperienced students in software engineering would be a Disciplined process paradigm.
Article
The authors examined factors associated with membership of university graduates in the dues-based alumni association of their alma mater. Logistic regression was used to analyze variables that came from survey responses and from an existing database. All participants had attended a public doctoral-granting research university in the South. Graduates were more likely to be alumni association members if they: (a) were donors, (b) had a telephone number on record, (c) were relatively older, (d) had positive experiences as alumni, (e) had positive perceptions of the alumni association, (f) were more frequently involved with the alma mater, and (g) were aware of other members of the alumni association. Alumni were less likely to be alumni association members if they: (a) were employed at the alma mater, (b) had a higher level of degree attainment, (c) had positive feelings about student experiences, and (d) had positive university perceptions. Empirical testing confirmed the utility of several variables of the prediction model in identifying the best prospects for alumni association membership. Keywords: Alumni association; Membership; Members; Dues; Dues-based; Alumni giving; Fundraising; Friend-raising; Alumni; Alma mater
Article
In this work we present an evolutionary morphological approach to solve the software development cost estimation (SDCE) problem. The proposed approach consists of a hybrid artificial neuron based on framework of mathematical morphology (MM) with algebraic foundations in the complete lattice theory (CLT), referred to as dilation-erosion perceptron (DEP). Also, we present an evolutionary learning process, called DEP(MGA), using a modified genetic algorithm (MGA) to design the DEP model, because a drawback arises from the gradient estimation of morphological operators in the classical learning process of the DEP, since they are not differentiable in the usual way. Furthermore, an experimental analysis is conducted with the proposed model using five complex SDCE problems and three well-known performance metrics, demonstrating good performance of the DEP model to solve SDCE problems.
Article
Student motivation is an important factor for the successful completion of an e-learning course. Detecting motivational problems for particular students at an early stage of a course opens the door for instructors to be able to provide additional motivating activities for these students. This paper analyzes how the behavior patterns in the interaction of each particular student with the contents and services in a learning management system (LMS) can be used to predict student motivation and if this student motivation can be used to predict the successful completion of an e-learning course. The interactions of 180 students of six different universities taking a course in three consecutive years are analyzed.
Article
Software development has become an essential investment for many organizations. Software engineering practitioners have become more and more concerned about accurately predicting the cost and quality of software product under development. Accurate estimates are desired but no model has proved to be successful at effectively and consistently predicting software development cost. In this paper, we propose the use of wavelet neural network (WNN) to forecast the software development effort. We used two types of WNN with Morlet function and Gaussian function as transfer function and also proposed threshold acceptance training algorithm for wavelet neural network (TAWNN). The effectiveness of the WNN variants is compared with other techniques such as multilayer perceptron (MLP), radial basis function network (RBFN), multiple linear regression (MLR), dynamic evolving neuro-fuzzy inference system (DENFIS) and support vector machine (SVM) in terms of the error measure which is mean magnitude relative error (MMRE) obtained on Canadian financial (CF) dataset and IBM data processing services (IBMDPS) dataset. Based on the experiments conducted, it is observed that the WNN-Morlet for CF dataset and WNN-Gaussian for IBMDPS outperformed all the other techniques. Also, TAWNN outperformed all other techniques except WNN.
Article
In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to accurately classify some e-learning students, whereas another may succeed, three decision schemes, which combine in different ways the results of the three machine learning techniques, were also tested. The method was examined in terms of overall accuracy, sensitivity and precision and its results were found to be significantly better than those reported in relevant literature.
Article
As software becomes more complex and its scope dramatically increases, the importance of research on developing methods for estimating software development efforts has perpetually increased. Such accurate estimation has a prominent impact on the success of projects. Out of the numerous methods for estimating software development efforts that have been proposed, line of code (LOC)-based constructive cost model (COCOMO), function point-based regression model (FP), neural network model (NN), and case-based reasoning (CBR) are among the most popular models. Recent research has tended to focus on the use of function points (FPs) in estimating the software development efforts, however, a precise estimation should not only consider the FPs, which represent the size of the software, but should also include various elements of the development environment for its estimation. Therefore, this study is designed to analyze the FPs and the development environments of recent software development cases. The primary purpose of this study is to propose a precise method of estimation that takes into account and places emphasis on the various software development elements. This research proposes and evaluates a neural network-based software development estimation model.
Article
This paper provides a comparative study on support vector regression (SVR), radial basis functions neural networks (RBFNs) and linear regression for estimation of software project effort. We have considered SVR with linear as well as RBF kernels. The experiments were carried out using a dataset of software projects from NASA and the results have shown that SVR significantly outperforms RBFNs and linear regression in this task.
Article
Neural networks are being used in areas of prediction and classification, the areas where statistical methods have traditionally been used. Both the traditional statistical methods and neural networks are looked upon as competing model-building techniques in literature. This paper carries out a comprehensive review of articles that involve a comparative study of feed forward neural networks and statistical techniques used for prediction and classification problems in various areas of applications. Tabular presentations highlighting the important features of these articles are also provided. This study aims to give useful insight into the capabilities of neural networks and statistical methods used in different kinds of applications.
Article
Estimating the amount of effort required for developing an information system is an important project management concern. In recent years, a number of studies have used neural networks in various stages of software development. This study compares the prediction performance of multilayer perceptron and radial basis function neural networks to that of regression analysis. The results of the study indicate that when a combined third generation and fourth generation languages data set were used, the neural network produced improved performance over conventional regression analysis in terms of mean absolute percentage error.
Article
Context: In training disciplined software development, the PSP is said to result in such effect as increased estimation accuracy, better software quality, earlier defect detection, and improved productivity. But a systematic mechanism that can be easily adopted to assess and interpret PSP effect is scarce within the existing literature. Objective: The purpose of this study is to explore the possibility of devising a feasible assessment model that ties up critical software engineering values with the pertinent PSP metrics. Method: A systematic review of the literature was conducted to establish such an assessment model (we called a Plan–Track–Review model). Both mean and median approaches along with a set of simplified procedures were used to assess the commonly accepted PSP training effects. A set of statistical analyses further followed to increase understanding of the relationships among the PSP metrics and to help interpret the application results. Results: Based on the results of this study, PSP training effect on the controllability, manageability, and reliability of a software engineer is quite positive and largely consistent with the literature. However, its effect on one’s predictability on project in general (and on project size in particular) is not implied as said in the literature. As for one’s overall project efficiency, our results show a moderate improvement. Our initial finding also suggests that a prior stage PSP effect could have an impact on later stage training outcomes. Conclusion: It is concluded that this Plan–Track–Review model with the associated framework can be used to assess PSP effect regarding a disciplined software development. The generated summary report serves to provide useful feedback for both PSP instructors and students based on internal as well as external standards.
Article
Producing accurate and reliable project cost estimations at an early stage of a project’s life cycle remains a substantial challenge in the information technology field. This research benchmarks the performance of various approaches to estimating IT project effort and duration. Empirical data were gathered from various “real-world” organizations including several prominent Israeli high-tech companies as well as from the International Software Benchmarking Standards Group (ISBSG) IT project database. The study contrasts two types of models that have been employed to estimate project duration and effort separately: linear regression estimation models and models deriving from a more novel approach based on artificial neural networks (ANNs).
Article
Regression analysis to generate predictive equations for software development effort estimation has recently been complemented by analyses using less common methods such as fuzzy logic models. On the other hand, unless engineers have the capabilities provided by personal training, they cannot properly support their teams or consistently and reliably produce quality products. In this paper, an investigation aimed to compare personal Fuzzy Logic Models (FLM) with a Linear Regression Model (LRM) is presented. The evaluation criteria were based mainly upon the magnitude of error relative to the estimate (MER) as well as to the mean of MER (MMER). One hundred five small programs were developed by thirty programmers. From these programs, three FLM were generated to estimate the effort in the development of twenty programs by seven programmers. Both the verification and validation of the models were made. Results show a slightly better predictive accuracy amongst FLM and LRM for estimating the development effort at personal level when small programs are developed.
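The accuracy criterion named in this abstract can be illustrated with a short sketch. It assumes the usual reading of MER, the absolute error divided by the estimate, with MMER as its mean over a set of programs. The function names and the effort values are hypothetical and are not taken from the study.

```python
from statistics import mean


def mer(actual, predicted):
    """Magnitude of error relative to the estimate for one program."""
    if predicted <= 0:
        raise ValueError("predicted effort must be positive")
    return abs(actual - predicted) / predicted


def mmer(actuals, predictions):
    """Mean MER over a group of programs."""
    return mean(mer(a, p) for a, p in zip(actuals, predictions))


# Hypothetical efforts in minutes (illustrative values only).
actual_effort = [120.0, 90.0, 60.0]
predicted_effort = [100.0, 100.0, 50.0]

per_program = [mer(a, p) for a, p in zip(actual_effort, predicted_effort)]
overall = mmer(actual_effort, predicted_effort)
print(per_program, overall)
```

A lower MMER indicates that, on average, the predictions lie closer to the actual effort relative to the size of the estimate.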
Article
Medical schools continue to seek robust ways to select students with the greatest aptitude for medical education, training and practice. Tests of general cognition are used in combination with markers of prior academic achievement and other tools, although their predictive validity is unknown. This study compared the predictive validity of the Undergraduate Medicine and Health Sciences Admission Test (UMAT), the admission grade point average (GPA), and a combination of both, on outcomes in all years of two medical programmes. Subjects were students (n = 1346) selected since 2003 using UMAT scores and attending either of New Zealand's two medical schools. Regression models incorporated demographic data, UMAT scores, admission GPA and performance on routine assessments. Despite the different weightings of UMAT used in selection at the two institutions and minor variations in student demographics and programmes, results across institutions were similar. The net predictive power of admission GPA was highest for outcomes in Years 2 and 5 of the 6-year programme, accounting for 17-35% of the variance; UMAT score accounted for < 10%. The highest predictive power of the UMAT score was 9.9% for a Year 5 written examination. Combining UMAT score with admission GPA improved predictive power slightly across all outcomes. Neither UMAT score nor admission GPA predicted outcomes in the final trainee intern year well, although grading bands for this year were broad and numbers smaller. The ability of the general cognitive test UMAT to predict outcomes in major assessments within medical programmes is relatively minor in comparison with that of the admission GPA, but the UMAT score adds a small amount of predictive power when it is used in combination with the GPA. However, UMAT scores may predict outcomes not studied here, which underscores the need for further validation studies in a range of settings.
Conference Paper
Software project effort estimation is an important aspect of software engineering practices. The improvement in accuracy of estimations is a topic that still remains as one of the greatest challenges of software engineering and computer science in general. In this work, the effort estimation for short-scale software projects, developed in academic setting, is modeled by two techniques: statistical regression and neural network. Two groups of software projects were made. One group of projects was used to calculate linear regression parameters and to train a neural network. The two models were then compared on both groups, the one used for their calculation and the other that was not used before. The accuracy of estimates was measured by using the magnitude of error relative to the estimate (MER) for each project and its mean MMER over each group of projects. The hypothesis accepted in this paper suggested that a feed forward neural network could be used for predicting short-scale software projects.
Conference Paper
In this research a general regression neural network (GRNN) was applied for estimating the development effort in software projects that have been developed in laboratory learning environments. The independent variables of the GRNN were two size measures as well as a developer measure. This GRNN was trained from a dataset of projects developed from the year 2005 to the year 2008 and then this GRNN was validated by estimating the effort of a new dataset composed of projects developed from the year 2009 to the first months of the year 2010. Accuracy results from the GRNN model were compared with a statistical regression model. Results suggest that a GRNN could be used for estimating the development effort of software projects when two kinds of lines of code as well as the programming language experience of developers are used as independent variables.
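As a rough illustration of how a GRNN produces an estimate, the sketch below uses the standard textbook formulation: a Gaussian-weighted average of the training efforts, with weights that decay with the distance between the query project and each training project. It is not the model from the study. The function name, the smoothing parameter `sigma`, and all data values are assumptions made for illustration, and the inputs are assumed to be scaled to comparable ranges beforehand.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=1.0):
    """Predict effort as a Gaussian-weighted average of training efforts."""
    weights = []
    for x in train_x:
        # Squared Euclidean distance between a training project and the query.
        sq_dist = sum((xi - qi) ** 2 for xi, qi in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed; fall back to the plain mean effort.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical, already-normalized inputs per project:
# (new and changed code, reused code, programming language experience).
train_x = [(0.2, 0.1, 0.5), (0.6, 0.3, 0.4), (0.9, 0.0, 0.8)]
train_y = [70.0, 110.0, 150.0]  # effort in minutes, illustrative only

estimate = grnn_predict(train_x, train_y, (0.5, 0.2, 0.5), sigma=0.3)
print(round(estimate, 1))
```

A smaller `sigma` makes the estimate follow the nearest training projects more closely, while a larger one pulls it toward the overall mean effort.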
Article
In recent years, grey relational analysis (GRA), a similarity-based method, has been proposed and used in many applications. However, we found that most traditional GRA methods only consider nonweighted similarity for predicting software development effort. In fact, nonweighted similarity may cause biased predictions, because each feature of a project may have a different degree of relevance to the development effort. Therefore, this paper proposes six weighted methods, including nonweighted, distance-based, correlative, linear, nonlinear, and maximal weights, to be integrated into GRA for software effort estimation. Numerical examples and sensitivity analyses based on four public datasets are used to show the performance of the proposed methods. The experimental results indicate that the weighted GRA can improve estimation accuracy and reliability from the nonweighted GRA. The results also demonstrate that the weighted GRA performs better than other estimation techniques and published results. In summary, we can conclude that weighted GRA can be a viable and alternative method for predicting software development effort.
Article
Context: Software development effort estimation (SDEE) is the process of predicting the effort required to develop a software system. In order to improve estimation accuracy, many researchers have proposed machine learning (ML) based SDEE models (ML models) since 1990s. However, there has been no attempt to analyze the empirical evidence on ML models in a systematic way. Objective: This research aims to systematically analyze ML models from four aspects: type of ML technique, estimation accuracy, model comparison, and estimation context. Method: We performed a systematic literature review of empirical studies on ML model published in the last two decades (1991–2010). Results: We have identified 84 primary studies relevant to the objective of this research. After investigating these studies, we found that eight types of ML techniques have been employed in SDEE models. Overall speaking, the estimation accuracy of these ML models is close to the acceptable level and is better than that of non-ML models. Furthermore, different ML models have different strengths and weaknesses and thus favor different estimation contexts. Conclusion: ML models are promising in the field of SDEE. However, the application of ML models in industry is still limited, so that more effort and incentives are needed to facilitate the application of ML models. To this end, based on the findings of this review, we provide recommendations for researchers as well as guidelines for practitioners.
Article
The increasing popularity of e-learning has created a need for accurate student achievement prediction mechanisms, allowing instructors to improve the efficiency of their courses by addressing specific needs of their students at an early stage. In this paper, a student achievement prediction method applied to a 10-week introductory level e-learning course is presented. The proposed method uses multiple feed-forward neural networks to dynamically predict students' final achievement and to cluster them in two virtual groups, according to their performance. Multiple-choice test grades were used as the input data set of the networks. This form of test was preferred for its objectivity. Results showed that accurate prediction is possible at an early stage, more specifically at the third week of the 10-week course. In addition, when students were clustered, low misplacement rates demonstrated the adequacy of the approach. The results of the proposed method were compared against those of linear regression and the neural-network approach was found to be more effective in all prediction stages. The proposed methodology is expected to support instructors in providing better educational services as well as customized assistance according to students' predicted level of performance.
Article
Although function points (FP) are considered superior to source lines of code (SLOC) for estimating software size and monitoring developer productivity, practitioners still commonly use SLOC. One reason for this is that individuals who fill different roles on a development team, such as managers and developers, may perceive the benefits of FP differently. We conducted a survey to determine whether a perception gap exists between managers and developers for FP and SLOC across several desirable properties of software measures. Results suggest managers and developers perceive the benefits of FP differently and indicate that developers better understand the benefits of using FP than managers.
Article
Software development effort prediction is considered in several international software processes such as the Capability Maturity Model-Integrated (CMMi), by ISO-15504 as well as by ISO/IEC 12207. In this paper, data of two kinds of lines of code gathered from programs developed with practices based on the Personal Software Process (PSP) were used as independent variables in three models for estimating and predicting the development effort. Samples of 163 and 80 programs were used for verifying and validating, respectively, the models. The prediction accuracy comparison among a multiple linear regression, a general regression neural network, and a fuzzy logic model was made using as criteria the magnitude of error relative to the estimate (MER) and mean square error (MSE). Results accepted the following hypothesis: effort prediction accuracy of a general regression neural network is statistically equal to that obtained by a fuzzy logic model as well as by a multiple linear regression, when new and changed code and reused code obtained from short-scale programs developed with personal practices are used as independent variables.
Article
This paper summarizes several classes of software cost estimation models and techniques: parametric models, expertise‐based techniques, learning‐oriented techniques, dynamics‐based models, regression‐based models, and composite‐Bayesian techniques for integrating expertise‐based and regression‐based models. Experience to date indicates that neural‐net and dynamics‐based techniques are less mature than the other classes of techniques, but that all classes of techniques are challenged by the rapid pace of change in software technology. The primary conclusion is that no single technique is best for all situations, and that a careful comparison of the results of several approaches is most likely to produce realistic estimates.
Article
We have in previous studies reported our findings and concern about the reliability and validity of the evaluation procedures used in comparative studies on competing effort prediction models. In particular, we have raised concerns about the use of accuracy statistics to rank and select models. Our concern is strengthened by the observed lack of consistent findings. This study offers more insights into the causes of conclusion instability by elaborating on the findings of our previous work concerning the reliability and validity of the evaluation procedures. We show that model selection based on the accuracy statistics MMRE, MMER, MBRE, and MIBRE contribute to conclusion instability as well as selection of inferior models. We argue and show that the evaluation procedure must include an evaluation of whether the functional form of the prediction model makes sense to better prevent selection of inferior models.
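One reason accuracy statistics can rank models inconsistently is that they treat overestimates and underestimates differently. The sketch below uses the commonly cited per-project definitions behind MMRE, MMER, MBRE, and MIBRE, which are stated here as assumptions rather than taken from this abstract: the absolute error divided by the actual value, the estimate, the smaller of the two, and the larger of the two, respectively. The effort values are hypothetical.

```python
def mre(actual, predicted):
    """Magnitude of relative error: error relative to the actual value."""
    return abs(actual - predicted) / actual


def mer(actual, predicted):
    """Magnitude of error relative to the estimate."""
    return abs(actual - predicted) / predicted


def bre(actual, predicted):
    """Balanced relative error: error relative to the smaller value."""
    return abs(actual - predicted) / min(actual, predicted)


def ibre(actual, predicted):
    """Inverted balanced relative error: error relative to the larger value."""
    return abs(actual - predicted) / max(actual, predicted)


actual = 100.0        # hypothetical actual effort
overestimate = 200.0  # estimate is twice the actual effort
underestimate = 50.0  # estimate is half the actual effort

for name, measure in [("MRE", mre), ("MER", mer), ("BRE", bre), ("IBRE", ibre)]:
    print(name, measure(actual, overestimate), measure(actual, underestimate))
```

With these values, MRE scores the underestimate as the better prediction and MER scores the overestimate as the better one, while the two balanced measures treat a factor-of-two miss the same in either direction. A model that tends to underestimate can therefore look better under MMRE and worse under MMER.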
Article
Software engineering is human intensive. Thus, it is important to understand and evaluate the value of different types of experiences, and their relation to the quality of the developed software. Many job advertisements focus on requiring knowledge of, for example, specific programming languages. This may seem sensible at first sight, but is it really possible to capture software development performance using this kind of simple measure? On the other hand, maybe it is sufficient to have general knowledge in programming and then it is enough to learn a specific language within the new job. Two key questions are (1) whether prior knowledge of a specific language actually does improve software quality and (2) whether it is possible to capture performance using simple quantitative measures? This paper presents an empirical study where the experience, for example with respect to a specific programming language, of students is assessed using a quantitative survey at the beginning of a course on the personal software process (PSP), and the outcome of the course is evaluated, for example, using the number of defects and development time. Statistical tests are used to analyze the relationship between experience/background and the performance of the students in terms of software quality. The results are mostly unexpected, for example, we are unable to show any significant relation between experience in the programming language used and the number of defects detected.
Article
Students’ perception on course satisfaction through student surveys has become more influential in institutional operations because their experience in study may affect not only the prospective student’s decision in choosing the institution for their tertiary education, but also the retention of existing students. Student course satisfaction is a multivariate nonlinear problem. Neural network (NN) techniques have been successfully applied to approximating nonlinear functions in many disciplines, but there has been little information available in applying NN to the modelling of student course satisfaction. In this paper, based on the student survey results collected from 43 courses in 11 semesters from 2002 to 2007, statistical analysis and NN techniques are incorporated for establishing some dynamic models for analysing and predicting student course satisfaction. The factors identified from this process also allow new strategies to be drawn for improving course satisfaction in the future. This study shows that both the number of students (NS) enrolled to a course and the high distinction (HD) rate in final grading are the two most influential factors to student course satisfaction. The three-layer multilayer perceptron (MLP) models outperform the linear regressions in predicting student course satisfaction, with the best outcome being achieved by combining both NS and HD as the input to the networks.
Article
Fuzzy models have been recently used for estimating the development effort of software projects and this practice could start with short scale programs. In this paper, new and changed (N&C) as well as reused code were gathered from small programs developed by 74 programmers using practices of the Personal Software Process; these data were used as input for a fuzzy model for estimating the development effort. Accuracy of this fuzzy model was compared with the accuracy of a statistical regression model. Two samples of 163 and 68 programs were used for verifying and validating respectively the models; the comparison criterion was the Mean Magnitude of Error Relative to the estimate (MMER). In the verification and validation stages, the fuzzy model kept an MMER lower than or equal to that of the regression model, and a comparison of the models' accuracies based on ANOVA did not show a statistically significant difference between their means. This result suggests that fuzzy logic could be used for predicting the effort of small programs based upon these two kinds of lines of code.