Chapter

Lessons Learned from Software Analytics In Practice


Abstract

In this chapter, we share our experience and views on software data analytics in practice with a retrospective look at our previous work. Over ten years of joint research projects with industry, we have encountered similar data analytics patterns in diverse organizations and in different problem cases. We discuss these patterns following a 'software analytics' framework: problem identification, data collection, descriptive statistics and decision making. We motivate the discussion by building our arguments and concepts around our experiences of the research process in six industry research projects across four different organizations.


... It associates scores with each variable, which can be used to rank the variables based on their contribution to the model. Figure 2 depicts the process that we used in this study; a process quite similar to the one used by Bener et al. [36]. First, we define the goal of the data analytics activity, which is to develop a function for predicting the issue fix time using the data that SAP collects on its processes for fixing vulnerabilities in pre-release and post-release software. ...
... Bener et al. [36] advise that models are sensitive to time. This work supports that claim because it shows that the issue fix time is sensitive to the month of closing the issue. ...
... We can infer from this result that the prediction models are time sensitive; that is, they depend on the data collection period. This supports Bener et al.'s advice to consider recently collected data when developing prediction models [36]. We infer, though, that prediction models are not sufficient for modeling issue fix time since they provide only a static view. ...
Article
Full-text available
Finding and fixing software vulnerabilities has become a major struggle for most software-development companies. While generally without alternative, such fixing efforts are a major cost factor, which is why companies have a vital interest in focusing their secure software development activities such that they obtain an optimal return on this investment. We investigate, in this paper, quantitatively the major factors that impact the time it takes to fix a given security issue, based on data collected automatically within SAP’s secure development process, and we show how the issue fix time could be used to monitor the fixing process. We use three machine-learning methods and evaluate their predictive power in predicting the time to fix issues. Interestingly, the models indicate that the vulnerability type has only a small impact on issue fix time. The time it takes to fix an issue instead seems much more related to the component in which the potential vulnerability resides, the project related to the issue, the development groups that address the issue, and the closeness of the software release date. This indicates that the software structure, the fixing processes, and the development groups are the dominant factors that impact the time spent to address security issues. SAP can use the models to implement a continuous improvement of its secure software development process and to measure the impact of individual improvements. Other companies can use similar models and mechanisms and become learning organizations.
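As a rough illustration of the kind of model this abstract describes (not SAP's actual pipeline; the data file and column names are hypothetical assumptions), a fix-time regressor over categorical process features could be sketched as follows:

```python
# Minimal sketch: predict issue fix time from categorical process features.
# Column names ("component", "dev_group", ...) are assumptions, not SAP's schema.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

issues = pd.read_csv("issues.csv")            # hypothetical export of fixed issues
categorical = ["component", "project", "dev_group", "vulnerability_type"]
X = issues[categorical + ["days_to_release"]]
y = issues["fix_time_days"]

model = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough")),
    ("reg", RandomForestRegressor(n_estimators=200, random_state=0)),
])

# Cross-validated mean absolute error as a rough measure of predictive power.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)
```

Inspecting feature importances of such a model is one way to compare the relative influence of component, project, group, and vulnerability type.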
... Data of a major software development company are considered, covering security issues before and after release, including project- and component-based data. Some of the other research in this area includes [9][10][11][12]. In the study of [13], the focus is on identifying the influencing factors that impact SAP's software vulnerability fix time. ...
Article
Full-text available
Secured software development must employ a security mindset across software engineering practices. Software security must be considered during the requirements phase so that it is included throughout the development phase. Does the requirements gathering team get the proper input from the technical team? This paper unearths some of the data sources buried within software development phases and describes potential approaches to understand them. Concepts such as machine learning and deep learning are explored to understand the data sources and to explore how these learnings can be provided to the requirements gathering team. This knowledge system will help bring objectivity to the conversations between the requirements gathering team and the customer's business team. A literature review on secure requirements management is also conducted to identify possible gaps and provide future research directions to enhance our understanding. Feature engineering in the landscape of software development is explored to understand the data sources. Experts offer their insight on the root cause of the lack of security focus in requirements gathering practices. The core theme is statistical modeling of all the software artifacts that hold information related to the software development life cycle. Strengthening some traditional methods, such as threat modeling, is also a key area explored. The subjectivity involved in these approaches can be made more objective.
... In recent years, a number of researchers and industrial practitioners (at companies such as Microsoft) have demonstrated the effectiveness of static code metrics to build predictive analytics. A commonly reported effect by a number of researchers (Al Dallal and Briand 2010; Shatnawi and Li 2008, 2010; Madeyski and Jureczko 2015; Chidamber et al. 1998; Menzies et al. 2007c; Alves et al. 2010; Bener et al. 2015; Oliveira et al. 2014) is that OO metrics show a strong correlation with fault proneness. A comprehensive list of research on the correlation between product metrics and fault proneness can be found in Table 1 of the survey by Rathore and Kumar (2019). ...
Article
Full-text available
The current generation of software analytics tools are mostly prediction algorithms (e.g., support vector machines, naive Bayes, logistic regression). While prediction is useful, after prediction comes planning about what actions to take in order to improve quality. This research seeks methods that generate demonstrably useful guidance on “what to do” within the context of a specific software project. Specifically, we propose XTREE (for within-project planning) and BELLTREE (for cross-project planning) to generate plans that can improve software quality. Each such plan has the property that, if followed, it reduces the expected number of future defect reports. To find this expected number, planning was first applied to data from release x. Next, we looked for changes in release x + 1 that conformed to our plans. This procedure was applied using a range of planners from the literature, as well as XTREE. In 10 open-source JAVA systems, several hundreds of defects were reduced in sections of the code that conformed to XTREE’s plans. Further, when compared to other planners, XTREE’s plans were found to be easier to implement (since they were shorter) and more effective at reducing the expected number of defects.
... Therefore, to produce insightful results for industrial practice, researchers need to use industrial data and create analytics [28]. Further on, relying on machine learning techniques, data mining methods could be used to extract knowledge from collected data and, when possible, use that knowledge to improve the software engineering process [29]. ...
Article
Information technology companies currently use data mining techniques in different areas with the goal of increasing the quality of decision‐making and to improve their business performance. The study described in this paper uses a data mining approach to produce an effort estimation of a software development process. It is based on data collected in a Croatian information technology company. The study examined 27 software projects with a total effort exceeding 42 000 work hours. The presented model employs a modified Cross‐Industry Standard Process for Data Mining, where prior to model creation, additional clustering of projects is performed. The results generated by the proposed approach generally had a smaller effort estimation error than the results of human experts. The applied approach has proved that sound results can be gained through the use of data mining within the studied area. As a result, it would be wise to use such estimates as additional input in the decision‐making process.
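A hedged sketch of the "cluster first, then estimate" idea described above (the feature names, learner, and number of clusters are assumptions, not the paper's setup):

```python
# Simplified sketch: cluster projects, then estimate effort within each cluster
# and summarize the error with MMRE, to be compared against expert estimates.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

projects = pd.read_csv("projects.csv")                     # hypothetical repository
features = ["size_ucp", "team_size", "duration_months"]    # hypothetical metrics
X, y = projects[features].to_numpy(), projects["effort_hours"].to_numpy()

k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

mre = []
for c in range(k):
    Xc, yc = X[labels == c], y[labels == c]
    for i in range(len(yc)):                    # leave-one-out within the cluster
        mask = np.arange(len(yc)) != i
        est = LinearRegression().fit(Xc[mask], yc[mask]).predict(Xc[i:i + 1])[0]
        mre.append(abs(yc[i] - est) / yc[i])

print("MMRE:", np.mean(mre))                    # compare with the experts' MMRE
```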
... Hence, we focused on these research questions: (1) What are the key characteristics of the data science process applied in service and software development projects? and (2) What are the challenges of applying the data science process in the projects? ...
Conference Paper
Data science has the potential to create value and deep customer insight for service and software engineering. Companies are increasingly applying data science to support their service and software development practices. The goal of our research was to investigate how data science can be applied in software development organisations. We conducted a qualitative case study with an industrial partner. We collected data through a workshop, focus group interview and feedback session. This paper presents the data science process recommended by experienced data scientists and describes the key characteristics of the process, i.e., agility and continuous learning. We also report the challenges experienced while applying the data science process in customer projects. For example, the data scientists highlighted that it is challenging to identify an essential problem and ensure that the results will be utilised. Our findings indicate that it is important to put in place an agile, iterative data science process that supports continuous learning while focusing on a real business problem to be solved. In addition, the application of data science can be demanding and requires skills for addressing human and organisational issues.
Article
Although intensive research on software analytics has been going on for nearly a decade, a repeated complaint in software analytics is that industrial practitioners find it hard to apply the results generated from data science. This theme issue aims to reflect on actionable analytics for software engineering and to document a catalog of success stories in which analytics has been proven actionable and useful, in some significant way, in an organization. This issue features five articles covering promising analytical methods for improving change triage, strategic maintenance, and team robustness, as well as the success stories of applying analytical tools during an organizational transformation.
Conference Paper
Full-text available
Background: During all levels of software testing, the goal should be to fail the code. However, software developers and testers are more likely to choose positive tests rather than negative ones due to the phenomenon called confirmation bias. Confirmation bias is defined as the tendency of people to verify their hypotheses rather than refuting them. In the literature, there are theories about the possible effects of confirmation bias on software development and testing. Due to the tendency towards positive tests, most of the software defects remain undetected, which in turn leads to an increase in software defect density. Aims: In this study, we analyze factors affecting confirmation bias in order to discover methods to circumvent confirmation bias. The factors we investigate are experience in software development/testing and reasoning skills that can be gained through education. In addition, we analyze the effect of confirmation bias on software developer and tester performance. Method: In order to measure and quantify confirmation bias levels of software developers/testers, we prepared pen-and-paper and interactive tests based on two tasks from the cognitive psychology literature. These tests were conducted on 36 employees of a large-scale telecommunication company in Europe as well as 28 graduate computer engineering students of Bogazici University, resulting in a total of 64 subjects. We evaluated the outcomes of these tests using the metrics we proposed, in addition to some basic methods which we inherited from the cognitive psychology literature. Results: Results showed that regardless of experience in software development/testing, abilities such as logical reasoning and strategic hypothesis testing are differentiating factors in low confirmation bias levels. Moreover, the results of the analysis to investigate the relationship between code defect density and confirmation bias levels of software developers and testers showed that there is a direct correlation between confirmation bias and defect proneness of the code. Conclusions: Our findings show that having strong logical reasoning and hypothesis testing skills are differentiating factors in software developer/tester performance in terms of defect rates. We recommend that companies focus on improving the logical reasoning and hypothesis testing skills of their employees by designing training programs. As future work, we plan to replicate this study in other software development companies. Moreover, we will use confirmation bias metrics in addition to product and process metrics for software defect prediction. We believe that confirmation bias metrics would improve the prediction performance of the learning-based defect prediction models which we have been building over a decade.
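The relationship between bias levels and defect proneness reported above is a straightforward correlation analysis. A minimal sketch, assuming a hypothetical per-developer table of bias scores and defect densities:

```python
# Sketch: rank correlation between developers' confirmation bias scores and the
# defect density of the code they own. File and column names are hypothetical.
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("bias_vs_defects.csv")       # one row per developer
rho, p = spearmanr(df["confirmation_bias_score"], df["defect_density"])
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```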
Conference Paper
Full-text available
The goal of software metrics is the identification and measurement of the essential parameters that affect software development. Metrics can be used to improve software quality and productivity. Existing metrics in the literature are mostly product or process related. However, the thought processes of people have a significant impact on software quality, as software is designed, implemented and tested by people. Therefore, in defining new metrics, we need to take into account human cognitive aspects. Our research aims to address this need through the proposal of a new metric scheme to quantify a specific human cognitive aspect, namely “confirmation bias”. In our previous research, in order to quantify confirmation bias, we defined a methodology to measure the confirmation biases of people. In this research, we propose a metric suite that would be used by practitioners during daily decision making. Our proposed metric set consists of six metrics with a theoretical basis in cognitive psychology and measurement theory. Empirical samples of these metrics are collected from two software companies that are specialized in two different domains in order to demonstrate their feasibility. We suggest ways in which practitioners may use these metrics in their daily decision making.
Conference Paper
Full-text available
In this paper, we present a preliminary analysis of factors such as company culture, education and experience, on confirmation bias levels of software developers and testers. Confirmation bias is defined as the tendency of people to verify their hypotheses rather than refuting them and thus it has an effect on all software testing.
Conference Paper
Full-text available
In cognitive psychology, confirmation bias is defined as the tendency of people to verify hypotheses rather than refuting them. During unit testing software developers should aim to fail their code. However, due to confirmation bias, most defects might be overlooked leading to an increase in software defect density. In this research, we empirically analyze the effect of confirmation bias of software developers on software defect density.
Article
Full-text available
The National Institute of Standards and Technology in the US has estimated that software defects and problems cost the U.S. economy $59.5 billion annually (http://www.abeacha.com/NIST_press_release_bugs_cost.htm). The study is only one of many that demonstrate the need for significant improvements in software development processes and practices. US Federal agencies, which depend on IT to support their missions and spent at least $76 billion on IT in fiscal year 2011, experienced numerous examples of lengthy IT projects that incurred cost overruns and schedule delays while contributing little to mission-related outcomes (www.gao.gov/products/GAO-12-681). To reduce the risk of such problems, the US Office of Management and Budget recommended deploying agile software delivery, which calls for producing software in small, short increments (GAO, 2012). Consistent with this recommendation, this paper is about the application of Business Process Management to the improvement of software and system development through SCRUM or agile techniques. It focuses on how organizational behavior and process management techniques can be integrated with knowledge management approaches to deploy agile development. The context of this work is a global company developing software solutions for service operators such as cellular phone operators. For a related paper with a comprehensive overview of agile methods in project management see Stare (2013). Through this comprehensive case study we demonstrate how such an integration can be achieved. SCRUM represents a paradigm shift in many organizations in that it results in a new balance between focus on results and focus on processes. In order to describe this new paradigm of business processes, this work refers to Enterprise Knowledge Development (EKD), a comprehensive approach to map and document organizational patterns. In that context, the paper emphasizes the concept of patterns, reviews the main elements of SCRUM and shows how combining SCRUM and EKD provides organizations with a comprehensive framework for managing and improving software and system development.
Article
Full-text available
Software analytics guide practitioners in decision making throughout the software development process. In this context, prediction models help managers efficiently organize their resources and identify problems by analyzing patterns on existing project data in an intelligent and meaningful manner. Over the past decade, the authors have worked with software organizations to build metric repositories and predictive models that address process-, product-, and people-related issues in practice. This article shares their experience over the years, reflecting the expectations and outcomes both from practitioner and researcher viewpoints.
Conference Paper
Full-text available
Background: Reopened issues may cause problems in managing software maintenance effort. In order to take actions that will reduce the likelihood of issue reopening, the possible causes of bug reopens should be analysed. Aims: In this paper, we investigate potential factors that may cause issue reopening. Method: We have extracted issue activity data from a large release of an enterprise software product. We consider four dimensions, namely developer activity, the issue proximity network, static code metrics of the source code changed to fix an issue, and issue reports and fixes, as possible factors that may cause issue reopening. We have done exploratory analysis on the data. We build logistic regression models on the data in order to identify key factors leading to issue reopening. We have also conducted a survey regarding these factors with the QA team of the product and interpreted the results. Results: Our results indicate that centrality in the issue proximity network and developer activity are important factors in issue reopening. We have also interpreted our results with the QA team to point out potential implications for practitioners. Conclusions: Quantitative findings of our study suggest that issue complexity and developers' workload play an important role in triggering issue reopening.
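A hedged sketch of the logistic regression step described above; the factor and column names merely echo the four dimensions and are hypothetical:

```python
# Sketch: logistic regression over candidate issue-reopening factors, inspecting
# coefficients and p-values to identify the influential ones.
import pandas as pd
import statsmodels.api as sm

issues = pd.read_csv("issue_activity.csv")        # hypothetical extract
factors = ["developer_activity", "network_centrality",
           "loc_changed_in_fix", "report_description_length"]
X = sm.add_constant(issues[factors])
y = issues["reopened"]                            # 1 if the issue was reopened

result = sm.Logit(y, X).fit(disp=0)
print(result.summary())                           # coefficients and p-values per factor
```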
Conference Paper
Full-text available
We present an integrated measurement and defect prediction tool: Dione. Our tool enables organizations to measure, monitor, and control product quality through learning based defect prediction. Similar existing tools either provide data collection and analytics, or work just as a prediction engine. Therefore, companies need to deal with multiple tools with incompatible interfaces in order to deploy a complete measurement and prediction solution. Dione provides a fully integrated solution where data extraction, defect prediction and reporting steps fit seamlessly. In this paper, we present the major functionality and architectural elements of Dione followed by an overview of our demonstration.
Article
Full-text available
Context: Defect prediction research mostly focuses on optimizing the performance of models that are constructed for isolated projects (i.e. within project (WP)) through retrospective analyses. On the other hand, recent studies try to utilize data across projects (i.e. cross project (CP)) for building defect prediction models for new projects. There are no cases where the combination of within and cross (i.e. mixed) project data is used together. Objective: Our goal is to investigate the merits of using mixed project data for binary defect prediction. Specifically, we want to check whether it is feasible, in terms of defect detection performance, to use data from other projects for the cases (i) when there is an existing within project history and (ii) when there are limited within project data. Method: We use data from 73 versions of 41 projects that are publicly available. We simulate the two above-mentioned cases, and compare the performances of naive Bayes classifiers by using within project data vs. mixed project data. Results: For the first case, we find that the performance of mixed project predictors significantly improves over full within project predictors (p-value < 0.001), however the effect size is small (Hedges' g = 0.25). For the second case, we found that mixed project predictors are comparable to full within project predictors, using only 10% of available within project data (p-value = 0.002, g = 0.17). Conclusion: We conclude that the extra effort associated with collecting data from other projects is not feasible in terms of practical performance improvement when there is already an established within project defect predictor using the full project history. However, when there is limited project history, e.g. early phases of development, mixed project predictions are justifiable as they perform as well as full within project models.
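A rough sketch of that comparison: train naive Bayes on within-project (WP) data versus WP plus cross-project data, then summarize the difference in score distributions with Hedges' g. The score arrays below are placeholders, not the paper's results.

```python
# Sketch of the WP vs. mixed-project comparison with a Hedges' g effect size.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import balanced_accuracy_score

def score(train_X, train_y, test_X, test_y):
    clf = GaussianNB().fit(train_X, train_y)          # the paper's base learner
    return balanced_accuracy_score(test_y, clf.predict(test_X))

def hedges_g(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    s_pooled = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                       / (n1 + n2 - 2))
    correction = 1 - 3 / (4 * (n1 + n2) - 9)          # small-sample correction
    return correction * (a.mean() - b.mean()) / s_pooled

# wp_scores and mixed_scores would come from repeated splits per project;
# the values below are illustrative placeholders only.
wp_scores, mixed_scores = [0.68, 0.71, 0.69, 0.73], [0.70, 0.74, 0.71, 0.75]
print("Hedges' g:", round(hedges_g(mixed_scores, wp_scores), 2))
```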
Article
Full-text available
Failures after the release of software products are expensive and time-consuming to fix. Each of these failures has different reasons pointing to different portions of code. We conduct a retrospective analysis on bugs reported after the beta release of Eclipse versions. Our objective is to investigate what went wrong during the development process. We identify six in-process metrics that have explanatory effects on beta-release bugs. We conduct statistical analyses to check relationships between files and metrics. Our results show that files with beta-release bugs have different characteristics in terms of in-process metrics. Those bugs are specifically concentrated on Eclipse files with little activity: few edits by few committers. We suggest that in-process metrics should be investigated individually to identify beta-release bugs. Companies may benefit from such a retrospective analysis to understand the characteristics of failures. Corrective actions can be taken earlier in the process to avoid similar failures in future releases.
Conference Paper
Full-text available
Background: In our previous research, we built defect prediction models by using confirmation bias metrics. Due to confirmation bias, developers tend to perform unit tests to make their programs run rather than breaking their code. This, in turn, leads to an increase in defect density. The performance of the prediction model that is built using confirmation bias was as good as the models that were built with static code or churn metrics. Aims: Collection of confirmation bias metrics may result in partially “missing data” due to developers’ tight schedules, evaluation apprehension and lack of motivation as well as staff turnover. In this paper, we employ the Expectation-Maximization (EM) algorithm to impute missing confirmation bias data. Method: We used four datasets from two large-scale companies. For each dataset, we generated all possible missing data configurations and then employed Roweis’ EM algorithm to impute missing data. We built defect prediction models using the imputed data. We compared the performances of our proposed models with the ones that used complete data. Results: In all datasets, when the missing data percentage is less than or equal to 50% on average, our proposed model that used imputed data yielded performance results that are comparable with the performance results of the models that used complete data. Conclusions: We may encounter the “missing data” problem in building defect prediction models. Our results in this study showed that instead of discarding missing or noisy data, in our case confirmation bias metrics, we can use effective techniques such as EM-based imputation to overcome this problem.
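The paper uses Roweis' EM algorithm; as a lightweight stand-in (not the authors' implementation), the sketch below imputes missing confirmation-bias metrics with scikit-learn's IterativeImputer and then trains a defect predictor on the imputed data. The file and column names are assumptions.

```python
# Sketch: impute missing bias metrics, then build a defect predictor on them.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

data = pd.read_csv("bias_metrics.csv")            # hypothetical, with NaNs
X = data.drop(columns=["defective"])
y = data["defective"]                             # assumed binary label

X_imputed = IterativeImputer(max_iter=20, random_state=0).fit_transform(X)
scores = cross_val_score(GaussianNB(), X_imputed, y, cv=5, scoring="recall")
print("Recall per fold (imputed data):", scores)
```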
Conference Paper
Full-text available
Numbers are not data and data analysis does not necessarily produce information and knowledge. Statistics, data mining, and artificial intelligence are disciplines focused on extracting knowledge from data. They provide tools for testing hypotheses, predicting new observations, quantifying population effects, and summarizing data efficiently. In these fields, measurable data is used to derive knowledge. However, a clean, exact and complete dataset, which is analyzed professionally, might contain no useful information for the problem under investigation. The term Information Quality (InfoQ) was coined by [15] as the potential of a dataset to achieve a specific (scientific or practical) goal using a given data analysis method. InfoQ is a function of goal, data, data analysis, and utility. Eight dimensions that relate to these components help assess InfoQ: Data Resolution, Data Structure, Data Integration, Temporal Relevance, Generalizability, Chronology of Data and Goal, Construct Operationalization, and Communication. The eight dimensions can be used for developing streamlined evaluation metrics of InfoQ. We describe two studies where InfoQ was integrated into research methods courses, guiding students in evaluating the InfoQ of prospective and retrospective studies. The results and feedback indicate the importance and usefulness of InfoQ and its eight dimensions for evaluating empirical studies.
Conference Paper
Full-text available
During all levels of software testing, the goal should be to fail the code to discover software defects and hence to increase software quality. However, software developers and testers are more likely to choose positive tests rather than negative ones. This is due to the phenomenon called confirmation bias, which is defined as the tendency to verify one's own hypotheses rather than trying to refute them. In this work, we aimed at identifying the factors that may affect confirmation bias levels of software developers and testers. We have investigated the effects of company size, experience and reasoning skills on bias levels. We prepared pen-and-paper and interactive tests based on two tasks from the cognitive psychology literature. During the pen-and-paper test, subjects had to test given hypotheses, whereas the interactive test required both hypothesis generation and testing. These tests were conducted on employees of one large-scale telecommunications company, three small and medium scale software companies and graduate computer engineering students, resulting in a total of eighty-eight subjects. Results showed that regardless of experience and company size, abilities such as logical reasoning and strategic hypothesis testing are differentiating factors in low confirmation bias levels. Therefore, education and/or training programs that emphasize mathematical reasoning techniques are useful towards the production of high-quality software. Moreover, in order to investigate the relationship between code defect density and confirmation bias of software developers, we performed an analysis among developers who are involved with a software project in a large-scale telecommunications company. We also analyzed the effect of confirmation bias during the software testing phase. Our results showed that there is a direct correlation between confirmation bias and defect proneness of the code.
Article
Full-text available
The thought processes of people have a significant impact on software quality, as software is designed, developed and tested by people. Cognitive biases, which are defined as patterned deviations of human thought from the laws of logic and mathematics, are a likely cause of software defects. However, there is little empirical evidence to date to substantiate this assertion. In this research, we focus on a specific cognitive bias, confirmation bias, which is defined as the tendency of people to seek evidence that verifies a hypothesis rather than seeking evidence to falsify a hypothesis. Due to this confirmation bias, developers tend to perform unit tests to make their program work rather than to break their code. Therefore, confirmation bias is believed to be one of the factors that lead to an increased software defect density. In this research, we present a metric scheme that explores the impact of developers' confirmation bias on software defect density. In order to estimate the effectiveness of our metric scheme in the quantification of confirmation bias within the context of software development, we performed an empirical study that addressed the prediction of the defective parts of software. In our empirical study, we used confirmation bias metrics on five datasets obtained from two companies. Our results provide empirical evidence that human thought processes and cognitive aspects deserve further investigation to improve decision making in software development for effective process management and resource allocation.
Article
Full-text available
Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek general principles of effort estimation that can guide the design of effort estimators. Method: We identified the essential assumption of analogy-based effort estimation: i.e. the immediate neighbors of a project offer stable conclusions about that project. We test that assumption by generating a binary tree of clusters of effort data and comparing the variance of super-trees vs. smaller sub-trees. Results: For ten data sets (from Coc81, Nasa93, Desharnais, Albrecht, ISBSG, and data from Turkish companies), we found: (a) the estimation variance of cluster sub-trees is usually larger than that of cluster super-trees; (b) if analogy is restricted to the cluster trees with lower variance, then effort estimates have a significantly lower error (measured using MRE and a Wilcoxon test, 95% confidence, compared to nearest-neighbor methods that use neighborhoods of a fixed size). Conclusion: Estimation by analogy can be significantly improved by a dynamic selection of nearest neighbors, using only the project data from regions with small variance.
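A much-simplified sketch of that idea: recursively split the historical projects with 2-means, descend into a subtree only when its effort variance is lower than its parent's, and estimate by analogy within that region. The function names and the minimum cluster size are assumptions.

```python
# Sketch: variance-guided analogy-based effort estimation.
import numpy as np
from sklearn.cluster import KMeans

def low_variance_region(X, y, min_size=4):
    """Return the (features, efforts) of the lowest-variance region found."""
    if len(y) <= min_size:
        return X, y
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    best = (X, y)
    for c in (0, 1):
        Xc, yc = X[labels == c], y[labels == c]
        if len(yc) >= min_size and yc.var() < best[1].var():
            best = low_variance_region(Xc, yc, min_size)
    return best

def estimate_by_analogy(X_hist, y_hist, x_new, k=3):
    Xc, yc = low_variance_region(X_hist, y_hist)
    dist = np.linalg.norm(Xc - x_new, axis=1)
    return yc[np.argsort(dist)[:k]].mean()    # mean effort of the k nearest analogies
```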
Conference Paper
Full-text available
Decision making under uncertainty is a critical problem in the field of software engineering. Predicting software quality or cost/effort requires high-level expertise. AI-based predictor models, on the other hand, are useful decision making tools that learn from past projects' data. In this study, we have built an effort estimation model for a multinational bank to predict the effort prior to the projects' development life cycle. We have collected process, product and resource metrics from past projects together with the effort values distributed among software life cycle phases, i.e. analysis & test, design & development. We have used a clustering approach to form consistent project groups and Support Vector Regression (SVR) to predict the effort. Our results validate the benefits of using AI methods in real-life problems. We attain Pred(25) values as high as 78% in predicting future projects.
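A hedged sketch of that clustering plus SVR setup, evaluated with Pred(25), the fraction of projects whose estimate falls within 25% of the actual effort. The number of clusters and the SVR settings are assumptions.

```python
# Sketch: per-cluster SVR effort models and the Pred(25) evaluation metric.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def pred25(actual, predicted):
    mre = np.abs(actual - predicted) / actual
    return float(np.mean(mre <= 0.25))

def fit_per_cluster(X, y, n_clusters=3):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    models = {c: make_pipeline(StandardScaler(), SVR(C=10.0))
                   .fit(X[km.labels_ == c], y[km.labels_ == c])
              for c in range(n_clusters)}
    return km, models

def predict(km, models, X_new):
    return np.array([models[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(km.predict(X_new), X_new)])
```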
Conference Paper
Full-text available
This paper presents a comparison of test-first and test-last development approaches on a customer account management software of a telecommunication company. While the management had plans for initiating a process that enforces test-first development over test-last, they also had concerns about the tradeoffs. Therefore, an exploratory case study with quantitative analysis is carried out on a pilot project where the code metrics and estimated manual inspection efforts of both approaches are compared. Our results indicate no statistical difference between the two approaches in terms of design (CK) metrics. On the other hand, we observe that test-last development yields significantly simpler code in terms of cyclomatic complexity and less (though not significant) manual inspection effort. Hence, our initial results indicate no superiority of test-first over test-last development in the described industrial context.
Conference Paper
Full-text available
Software teams should follow a well defined goal and keep their work focused. Work fragmentation is bad for efficiency and quality. In this paper we empirically investigate the relationship between the fragmentation of developer contributions and the number of post-release failures. Our approach is to represent developer contributions with a developer-module network that we call contribution network. We use network centrality measures to measure the degree of fragmentation of developer contributions. Fragmentation is determined by the centrality of software modules in the contribution network. Our claim is that central software modules are more likely to be failure-prone than modules located in surrounding areas of the network. We analyze this hypothesis by exploring the network centrality of Microsoft Windows Vista binaries using several network centrality measures as well as linear and logistic regression analysis. In particular, we investigate which centrality measures are significant to predict the probability and number of post-release failures. Results of our experiments show that central modules are more failure-prone than modules located in surrounding areas of the network. Results further confirm that number of authors and number of commits are significant predictors for the probability of post-release failures. For predicting the number of post-release failures the closeness centrality measure is most significant.
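A hedged sketch of the contribution-network idea above: build a bipartite developer-module graph from contribution records, compute module centrality, and regress post-release failures on it. The file and column names are assumptions, and all modules are assumed to appear in the contribution data.

```python
# Sketch: module centrality in a developer-module network as a failure predictor.
import networkx as nx
import pandas as pd
import statsmodels.api as sm

edges = pd.read_csv("contributions.csv")                    # columns: developer, module
failures = pd.read_csv("failures.csv", index_col="module")  # column: post_release_failures

G = nx.Graph()
# prefix developer names so they cannot collide with module names
G.add_edges_from(zip("dev_" + edges["developer"], edges["module"]))

modules = [m for m in failures.index if m in G]
X = pd.DataFrame({
    "closeness": [nx.closeness_centrality(G, m) for m in modules],
    "degree": [G.degree(m) for m in modules],
}, index=modules)
y = (failures.loc[modules, "post_release_failures"] > 0).astype(int)

print(sm.Logit(y, sm.add_constant(X)).fit(disp=0).summary())
```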
Conference Paper
Full-text available
Companies usually have a limited amount of data for effort estimation. Machine learning methods have been preferred over parametric models due to their flexibility to calibrate the model for the available data. On the other hand, as machine learning methods become more complex they need more data to learn from. Therefore the challenge is to increase the performance of the algorithm when there is limited data. In this research we used a relatively complex machine learning algorithm, neural networks, and showed that stable and accurate estimations are achievable with an ensemble using associative memory. Our experimental results revealed that our proposed algorithm (ENNA) achieves on average PRED(25) = 36.4, which is a significant increase compared to neural network (NN) PRED(25) = 8.
Article
While software metrics are a generally desirable feature in the software management functions of project planning and project evaluation, they are of especial importance with a new technology such as the object-oriented approach. This is due to the significant need to train software engineers in generally accepted object-oriented principles. This paper presents theoretical work that builds a suite of metrics for object-oriented design. In particular, these metrics are based upon measurement theory and are informed by the insights of experienced object-oriented software developers. The proposed metrics are formally evaluated against a widely accepted list of software metric evaluation criteria.
Conference Paper
Fault localization in the telecommunication sector is a major challenge. Most companies manually try to trace faults back to their origin. Such a process is expensive, time consuming and ineffective. Therefore, in this study we automated the manual fault localization process by designing and implementing an intelligent software tool (Xiruxe) for a local telecommunications company. Xiruxe has a learning-based engine which uses powerful AI algorithms, such as Naïve Bayes, Decision Tree and Multi-Layer Perceptrons, to match keywords and patterns in the fault messages. The initial deployment results show that this intelligent engine can achieve a misclassification rate as low as 1.28%.
Article
These days, applications are expected to efficiently manage an ever growing range of data. Yet it is rarely practical to embed a traditional relational database such as Oracle or MySQL within an application, not only because of size constraints but also because of the additional administrative overhead these databases require. While embeddable databases are small enough to ensure little additional total increase in application size, they often have little to offer by way of capabilities and power. So what's a developer to do? For many, the answer is SQLite (http://www.sqlite.org/), an open source embeddable database with an amazingly small footprint (less than 250 Kb) but packing a powerful array of features. The Definitive Guide to SQLite is the first book to devote complete coverage to the latest version of this powerful database. Offering experienced database developers a thorough overview of its capabilities and APIs, yet cognizant of newcomers who may be making their first foray into the database environment with SQLite, this book acts as both an ideal tutorial and reference guide.
Article
Introduction; Two-Way Cross-Classification Model; Two-Fold Nested Model; Three-Factor, Nested-Factorial Model; General Description of the AVE Table; Additional Examples; Computational Procedure for the AVE Method; Properties of the AVE Estimates
Article
Recommendation systems in software engineering (SE) should be designed to integrate evidence into practitioners' experience. Bayesian networks (BNs) provide a natural statistical framework for evidence-based decision-making by incorporating an integrated summary of the available evidence and associated uncertainty (of consequences). In this study, we follow the lead of computational biology and healthcare decision-making, and investigate the applications of BNs in SE in terms of 1) the main software engineering challenges addressed, 2) the techniques used to learn causal relationships among variables, 3) the techniques used to infer the parameters, and 4) the variable types used as BN nodes. We conduct a systematic mapping study to investigate each of these four facets and compare the current usage of BNs in SE with these two domains. Subsequently, we highlight the main limitations of the usage of BNs in SE and propose a Hybrid BN to improve evidence-based decision-making in SE. In two industrial cases, we build sample hybrid BNs and evaluate their performance. The results of our empirical analyses show that hybrid BNs are powerful frameworks that combine expert knowledge with quantitative data. As researchers in SE become more aware of the underlying dynamics of BNs, the proposed models will also advance and naturally contribute to evidence-based decision-making.
Article
Issue management is one of the major challenges of software development teams. Balanced workload allocation among the developers who are responsible for the maintenance of the software product would impact the long-term reliability of the product. In this paper, we analyse the issue report, issue ownership, and issue resolve patterns of two large software products over a period of time. We use the Gini index to estimate the inequalities in issue ownership over time. Our results indicate that a small group of developers tends to take ownership of a large portion of new issues, especially when the active issue count is relatively high in the software development life cycle. We discuss the implications of this trend and propose long-term issue management strategies to deal with them.
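A minimal sketch of the Gini index used above to quantify how unevenly issue ownership is distributed across developers (0 means perfectly even ownership; values near 1 mean a single developer owns almost everything). The example counts are illustrative only.

```python
# Sketch: Gini coefficient over per-developer issue-ownership counts.
import numpy as np

def gini(counts):
    """Gini coefficient of a list of per-developer issue counts."""
    x = np.sort(np.asarray(counts, dtype=float))
    cum = np.cumsum(x)
    n = len(x)
    return (n + 1 - 2 * cum.sum() / cum[-1]) / n

# illustrative numbers: one developer owns most new issues in a given month
print(gini([1, 1, 2, 3, 40]))   # ~0.68, high ownership inequality
```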
Conference Paper
Developer teams may naturally emerge in large software projects, independent of managerial decisions, organizational structure, or work locations. Such self-organized collaboration teams of developers can be traced from the source code repositories. In this paper, we identify the developer teams in the collaboration network in order to present the work team evolution and the factors that affect team stability for a large, globally developed, commercial software product. Our findings indicate that: a) the number of collaboration teams does not change over time, b) the size of the collaboration teams increases over time, c) team activity is not related to team size, and d) factors related to team size, location and activity affect the stability of teams over time.
Article
Defect prediction has evolved with a variety of metric sets and defect types. Researchers have found code, churn, and network metrics to be significant indicators of defects. However, not all metric sets may be informative for all defect categories, such that only one metric type may represent the majority of a defect category. Our previous study showed that defect-category-sensitive prediction models are more successful than general models, since each category has different characteristics in terms of metrics. We extend our previous work and propose specialized prediction models using churn, code, and network metrics with respect to three defect categories. Results show that churn metrics are the best for predicting all defects. The strength of correlation for code and network metrics varies with defect category: network metrics have higher correlations than code metrics for defects reported during functional testing and in the field, and vice versa for defects reported during system testing.
Article
Background: Despite decades of research, there is no consensus on which software effort estimation methods produce the most accurate models. Aim: Prior work has reported that, given M estimation methods, no single method consistently outperforms all others. Perhaps rather than recommending one estimation method as best, it is wiser to generate estimates from ensembles of multiple estimation methods. Method: Nine learners were combined with 10 preprocessing options to generate 9 × 10 = 90 solo methods. These were applied to 20 datasets and evaluated using seven error measures. This identified the best n (in our case n = 13) solo methods that showed stable performance across multiple datasets and error measures. The top 2, 4, 8, and 13 solo methods were then combined to generate 12 multimethods, which were then compared to the solo methods. Results: 1) The top 10 (out of 12) multimethods significantly outperformed all 90 solo methods. 2) The error rates of the multimethods were significantly less than the solo methods. 3) The ranking of the best multimethod was remarkably stable. Conclusion: While there is no best single effort estimation method, there exist best combinations of such effort estimation methods.
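A hedged sketch of a "multimethod": combine several solo effort estimators by taking the median of their predictions per project. The three learners below are ordinary examples, not necessarily the paper's top-ranked solo methods or preprocessing options.

```python
# Sketch: median-combined ensemble of solo effort estimators.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def multimethod_predict(X_train, y_train, X_test):
    learners = [LinearRegression(),
                KNeighborsRegressor(n_neighbors=3),
                DecisionTreeRegressor(random_state=0)]
    preds = np.array([lrn.fit(X_train, y_train).predict(X_test) for lrn in learners])
    return np.median(preds, axis=0)      # one combined estimate per test project
```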
Article
Companies usually have a limited amount of data for effort estimation. Machine learning methods have been preferred over parametric models due to their flexibility to calibrate the model for the available data. On the other hand, as machine learning methods become more complex, they need more data to learn from. Therefore the challenge is to increase the performance of the algorithm when there is limited data. In this paper, we use a relatively complex machine learning algorithm, neural networks, and show that stable and accurate estimations are achievable with an ensemble using associative memory. Our experimental results show that our proposed algorithm (ENNA) produces significantly better results than a neural network (NN) in terms of accuracy and robustness. We also analyze the effect of feature subset selection on ENNA's estimation performance in a wrapper framework. We show that the proposed ENNA algorithm that uses the features selected by the wrapper does not perform worse than the one that uses all available features. Therefore, measuring only company-specific key factors is sufficient to obtain accurate and robust software cost estimates using ENNA.
Conference Paper
Software effort estimation is critical for resource allocation and planning. Accurate estimates enable managers to distribute the workload among resources in a balanced manner. The actual workload of developers may be different from the values observed in project management tools. In this research, we provide a summary of our experiences regarding: a) effort estimation activities, b) developer workload distribution through churn data, and c) a method of using churn data to track the estimation process. Our experience report is based on our collaborative work with industry partners operating in various domains in Turkey. As a result, we observe that effort estimation is regarded as an important topic. However, there is large room for research to transfer the ad-hoc methods employed in industry to empirical ones. Interestingly, we observe that resource allocations based on initial estimates/plans do not conform to actual values. A common characteristic of developer contribution across different projects is that more than 80% of edits in the code are performed by a small number of developers.
Conference Paper
Software engineering is a tedious job that involves people, tight deadlines and limited budgets. Delivering what the customer wants involves minimizing the defects in the programs. Hence, it is important to establish quality measures early in the project life cycle. The main objective of this research is to analyze problems in software code and propose a model that will help catch those problems earlier in the project life cycle. Our proposed model uses machine learning methods. Principal component analysis is used for dimensionality reduction, and decision tree, multi-layer perceptron and radial basis functions are used for defect prediction. The experiments in this research are carried out with different software metric datasets that are obtained from real-life projects of three big software companies in Turkey. We can say that the improved method we propose produces satisfactory results in terms of defect prediction.
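A rough sketch of that pipeline, assuming a hypothetical metrics.csv with static code metrics and a binary "defective" label. PCA reduces dimensionality; a decision tree and a multi-layer perceptron stand in for the paper's classifiers (the RBF network is omitted here).

```python
# Sketch: PCA-based dimensionality reduction followed by defect classifiers.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("metrics.csv")
X, y = data.drop(columns=["defective"]), data["defective"]

for clf in (DecisionTreeClassifier(random_state=0),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)):
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95), clf)
    f1 = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()
    print(type(clf).__name__, round(f1, 3))
```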
Conference Paper
Software fails and fixing it is expensive. Research in failure prediction has been highly successful at modeling software failures. Few models, however, consider the key cause of failures in software: people. Understanding the structure of developer collaboration could explain a lot about the reliability of the final product. We examine this collaboration structure with the developer network derived from code churn information that can predict failures at the file level. We conducted a case study involving a mature Nortel networking product of over three million lines of code. Failure prediction models were developed using test and post-release failure data from two releases, then validated against a subsequent release. One model's prioritization revealed 58% of the failures in 20% of the files compared with the optimal prioritization that would have found 61% in 20% of the files, indicating that a significant correlation exists between file-based developer network metrics and failures.
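A minimal sketch of the file-prioritization evaluation mentioned above: rank files by a model's risk score and report the share of actual failures captured by the top 20% of files (the abstract reports 58% versus an optimal 61%). The helper name and inputs are hypothetical.

```python
# Sketch: fraction of failures found in the top 20% of files by predicted risk.
import numpy as np

def failures_in_top_fraction(risk_scores, failure_counts, fraction=0.20):
    risk_scores = np.asarray(risk_scores, float)
    failure_counts = np.asarray(failure_counts, float)
    order = np.argsort(risk_scores)[::-1]                 # riskiest files first
    top = order[: max(1, int(len(order) * fraction))]
    return failure_counts[top].sum() / failure_counts.sum()

# usage: failures_in_top_fraction(predicted_risk_per_file, actual_failures_per_file)
```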
Conference Paper
Often software systems are developed by organizations consisting of many teams of individuals working together. Brooks states in The Mythical Man-Month that product quality is strongly affected by organization structure. Unfortunately, there has been little empirical evidence to date to substantiate this assertion. In this paper we present a metric scheme to quantify organizational complexity in relation to the product development process, and identify whether these metrics impact failure-proneness. In our case study, the organizational metrics, when applied to data from Windows Vista, were statistically significant predictors of failure-proneness. The precision and recall measures for identifying failure-prone binaries using the organizational metrics were significantly higher than using traditional metrics like churn, complexity, coverage, dependencies, and pre-release bug measures that have been used to date to predict failure-proneness. Our results provide empirical evidence that organizational metrics are related to, and are effective predictors of, failure-proneness.