A Case Study: Software Defect Root Causes

Information Technology and Management Science
ISSN 2255-9094 (online)
ISSN 2255-9086 (print)
December 2017, vol. 20, pp. 54–57
doi: 10.1515/itms-2017-0009
©2017 Laila Bergmane, Jānis Grabis, Edžus Žeiris.
This is an open access article licensed under the Creative Commons Attribution License,
in the manner agreed with De Gruyter Open.
Laila Bergmane¹, Jānis Grabis², Edžus Žeiris³
¹, ² Riga Technical University, Latvia; ³ ZZ Dats Ltd., Latvia
Abstract – Software quality assurance that ensures compliance with
user requirements enables software development companies to be
competitive. Maintaining a high quality level requires continuous
monitoring and development. When quality problems occur, the
company’s reputation suffers and its costs increase because time
must be invested in eliminating the consequences of the
problems. The aim of the present article is to identify the most
essential root causes of software defects. The e-service “Invoice
Submission” of Riga City Municipality is used as an example. The
results of the study can provide useful information for developing
improvement activities that raise e-service quality. The analysis is
based on the information available in the developer’s user
request database. The Ishikawa method is used to analyse the
causes of defects.
Keywords – Defect analysis, defect causes, software quality.
A human being can make an error, which produces a defect
in the program code or in a document. If a defect in the code is
executed, the system may fail to function properly, causing a failure.
Defects in software, systems or documents may result in
failures, but not all defects do so. Defects occur because human
beings are fallible and because there is time pressure, complex
code, complexity of infrastructure, changing technologies,
and/or many system interactions [12].
Correction of defects is costly, and the cost increases
exponentially with every subsequent development stage. Defects also
directly affect the software development cycle time. Therefore, defect
prevention not only reduces costs but also
shortens the development time [2].
Development organisations that deliver software-based
systems face serious problems in controlling the
progress of test activities and the quality of software products
throughout the project life cycle in order to estimate test
completion criteria and whether the project will end on time [10].
Testing activities play a major role in quality assurance, and
their non-compliance with the requirements is often the reason
why users are dissatisfied with the product.
The quality of software needs to be assured through a proper
development process. The development process must be
improved on a regular basis according to actual usage
feedback. In particular, when a bug is found in software, it is
necessary to investigate its root cause in order to work out a proper
measure to prevent it from recurring. Many reasons contribute
to software bugs in a project, including product-, process-
and project-related reasons [3]. Companies that fail
to resolve the causes of defects put their reputation at risk and face
loss of customer loyalty and cost-cutting effects. To mitigate these
risks, it is essential to perform improvement activities that
improve the quality of the software. This requires an
analysis of the causes of product defects.
A defect causal analysis has three key principles:
• reducing defects to improve quality;
• applying local expertise;
• focusing on systematic defects.
The first principle says that we can improve software quality
by focusing on the prevention and early detection of defects.
The second states that cause detection must involve the
software development team, which can explain why these defects
have occurred. The third principle says that, with relatively small
investments, a focus on systematic defects can have a
significant impact on quality [6].
To set the context of this paper, the ISTQB definition of the
term “defect” is used: a flaw in a component or
system that can cause the component or system to fail to
perform its required function, e.g., an incorrect statement or
data definition. A defect, if encountered during execution, may
cause a failure of the component or system [11]. A program is
said to be buggy if it contains a large number of bugs, or bugs
that seriously interfere with its functionality [3].
The quality requirements are determined by standards and
internal quality procedures of companies [8]. Software quality
must meet user requirements. There is no software which does
not have a defect. Even in the case where no defects can be
found in the software, this does not prove that they are not there.
In the software development industry, both nationally and
globally, the competitiveness of a company is closely related to
its ability to develop high quality information technology
solutions. Therefore, quality related issues are important for any
software development company. Maintaining the quality level
according to user requirements requires continuous
management and preventive action.
The e-service “Invoice Submission” is a service whose
availability is disturbed by existing defects. The quality
problems indicate that software defects recur systematically
and significantly limit the full use of the service by its users.
Defective software adversely affects the company’s
reputation, and the resources spent on fixing these
problems increase business costs: instead of being used
to develop new solutions, they are used to resolve
existing software defects. The aim of this study is to investigate
the root causes of defects. The e-service “Invoice Submission”
has been used for a case study. The user request database provides
the data for identifying the root causes. User requests in the
database are classified information. The causes of the defects
are given in descriptive form. The results of this paper can be used
for continuous improvement activities in e-service quality
Download Date | 1/12/18 1:40 AM
There have been several studies that analysed the root causes
of defects in different systems using one or more data sources.
A study published in 2001 investigated 40 incidents
of web site functionality failures and found that 80 % of all
failures were software failures and human errors. A large
number of failures occurred during routine maintenance,
software upgrades and system integration. The authors of the
paper could not determine whether these failures were mainly due
to system complexity, inadequate testing and/or poor
understanding of system dependencies. They also indicated that
other significant causes of software failure were system
overload, resource exhaustion and complex fault recovery
routines [5].
In a study about software development companies, in which
36 highly-qualified quality assurance testers and developers
from open source projects were interviewed, it was concluded
that the most common quality problems were due to the fact that
the tester did not have enough information about the software
to be tested. It was also mentioned that testing was rigorous,
and no fair software quality assurance policy was available in
written form [8].
A study using the defect classification approach found that algorithm
and functional type defects were detected late in the development
process, during system integration and testing. The
mistakes were related to human factors: individual errors and
lack of domain knowledge about a specific industry and system
A case study of the C compiler from the GNU Compiler
Collection, an application that consists of over
300 000 lines of code and can be divided by
functionality into 13 well-defined components, showed that a
significant percentage of software failures were associated with
changes that spread across the system, i.e., were due to
non-localised faults. The authors of the study also analysed flight
software failures of a NASA mission. The software includes
multiple applications consisting of millions of lines of
code in over 8000 files. The authors state that the
most common sources of failures were requirements and coding
faults, each contributing about 33 percent of the failures.
Requirement faults included incorrect, changed and missing
requirements. The third largest fault type was related to data
problems and contributed 14 percent of the failures. Design
faults led to less than 6 percent of the failures. Additionally,
4 percent of failures were due to process or procedural issues,
2 percent to integration faults, and 1 percent to
simulation or testing problems [9].
A study that focused only on failures caused by defects in
data-parallel programs showed that most failures (84.5 %)
were caused by defects in data processing rather than defects in
code logic. The authors emphasised that “the tremendous data
volume and various dynamic data sources make data processing
error-prone”. They also concluded that 22.5 % of failures were
table-level and that their major causes were programmers’
mistakes and frequent changes of the data schema. A further
62 % of failures were row-level, and most of them were caused by
exceptional data. The authors concluded that programmers
could not know all exceptional data in advance [7].
Almost all of the studies under review [1], [8], [9] indicated
that the revealed defects were related to testing problems. Other
reasons were related to change management, lack of
knowledge, requirements faults, data processing problems, and
defects in code logic. Testing problems are typically addressed through
testing process improvement activities. One of the most popular
ways for a company to improve this process is to use one
of the test maturity models, such as TPI Next, CMMI or TMMi,
which help companies evaluate the current testing situation
and, based on the recommendations or guidelines suggested by
the models, move towards improving the testing process at the
company [14].
The e-service “Invoice Submission” of Riga City Municipality
was selected for the case study.
The e-service “Invoice Submission” is an online service that
lets its users submit invoices electronically via the web
service API, XML file upload or manual entry of invoice
information. The invoice data are validated against an XSD schema,
which makes it possible to extend or restrict the input data at
different levels.
The development of the e-service started in 2011 and
has so far been carried out in ten phases.
The project team consists of a project manager, a tester, a system
analyst and a programmer. The project manager is the only one
who has worked on the project from its beginning; the other team
members have changed several times.
The present research addresses the following questions:
• Why do the same defects repeat systematically?
• Does the technology used in the example solution affect
the quality of the e-service?
• What are the most common causes of defects in the
production environment?
The data analysed were obtained from the supplier’s quality
management information system. All requests received
from users that were classified as a “defect” were selected from
the database. The selected requests were
registered in the period of 5 April 2017–7 July 2017. A total
of 102 defect requests were received during this period. For the
analysis, the authors used the resolved defects and the
information provided by the programmer and the tester about the
progress of defect handling and the causes of the defects. Each defect
mentioned in a request was assigned to an appropriate category
(Table I). The IEEE Standard Classification for Software Anomalies
was used to categorise the defects. They were classified by their
impact on requirement classes [4]. Functional defects were
subdivided into subcategories (categories 1–3). Thereafter,
the number of defects in each category was determined.
TABLE I
No. | Category | Description
1. | Development | Actual or potential cause of failure is due to deficiencies in the requirements analysis, development, testing, implementation or maintenance
2. | Integration with other systems | Actual or potential cause of failure is related to
3. | Data processing | Actual or potential cause of failure is any defect affecting data
4. | Usability | Actual or potential cause of failure is related to usability (ease of use) requirements.
5. | Security | Actual or potential cause of failure is related to security requirements, such as those for authentication, authorisation, accountability (e.g., audit trail or event logging), etc.
6. | Performance | Actual or potential cause of failure is related to performance requirements (e.g., capacity, computational accuracy, response time, throughput, or availability).
7. | Serviceability | Actual or potential cause of failure is related to requirements for reliability, maintainability, or supportability (e.g., complex design, undocumented code, ambiguous or incomplete error logging, etc.).
8. | Other | Would not cause any of the effects above
To analyse the causes of software defects, the Ishikawa
method was used [13]. Following examples of the
method, four categories of causes were identified [6]:
• methods, which might be incomplete, ambiguous,
wrong, or unenforced;
• tools and environment, which might be clumsy,
unreliable, or defective;
• people, who might lack adequate training or
• input and requirements, which might be incomplete,
ambiguous, or defective.
As a result of the analysis, the most important and most
common causes were summarised and grouped by cause
category and depicted in the form of an Ishikawa diagram.
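The grouping step of the Ishikawa method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cause descriptions paraphrase causes reported in this case study, but the category assigned to each cause, the names `observed_causes`, `group_causes` and `render_fishbone`, and the text-outline rendering are hypothetical and are not taken from the authors' analysis.

```python
from collections import defaultdict

# The four cause categories named in the text (after Card [6]).
CATEGORIES = ("methods", "tools and environment", "people", "input and requirements")

# Hypothetical (cause, category) pairs. The causes paraphrase ones reported in
# this case study; the category chosen for each one is an assumption.
observed_causes = [
    ("defect not found during testing", "methods"),
    ("tester not notified about a fix that needed testing", "methods"),
    ("change in another part of the system", "methods"),
    ("limits of the PDF generation technology", "tools and environment"),
    ("tester lacks knowledge of the solution architecture", "people"),
    ("documented requirements out of date", "input and requirements"),
]


def group_causes(pairs):
    """Group causes under their Ishikawa category, preserving input order."""
    grouped = defaultdict(list)
    for cause, category in pairs:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        grouped[category].append(cause)
    return grouped


def render_fishbone(effect, grouped):
    """Render the grouping as an indented text outline, one branch per category."""
    lines = [f"Effect: {effect}"]
    for category in CATEGORIES:
        lines.append(f"  {category}")
        for cause in grouped.get(category, []):
            lines.append(f"    - {cause}")
    return "\n".join(lines)


grouped = group_causes(observed_causes)
print(render_fishbone("defects in production", grouped))
```

Keeping the category list fixed and rejecting unknown categories mirrors the idea that every cause must be placed on one of the predefined branches of the diagram.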
In 2015, 17 defect requests were received from e-service
users, and this number increased to 64 in the subsequent year,
more than three times the 2015 figure. The trend
in the first three months of 2017 shows that the number of
requests does not decrease significantly, but rather increases:
at the moment of the study it had already reached
21 defect requests.
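The arithmetic behind this comparison can be checked with a short Python sketch. The yearly figures are those quoted in the text; the variable names are illustrative only.

```python
# Defect requests per year as reported in the text; the 2017 figure covers
# only the part of the year up to the moment of the study.
requests_per_year = {2015: 17, 2016: 64, 2017: 21}

# Ratio of the 2016 count to the 2015 count.
growth_2015_to_2016 = requests_per_year[2016] / requests_per_year[2015]

# Sum over all three years.
total_requests = sum(requests_per_year.values())

print(f"2016 vs 2015: {growth_2015_to_2016:.2f}x")
print(f"total requests: {total_requests}")
```

The ratio comes out at roughly 3.76, and the three yearly figures add up to 102, the same number as the total of defect requests analysed in the study.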
The number of defect requests obtained by classifying all
defects by category is given in Table II.
TABLE II
No. | Category | Number of defects
1. | Development process | 59
2. | Integration with other systems | 10
3. | Data processing | 20
4. | Usability | 1
5. | Security | 0
6. | Performance | 1
7. | Serviceability | 0
8. | Other | 11
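The totals implied by Table II can be recomputed with a minimal Python sketch. The counts are copied from the table; the helper `share` and the variable names are illustrative and not part of the study.

```python
# Number of defect requests per category, copied from Table II.
defects_by_category = {
    "Development process": 59,
    "Integration with other systems": 10,
    "Data processing": 20,
    "Usability": 1,
    "Security": 0,
    "Performance": 1,
    "Serviceability": 0,
    "Other": 11,
}

# Categories 1-3 are the functional subcategories described in the text.
FUNCTIONAL = ("Development process", "Integration with other systems", "Data processing")

total = sum(defects_by_category.values())
functional_total = sum(defects_by_category[name] for name in FUNCTIONAL)


def share(count, whole):
    """Return count as a percentage of whole, rounded to one decimal place."""
    return round(100 * count / whole, 1)


for name, count in defects_by_category.items():
    print(f"{name:32s}{count:4d}{share(count, total):7.1f} %")
print(f"functional defects: {functional_total} of {total}")
```

The category counts add up to 102 requests, of which 89 fall into the three functional subcategories.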
According to the results in Table II, most defects that
reached the production environment are due to deficiencies in
the development process. Among all user requests, in 74 cases
the description of the defect cause mentioned that the defect had not
been found during testing. Some of the defects could have been discovered
earlier if a test method based on testing the characteristics had
been used.
In some cases, the nature of the defects makes it clear that the
tester lacks general knowledge of the e-service solution
architecture and it is not enough to do integrity tests. In four
cases, defects were reported in the e-service because the tester
had not been notified about new software defect fixes that
required testing. Another five cases showed that defects occurred
because functionality had been changed in another part of the
system. Several defect requests showed that the
tester and the system analyst had often not been informed about
changes implemented in functionality, so that the functional
requirements described in the documents were no longer
up to date. All of these cases indicate problems in the change
management process.
The second highest number of defects is related to data
processing. These defects concern the generation of a PDF
document, which is affected by the degree of complexity of
invoices. These problems cannot be completely avoided because of
the technology used in the solution. Because the number of
functionality defects is significantly higher than in other
categories, this proportion indicates that the shortcomings of the
development process and the problems with generating PDF
documents most significantly affect the quality of the e-service.
By number of defects, the third largest category consists of
defects that have various other causes, such as the temporary
unavailability of a web service or a server. Some of these cases
are affected by the human factor.
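The ranking of categories discussed above can be reproduced with a Pareto-style ordering. The Pareto principle is named in the title of [13], but the text does not state that it was applied in this study, so the following Python sketch is an illustration only. The counts come from Table II; the function `pareto` is a hypothetical helper.

```python
from itertools import accumulate

# Defect counts from Table II, with zero-count categories omitted.
counts = {
    "Development process": 59,
    "Integration with other systems": 10,
    "Data processing": 20,
    "Usability": 1,
    "Performance": 1,
    "Other": 11,
}


def pareto(counts):
    """Return (category, count, cumulative share in %) sorted by descending count."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(count for _, count in ranked)
    running = accumulate(count for _, count in ranked)
    return [
        (name, count, round(100 * cumulative / total, 1))
        for (name, count), cumulative in zip(ranked, running)
    ]


for name, count, cumulative in pareto(counts):
    print(f"{name:32s}{count:4d}{cumulative:7.1f} %")
```

Sorting by count places the development process first, data processing second and the "Other" category third, which matches the order described in the text.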
As a result of the analysis, the main causes of the defects are
summarised in Fig. 1. It shows that the main causes of problems
include categories “methods” and “human”. It demonstrates
that there is an opportunity to improve software development
The study identified that the systematic repetition of defects was
due to the fact that the solution used restrictive technologies that
could affect the quality of the service.
Fig. 1. Defect causes.
The results of the analysis show that functionality defects are
the most common ones in the production environment, a total
of 89 out of 102. The main causes of defects are related to
problems in the development process, the technologies used in
the solution, insufficient testing and lack of knowledge of the
solution architecture.
V. CONCLUSION
The results of the study have confirmed the causes of
defects mentioned in the related studies; namely, the most
common causes of defects are related to testing problems [9]
and awareness of changes in software [8].
The main conclusions of the research are as follows:
• the example studied shows that the technology used in
the solution may limit the choices of quality
improvements, but does not prove that this is a common
• one of the most common root causes of defects is related
to deficiencies in the development process, which is
also confirmed by the example under consideration;
• the example shows that it is necessary to create
new defect cause categories, which is useful for a
systematic defect identification step in the defect cause
analysis process;
• defect cause classification will help to identify more precisely
whether the problems in the development process are
related to design/analysis, coding, testing or
infrastructure fields;
• in the example above, 71 % of all defects were not
found during testing, and this was mentioned in the
related studies as one of the main causes of failures.
The limitation of the present study is related to the fact that
the causes of the example software defects are identified based
on the information provided in defect requests, the number of
which may not be sufficient to ensure more objective
determination of their causes.
Future research may be devoted to the implementation of
new software defect cause classification at the company in
order to support the defect cause analysis process.
[1] M. Leszak, D. E. Perry, and D. Stoll, “A Case Study in Root Cause Defect
Analysis,” Proceedings of the 22nd international conference on Software
engineering – ICSE ’00, pp. 433–434, Jun. 2000.
[2] A. A. Shenvi, “Defect Prevention with Orthogonal Defect Classification,”
Proceeding of the 2nd annual conference on India software engineering
conference – ISEC ’09, p. 83, 2009.
[3] V. Gupta, N. Ganeshan, and T. Singhal Kumar, “Determining the Root
Causes of Various Software Bugs Through Software Metrics”, 2015 2nd
International Conference on Computing for Sustainable Global
Development (INDIACom), 11–13 March, New Delhi, p. 1212, 2015.
[4] “IEEE Standard Classification for Software Anomalies,” Revision of IEEE
Std 1044-2009, p. 8, 2010.
[5] S. Pertet and P. Narasimhan, Causes of Failure in Web Applications.
Parallel Data Laboratory Carnegie Mellon University Pittsburgh, PA
15213-3890, CMU-PDL-05-109, pp. 3–8, Dec. 2005 [Online]. Available:
[6] D. N. Card, “Learning from Our Mistakes with Defect Causal Analysis,”
IEEE Software, vol. 15, no. 1, pp. 56–63, 1998.
[7] S. Li, H. Zhou, T. Xiao, H. Lin, W. Lin, and T. Xie, “A characteristic
study on failures of production distributed data-parallel programs,”
Proceeding of the 2013 International Conference on Software
Engineering, p. 963–972, 2013.
[8] N. Nuzhat, K. Aihab, and K. Ahmed, “Survey to Improve Software
Quality Assurance in Developing Countries,” International Journal of
Technology and Research, Islamabad 3.1 1–6, pp. 3–5, 2015.
[9] M. Hamill and K. Goseva-Popstojanova, “Common Trends in Software
Fault and Failure Data,” IEEE Transactions on Software Engineering,
vol. 35, no. 4, pp. 484–496, Jul. 2009.
[10] N. Hrgarek, “Fabasoft Best Practices and Test Metrics Model,” Journal
of Information and Organizational Sciences, vol 31, no 1, Austria, pp. 75,
2007 [Online]. Available:
[11] “ISTQB Glossary” [Online]. Available:
[12] ISTQB, “Why is Testing Necessary,” Certified Tester, Foundation Level
Syllabus, p. 11, 2012 [Online]. Available:
[13] D. Dhandapani, “Applying the Fishbone diagram and Pareto principle to
Domino,” 2004 [Online]. Available:
[14] ISTQB, “Types of Process Improvement,” Certified Tester, Advanced
Level Syllabus Test Manager, p. 60, 2012 [Online]. Available:
Laila Bergmane is a student in the professional Master study programme
“Information Technology” at the Institute of Information Technology of Riga
Technical University (Latvia). She holds an ISTQB certificate. This is her first
publication. She also works as a Software Tester at ZZ Dats Ltd.
Jānis Grabis holds a Doctoral degree and is a Professor at Riga Technical
University (Latvia) as well as the Head of the Institute of Information
Technology. His main research interests lie within the application of
mathematical programming methods in information technology, enterprise
applications and system integration. He has published more than 60 scientific
papers, including a monograph on supply chain configuration. He has led a
number of national projects and participated in five projects in collaboration
with the University of Michigan-Dearborn (USA) and funded mainly by
industrial partners, such as SAP America and Ford Motor Company.
Edžus Žeiris holds a Doctoral degree and is a Deputy Director at ZZ Dats
Ltd. He holds CISM and PMP certificates. His research interests include the design of
e-services, security and evaluation of electronic services.
Download Date | 1/12/18 1:40 AM
... Jaminan kualitas perangkat lunak apabila menjadi syarat yang harus dipenuhi perusahaan pengembang perangkat lunak, maka dapat membuat perusahaan lebih kompetitif dalam setiap pembuatan perangkat lunak, namun mempertahankan tingkat kualitas yang tinggi memerlukan pemantauan dan pengembangan secara terus menerus [1]. Kualitas software biasanya diukur dari jumlah cacat yang ada pada produk yang dihasilkan [2]. ...
... Cacat adalah suatu karakteristik yang mengurangi kegunaan atau value suatu item atau semacam kelemahan, ketidaksempurnaan, atau kekurangan [1]. Cacat perangkat lunak merupakan segala cacat atau ketidaksempurnaan di dalam produk perangkat lunak (program komputer, perencanaan, dokumentasi terkait, atau data) atau proses perangkat lunak (aktivitas, metode dan transformasi yang digunakan untuk mengembangkan dan mengelola produk perangkat lunak) [17]. ...
Full-text available
Software defects are one of the main contributors to information technology waste and lead to rework, thus consuming a lot of time and money. Software defect prediction has the objective of defect prevention by classifying certain modules as defective or not defective. Many researchers have conducted research in the field of software defect prediction using NASA MDP public datasets, but these datasets still have shortcomings such as class imbalance and noise attribute. The class imbalance problem can be overcome by utilizing SMOTE (Synthetic Minority Over-sampling Technique) and the noise attribute problem can be solved by selecting features using Particle Swarm Optimization (PSO), So in this research, the integration between SMOTE and PSO is applied to the classification technique machine learning naïve Bayes and logistic regression. From the results of experiments that have been carried out on 8 NASA MDP datasets by dividing the dataset into training and testing data, it is found that the SMOTE + PSO integration in each classification technique can improve classification performance with the highest AUC (Area Under Curve) value on average 0,89 on logistic regression and 0,86 in naïve Bayes in the training and at the same time better than without combining the two.
... Perawatan meliputi perbaikan sistem apabila ada kecacatan, penambahan fitur ataupun perbaruan data pengguna [11] baik kepala sekolah, wali kelas atau operator/admin. Kecacatan didefinisikan sebagai kesalahan hasil pengolahan data atau kesalahan pengkodean yang berakibat bug pada tampilan halaman [12,13]. Sistem Informasi Pengelolaan Data Nilai Siswa pada SD Negeri Jambangan 1 Kabupaten Ngawi ...
Full-text available
Nilai siswa merupakan representasi hasil belajar siswa yang ditepuh dalam satu semester. Nilai akhir siswa diambil dari hasil pengolahan beberapa nilai siswa. SD Negeri Jambangan 1 merupakan sekolah dasar negeri yang pengolahan data nilai siswanya dilakukan secara manual dengan menggunakan Microsoft Excel serta dokumentasi pembukuan. Pengolahan data nilai siswa dengan cara manual dapat menimbulkan beberapa resiko seperti rusak dan hilangnya data nilai siswa. Sistem informasi pengolahan data nilai siswa dikembangkan dengan tujuan mengurangi resiko rusak dan hilangnya data serta mempermudah proses pengolahan data. Sistem ini akan dibangun berbasis website menggunakan model pengembangan Software Development Life Cycle (SDLC) dengan metode Waterfall. Tahapan penelitian berupa analisis kebutuhan, perancangan, implementasi, pengujian dan perawatan. Pengujian sistem ini dilakukan dengan metode Black Box untuk menguji fungsionalitas sistem serta User Acceptance Testing (UAT) untuk menguji sistem apakah sistem mampu memenuhi kebutuhan user. Hasil pengujian black box menyatakan sistem ini sudah layak di gunakan. Hasil pengujian UAT didapatkan 96,42% responden setuju sistem ini dapat memudahkan dan mempercepat pihak sekolah dalam mengelola dan mengolah data nilai siswa. Penelitian ini menghasilkan sebuah sistem yang dapat mengolah data siswa dari data nilai dasar hingga menjadi nilai akhir siswa.
... The unidentified defect at an early stage can be one of the factors causing the increase in the cost of repairing the program in the next step and causing software failure. Software failure is a condition where the software fails to meet user requirements that cause the quality of the software is doubtful and can be risk to the company's reputation which can cause loss of customers and cut production costs, and in the software testing process, error detection and failure are challenging activities [6], [7]. ...
... Factors such as effort, productivity, time, cost of development and quality are negatively affected when software artifacts are produced due to the work required to correct these defects. According to Zhivich & Cunningham (2009) and Bergmane et al. (2017) it is also known that the cost of labor for defect correction increases as the development process progresses. In this way, initiatives to correct errors and anomalies must be carried out as soon as ...
Full-text available
Code reviews and inspections have the purpose to ensure that the code has sufficient quality to be released. It is generally seen as an economical way of finding errors, increase team productivity and sharing technical and product knowledge among team members. This approach is traditionally adopted in software development companies, but their practices may be useful in other contexts, such as in the process of learning software engineering. In this sense, this study proposes an innovative framework for conducting code reviews in a Computer Science course. The proposed framework can be applied in any object-oriented program language, and it is sufficiently concise to be applied in the classroom, namely in a 90-minute session in which all students are invited to collaborate in this process. The findings suggest that code reviews in an academic context can help students to strategically reflect about the performed work, enhance their soft skills, and increase their ability to work in groups. On the other hand, as the main challenges, the findings reveal that students typically don't have previous experience in performing inspections and it can become difficult to perform a complete inspection in a classroom session.
... Factors such as effort, productivity, time, cost of development and quality are negatively affected when software artifacts are produced due to the work required to correct these defects. In [2,3] it is also known that the cost of labor for defect correction increases as the development process progresses. In this way, initiatives to correct errors and anomalies must be carried out as soon as possible. ...
Conference Paper
Full-text available
Software engineering is continuously facing the challenges of growing complexity among software applications with increase in volume of bugs and is hampering software production process. Software bugs are frequent in practice. Various types of bugs occur more commonly and frequently cause of failures in software development process. Software bugs can be a cause to produce incorrect or unexpected results in the system. So, the researchers have found growing importance of software bugs in development process and have attributed factors of product metrics, process metrics and project metrics as prominent causes behind software bugs. The researchers have further isolated each metric with contributing factors towards software bugs. The main objectives of this paper are to make developers aware of these bugs' arrival reasons so they can avoid them. The researchers have finally concluded that product metrics oriented software bugs are majorly contributing towards deterioration of software quality. NOMENCLATURE WMC (Weighted Methods per Class): The sum of normalized complexity of every method in a given class. Usually we just use the number of methods in a given class. DIT (Depth of Inheritance Tree): The maximum length from the root to a given class in the inheritance hierarchy CBO (Coupling between Object Classes): the number of classes that use the member functions and/or the instance variables of a given class LCOM (Lack of Cohesion on Methods): for each instance variable calculate the percentage of methods using it, then the average percentage for all variables subtracted from 100%
Conference Paper
Full-text available
SCOPE is adopted by thousands of developers from tens of different product teams in Microsoft Bing for daily web-scale data processing, including index building, search ranking, and advertisement display. A SCOPE job is composed of declarative SQL-like queries and imperative C# user-defined functions (UDFs), which are executed in pipeline by thousands of machines. There are tens of thousands of SCOPE jobs executed on Microsoft clusters per day, while some of them fail after a long execution time and thus waste tremendous resources. Reducing SCOPE failures would save significant resources. This paper presents a comprehensive characteristic study on 200 SCOPE failures/fixes and 50 SCOPE failures with debugging statistics from Microsoft Bing, investigating not only major failure types, failure sources, and fixes, but also current debugging practice. Our major findings include (1) most of the failures (84.5%) are caused by defects in data processing rather than defects in code logic; (2) table-level failures (22.5%) are mainly caused by programmers' mistakes and frequent data-schema changes while row-level failures (62%) are mainly caused by exceptional data; (3) 93% fixes do not change data processing logic; (4) there are 8% failures with root cause not at the failure-exposing stage, making current debugging practice insufficient in this case. Our study results provide valuable guidelines for future development of data-parallel programs. We believe that these guidelines are not limited to SCOPE, but can also be generalized to other similar data-parallel platforms.
Software companies face serious problems in measuring the progress of test activities and the quality of software products in order to estimate test completion criteria and whether the shipment milestone will be reached on time. Measurement is a key activity in the testing life cycle and requires an established, managed and well-documented test process, defined software quality attributes, quantitative measures, and the use of test management and bug tracking tools. Test metrics are a subset of software metrics (product metrics, process metrics) and enable the measurement and quality improvement of the test process and/or software product. The goal of this paper is to briefly present Fabasoft best practices and lessons learned during functional and system testing of big, complex software products, and to describe a simple test metrics model applied to the software test process with the purpose of better controlling software projects and measuring and increasing software quality.
Quality is a central factor in the software industry. Software quality depends on user satisfaction, which can be attained by applying standards and quality procedures. These standards and procedures may be a company's local or in-house procedures. Achieving quality software is currently a key factor because of high customer demands and market pressure. Developed countries perform well in the software industry and are improving day by day. Developing countries such as Pakistan, on the other hand, struggle with software quality and cannot maintain a reputation in the international market. Software quality suffers for several reasons. This paper addresses the problems that cause a lack of interest in improving software quality among higher authorities and software assurance teams. We provide a solution to these difficulties by changing the survey method from interviews to a questionnaire; we also include more questions related to software quality assurance, and give the people concerned flexible time to offer their best suggestions.
This report investigates the causes and prevalence of failure in Web applications. Data was collected by surveying case studies of system failures and by examining incidents of website outages listed on technology websites such as CNET.com and eweek.com. These studies suggest that software failures and human error account for about 80% of failures. The report also contains an appendix that serves as a quick reference for common failures observed in Web applications. This appendix lists over 40 incidents of real-world site outages, outlining how these failures were detected, the estimated downtime, and the subsequent recovery action.
"Learning from the past" to make systems more efficient in terms of cost and time is the hallmark of any engineering discipline. Prevention of defects is the "holy grail" of learning from the past: with every product generation we learn from our defects and prevent them from recurring in the next generation. Taking this philosophy further, "capturing defects in the earlier stage of the life cycle" is a means of preventing defects in the later stages of the product life cycle. Correction of defects is costly, and the cost increases exponentially with every subsequent stage. This also impacts the cycle time directly. Defect prevention therefore helps not only in reducing cost but also in cutting development time. When we talk of software development, we are talking of hundreds of defects (in-process as well as post-release). It would be very inefficient and time-consuming to plan preventive action for each of them, as most of them could be symptoms of some common root cause. To handle software defects of such magnitude, the best way is to classify them into patterns and then do root cause analysis on those patterns. There are a number of methodologies available in the industry for this classification, and IBM's ODC (Orthogonal Defect Classification) is one of them. The methodology classifies each defect into orthogonal (mutually exclusive) attributes, some technical and some managerial. These attributes provide the information needed to sift through the enormous volume of data and arrive at patterns on which root cause analysis can be done. Coupled with good action planning and tracking, this can achieve a high degree of defect reduction and cross learning. The questions then are always: can methodologies really be applied to do software defect prevention in a structured way? How can an abstract defect classification mechanism really help to identify patterns?
How does one measure the effectiveness of such actions? Who should do these activities? How does all this fit into the existing process framework? This paper attempts to answer these questions by sharing the experience of using the IBM ODC methodology in a case study of a real-life software development project (a DVD player product). The motivation of this paper is not to discuss the ODC methodology itself but rather to demonstrate, through a case study, a structured process for defect prevention using some attributes of ODC for defect classification and its related interpretations for causal analysis, action planning and results tracking. The fit of the scheme into the bigger picture of defect prevention, cross learning and mapping to elements of the SEI-CMMI framework is among the highlights of this paper. The paper concludes with a summary of learnings and some points to ponder.
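The idea of classifying defects by orthogonal attributes and then analysing the most frequent patterns can be illustrated with a small ranking routine. This is a minimal Python sketch under stated assumptions, not the procedure used in the cited case study: the function `rank_patterns`, the record layout, the attribute names `defect_type` and `trigger`, and the sample records are all hypothetical.

```python
from collections import Counter


def rank_patterns(defects, attributes=("defect_type", "trigger")):
    """Group defects by a tuple of classification attributes and rank the groups.

    Returns a list of (pattern, count, cumulative_percent) tuples, most
    frequent pattern first, so that root cause analysis can start with the
    patterns that cover most defects (a Pareto-style ordering).
    """
    counts = Counter(tuple(d[a] for a in attributes) for d in defects)
    total = sum(counts.values())
    ranked = []
    cumulative = 0
    for pattern, n in counts.most_common():
        cumulative += n
        ranked.append((pattern, n, 100.0 * cumulative / total))
    return ranked


# Hypothetical defect records; the attribute values are illustrative only.
sample_defects = [
    {"id": 1, "defect_type": "Assignment", "trigger": "Coverage"},
    {"id": 2, "defect_type": "Checking", "trigger": "Variation"},
    {"id": 3, "defect_type": "Checking", "trigger": "Variation"},
    {"id": 4, "defect_type": "Interface", "trigger": "Interaction"},
    {"id": 5, "defect_type": "Checking", "trigger": "Variation"},
]

ranked = rank_patterns(sample_defects)
for pattern, n, cum in ranked:
    print(pattern, n, f"{cum:.1f}%")
```

With these sample records the pattern ("Checking", "Variation") accounts for three of the five defects, so it would be the first candidate for root cause analysis. Which attributes to group by is a design choice; the sketch takes them as a parameter for that reason.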
The benefits of the analysis of software faults and failures have been widely recognized. However, detailed studies based on empirical data are rare. In this paper, we analyze the fault and failure data from two large, real-world case studies. Specifically, we explore: 1) the localization of faults that lead to individual software failures and 2) the distribution of different types of software faults. Our results show that individual failures are often caused by multiple faults spread throughout the system. This observation is important since it does not support several heuristics and assumptions used in the past. In addition, it clearly indicates that finding and fixing faults that lead to such software failures in large, complex systems are often difficult and challenging tasks despite the advances in software development. Our results also show that requirement faults, coding faults, and data problems are the three most common types of software faults. Furthermore, these results show that contrary to the popular belief, a significant percentage of failures are linked to late life cycle activities. Another important aspect of our work is that we conduct intra- and interproject comparisons, as well as comparisons with the findings from related studies. The consistency of several main trends across software systems in this paper and several related research efforts suggests that these trends are likely to be intrinsic characteristics of software faults and failures rather than project specific.
M. Leszak, D. E. Perry, and D. Stoll, "A Case Study in Root Cause Defect Analysis," Proceedings of the 22nd International Conference on Software Engineering (ICSE '00), pp. 433-434, Jun. 2000.
D. Dhandapani, "Applying the Fishbone diagram and Pareto principle to Domino," 2004 [Online]. Available: