Table 1 - uploaded by Raymond R. Panko
Studies of Spreadsheet Errors


Source publication
Article
Full-text available
Although spreadsheet programs are used for small "scratchpad" applications, they are also used to develop many large applications. In recent years, we have learned a good deal about the errors that people make when they develop spreadsheets. In general, errors seem to occur in a few percent of all cells, meaning that for large spreadsheets, the iss...

Contexts in source publication

Context 1
... we have moved beyond such anecdotal evidence into the realm of systematic field audits and laboratory experiments. Tables 1 through 3 summarize key data from these studies. Table 1 contains data from 13 field audits involving real-world spreadsheets. ...
Context 2
... 1 through 3 summarize key data from these studies. Table 1 contains data from 13 field audits involving real-world spreadsheets. Since 1995, when field audits began to use good (although usually not excellent) methodologies, 94% of the 88 spreadsheets audited in 7 studies have contained errors, reflecting the improvement in auditing methodologies. ...
Context 3
... field audit studies shown in Table 1 report a simple measure of spreadsheet errors: the percentage of spreadsheets that contain at least one serious error. (Minor errors are discounted.) ...
Context 4
... this is the percentage of non-label cells containing errors. Table 1 shows that, just as faults/KLOC usually falls into a narrow range (Panko, 2005a), the cell error rates (CERs) seen in spreadsheet studies have been very similar. The relatively few divergent numbers that appear within a life cycle stage, furthermore, come when only certain cells are considered, generally cells with high potentials for error. ...
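To make the arithmetic behind this point concrete, the following is a minimal illustrative sketch (not taken from Panko's paper), assuming cells fail independently with a uniform cell error rate:

# Minimal sketch: how a small cell error rate (CER) compounds over a spreadsheet.
# Assumptions (not from the source): independent cells, uniform per-cell error rate.
def prob_at_least_one_error(cer: float, n_cells: int) -> float:
    """Probability that a spreadsheet with n_cells non-label cells
    contains at least one erroneous cell, given a per-cell error rate."""
    return 1.0 - (1.0 - cer) ** n_cells

# Example: a 2% CER over 500 non-label cells gives ~0.99996.
print(prob_at_least_one_error(0.02, 500))

Under these assumptions, even a 1% CER over a few hundred formula cells leaves only about a 5% chance of an error-free spreadsheet, which is the bottom-line argument the field audit data support.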
Context 5
... there are quantitative errors, in which the spreadsheet gives an incorrect result. The studies in Table 1 look only at quantitative errors. However, there are also qualitative errors that may lead to quantitative errors later, during maintenance, what-if analysis, or other activities. ...

Similar publications

Chapter
Full-text available
Climate models are constructed from mathematical equations that describe the behavior of their components: Atmosphere, Ocean, Ice and Land. This chapter describes the structure of a coupled climate model, including common features and concepts used across different components. Coupling is the term used to define the modeling of the interactions of th...
Conference Paper
Full-text available
Piles are vertical structural elements which can be driven, bored or drilled. For determining the allowable and ultimate load carrying capacity of pile, a number of arbitrary or empirical methods are used. The paper focuses on determining the pile bearing capacity using α-method, β-method and SPT method specified in BNBC 2015. Spreadsheets for esti...
Article
Full-text available
Fluid film thickness in a compliant foil bearing is greatly influenced by the deflection of the bearing structure. Therefore, in order to properly model performance of a foil bearing, it is mandatory that the deflection of the compliant bearing structure due to the generated hydrodynamic pressure is determined accurately. This article proposes an e...

Citations

... Also, the IEEE terminology does not stipulate that an audit can only occur after the system has been released to its users. Some studies (e.g., [37], [38]) use the term "field audit" to refer to a QA exercise performed by the authors or other end users on spreadsheets that are already being used in organizations. When discussing other relevant studies in this paper, where needed, we will replace the term "field audit" with another, more appropriate term that conforms to the IEEE terminology. ...
... Many spreadsheet development projects started their life cycles with this stage, and little effort was spent on the previous stages (i.e., the Problem and Scope Identification stage and the Specification, Modeling, and Design stage) [45]. Also, many spreadsheet developers rarely did much planning before they started filling the cells in a spreadsheet [37], [46], [52]. Such a practice likely generates several problems: (a) an unnecessarily complex model that takes longer to build, (b) assumptions made that were not part of the original intention, and (c) a lack of common understanding of what the model is doing. ...
... It is well understood that spreadsheets are indispensable to business (e.g., Croll 2007; Grossman, Mehrotra, and Özlük 2007), but can also be a source of risk and costly errors (e.g., EuSpRIG 2023, Panko 1998, but also Powell, Baker, and Lawson 2008a, 2008b, 2009a, 2009b). Researchers and practitioners have long been working on articulating standards and practices to reduce risk and error, and also to increase the productivity of spreadsheet programmers and the effectiveness of spreadsheet users in organizations. ...
Preprint
This paper presents a taxonomy for analytical spreadsheet models. It considers both the use case that a spreadsheet is meant to serve and the engineering resources devoted to its development. We extend a previous three-type taxonomy to identify nine types of spreadsheet models that encompass the many analytical spreadsheet models seen in the literature. We connect disparate research literature to distinguish between an "analytical solution" and an "industrial-quality analytical spreadsheet model". We explore the nature of each of the nine types, propose definitions for some, relate them to the literature, and hypothesize on how they might arise. The taxonomy aids in identifying where various spreadsheet development guidelines are most useful, provides a lens for viewing spreadsheet errors and risk, and offers a structure for understanding how spreadsheets change over time. This taxonomy opens the door to many interesting research questions, including refinements to itself.
... The question repeatedly arises: how can we convince people that their text-treatment practices, habits, concepts, and approaches do more harm than good? Even fully developed pieces of software lose their power through misuse (Panko, 1998, 2013; Csernoch and Dani, 2022; EuSpRIG Horror Stories, 2022; Sebestyén et al., 2022). On the one hand, scientific papers presented on this subject reach neither the target population (Ben-Ari and Yeshno, 2006; Wing, 2006; Csernoch, 2009, 2010; Ben-Ari, 2015; EuSpRIG Horror Stories, 2022) nor the appropriate segment of education (Malmi et al., 2019, 2022). ...
Article
Full-text available
An extremely high number of erroneous text-based documents are in circulation, being multiplied, serving as samples both in the office world and in education. Creating, editing, and modifying these documents generates a serious financial loss when both human and machine resources are considered. Our research team developed an application and a testing method for building up an objective measurement system, in order to see the rate of loss in the handling of erroneous documents. In the pre-testing period, the error factor of a sample text was set up based on the logged activities of an expert researcher. It was found that first level modifications require about five times more human and machine resources in a short, one-paragraph text burdened with layout errors than in its properly formatted version. Further testing is required to find out how demanding longer texts and more serious modifications are, but it is already obvious that erroneous text-editing, and the lack of fundamental computational thinking skills involve unnecessary use of our resources.
... Therefore, it is easy to infer that the value of at least one data cell is erroneous. Since data errors are not randomly distributed [15], the error distribution in the dataset can be learned to predict which data cells are more likely to hold erroneous values, so as to make more accurate inferences. For example, in this dataset, the value "SIGMOD Conference" is similar to the valid value "SIGMOD Record" in the attribute domain of D.Venue. ...
... A large number of experiments on error detection have been conducted through artificial error introduction [49,50]. Related studies [15,51] show that in the real world, about 5% of the data cells in a dataset are subject to errors for various reasons. Therefore, this paper sets the error rate to 5% by default when artificially introducing errors into the D-A and Adult datasets. ...
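As a concrete illustration of this artificial-error-introduction protocol, here is a minimal Python sketch; the DataFrame, the swap-based corruption strategy, and the example file name are assumptions for illustration, not code from the cited studies:

import random
import pandas as pd

def inject_errors(df: pd.DataFrame, error_rate: float = 0.05, seed: int = 0) -> pd.DataFrame:
    """Corrupt roughly error_rate of all data cells (5% by default) by
    replacing each selected cell with another value drawn from the same
    column, mimicking plausible-but-wrong entries."""
    rng = random.Random(seed)
    corrupted = df.copy()
    cells = [(idx, col) for idx in df.index for col in df.columns]
    for idx, col in rng.sample(cells, k=int(error_rate * len(cells))):
        # A sampled value may occasionally coincide with the original;
        # a fuller harness would re-draw in that case.
        corrupted.at[idx, col] = rng.choice(df[col].tolist())
    return corrupted

# Usage with a hypothetical dataset file:
# dirty = inject_errors(pd.read_csv("adult.csv"))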
... Researchers (Panko, 2005; Powell et al., 2007) have identified various errors such as: ...
Thesis
Full-text available
How long is it going to last, and what is it going to cost? These are the two most basic questions asked during a turnaround. These questions are asked by all parties affected by the turnaround. The answers to these questions may influence an employee to leave or stay or if an investor will want to invest in the turnaround. Being able to answer these basic questions may shape the outcome of the actual turnaround. In this thesis, an algebraic model was developed to determine the duration, the breakeven point, the cash nadir point and the value of the nadir during a turnaround. An autoethnographic approach was adopted to understand the research problem, which resulted in a process of mathematising the knowledge gained from real-world experiences. Fundamental moments of turnarounds were derived from the Variable Finance Capacity model to answer the key questions of a turnaround. The thesis provides a framework for utilising the model and fundamental moments to make informed decisions
... Tools that are built on top of Microsoft Excel, such as qBase or DART-PCR, involve copy/paste manipulations from the raw data files, necessitating the careful triage of results to fit the confines of the formulas and dataflow. This can lead to calculation errors that go unnoticed [6]. Recently developed tools, such as SATQPCR [7], PIPE-T [8] or Auto-qPCR [9], come with limitations regarding input formats, the ability to easily adjust plotting parameters, or the lack of comprehensive functionalities such as inter-plate calibration. ...
Article
Full-text available
Background: Reverse transcription quantitative real-time PCR (RT-qPCR) is a well-established method for analysing gene expression. Most RT-qPCR experiments in the field of microbiology aim for the detection of transcriptional changes by relative quantification, which means the comparison of the expression level of a specific gene between different samples by the application of a calibration condition and internal reference genes. Due to the numerous data processing procedures and factors that can influence the final result, relative expression analysis and interpretation of RT-qPCR data are still not trivial and often necessitate the use of multiple separate software packages capable of performing specific functions. Results: Here we present qRAT, a stand-alone desktop application based on R that automatically processes raw output data from any qPCR machine using well-established and state-of-the-art statistical and graphical techniques. The ability of qRAT to analyse RT-qPCR data was evaluated using two example datasets generated in our laboratory. The tool successfully completed the procedure in both cases, returning the expected results. The current implementation includes functionalities for parsing, filtering, normalizing and visualisation of relative RT-qPCR data, like the determination of the relative quantity and the fold change of differentially expressed genes as well as the correction of inter-plate variation for multiple-plate experiments. Conclusion: qRAT provides a comprehensive, straightforward, and easy-to-use solution for the relative quantification of RT-qPCR data that requires no programming knowledge or additional software installation. All application features are available for free and without requiring a login or registration.
... On the other hand, it is well documented that the flexibility of spreadsheets also makes them error-prone [79][80][81][83]. Without formal types or data structures, spreadsheets suffer from classes of error that in traditional programming languages are easily detected, or completely prevented. ...
... However, sport scientists and performance analysts should be aware of the dangers and risks associated with data analysis (and subsequently, visualisation) in spreadsheet programs. For instance, a review of 13 field audits of real-world spreadsheets found that the large majority contained errors (Panko, 1998). Common issues with spreadsheets, which do not bode well for data analysis and subsequent visualisation, include inconsistent naming, extra white spaces between characters in cells, misrepresenting dates, incorrectly coding missing data, irregular data sheets and performing calculations on raw data files (Broman & Woo, 2018). ...
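To show how some of the issues listed by Broman & Woo (2018) can be caught programmatically, here is a minimal Python/pandas sketch; the file name, column name, and the set of suspect missing-data codes are hypothetical, chosen only for illustration:

import pandas as pd

df = pd.read_excel("athlete_data.xlsx")  # hypothetical spreadsheet export

# Count text cells with leading/trailing or doubled whitespace.
text_cols = df.select_dtypes(include="object").columns
whitespace_issues = {
    col: int(df[col].astype(str).str.contains(r"^\s|\s$|\s{2,}", regex=True).sum())
    for col in text_cols
}

# Count cells using ad-hoc codes that should be recorded as true missing values.
suspect_missing = df.isin(["-", "N/A", "n/a", "?", -999]).sum()

# Count date entries that fail to parse (column name hypothetical).
bad_dates = pd.to_datetime(df["test_date"], errors="coerce").isna().sum()

print(whitespace_issues)
print(suspect_missing)
print(bad_dates)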
... Language Error (LE): translation error, logic error [70]; assignment bug, iteration bug, array bug [36]; logical bug [28]; lexical bugs [29]. ...
Preprint
Full-text available
Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participants produced code with fewer errors than when working alone. We also examined how the quality and quantity of AI translations affected the work process and quality of outcomes, and observed that providing multiple translations had a larger impact on the translation process than varying the quality of provided translations. Our results tell a complex, nuanced story about the benefits of generative code models and the challenges software engineers face when working with their outputs. Our work motivates the need for intelligent user interfaces that help software engineers effectively work with generative code models in order to understand and evaluate their outputs and achieve superior outcomes to working alone.
... From the viewpoint of personal attitudes to spreadsheet risk, almost half of the respondents believe that the risk entailed by the use of spreadsheets is low, while fewer than 10% believe that the risk does not exist, or that there is an extremely high risk. This can be interpreted as a product of overconfidence, which was demonstrated several times during the research, and about which Panko (1998) points out that it is "corrosive because it tends to blind people to the need for taking steps to reduce risks." It is alarming that one third of respondents working in top management believe that there are no spreadsheet risks. ...
Article
The aim of the paper was to look into the degree of awareness among end users of the existence of spreadsheet risks. A systematic overview of the literature resulted in identifying research questions, which were answered by analysing data gathered by means of a field survey. The research included 161 respondents. Although the respondents consider spreadsheets very important for performing their tasks, they think that their use does not involve, to a significant extent, risks such as errors, credibility, security, data abuse, poor analysis that may result in making wrong decisions, lack of version control, inadequate qualifications of users, lack of spreadsheet development guidelines, loss of data, breach of legal regulations, and unauthorised access to data. The majority of organisations where the respondents are employed do not possess defined spreadsheet risk management strategies, nor have they adopted standards and rules for spreadsheet use.