Table 1 - uploaded by Raymond R. Panko
Studies of Spreadsheet Errors


Source publication
Article
Although spreadsheet programs are used for small "scratchpad" applications, they are also used to develop many large applications. In recent years, we have learned a good deal about the errors that people make when they develop spreadsheets. In general, errors seem to occur in a few percent of all cells, meaning that for large spreadsheets, the iss...

Contexts in source publication

Context 1
... we have moved beyond such anecdotal evidence, into the realm of systematic field audits and laboratory experiments. Tables 1 through 3 summarize key data from these studies. Table 1 contains data from 13 field audits involving real-world spreadsheets. ...
Context 2
... 1 through 3 summarize key data from these studies. Table 1 contains data from 13 field audits involving real-world spreadsheets. Since 1995, when field audits began to use good (although usually not excellent) methodologies, 94% of the 88 spreadsheets audited in 7 studies have contained errors, a figure that reflects the improvement in auditing methodologies. ...
Context 3
... field audit studies shown in Table 1 report a simple measure of spreadsheet errors: the percentage of spreadsheets that contain at least one serious error (minor errors are discounted). ...
Context 4
... this is the percentage of non-label cells containing errors. Table 1 shows that just as faults/KLOC usually falls into a narrow range (Panko, 2005a), the cell error rates (CERs) seen in spreadsheet studies have been very similar. The relatively few divergent numbers that appear within a life cycle stage, furthermore, come when only certain cells are considered, generally cells with a high potential for error. ...
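To make the cell error rate concrete, the following is a minimal sketch of the arithmetic behind a CER and its bottom-line consequence for large spreadsheets; the counts are placeholders rather than figures from Table 1, and cell errors are assumed to be independent.

```python
# Minimal sketch of the cell error rate (CER) arithmetic; the counts below are
# placeholders, not figures from Table 1, and cell errors are treated as independent.

def cell_error_rate(error_cells: int, non_label_cells: int) -> float:
    """CER: fraction of non-label (value/formula) cells that contain an error."""
    return error_cells / non_label_cells

def p_at_least_one_error(cer: float, n_cells: int) -> float:
    """Probability that a model with n_cells non-label cells contains an error."""
    return 1.0 - (1.0 - cer) ** n_cells

cer = cell_error_rate(error_cells=30, non_label_cells=1500)   # 2.0% CER
print(f"CER = {cer:.1%}")
print(f"P(at least one error in a 500-cell model) = {p_at_least_one_error(cer, 500):.1%}")
```

Even a CER of only a few percent therefore makes at least one error in a large spreadsheet nearly certain, which is the point the source publication draws from Table 1.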
Context 5
... there are quantitative errors, in which the spreadsheet gives an incorrect result. The studies in Table 1 look only at quantitative errors. However, there are also qualitative errors that may lead to quantitative errors later, during maintenance, what-if analysis, or other activities. ...

Similar publications

Chapter
Climate models are constructed from mathematical equations that describe the behavior of their components: Atmosphere, Ocean, Ice and Land. This chapter describes the structure of a coupled climate model, including common features and concepts used across different components. Coupling is the term used to define the modeling of the interactions of th...
Conference Paper
Piles are vertical structural elements which can be driven, bored or drilled. For determining the allowable and ultimate load-carrying capacity of a pile, a number of arbitrary or empirical methods are used. The paper focuses on determining the pile bearing capacity using the α-method, β-method and SPT method specified in BNBC 2015. Spreadsheets for esti...
Article
Fluid film thickness in a compliant foil bearing is greatly influenced by the deflection of the bearing structure. Therefore, in order to properly model the performance of a foil bearing, it is mandatory that the deflection of the compliant bearing structure due to the generated hydrodynamic pressure is determined accurately. This article proposes an e...

Citations

... Researchers (Panko, 2005; Powell et al., 2007) have identified various errors such as: ...
Thesis
How long is it going to last, and what is it going to cost? These are the two most basic questions asked during a turnaround. These questions are asked by all parties affected by the turnaround. The answers to these questions may influence whether an employee leaves or stays, or whether an investor will want to invest in the turnaround. Being able to answer these basic questions may shape the outcome of the actual turnaround. In this thesis, an algebraic model was developed to determine the duration, the breakeven point, the cash nadir point and the value of the nadir during a turnaround. An autoethnographic approach was adopted to understand the research problem, which resulted in a process of mathematising the knowledge gained from real-world experiences. Fundamental moments of turnarounds were derived from the Variable Finance Capacity model to answer the key questions of a turnaround. The thesis provides a framework for utilising the model and fundamental moments to make informed decisions.
... Tools that are built on top of Microsoft Excel, such as qBase or DART-PCR, involve copy/paste manipulations from the raw data files, necessitating the careful triage of results to fit the confines of the formulas and dataflow. This can lead to calculation errors that go unnoticed [6]. Recently developed tools, such as SATQPCR [7], PIPE-T [8] or Auto-qPCR [9], come with limitations regarding input formats, the ability to easily adjust plotting parameters, or the lack of comprehensive functionalities such as inter-plate calibration. ...
Article
Background Reverse transcription quantitative real-time PCR (RT-qPCR) is a well-established method for analysing gene expression. Most RT-qPCR experiments in the field of microbiology aim for the detection of transcriptional changes by relative quantification, which means the comparison of the expression level of a specific gene between different samples by the application of a calibration condition and internal reference genes. Due to the numerous data processing procedures and factors that can influence the final result, relative expression analysis and interpretation of RT-qPCR data are still not trivial and often necessitate the use of multiple separate software packages capable of performing specific functions. Results Here we present qRAT, a stand-alone desktop application based on R that automatically processes raw output data from any qPCR machine using well-established and state-of-the-art statistical and graphical techniques. The ability of qRAT to analyse RT-qPCR data was evaluated using two example datasets generated in our laboratory. The tool successfully completed the procedure in both cases, returning the expected results. The current implementation includes functionalities for parsing, filtering, normalising and visualising relative RT-qPCR data, such as the determination of the relative quantity and the fold change of differentially expressed genes as well as the correction of inter-plate variation for multiple-plate experiments. Conclusion qRAT provides a comprehensive, straightforward, and easy-to-use solution for the relative quantification of RT-qPCR data that requires no programming knowledge or additional software installation. All application features are available for free and without requiring a login or registration.
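As a hedged illustration of the relative-quantification arithmetic such tools automate, the sketch below uses the widely known 2^(-ΔΔCt) approach with invented Ct values; it is not qRAT's code, and the function name is hypothetical.

```python
# Hedged sketch of relative quantification via the standard 2^(-ddCt) approach.
# This illustrates the kind of calculation qRAT automates; it is not qRAT's code,
# and the Ct values below are invented for demonstration.

def fold_change(ct_target_sample: float, ct_ref_sample: float,
                ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Fold change of a target gene in a sample relative to a calibrator
    condition, normalised to an internal reference gene (2^-ddCt)."""
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)

# Target amplifies two cycles earlier (relative to the reference gene) in the
# treated sample than in the calibrator -> roughly 4-fold up-regulation.
print(fold_change(22.1, 18.0, 24.3, 18.2))   # ~4.0
```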
... On the other hand, it is well-documented that the flexibility of spreadsheets also makes them error prone [79][80][81]83]. Without formal types or data structures, spreadsheets suffer from classes of error that in traditional programming languages are easily detected, or completely prevented. ...
... However, sport scientists and performance analysts should be aware of the dangers and risks associated with data analysis (and subsequently, visualisation) in spreadsheet programs. For instance, upon an audit of 13 real-world spreadsheets, an average of 88% contained errors (Panko, 1998). Common issues with spreadsheets, which do not bode well for data analysis and subsequent visualisation, include inconsistent naming, extra white spaces between characters in cells, misrepresenting dates, incorrectly coding missing data, irregular data sheets and performing calculations on raw data files (Broman & Woo, 2018). ...
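A minimal sketch of how such issues could be screened for programmatically is given below; the file name, column layout, and the specific checks are illustrative assumptions, not part of the cited studies.

```python
# Hedged sketch: screening an exported spreadsheet for the data issues listed
# above (stray whitespace, ad-hoc missing-data codes, date-like cells).
# The file name and the exact checks are illustrative assumptions.
import pandas as pd

# Read everything as text so nothing is silently converted.
df = pd.read_csv("match_stats.csv", dtype=str, keep_default_na=False)

# Cells with leading or trailing whitespace.
whitespace = df.apply(lambda col: col.str.strip().ne(col))
print("cells with stray whitespace:", int(whitespace.sum().sum()))

# Ad-hoc missing-data codes instead of consistently empty cells.
missing_codes = {"-", "NA", "n/a", "missing", "?"}
print("ad-hoc missing-data codes:", int(df.isin(missing_codes).sum().sum()))

# Cells that look like they were re-formatted as dates (e.g. "2-Mar").
date_like = df.apply(lambda col: col.str.match(r"\d{1,2}[-/][A-Za-z]{3}", na=False))
print("date-like cells:", int(date_like.sum().sum()))
```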
... Translation error, logic error [70]; Assignment bug, Iteration bug, Array bug [36]; Logical bug [28]; Lexical bugs [29] Language Error (LE) ...
Preprint
Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participants produced code with fewer errors than when working alone. We also examined how the quality and quantity of AI translations affected the work process and quality of outcomes, and observed that providing multiple translations had a larger impact on the translation process than varying the quality of provided translations. Our results tell a complex, nuanced story about the benefits of generative code models and the challenges software engineers face when working with their outputs. Our work motivates the need for intelligent user interfaces that help software engineers effectively work with generative code models in order to understand and evaluate their outputs and achieve superior outcomes to working alone.
... This issue goes even further when experts are aware of end-users' erroneous spreadsheet documents causing serious financial losses and search for explanations. In his research, Panko concluded that the behavior of end-users can be explained by their bad habits of thinking [10]-[12]. However, our research in computer problem-solving approaches, attention modes [7], [14], thinking modes [6], and mathability reveals that thinking is not bad, but is applied in incorrect, unsuitable situations, primarily focusing on low-mathability approaches in hyper attention mode. Closely related to this issue is the topic of digital literacy and competence, because not even the new generations are born with digital skills, which need to be consciously acquired and continuously developed, similar to mother-tongue language skills. ...
... On the one hand, end-users are not able to function as they should according to the definition. On the other hand, their productivity and effectiveness are much lower than expected [10]-[12], [22]. Consequently, they perfectly fulfill the good customer image of software companies but lack computational thinking skills [21]. ...
... Considering this purpose, the major characteristic of digital documents is changeability and modifiability. This intention is well served by the definition of correctly edited and formatted spreadsheet [10]-[12], [28], [29] and text documents [18]. ...
Article
The dissemination of research results might be as crucial as the research itself. The two widely accepted major forms of dissemination are written papers and live presentations. On the surface, if we see the problem in hyper attention mode, these documents are different in nature. However, their preparation requires the same problem-solving approach, and beyond that, they share the fundamental rules of design since both are extended text documents, with varying proportions of text and/or non-text contents and static and/or dynamic media types. Closely related to this problem is the phenomenon of different types of attention (hyper and deep attention), thinking (fast and slow) modes, and problem-solving approaches (high- and low-mathability). In the world of immense and various forms of data and information, the role of so-called hyper attention is fundamental and inevitable, but the presence of deep attention is essential as well. In the present paper the knowledge items involved and shared in the design and the preparation of text-based documents are detailed from the view of concept-based problem-solving, the perspective of attention and thinking modes, along with samples originating from various sources and subject matters. The aims are to discuss the theoretical background of effective and efficient document design and preparation and to call attention to the consequences of ignoring or neglecting the proper use of attention types and thinking modes.
... Finally, this theme also probed techniques that are employed when documenting spreadsheets. Statistics in Table 7 support the argument that documentation is rare in spreadsheets (Panko, 1998). They reveal that employees simply use cell comments or text in their spreadsheets as approaches to document their spreadsheet models. ...
Preprint
This paper explores the impacts of spreadsheets on business operations in a water utility parastatal in Malawi, Sub-Saharan Africa. The organisation is a typical example of a semi-government body operating in a technologically underdeveloped country. The study focused on spreadsheet scope of use and life cycle as well as organisational policy and governance. The results will help define future spreadsheet usage by influencing new approaches for managing potential risks associated with spreadsheets in the organization. Generally, findings indicate that the proliferation of spreadsheets in the organization has provided an enabling environment for business automation. The paper also highlights management, technological and human factor issues contributing to high risks associated with pervasive spreadsheet use. The conclusions drawn from the research confirm that there is ample room for improvement in many areas such as implementation of comprehensive policies and regulations governing spreadsheet development processes and adoption.
... Beyond these potentials, we also have to deal with the risks of spreadsheet programming, which have been thoroughly detailed and analyzed [13], [20]-[29], and solved to some extent [30]-[34]. In general, the wider research community tends to focus on the financial losses caused by poorly designed spreadsheets rather than on the development of "real" spreadsheet methodologies. ...
... In the coding process, as detailed above, a limited number of spreadsheet functions (the Sprego functions, Table II) are applied to reduce the cognitive load of the 500 built-in spreadsheet functions [117]. Furthermore, whenever possible, array formulas are created [2], [109], [110] to reduce copying errors [20], [26] and to make formulas as secure as possible. (5) The final step of the problem-solving process is discussion and debugging. ...
... It must be emphasized here that this step is as important in spreadsheet programming as it is in other programming languages. Furthermore, being aware of the relatively high number of errors in spreadsheet documents, utmost care must be taken to discuss and debug spreadsheeting issues [13], [20]-[29]. ...
Article
The paper presents the details of a four-year project to test the effectiveness of teaching spreadsheeting with spreadsheet programming, instead of the traditional, widely accepted surface approach methods. The novel method applied in the project, entitled Sprego (Spreadsheet Lego), is a concept-based problem-solving approach adapted from the didactics of other sciences and computer programming. In the experimental group, contextualized, real-world programming problems are presented in a spreadsheet environment. A semi-unplugged data-driven analysis is carried out based on each problem, which is followed by the building of a feasible algorithm, expressed by natural language expressions. The coding is completed in the following step by applying a limited number of spreadsheet (Sprego) functions, multilevel, and array formulas. The final steps of the process are discussion and debugging. On the other hand, classical, tool-centered approaches are applied in the control groups. Our research reveals that the traditional surface approach methods for teaching spreadsheeting do not provide long-lasting, reliable knowledge which would provide students and end-users with effective problem-solving strategies, while Sprego does. Beyond this finding, the project proves that Sprego supports schema construction and extended abstraction, which is one of the major hiatus points of traditional surface navigation methods. The project also reveals that developing computational thinking skills should not be downgraded, and the misconceptions of self-taught end-users and user-friendly applications should be reconsidered, especially their application in educational environments. Gaining effective computer problem-solving skills and knowledge-transfer abilities is not magic, but a time-consuming process which requires consciously developed and effective methods, and teachers who accept the incremental nature of the sciences.
... Although changes to gene names and software will help, they won't solve the overarching problem with spreadsheets: that (i) errors occur silently, (ii) errors can be hidden amongst thousands of rows of data, and (iii) they are difficult to audit. Research shows that errors are surprisingly common in the business setting [4], which raises the question as to how common such errors are in science. The difficulty in auditing spreadsheets makes them generally incompatible with the principles of computational reproducibility [5]. ...
Article
Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a scan of supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. Improved scanning software we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is due to gene names being converted not just to dates and floating-point numbers, but also to the internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited to use with large genomic data.
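As a hedged sketch of the kind of check involved (not the authors' scanning software; the file name, column name, and patterns are illustrative assumptions), one could flag gene-list entries that look like Excel date conversions or five-digit date serial numbers:

```python
# Hedged sketch: flag gene-list entries that look like Excel date conversions
# (e.g. "2-Sep" for SEPT2) or five-digit internal date serial numbers.
# This is not the authors' scanner; file/column names and patterns are assumptions.
import pandas as pd

genes = pd.read_csv("supplementary_gene_list.csv", dtype=str)["gene"]

months = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
date_like = genes.str.match(rf"\d{{1,2}}-({months})$", case=False, na=False)
serial_like = genes.str.fullmatch(r"\d{5}", na=False)   # e.g. 43891 = 2020-03-01

suspicious = genes[date_like | serial_like]
print(f"{len(suspicious)} suspicious entries")
print(suspicious.to_string())
```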