Table 1 - Studies of Spreadsheet Errors (uploaded by Raymond R. Panko)

Source publication
Article
Although spreadsheet programs are used for small "scratchpad" applications, they are also used to develop many large applications. In recent years, we have learned a good deal about the errors that people make when they develop spreadsheets. In general, errors seem to occur in a few percent of all cells, meaning that for large spreadsheets, the iss...

Contexts in source publication

Context 1
... we have moved beyond such anecdotal evidence, into the realm of systematic field audits and laboratory experiments. Tables 1 through 3 summarize key data from these studies. Table 1 contains data from 13 field audits involving real-world spreadsheets. ...
Context 2
... 1 through 3 summarize key data from these studies. Table 1 contains data from 13 field audits involving real-world spreadsheets. Since 1995, when field audits began to use good (although usually not excellent) methodologies, 94% of the 88 spreadsheets audited in 7 studies have contained errors, reflecting the improvement in auditing methodologies. ...
Context 3
... field audit studies shown in Table 1 report a simple measure of spreadsheet errors: the percentage of spreadsheets that contain at least one serious error. (Minor errors are discounted.) ...
Context 4
... this is the percentage of non-label cells containing errors. Table 1 shows that just as faults/KLOC usually falls into a narrow range (Panko, 2005a), the cell error rates (CERs) seen in spreadsheet studies have been very similar. The relatively few divergent numbers that appear within a life cycle stage, furthermore, come when only certain cells are considered, generally cells with a high potential for error. ...
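To make the two measures above concrete, here is a minimal Python sketch. The per-spreadsheet cell counts are hypothetical; only the 88-spreadsheets/94% figures come from the contexts quoted above.

```python
def cell_error_rate(error_cells: int, non_label_cells: int) -> float:
    """CER: the fraction of non-label cells that contain an error."""
    return error_cells / non_label_cells


def flawed_spreadsheet_rate(flawed: int, audited: int) -> float:
    """Coarser field-audit measure: the share of audited spreadsheets
    containing at least one serious error."""
    return flawed / audited


# Hypothetical single-spreadsheet audit: 2,000 non-label cells, 44 in error.
print(f"CER = {cell_error_rate(44, 2000):.1%}")  # 2.2%, i.e. "a few percent"

# Context 2 reports 94% of 88 audited spreadsheets; 83/88 is about 94%.
print(f"Flawed spreadsheets = {flawed_spreadsheet_rate(83, 88):.0%}")
```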
Context 5
... there are quantitative errors, in which the spreadsheet gives an incorrect result. The studies in Table 1 look only at quantitative errors. However, there are also qualitative errors that may lead to quantitative errors later, during maintenance, what-if analysis, or other activities. ...
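To illustrate the distinction, here is a small Python sketch with invented cell contents: cell B3 computes the right answer today (no quantitative error), but its hard-coded tax rate is a qualitative error that silently becomes a quantitative one if the rate in C1 is changed during what-if analysis.

```python
import re

# Toy grid of cell -> content, all invented. B3 is numerically correct today,
# so an audit would find no quantitative error in it, but the hard-coded
# tax rate 0.08 is a qualitative error: a later what-if change to the rate
# in C1 will silently leave B3 behind.
cells = {
    "C1": "0.08",        # tax-rate input cell
    "B2": "=A2*C1",      # references the input cell (good practice)
    "B3": "=A3*0.08",    # hard-codes the rate (qualitative error)
}

HARDCODED = re.compile(r"\d+\.\d+")  # crude check: any decimal literal

for ref, content in cells.items():
    if content.startswith("=") and HARDCODED.search(content):
        print(f"{ref}: hard-coded constant in {content!r}")
# prints -> B3: hard-coded constant in '=A3*0.08'
```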

Similar publications

Chapter
Climate models are constructed from mathematical equations that describe the behavior of their components: Atmosphere, Ocean, Ice, and Land. This chapter describes the structure of a coupled climate model, including common features and concepts used across different components. Coupling is the term used to define the modeling of the interactions of th...
Conference Paper
Piles are vertical structural elements which can be driven, bored, or drilled. For determining the allowable and ultimate load-carrying capacity of a pile, a number of arbitrary or empirical methods are used. The paper focuses on determining pile bearing capacity using the α-method, β-method, and SPT method specified in BNBC 2015. Spreadsheets for esti...
Article
Fluid film thickness in a compliant foil bearing is greatly influenced by the deflection of the bearing structure. Therefore, in order to properly model the performance of a foil bearing, it is mandatory that the deflection of the compliant bearing structure due to the generated hydrodynamic pressure be determined accurately. This article proposes an e...

Citations

... To help exemplify our ideas, our prototype (described in Section 2) focuses on critical thinking of the kind invoked when working with spreadsheets, and in particular in the activity of data-driven shortlisting (described in Section 1.2). Spreadsheets make a particularly interesting case study because the scope of risk and error in spreadsheets has a long history of popular and scholarly scrutiny [Panko, 1998, Powell et al., 2008], which makes them unique amongst application types that have extremely wide use across domains, professions, and organisations. The impacts of spreadsheet-related errors are huge and continuous, as carefully documented by EuSpRIG. Errors have numerous sources [Aurigemma, 2010, Rajalingham et al., 2008]: incorrect formulas, incorrect data, issues with data structuring [Chalhoub and Sarkar, 2022], errors in comprehension [Srinivasa Ragavan et al., 2021], unit-related errors, and mistakes in copy-paste reuse [Joharizadeh et al., 2020], are but a few among many. ...
Preprint
Generative AI, with its tendency to "hallucinate" incorrect results, may pose a risk to knowledge work by introducing errors. On the other hand, it may also provide unprecedented opportunities for users, particularly non-experts, to learn and apply advanced software features and greatly increase the scope and complexity of tasks they can successfully achieve. As an example of a complex knowledge workflow that is subject to risks and opportunities from generative AI, we consider the spreadsheet. AI hallucinations are an important challenge, but they are not the greatest risk posed by generative AI to spreadsheet workflows. Rather, as more work can be safely delegated to AI, the risk is that human critical thinking (the ability to holistically and rigorously evaluate a problem and its solutions) is degraded in the process. The solution is to design the interfaces of generative AI systems deliberately to foster and encourage critical thinking in knowledge work, building primarily on a long history of research on critical thinking tools for education. We discuss a prototype system for the activity of critical shortlisting in spreadsheets. The system uses generative AI to suggest shortlisting criteria and applies these criteria to sort rows in a spreadsheet. It also generates "provocations": short text snippets that critique the AI-generated criteria, highlighting risks, shortcomings, and alternatives. Our prototype opens up a rich and completely unexplored design space of critical thinking tools for modern AI-assisted knowledge work. We outline a research agenda for AI as a critic or provocateur, including questions about where and when provocations should appear, their form and content, and potential design trade-offs.
... End-user programming, such as formulas in spreadsheets, has been extremely error-prone. For example, Panko [42] found there is a high probability of error affecting the bottom lines of any substantial spreadsheet. Despite the prevalence of consequential errors, it has been hard to get end users to write tests. ...
Preprint
What if end users could own the software development lifecycle from conception to deployment using only requirements expressed in language, images, video or audio? We explore this idea, building on the capabilities that generative Artificial Intelligence brings to software generation and maintenance techniques. How could designing software in this way better serve end users? What are the implications of this process for the future of end-user software engineering and the software development lifecycle? We discuss the research needed to bridge the gap between where we are today and these imagined systems of the future.
... Also, the IEEE terminology does not stipulate that an audit can only occur after the system has been released to its users for their use. Some studies (e.g., [37][38]) use the term "field audit" to refer to a QA exercise performed by the authors or other end users for spreadsheets that are already being used in organizations. When discussing other relevant studies in this paper, if needed, we will replace the term "field audit" by another more appropriate term that conforms to the IEEE terminology. ...
... Many spreadsheet development projects started their life cycles with this stage, and little effort was spent on the previous stages (i.e., the Problem and Scope Identification stage and the Specification, Modeling, and Design stage) [45]. Also, many spreadsheet developers rarely did much planning before they started filling the cells in a spreadsheet [37][46][52]. Such a practice likely generates several problems: (a) an unnecessarily complex model which takes longer to build, (b) assumptions made that were not part of the original intention, and (c) a lack of common understanding of what the model is doing. ...
Article
Spreadsheets are very common for information processing to support decision making by both professional developers and non-technical end users. Moreover, business intelligence and artificial intelligence are increasingly popular in the industry nowadays, where spreadsheets have been used as, or integrated into, intelligent or expert systems in various application domains. However, it has been repeatedly reported that faults often exist in operational spreadsheets, which could severely compromise the quality of conclusions and decisions based on the spreadsheets. With a view to systematically examining this problem via a survey of existing work, we have conducted a comprehensive literature review on the quality issues and related techniques of spreadsheets over a 35.5-year period (from January 1987 to June 2022) for target journals and a 10.5-year period (from January 2012 to June 2022) for target conferences. Among other findings, two major ones are: (a) Spreadsheet quality is best addressed throughout the whole spreadsheet life cycle, rather than just focusing on a few specific stages of the life cycle. (b) Relatively more studies focus on spreadsheet testing and debugging (related to fault detection and removal) when compared with spreadsheet specification, modeling, and design (related to development). As prevention is better than cure, more research should be performed on the early stages of the spreadsheet life cycle. Enlightened by our comprehensive review, we have identified the major research gaps as well as highlighted key research directions for future work in the area.
... It is well-understood that spreadsheets are indispensable to business (e.g., Croll 2007, Grossman, Mehrotra, and Özlük 2007), but can also be a source of risk and costly errors (e.g., EuSpRIG 2023, Panko 1998, but also Powell, Baker, and Lawson 2008a, 2008b, 2009a, 2009b). Researchers and practitioners have long been working on articulating standards and practices to reduce risk and error, and also to increase the productivity of spreadsheet programmers, and the effectiveness of spreadsheet users in organizations. ...
Preprint
This paper presents a taxonomy for analytical spreadsheet models. It considers both the use case that a spreadsheet is meant to serve, and the engineering resources devoted to its development. We extend a previous three-type taxonomy, to identify nine types of spreadsheet models, that encompass the many analytical spreadsheet models seen in the literature. We connect disparate research literature to distinguish between an "analytical solution" and an "industrial-quality analytical spreadsheet model". We explore the nature of each of the nine types, propose definitions for some, relate them to the literature, and hypothesize on how they might arise. The taxonomy aids in identifying where various spreadsheet development guidelines are most useful, provides a lens for viewing spreadsheet errors and risk, and offers a structure for understanding how spreadsheets change over time. This taxonomy opens the door to many interesting research questions, including refinements to itself.
... Several studies have shown that electronic spreadsheets are effective and efficient for handling work-related accounting (Rackliffe & Ragland, 2016; Schneider et al., 2017). In addition, the accounting profession emphasizes that accounting education graduates must have the ability to operationalize and design good spreadsheets (Frownfelter-Lohrke, 2017), such as utilizing functions to analyze and process data in the field of accounting (Rackliffe & Ragland, 2016) and financial reporting (Panko, 1998). On the other hand, spreadsheets are one of the computational thinking tools. ...
Article
Computational thinking is seen as a basic and essential skill in the 21st century, but its development in accounting education is still relatively limited. Therefore, anchored in Wing's (2006) computational thinking theory, this study makes a case for developing problem-solving skills, which involve abstraction, decomposition, generalization, evaluation, and algorithmic thinking. Given the importance of computational thinking in spreadsheet teaching, this classroom-based study aims to investigate the use of constructionism-based accounting spreadsheet designing activities to promote college students' computational thinking in the accounting spreadsheet design context. This study used a participatory qualitative approach to investigate the phenomenon of constructing accounting spreadsheet designs between students and lecturers in the classroom. Thirty-eight accounting education majors were participants in the study. In this study, data were collected from students' work artifacts, classroom observations, and discussion notes from college students. Referring to qualitative content analysis, findings show that constructionism-based accounting spreadsheet designing activities help the college students develop their computational thinking about the spreadsheet errors or deficiencies found in the spreadsheet designs they observed. This study shows that using both constructionism activities and spreadsheets as a learning resource loaded with critical, creative, systematic, and logical thinking has the potential to promote student computational thinking in accounting spreadsheet classrooms. This study gives two main recommendations to practitioners in accounting education: (a) when teaching the topic of accounting spreadsheet design, a lecturer must provide college students with a spreadsheet errors taxonomy, and (b) in terms of activity design, spreadsheet learning should be directed at developing the college students' computational thinking.
... The question repeatedly arises: how can we convince people that their text-treatment practice, habits, concepts, and approaches do more harm than good? Even fully developed pieces of software lose their power through misuse (Panko, 1998, 2013; Csernoch and Dani, 2022; EuSpRIG Horror Stories, 2022; Sebestyén et al., 2022). On the one hand, scientific papers presented on this subject reach neither the target population (Ben-Ari and Yeshno, 2006; Wing, 2006; Csernoch, 2009, 2010; Ben-Ari, 2015; EuSpRIG Horror Stories, 2022) nor the appropriate segment of education (Malmi et al., 2019, 2022). ...
Article
An extremely high number of erroneous text-based documents are in circulation, being multiplied, serving as samples both in the office world and in education. Creating, editing, and modifying these documents generates a serious financial loss when both human and machine resources are considered. Our research team developed an application and a testing method for building up an objective measurement system, in order to see the rate of loss in the handling of erroneous documents. In the pre-testing period, the error factor of a sample text was set up based on the logged activities of an expert researcher. It was found that first-level modifications require about five times more human and machine resources in a short, one-paragraph text burdened with layout errors than in its properly formatted version. Further testing is required to find out how demanding longer texts and more serious modifications are, but it is already obvious that erroneous text editing, and the lack of fundamental computational thinking skills, involve unnecessary use of our resources.
... Therefore, it is easy to infer that the value of at least one data cell is erroneous. Since data errors are not randomly distributed [15], the error distribution in the dataset can be learned to predict which data cell is more likely to have an erroneous value, so as to make more accurate inferences. For example, in this dataset, there is a value "SIGMOD Conference" that is similar to the valid value "SIGMOD Record" in the attribute domain of D.Venue. ...
... A large number of experiments on error detection have been conducted through artificial error introduction [49,50]. Related studies [15,51] show that in the real world, about 5% of the data cells in a dataset are subject to data errors for various reasons. Therefore, this paper sets the error rate to 5% by default when artificially introducing errors into the D-A and Adult datasets. ...
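For readers unfamiliar with this evaluation protocol, the following Python sketch shows one plausible way to inject errors into about 5% of cells. The dataset, corruption strategy, and seed are all illustrative assumptions, not the cited papers' exact procedure.

```python
import random

# Toy dataset: rows of (venue, year). All values invented for illustration.
data = [["SIGMOD Record", 2020], ["VLDB", 2021], ["ICDE", 2019],
        ["SIGMOD Record", 2018], ["VLDB", 2022]]

ERROR_RATE = 0.05        # the paper's default rate, per the excerpt above
rng = random.Random(42)  # fixed seed keeps the corruption reproducible

# Enumerate cell coordinates, sample ~5% of them, and corrupt each one.
coords = [(r, c) for r in range(len(data)) for c in range(len(data[0]))]
k = max(1, round(ERROR_RATE * len(coords)))
for r, c in rng.sample(coords, k):
    value = data[r][c]
    # One simple corruption per type: drop a character from strings,
    # shift integers by one.
    data[r][c] = value[:-1] if isinstance(value, str) else value + 1

print(data)  # exactly k cells now differ from the clean ground truth
```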
... Researchers (Panko, 2005; Powell et al., 2007) have identified various errors, such as: ...
Thesis
How long is it going to last, and what is it going to cost? These are the two most basic questions asked during a turnaround. These questions are asked by all parties affected by the turnaround. The answers to these questions may influence an employee to leave or stay or if an investor will want to invest in the turnaround. Being able to answer these basic questions may shape the outcome of the actual turnaround. In this thesis, an algebraic model was developed to determine the duration, the breakeven point, the cash nadir point and the value of the nadir during a turnaround. An autoethnographic approach was adopted to understand the research problem, which resulted in a process of mathematising the knowledge gained from real-world experiences. Fundamental moments of turnarounds were derived from the Variable Finance Capacity model to answer the key questions of a turnaround. The thesis provides a framework for utilising the model and fundamental moments to make informed decisions
... Tools that are built on top of Microsoft Excel, such as qBase or DART-PCR, involve copy/paste manipulations from the raw data files, necessitating the careful triage of results to fit the confines of the formulas and dataflow. This can lead to calculation errors that go unnoticed [6]. Recently developed tools, such as SATQPCR [7], PIPE-T [8] or Auto-qPCR [9], come with limitations regarding input formats, the ability to easily adjust plotting parameters, or the lack of comprehensive functionalities such as inter-plate calibration. ...
Article
Background: Reverse transcription quantitative real-time PCR (RT-qPCR) is a well-established method for analysing gene expression. Most RT-qPCR experiments in the field of microbiology aim for the detection of transcriptional changes by relative quantification, which means the comparison of the expression level of a specific gene between different samples by the application of a calibration condition and internal reference genes. Due to the numerous data processing procedures and factors that can influence the final result, relative expression analysis and interpretation of RT-qPCR data are still not trivial and often necessitate the use of multiple separate software packages capable of performing specific functions. Results: Here we present qRAT, a stand-alone desktop application based on R that automatically processes raw output data from any qPCR machine using well-established and state-of-the-art statistical and graphical techniques. The ability of qRAT to analyse RT-qPCR data was evaluated using two example datasets generated in our laboratory. The tool successfully completed the procedure in both cases, returning the expected results. The current implementation includes functionalities for parsing, filtering, normalizing and visualisation of relative RT-qPCR data, like the determination of the relative quantity and the fold change of differentially expressed genes as well as the correction of inter-plate variation for multiple-plate experiments. Conclusion: qRAT provides a comprehensive, straightforward, and easy-to-use solution for the relative quantification of RT-qPCR data that requires no programming knowledge or additional software installation. All application features are available for free and without requiring a login or registration.
... On the other hand, it is well-documented that the flexibility of spreadsheets also makes them error-prone [79][80][81][83]. Without formal types or data structures, spreadsheets suffer from classes of error that in traditional programming languages are easily detected, or completely prevented. ...