Conference PaperPDF Available

A domain terms visualization tool for spreadsheets

Authors:

Abstract and Figures

Earlier research on spreadsheet authors' mental models has shown that people possess several types of information about spreadsheets. The research results also suggested that a tool intended to aid in comprehension and debugging of spreadsheets should make prominent real-world and problem domain concepts and map them easily to spreadsheet-specific details. This poster introduces a new spreadsheet visualization tool which was developed to demonstrate that it is possible to have an easy-to-use spreadsheet understanding and debugging tool that relieves users from spreadsheet details and lets them utilize more of their mental model of the application domain. The tool translates traditional spreadsheet formulas into problem domain narratives and highlights referenced cells.
Content may be subject to copyright.
A Domain Terms Visualization Tool for
Spreadsheets
Bennett Kankuzi
University of Eastern Finland
School of Computing, Joensuu Campus
Joensuu, Finland
Email: bfkankuzi@gmail.com
Jorma Sajaniemi
University of Eastern Finland
School of Computing, Joensuu Campus
Joensuu, Finland
Email: jorma.sajaniemi@uef.fi
Abstract—Earlier research on spreadsheet authors’ mental
models has shown that people possess several types of information
about spreadsheets. The research results also suggested that a tool
intended to aid in comprehension and debugging of spreadsheets
should make prominent real-world and problem domain concepts
and map them easily to spreadsheet-specific details.
This poster introduces a new spreadsheet visualization tool
which was developed to demonstrate that it is possible to have
an easy-to-use spreadsheet understanding and debugging tool
that relieves users from spreadsheet details and lets them utilize
more of their mental model of the application domain. The tool
translates traditional spreadsheet formulas into problem domain
narratives and highlights referenced cells.
I. INT ROD UC TI ON
Despite being one of the most widely used applications,
many spreadsheets have non-trivial errors that are hard to
detect [6]. We have thus chosen to address this problem by
studying mental models of spreadsheet authors in order to
better understand why the spreadsheet process is so error-prone
and to be able to devise new tools that better correspond to
spreadsheet authors’ mental processes.
Doyle and Ford [3] define a mental model as “a mental
image of the world around us that we carry in our heads de-
picting only selected concepts and relationships that represent
real systems” and thus, a mental model for a spreadsheet does
not carry all possible information, but just those aspects of the
spreadsheet that the user finds appropriate for the current task.
Earlier research we conducted [4] revealed that spreadsheet
authors have (at least) three mental models of a spreadsheet:
the real-world model that comprises general knowledge of the
world around us; the domain model that represents knowledge
of the problem domain and the functionality of the spreadsheet
in problem domain or application terms; and the spreadsheet
model that codes the expressions and data relationships in
the spreadsheet. The role of these mental models also vary
depending on the nature of the task at hand such that when
explaining a spreadsheet, the real-world and domain models
are prominent; in debugging, spreadsheet authors constantly
switch back and forth between the domain and spreadsheet
models; and particularly in error fixing, the spreadsheet model
is prominent. These findings suggest that a tool intended to aid
in comprehension of a spreadsheet should make prominent
real-world and domain concepts and map them easily to the
Fig. 1. Visualization of a Sales Report spreadsheet. The cell cursor is in C9,
whose formula is visible at the formula bar and narrated in domain terms in
the light green box. Cells referred to in the domain narrative are highlighted
with light green background. All formula cells are marked with a pink right
border.
spreadsheet; in debugging, a tool should not only display
possible spreadsheet errors in spreadsheet terms (e.g., cell
references) but also in domain terms in order to help authors
discern errors with a more pronounced mapping between the
problem domain and the spreadsheet.
It is against this background that we have developed an
interactive spreadsheet visualization tool whose purpose is
to ease the mapping between the domain/real-world mental
models and the spreadsheet mental model and thus relieving
users from spreadsheet details and letting them utilize more
of their mental model of the application domain.
II. TH E TOO L
The tool displays formulas in higher level symbolic names,
that is, in domain/real-world terms. The domain/real-world
information is extracted from labels (headers) through spatial
layout information of each cell referenced to in the for-
mula. We call this displayed domain/real-world information
adomain/real-world narrative or simply a domain narrative.
Features and characteristics: The tool has five distin-
guishing features and characteristics. First, the tool displays
formulas as a domain/real-world terms narrative in a box
just right below the active formula cell. For example, in
the spreadsheet depicted in Figure 1, a formula in cell C9
given as “=SUM(C5:C8)” in spreadsheet-specific terms is
automatically translated to “SUM( Jan |James Bourne ... Jan
Fig. 2. Example of a domain narrative when the column name is missing.
B4 is translated to “Cash” and B5 to “Inventories” and hence the formula in
B6 is translated as “Cash + Inventories”.
|Jasmine Hunt )” in domain terms. For each referenced cell,
its corresponding domain/real-world terms are extracted from
its column and row labels. The column and row names are
separated by |”; the naming obeys a “column |row” direction
following the standard spreadsheet convention. Narratives are
positioned a little bit lower than the active formula cell to
avoid distractions when navigating through the cells.
Second, to track changes automatically, the narratives are
automatically re-generated as one works through the spread-
sheet. Automatic generation of visualizations is important
because users are not comfortable with tools that require much
user intervention [7].
Third, referenced cells are automatically highlighted and
their background colour matches with the background colour
of the narrative. For example in Figure 1, “Jan |James Bourne”
or C5, “Jan |Chris Hewitt” or C6, “Jan |Pat Hill” or C7,
and “Jan |Jasmine Hunt” or C8 are all highlighted in light
green colour. We have thus applied Gestalt Law of Similarity
whereby objects of the same colour are naturally perceived as
related [8]. In this case, matching background colours of the
narrative and highlighted cells helps the user to automatically
perceive the two as related.
Fourth, all formula cells are marked with a pink right border
so that a user can have a general overview of the structure of
a spreadsheet and differentiate formula cells from text cells as
well as input number cells. For example, in Figure 1, cell E5
is not a formula cell while cell E9 (with a pink right border)
is a formula cell. We have thus applied cognitive theories of
visual search [2] to provide a minimal clutter pop-out effect
that results in better speed and accuracy than the attentional
template approach utilized in normal spreadsheet systems.
Fifth, the tool is superimposed on the spreadsheet display.
This is important because users find it tedious and confusing to
determine the correspondence between a separate visualization
and the spreadsheet itself [1].
Label extraction: The domain/real-world information dis-
played in the narratives is extracted from labels through spatial
layout information. For each cell, its corresponding domain
terms are extracted from its column and row labels. An input
cell which does not have its corresponding column (row) name,
has its row (column) name used as its symbolic name as
illustrated in Figure 2.
Implementation: The tool is implemented as an add-on to a
popular spreadsheet system, Microsoft Excel, so that it should
not be completely different from the traditional spreadsheet
environment. This was to take advantage of existing user ex-
perience as their familiarity with existing tools and techniques
could become worthless if the tool is too different from what
they know and how they are used to work [5]. Experience is an
asset, and learning new tools is hard. Even if one manages to
convince end-users of a new tool’s benefits, they will still have
to learn to cope with it in practice. As such, their experience
with existing tools and techniques could become worthless if
the tool is too different from what they know and how they
are used to work [5]. Therefore, we took effort to make sure
that the tool should be available in the traditional spreadsheet
environment and not be completely different from it.
Potential uses: When creating or editing a spreadsheet, the
tool can help one in tracking effected changes as active for-
mula cell ranges are highlighted and a corresponding problem
domain description is displayed. This has the potential to help
in creating an error free spreadsheet as it provides an intuitive
on-spot feedback to the user on changes being done to the
spreadsheet. For the same reasons, the tool may also help in
the comprehension of spreadsheets created by others.
The tool may also help in the comprehension of a spread-
sheet as it can give a user a general overview of the structure
of a spreadsheet through the pink right border markings on all
formula cells. Finally, the tool may also help in locating errors.
For example, a problem domain description or narrative for a
given formula cell, that does not match with what is expected
in the problem domain, could provide a visual cue for an error.
III. CONCLUSION
This poster presented a description of a domain terms
visualization tool for spreadsheets that we are proposing to
aid in spreadsheet comprehension and debugging. This tool
has also been empirically evaluated with results showing that
the tool was easy to learn, helped users to locate more errors
in spreadsheets and generally increased the use of the domain
mental model of its users. Part of our future work on the tool
would be to attempt to allow spreadsheet authors to enter or
edit formulas using domain narratives.
REF ER EN CE S
[1] J. S. Davis, “Tools for spreadsheet auditing,” International Journal of
Human-Computer Studies, vol. 45, no. 4, pp. 429–442, 1996.
[2] R. Desimone and J. Duncan, “Neural Mechanisms of Selective Visual
Attention,” Annual Review of Neurosciences, vol. 18, pp. 193–222, 1995.
[3] J. K. Doyle and D. N. Ford, “Mental Models Concepts for System
Dynamics Research,” System Dynamics Review, vol. 14, no. 1, pp. 3–
29, 1998.
[4] B. Kankuzi and J. Sajaniemi, “An Empirical Study of Spreadsheet
Authors’ Mental Models in Explaining and Debugging Tasks,” in 2013
IEEE Symposium on Visual Languages and Human-Centric Computing
(VL/HCC). IEEE, 2013, pp. 15–18.
[5] D. Kulesz, “From Good Practices to Effective Policies for Preventing
Errors in Spreadsheets,” Proceedings of EuSpRIG 2011 Conference, 2011.
[6] R. R. Panko, “What we know about spreadsheet errors,” Journal of
Organizational and End User Computing, vol. 10, no. 2, pp. 15–21, 1998.
[7] J. Sajaniemi, “Modeling spreadsheet audit: A rigorous approach to
automatic visualization,” Journal of Visual Languages & Computing,
vol. 11, no. 1, pp. 49–82, 2000.
[8] M. Wertheimer, A Source Book of Gestalt Psychology. London:
Routledge & Kegan Paul, 1938.
... The execution of the inspection is done by human experts (studies indicate that teams detect more defects than individuals and that experienced practitioners perform only about 16 -23% better than untrained students [6]). The use of tools is optional -tools in this area support human experts, e.g. by providing interactive checklists or a visualization of the spreadsheet which is adjusted for inspection purposes such as [17] or [26]. ...
... In all experiments, only five participants stated that they did not like the study too much. The remaining participants stated that it was very good (26) or good (39) (Figure 7a). We think that participants who like a study are more likely to give their best during an experiment whereas participants who do not like a study tend to rush through it to get it over with. ...
... We think that participants who like a study are more likely to give their best during an experiment whereas participants who do not like a study tend to rush through it to get it over with. A similar picture can be drawn for SIF in general where eight participants stated not to like SIF too much, while the remaining participants stated that it was very good (26) or good (37) (Figure 7b). In addition, we asked the participants to rate the visual presentation and the maintainability of the spreadsheet they had to maintain in the main task. ...
Preprint
Spreadsheets are powerful tools which play a business-critical role in many organizations. However, many bad decisions taken due to faulty spreadsheets show that these tools need serious quality assurance. Furthermore, while collaboration on spreadsheets for maintenance tasks is common, there has been almost no support for ensuring that the spreadsheets remain correct during this process. We have developed an approach named Spreadsheet Guardian which separates the specification of spreadsheet test rules from their execution. By automatically executing user-defined test rules, our approach is able to detect semantic faults. It also protects all collaborating spreadsheet users from introducing faults during maintenance, even if only few end-users specify test rules. To evaluate Spreadsheet Guardian, we implemented a representative testing technique as an add-in for Microsoft Excel. We evaluated the testing technique in two empirical evaluations with 29 end-users and 42 computer science students. The results indicate that the technique is easy to learn and to apply. Furthermore, after finishing maintenance, participants with spreadsheets "protected" by the technique are more realistic about the correctness of their spreadsheets than participants who employ only "classic", non-interactive test rules based on static analysis techniques. Hence, we believe Spreadsheet Guardian can be of use for business-critical spreadsheets.
... Mental models: Developers have mental models of the spreadsheets with which they develop or interact [47] [48][49] [50]. Thus, understanding their mental models will lead to better knowledge of why spreadsheet development is error-prone and enable the development of new tools and techniques that better correspond to spreadsheet developers' cognitive abilities. ...
... Thus, understanding their mental models will lead to better knowledge of why spreadsheet development is error-prone and enable the development of new tools and techniques that better correspond to spreadsheet developers' cognitive abilities. Three mental models in the developers' minds were proposed: real-world model, domain model, and spreadsheet model [47] [48][49] [50]. When explaining a spreadsheet, the real-world and domain models are prominent, while the spreadsheet model is suppressed. ...
Article
Full-text available
Spreadsheets are very common for information processing to support decision making by both professional developers and non-technical end users. Moreover, business intelligence and artificial intelligence are increasingly popular in the industry nowadays, where spreadsheets have been used as, or integrated into, intelligent or expert systems in various application domains. However, it has been repeatedly reported that faults often exist in operational spreadsheets, which could severely compromise the quality of conclusions and decisions based on the spreadsheets. With a view to systematically examining this problem via survey of existing work, we have conducted a comprehensive literature review on the quality issues and related techniques of spreadsheets over a 35.5-year period (from January 1987 to June 2022) for target journals and a 10.5-year period (from January 2012 to June 2022) for target conferences. Among other findings, two major ones are: (a) Spreadsheet quality is best addressed throughout the whole spreadsheet life cycle, rather than just focusing on a few specific stages of the life cycle. (b) Relatively more studies focus on spreadsheet testing and debugging (related to fault detection and removal) when compared with spreadsheet specification, modeling, and design (related to development). As prevention is better than cure, more research should be performed on the early stages of the spreadsheet life cycle. Enlightened by our comprehensive review, we have identified the major research gaps as well as highlighted key research directions for future work in the area.
... The execution of the inspection is done by human experts (studies indicate that teams detect more defects than individuals and that experienced practitioners perform only about 16 -23% better than untrained students [7]). The use of tools is optional -tools in this area support human experts, e.g. by providing interactive checklists or a visualization of the spreadsheet which is adjusted for inspection purposes such as [15] or [21]. ...
Article
Full-text available
Spreadsheets are powerful tools that play a business‐critical role in many organizations. However, many bad decisions taken due to faulty spreadsheets show that these tools need serious quality assurance. Furthermore, while collaboration on spreadsheets for maintenance tasks is common, there has been almost no support for ensuring that the spreadsheets remain correct during this process. We have developed an approach named Spreadsheet Guardian, which separates the specification of spreadsheet test rules from their execution. By automatically executing user‐defined test rules, our approach is able to detect semantic faults. It also protects all collaborating spreadsheet users from introducing faults during maintenance, even if only few end‐users specify test rules. To evaluate Spreadsheet Guardian, we implemented a representative testing technique as an add‐in for Microsoft Excel. We evaluated the testing technique in 2 empirical evaluations with 29 end‐users and 42 computer science students. The results indicate that the technique is easy to learn and to apply. Furthermore, after finishing maintenance, participants with spreadsheets “protected” by the technique are more realistic about the correctness of their spreadsheets than participants who use only “classic,” noninteractive test rules on the basis of static analysis techniques. Hence, we believe that Spreadsheet Guardian can be of use for business‐critical spreadsheets.
... Kankuzi and Sajaniemi [14][15][16] developed a visualization tool that narrates formulas in terms of domain terms derived from the corresponding row and column headers. When a user chooses a particular formula cell, a formula narrative box is displayed showing the functions of the formula together with row and column headers for each cell referenced in the formula. ...
... Materials: We developed a tool (Kankuzi and Sajaniemi, 2014) that translates cell references in traditional spreadsheet formulas into problem domain/real-world terms and highlights all cells referenced by the active formula cell. The problem domain/realworld terms are extracted from labels (headers) through spatial layout information. ...
Article
Full-text available
To address the problem of errors in spreadsheets, we have investigated spreadsheet authors mental models in a hope of finding cognition-based principles for spreadsheet visualization and debugging tools. To this end, we have conducted three empirical studies. The first study explored the nature of mental models of spreadsheet authors during explaining and debugging tasks. It was found that several mental models about spreadsheets are activated in spreadsheet authors minds. Particularly, when explaining a spreadsheet, the real-world and domain mental models are prominent, and the spreadsheet model is suppressed; however, when locating and fixing an error, one must constantly switch back and forth between the domain model and the spreadsheet model, which requires frequent use of the mapping between problem domain concepts and their spreadsheet model counterparts. The second study examined the effects of replacing traditional spreadsheet formulas with problem domain narratives in the context of a debugging task. Domain narratives were found to be easy to learn and they helped participants to locate more errors in spreadsheets. Furthermore, domain narratives also increased the use of the domain mental model and appeared to improve the mapping between the domain and spreadsheet models. The third study investigated the effects of allowing spreadsheet authors to fix errors by editing domain narratives, thus relieving them from the use of traditional low-level cell references. This scenario was found to promote spreadsheet authors using even more of their domain mental model in a manner that completely overshadowed the use of their spreadsheet mental model. Thus, from a mental model perspective, it is possible to devise a new spreadsheet paradigm that uses domain narratives in place of traditional spreadsheet formulas, thus automatically presenting spreadsheet content so that it prompts spreadsheet authors to think in a manner that closely corresponds to their mental models of the application domain.
Conference Paper
Full-text available
In this paper, we report on an empirical study exploring the nature of mental models of spreadsheet authors when they are explaining and debugging their own spreadsheets. Study participants were first asked to explain one of their own spreadsheets, and then to find and fix seeded errors in the same spreadsheets. Talk-aloud protocols were collected and analyzed to reveal the nature of participants' mental models in these activities. The findings indicate that the authors explain their spreadsheets mainly in terms of real-world and problem domain concepts; in debugging, they constantly switch between problem domain concepts and spreadsheet-specific concepts, although they mainly use spreadsheet-specific concepts to fix an identified error. These study findings provide insights on the need for developing spreadsheet authoring and debugging tools that correspond to spreadsheet authors' mental models of spreadsheets.
Article
Full-text available
Although spreadsheet programs are used for small "scratchpad" applications, they are also used to develop many large applications. In recent years, we have learned a good deal about the errors that people make when they develop spreadsheets. In general, errors seem to occur in a few percent of all cells, meaning that for large spreadsheets, the issue is how many errors there are, not whether an error exists. These error rates, although troubling, are in line with those in programming and other human cognitive domains. In programming, we have learned to follow strict development disciplines to eliminate most errors. Surveys of spreadsheet developers indicate that spreadsheet creation, in contrast, is informal, and few organizations have comprehensive policies for spreadsheet development. Although prescriptive articles have focused on such disciplines as modularization and having assumptions sections, these may be far less important than other innovations, especially cell-by-cell code inspection after the development phase.
Article
Although “mental models” are of central importance to system dynamics research and practice, the field has yet to develop an unambiguous and agreed upon definition of them. To begin to address this problem, existing definitions and descriptions of mental models in system dynamics and several literatures related to cognitive science were reviewed and compared. Available definitions were found to be overly brief, general, and vague, and different authors were found to markedly disagree on the basic characteristics of mental models. Based on this review, we concluded that in order to reduce the amount of confusion in the literature, the mental models concept should be “unbundled” and the term “mental models” should be used more narrowly. To initiate a dialogue through which the system dynamics community might achieve a shared understanding of mental models, we propose a new definition of “mental models of dynamic systems” accompanied by an extended annotation that explains the definitional choices made and suggests terms for other cognitive structures left undefined by narrowing the mental model concept. Suggestions for future research that could improve the field's ability to further define mental models are discussed. © 1998 John Wiley & Sons, Ltd.
Article
Computations in spreadsheets are hard to grasp and consequently many errors remain unnoticed. The problem with the hidden errors lies in the invisibility of the structure of calculations. As a result, auditing and visualization tools are required to make spreadsheets easier to comprehend and to make errors easier to detect. This paper presents a theoretical model of spreadsheets and a technique to describe spreadsheet auditing tools. These are then employed to describe and compare various tools. Moreover, two new visualization mechanisms are introduced.The spreadsheet model reflects not only current spreadsheet systems but also the way people actually use spreadsheets. Theoretically, it is impossible to check the correctness of a spreadsheet without a formal definition of its computations, but our hope is to find visualizations that point out parts of spreadsheets that contain anomalies, i.e. potential locations of errors. The model helps us to understand how such anomalies can be defined.
Article
Few tools are available for understanding and debugging spreadsheets, but they are needed because spreadsheets are being used for large, important business applications. The key to understanding spreadsheets is to clarify the data dependencies among cells. We developed and evaluated two interactive tools which aid in investigating data dependencies, an on-line flowchart-like diagram and a tool which represents dependencies by drawing arrows among cells on the display of the spreadsheet. Users found both tools helpful, but preferred the arrow tool.
Article
Thanks to the enormous flexibility they provide, spreadsheets are considered a priceless blessing by many end-users. Many spreadsheets, however, contain errors which can lead to severe consequences in some cases. To manage these risks, quality managers in companies are often asked to develop appropriate policies for preventing spreadsheet errors. Good policies should specify rules which are based on "known-good" practices. While there are many proposals for such practices in literature written by practitioners and researchers, they are often not consistent with each other. Therefore no general agreement has been reached yet and no science-based "golden rules" have been published. This paper proposes an expert-based, retrospective approach to the identification of good practices for spreadsheets. It is based on an evaluation loop that cross-validates the findings of human domain experts against rules implemented in a semi-automated spreadsheet workbench, taking into account the context in which the spreadsheets are used.