Evaluating systems using CASSM
AEB, TRGG, DF, SM, April 2007 1
Preprint. Final version will be available as: BLANDFORD, A., GREEN, T. R. G., FURNISS, D. &
MAKRI, S. (forthcoming) Evaluating system utility and conceptual fit using CASSM. To appear in
International Journal of Human–Computer Studies. That version may differ from the current one.
Evaluating system utility and conceptual fit using CASSM
Ann Blandford1, Thomas R. G. Green2, Dominic Furniss1 & Stephann Makri1
1 UCLIC, University College London, Remax House, 31-32 Alfred Place, London WC1E 7DP, UK
2University of Leeds, UK
Tel. +44 20 7679 5288
There is a wealth of user-centred evaluation methods (UEMs) to support the analyst in assessing interactive
systems. Many of these support detailed aspects of use – for example: Is the feedback helpful? Are labels
appropriate? Is the task structure optimal? Few UEMs encourage the analyst to step back and consider how well a
system supports users’ conceptual understandings and system utility. In this paper, we present CASSM, a method
which focuses on the quality of ‘fit’ between users and an interactive system. We describe the methodology of
conducting a CASSM analysis and illustrate the approach with three contrasting worked examples (a robotic arm,
a digital library system and a drawing tool) that demonstrate different depths of analysis. We show how CASSM
can help identify re-design possibilities to improve system utility. CASSM complements established evaluation
methods by focusing on conceptual structures rather than procedures. Prototype tool support for completing a
CASSM analysis is provided by Cassata, an open source development.
CASSM, usability evaluation methods, conceptual structures, co-evolution.
Imagine you are planning a journey to another continent. Let us say you live in York, England and are travelling to San
Jose, California. One thing you need to do is book a flight. But where from, and where to? Is Leeds/Bradford or London
a better choice? You much prefer direct flights, but when you select flights from anywhere in the UK to San Jose, they all
involve transfers. You search for a US map that includes airport information; there appear to be three not far away from
San Jose, but then you need to know their names, you need ground transportation information, and you still need to know
whether you can fly there direct from the UK (and, if so, from which airport). An apparently simple task of booking a
flight has become rather complex. This is an example of a conceptual misfit between what users require and what current
flight booking websites typically offer: sites work in terms of flights between airports; users work in terms of journeys.
This misfit between user and system conceptualizations is a kind of usability difficulty that is not readily identified using
existing evaluation methods such as Heuristic Evaluation (Nielsen, 1994) or Cognitive Walkthrough (Wharton, Rieman,
Lewis & Polson, 1994), for which the instruction materials typically encourage the analyst to focus on the device and
draw on their own experience regarding the domain. Unless the evaluator is alert to the idea of ‘misfits’, it is also
unlikely to emerge through empirical evaluation approaches such as think-aloud protocols (Ericsson & Simon, 1993);
indeed, Nørgaard and Hornbæk (2006) report that think-aloud protocols are often used to confirm anticipated problems
and typically focus on usability rather than utility of systems.
The identification of misfits typically represents a design opportunity: in the example above, a site that makes it easier to
plan the journey as well as booking the flight (and possibly ground transportation too) could be attractive to users and
create a bigger market for integrated travel arrangements. More generally, the identification of a user concept that is
missing from the system representation is indicative of a possible design change (though not all user concepts are easily
converted into elements of the system design). Under this view, evaluation is less about fixing bugs and more about
identifying new design possibilities that better match user requirements. Evaluation directly informs re-design. It does
not necessarily dictate, or even indicate, the precise form of that re-design, but highlights the possibility in the form of a misfit.
The opening example was presented as a simple scenario, and only one misfit was highlighted; there might be many
other misfits of interest in this domain. To identify them it is necessary to have a systematic approach. CASSM has been
developed as such a systematic approach. It supports the analysis of conceptual misfits between the way the user thinks
and the representation implemented within the system. The name, CASSM, stands for “Concept-based Analysis of
Surface and Structural Misfits”, and is pronounced “chasm” to invoke the idea of a gulf (Norman, 1986) between user
and system; the different kinds of misfits (surface and structural) are discussed below. This approach fills a niche in the
space of analytical evaluation techniques by focusing specifically on conceptual fit rather than, for example, task
structures, what the user knows about the system, user experience or the modalities of interaction. In particular, it
encourages a high level view of system fitness for purpose, and hence supports reasoning about new design alternatives
rather than incremental improvements.
BACKGROUND: ATTRIBUTES OF EVALUATION METHODS
When developing a new evaluation method, it is important to consider what properties that method should have
(Blandford and Green, forthcoming). We have already stated that one of the aims in developing CASSM was that it
should focus on a system’s fitness for purpose, and hence lead to new design possibilities. Wixon (2003) argues that this
kind of utility should be central to choosing and using any evaluation technique. Hartson, Andre & Williges (2001)
identify this criterion as ‘downstream utility’ in their discussion of how to evaluate evaluation methods; John and Marks
(1997) consider an important aspect of downstream utility to be the persuasiveness of a method. Several other criteria for
assessing evaluation methods have been identified and discussed in the literature.
Historically, simple problem count (the number and/or severity of problems an evaluation method helped identify) was
considered an important criterion. Many studies through the 1980s reported on comparisons between different evaluation
methods considering which helped analysts identify more usability difficulties; these studies were systematically
reviewed, and their flaws highlighted, by Gray and Salzman (1998). Since that review, this criterion has been relatively
de-emphasised. In the development of CASSM, we have been much more concerned with the quality and utility of
insights than with their number.
More recently, reliability has emerged as a central concern. Reliability is a measure of consistency of results across
different evaluators. Reliability between evaluators has been generally found to be low, even with evaluators who are
given the same data (e.g. video data) and asked to apply the same analytical method (Hertzum and Jacobsen, 2001).
Some methods, such as Heuristic Evaluation (Nielsen, 1994) and Cognitive Walkthrough (Wharton et al., 1994),
advocate employing a team of evaluators to mitigate the evaluator effect. In the development of CASSM, we
have not been concerned with reliability in this sense: if two different evaluators achieve different insights, that is not
considered a problem as long as the insights are valid (see below). A single evaluator can identify some misfits, and that
should be useful for informing redesign. Employing more evaluators, like conducting a more thorough analysis, increases
costs, and a judgment has to be made about whether this will yield commensurate benefits. The CASSM approach does
not include an explicit recommendation regarding number of evaluators.
A related criterion that has been considered is thoroughness. Hartson et al. (2001) define this as the proportion of real
problems found (as compared to the real problems that exist in the system). This definition makes two assumptions that
are, in our view, inherently problematic: firstly that it is possible to count problems (e.g. the number of problems counted
will depend on the level of abstraction of problem descriptions); and secondly that it is possible to know how many ‘real’
problems there are in the system (a real problem could be one that is encountered by a particular proportion of users, that
is very infrequent but catastrophic, that leads to errors in a certain percentage of interactions, or any of several other
definitions). A third assumption implicit in this definition is that any method should be able to identify all classes of
usability problems. In developing CASSM, we have explicitly not been aiming to achieve thoroughness, but to fill a
niche. That is: to identify a kind of usability difficulty that is ignored by established evaluation methods.
In line with their approach of defining reliability and thoroughness in numerical terms, Hartson et al. (2001) also define
validity and effectiveness numerically. In particular, they define the validity of a method as the proportion of problems
found that are real. In our view, this definition suffers the same difficulties as their definition of thoroughness: that it is
difficult – and ultimately not productive – to attach meaningful counts to these variables. We adopt a rather looser, but
nevertheless stringent, definition of validity, which is that a method should support the analyst in correctly predicting
user behaviour or identifying problems that users will have with the system. In particular, the approach should not result
in many false positives (issues identified as problems that are not, in fact, problems).
Of course, by this definition, a method that supported no predictions at all would achieve high validity (no false
positives); however, it would clearly not be useful. A method also needs to be productive, in the sense of supporting the
identification of usability issues. Every usability issue will be outside the scope of some evaluation methods (e.g. a
matrix algebra approach is unlikely to have much to say about the effectiveness of team working), so another
requirement concerns scope. The scope of a method refers to the kinds of issues it does and does not address. This should
be clear to users of the method so that analysts can both select an appropriate method for addressing their current
concerns and also appreciate the limitations as well as the value of their analysis. CASSM has not been developed to
identify all possible usability issues, but particularly those related to conceptual misfits between users and systems.
The criteria discussed so far have related to the outputs of evaluation methods; a second class of criteria relate to learning
and applying a method. Surprisingly little discussion has focused on the idea that HCI methods must themselves be
usable. This means that they must be learnable, cost-effective and also fit within design practice (one might want to be
more specific about what particular kinds of design practice are to be addressed).
Learnability of a method will be a strong determinant of take-up and use in practice. We are not aware of many studies of
the learnability of methods in HCI, although Blandford, Buckingham Shum and Young (1998) studied the practicalities
of teaching (and learning) PUM, and John and Packer (1995) investigated the learnability of Cognitive Walkthrough. In
our development of CASSM, ease of learning and ease of use were important criteria. In particular, the approach is
intentionally ‘sketchy’ with iterative deepening, so that initial insights can be achieved with minimal investment and the
analysis can be made more thorough over time if the situation demands it. Throughout the development, we have
conducted informal studies of the use of the approach, which we discuss below.
As Wixon (2003) notes, there is still little agreement on what features an HCI method should possess. In developing
CASSM, we have not been particularly concerned with productivity, reliability or thoroughness. Rather, when we started
this project, our overall objective was to develop a method that was useful and usable.
For us, the important elements of being useful were that the method should:
• fill a niche in the space of evaluation methods (concerning conceptual misfits); and
• deliver valid insights: in particular, the analysis should be grounded in real user data wherever possible.
Similarly, elements of being usable were that the method should:
• support iterative deepening of analysis to enable analysts to choose how much effort to invest;
• provide tool support to facilitate analysis;
• be easy to learn; and
• fit with established design practice without disrupting it.
Figure 1 summarises the attributes discussed above. Those in bold are the ones we have identified as our objectives in
developing CASSM; in the discussion section, we review the extent to which we have achieved these objectives.
[Figure 1 is a table, organised by stage of evaluation, summarising properties of evaluation methods; the recoverable labels include downstream utility, leading to new design possibilities, filling a niche, being quick and easy to apply, supporting iterative deepening, problem count, clear scope, cost effectiveness, fit with practice and tool support.]
Figure 1: Summary of properties of evaluation methods
CASSM: AN OVERVIEW
As noted above, we are not aware of any other usability evaluation approach that is designed to support the systematic
analysis of conceptual fit between user and system. Moran’s (1983) ETIT and the Yoked State Spaces (YSS) of Payne,
Squibb and Howes (1990) strongly informed our thinking about CASSM, as they are based on the idea of comparing user
and system representations of the same task; however, neither ETIT nor YSS was codified so that non-specialists could
apply it. Our own earlier work on PUM (Blandford & Young, 1996; Young, Green & Simon, 1989) and ERMIA (Green
& Benyon, 1996) also informed the design of CASSM, which has drawn on important representational elements of those
approaches. While PUM and ERMIA were both codified (in tutorial materials), we perceive both as demanding too much
precision and detail to apply routinely. The final approach that has heavily influenced our thinking is Green’s Cognitive
Dimensions of Notations (CDs: Green, 1989). CDs provide an informal vocabulary for describing various misfits
between user and system. Precise definitions of many CDs are now encapsulated within the Cassata analysis tool
CASSM focuses on the quality of fit between the user’s conceptual model and that implemented within a system,
encompassing a method for both data gathering and data analysis. Within evolutionary development, a recognition of
poor fit highlights opportunities for re-design (Hsi, 2004).
As highlighted by Vicente (1999), misfits between user and system can originate from users having inaccurate mental
models of the system (which should be corrected through improved system design or training) or from systems
implementing an inaccurate representation of the user’s domain of working. Some systems define their own domain; for
example, the notion of ‘links’ depends on the notion of a hypertext; for these systems, the design challenge is often to
help users to develop an appropriate mental model of the system. Then, the question is typically not whether there is a
good conceptual fit but rather how easy it is for users to acquire an appropriate mental model. In other words, a
CASSM analysis would typically focus more on the ease of acquiring a good conceptual fit, with a view to considering
how the system might be re-designed to support acquisition of an appropriate mental model, rather than re-designing to
accommodate the users’ existing mental models.
CASSM is a method and modelling representation, supported by a prototype tool, Cassata. CASSM compares the users’
concepts with the concepts implemented within the system and interface. Conceptual analysis draws out commonalities
across similar users to create the profile of a typical user of a particular type; the analyst can then assess the quality of fit
between user and system. CASSM supports reasoning about two broad classes of misfits. Surface misfits are concepts
salient to the user but partially or wholly missing from the system and, conversely, concepts imposed on the user by the
system that might be difficult to learn or work with. Structural misfits are cases where the relationships between concepts
in the user’s model do not coincide with those in the system, resulting in situations where a change to the system
representation causes user difficulties. Many structural misfits correspond to CDs such as premature commitment,
viscosity and hidden dependencies (Green & Petre, 1996; Green, 1989; Green 1990).
Surface misfits go back, although not by that name, to Norman’s (1986) analysis of whether the user of the system could
understand the designer’s conceptual model and whether the system appropriately implemented pre-existing user
concepts. The advance made by CASSM is to provide a structured methodology with which to identify them. Structural
misfits, in contrast, are hardly present in previous usability evaluation methods.
This recognition of different types of misfits between user and system is reflected in the name, CASSM, “Concept-based
Analysis of Surface and Structural Misfits”. Earlier papers on this approach (Blandford, Wong, Connell & Green, 2002;
Connell, Green & Blandford, 2003) were published under the name OSM. This paper presents a more complete account
of the method, including an account of how it can support substantive re-design (rather than the detailed usability
problems that are the focus of many evaluation methods). In the terms of Nørgaard and Hornbæk (2006), CASSM
considers both usability and utility because it supports the identification of both tasks that are more difficult than
necessary (e.g. because something that is conceptually simple demands complex work-arounds using a particular system)
and tasks that are impossible (such as the journey planning outlined above).
In the following sections, we summarize the approach to conducting a CASSM analysis and the kinds of insights the
approach affords; briefly describe Cassata, the CASSM analysis tool; and discuss informal evaluations of the approach.
We illustrate CASSM using three contrasting examples:
• a simple robotic arm interface that illustrates identification of and reasoning with surface misfits, and what can be
achieved with minimal user data;
• a digital library interface that illustrates the use of verbal protocol data; and
• a simple drawing tool that illustrates a structural misfit.
These examples have been chosen to be simple enough to discuss in detail and to illustrate different aspects of the
CASSM approach. We have completed a preliminary CASSM analysis of the journey planning / flight booking example
with which we opened this paper, using think-aloud data from four participants. However, the resulting analysis is too
complex for detailed presentation in this paper (since our purpose here is to illustrate the CASSM approach rather than
discuss flight booking systems).
We close the paper by discussing the extent to which we have achieved our objectives in the development and testing of
CASSM, and identifying future work.
CASSM: THE METHOD
The method of conducting a CASSM analysis is presented in detail in a downloadable tutorial (Blandford, Connell &
Green, 2004). Here, we summarise the ideas underpinning the approach and important stages of analysis.
The focus for conducting a CASSM analysis is on concepts. A concept may be an entity or an attribute. Very early on in
analysis, it is not necessary to distinguish between these two different types of concept, but as analysis proceeds the
distinction becomes important. An entity is usually something that can be created or deleted within the system. An
attribute is a property of an entity – usually one that can be set when the entity is initially created, or that can be
subsequently changed. Attributes are always associated with an entity, so some entities are listed simply because they
have attributes that can be changed, even if they themselves cannot be created or deleted (they are fixed).
For every concept, the analyst determines whether it is present, difficult or absent for the user, at the interface and in the
underlying system. A concept is considered to be present for the user if users naturally and easily think in terms of it.
Concepts might be difficult for the user if they are hard to learn, discover or work with: these difficulties may be of
different kinds and levels of importance; it is up to the analyst, working with user data if possible, to assess the severity
of difficulties. Concepts may be absent for the user if users show little recognition that the concept has been implemented
in the system; this may result in users not using the system as intended (e.g. missing features of it) or having great
difficulty in getting the system to work as desired. Similarly, for the interface and system, concepts may be present,
difficult or absent. Concepts that are present in one place but absent or difficult in another are potential sources of misfits
between the system and user, and might be triggers to consider re-design possibilities.
For every concept, the analyst also considers the actions that can be performed on it and whether these actions are easy,
difficult in some way or impossible.
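The concept records described above can be sketched as a small data structure. This is purely illustrative: CASSM itself prescribes no implementation, and the field names here are our own invention, chosen to mirror the method's vocabulary of entities, attributes, statuses and actions.

```python
from dataclasses import dataclass

# Status of a concept from one perspective: 'present', 'difficult' or 'absent'.
# Action difficulty: 'easy', 'difficult' or 'impossible'.

@dataclass
class Concept:
    name: str
    kind: str                       # 'entity' or 'attribute'
    user: str                       # present / difficult / absent for the user
    interface: str                  # ... at the interface
    system: str                     # ... in the underlying system
    create_or_set: str = "easy"     # create (entity) or set (attribute)
    delete_or_change: str = "easy"  # delete (entity) or change (attribute)
    notes: str = ""                 # analyst's insights, as in Cassata's notes column

# A user-salient concept that a flight-booking system does not represent:
journey = Concept("journey", "entity",
                  user="present", interface="absent", system="absent",
                  create_or_set="impossible", delete_or_change="impossible",
                  notes="users plan journeys; sites only sell flights")
```

A concept whose status differs across the three columns, like `journey` here, is exactly the kind of candidate misfit the analysis surfaces.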
Finally, the analyst might want to consider relationships between concepts. These are particularly important when the
analysis focuses on changing the state of the system, since changing one thing may change another; but there are other
examples where the relationships between concepts play a minor role and can be put aside. In this paper, we do not
discuss relationships in detail: such detail can be found elsewhere (Blandford, Green and Connell, 2005; Blandford,
Connell and Green, 2004).
CASSM does not fit into the classic mould of either an expert evaluation technique (such as Heuristic Evaluation or
Cognitive Walkthrough, which assume that the analyst understands the user’s perspective from the outset) or an
empirical evaluation approach. A CASSM analysis generally requires access to users, in order to gather their perceptions
of the domain in which they are working. It also requires access to some form of system documentation (and ideally the
running system if it exists yet).
For gathering user concepts, some form of verbal data is helpful. This might be from a think-aloud protocol (of the user
working with a current system), from Contextual Inquiry interviews (Beyer & Holtzblatt, 1998), from other kinds of
interviews (e.g. Critical Decision Method interviews (Hoffmann, Crandall & Shadbolt, 1998)) or from user-oriented
documentation – for example, describing user procedures for completing tasks. The more sources of data, the better, but
this has to be balanced against any need for speed, efficiency or the practicalities of accessing different sources. While
explicitly articulated concepts are the easiest to identify and record, tacit understanding can sometimes be inferred from
observational data. For example, the user of a hypertext who never articulates the idea of a “link” but nevertheless
navigates seamlessly between pages can generally be assumed to be comfortable with that concept.
Data sources should give information about how the system is used in its normal context of use. This is particularly
important: to return to our opening example, if the user task had been presented as “select a direct flight from London
Heathrow to San Francisco”, few (if any) misfits would have been identified: it is important that the user task is the
domain task of planning a journey rather than the device one of selecting a flight that is known to exist.
In this paper, we present examples of CASSM analyses that illustrate different data sources for the user side:
• A robotic arm example, where we did not have access to real users, so had to make do with user-oriented system
documentation and some brief video clips of the system in use. This is not ideal, but shows what is possible in limited circumstances.
• A digital library, where we collected think-aloud protocols from four library users.
• A drawing tool, where we drew on our own experience of working with the system as well as earlier empirical studies
of drawing tools (Connell et al., 2003).
For gathering underlying system concepts, some form of system description is needed. If a prototype system or full
implementation is available, that is clearly a useful source; other descriptions can include user manuals or system specifications.
• For the robotic arm, a prototype system existed, and the developer helped clarify our understanding of the design.
• For the digital library, we had access to the functioning system, but no access to documentation or developers.
• For the drawing tool, we had access to the running system and standard user documentation.
Finally, interface concepts can be gathered if the analyst has access to a working system or an interface description.
Who is the ‘user’ and what is the ‘system’?
Identifying the ‘user’ is harder than it might first appear. Notionally, the ‘user’ is a typical individual who interacts with
the rest of the system. The user description is usually the result of merging the descriptions from several individual users
who have similar perceptions of the system with which they work. In the digital library example, data from four users has
been integrated into a single coherent account.
The ‘system’ is not just a computer system, but is the ‘rest of the system’ with which the user interacts. This can make
describing the ‘interface’ challenging at times.
• In the robotic arm example, the system is the computer interface to the arm controller, and also the arm itself –
everything that mediates between user and ‘real world’.
• In the digital library example, we have analysed the ‘system’ as being the user interface and underlying search
engine, document database, etc., but not the broader environment within which the library might be located.
• The drawing tool has also been analysed as a stand-alone system.
By contrast, in an earlier analysis of an ambulance control system, in which the individual user interacts with several
pieces of technology as well as colleagues (Blandford et al., 2002), the ‘system’ was a much more complex set of
interoperating sub-systems, each of which has an interface to the user group being studied.
As with many kinds of qualitative data analysis, it is not possible to lay down clear rules on how to group users together
into a ‘typical user’ or where to draw the boundary of a ‘system’. The decisions taken should be appropriate to the
purpose of the analysis. There is no unique right answer. As discussed below, this is an aspect of expertise in being a CASSM analyst.
There are different ways of conducting a CASSM analysis, but we have found that the following generally works well.
More detail can be found in the tutorial (Blandford et al., 2004). Analysis can be terminated at the end of any phase,
depending on the rigour required; this means that the costs associated with analysis can be determined by the analyst.
User–system concepts: The first step is to extract user and system concepts from the data and simply compare them to
see how well they fit with each other. One way to conduct an analysis is to go through a transcription of users talking or
documentation, highlighting words that might represent core concepts within the user’s conceptualisation of the system
they are working with. For the system description, it is important to focus on the underlying system representation rather
than interface widgets. For example, a CASSM description of a flight booking system would focus on concepts such as
places, prices, carriers, dates and times rather than buttons and links. This first phase of analysis yields surface misfits.
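In its crudest form, this first-pass comparison amounts to set differences between the two concept lists. The sketch below is our own illustration, with invented concept lists loosely following the flight-booking example:

```python
def surface_misfits(user_concepts, system_concepts):
    """Return (missing_from_system, imposed_on_user) candidate surface misfits."""
    user_concepts, system_concepts = set(user_concepts), set(system_concepts)
    missing = user_concepts - system_concepts   # user thinks in these; system lacks them
    imposed = system_concepts - user_concepts   # system requires these; user may struggle
    return missing, imposed

user = {"journey", "destination city", "direct flight", "ground transport"}
system = {"flight", "airport", "fare class", "direct flight"}

missing, imposed = surface_misfits(user, system)
# missing: journey, destination city, ground transport
# imposed: flight, airport, fare class
```

The output is only a list of candidates: naive string matching flags "flight" as imposed even though users who talk of "direct flights" clearly hold that concept. Judging which candidates are genuine misfits remains the analyst's job.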
Entities and attributes, and the interface: The next steps are to start distinguishing between entities and attributes and to
pay attention to the interface. This step includes consideration of whether each concept is present, difficult or absent for
the user, in the underlying system and at the interface. This deeper analysis helps identify more surface misfits.
Actions: Once the interface is sufficiently well specified, the next step is to consider actions and how the user changes
the state of the system. For an entity, the actions are creating and deleting; for an attribute, they are initial setting of and
changing the value. Considering actions serves two purposes. First, actions that are hard or impossible indicate further
surface misfits. An action may be hard for various reasons: maybe it involves a long and tedious action sequence or it is
difficult for the user to discover the affordance. Impossible actions fall into two categories: those which might be
required but which the user can’t perform (these are problematic) and those aspects of a system that are fixed, and rightly
so (these are not problematic). Second, action information is used for identifying some structural misfits, as discussed below.
Structural misfits: The final step is to consider structural misfits. There are two ways to go about doing this. The textbook one is to add information about relationships to the analysis, and use the full CASSM specification to derive
structural misfits (Blandford, Green & Connell, 2005). The ‘unofficial’ one, which often works better, is to take
definitions of structural misfits (such as ‘viscosity’) and assess whether they apply to any aspect of the system as
designed, then retrospectively work out what that must mean for the CASSM model, in terms of concepts, actions and
relationships. This reverse engineering can help to highlight valuable concepts that would otherwise remain implicit.
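The 'unofficial' route can be mimicked mechanically. As a rough illustration (this is our own rendering of one dimension, not Cassata's actual check), viscosity can be crudely operationalised as: a change the user regards as a single action maps onto many separate system-level actions:

```python
def viscosity_candidates(mappings, threshold=3):
    """mappings: dict of user-level change -> list of system actions it requires.
    Flag changes whose system-level cost meets the threshold as viscosity candidates."""
    return [change for change, actions in mappings.items()
            if len(actions) >= threshold]

# Invented example for a drawing tool without grouping support:
mappings = {
    "move a composite shape": ["select each member", "move member",
                               "re-align members", "repeat for every member"],
    "change line colour": ["open palette", "pick colour"],
}
print(viscosity_candidates(mappings))  # ['move a composite shape']
```

Having flagged a candidate this way, the analyst then works backwards, as described above, to identify the concepts (here, the user's "composite shape" missing from the system) and relationships that explain the misfit.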
As noted above, analysis can stop after any of these phases; also, a thorough analysis typically involves some iteration
through the different phases as the analyst gets a deeper understanding of the system. However, not every analysis has to
be thorough to be valuable. The additional costs need to be weighed against potential benefits. As discussed above, the
concern is more with usefulness of analysis than inter-rater reliability or thoroughness.
CASSATA: THE TOOL
Cassata (Green & Blandford, 2004) is an open source tool to support CASSM analysis. It is implemented in the cross-
platform language Tcl/Tk, and has been tested on both PCs and Macs. It has supported a range of CASSM analyses of
varying levels of complexity, most of which have been made available in a library of worked examples (Green &
Blandford, 2004). Cassata supports both data entry and limited forms of data analysis (i.e. checking a system description
against definitions of various Cognitive Dimensions).
An example of a Cassata window is shown in Figure 2. Here, the main area of the grid shows entities and attributes, and
their properties. The columns denote the following information: E/A indicates whether the description is of an entity or
an attribute; the second column is the name of the entity or attribute (attributes are right-justified, and are those of the
immediately preceding entity); the following three columns indicate whether the concept is present, difficult or absent for
the user, interface and system (respectively); the next two columns describe how easy it is to create or delete an entity, or
set or change an attribute; and the final column is for the analyst to enter notes – e.g. to explain something in the analysis
or to record an insight about the system being analysed. This last column is included because much of the value of
conducting an analysis comes from the insights that emerge while doing it, rather than from inspecting the resulting
model: this is a place to capture those insights. The rows at the bottom of the table (below the obvious boundary) include
information about relationships. For example, in Figure 2 we have recorded that the user-centric concept of ‘position of
an object’, which is not available from the system, has to be deduced from (‘maps-onto’) the concept of ‘position of
gripper’, a concept which is represented within the system. In a later example, we shall illustrate some other possible relationships.
Figure 2: CASSM description of robotic arm. The upper table describes entities (left-justified) and their attributes (right-
justified in the same column). The lower table describes relationships. The contents of the table are described in Example 1.
Using a tool turned out to have several virtues, some of which we did not anticipate. First, it made the recording of an
analysis much faster. Certain columns of the analysis table can only contain a small range of entries, such as ‘present’,
‘difficult’ and ‘absent’, and Cassata allows these to be quickly selected from a click-through menu.
Second, the tool encouraged a consistent approach to analysis. During development of the method, before a tool was available, we found that each analyst subtly modified the analysis method, making it difficult to compare analyses.
Third, the tool encourages more complete analyses. This is a most important contribution. It is all too easy to forget to
analyse whether a particular attribute can be changed as well as set, or to forget to consider whether it is as easy to delete
entities as to create them.
Finally, the Cassata tool contains built-in algorithms for the detection of certain types of usability issue. Some of these
are at the level of surface misfits, such as entities that can be created but not deleted; others are at the structural level,
such as the viscosity problem illustrated below. At least one new user of Cassata declared that the built-in algorithms
were much the most important contribution.
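To make one such check concrete, it could be expressed roughly as follows. This is a minimal sketch using our own invented representation and names; Cassata's actual data format and algorithms are not reproduced here.

```python
# Illustrative CASSM-style grid: each entity records whether the concept is
# present/difficult/absent for user, interface and system, and how easily it
# can be created or deleted. All names and values here are our own examples.
entities = {
    "gripper": {"user": "present", "interface": "present", "system": "present",
                "create": "fixed", "delete": "fixed"},
    "object in world": {"user": "present", "interface": "absent", "system": "absent",
                        "create": "fixed", "delete": "fixed"},
    "annotation": {"user": "present", "interface": "present", "system": "present",
                   "create": "easy", "delete": "impossible"},
}

def creatable_but_not_deletable(entities):
    """Flag entities the user can create but cannot delete -- the kind of
    surface misfit the text says Cassata detects automatically."""
    return [name for name, e in entities.items()
            if e["create"] in ("easy", "difficult") and e["delete"] == "impossible"]

print(creatable_but_not_deletable(entities))  # -> ['annotation']
```

A real tool would of course run many such checks over the grid; the point is only that each surface misfit reduces to a simple predicate over the recorded properties.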
As discussed above, inter-rater reliability and thoroughness (requirements C and D in Figure 1) were of less concern to us
in our development than validity (requirement E) and cost effectiveness (requirement J). However, the consistency
encouraged by the tool made the scope of the method clearer (Blandford et al., forthcoming); completeness of analyses
contributed to cost effectiveness; and consistency contributed to quality (and comparability) of analyses.
EXAMPLE 1: THE INTERFACE TO A ROBOTIC ARM
Our first example considers a novel but simple device for which our only evidence of real use was six video sequences showing the device in use (the device was subsequently destroyed in a flood, making it impossible to gather more user data); however, we did have contact with the developers. This example is included because of its simplicity and because it illustrates what is possible with limited data; its small size also means that we can present all the data and ‘walk through’ the analysis in this paper. This analysis was completed as part of an exercise in comparing the scope of CASSM with that of other evaluation approaches (Blandford et al., forthcoming).
Figure 3: A State Transition Network diagram of the system.
The system was a robotic manipulator for use by wheelchair-bound people (Parsons, Warner, White & Gill, 1997). The
manipulator was to be used in a domestic context for everyday tasks such as feeding and grooming, and was developed to
prove that a sophisticated manipulator could be produced at reasonable cost: usability issues were considered informally,
if at all.
The description of the system that was used for generating the CASSM analysis was as follows. This description is user-
oriented, but was produced for us by the system developers. This is the only verbal data available about the system: the
limited video data was used to validate findings, but could not be used to generate them.
Significant concepts are discussed more fully below. Some concepts, such as “motor control commands in a special
command language to a dedicated microprocessor”, are not considered because it should not be necessary for the user to
be aware of these details: one of the challenges of analysis is that the analyst has to reflect on what users actually need to
know. While the motors and motor control commands will matter for maintenance and upgrade of the system, they
should not matter to the user in normal operation.
Some less relevant sections of the original text have been paraphrased or omitted for clarity; these are marked in square
brackets. The system is…
“[…] a robotic manipulator for use by wheelchair-bound people. The arm is intended to be used in a domestic context for everyday tasks
such as feeding and grooming […] The arm consists of eight joints, powered by motors, which can move either individual joints or the
whole arm at once, via the input devices.
The input devices interface with a Windows-based application which in turn sends motor control commands in a special command
language to a dedicated microprocessor, which actually controls the movement of the arm. [Consider the task of moving] the robotic arm
to a certain position, without making use of any pre-taught positions, as though it were to be used to turn on a light switch. […] From the
main menu of the application this covers the options move and movearm. Move allows the user to specify a particular arm joint and in
what direction it can be moved, as well as controlling its speed. Movearm allows the user to move the arm as a whole in a particular
direction. At present there is no feedback to the user other than that provided by the visual feedback of the arm’s position.
The interface […] is going to be implemented as a Windows application, using a menu format. Menu options will be selected in order to
operate the arm. There are two methods of input, which can be used concurrently or as alternatives.
The gesture input system […] allows a variety of unique gestures to form the gesture vocabulary. The gesture system is presently
implemented so that a cursor moves along underneath the menu options continuously in turn, and if the correct gesture is made when the
cursor is underneath a particular option, then that option is selected. Another gesture acts as a toggle between high and low speed of the
cursor. A final gesture is an escape option, which automatically stops the arm if the arm is moving, and returns the user to the main menu.
The voice recognition system allows direct menu option selection simply by saying the menu option out loud.”
A high-level description of the menu options and possible transitions is provided in Figure 3. This has been derived from
both the written description and discussions with the developers. The core concepts that can be extracted from this
written description are as follows:
• Moving an object to a position (e.g. a light switch to the ‘on’ position, or a brush through the hair) is a core task. The
orientation of the object might also matter, as might its speed (e.g. moving a hairbrush for grooming).
• Joints (powered by motors) – we do not distinguish between individual joints and the whole arm initially.
• Input devices (the user has to choose between these, depending on their physical abilities). These are also referred to
in the text above as “methods of input”, of which instances are the “gesture input device” and the “voice recognition system”.
• Moving the “arm to a certain position” actually means having the gripper at the end of the arm in a certain position,
and probably orientation and speed. Initially, we do not consider the case where the positions of individual joints on
the arm matter (though this could be important in a restricted space).
• There are various menus, and options within those menus, from which the user has to make a selection.
• There is a cursor, which is located on a menu option, and which moves with a particular speed.
• For each input system, the user has to remember the vocabulary (the gestures or voice commands).
• The way to move an object is to move the gripper while it is holding that object (or pressing against that object), so
there are relationships (which we call maps_onto) between the position, orientation and speed of an object and of the
gripper while it is in contact with that object.
These ideas can be captured in Cassata as shown in the “entities and attributes” column of Figure 2. The final bullet point
is represented as relationships between attributes (those of the gripper and corresponding attributes of the object in the world).
Having laid out the concepts, the next step is to consider whether they are present to the user, represented at the interface
(considering the menu interface and the actual arm as together comprising the interface), and represented in the
underlying system model. Without access to real users, it was necessary to use judgment to assign values for this. Since it
was expected that users would be trained on the system, it was judged that all items would be known to the user. In
contrast, information about objects in the world was not available to the system. Similarly, in the relationships table, the
mappings between the gripper properties and those of the object are known to the user but not represented at the interface
or in the underlying system.
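This kind of reasoning, spotting user concepts that are absent from the system and reachable only via a maps_onto relationship, can be sketched mechanically. The encoding below is our own illustration, not Cassata's internal format.

```python
# Presence of each concept for user / interface / system, as judged for the
# robotic arm (values taken from the analysis described in the text).
concepts = {
    "position of object":  {"user": "present", "interface": "absent", "system": "absent"},
    "position of gripper": {"user": "present", "interface": "present", "system": "present"},
}

# Relationships as (actor, relation, acted_on) triples.
relationships = [("position of object", "maps_onto", "position of gripper")]

def indirect_concepts(concepts, relationships):
    """Return user concepts that are absent from the system and must be
    deduced via a maps_onto relationship to a system-represented concept."""
    hits = []
    for actor, rel, acted_on in relationships:
        if (rel == "maps_onto"
                and concepts[actor]["user"] == "present"
                and concepts[actor]["system"] == "absent"
                and concepts[acted_on]["system"] == "present"):
            hits.append((actor, acted_on))
    return hits

print(indirect_concepts(concepts, relationships))
# -> [('position of object', 'position of gripper')]
```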
Moving on to consider how the state is changed, we note that the user cannot create or delete any of the entities, but that
this is not problematic (all are “fixed” in the system); we assume that the values of attributes are initially fixed too (they
have some default setting when the user starts to use the system), but that the user can change the values of attributes
(such as speed, position and orientation). Again, the judgment of whether they are likely to be easy, difficult or
impossible was based on documentation and discussions with the developers. As shown in Figure 2, we judged some
changes to be easy and others hard. The most interesting set of assessments is that pertaining to the object in the world –
namely that it can only be moved indirectly (by manipulating the gripper rather than the object itself).
From the process of conducting this sketchy analysis, some potential difficulties of using the system became apparent,
suggesting promising design changes:
• The user has to align the gripper with objects in the world (e.g. the light switch), in terms of position, speed and
orientation. The mapping from the one to the other is non-trivial, particularly if the user has limited movement. In
particular, the user may have difficulty judging how far away something is and getting the speed right on approach.
While the use of pre-taught positions reduces this difficulty for routine and predictable activities, it will be a
challenge for non-standard activities. A more sophisticated mechanism for supporting users in this detailed
manipulation would greatly improve usability of the device.
• For grabbing objects, the user has to get the orientation and openness of the gripper right. Again, specific support
should be provided for fine-tuning the relationship between the gripper and object.
• The positions and movements of most joints will usually be of no direct interest to the user except when there is some
obstacle to be circumvented. The user’s main interest is likely to be in the properties of the gripper, and therefore the
main task will involve moving the whole arm. The menu does make this task the easiest to perform (see Figure 3), but
the requirement to re-select MoveArm before every direction change means that moving the whole arm is more
complex than strictly necessary, suggesting a re-design of the menu to enable the user to specify a series of directions
one after the other.
• Both input devices pose some difficulties: of accurate voice recognition, or of timely and appropriate use of gesture.
Users need to learn the vocabularies necessary to use the device(s) of their choice – probably not something that can
be designed out without a radical design change.
• Some menu transitions are difficult. A task-oriented analysis would help to further probe this issue and propose
detailed design solutions.
• An additional point emerged semi-independently of the CASSM analysis, which is that while the user is looking at
the gripper to get its attributes right, (s)he cannot also be looking at the screen to work with gesture control. Although
this point is not explicitly represented in the Cassata model, it emerged in the course of doing the analysis through
thinking about the different entities (gripper and cursor) and how the user would divide attention between them, given
the recognized importance of the gripper position.
Other, more focused, analysis methods, such as Cognitive Walkthrough (Wharton et al., 1994), identify detailed
interaction problems – for example, a possible confusion between the terms “move” and “movearm” as the user is
making a selection at the interface – which do not easily emerge from a CASSM analysis. Conversely, issues concerning
the mapping between the device and the real world objects would be less likely to emerge from a Cognitive Walkthrough
than they have from the CASSM analysis, though it partly depends on the expertise of the analyst and their prior
experience of this kind of system. The insights obtained from applying any method depend on both what features the
method encourages the analyst to consider and the expertise of the analyst. A tool orients the mind of the analyst in a
particular direction – to be more sensitive to some issues and less sensitive to others. This issue is explored in more detail
by Blandford et al. (forthcoming), who report on a comparison of the findings from eight different analyses of this
robotic arm. That study provided video evidence to support most of the findings from the CASSM analysis – for
example, showing the arm moving in one direction, stopping, then being moved back, indicating difficulties with fine
movement. It also showed that CASSM facilitates evaluation in terms of conceptual fit, which the other methods that
were applied did not, but that other methods are better suited for reasoning about issues such as timing, task structure or
the modalities used in interaction (criterion H1 in Figure 1).
EXAMPLE 2: A DIGITAL LIBRARY SYSTEM
The first example was chosen to illustrate the early stages of CASSM analysis. Our second example is drawn from an
analysis of the ACM digital library (DL), based on observations of four library users. Our purpose in presenting extracts
from this analysis is to illustrate the use of verbal protocol data to inform CASSM analysis.
Participants were all students completing a Masters course in Human–Computer Interaction and all working on their
individual research projects at the time of the study. These participants are representative of one important class of users
who make regular use of the ACM DL. Participants were invited to search for articles relevant to their current
information needs, and to think aloud while doing so. Occasionally, the observer intervened with questions in the style of
Contextual Inquiry (Beyer & Holtzblatt, 1998) to encourage participants to talk more explicitly about their understanding
of the systems they were working with. Participants were invited to work with whatever information resources they
chose; in practice, all four chose to work mainly with the ACM DL. Audio data was recorded and transcribed, and notes
were made of key user actions. In this analysis, five categories of user concepts were identified:
• Concepts concerning the information they were looking for, relating to the topic of the search, the domain within
which they were searching, particular ideas they were interested in and particular researchers who were pertinent to their search.
• Features of the resources (the ACM DL, other digital libraries, the HCI Bibliography, Google and the Web of
Knowledge) that they used.
• Concepts from the domain of publishing, relating to journals, articles, publishers, authors, etc.
• Concepts pertaining to electronic search, such as query terms, results lists and their properties.
• Features of themselves as consumers of information, including their interests, domain knowledge, search expertise,
access rights to particular resources and research direction.
For the purpose of tracing how verbal protocol data feeds into analysis and helps identify new design possibilities, we
focus on the first of these categories: how users described what they were looking for.
Participant 1 was looking for fairly focused material:
“I was looking at something on ‘language patterns’ yesterday on Google, so I’d like to have a look a bit more on that. It
was originally thought up by Christopher Alexander, who is usually in design and architecture but I think there might
have been studies applying the work to HCI.”
He regarded this as “an unspecified topic”.
This participant is talking not just about a topic (how pattern languages can be used in HCI), but also about a particular
idea (pattern language), the researcher who originated that idea (Christopher Alexander) and domains of research
(design, architecture, HCI). He reinforced some of this understanding, saying:
“Christopher Alexander is the guy that has been attributed to coming up with the idea.”
And “I’d go to the ACM because it’s an HCI topic”.
He illustrated his understanding of the cross-referencing of material between texts (that the work of one researcher would
be referred to in a work by another), saying that he “came across it by accident in a book by Heath and Luff.”
He also recognized that there were different ways of expressing this topic as a search query. For example:
“I know the topic I want is either ‘language patterns’ or ‘pattern languages’.”
Participant 2 also started with not just a topic (focus groups) but the view that the material of interest would pertain to
a particular domain (HCI):
“I was thinking of looking for how focus groups are used in HCI in terms of evaluating initial designs and stuff like that.”
She recognized that she was uncertain about how to describe this topic to the search engine:
“I’d put in “focus groups” using quotes so that, well I’m guessing that it deals with it as one word and, I’m not sure. I’ll
put a plus. I’m never quite sure how the search, I wouldn’t say random, but you seem to get funny stuff back from the
ACM in terms of results.”
She also recognized that there was material returned in response to a search that was of more or less interest to her:
“I’d just go through some of this stuff and see what I find interesting. Year 2000 focus groups. I’m thinking that’s about
the year 2000 bug, so it won’t be that relevant.”
Participant 3 was also concerned with the interestingness of particular topics:
“I’m specifically interested in designing interactive voice systems, so I wouldn’t be interested in a lot of these titles.”
He had some difficulty identifying suitable search terms and, like participants 1 and 2, was concerned about finding
articles in the domain of HCI:
“I’ll change it to ‘voice response usability’ because ‘usability’ is fairly specific compared to ‘design’, which could be
technology, or architecture: it could be in a different domain. And ACM’s big, it’s got all the different disciplines within it, so I’m just trying to focus in on usability related stuff.”
Participant 4 expressed an understanding of broader and narrower topics as a focus for search:
“I’m going to look for something on qualitative data analysis as a broad thing and see if I can get something to do with
dialogue analysis or something to do with conversation.”
Participant 4 echoed themes from earlier participants concerning topics and usefulness (which, for the purposes of this
analysis we assume is the same as interestingness: it is unlikely that differentiating between these concepts will be useful
for the analysis):
“This one says something about dialogue structure and coding schemes and although it’s not really to do with my
topic, it might be useful.”
She also indicated that she was exploring how to describe the interesting topic in search terms:
“I’m going to go back to my original search list and put in ‘dialogue’ and ‘coding’ because I hadn’t thought about
looking for that term.”
Figure 4. Concepts relating to information sought. [Entity/attribute rows in the table include “features in domains”, “works in domain” and “ref’d in (other) paper”.]
These extracts show how the same concepts are expressed (albeit in slightly different ways) by multiple participants,
giving assurance that they are significant concepts from a user perspective. These concepts are summarized in Figure 4.
This data table has been exported from Cassata.
The concepts that are present for the user emerge directly from the analysis. To form the assessments relating to system
and interface, it was necessary to look at the ACM DL and its interface. For example, there was no direct representation
of topics, specific ideas or domains in the underlying system, so these are listed as being absent from the system
representation. However, by reading the interface representation, such as the titles or abstracts of papers, the user can
(probably) infer the topic, ideas and domain, so they have been labelled as ‘difficult’ in Figure 4. Nevertheless, most attributes of these entities are absent from the interface. The user is forced to find work-arounds for expressing
these concepts. If we consider how they might be represented in the DL (e.g. exploiting existing meta-data), we realise
that these ideas are similar (but not identical) to the ‘general terms’ and ‘index terms’ in the ACM classification system.
We could imagine an implementation of the search facility that allowed the user to easily prioritize (for example) articles
that included the general term “Human Factors”. The current implementation of the ACM DL hides this possibility, and
none of our participants discovered it. Participant 3 expressed this succinctly:
“ACM’s big, it’s got all the different disciplines within it, so I’m just trying to focus in on usability related stuff.”
This indicates a promising design change: to make it easier for users to focus a search by particular general terms or
classifiers within the ACM classification system (or, indeed, to be able to browse by classifier).
One concept that the DL represents comparatively well is that of a researcher (or at least, an author). Indeed, although
information about what domain an individual works in and what ideas they have originated is poorly represented, other
information about them – notably people they have co-authored with – is clearly represented through a link to
“collaborative colleagues”. Although none of the participants in this study referred to the idea that an individual might
collaborate with others, or hinted that that might be an interesting or important item of information, this information may
be of use to other user groups of the ACM DL (this is an empirical question). In this case, the representation of a concept
(who collaborates with whom) within the DL does not represent a difficulty to the user, because the use of this feature is optional.
In this example, we have shown how an analysis of verbal data and the existing DL implementation can highlight both
strengths and weaknesses of the conceptual model underlying the current system design, and identify new design
possibilities that – in this case – would represent incremental improvements to the interaction (criterion A2 in Figure 1).
EXAMPLE 3: A DRAWING TOOL
The two examples discussed so far have only considered surface misfits. Our final example focuses directly on structural
misfits in a system where a central issue is change of state and its consequences. These misfits may exist in any system
that permits users to design (or construct) new representations. Examples include word processors (which allow users to
create and change documents), drawing programs, music editors, HTML editors and programming environments.
Structural misfits exist if changes that are conceptually simple for the user are difficult or unwieldy to perform using a
particular system. Systems that enable the user to engage in a design activity are typically more complex than our first
two examples, so this analysis has a different flavour from the two above: it omits the initial analysis of surface misfits
and focuses directly on the more detailed analysis required to reason about structural misfits.
This example is taken from a study of a drawing tool. Surface misfits included users having difficulty learning about the
‘handles’ used to resize objects and the ‘layers’ that objects reside on. However, we focus on the difficulties of making
changes to a drawing once it has been constructed. For example, adding a child onto a family tree may require many
individual actions as illustrated in Figure 5: Freda, Graham and Maria all need to be moved (each by a different distance),
as do the lines connecting them to their parents; the horizontal lines need to be re-sized; and Helen and her line need to
be inserted. This particular example illustrates the CD of viscosity (Green, 1990), or resistance to change. Green (1990)
distinguished two types of viscosity, repetition and knock-on. An informal example of repetition viscosity is going
through a large batch of images by hand and changing the resolution of each one, without the benefit of any batch-
processing macro; the user has to repeat the same action many times. An informal example of knock-on viscosity is
changing the size of an image in a paper and then having to inspect the rest of the paper to check for any bad page breaks
that result, and correct them, plus perhaps re-computing the table of contents if the pagination has changed; here the user
has to clean up the mess caused by the first action. (In North America, a better-known phrase for the same concept
appears to be ‘domino effect’.)
Figure 5. A simple conceptual action requires many device actions (dashed lines are those that needed to be moved or resized;
names in italics had to be moved or inserted).
More formally, repetition viscosity occurs when what the user sees as a single action within his or her conceptual model
cannot be performed as a single device action, but instead requires repetitive actions. This has been formally defined (and
implemented in Cassata) as follows. Changing attribute P of entity A, A(P), needs many actions if:
A(P) is not directly modifiable
B(Q) affects A(P)
B(Q) is modifiable
A consists-of B
In the case of a family tree, the position of the family (as it appears on the printed page) cannot be directly changed
(except if the whole family is selected and moved as a block, which cannot be done when new members are being added
or there are space constraints). The position of the family is defined by the positions of all its members, which can be
individually changed. In CASSM terms:
family(position) is not directly modifiable
member(position) affects family(position)
member(position) is modifiable
family consists-of member
In this case, the intuitive idea of ‘repetition viscosity’ is easier to work with than its formal representation, but the
exercise of working out why such a drawing tool is viscous (for tasks such as preparing a family tree) has highlighted
some of the important concepts (families, their members and their relative positions) that pertain in this case.
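The four conditions of the repetition-viscosity test can be checked mechanically over such a model. The sketch below is our own simplified rendering of the textual definition, applied to the family-tree case; Cassata's actual implementation may differ.

```python
# Repetition-viscosity test, following the definition in the text:
# A(P) is viscous if A(P) is not directly modifiable, B(Q) affects A(P),
# B(Q) is modifiable, and A consists-of B. Attributes are (entity, attribute)
# pairs; all data below encodes the family-tree example.
modifiable = {("family", "position"): False, ("member", "position"): True}
affects = [(("member", "position"), ("family", "position"))]  # B(Q) affects A(P)
consists_of = [("family", "member")]                          # A consists-of B

def repetition_viscosity(modifiable, affects, consists_of):
    """Return (A(P), B(Q)) pairs where changing A(P) requires many
    repetitive device actions on instances of B(Q)."""
    hits = []
    for bq, ap in affects:
        if (not modifiable.get(ap, False)      # A(P) not directly modifiable
                and modifiable.get(bq, False)  # B(Q) is modifiable
                and (ap[0], bq[0]) in consists_of):  # A consists-of B
            hits.append((ap, bq))
    return hits

print(repetition_viscosity(modifiable, affects, consists_of))
# -> [(('family', 'position'), ('member', 'position'))]
```

That is: the family's position can only be changed by moving each member individually, which is exactly the misfit illustrated in Figure 5.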
The second type of viscosity is knock-on viscosity: the property that changing one attribute may lead to the need to
adjust other things to restore internal consistency. Changing A(P) has possible knock-on if:
A(P) is modifiable
A(P) affects B(Q)
there is a goal constraint on B(Q)
In the case of a family tree, we have intuitively the idea that the structure of a family is immutable (the order in which
children arrive cannot be changed!), that the visually represented position of a person in the family tree reflects the
family order, and that the family order as displayed should be the same as the family order ‘in the world’:
member(position) is modifiable
member(position) affects family(order)
there is a goal constraint on family(order)
Again, the informal understanding of viscosity seems to be easier to work with than the formal one, but the act of
formalising highlights the fact that this example suffers from both kinds of viscosity, and highlights some of the
important user concepts (position in family, etc.) that might otherwise remain implicit. Figures 6 and 7 show
(respectively) the description of this example in Cassata and the output of automated analysis of this example. The
relationships shown in Figure 6 are represented in a standardised format (‘actor’, ‘relationship’, ‘acted-on’) to express
the ideas outlined above. As shown in Figure 7, the automated analysis identifies an additional, possibly spurious, case of
repetition viscosity because the test does not include a check of whether it is reasonable to change the family order.
Figure 6. Cassata description of the family tree example.
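The knock-on viscosity test can likewise be rendered as a mechanical check. Again, this is our own simplified sketch of the definition given in the text, encoding the family-tree case, not Cassata's actual implementation.

```python
# Knock-on viscosity test, following the definition in the text:
# changing A(P) has possible knock-on if A(P) is modifiable, A(P) affects
# B(Q), and there is a goal constraint on B(Q). Data encodes the family tree.
modifiable = {("member", "position"): True}
affects = [(("member", "position"), ("family", "order"))]  # A(P) affects B(Q)
goal_constraints = {("family", "order")}  # displayed order must match the world

def knock_on_viscosity(modifiable, affects, goal_constraints):
    """Return A(P) attributes whose change may force clean-up work
    elsewhere to restore a goal-constrained B(Q)."""
    return [ap for ap, bq in affects
            if modifiable.get(ap, False) and bq in goal_constraints]

print(knock_on_viscosity(modifiable, affects, goal_constraints))
# -> [('member', 'position')]
```

Moving one member is easy, but because member positions determine the displayed family order, which is constrained to mirror the real family, each move may trigger further corrective moves.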