The Usability Metric for User Experience
Kraig Finstad
Intel® Corporation, 2501 NW 229th Ave., M/S RA1-222, Hillsboro, OR 97124, United States
E-mail address: kraig.a.finstad@intel.com
Interacting with Computers 22 (2010) 323–327, doi:10.1016/j.intcom.2010.04.004
article info
Article history:
Received 21 September 2009
Accepted 6 April 2010
Available online 6 May 2010
Keywords:
Usability
User experience
Scale
Metric
abstract
The Usability Metric for User Experience (UMUX) is a four-item Likert scale used for the subjective assess-
ment of an application’s perceived usability. It is designed to provide results similar to those obtained
with the 10-item System Usability Scale, and is organized around the ISO 9241-11 definition of usability.
A pilot version was assembled from candidate items, which was then tested alongside the System Usabil-
ity Scale during usability testing. It was shown that the two scales correlate well, are reliable, and both
align on one underlying usability factor. In addition, the Usability Metric for User Experience is compact
enough to serve as a usability module in a broader user experience metric.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Measuring and tracking usability is an ongoing challenge for
organizations that are concerned with improving user experience.
A popular and cost-effective approach to usability measurement is
the use of standardized surveys. When the Information Technology
(IT) department at Intel® decided to standardize on a usability
inventory, it selected the System Usability Scale (SUS). The SUS is
a 10-item, five-point Likert scale with a weighted scoring range
of 0–100 that has been shown to be a reliable measure of
usability. It is anchored with one as Strongly Disagree and five as
Strongly Agree. According to Holyer (1993), it correlates at 0.86
with the 50-item Software Usability Measurement Inventory (Kira-
kowski et al., 1992). Tullis and Stetson (2004) found the SUS to out-
perform the Questionnaire for User Interface Satisfaction (Chin
et al., 1988) and the Computer System Usability Questionnaire
(Lewis, 1995) at assessing website usability. The SUS was adopted
as a standard usability measure because of these performance
characteristics, in addition to being free and relatively compact.
It proved to be easy for project teams to understand, but several is-
sues emerged. As IT at Intel® began to pursue a more comprehen-
sive approach to user experience, the SUS was originally
considered as a usability module for a broader index
of user experience, where user experience is defined as a
lifecycle consisting of: Marketing and Brand Awareness, Acquisi-
tion and Installation, Product or Service Use, Product Support,
and Removal/End of Life (Sward and Macarthur, 2007). However,
it became apparent that simply adapting the SUS to work as a
Product Use component was not feasible. Early trials with internal
project teams showed that a 10-item Product Use module would
be too large when other elements such as Product Support were
factored in and required their own additional scales. The concept
of user experience covers a lot of ground: any Product Use or
usability component of a larger user experience index would have
to be much more compact than 10 items. Also, in its original form,
the SUS did not lend itself well to electronic distribution in a global
environment due to non-native English speakers not understand-
ing the word “cumbersome” in SUS Item 8 (Finstad, 2006), and it
used a five-point Likert scale which has been shown to be inade-
quate in many cases. Diefenbach et al. (1993) found that seven-
point scales outperformed five-point scales in reliability, accuracy,
and ease of use, while Cox’s (1980) review of Likert scales found
the optimal number of alternatives to be seven. Finstad (in press)
found that respondents were more likely to provide non-integer
interpolations (e.g., saying “three and a half” instead of “three”
or “four”) in the five-point SUS than in a seven-point alternate ver-
sion of the same instrument. These interpolations indicate a mis-
match between the scale and a user’s actual evaluation. From a
more theoretical standpoint, the SUS items did not map well onto
the concepts that comprise usability according to ISO 9241-11
(1998), namely effectiveness, efficiency, and satisfaction. These
mappings are important because the SUS is not a diagnostic tool;
it can indicate whether there is a problem with a system’s usabil-
ity but not what those problems actually are. It is often used as a
starting point in usability efforts, but an alignment with known
usability factors can provide a stronger foundation for user experi-
ence efforts.
These issues with the SUS motivated a research program aimed
at developing a replacement. The goal was to provide an inventory
that was substantially shorter than the SUS and therefore appropri-
ate as the usability component of a larger user experience index.
An early attempt at item set reduction aimed to leverage a single
ease of use item from the SUS. A SUS survey with 43 responses was
conducted on an enterprise portal product. It was found that Item 3
in the SUS, “I thought [the system] was easy to use,” correlated with
the final SUS score at r = 0.89, p < 0.01; the strongest correlation in
the set of 10 items. This result was not surprising in light of recent
findings. Sauro and Dumas (2009) have demonstrated the utility of
a general ease of use Likert item, and have also shown a promising
alternative in the Subjective Mental Effort Questionnaire (SMEQ).
Tedesco and Tullis (2006) found that a single “Overall this task
was: Very Difficult ... Very Easy” Likert item correlated significantly
with usability test performance. This direction motivated further
analysis of SUS surveys and SUS Item 3 with other systems, but
no consistent pattern emerged. In some cases SUS Item 3 corre-
lated most strongly with the final SUS score, and in others it did
not. The idea of reducing an instrument to one general ease of
use item was abandoned. Instead, a new direction was taken:
the development of a concise scale that would more closely con-
form to the ISO 9241-11 (1998) definition of usability, would
minimize bias and language issues, and would still perform as well
as the baseline it was intended to replace. In this case the baseline
was the updated, internationally appropriate SUS (with “cumber-
some” clarified as “awkward”), and the performance goal was set
at a correlation of 0.80 or better between the total SUS score and
the total score of the new scale. The resulting instrument is the Usability Metric
for User Experience (UMUX), and this paper outlines the research
and development of this usability component of a more general
user experience measurement model.
2. Pilot study
A pilot study was developed to explore these possibilities. The
end goal of the pilot study was the determination of how candidate
Likert items would fare in an analysis of actual responses to items.
2.1. Method
2.1.1. Participants
A total of 42 Intel® employees were recruited as part of a larger
worldwide usability test. As a control for cultural and language fac-
tors in both the usability task and the candidate Likert items, par-
ticipants were recruited worldwide. Users from the United States,
Germany, Ireland, the Netherlands, China, the Philippines, Malay-
sia, and Israel participated in this study.
2.1.2. Materials
A pool of candidate Likert items related to the ISO 9241-11
(1998) definition of usability was developed. A total of
12 such items were developed, four each for effectiveness, effi-
ciency, and satisfaction. Some were intentionally generic, while
others were behavior-based (e.g., “I don’t make many errors with
this system”) or emotion-based (e.g., “I would prefer to use some-
thing other than this system”). These candidate items used a five-
point scale so they could be administered alongside the SUS in an actual
post-deployment usability survey. Also like the SUS, they used an
alternating positive/negative keying to control for acquiescence
bias. These 12 candidate items and their usability factors are listed
in Table 1.
2.1.3. Design and procedure
Participants first engaged in a usability test of an enterprise
software prototype involving the selection of contract workers
and adding them to a database. After completing the usability test,
participants received a modified version of the SUS. The first three
items were candidate items, followed by the SUS, which was then
followed by three more candidate items. This format presented the
SUS as an intact instrument in order to achieve a valid final score
for analysis. Each participant therefore responded to six candidate
items, two per usability component (effectiveness, efficiency, and
satisfaction), in addition to the SUS. This allowed a direct per-par-
ticipant comparison of candidate item responses with a final SUS
score. Presentation of candidate items was counterbalanced across
participants. Response to the Likert items was verbal, with the en-
tire items read aloud to help ensure comprehension of the scale.
The facilitator recorded responses manually. After completion of
the composite survey, participants were thanked for their time, de-
briefed, and excused.
2.2. Results
2.2.1. Item correlations
The odd items in the SUS were scored as [score - 1], and the
even items were scored as [5 - score]. This aligned all scores in
one direction, removing the positive/negative keying of the lan-
guage in the instrument. It also allowed zeroes at the bottom of
the range. The ten rescored items were summed and multiplied
by 2.5, providing a range of 0–100 (Brooke, 1996). The critical mea-
sure of this study was the correlation of UMUX candidate items
(scored similarly to the SUS) with the final SUS score. A high corre-
lation coefficient indicated that the candidate item was in line with
the total SUS score, regardless of direction. That is, a good candi-
date item would correlate highly with the SUS regardless of
whether the SUS itself was indicating good or poor usability. This
is a different approach from that used in developing the original
SUS, which selected candidates based on their tendencies toward
extreme (non-neutral) responses (Brooke, 1996). The UMUX is in-
tended to match the performance of the SUS, so alignment with
existing measures is more important.
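For concreteness, the SUS recoding just described can be expressed in a few lines of Python. The sketch below is illustrative only; the function name and example response set are hypothetical, not part of the original study.

```python
# SUS recoding as described above: odd-numbered items are scored as
# (score - 1), even-numbered items as (5 - score); the ten recoded items
# are summed and multiplied by 2.5 for a 0-100 range (Brooke, 1996).
def sus_score(responses):
    """responses: the ten raw item scores (1-5), in questionnaire order."""
    if len(responses) != 10:
        raise ValueError("The SUS has exactly 10 items")
    recoded = [(r - 1) if i % 2 == 0 else (5 - r)
               for i, r in enumerate(responses)]  # i = 0 is Item 1 (odd)
    return 2.5 * sum(recoded)

# A respondent who strongly agrees with every positively keyed item and
# strongly disagrees with every negatively keyed one scores 100.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # -> 100.0
```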
As the UMUX was being designed to reflect the ISO 9241-11
(1998) definition of usability with as few items as possible, the
highest-correlating candidate items for each usability component
were chosen for further study. Table 2 below summarizes these
results.

Table 2
Items having highest correlation with overall SUS score (pilot study).

  Usability component  Candidate UMUX item                                                   r
  Efficiency           I have to spend a lot of time correcting things with [this system].   -0.48*
  Effectiveness        [This system’s] capabilities would not meet my requirements.          -0.50*
  Satisfaction         Using [this system] was a frustrating experience.                     -0.76*

  Note: Bracketed text is custom-replaced by relevant system. * p < 0.05.
All the correlations in this table were negative due to the nega-
tive keying of the candidates; for instance, if the application was
usable then the participants disagreed on the item. The more gen-
eral items with language like “I am satisfied ...” tended to correlate
poorly. As a point of comparison, the correlations of the items in
the SUS to the SUS score itself varied from r = 0.36 to r = 0.78.
No participants required assistance with the terminology or
phrasing of the UMUX candidate items. This was taken as evidence
that the items were appropriate for an international English-speaking
audience.

Table 1
Candidate items used (pilot study).

  Usability component  Candidate item
  Efficiency           [This system] saves me time.
                       I tend to make a lot of mistakes with [this system].
                       I don’t make many errors with [this system].
                       I have to spend a lot of time correcting things with [this system].
  Effectiveness        [This system] allows me to accomplish my tasks.
                       I think I would need a system with more features for my tasks.
                       I would not need to supplement [this system] with an additional one.
                       [This system’s] capabilities would not meet my requirements.
  Satisfaction         I am satisfied with [this system].
                       I would prefer to use something other than [this system].
                       Given a choice, I would choose [this system] over others.
                       Using [this system] was a frustrating experience.

  Note: Bracketed text is custom-replaced by relevant system.
2.2.2. Analysis of preliminary instrument
These results motivated an analysis to determine how the best
candidate items would perform if they comprised an actual instru-
ment that yielded a SUS-like usability score. If candidate item data
from the pilot study could produce a result comparable to the SUS,
those items would be subjected to a wider scale validation study
with new participants. The preliminary UMUX comprised the
highest-correlating candidate item for each ISO 9241-11 (1998)
usability factor shown in Table 2, plus the overall ease of use
item from the SUS (“I thought the system was easy to use”), which
had been shown earlier to be promising as a general question with
r = 0.89, p < 0.01 (see Section 1).
Data for the analysis consisted of the 21 response sets from the
pilot study that included the candidate items. These four candidate
items from a five-point scale were used with a 2.5 multiplier, pro-
viding a score range of 0–40 (compared to 0–100 for the SUS). The
preliminary UMUX attained a mean score of 24 out of 40, and the
SUS for the same participants attained a mean score of 60 out of
100. Both of these scores were 60% of their respective maximums.
The preliminary UMUX correlated with the SUS at r = 0.81, p < 0.01.
2.3. Discussion
The pilot study identified the three most promising candidates
to be included in a measurement instrument along with an addi-
tional ease of use item. The results for the candidate items were
in line with correlations achieved by the SUS itself. When com-
bined into a preliminary user experience inventory, the four candi-
date items met the research program’s goal of a correlation higher
than 0.80 with the SUS.
3. Survey study
The next step was to design an experiment directly comparing
the SUS with the new UMUX instrument.
3.1. Method
3.1.1. Participants
Participants consisted of two groups of users of enterprise soft-
ware at Intel®. System 1 was a contract worker enterprise applica-
tion that had been rated as having poor usability, and System 2 was
an audio-conferencing application that had been rated as having
good usability. Valid responses received from survey requests re-
sulted in System 1 with n = 273 and System 2 with n = 285.
3.1.2. Materials
Some minor changes were made to the candidate items when building
the experimental UMUX, to better balance the positive/negative
keying and to clear up some potential confusion with the Efficiency
item. For comparison with the original items in Table 2, see the
completed UMUX in Table 3.

Table 3
Usability components and scale items (survey study).

  Usability component  Candidate UMUX item
  Effectiveness        [This system’s] capabilities meet my requirements.
  Satisfaction         Using [this system] is a frustrating experience.
  Overall              [This system] is easy to use.
  Efficiency           I have to spend too much time correcting things with [this system].

  Note: Bracketed text is custom-replaced by relevant system.
The UMUX used in this survey study was a seven-point Likert
scale, anchored with one as Strongly Disagree and seven as
Strongly Agree. Like the SUS, all other response options were num-
bered but otherwise unlabeled. This move to a seven-point scale
gave the UMUX an initial range of 0–60, after applying the 2.5 mul-
tiplier from the SUS. These UMUX scores could be presented as a
percentage of the maximum (60) to provide a final range compara-
ble to that found in the SUS (0–100). The SUS was also modified as
per Finstad (2006), clarifying ‘‘cumbersome” as ‘‘awkward”.
3.1.3. Design and procedure
This study used a between-subjects design, with participants
using one of two systems (having poor usability or good usability)
and then responding to both the UMUX and the SUS. Presentation
of the instruments was counterbalanced so that half the participants
responded to the UMUX first, and the other half responded to the SUS
first. These instruments were administered electronically through a
combination of email invitation and online survey tool.
3.2. Results
3.2.1. Principal components
A common first step in validating instruments is prin-
cipal components analysis (Tabachnik and Fidell, 1989). The results
from the initial principal component extraction are shown below
in Table 4.

Table 4
Principal components (survey study).

  Principal component  Eigenvalue  Percent of variance explained
  1                    3.37        84.37
  2                    0.31         7.83
  3                    0.20         4.88
  4                    0.12         2.92
The strength of the first principal component led to the conclu-
sion that UMUX items were aligning along one usability compo-
nent. This perspective is supported by the scree plot of the
components, shown in Fig. 1 below.

Fig. 1. Scree plot of principal components (survey study).
Tabachnik and Fidell (1989) recommend the point where the
scree plot line changes direction as a determinant of the number
of components; this plot’s direction drops off dramatically after
the first component. This is strong evidence for the scale measuring
one “usability” component. Because no secondary components
emerged from the analysis, no attempts at further extractions or
rotations were performed. The SUS provided a similar one-compo-
nent extraction, with no additional elements emerging. For a more
thorough treatment of factoring in the SUS, see Lewis and Sauro
(2009), who found evidence that the SUS may be comprised of
two factors (usability and learnability). The conclusion from this
analysis is that both instruments are unidimensional and align
on just one component (usability) rather than several.
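As an illustration of the kind of extraction reported in Table 4, the sketch below computes eigenvalues and percentages of variance from a respondents-by-items matrix using NumPy. It is not the study’s analysis code, and the response matrix is randomly generated placeholder data, so it will not reproduce Table 4’s values.

```python
import numpy as np

# Hypothetical (participants x 4 items) response matrix; placeholder only.
rng = np.random.default_rng(0)
umux_items = rng.integers(1, 8, size=(558, 4)).astype(float)

corr = np.corrcoef(umux_items, rowvar=False)   # 4 x 4 item correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted, largest first
pct_variance = 100 * eigenvalues / eigenvalues.sum()

for k, (ev, pct) in enumerate(zip(eigenvalues, pct_variance), start=1):
    print(f"Component {k}: eigenvalue {ev:.2f}, {pct:.2f}% of variance")
# A scree plot of `eigenvalues` that drops sharply after the first
# component, as in Fig. 1, indicates a single underlying factor.
```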
3.2.2. Reliability
Instruments need to measure an underlying construct consis-
tently. At the early stages of a metric’s development, one way to
establish this is through reliability estimation. Cronbach’s alpha
is an internal-consistency coefficient that indicates how well a
factor is being measured. The rule of thumb is that a coefficient
above 0.70 indicates a high de-
gree of internal reliability. Instruments farther along in their
development are subjected to more longitudinal reliability mea-
sures. The Cronbach’s alpha for both instruments indicated high
reliability: 0.94 for the UMUX and 0.97 for the SUS. Therefore, both
instruments were reliable.
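For reference, the sketch below computes Cronbach’s alpha from a respondents-by-items array using the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The function name is a hypothetical helper, not code from the study.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents x k_items) array of recoded item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```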
3.2.3. Validity and sensitivity
The overall correlation of UMUX with the SUS, across both sys-
tem conditions, was r = 0.96, p < 0.001. These results exceed the
goal criterion of r > 0.80, providing evidence of validity. T-tests
demonstrated that System 2 was more usable than System 1:
t(533) = 39.04, r = 0.86, p < 0.01 for UMUX, and t(556) = 44.47,
r = 0.89, p < 0.01 for SUS, thereby providing evidence for sensitivity.
The breakdown of usability inventory scores and correlations is
shown in Table 5.

Table 5
Means, standard deviations (in parentheses), and correlation (survey study).

  System    UMUX (0–100)   SUS (0–100)    r
  System 1  27.66 (20.54)  28.77 (18.19)  0.84*
  System 2  87.91 (15.98)  88.39 (13.18)  0.81*

  * p < 0.001.
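The form of these checks can be reproduced on per-participant score arrays with SciPy, as in the sketch below. The score arrays are synthetic placeholders (the study’s raw data are not reproduced here), so the printed values are illustrative only.

```python
import numpy as np
from scipy import stats

# Placeholder score arrays standing in for the real per-participant data.
rng = np.random.default_rng(1)
umux_sys1 = np.clip(rng.normal(28, 20, 273), 0, 100)
umux_sys2 = np.clip(rng.normal(88, 16, 285), 0, 100)
sus_sys1 = np.clip(umux_sys1 + rng.normal(0, 8, 273), 0, 100)
sus_sys2 = np.clip(umux_sys2 + rng.normal(0, 8, 285), 0, 100)

# Validity: correlation of UMUX with SUS across both systems (goal: r > 0.80).
r, p = stats.pearsonr(np.concatenate([umux_sys1, umux_sys2]),
                      np.concatenate([sus_sys1, sus_sys2]))

# Sensitivity: does the instrument separate the good system from the poor one?
t, p_t = stats.ttest_ind(umux_sys2, umux_sys1)

print(f"validity: r = {r:.2f} (p = {p:.3g}); "
      f"sensitivity: t = {t:.2f} (p = {p_t:.3g})")
```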
3.2.4. Item correlations
After the UMUX had been developed and finalized, the perfor-
mance of its individual items was examined in two applied situa-
tions. All final UMUX items were analyzed for their contribution
to the overall UMUX score, both as a post-usability test question-
naire (n = 45) and in the first seven internal usability projects com-
pleted with the new scale as a standard instrument (n = 272). The
results shown in Table 6 demonstrate significant item-total corre-
lations. It was therefore concluded that all UMUX items were valid
contributors to the overall score.

Table 6
Correlations of UMUX items with overall score (survey study).

  Scale item                                                               Post-test r  Surveys r
  1. [This system’s] capabilities meet my requirements.                    0.78*        0.85*
  2. Using [this system] is a frustrating experience.                      0.76*        0.89*
  3. [This system] is easy to use.                                         0.76*        0.87*
  4. I have to spend too much time correcting things with [this system].   0.69*        0.81*

  * p < 0.05.
4. Discussion
4.1. Implementation
The UMUX can be administered electronically as a survey, or as
a follow-up in usability testing. It is simple to administer, as it re-
quires no branching or reordering of items. The UMUX is imple-
mented as shown below, where bracketed text is custom-
replaced by the relevant system.
1. [This system’s] capabilities meet my requirements.
   Strongly Disagree  1  2  3  4  5  6  7  Strongly Agree
2. Using [this system] is a frustrating experience.
   Strongly Disagree  1  2  3  4  5  6  7  Strongly Agree
3. [This system] is easy to use.
   Strongly Disagree  1  2  3  4  5  6  7  Strongly Agree
4. I have to spend too much time correcting things with [this system].
   Strongly Disagree  1  2  3  4  5  6  7  Strongly Agree
4.2. Analysis
Once data are collected, they need to be properly recoded, with
a method that borrows from the SUS. Odd items are scored as
[score - 1], and even items are scored as [7 - score]. As with the
SUS, this removes the positive/negative keying of the items and al-
lows a minimum score of zero. Each individual UMUX item has a
range of 0–6 after recoding, giving the entire four-item scale a
preliminary maximum of 24. To achieve parity with the 0–100
range provided by the SUS, a participant’s UMUX score is the
sum of the four items divided by 24, and then multiplied by 100.
This calculation replaces the earlier methodology of weighting
items by a 2.5 multiplier. These scores across participants are then
averaged to find a mean UMUX score. It is this mean score and its
confidence interval that become the application’s UMUX metrics
for a system’s usability tracking and goal-setting.
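A minimal sketch of this calculation, assuming a hypothetical helper name (the paper specifies only the arithmetic):

```python
# UMUX scoring as described above: odd items recoded as (score - 1), even
# items as (7 - score); each recoded item ranges 0-6, and the four-item sum
# is divided by 24 and multiplied by 100 for a 0-100, SUS-comparable scale.
def umux_score(responses):
    """responses: the four raw item scores (1-7), in questionnaire order."""
    if len(responses) != 4:
        raise ValueError("The UMUX has exactly 4 items")
    recoded = [(r - 1) if i % 2 == 0 else (7 - r)
               for i, r in enumerate(responses)]  # i = 0 is Item 1 (odd)
    return 100 * sum(recoded) / 24

# The per-system metric is the mean (with its confidence interval) across
# participants' individual scores.
scores = [umux_score(r) for r in [(7, 1, 7, 1), (6, 2, 5, 3)]]
print(sum(scores) / len(scores))  # -> (100.0 + 75.0) / 2 = 87.5
```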
4.3. Limitations
The UMUX, like the SUS, provides a subjective evaluation of a
system’s usability. Its scoring has yet to be compared to objective
metrics, such as error rates and task timings, in a full experiment.
Additionally, as it is currently the first module in a planned series
of user experience measures, it only measures usability.
As the UMUX consists of only four Likert items, it has fewer to-
tal data points available to respondents than the SUS, although the
move to a seven-point scale does provide some mitigation. It has
four seven-point items for a total of 28 data points, while the
SUS has ten five-point items for a total of 50 data points. By com-
parison, a singular ease of use item like that used in Tedesco and
Tullis (2006) may have only five data points. Reducing the total
information capacity of a survey effort can result in a less sensitive
measure. Once validity and reliability are established, there is still
a potential risk of application beyond the metric’s scope. For exam-
ple, a simple ease of use item may do an exemplary job of measur-
ing ease of use, but a particular user experience professional needs
to determine whether that information is sufficient as a usability
metric.
4.4. Conclusion
It can be concluded that the Usability Metric for User Expe-
rience is a reliable, valid, and sensitive alternative to the Sys-
tem Usability Scale. It correlates with the SUS at a rate higher
than 0.80, its items align on one usability factor, and it is fully
capable as a standalone subjective usability metric. It is also
aligned with a fundamental lesson for the user experience com-
munity: in order to measure user experience effectively, its
components need to be measured efficiently. The compact size
of the UMUX is suited to a more fully realized measurement
model of user experience. Such a model would go beyond
Product Use, and would include other product lifecycle stages
such as Brand Awareness and Installation (Sward and Macar-
thur, 2007). Sward (personal communication, August 5, 2009)
indicates significant progress in this area. The UMUX is well-
positioned as a foundation for developing future instruments,
with the ultimate goal of metrics that can target any of a
product’s user experience aspects in a way that is concise
and cross-validated.
Acknowledgements
Thanks to David Sward of Symantec™ for his work on the user
experience lifecycle model, Pete Lockhart of Intel® for item lan-
guage suggestions, Charles Lambdin of Intel® for statistical assis-
tance, and Linda Wooding of Intel® for management support in
implementing this research.
References
Brooke, J., 1996. SUS: A “quick and dirty” usability scale. In: Jordan, P.W., Thomas, B.,
Weerdmeester, B.A., McClelland, A.L. (Eds.), Usability Evaluation in Industry.
Taylor and Francis, London.
Chin, J.P., Diehl, V.A., Norman, K., 1988. Development of an instrument measuring
user satisfaction of the human–computer interface. In: Proceedings of ACM CHI
’88, Washington, DC, pp. 213–218.
Cox III, E.P., 1980. The optimal number of response alternatives for a scale: a review.
Journal of Marketing Research 17, 407–422.
Diefenbach, M.A., Weinstein, N.D., O’Reilly, J., 1993. Scales for assessing perceptions
of health hazard susceptibility. Health Education Research 8, 181–192.
Finstad, K., 2006. The system usability scale and non-native English speakers.
Journal of Usability Studies 1 (4), 185–188.
Finstad, K., in press. Response interpolation and scale sensitivity: evidence against
five-point scales. Journal of Usability Studies.
Holyer, A., 1993. Methods for Evaluating User Interfaces. Cognitive Science Research
Paper No. 301. School of Cognitive and Computing Sciences, University of Sussex.
ISO 9241-11 (1998). Ergonomic Requirements for Office Work with Visual Display
Terminals (VDTs). Part 11: Guidance on Usability.
Kirakowski, J., Porteous, M., Corbett, M., 1992. How to use the software usability
measurement inventory: the users’ view of software quality. In: Proceedings
European Conference on Software Quality, Madrid.
Lewis, J., 1995. IBM computer usability satisfaction questionnaires: psychometric
evaluation and instructions for use. International Journal of Human–Computer
Interaction 7 (1), 57–78.
Lewis, J.R., Sauro, J., 2009. The factor structure of the System Usability Scale. In:
Proceedings of the Human–Computer Interaction International Conference
(HCII 2009), San Diego CA, USA.
Sauro, J., Dumas, J.S., 2009. Comparison of three one-question, post-task usability
questionnaires. In: Proceedings of the 27th International Conference on Human
Factors in Computing Systems. Boston.
Sward, D., Macarthur, G., 2007. Making user experience design a business strategy.
Towards a UX Manifesto; SIGCHI Workshop. Lancaster, UK, September 3–4.
Tabachnik, B.G., Fidell, L.S., 1989. Using Multivariate Statistics, 2nd ed. Harper
Collins, New York.
Tedesco, D., Tullis, T., 2006. A comparison of methods for eliciting post-task
subjective ratings in usability testing. Usability Professionals Association (UPA)
2006, 1–9.
Tullis, T.S., Stetson, J.N., 2004. A comparison of questionnaires for assessing website
usability. In: Proceedings of UPA 2004, June 7–11.