Vol. 12, Issue 4, August 2017, pp. 183–192
Copyright © 2016–2017, User Experience Professionals Association and the authors. Permission to make digital or
hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on
the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. URL: http://www.upassoc.org.
Revisiting the Factor Structure of
the System Usability Scale
James (Jim) R. Lewis
Senior HF Engineer
IBM Corp.
5901 Broken Sound Parkway
Suite 514C
Boca Raton, FL 33487
USA
jimlewis@us.ibm.com
Jeff Sauro
Principal
MeasuringU
jeff@measuringu.com
Abstract
In 2009, we published a paper in which we showed how
three independent sources of data indicated that, rather than
being a unidimensional measure of perceived usability, the
System Usability Scale apparently had two factors: Usability
(all items except 4 and 10) and Learnability (Items 4 and
10). In that paper, we called for other researchers to report
attempts to replicate that finding. The published research
since 2009 has consistently failed to replicate that factor
structure. In this paper, we report an analysis of over 9,000
completed SUS questionnaires that shows that the SUS is
indeed bidimensional, but not in any interesting or useful
way. A comparison of the fit of three confirmatory factor
analyses showed that a model in which the SUS’s positive-
tone (odd-numbered) and negative-tone (even-numbered)
items were aligned with two factors had a better fit than a
unidimensional model (all items on one factor) or the
Usability/Learnability model we published in 2009. Because a
distinction based on item tone is of little practical or
theoretical interest, we recommend that user experience
practitioners and researchers treat the SUS as a
unidimensional measure of perceived usability, and no longer
routinely compute Usability and Learnability subscales.
Keywords
System Usability Scale, SUS, factor structure, perceived
usability, perceived learnability, confirmatory factor analysis
Introduction
In this section, we discuss why we revisited the factor structure of the SUS, describe the SUS
and its psychometric properties, and state our objectives for this study.
Why Revisit the Factor Structure of the System Usability Scale (SUS)?
"There are still lessons to be learned in the domain of standardized usability testing – still
work to do. For example, what is the real factor structure of the SUS?" (Lewis, 2014, p.
675).
The SUS (Brooke, 1996) is a very popular (if not the most popular) standardized questionnaire
for the assessment of perceived usability. Sauro and Lewis (2009), in a study of unpublished
industrial usability studies, found that the SUS accounted for 43% of post-test questionnaire
usage. It has been cited in over 1,200 publications (Brooke, 2013).
The SUS was designed to be a unidimensional (one factor) measurement of perceived usability
(Brooke, 1996). Once researchers began to publish data sets (or correlation matrices) from
sample sizes large enough to support factor analysis, it began to appear that SUS might be
bidimensional (having a structure with two factors). Factor analyses of data from three
independent studies (Borsci, Federici, & Lauriola, 2009; Lewis & Sauro, 2009, which included a
reanalysis of the SUS item correlation matrix published by Bangor, Kortum, & Miller, 2008)
indicated a consistent two-factor structure (with Items 4 and 10 aligning on a factor separate
from the remaining items). Lewis and Sauro named the two factors Usability (all items except 4
and 10) and Learnability (Items 4 and 10).
This was an exciting finding, with support from three independent sources. These new scales
had good psychometric properties (e.g., coefficient alpha greater than 0.70). A sensitivity
analysis using data from 19 tests provided evidence of the differential utility of the new scales.
The promise of this research was that practitioners could continue to use the standard SUS
but, at no extra cost, could also take advantage of the new scales to extract additional
information from their SUS data. Google Scholar metrics (visited 9/17/2016) indicate the paper
that reported this finding (Lewis & Sauro, 2009) has been cited over 350 times.
Unfortunately, analyses conducted since 2009 (Kortum & Sorber, 2015; Lewis, Brown, & Mayes,
2015; Lewis, Utesch, & Maher, 2013, 2015; Sauro & Lewis, 2011) have typically resulted in a
two-factor structure but have not consistently replicated the item-factor alignment that seemed
apparent in 2009 (a separation of Items 4 and 10). Research by Borsci, Federici, Bacci, Gnaldi,
and Bartolucci (2015) suggested the possibility that the one- versus two-factor structure
(Usability/Learnability) might depend on the level of user experience, but Lewis, Utesch, and
Maher (2015) were not able to replicate this finding. Otherwise, the more recent analyses have
been somewhat consistent with a general alignment of positive- and negative-tone items on
separate factors: the type of unintentional structure that can occur with sets of mixed-tone
items (Barnette, 2000; Davis, 1989; Pilotte & Gable, 1990; Schmitt & Stults, 1985; Schriesheim
& Hill, 1981; Stewart & Frye, 2004; Wong, Rindfleisch, & Burroughs, 2003). Specific reported
structures have included the following (and note that in every case the second factor has
included Items 4 and 10, but not in isolation):
• Factor 1: Items 1, 3, 5, 7, 9; Factor 2: Items 2, 4, 6, 8, 10 (Kortum & Sorber, 2015; Lewis, Brown, & Mayes, 2015)
• Factor 1: Items 1, 3, 5, 6, 7, 8, 9; Factor 2: Items 2, 4, 10 (Kortum & Sorber, 2015)
• Factor 1: Items 1, 2, 3, 5, 7, 9; Factor 2: Items 4, 6, 8, 10 (Sauro & Lewis, 2011)
• Factor 1: Items 1, 9; Factor 2: Items 2, 3, 4, 5, 6, 7, 8, 10 (Borsci et al., 2015; Lewis, Utesch, & Maher, 2015)
When we published our 2009 paper, we were following the data. Our paper has been influential,
with over 350 recorded citations. Unfortunately, as clear as the factor structure appeared to be
in 2009, analyses since then have failed to replicate the reported Usability/Learnability structure
with alarming consistency. We believe it is time to reassess the factor structure of the SUS, and
have brought together the largest collection of completed SUS questionnaires of which we are
aware (N > 9,000) to, as definitively as possible, compare the fit of various models of the factor
structure of the SUS.
What Is the SUS?
As shown in Figure 1, the standard version of the SUS has 10 items, each with five steps
anchored with "Strongly Disagree" and "Strongly Agree." It is a mixed-tone questionnaire in
which the odd-numbered items have a positive tone and the even-numbered items have a
negative tone. The first step in scoring a SUS is to determine each item's score contribution,
which will range from 0 (a poor experience) to 4 (a good experience). For positively-worded
items (odd numbers), the score contribution is the scale position minus 1. For negatively-
worded items (even numbers), the score contribution is 5 minus the scale position. To get the
overall SUS score, multiply the sum of the item score contributions by 2.5, which produces a
score that can range from 0 (very poor perceived usability) to 100 (excellent perceived
usability) in 2.5-point increments.
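To make the scoring arithmetic concrete, here is a minimal R sketch of the procedure just described; the data frame sus_raw and its columns q1 through q10 are hypothetical names used for illustration, not artifacts of this study.

# Minimal sketch of standard SUS scoring, assuming a data frame sus_raw
# whose columns q1..q10 hold raw 1-5 agreement ratings (hypothetical names).
odd  <- paste0("q", seq(1, 9, by = 2))   # positive-tone items
even <- paste0("q", seq(2, 10, by = 2))  # negative-tone items

contrib <- sus_raw
contrib[odd]  <- sus_raw[odd] - 1        # positive items: scale position minus 1
contrib[even] <- 5 - sus_raw[even]       # negative items: 5 minus scale position

# Overall SUS: 2.5 times the sum of the ten contributions,
# yielding 0-100 in 2.5-point increments.
sus_score <- 2.5 * rowSums(contrib)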
Figure 1. The standard System Usability Scale. Note: Item 8 shows "awkward" in place of the
original "cumbersome" (Finstad, 2006; Sauro & Lewis, 2009).
Psychometric Properties of the SUS
The SUS has excellent psychometric properties. Research has consistently shown the SUS to
have reliabilities at or just over 0.90 (Bangor et al., 2008; Lewis, Brown, & Mayes, 2015; Lewis
& Sauro, 2009; Lewis, Utesch, & Maher, 2015), far above the minimum criterion of 0.70 for
measurements of sentiments (Nunnally, 1978). The SUS has also been shown to have
acceptable levels of concurrent validity (Bangor, Joseph, Sweeney-Dillon, Stettler, & Pratt,
2013; Bangor et al., 2008; Kortum & Peres, 2014; Lewis, Brown, & Mayes, 2015; Peres, Pham,
& Phillips, 2013) and sensitivity (Kortum & Bangor, 2013; Kortum & Sorber, 2015; Lewis & Sauro,
2009; Tullis & Stetson, 2004). Norms are available to guide the interpretation of the SUS
(Bangor, Kortum, & Miller, 2008, 2009; Sauro, 2011; Sauro & Lewis, 2016).
Objective of the Current Study
The objective of the current study is to revisit the factor structure of the SUS. The strategy is to
use a very large sample of completed SUS questionnaires to (a) use exploratory factor analysis
to reveal the apparent alignment of items, then (b) use confirmatory factor analysis to assess
the goodness of fit for three models of item-factor alignment: the Unidimensional model (all 10
SUS items on one factor), the Usability/Learnability model (Items 4 and 10 on one factor, all
other items on a second factor), and a Tone model (based on the tone of the SUS items, with
positive tone items on one factor, negative tone items on a second factor).
Method
For this study, we assembled a data set of 9,156 completed SUS questionnaires from 112
unpublished industrial usability studies and surveys from a range of software products and
websites. Most of the datasets did not have a sufficient sample size for factor analysis, but
combined, this is the largest collection of completed SUS questionnaires of which we are aware
and provides considerable power for statistical analysis (MacCallum, Browne, & Sugawara,
1996). All analyses were conducted using standard SUS item contribution scores rather than
raw scores so score directions were consistent (0–4 point scales; low = poor experience; high =
good experience).
Results
In the following sections, we discuss the results as they relate to the exploratory analyses and
the confirmatory factor analyses.
Exploratory Analyses
Investigators have used a variety of methods to explore the structure of the SUS. To address
the variety of techniques in the literature, we used three popular methods available in IBM SPSS
Statistics Version 23: principal components analysis (PCA; strictly speaking, not a factor
analytic method, but commonly used), unweighted least squares factor analysis (ULSFA;
minimizes the sum of the squared differences between the observed and reproduced correlation
matrices), and maximum likelihood factor analysis (MLFA; produces parameter estimates that
are most likely to have produced the observed correlation matrix if the sample is from a
multivariate normal distribution). Using all three methods allows the determination of whether
the observed factor structure is robust across the different analytical approaches.
The eigenvalues from the exploratory analyses were 5.637, 1.467, 0.547, 0.491, 0.379, 0.344,
0.317, 0.309, 0.257, and 0.251. Parallel analysis of the eigenvalues (Ledesma & Valero-Mora,
2007; Patil, Singh, Mishra, & Donovan, 2007) indicated a two-factor solution. As shown in
Table 1, all three methods (with Varimax-rotated two-factor structures) were consistent with
the Tone model (positive and negative tone items loading more strongly on different
components/factors).
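For readers working outside SPSS, a comparable exploratory workflow can be sketched with the psych package in R. This is an illustration under assumptions (a data frame contrib holding the ten item contribution scores), not the exact SPSS procedure used in this study:

library(psych)

# Parallel analysis to suggest how many components/factors to retain.
fa.parallel(contrib, fa = "both")

# Two-factor, Varimax-rotated solutions mirroring the three methods in Table 1.
pca <- principal(contrib, nfactors = 2, rotate = "varimax")       # PCA
uls <- fa(contrib, nfactors = 2, rotate = "varimax", fm = "uls")  # unweighted least squares
ml  <- fa(contrib, nfactors = 2, rotate = "varimax", fm = "ml")   # maximum likelihood

print(pca$loadings)
print(uls$loadings)
print(ml$loadings)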
Table 1. Component/Factor Loadings for Three Exploratory Structural Analyses
Item   PCA 1   PCA 2   ULSFA 1   ULSFA 2   MLFA 1   MLFA 2
1      0.048   0.771   0.638     0.115     0.638    0.116
2      0.739   0.372   0.388     0.686     0.391    0.689
3      0.361   0.798   0.790     0.348     0.793    0.347
4      0.852   0.061   0.108     0.777     0.108    0.772
5      0.211   0.819   0.770     0.219     0.767    0.223
6      0.771   0.339   0.354     0.725     0.348    0.732
7      0.321   0.753   0.706     0.320     0.712    0.316
8      0.767   0.422   0.431     0.742     0.428    0.745
9      0.364   0.778   0.756     0.356     0.751    0.356
10     0.833   0.180   0.213     0.773     0.216    0.766
Confirmatory Factor Analyses
Confirmatory factor analysis (CFA) differs from exploratory factor analysis (EFA) in that an EFA
produces unconstrained results that the researcher examines for structural clues, but a CFA is
constrained to a precisely defined model (Cliff, 1987). Researchers can conduct CFAs on
multiple proposed models and compare their indices of goodness-of-fit to assess which model
has the best fit to the given data. Jackson, Gillaspy, Jr., and Purc-Stephenson (2009) have
recommended reporting fit statistics that have different measurement properties, such as the
comparative fit index (CFI; a score of 0.90 or higher indicates good fit), the root-mean-square
error of approximation (RMSEA; values less than 0.08 indicate acceptable fit), and the Bayesian
information criterion (BIC; lower values are preferred). It is common to also report chi-square
tests of absolute model fit, but when sample sizes are very large, such tests almost always lead
to rejection of the hypothesis of adequate fit (Kline, 2011), making them uninformative.
Instead, we have focused on comparative fit metrics.
We used the lavaan package in the statistical program R (Rosseel, 2012) to conduct CFA on the
three models of the SUS described in the introduction. Figures 2, 3, and 4 illustrate the three
models (created using SPSS AMOS 24). Model 1 (Figure 2) represents the unidimensional model
of SUS, which was over-identified with 55 sample moments and 20 parameters (df = 35).
Model 2, the two-factor Usability and Learnability model shown in Figure 3, was also over-
identified with 55 sample moments and 21 parameters (df = 34). Model 3, the two-factor
positive-negative model shown in Figure 4, was also over-identified with 55 sample moments
and 21 parameters (df = 34). Table 2 shows the results of the comparative fit analyses of the
three models (with 90% confidence intervals for RMSEA produced in lavaan by default).
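As a sketch of how this comparison can be specified in lavaan, the three models might be written as follows; the column names q1 through q10 are assumed stand-ins for the ten item contribution scores, and the defaults shown need not match every estimation option used in the study:

library(lavaan)

# Model 1: unidimensional (all ten items on a single factor).
m1 <- 'usability =~ q1 + q2 + q3 + q4 + q5 + q6 + q7 + q8 + q9 + q10'

# Model 2: Usability/Learnability (Items 4 and 10 on their own factor).
m2 <- 'usable    =~ q1 + q2 + q3 + q5 + q6 + q7 + q8 + q9
       learnable =~ q4 + q10'

# Model 3: Positive/Negative Tone (odd- vs. even-numbered items).
m3 <- 'positive =~ q1 + q3 + q5 + q7 + q9
       negative =~ q2 + q4 + q6 + q8 + q10'

fits <- lapply(list(m1, m2, m3), cfa, data = contrib)

# Comparative fit statistics corresponding to Table 2.
sapply(fits, fitMeasures,
       fit.measures = c("cfi", "rmsea", "rmsea.ci.lower",
                        "rmsea.ci.upper", "bic"))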
Figure 2. Model 1, the unidimensional SUS.
Figure 3. Model 2, the bidimensional SUS (Usability/Learnability).
Figure 4. Model 3, the bidimensional SUS (Positive/Negative Tone).
Table 2. Results of CFAs of Three Structural Models of the SUS
Model   CFI     RMSEA 90% lower   RMSEA   RMSEA 90% upper   BIC
1       0.799   0.187             0.190   0.193             11801
2       0.838   0.170             0.173   0.176             9543
3       0.958   0.085             0.088   0.091             2449
Consistent with the results from the EFA, the multiple fit statistics indicated that the best-fitting
model was the Positive/Negative Tone model. That was the only one of the three models that
had a CFI greater than 0.90, and its RMSEA, despite not quite achieving the criterion of being
less than 0.08 for acceptable fit, was about half of that for the other two models. Notably, there
was no overlap among the RMSEA confidence intervals, which is evidence of statistically
significant differences. The Bayesian information criterion (BIC) was also lowest (best) for the
Positive/Negative Tone model.
Conclusion
One of the strengths of the scientific method is its self-correction when the accumulation of
evidence indicates a need to do so. It can be disappointing when an interesting finding fails to
survive continuing scrutiny, but this is how our knowledge advances: by reacting dispassionately
to results rather than rooting for a particular outcome.
In 2009, we published a paper (Lewis & Sauro, 2009) in which we showed how three
independent sources of data indicated that, rather than being a unidimensional measure of
perceived usability, the System Usability Scale apparently had two factors: Usability (all items
except 4 and 10) and Learnability (Items 4 and 10). In that paper, we called for other
researchers to report attempts to replicate that finding, and we also continued this investigation
in our own research. That paper has been cited over 350 times.
The published research since 2009 has consistently failed to replicate that Usability/Learnability
factor structure. In this paper, we reported an analysis of over 9,000 completed SUS
questionnaires that shows that the SUS is indeed bidimensional, but not in any interesting or
useful way. A comparison of the fit of three confirmatory factor analyses showed that a model in
which the SUS’s positive-tone (odd-numbered) and negative-tone (even-numbered) items were
aligned with two factors had a better fit than a unidimensional model (all items on one factor) or
the Usability/Learnability model we published in 2009.
Thus, the factor structure of the SUS appears to be bidimensional, but apparently not in any
interesting way. It is well known that mixed-tone questionnaires like the SUS often exhibit this
type of nuisance structure when factor analyzed (Barnette, 2000; Davis, 1989; Pilotte & Gable,
1990; Schmitt & Stults, 1985; Schriesheim & Hill, 1981; Stewart & Frye, 2004; Wong et al.,
2003). The same pattern has been reported for the Usability Metric for User Experience (UMUX;
Lewis, Utesch, & Maher, 2013), another metric of perceived usability that has mixed-tone items.
Davis (1989), in his development of the Technology Acceptance Model, started with a pool
of mixed-tone items, but found that the mixed tone was causing problems in his attempt to get
clear factors for Perceived Usefulness and Perceived Ease-of-Use. He consequently eliminated
the negative-tone items from consideration.
It is possible that the SUS might have internal structure that is obscured by the effect of having
mixed tone items, but we found no significant evidence supporting that hypothesis. It is
interesting to note in Table 1 that the magnitudes of the factor loadings for Items 4 and 10 in all
three exploratory analyses were greater than those for Items 2, 6, and 8 on the negative-tone
factor, suggesting (but not proving) that there might be some research contexts in which they
would emerge as an independent factor.
Because a distinction based on item tone is of little practical or theoretical interest when
measuring with the SUS, it is with some regret, but based on accumulating evidence, that we
recommend that user experience practitioners and researchers treat the SUS as a
unidimensional measure of perceived usability, and no longer routinely compute or report
Usability and Learnability subscales.
Recommendations for Researchers
Researchers should be cautious in their use of the Usability/Learnability factor structure
reported by Lewis and Sauro (2009). As shown in Table 1, Items 4 and 10 loaded more strongly
on the negative-tone factor than the other three negative-tone items (2, 6, and 8). It might be the case that the
Usability/Learnability structure appears in certain special circumstances (e.g., as reported by
Borsci et al., 2015 in their investigation of the amount of experience users have with a product),
but such findings require replication. Although the evidence strongly suggests that the SUS is
bidimensional as a function of item tone, these dimensions are of little theoretical or practical
interest. Unless there is compelling evidence in a specific domain of research to support
interpretation of an alternative structure, the best research policy is to interpret the SUS as a
unidimensional measure of perceived usability.
Tips for Usability Practitioners
The following are some guidelines for practitioners:
• Do not routinely compute Usability and Learnability subscales from SUS data.
• Instead, routinely compute the standard overall SUS and interpret it as a unidimensional measure of perceived usability.
• Compute and report the Usability and Learnability subscales only if you are working in a context in which they have been shown to reliably occur.
References
Bangor, A., Joseph, K., Sweeney-Dillon, M., Stettler, G., & Pratt, J. (2013). Using the SUS to help demonstrate usability’s value to business goals. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (pp. 202–205). Santa Monica, CA: HFES.
Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An empirical evaluation of the System Usability Scale. International Journal of Human-Computer Interaction, 24, 574–594.
Bangor, A., Kortum, P. T., & Miller, J. T. (2009). Determining what individual SUS scores mean: Adding an adjective rating scale. Journal of Usability Studies, 4(3), 114–123.
Barnette, J. J. (2000). Effects of stem and Likert response option reversals on survey internal consistency: If you feel the need, there is a better alternative to using those negatively worded stems. Educational and Psychological Measurement, 60, 361–370.
Borsci, S., Federici, S., Bacci, S., Gnaldi, M., & Bartolucci, F. (2015). Assessing user satisfaction in the era of user experience: Comparison of the SUS, UMUX and UMUX-LITE as a function of product experience. International Journal of Human-Computer Interaction, 31(8), 484–495.
Borsci, S., Federici, S., & Lauriola, M. (2009). On the dimensionality of the System Usability Scale: A test of alternative measurement models. Cognitive Processing, 10, 193–197.
Brooke, J. (1996). SUS: A ‘quick and dirty’ usability scale. In P. Jordan, B. Thomas, & B. Weerdmeester (Eds.), Usability evaluation in industry (pp. 189–194). London, UK: Taylor & Francis.
Brooke, J. (2013). SUS: A retrospective. Journal of Usability Studies, 8(2), 29–40.
Cliff, N. (1987). Analyzing multivariate data. Orlando, FL: Harcourt, Brace, Jovanovich.
Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13, 319–339.
Finstad, K. (2006). The System Usability Scale and non-native English speakers. Journal of Usability Studies, 1(4), 185–188.
Jackson, D. L., Gillaspy, J. A., Jr., & Purc-Stephenson, R. (2009). Reporting practices in confirmatory factor analysis: An overview and some recommendations. Psychological Methods, 14, 6–23.
Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York, NY: The Guilford Press.
Kortum, P., & Bangor, A. (2013). Usability ratings for everyday products measured with the System Usability Scale. International Journal of Human-Computer Interaction, 29, 67–76.
Kortum, P., & Peres, S. C. (2014). The relationship between system effectiveness and subjective usability scores using the System Usability Scale. International Journal of Human-Computer Interaction, 30, 575–584.
Kortum, P., & Sorber, M. (2015). Measuring the usability of mobile applications for phones and tablets. International Journal of Human-Computer Interaction, 31, 518–529.
Ledesma, R. D., & Valero-Mora, P. (2007). Determining the number of factors to retain in EFA: An easy-to-use computer program for carrying out parallel analysis. Practical Assessment, Research & Evaluation, 12(2), 1–11.
Lewis, J. R. (2014). Usability: Lessons learned . . . and yet to be learned. International Journal of Human-Computer Interaction, 30, 663–684.
Lewis, J. R., Brown, J., & Mayes, D. K. (2015). Psychometric evaluation of the EMO and the SUS in the context of a large-sample unmoderated usability study. International Journal of Human-Computer Interaction, 31(8), 545–553.
Lewis, J. R., & Sauro, J. (2009). The factor structure of the System Usability Scale. In M. Kurosu (Ed.), Human centered design, HCII 2009 (pp. 94–103). Heidelberg, Germany: Springer-Verlag.
Lewis, J. R., Utesch, B. S., & Maher, D. E. (2013). UMUX-LITE: When there’s no time for the SUS. In Proceedings of CHI 2013 (pp. 2099–2102). Paris, France: ACM.
Lewis, J. R., Utesch, B. S., & Maher, D. E. (2015). Measuring perceived usability: The SUS, UMUX-LITE, and AltUsability. International Journal of Human-Computer Interaction, 31, 496–505.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Patil, V. H., Singh, S. N., Mishra, S., & Donavan, D. T. (2007). Parallel Analysis Engine to Aid Determining Number of Factors to Retain [Computer software]. Available from http://smishra.faculty.ku.edu/parallelengine.htm
Peres, S. C., Pham, T., & Phillips, R. (2013). Validation of the System Usability Scale (SUS): SUS in the wild. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (pp. 192–196). Santa Monica, CA: HFES.
Pilotte, W. J., & Gable, R. K. (1990). The impact of positive and negative item stems on the validity of a computer anxiety scale. Educational and Psychological Measurement, 50, 603–610.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
Sauro, J. (2011). A practical guide to the System Usability Scale. Denver, CO: Measuring Usability.
Sauro, J., & Lewis, J. R. (2009). Correlations among prototypical usability metrics: Evidence for the construct of usability. In Proceedings of CHI 2009 (pp. 1609–1618). Boston, MA: ACM.
Sauro, J., & Lewis, J. R. (2011). When designing usability questionnaires, does it hurt to be positive? In Proceedings of CHI 2011 (pp. 2215–2223). Vancouver, Canada: ACM.
Sauro, J., & Lewis, J. R. (2016). Quantifying the user experience: Practical statistics for user research (2nd ed.). Cambridge, MA: Morgan Kaufmann.
Schmitt, N., & Stults, D. M. (1985). Factors defined by negatively keyed items: The result of careless respondents? Applied Psychological Measurement, 9, 367–373.
Schriesheim, C. A., & Hill, K. D. (1981). Controlling acquiescence response bias by item reversals: The effect on questionnaire validity. Educational and Psychological Measurement, 41, 1101–1114.
Stewart, T. J., & Frye, A. W. (2004). Investigating the use of negatively-phrased survey items in medical education settings: Common wisdom or common mistake? Academic Medicine, 79(Suppl. 10), S1–S3.
Tullis, T. S., & Stetson, J. N. (2004, June). A comparison of questionnaires for assessing website usability. Paper presented at the Usability Professionals Association Annual Conference, Minneapolis, MN, USA.
Wong, N., Rindfleisch, A., & Burroughs, J. (2003). Do reverse-worded items confound measures in cross-cultural consumer research? The case of the material values scale. Journal of Consumer Research, 30, 72–91.
About the Authors

James R. (Jim) Lewis, PhD
Dr. Lewis is a senior human factors engineer (at IBM since 1981). He has published influential papers in the areas of usability testing and measurement. His books include Practical Speech User Interface Design and (with Jeff Sauro) Quantifying the User Experience (now in its second edition).

Jeff Sauro, PhD
Dr. Sauro is a six-sigma trained statistical analyst and founding principal of MeasuringU, a customer experience research firm based in Denver. He has conducted usability tests and statistical analysis for companies such as Google, eBay, Walmart, Autodesk, Lenovo, and Dropbox, and has published over 20 peer-reviewed research articles and 5 books, including Customer Analytics for Dummies.