Intl. Journal of Human–Computer Interaction, 31: 484–495, 2015
Copyright © Taylor & Francis Group, LLC
ISSN: 1044-7318 print / 1532-7590 online
DOI: 10.1080/10447318.2015.1064648
Assessing User Satisfaction in the Era of User Experience:
Comparison of the SUS, UMUX, and UMUX-LITE as a Function of
Product Experience
Simone Borsci1, Stefano Federici2, Silvia Bacci3, Michela Gnaldi4, and Francesco Bartolucci3
1National Institute for Health Research, Diagnostic Evidence Cooperative, Imperial College of London, London,
United Kingdom
2Department of Philosophy, Social & Human Sciences and Education, University of Perugia, Perugia, Italy
3Department of Economics, University of Perugia, Perugia, Italy
4Department of Political Sciences, University of Perugia, Perugia, Italy
Nowadays, practitioners extensively apply quick and reliable
scales of user satisfaction as part of their user experience anal-
yses to obtain well-founded measures of user satisfaction within
time and budget constraints. However, in the human–computer
interaction literature the relationship between the outcomes of
standardized satisfaction scales and the amount of product usage
has been only marginally explored. The few studies that have
investigated this relationship have typically shown that users who
have interacted more with a product have higher satisfaction. The
purpose of this article was to systematically analyze the varia-
tion in outcomes of three standardized user satisfaction scales
(SUS, UMUX, UMUX-LITE) when completed by users who had
spent different amounts of time with a website. In two studies, the
amount of interaction was manipulated to assess its effect on user
satisfaction. Measurements of the three scales were strongly corre-
lated and their outcomes were significantly affected by the amount
of interaction time. Notably, the SUS acted as a unidimensional
scale when administered to people who had less product experience
but was bidimensional when administered to users with more expe-
rience. Previous findings of similar magnitudes for the SUS and
UMUX-LITE (after adjustment) were replicated but did not show
the previously reported similarities of magnitude for the SUS and
the UMUX. Results strongly encourage further research to analyze
the relationships of the three scales with levels of product exposure.
Recommendations for practitioners and researchers in the use of
the questionnaires are also provided.
1. INTRODUCTION
The assessment of website user satisfaction is a fascinating
topic in the field of human–computer interaction (HCI). Since
the late 1980s, practitioners have applied standardized usability
questionnaires to the measurement of user satisfaction, which
Address correspondence to Simone Borsci, National Institute
for Health Research, Diagnostic Evidence Cooperative, Imperial
College of London, St. Mary’s Hospital, QEQM Building, 10th
floor, Praed Street, W2 1NY London, United Kingdom. E-mail:
s.borsci@imperial.ac.uk
is one of the three main components of usability (ISO, 1998).
Satisfaction analysis is a strategic way to collect information—
before or after the release of a product—about the user experi-
ence (UX), defined as the “person’s perceptions and responses
resulting from the use and/or anticipated use of a product”
(ISO, 2010, p. 3).
1.1. Amount of Experience and Perceived Usability
UX is a relatively new and broad concept that includes
and goes beyond traditional usability (Petrie & Bevan, 2009).
As recently outlined by Lallemand, Gronier, and Koenig (2015),
experts have different points of view on UX; however, there
is a wide agreement in the HCI community that the UX con-
cept has a temporal component, that is, the amount of user
interaction with a product affects people’s overall experience.
In addition, experts generally agree that the interactive expe-
rience of a user is affected by the perceived usability and
aesthetics of an interface (Borsci, Kuljis, Barnett, & Pecchia,
2014; Hassenzahl, 2005; Hassenzahl & Tractinsky, 2006; Lee
& Koubek, 2012; Tractinsky, 1997), and the extent to which
user needs are met (Hassenzahl et al., 2015). Accordingly, to
fully model the perceived experience of a user (Borsci, Kurosu,
Federici, & Mele, 2013; Lindgaard & Dudek, 2003; McLellan,
Muddimer, & Peres, 2012), practitioners should include a set
of repeated objective and subjective measures in their eval-
uation protocols to enable satisfaction analysis as a “subjec-
tive sum of the interactive experience” (Lindgaard & Dudek,
2003, p. 430).
Several studies have found that the magnitude of user satis-
faction is associated with a user’s amount of experience with the
product or system under evaluation (Lindgaard & Dudek, 2003;
McLellan et al., 2012). For instance, Sauro (2011) reported that
System Usability Scale (SUS; Brooke, 1996) scores differed as
a function of different levels of product experience. In other
words, people with long-term experience in the use of a product
tended to rate their satisfaction with higher (better) scores than
users with shorter terms of experience.
In summary, researchers and practitioners assess user satis-
faction by means of questionnaires, but there are only a few
empirical studies that have systematically analyzed the vari-
ation of the outcomes of satisfaction scales when filled out
by users with different amounts of experience in the use of a
product (Kortum & Johnson, 2013; Lindgaard & Dudek, 2003;
McLellan et al., 2012; Sauro, 2011).
1.2. The System Usability Scale
Several standardized tools are available in the literature to
measure satisfaction (for a review, see Borsci et al., 2013).
An increasing trend favors the use of short scales due to their
speed and ease of administration, either as online surveys for
customers or after a usability test. One of the most popular is
the System Usability Scale (SUS; Lewis, 2006; Sauro & Lewis,
2011; Zviran, Glezer, & Avni, 2006), which has been cited in
more than 600 publications (Sauro, 2011) and is considered
an industry standard. Its popularity among HCI experts is due
to several factors, such as its desirable psychometric proper-
ties (high reliability and demonstrated validity), relatively short
length (10 items), and low cost (free; Bangor, Kortum, & Miller,
2008; McLellan et al., 2012).
The 10 items of the SUS were designed to form a
unidimensional measure of perceived usability (Brooke, 1996).
The standard version of the questionnaire has a mix of positive
and negative tone items, with the odd-numbered items having
a positive tone and the even-numbered items having a negative
tone. Respondents rate the magnitude of their agreement with
each item using a 5-point scale from 1 (strongly disagree) to
5 (strongly agree). To compute the overall SUS score, (a) each
item is converted to a 0–4 scale for which higher numbers indicate
a greater amount of perceived usability, (b) the converted
scores are summed, and (c) the sum is multiplied by 2.5. This
process produces scores that can range from 0 to 100.
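As an illustration, this scoring procedure translates directly into code. The following is a minimal sketch (the function name and the list-based input are our own, hypothetical choices, not part of the published scale):

```python
def sus_score(ratings):
    """Overall SUS score from the ten 1-5 item ratings, ordered item 1-10."""
    if len(ratings) != 10 or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("SUS requires ten ratings on a 1-5 scale")
    total = 0
    for i, r in enumerate(ratings, start=1):
        # Odd (positive-tone) items: rating - 1; even (negative-tone): 5 - rating
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # converted sum (0-40) rescaled to 0-100

# Example: "agree" (4) on positive items, "disagree" (2) on negative items
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```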
As Lewis (2014) recently stated, there are still lessons
that have to be learned about SUS, in particular about its
dimensionality. Despite the SUS having been designed to be
unidimensional, several researchers have recently shown that the
items of the SUS might load on two dimensions: usability and
learnability (Bangor et al., 2008; Borsci, Federici, & Lauriola,
2009; Lewis & Sauro, 2009; Lewis, Utesch, & Maher, 2013, this
issue; Sauro & Lewis, 2012). Since 2009, however, there have
been reports of large-sample SUS data sets for which two-factor
structures did not have the expected item-factor alignment
(Items 4 and 10 with Learnable, all others with Usable), indi-
cating a need for further research to clarify its dimensional
structure and the variables that might affect it (Lewis, 2014).
In recent years, the growing availability of SUS data from a
large number of studies (Bangor et al., 2008; Kortum & Bangor,
2012) has led to the production of norms for the interpretation
of mean SUS scores, for example, the Curved Grading Scale
(CGS; Sauro & Lewis, 2012). Using data from 446 studies and
more than 5,000 individual SUS responses, Sauro and Lewis
(2012) found the overall mean score of the SUS to be 68 with a
standard deviation of 12.5. The Sauro and Lewis CGS assigned
grades as a function of SUS scores ranging from F (absolutely
unsatisfactory) to A+ (absolutely satisfactory), as follows:
Grade F (0–51.7)
Grade D (51.8–62.6)
Grade C– (62.7–64.9)
Grade C (65.0–71.0)
Grade C+ (71.1–72.5)
Grade B– (72.6–74.0)
Grade B (74.1–77.1)
Grade B+ (77.2–78.8)
Grade A– (78.9–80.7)
Grade A (80.8–84.0)
Grade A+ (84.1–100)
Although they should be interpreted with caution, the grades
from the CGS provide an initial basis for determining if a mean
SUS score is below average, average, or above average (Sauro
& Lewis, 2012).
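For practitioners who want to apply the CGS programmatically, the grade boundaries listed above translate directly into a lookup table. A minimal sketch (the names are ours; the boundaries are those of Sauro & Lewis, 2012):

```python
# Upper bound (inclusive) of each CGS band, from Sauro and Lewis (2012)
CGS_BANDS = [
    (51.7, "F"), (62.6, "D"), (64.9, "C-"), (71.0, "C"), (72.5, "C+"),
    (74.0, "B-"), (77.1, "B"), (78.8, "B+"), (80.7, "A-"), (84.0, "A"),
    (100.0, "A+"),
]

def cgs_grade(mean_sus):
    """Map a mean SUS score (0-100) to its Curved Grading Scale grade."""
    for upper_bound, grade in CGS_BANDS:
        if mean_sus <= upper_bound:
            return grade
    raise ValueError("SUS scores cannot exceed 100")

print(cgs_grade(68.0))  # "C": the overall SUS mean reported by Sauro and Lewis
```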
1.3. Ultrashort Scales: The UMUX and UMUX-LITE
Although the SUS is a quick scale, practitioners sometimes
need to use reliable scales that are even shorter than the SUS
to minimize time, cost, and user effort. “This need is most
pressing when standardized usability measurement is one part
of a larger post-study or online questionnaire” (Lewis, 2014,
p. 676). As a consequence, quite recently, two new scales have
been proposed as shorter proxies of the SUS: the Usability
Metric for User Experience (UMUX; see Table 1), a four-item
tool developed and validated by Finstad (2010,2013), and the
UMUX-LITE composed of only the two positive-tone questions
from the UMUX (Lewis et al., 2013, this issue). The scale for
the UMUX items has 7 points, from 1 (strongly disagree) to 7
(strongly agree).
TABLE 1
Items of the UMUX and UMUX-LITE

Item 1 – UMUX / Item 1 – UMUX-LITE: [This system's] capabilities meet my requirements.
Item 2 – UMUX: Using [this system] is a frustrating experience.
Item 3 – UMUX / Item 2 – UMUX-LITE: [This system] is easy to use.
Item 4 – UMUX: I have to spend too much time correcting things with [this system].

Note. UMUX = Usability Metric for User Experience.
Some findings have shown the UMUX to be bidimensional
as a function of the item tone, positive versus negative (Lewis,
2013; Lewis et al., 2013), despite the intention to develop a
unidimensional scale. The UMUX’s statistical structure might
be an artifact of the mixed positive/negative tone of the items
and in practice might not matter much. In light of this, both the
UMUX and its reduced version, the UMUX-LITE, are usually
interpreted as unidimensional measures.
By design (using a method similar to but not exactly the same
as the SUS), the overall UMUX and UMUX-LITE scores can
range from 0 to 100. Their scoring procedures are as follows:
UMUX: The odd items are scored as [score − 1] and the
even items as [7 − score]. The sum of the item scores
is then divided by 24 and multiplied by 100 (Finstad,
2010).

UMUX-LITE: The two items are scored as [score − 1],
and the sum of these is divided by 12 and multiplied by
100 (Lewis et al., 2013). For correspondence with SUS
scores, this sum is entered into a regression equation
to produce the final UMUX-LITE score. The follow-
ing equation combines the initial computation plus the
regression to show how to compute the recommended
UMUX-LITE score from the ratings of its two items:

UMUX-LITE = 0.65 × (([Item 1 score] + [Item 2 score] − 2) × 100/12) + 22.9.   (1)
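Both scoring rules, including the regression adjustment of Equation 1, can be sketched as follows (the function names are ours; the constants are those reported by Finstad, 2010, and Lewis et al., 2013):

```python
def umux_score(ratings):
    """Overall UMUX score from the four 1-7 item ratings, ordered item 1-4."""
    # Odd (positive-tone) items: score - 1; even (negative-tone): 7 - score
    adjusted = [(r - 1) if i % 2 == 1 else (7 - r)
                for i, r in enumerate(ratings, start=1)]
    return sum(adjusted) / 24 * 100

def umux_lite_score(item1, item2):
    """UMUX-LITE score, regression-adjusted per Equation 1 to match the SUS."""
    raw = ((item1 - 1) + (item2 - 1)) * 100 / 12
    return 0.65 * raw + 22.9

print(umux_score([7, 1, 7, 1]))  # 100.0 (best possible ratings)
print(umux_lite_score(7, 7))     # 87.9 (maximum adjusted UMUX-LITE score)
```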
Prior research (Finstad, 2010, 2013; Lewis et al., 2013) has
shown that the SUS, UMUX, and UMUX-LITE are reliable
(Cronbach's alpha between .80 and .95) and correlate signif-
icantly (p < .001). In the research reported to date, UMUX
scores have not only correlated with the SUS but also had a sim-
ilar magnitude. However, for the UMUX-LITE, it is necessary
to use the preceding formula (Equation 1) to adjust its scores to
achieve correspondence with the SUS (Lewis et al., 2013). For
the rest of this article, reported UMUX-LITE values are those
computed using Equation 1. Thus, the literature on the UMUX
and UMUX-LITE (Finstad, 2010; Lewis et al., 2013) suggests
that these two new short scales can be used as surrogates for the
SUS.
Currently, three studies (Kortum & Johnson, 2013; McLellan
et al., 2012; Sauro, 2011) have investigated the relation-
ship between SUS scores and amount of product experience.
The results of these studies have consistently indicated that
more experienced users had higher satisfaction outcomes (SUS
scores). Notably, researchers have not yet studied this effect on
the outcomes of quick scales such as the UMUX and UMUX-
LITE. The comparative analyses of these instruments were
performed mainly to validate the questionnaires, without con-
sidering the effect of the different levels of experience in the
use of a website (Finstad, 2010; Lewis et al., 2013).
1.4. Research Goals
The use of short scales as part of UX evaluation proto-
cols could appreciably reduce the costs of assessment, as well as
users’ time and effort to complete the questionnaires. Currently,
few studies have investigated the relationship among the SUS,
UMUX, and UMUX-LITE, and none have analyzed their relia-
bilities as a function of different amounts of interaction with a
product.
The primary goal of this article was to analyze the variation
of SUS, UMUX, and UMUX-LITE outcomes when completed
concurrently by users with different levels of experience in the
use of a website. To reach this goal, we pursued three main
objectives. First, we aimed to explore the variation of UMUX
and UMUX-LITE outcomes when administered to users with
two different levels of product experience. Second, we aimed
to observe whether, at different levels of product experience,
the correlations among the SUS, UMUX, and UMUX-LITE
were stable, with particular interest in the generalizability of
Equation 1. Finally, we checked whether the levels of respon-
dents’ product experience affected the dimensional structure of
the SUS. The Learnable scale might not emerge
until respondents have sufficient experience with the product
they are rating. To achieve these aims, we performed two stud-
ies with the three standardized usability metrics to measure
the self-report satisfaction of end-users with different levels of
experience with an e-learning web platform known as CLab
(http://www.cognitivelab.it).
2. METHODOLOGY
Students enrolled in the bachelor's degree program in psychology
at the University of Perugia are strongly encouraged to
use the CLab target platform as an e-learning tool. Commonly,
students access it at least once a week for several reasons:
for instance, to look for information about courses and exam
timetables, to sign in for mandatory attendance classes, to
download course materials, to book a test/exam, or to post a
question and discuss issues with the professor.
For each study, a sample of volunteer students was asked to
assess the interface after different times of usage (based on their
date of subscription to the platform) by filling out the SUS,
UMUX, and UMUX-LITE questionnaires, presented in a ran-
dom order. The Italian version of the SUS used in Borsci et al.
(2009) was administered. In addition, translations and back-
translations were made by an independent group of linguistic
experts to produce Italian versions of the UMUX and UMUX-LITE.
Participants of the two studies were invited to fill out the
scales 2 months (Study 1) or 6 months (Study 2) after they
first accessed CLab. In these studies, participants received the
same instruction before the presentation of the questionnaires:
“Please, rate your satisfaction in the use of the platform on the
basis of your current experience of use” [Per favore, in base
alla tua attuale esperienza d’uso, valuta la tua soddisfazione
nell’utilizzo della piattaforma].
The two studies were organized to measure different times
of participants’ exposure to CLab, thus measuring different
moments of UX acquisition, as follows:
Study 1, carried out 2 months after the students first
accessed CLab. The participants’ number of access
times and interaction with the platform (time exposure)
ranged from eight (once a week) to 56 (once a day).
Study 2, carried out 6 months after the students first
accessed CLab. The participants’ number of access
times and interaction with the platform (time exposure)
ranged from 24 (once a week) to 168 (once a day).
The two studies were reviewed and approved by the
Institutional Review Board of the Department of Philosophy,
Social and Human Sciences and Education, University of
Perugia. All participants provided their written informed con-
sent to participate in this study. No minors/children were
enrolled in this study. The study presented no potential risks.
2.1. Hypotheses
Concerning the effect of usage levels on the SUS’ dimen-
sional structure, we expected the following:
H1: The SUS dimensionality would be affected by the level of
experience acquired by the participants with the product
before the administration of the scale.
The second hypothesis concerns the correlations among the
tools (SUS, UMUX, UMUX-LITE). Recently (Finstad, 2010;
Lewis et al., 2013), researchers have reported strong correla-
tions among the SUS, UMUX, and UMUX-LITE. There is
as yet, however, no data on the extent to which the correla-
tions might be affected by the users’ levels of acquired product
experience. Thus, we expected the following:
H2: Significant correlations among the overall scores of the
SUS, UMUX, and UMUX-LITE for all studies, indepen-
dent of the administration conditions, that is, different
amounts of interaction time with the target website.
Finally, the third hypothesis concerns the relationship
between scale outcomes and users’ levels of product experience.
As previously noted, user satisfaction may vary depending on
both the level of product experience and the time of exposure
to a product (Lindgaard & Dudek, 2003; McLellan et al., 2012;
Sauro, 2011). In particular, experts tend to provide higher sat-
isfaction scores compared to novices (Sauro, 2011). In the light
of this, we expected the following:
H3: User satisfaction measured through SUS, UMUX, and
UMUX-LITE would be affected by the different conditions
of time exposure to the target website (2 or 6 months),
as well as by different levels of frequency of website use.
Therefore, we expected students in Study 2 (cumulative
UX condition) to rate the CLab with all the three scales as
more satisfactory compared to users in the first study due
to their greater product exposure (6 months). Concurrently,
we expected participants with greater levels of product
frequency of use to rate the CLab with all the scales as
more satisfactory than participants with lower levels of
use.
2.2. Data Analysis
For each study, principal components analyses were per-
formed to assess the SUS’ dimensionality—focusing on
whether the item alignment of the resulting two-component
structure was consistent with the emergence of Learnable and
Usable components. Only if this expected pattern did not
emerge did we plan to follow up with a multidimensional latent
class item response theory model (LC IRT) to more deeply
test the dimensional structure of the scale (Bacci, Bartolucci,
& Gnaldi, 2014; Bartolucci, 2007; Bartolucci, Bacci, & Gnaldi,
2014). The primary purpose of this additional analysis would
be to confirm if a unidimensional structure was a better fit to the
data than the expected bidimensional structure—an assessment
that is not possible with standard principal components analy-
sis because it is impossible to rotate a unidimensional solution
(Cliff, 1987).
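For readers who wish to reproduce this first step outside a statistics package, principal components extraction with varimax rotation can be sketched in a few lines. This is a minimal illustration under our own assumptions (a hypothetical responses matrix of participants by 10 converted item scores; the rotation implements the standard Kaiser varimax criterion rather than calling SPSS):

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a p x k loading matrix with the varimax criterion (Kaiser, 1958)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    variance = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        rotation = u @ vt
        if s.sum() < variance * (1 + tol):
            break  # criterion stopped improving
        variance = s.sum()
    return loadings @ rotation

# responses: hypothetical n x 10 matrix of converted SUS item scores;
# random data here only demonstrates the mechanics, not a real factor structure
responses = np.random.default_rng(0).integers(0, 5, size=(186, 10)).astype(float)
corr = np.corrcoef(responses, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])  # two-component loadings
print(varimax(loadings))  # rotated loadings, to inspect item-component alignment
```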
Descriptive statistics (mean, standard deviation) and Pearson
correlation analyses among the scales were performed to com-
pare the outcomes of the three scales and observe their rela-
tionships. Moreover, one-way analyses of variance (ANOVAs)
were carried out to assess the effect of experience on user satis-
faction as measured by the SUS, UMUX, and UMUX-LITE for
each study, and a final comprehensive ANOVA was conducted
to enable comparison of results between the two studies. The
MultiLCIRT package of R software by Bartolucci et al. (2014)
was used to estimate the multidimensional LC IRT models. All
other analyses were performed using IBM® SPSS 22.
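The correlation confidence intervals reported in Tables 4 and 8 can be reproduced with the usual Fisher z-transformation. A minimal sketch (the function name is ours; SciPy is assumed to be available):

```python
import numpy as np
from scipy import stats

def pearson_with_ci(x, y, alpha=0.05):
    """Pearson r, two-tailed p value, and a Fisher-z confidence interval."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    r, p = stats.pearsonr(x, y)
    z = np.arctanh(r)               # Fisher z-transform of r
    se = 1.0 / np.sqrt(len(x) - 3)  # standard error of z
    margin = stats.norm.ppf(1 - alpha / 2) * se
    return r, p, (np.tanh(z - margin), np.tanh(z + margin))

# Hypothetical usage with two vectors of concurrently collected scale scores:
# r, p, (lower, upper) = pearson_with_ci(sus_scores, umux_scores)
```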
3. THE STUDIES
3.1. Study 1
Participants
One hundred eighty-six first-year psychology students
(31 male [17%]; M age = 21.97, SD = 5.63) voluntarily par-
ticipated in the study 2 months after their subscription to and
first use of the platform.
Procedure
All participants used an online form to fill out the question-
naires (SUS, UMUX, and UMUX-LITE) and indicated their
weekly use of the platform, from 1 (once per week) to 5 (once a
day). Participants were asked to rate their satisfaction in the use
of CLab on the basis of their current experience of use.
Results of Study 1
SUS dimensionality. As shown in Table 2, principal
components analysis with Varimax rotation suggested that a
unidimensional solution was appropriate for this set of SUS
data. The table shows the item loadings for the one- and two-
component solutions.
The expected two-component solution would have shown
Items 4 and 10 aligning with one component and the other
eight items aligning on the other. Instead, the apparent pattern of
alignment was positive tone (odd-numbered) items versus nega-
tive tone (even-numbered) items, similar to the pattern observed
in previous research for the UMUX. Another indicator of the
inappropriateness of this two-component solution was the items
that had relatively large loadings on both components (Items 2,
3, 5, and 7).
To verify the appropriateness of the one-component solution,
we conducted additional analyses with a special class of sta-
tistical models known as LC IRT models (Bacci et al., 2014;
Bartolucci, 2007; Bartolucci et al., 2014). This class of mod-
els extends traditional IRT models (Nering & Ostini, 2010;
van der Linden & Hambleton, 1997) in two main directions.
First, they allow the analysis of item responses in cases of
questionnaires that measure more than one factor (also called,
in the context of IRT, latent trait or latent variable or abil-
ity). Second, multidimensional LC IRT models assume that the
population is composed of homogeneous groups of individu-
als sharing unobserved but common characteristics (so-called
latent classes; Goodman, 1974; Lazarsfeld & Henry, 1968).
IRT models are considered a powerful alternative to principal
components analysis, especially when questionnaires consist
of binary or (ordered) polytomously scored items rather than
quantitative items.
Our analysis proceeded in two steps. The first was the
selection of the number of latent classes (C). To compare
unidimensional and bidimensional assumptions through the
class of models at issue, we first needed to detect the optimal
number of latent classes, that is, the number of groups that
ensures a satisfactory level of goodness of fit of the statistical
model at issue to the observed data. For this aim, we selected the
number of latent classes, relying on the Bayesian Information
Criterion (BIC) index (Schwarz, 1978). More specifically, we
estimated unidimensional LC IRT models for increasing values of
C (C = 1, 2, 3, 4), keeping constant all the other elements char-
acterizing the class of models. We took the value of C just before
the first increase of BIC. We then repeated the analysis under
the assumption of bidimensionality.
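In code, this selection rule amounts to computing BIC = −2ℓ + k·ln(n) for each fitted model and stopping at the first increase. A minimal sketch under those assumptions (the fit tuples stand in for the output of an LC IRT estimation routine such as MultiLCIRT; the exact BIC values depend on the n the routine uses, so they can differ slightly from Table 3, but the selected C is the same):

```python
import math

def bic(max_loglik, n_params, n_obs):
    """Bayesian Information Criterion (Schwarz, 1978); smaller is better."""
    return -2.0 * max_loglik + n_params * math.log(n_obs)

def select_latent_classes(fits, n_obs):
    """Return (C, BIC) of the model just before the first increase of BIC.

    fits: list of (C, max_loglik, n_params), in increasing order of C.
    """
    best = None
    for c, loglik, n_params in fits:
        value = bic(loglik, n_params, n_obs)
        if best is not None and value >= best[1]:
            break  # BIC started to increase; keep the previous C
        best = (c, value)
    return best

# Unidimensional fits from Table 3 (Study 1, n = 186): selects C = 3
fits = [(1, -1470.447, 40), (2, -1405.317, 24),
        (3, -1395.186, 26), (4, -1394.914, 28)]
print(select_latent_classes(fits, 186))
```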
Table 3 shows that the minimum value of the BIC index
was observed for C = 3, both in the unidimensional case
and in the bidimensional case, suggesting that the sample of
individuals came from a population composed of three latent
classes. As smaller values of BIC indicate a better fit, a
comparison between the unidimensional and the bidimensional
models with C = 3 found that the BIC index gave evidence
of a better goodness of fit for the model with unidimensional
structure (BIC = 2924.805) than for that with bidimensional
structure (BIC = 2941.496; bold in Table 3).
TABLE 2
Principal Components Analysis of the System Usability Scale in Study 1 Showing One- and Two-Component Solutions

Item | Unidimensional: One Component | Bidimensional: Component 1 | Bidimensional: Component 2
9. I felt very confident using this website. | .936 | 0.198 | 0.803
7. I would imagine that most people would learn to use this website very quickly. | .932 | 0.455 | 0.505
1. I would like to use this website frequently. | .867 | 0.113 | 0.762
2. I found this website unnecessarily complex. | .865 | 0.509 | 0.418
8. I found this website very cumbersome/awkward to use. | .839 | 0.822 | 0.019
3. I thought this website was easy to use. | .814 | 0.471 | 0.503
6. I thought there was too much inconsistency in this website. | .815 | 0.513 | 0.266
5. I found the various functions in this website were well integrated. | .823 | 0.500 | 0.626
10. I needed to learn a lot of things before I could get going with this website. | .753 | 0.789 | 0.055
4. I think that I would need assistance to be able to use this website. | .637 | 0.517 | 0.360

Note. Bidimensional loadings greater than .4 in bold.
TABLE 3
Unidimensional and Bidimensional LC IRT Models for System Usability Scale Data: Number of Latent Classes (C), Estimated Maximum Log-Likelihood (ℓ), Number of Parameters (#par), Bayesian Information Criterion (BIC) Index

C | ℓ (unidim.) | #par (unidim.) | BIC (unidim.) | ℓ (bidim.) | #par (bidim.) | BIC (bidim.)
1 | −1470.447 | 40 | 3147.714 | −1470.447 | 40 | 3147.714
2 | −1405.317 | 24 | 2934.725 | −1405.057 | 27 | 2949.717
3 | −1395.186 | 26 | 2924.805 | −1393.190 | 30 | 2941.496
4 | −1394.914 | 28 | 2934.602 | −1386.052 | 33 | 2942.729

Note. Bold indicates better goodness of fit for the model with unidimensional structure than for that with bidimensional structure.
The unidimensionality assumption was also verified through
a likelihood-ratio (LR) test as follows. Given the number of
latent classes selected in the previous step, an LR test was used
to compare models that differ in terms of the dimensional struc-
ture, that is, bidimensional versus unidimensional structure.
This type of statistical test allowed us to evaluate the similarity
between a general model and a restricted model, that is, a model
obtained from the general one by imposing a constraint so
that the restricted model is nested in the general model. More
precisely, an LR test evaluates, at a given significance level, the
null hypothesis of equivalence between the two nested models
at issue: If the null hypothesis is not rejected, the restricted
model is preferred, in the interests of parsimony; if the null
hypothesis is rejected, the general model is preferred. In our
framework, the general model represents a bidimensional struc-
ture, where Items 4 and 10 of the SUS questionnaire contribute
to a different latent trait with respect to the remaining ones,
whereas the restricted model is used when all items belong to
the same dimension. The LR test is based on the difference
between the maximum log-likelihood of the two models, and it
evaluates if this difference is statistically significantly different
from zero. Higher values of log-likelihood difference denote
that the hypothesis of unidimensionality is unlikely and it
should be discarded in favor of bidimensionality. In particular,
the LR test statistic is given by 2 times the difference between
the maximum log-likelihoods of the two models and, under certain regularity
conditions, is distributed as a chi-square with the difference in
the number of free parameters in the two compared models as
degrees of freedom.
The LR test statistic equaled 3.9918 with 4 degrees of free-
dom. The resulting p value, based on the chi-square distribution,
was equal to 0.4071, and therefore the null hypothesis of SUS
unidimensionality cannot be rejected for Study 1. To conclude,
both the BIC index and the LR test provided evidence in favor
of the unidimensionality assumption.
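Given the two maximized log-likelihoods, the LR test itself is a one-liner. A minimal sketch (the function name is ours; SciPy is assumed), checked against the Study 1 statistic reported above:

```python
from scipy import stats

def lr_test(loglik_general, loglik_restricted, df):
    """Likelihood-ratio test of a restricted model nested in a general model."""
    statistic = 2.0 * (loglik_general - loglik_restricted)
    p_value = stats.chi2.sf(statistic, df)  # upper-tail chi-square probability
    return statistic, p_value

# Reproduce the reported p value directly from the Study 1 statistic:
print(stats.chi2.sf(3.9918, 4))  # ~0.407, so unidimensionality is not rejected
```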
Correlations among the tools. The overall SUS, UMUX,
and UMUX-LITE scale results significantly correlated (p <
.001; Table 4). There was considerable overlap in the 95% con-
fidence intervals for the correlations between the SUS and the
TABLE 4
Correlations Among SUS, UMUX, and UMUX-LITE in Study 1 (With 95% CIs)

Pair | r | 95% CI Lower | 95% CI Upper
SUS–UMUX | .554** | .430 | .679
SUS–UMUX-LITE | .447** | .313 | .581
UMUX–UMUX-LITE | .838** | .756 | .920

Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience; CI = confidence interval.
**p < .01 (two-tailed).
other scales, so it appeared that the UMUX and UMUX-LITE
had similar magnitudes of association with the SUS.
Table 5 shows that, on average, participants rated the plat-
form as satisfactory (scores higher than 70 out of 100) for all
three questionnaires, that is, with SUS scores greater than or
equal to C on the Sauro-Lewis CGS (Sauro & Lewis, 2012).
As demonstrated by comparison of the confidence intervals,
there were significant differences in the magnitudes of the
three metrics, with the SUS and UMUX-LITE having a closer
correspondence than the SUS and UMUX.
User satisfaction and frequency of use. Of the participants,
61.9% reported using the platform from more than 3 days per
week up to every day, whereas 38.1% reported a lower rate
of usage. A one-way ANOVA showed a significant difference
in user satisfaction among participants with low (1–2 days per
week), medium (3–4 days per week), and high (5 days per week)
self-reported levels of use of CLab for all the questionnaires:
SUS, F(2, 184) = 4.39, p = .014; UMUX, F(2, 184) = 8.71,
p = .001; and UMUX-LITE, F(2, 184) = 6.76, p = .002. In
particular, least significant difference post hoc analyses showed
that students who interacted with the platform from more than
3 days per week up to every day (those who used the website
more frequently) tended to judge the product as more satisfactory
than people with less exposure (Figure 1).
TABLE 5
Means and Standard Deviations of Overall Scores of the SUS, UMUX, and UMUX-LITE for Study 1 With 95% CIs (and Associated CGS Grades)

Scale | M | SD | 95% CI Lower | 95% CI Upper
SUS | 70.88 (C) | 6.703 | 69.9 (C) | 71.8 (C+)
UMUX | 84.66 (A+) | 12.838 | 82.8 (A) | 86.5 (A+)
UMUX-LITE | 73.83 (B–) | 9.994 | 72.4 (C+) | 75.3 (B)

Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience; CI = confidence interval.
FIG. 1. Interaction between scale and frequency of use for Study 1. Note.
UMUX = Usability Metric for User Experience; SUS = System Usability
Scale.
3.2. Study 2
Participants and Procedure
Ninety-three psychology students (17 male [18%]; M age =
22.03, SD = 1.44) voluntarily participated in the study
6 months after their subscription to and first use of CLab.
Participants followed the same procedure as in Study 1.
Results of Study 2
SUS dimensionality. To check the dimensionality of the
SUS after 6 months of usage, we performed a principal compo-
nents analysis with Varimax rotation. Table 6 shows that the SUS,
under the test conditions of Study 2, was composed of two
dimensions, in line with previous studies that reported align-
ment of Items 4 and 10 separate from the other items (Borsci
et al., 2009; Lewis & Sauro, 2009; Lewis et al., 2013).
To further confirm the bidimensional structure of SUS in
Study 2, we performed an LC IRT analysis (Table 7). We esti-
mated models for increasing values of C (C = 1, 2, 3, 4, 5) for both
the unidimensional and bidimensional cases, keeping constant all
the other elements characterizing the class of models. We took
the value of C just before the first increase of BIC. Table 7 shows
that the minimum value of the BIC index occurs at C = 4 for the
unidimensional model and at C = 5 for the bidimensional case. In
Study 2, the smallest value of BIC is obtained by the bidimensional
model with C = 5. The BIC index gave evidence of a better good-
ness of fit for the model with bidimensional structure (BIC =
1912.636) than for that with unidimensional structure (BIC =
2059.4; bold in Table 7).
Finally, the LR test statistic equaled 164.206 with 3 degrees
of freedom for the bidimensional model and 148.5492 with
2 degrees of freedom for the unidimensional model. For both
the models (bidimensional and unidimensional) the resulting p
value, based on the chi-square distribution, was equal to 0.001.
Therefore, the null hypothesis of SUS unidimensionality can
be rejected for Study 2. To conclude, both the BIC index and
the LR test provided evidence in favor of the bidimensionality
assumption.
Correlations among the tools. As in Study 1, all three
scales were strongly correlated (p < .001; see Table 8).
Comparison of the 95% confidence intervals indicated that the
magnitudes of association among the three scales were similar
between the studies (p > .05).
Table 9 shows that participants judged the platform as
satisfactory—that is, with mean scores higher than 75 out of
100—for all three questionnaires. For this set of data, there was
substantial overlap in the confidence intervals for the SUS and
UMUX-LITE, with identical CGS grades for the SUS/UMUX-
LITE means and interval limits. Consistent with the findings
from Study 1, the magnitude of difference between the mean
SUS and UMUX scores was significant.
User satisfaction and frequency of use. Of the participants,
51% reported using the platform from more than 3 days per
week up to every day, whereas 49% reported a lower rate of
usage. A one-way ANOVA confirmed that for all three scales
there was a significant difference—SUS, F(2, 91) = 18.37,
p = .001; UMUX, F(2, 91) = 12.11, p = .001; and UMUX-LITE,
F(2, 91) = 11.57, p = .001—among the satisfaction ratings of
participants (at least between students with low and high levels
of exposure to the platform). Again, our results indicated a
significant relationship between frequency of use and
satisfaction (Figure 2).
TABLE 6
Principal Components Analysis of the System Usability Scale in Study 2

Item | Usability | Learnability
8. I found this website very cumbersome/awkward to use. | .910 | .121
1. I would like to use this website frequently. | .869 | .105
5. I found the various functions in this website were well integrated. | .800 | .122
3. I thought this website was easy to use. | .769 | .226
7. I would imagine that most people would learn to use this website very quickly. | .754 | .258
2. I found this website unnecessarily complex. | .739 | .273
6. I thought there was too much inconsistency in this website. | .708 | .183
9. I felt very confident using this website. | .683 | .106
10. I needed to learn a lot of things before I could get going with this website. | .133 | .841
4. I think that I would need assistance to be able to use this website. | .206 | .753

Note. Loadings greater than .4 in bold.
TABLE 7
Unidimensional and Bidimensional Latent Class Item Response Theory Models for System Usability Scale Data: Number of Latent Classes (C), Estimated Maximum Log-Likelihood (ℓ), Number of Parameters (#par), Bayesian Information Criterion (BIC) Index

C | ℓ (unidim.) | #par (unidim.) | BIC (unidim.) | ℓ (bidim.) | #par (bidim.) | BIC (bidim.)
1 | −1050.62 | 40 | 2282.965 | −1050.62 | 40 | 2282.965
2 | −996.09 | 24 | 2101.22 | −964.826 | 24 | 2038.692
3 | −975.599 | 26 | 2069.323 | −910.835 | 27 | 1944.338
4 | −966.094 | 28 | 2059.4 | −891.82 | 30 | 1919.938
5 | −963.457 | 30 | 2063.212 | −881.354 | 33 | 1912.636
6 | −963.457 | 32 | 2072.299 | −878.343 | 36 | 1920.244

Note. Bold indicates a better goodness of fit for the model with bidimensional structure than for that with unidimensional structure.
TABLE 8
Correlations Among SUS, UMUX, and UMUX-LITE in Study 2 (With 95% CIs)

Pair | r | 95% CI Lower | 95% CI Upper
SUS–UMUX | .716** | .571 | .861
SUS–UMUX-LITE | .658** | .502 | .815
UMUX–UMUX-LITE | .879** | .717 | .953

Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience; CI = confidence interval.
**p < .01 (two-tailed).
4. GENERAL RESULTS
The questionnaire results strongly correlated independent of
the administration conditions. In line with Hypothesis 3, we
performed a comprehensive ANOVA to analyze the differences
among the ratings as a function of duration of exposure (2 or
6 months) and, independently, as a function of different
frequencies of use. This analysis indicated significant main
effects and interactions for all three scales (see Table 10).
Figure 3 provides a graphic depiction of the significant
interactions.
Least significant difference post hoc analyses revealed sig-
nificant differences (p < .001) for all comparisons of the
different frequencies of use (low, medium, or high).
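This combined analysis corresponds to a two-way ANOVA with duration and frequency of use as between-subjects factors. A minimal sketch with statsmodels (the data frame and column names are hypothetical; this illustrates the model, not the exact SPSS procedure used in the study):

```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

# df is assumed to hold one row per respondent with columns:
#   'sus'      -- overall SUS score (analogous columns for UMUX and UMUX-LITE)
#   'duration' -- exposure condition: 2 or 6 (months)
#   'freq'     -- self-reported frequency of use: 'low', 'medium', or 'high'
def duration_by_frequency_anova(df, score_col="sus"):
    model = ols(f"{score_col} ~ C(duration) * C(freq)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)  # main effects and the interaction

# print(duration_by_frequency_anova(df))
```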
TABLE 9
Means and Standard Deviations of Overall Scores of the SUS, UMUX, and UMUX-LITE for Study 2 With 95% CIs (and Associated Curved Grading Scale Grades)

Scale | M | SD | 95% CI Lower | 95% CI Upper
SUS | 75.24 (B) | 13.037 | 72.6 (B) | 77.0 (B+)
UMUX | 87.69 (A+) | 10.291 | 85.6 (A+) | 89.8 (A+)
UMUX-LITE | 76.45 (B) | 9.943 | 74.4 (B) | 78.5 (B+)

Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience; CI = confidence interval.
FIG. 2. Interaction between scale and frequency of use for Study 2. Note.
UMUX = Usability Metric for User Experience; SUS = System Usability
Scale.
5. DISCUSSION
Table 11 summarizes the testing outcomes for the
hypotheses.
The outcomes of the studies show that the learnability
dimension of the SUS, as suggested, might emerge only under
certain conditions, that is, when it is administered to users after a
long enough period of exposure to the interface. This variability
of the SUS dimensionality may be due to its original develop-
ment as a unidimensional scale of perceived usability (Brooke,
1996). Items 4 and 10 of the SUS compose the learnability
dimension of this scale (Tables 2 and 6). Item 4 pertains to the
need for support in use of the system (“I think I would need the
support of a technical person to be able to use this system”),
and Item 10 pertains to the perceived complexity of learning
the system (“I needed to learn a lot of things before I could get
going with this system”). These two items are strongly related
to the ability of users to quickly understand how to use the prod-
uct without help. In line with that, the learnability dimension is
probably sensitive to the level of confidence acquired by users in
using the product's functions and in anticipating system reactions.
Our results are consistent with Dix, Finlay, Abowd, and Beale’s
(2003) definition of learnability as “the ease with which new
users can begin effective interaction and achieve maximal per-
formance” (p. 260). In fact, the second dimension of the SUS
might emerge only when users perceived themselves as effec-
tive in the use of the product, making this an interesting topic
for future research.
All the satisfaction scale results strongly correlated under
each condition of scale administration, demonstrating conver-
gent validity of the construct they purport to measure. The
Italian versions of the UMUX and UMUX-LITE, as with
the version of the SUS previously validated by other stud-
ies (Borsci et al., 2009), matched the reliability coefficients
of the English versions, with a Cronbach's alpha between
.80 and .90 for the UMUX, and between .71 and .85 for
the UMUX-LITE. Moreover, independent of the condition of
scale administration, Equation 1 functioned properly by bring-
ing UMUX-LITE scores into reasonable correspondence with
concurrently collected SUS scores.
Unlike previous studies (Finstad, 2010; Lewis et al., 2013),
where the magnitudes of UMUX averages were quite close
to corresponding SUS averages, in the present study the aver-
age UMUX scores were significantly higher compared to the
SUS and UMUX-LITE means (Tables 5 and 9). For instance,
for our cohort of participants with 2 months of product expe-
rience, Table 5 shows that although the interval limits around
the UMUX had CGS grade ranges from A to A+, the SUS and
UMUX-LITE were more aligned with grade ranges from C to B.
This difference with previous outcomes could be a peculiarity
of this research. However, the differences among the results of
UMUX and the other two scales were large enough to lead prac-
titioners to make different decisions about CLab. For instance,
by relying only on UMUX outcomes, a practitioner may report
to designers that CLab is a very satisfactory interface and no fur-
ther usability analyses are needed. Alternatively, on the basis of
the SUS or UMUX-LITE outcomes, practitioners would likely
report to designers that CLab is reasonably satisfactory, but fur-
ther usability analysis and redesign could improve the overall
interaction experience of end-users (the UX).
Finally, as Tables 9 and 10 show, the outcomes of all three
questionnaires were affected by the levels of experience of the
TABLE 10
Main Effects and Interactions for Combined Analysis of Variance

Scale | Effect | Outcome
SUS | Main effect of duration | F(1, 263) = 10.7, p = .001
SUS | Main effect of frequency of use | F(2, 263) = 30.9, p < .0001
SUS | Duration × Frequency interaction | F(2, 263) = 15.8, p < .0001
UMUX | Main effect of duration | F(1, 263) = 17.4, p < .0001
UMUX | Main effect of frequency of use | F(2, 263) = 22.2, p < .0001
UMUX | Duration × Frequency interaction | F(2, 263) = 3.4, p = .035
UMUX-LITE | Main effect of duration | F(1, 263) = 4.7, p = .03
UMUX-LITE | Main effect of frequency of use | F(2, 263) = 16.8, p < .0001
UMUX-LITE | Duration × Frequency interaction | F(2, 263) = 4.7, p = .01

Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience.
FIG. 3. Interaction between scale and frequency of use for Studies 1 and 2.
Note. UMUX = Usability Metric for User Experience; SUS = System Usability
Scale.
respondents, by both the duration of use across months and the
weekly frequency of use. In line with previous studies (Kortum
& Johnson, 2013; McLellan et al., 2012; Sauro, 2011), users
with a greater amount of product experience were more satis-
fied than users with less experience. This was also confirmed
by the ANOVAs performed for Studies 1 and 2: Greater prod-
uct experience was associated with a higher level of user
satisfaction.
5.1. Limitations of the Study
Even though the outcomes of this study were generally
in line with previous research, the representativeness of these
results is limited due to the characteristics of the cohorts involved
in the study. Our results concern satisfaction in the use of an
e-learning web interface rated by students with similar charac-
teristics (age, education, country, etc.). Therefore, we cannot
assume that the outcomes will generalize to other kinds of
interfaces. To exhaustively explore the relationship between
the amount of experience gained through time and satisfaction
gathered by means of short scales, future studies should include
users with various individual differences and divergent char-
acteristics (e.g., experience with the product, age, education,
individual functioning, and disability) in the use of different
types of websites.
Our results, obtained through the variation of the amount of
exposure of users to the interface, showed that people with more
exposure to a product were likely to rate the interface as more
satisfactory. However, because this is correlational evidence, it
is also possible that users who experience higher satisfaction
during early use of a product might choose to use it more fre-
quently, thus gaining high levels of product experience. Future
longitudinal studies should investigate this relationship, mea-
suring, through SUS, UMUX, and UMUX-LITE, whether users
who perceive a product as satisfactory tend to spend more time
using it. It would also be of value to conduct a designed exper-
iment with random assignment of participants to experience
conditions to overcome the ambiguity of correlations.
6. CONCLUSIONS
Prior product experience was associated with the user satis-
faction measured by the SUS and its proposed alternate ques-
tionnaires, the UMUX and UMUX-LITE. Therefore, consistent
with previous research (Kortum & Johnson, 2013; McLellan
et al., 2012; Sauro, 2011), to obtain an exhaustive picture of user
satisfaction, researchers and practitioners should take into con-
sideration each user’s amount of exposure to the product under
evaluation (duration and frequency of use).
All of the scales we analyzed were strongly correlated and
can be used as quick tools to assess user satisfaction. Therefore,
practitioners who plan to use one or all of these scales should
carefully consider their administration for the proper manage-
ment of satisfaction outcomes. Based on our results, we offer
several points of advice:
TABLE 11
Summary of Study Outcomes by Hypothesis

Hypothesis 1 — Supported. SUS dimensionality was affected by the different levels of product experience. When administered to users with less product experience, the SUS had a unidimensional structure, whereas it had a bidimensional structure for respondents with more product experience.
Hypothesis 2 — Supported. All three scales were strongly correlated, independent of the administration conditions.
Hypothesis 3 — Supported. Participants with more product experience were more satisfied than those with less product experience, regardless of whether that experience was gained over a duration of exposure or by frequency of use.

Note. SUS = System Usability Scale.

• When administered to users after a short period of
product use, it is safest to consider the SUS to be a
unidimensional scale, so we recommend against par-
titioning it into Usable and Learnable components in
that context. Moreover, practitioners should anticipate
that the satisfaction scores of newer users will be signif-
icantly lower than the scores of more experienced
people.
• When the SUS is administered to more experienced
users, the scale appears to have bidimensional prop-
erties, making it suitable to compute both an overall
SUS score and its Learnable and Usable components.
The overall level of satisfaction will be higher than that
among less experienced users.
• Due to their high correlations with the SUS, the
UMUX and UMUX-LITE overall scores showed
similar behaviors.
• If using one of the ultrashort questionnaires as a proxy
for the SUS, the UMUX-LITE (with its adjustment
formula) appears to provide results that are closer in
magnitude to the SUS than the UMUX, making it the
more desirable proxy.
The UMUX and UMUX-LITE are both reliable and valid
proxies of the SUS. Nevertheless, Lewis et al. (2013) suggested
using them in addition to the SUS rather than instead of the
SUS for critical usability work, due to their recent development
and still limited employment. In particular, on the basis of our
results, we recommend that researchers avoid using only the
UMUX for their analysis of user satisfaction because, at least in
the current study, this scale seemed too optimistic. In the forma-
tive phase of design or in agile development, the UMUX-LITE
could be adopted as a preliminary and quick tool to test users’
reactions to a prototype. Then, in advanced design phases or in
summative evaluation phases, we recommend using a combina-
tion of the SUS and UMUX-LITE (or UMUX) to assess user
satisfaction with usability (note that because the UMUX-LITE
was derived from the UMUX, when you collect the UMUX
you also collect the data needed to compute the UMUX-LITE).
Over time this could lead to a database of concurrently collected
SUS, UMUX, and UMUX-LITE scores that would allow more
detailed investigation of their relationships and psychometric
properties.
ACKNOWLEDGMENTS
We thank Dr. James R. Lewis, senior human factors engineer
at IBM Software Group and guest editor of this special issue,
for his generous feedback during the preparation of this paper.
REFERENCES
Bacci, S., Bartolucci, F., & Gnaldi, M. (2014). A class of multidimensional latent class IRT models for ordinal polytomous item responses. Communications in Statistics – Theory and Methods, 43, 787–800. doi:10.1080/03610926.2013.827718
Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An empirical evaluation of the System Usability Scale. International Journal of Human–Computer Interaction, 24, 574–594. doi:10.1080/10447310802205776
Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141–157. doi:10.1007/s11336-005-1376-9
Bartolucci, F., Bacci, S., & Gnaldi, M. (2014). MultiLCIRT: An R package for multidimensional latent class item response models. Computational Statistics & Data Analysis, 71, 971–985. doi:10.1016/j.csda.2013.05.018
Borsci, S., Federici, S., & Lauriola, M. (2009). On the dimensionality of the System Usability Scale (SUS): A test of alternative measurement models. Cognitive Processing, 10, 193–197. doi:10.1007/s10339-009-0268-9
Borsci, S., Kuljis, J., Barnett, J., & Pecchia, L. (2014). Beyond the user preferences: Aligning the prototype design to the users' expectations. Human Factors and Ergonomics in Manufacturing & Service Industries. doi:10.1002/hfm.20611
Borsci, S., Kurosu, M., Federici, S., & Mele, M. L. (2013). Computer systems experiences of users with and without disabilities: An evaluation guide for professionals. Boca Raton, FL: CRC Press. doi:10.1201/b15619-1
Brooke, J. (1996). SUS: A "quick and dirty" usability scale. In P. W. Jordan, B. Thomas, B. A. Weerdmeester, & I. L. McClelland (Eds.), Usability evaluation in industry (pp. 189–194). London, UK: Taylor & Francis.
Cliff, N. (1987). Analyzing multivariate data. San Diego, CA: Harcourt Brace Jovanovich.
Dix, A., Finlay, J., Abowd, G. D., & Beale, R. (2003). Human–computer interaction. Harlow, UK: Pearson Education.
Finstad, K. (2010). The Usability Metric for User Experience. Interacting with Computers, 22, 323–327. doi:10.1016/j.intcom.2010.04.004
Finstad, K. (2013). Response to commentaries on "The Usability Metric for User Experience." Interacting with Computers, 25, 327–330. doi:10.1093/iwc/iwt005
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231. doi:10.1093/biomet/61.2.215
Hassenzahl, M. (2005). The thing and I: Understanding the relationship between user and product. In M. Blythe, K. Overbeeke, A. Monk, & P. Wright (Eds.), Funology: From usability to enjoyment (Vol. 3, pp. 31–42). Berlin, Germany: Springer. doi:10.1007/1-4020-2967-5_4
Hassenzahl, M., & Tractinsky, N. (2006). User experience—A research agenda. Behaviour & Information Technology, 25, 91–97. doi:10.1080/01449290500330331
Hassenzahl, M., Wiklund-Engblom, A., Bengs, A., Hägglund, S., & Diefenbach, S. (2015). Experience-oriented and product-oriented evaluation: Psychological need fulfillment, positive affect, and product perception. International Journal of Human–Computer Interaction, 31, 530–544.
ISO. (1998). ISO 9241-11:1998. Ergonomic requirements for office work with visual display terminals – Part 11: Guidance on usability.
ISO. (2010). ISO 9241-210:2010. Ergonomics of human–system interaction – Part 210: Human-centred design for interactive systems.
Kortum, P. T., & Bangor, A. (2012). Usability ratings for everyday products measured with the System Usability Scale. International Journal of Human–Computer Interaction, 29, 67–76. doi:10.1080/10447318.2012.681221
Kortum, P. T., & Johnson, M. (2013). The relationship between levels of user experience with a product and perceived system usability. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 57, 197–201. doi:10.1177/1541931213571044
Lallemand, C., Gronier, G., & Koenig, V. (2015). User experience: A concept without consensus? Exploring practitioners' perspectives through an international survey. Computers in Human Behavior, 43, 35–48. doi:10.1016/j.chb.2014.10.048
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Lee, S., & Koubek, R. J. (2012). Users' perceptions of usability and aesthetics as criteria of pre- and post-use preferences. European Journal of Industrial Engineering, 6, 87–117. doi:10.1504/EJIE.2012.044812
Lewis, J. R. (2006). Usability testing. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (pp. 1275–1316). New York, NY: Wiley & Sons.
Lewis, J. R. (2013). Critical review of "The Usability Metric for User Experience." Interacting with Computers, 25, 320–324. doi:10.1093/iwc/iwt013
Lewis, J. R. (2014). Usability: Lessons learned ... and yet to be learned. International Journal of Human–Computer Interaction, 30, 663–684. doi:10.1080/10447318.2014.930311
Lewis, J. R., & Sauro, J. (2009). The factor structure of the System Usability Scale. In M. Kurosu (Ed.), Human centered design (Vol. 5619, pp. 94–103). Berlin, Germany: Springer. doi:10.1007/978-3-642-02806-9_12
Lewis, J. R., Utesch, B. S., & Maher, D. E. (2013). UMUX-LITE: When there's no time for the SUS. In Proceedings of CHI 2013 (pp. 2099–2102). Paris, France: ACM. doi:10.1145/2470654.2481287
Lindgaard, G., & Dudek, C. (2003). What is this evasive beast we call user satisfaction? Interacting with Computers, 15, 429–452. doi:10.1016/S0953-5438(02)00063-2
McLellan, S., Muddimer, A., & Peres, S. C. (2012). The effect of experience on System Usability Scale ratings. Journal of Usability Studies, 7, 56–67.
Nering, M. L., & Ostini, R. (2010). Handbook of polytomous item response theory models. New York, NY: Taylor & Francis.
Petrie, H., & Bevan, N. (2009). The evaluation of accessibility, usability, and user experience. In C. Stephanidis (Ed.), The universal access handbook (pp. 299–314). Boca Raton, FL: CRC Press.
Sauro, J. (2011). Does prior experience affect perceptions of usability? Retrieved from http://www.measuringusability.com/blog/prior-exposure.php
Sauro, J., & Lewis, J. R. (2011). When designing usability questionnaires, does it hurt to be positive? In Proceedings of CHI 2011 (pp. 2215–2223). Vancouver, Canada: ACM. doi:10.1145/1978942.1979266
Sauro, J., & Lewis, J. R. (2012). Quantifying the user experience: Practical statistics for user research. Burlington, MA: Morgan Kaufmann.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464. doi:10.2307/2958889
Tractinsky, N. (1997). Aesthetics and apparent usability: Empirically assessing cultural and methodological issues. In Proceedings of CHI 1997 (pp. 115–122). Atlanta, GA: ACM. doi:10.1145/258549.258626
van der Linden, W., & Hambleton, R. K. (1997). Handbook of modern item response theory. New York, NY: Springer.
Zviran, M., Glezer, C., & Avni, I. (2006). User satisfaction from commercial web sites: The effect of design and use. Information & Management, 43, 157–178. doi:10.1016/j.im.2005.04.002
ABOUT THE AUTHORS
Simone Borsci is a Research Fellow in Human Factors
at the Imperial College London NIHR Diagnostic Evidence
Cooperative. He has over 10 years of experience as a psy-
chologist and HCI expert in both industry and academia. He
has worked as the UX lead of the Italian Government’s working
group on usability, and as a researcher at University of Perugia,
Brunel University, and Nottingham University.
Stefano Federici is currently Associate Professor of General
Psychology at the University of Perugia. He is the coordi-
nator of the CognitiveLab research team at the University of
Perugia (www.cognitivelab.it). His research is focused on assis-
tive technology assessment processes, disability, and cognitive
and human interaction factors.
Michela Gnaldi is currently an Assistant Professor of Social Statistics at the Department of Political Sciences of the University of Perugia. Her main research interest is measurement in education. On this topic, she has participated in several research projects of national interest in Italy and in the UK, where she worked as a statistician and researcher at the National Foundation for Educational Research.
Silvia Bacci is an Assistant Professor of Statistics at the University of Perugia. Her research interests concern latent variable models, with a special focus on models for categorical and longitudinal/multilevel data, latent class models, and item response theory models. She currently participates in a FIRB project funded by the Italian government.
Francesco Bartolucci is Full Professor of Statistics at the Department of Economics of the University of Perugia. He is the Principal Investigator of the research project “Mixture and latent variable models for causal inference and analysis of socio-economic data” (FIRB 2012, “Futuro in ricerca,” Italian Government).
The Usability Metric for User Experience (UMUX) is a four-item Likert scale aimed at replicating the psychometric properties of the System Usability Scale (SUS) in a more compact form. As part of a special issue of the journal Interacting with Computers, the UMUX is being examined in terms of purpose, reliability, validity and structure. This response to commentaries addresses concerns with these issues through updated archival research, deeper analysis on the original data and some updated results with an average-scoring system. The new results show the UMUX performs as expected for a wide range of systems and consists of one underlying usability factor.