Intl. Journal of Human–Computer Interaction, 31: 545–553, 2015
Copyright © Taylor & Francis Group, LLC
ISSN: 1044-7318 print / 1532-7590 online
DOI: 10.1080/10447318.2015.1064665
Psychometric Evaluation of the EMO and the SUS in the Context
of a Large-Sample Unmoderated Usability Study
James R. Lewis1, Joshua Brown2, and Daniel K. Mayes2
1IBM Corporation, Boca Raton, Florida, USA
2Strategic Resources, State Farm Mutual Automobile Insurance Company, Bloomington, Illinois, USA
This article describes the psychometric properties of the Emotional Metric Outcomes (EMO) questionnaire and the System Usability Scale (SUS) using data collected as part of a large-sample unmoderated usability study (n = 471). The EMO is a concise multifactor standardized questionnaire that provides an assessment of transaction-driven personal and relationship emotional outcomes, both positive and negative. The SUS is a well-known standardized usability questionnaire designed to assess perceived usability. In previous research, psychometric evaluation using data from a series of online surveys showed that the EMO and its component scales had high reliability and concurrent validity with loyalty and overall experience metrics but did not find the expected four-factor structure. Previous structural analyses of the SUS have had mixed results. Analysis of the EMO data from the usability study revealed the expected four-factor structure. The factor structure of the SUS appeared to be driven by item tone. The estimated reliability of the SUS (.90) was consistent with previous estimates. The EMO and its subscales were also quite reliable, with the estimates of reliability for the various EMO scales ranging from .86 to .96. Regression analysis using SUS, EMO, and Effort as predictors revealed different key drivers for the outcome metrics of Satisfaction and Likelihood-to-Recommend. The key recommendations are to include the EMO as part of the battery of poststudy standardized questionnaires, along with the SUS (or similar questionnaire), but to be cautious in reporting SUS subscales such as Usable and Learnable.
1. INTRODUCTION
1.1. The Emotional Metric Outcomes Questionnaire
Description. The Emotional Metric Outcomes (EMO)
questionnaire is a recently published instrument designed to
provide a concise multifactor standardized questionnaire for
assessing the emotional outcomes of customer interaction.
It allows an assessment of transaction-driven personal and rela-
tionship emotional outcomes, both positive and negative. Initial
development of the EMO and its use in different contexts
has shown it to have the desirable psychometric characteris-
tics of reliability, validity, and sensitivity (Lewis & Mayes,
2014).
The primary purpose of the EMO is to help researchers
move beyond traditional assessment of satisfaction to achieve
a better diagnosis of customer response to products and pro-
cesses. To that purpose, the EMO allows an overall score and
scores for its component scales: Positive Relationship Affect
(PRA), Negative Relationship Affect (NRA), Positive Personal
Affect (PPA), and Negative Personal Affect (NPA). Given
the crossed nature of the component scales, it is also pos-
sible to conduct analyses of Tone (Positive/Negative), Scope
(Relationship/Personal), and their interaction.
There are two versions of the EMO—the EMO16 (full ver-
sion with 16 items, four per component scale) and the EMO08
(short version with eight items, two per component scale). EMO
overall and component scale scores can range from 0 (poorest
rating) to 10 (best rating). As shown in Figure 1, Items 1 to 4 make up PRA04, 5 to 8 make up NRA04, 9 to 12 are PPA04, and 13 to 16 are NPA04. The short version uses the first two items of each component scale (PRA02: 1–2; NRA02: 5–6; PPA02: 9–10; NPA02: 13–14).
Previous research. To ensure an adequate sample size for
psychometric research, the initial development and validation
of the EMO was conducted in the context of a series of surveys
(Lewis & Mayes, 2014). A review of personal and relation-
ship emotional outcomes from the literatures of psychology,
human–computer interaction, human factors engineering, mar-
ket research, and machine learning resulted in an initial set of
52 items in which items had content related to a hypothesized
EMO factor and had a high factor loading in its original research
context. The purpose of the first survey (Phase 1, n=3,029) was
to collect item ratings in the context of a recent interaction
with an insurance or financial company. Respondents also com-
pleted items for the assessment of overall experience and three
loyalty items (likelihood to recommend, to remain a customer,
and to switch). The results of this survey informed the selec-
tion of the best 16 candidate items for the EMO (best four
for each component scale). The estimated reliabilities (using
coefficient alpha) for the full and short versions of the EMO
FIG. 1. The EMO questionnaire. Note. PRA04: Items 1–4; PRA02: Items 1–2; NRA04: Items 5–8; NRA02: Items 5–6; PPA04: Items 9–12; PPA02: Items 9–10; NPA04: Items 13–16; NPA02: Items 13–14. To compute EMO scale scores, average the indicated items after reverse-scoring items with negative tone (NRA and NPA), using the formula 10 − xi; overall EMO scores are averages of the indicated scale scores. EMO = Emotional Metric Outcomes; PRA = Positive Relationship Affect; NRA = Negative Relationship Affect; PPA = Positive Personal Affect; NPA = Negative Personal Affect.
(and their component scales) exceeded the typical acceptance
criterion of .70 (Nunnally, 1978). All EMO scales correlated
significantly with the outcome metrics of overall experience
and loyalty (evidence of concurrent validity, all rs>.30 as
recommended in Nunnally, 1978). Structural analyses (prin-
cipal components and factor analysis) did not result in the
expected four-factor solution but instead indicated a two-factor
solution with items aligning as a function of item tone (posi-
tive vs. negative). Despite this weakness in construct validity,
regression analyses of EMO predictions of overall experi-
ence, loyalty, and likelihood-to-recommend (LTR) found that
each EMO component scale made at least one significant con-
tribution to the prediction of overall experience, loyalty, or
LTR. (LTR is the basis of the Net Promoter Score©—a trade-
mark of Satmetrix Systems, Inc., Bain & Company, and Fred
Reichheld.)
The purpose of Phase 2 (n=1,041) was to determine the
extent to which the EMO developed in Phase 1 exhibited accept-
able levels of psychometric quality in an independent sample of
data. Reliability, concurrent validity, and regression outcomes
were similar to those from Phase 1. The structural analyses were
similar to those from Phase 1, but in Phase 2 a three-factor
solution emerged, with positive-tone items on one factor and
negative-tone items distributing as expected across the hypothe-
sized personal and relationship factors. Thus, the Phase 2 results
supported the decisions and conclusions reached in Phase 1.
In a third survey (Phase 3, n=1,943), respondents not only
completed the EMO items and provided ratings of overall expe-
rience and loyalty (including LTR) based on their recollections
of recent use of an insurance website but also completed the
System Usability Scale (SUS; Brooke, 1996; Sauro & Lewis,
2012). For the EMO, the results of structural, reliability, and
concurrent validity analyses were consistent with the results
from Phase 2. Of particular interest in this phase were the cor-
relations between the EMO, SUS (see the next section), and
LTR. Sauro and Lewis (2012) had reported a correlation of .6
(R2=36%) between SUS and LTR ratings. With 64% of the
variance in LTR remaining unexplained, the hypothesis was that
adding the EMO16 would improve prediction of LTR. Lewis
and Mayes (2014) found, however, that the combination of the
SUS and EMO only marginally improved prediction of overall
experience and did not improve the prediction of LTR. When
modeled alone, the SUS accounted for 29% of variance in LTR;
the EMO16 accounted for 52%.
1.2. The SUS Questionnaire
Description. Despite being a self-described “quick-and-
dirty” usability scale, the SUS (Brooke, 1996), developed in
the mid-1980s, has become a popular questionnaire for end-of-
test subjective assessments of usability (Sauro & Lewis, 2012;
Zviran, Glezer, & Avni, 2006). The SUS accounted for 43%
of posttest questionnaire usage in a recent study of a collec-
tion of unpublished usability studies (Sauro & Lewis, 2009).
Research conducted on the SUS has shown that although it is
fairly quick, it is probably not all that dirty. The SUS (shown in
Figure 2) is a questionnaire with 10 items, each with five scale
steps. The odd-numbered items have a positive tone; the tone of
the even-numbered items is negative.
The SUS scoring method requires participants to provide a
response to all 10 items. If for some reason participants can’t
respond to an item, they should select the center point of the
scale (Brooke, 1996). The first step in scoring a SUS ques-
tionnaire is to determine each item’s score contribution. For
positively worded items (odd numbers), the score contribution is the scale position minus 1 (xi − 1). For negatively worded items (even numbers), the score contribution is 5 minus the scale position (5 − xi). With these manipulations, each SUS item ranges
from 0 to 4 (a higher score contribution indicates a better out-
come). To get the overall SUS score, multiply the sum of the
item score contributions by 2.5. Thus, overall SUS scores range
from 0 to 100 in 2.5-point increments.
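As a concrete illustration of this scoring procedure, here is a minimal Python sketch (not from the original article; the function name and the example responses are illustrative assumptions):

```python
def sus_score(responses):
    """Compute the overall SUS score (0-100) from ten item responses (1-5).

    Items are numbered 1-10; odd-numbered items are positively worded,
    even-numbered items are negatively worded (Brooke, 1996).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires responses to all 10 items")
    contributions = []
    for i, x in enumerate(responses, start=1):
        if i % 2 == 1:          # positive tone: scale position minus 1
            contributions.append(x - 1)
        else:                   # negative tone: 5 minus scale position
            contributions.append(5 - x)
    return 2.5 * sum(contributions)

# Example: a fairly favorable (hypothetical) response pattern
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0
```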
Previous research. The SUS is probably the most stud-
ied standardized usability questionnaire, with the number of
research papers increasing in recent years (Lewis, 2014).
Its fundamental psychometric characteristics are well known.
Estimates of its reliability (assessed using coefficient alpha) are
typically just over .90 (Sauro & Lewis, 2012). Recent studies
have also provided evidence of the validity and sensitivity of
the SUS. Bangor, Kortum, and Miller (2008) found the SUS
to be sensitive to differences among types of interfaces and
changes made to a product. They also found significant con-
current validity with a single 7-point rating of user friendliness
(r=.806). Lewis and Sauro (2009) reported that the SUS was
sensitive to the differences in a set of 19 usability tests. Since
its initial publication, some researchers have proposed minor
changes to the wording of the items. For example, the original
SUS items referred to “system,” but substituting the word “web-
site” or “product,” or using the actual website or product name,
seems to have no effect on the resulting scores (Lewis & Sauro,
2009).
Sauro and Lewis (2009), in a meta-analysis of industrial
usability studies, found a significant relationship between rat-
ings in standardized usability questionnaires (including the
SUS) and other prototypical usability metrics, reporting a mean
correlation with system effectiveness (as assessed by successful
task completions) of .24 in the context of a posttest evaluation
of satisfaction following the completion of multiple tasks, and
a mean correlation of .51 in the context of satisfaction mea-
surement immediately following the performance of a single
task. Following up on this research, Kortum and Peres (2014)
conducted two studies in which they found significant but some-
what varying correlations between system effectiveness and
SUS scores (Study 1, r=.209; Study 2, r=.730).
Despite the otherwise excellent psychometric properties of
the SUS, there have been inconsistent reports of its construct
validity. The original intent was for the SUS to be a uni-
dimensional (one-factor) measurement of perceived usability
FIG. 2. The System Usability Scale questionnaire.
(Brooke, 1996). Once researchers began to publish data sets (or
correlation matrices) from sample sizes large enough to sup-
port factor analysis, it began to appear that the items of the
SUS might align with two factors. Data from three indepen-
dent studies (Bangor et al., 2008; Borsci, Federici, & Lauriola,
2009; Lewis & Sauro, 2009) indicated a consistent two-factor
structure (with Items 4 and 10 aligning on a factor separate
from the remaining items). Analyses conducted since 2009
(Lewis, Utesch, & Maher, 2013; Sauro & Lewis, 2011; and
a number of unpublished analyses) have typically resulted in
a two-factor structure but have not replicated the item-factor
alignment that seemed apparent in 2009. The more recent anal-
yses have been somewhat consistent with a general alignment of
positive- and negative-tone items on separate factors—the type
of unintentional structure that can occur with sets of mixed-
tone items (Sauro & Lewis, 2012). Borsci et al. (this issue)
found the SUS to be unidimensional when relatively inexperi-
enced users provided ratings but to have the expected two-factor
structure (Usable and Learnable) for more experienced users.
Consequently, Lewis (2014) recommended that usability practi-
tioners or researchers who have fairly large-sample data sets of
SUS questionnaires should publish the results of factor analysis
of their data to lead to a deeper understanding of the conditions
that may affect the factor structure of the SUS.
2. METHOD
2.1. The Usability Study
The usability study that provided the data for the analy-
ses was a remote unmoderated UserZoom (see userzoom.com)
online usability study conducted to assess subtle design differ-
ences between two processes for obtaining a quote for renter’s
insurance. These types of online usability studies are similar
to online surveys but differ in the way participants actively
interact with the system under evaluation (Albert, Tullis, &
Tedesco, 2010; Lewis, 2012). Despite potential issues such as disruptions in the participant’s location while working on tasks, McFadden, Hager, Elie, and Blackwell (2002)
reported that remote testing was effective and produced results
comparable to more traditional testing. Other studies that have
demonstrated general consistency between the outcomes of
online unmoderated and in-lab moderated usability studies
were Andreasen, Nielsen, Schrøder, and Stage (2007); Tullis,
Fleischman, McNulty, Cianchette, and Bergel (2002); and West
and Lehman (2006).
Key criteria for participation in the study included being
a current renter, having shopped for renter’s insurance
recently, and being familiar with Internet browsing. Participants
attempted to complete one task (getting an online quote for
renter’s insurance). Participants who successfully completed
the task were those who reached the page where a completed
quote appeared. Although it is not possible to know exactly
what happened, a considerable amount of the feedback left by
those who abandoned the task referred to some sort of web-
site malfunction. After either completing or abandoning the
task, participants took a survey designed to assess their experi-
ence. Following screening for quality purposes (e.g., removing
apparent speeders and cheaters), 471 participants completed
the appropriate survey. The key subjective survey measures
included satisfaction, LTR, perceived effort, the EMO (see
Figure 1), and the SUS (see Figure 2).
2.2. Analytical Methods and Goals
The primary goal of this research was to investigate the
psychometric properties of the EMO and SUS in the con-
text of a usability study, filling one of the research gaps
identified in the initial publication of the EMO (Lewis &
Mayes, 2014). The current analyses focused on all partici-
pants who completed a minimum set of items needed for
analysis (including participants who successfully completed
the task and those who abandoned), for a total sample size
of 471. Unless otherwise specified, the analyses used the
full 16-item version of the EMO (EMO16, with subscales
PRA04, NRA04, PPA04, and NPA04)—unreported analyses
conducted using the short version (EMO08, with subscales PRA02, NRA02, PPA02, and NPA02) were consistent with those for the full version.
To enable combination of EMO item scores through aver-
aging, all negative-tone items had their scales reversed using
the formula 10 − xi, where xi was the item score. For exam-
ple, if the response for em07 (“This company cares more about
selling to me than about satisfying me”) was 3, then the rat-
ing for em07adj (the reversed item) was 7. Thus, a higher
item rating always implies a better score, a consistency needed
for scales created through averaging items to be meaningful
and interpretable. SUS scores were calculated using the proce-
dure established by Brooke (1996), just described. Factor and
reliability analyses used adjusted rather than raw item ratings.
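A minimal Python sketch of this adjustment and of the scale construction described in Figure 1 follows (the function and variable names and the sample ratings are illustrative assumptions, not study materials):

```python
# EMO16 scoring sketch: reverse negative-tone items, then average per scale.
SCALES = {
    "PRA04": [1, 2, 3, 4],      # positive relationship affect
    "NRA04": [5, 6, 7, 8],      # negative relationship affect (reverse-scored)
    "PPA04": [9, 10, 11, 12],   # positive personal affect
    "NPA04": [13, 14, 15, 16],  # negative personal affect (reverse-scored)
}
NEGATIVE_TONE = set(SCALES["NRA04"]) | set(SCALES["NPA04"])

def emo_scores(responses):
    """responses: dict mapping item number (1-16) to a 0-10 rating."""
    adjusted = {i: (10 - x if i in NEGATIVE_TONE else x)
                for i, x in responses.items()}
    scale_means = {name: sum(adjusted[i] for i in items) / len(items)
                   for name, items in SCALES.items()}
    scale_means["EMO16"] = sum(scale_means[s] for s in SCALES) / len(SCALES)
    return scale_means

# Example: a response of 3 to em07 (negative tone) is adjusted to 10 - 3 = 7.
example = {i: 8 for i in range(1, 17)}          # hypothetical ratings
example.update({i: 2 for i in NEGATIVE_TONE})   # low ratings on negative items
example[7] = 3
print(emo_scores(example))
```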
Throughout, within-subjects effects with a significant
Mauchly’s Test of Sphericity appear with Greenhouse-Geisser
corrected degrees of freedom, reported to the nearest 0.1.
3. RESULTS
3.1. Factor Structure of the EMO
Table 1 shows the results of a varimax-rotated unweighted
least squares factor analysis of the EMO items (the results of
a principal components analysis indicated the same structure).
Unlike the structures reported by Lewis and Mayes (2014),
the data from the usability test produced a structure with the
expected item-factor alignment for a four-factor solution.
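For readers who want to reproduce this kind of analysis, a sketch in Python is shown below; it assumes the adjusted EMO item ratings sit in a pandas DataFrame and uses the factor_analyzer package with minimum-residual extraction as a stand-in for unweighted least squares (the file and column names are hypothetical):

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# df holds one row per participant and one (adjusted) column per EMO item,
# e.g., em01 ... em04, em05adj ... em08adj, em09 ... em12, em13adj ... em16adj.
df = pd.read_csv("emo_items_adjusted.csv")   # hypothetical file

fa = FactorAnalyzer(n_factors=4, rotation="varimax", method="minres")
fa.fit(df)

loadings = pd.DataFrame(fa.loadings_, index=df.columns,
                        columns=["F1", "F2", "F3", "F4"])
print(loadings.round(3))   # compare the item-factor alignment to Table 1
```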
3.2. Factor Structure of the SUS
Table 2 shows the results of a varimax-rotated unweighted
least squares factor analysis of the SUS items. The analysis
TABLE 1
Factor Structure of the Emotional Metric Outcomes
Questionnaire
Item PRA NRA PPA NPA
em01 .799 .188 .283 .146
em02 .883 .176 .227 .178
em03 .774 .189 .242 .194
em04 .710 .105 .227 .159
em05adj .085 .761 .095 .181
em06adj .102 .886 .156 .176
em07adj .253 .674 .085 .156
em08adj .130 .711 .032 .225
em09 .403 .125 .664 .393
em10 .426 .141 .739 .382
em11 .411 .147 .759 .403
em12 .451 .153 .671 .373
em13adj .188 .228 .332 .790
em14adj .174 .334 .195 .741
em15adj .217 .242 .281 .853
em16adj .205 .249 .296 .847
Note. PRA =Positive Relationship Affect; NRA =Negative
Relationship Affect; PPA =Positive Personal Affect; NPA =Negative
Personal Affect.
TABLE 2
Factor Structure of the System Usability Scale
Item Positive Negative
1 .711 .036
2 .406 .669
3 .693 .380
4 .169 .740
5 .644 .279
6 .430 .758
7 .712 .349
8 .305 .703
9 .786 .352
10 .130 .793
indicated alignment as a function of item tone (positive vs.
negative). A principal components analysis revealed the same
structure.
3.3. Reliability Analyses
Table 3 shows the reliabilities of the SUS and the vari-
ous EMO scales. The SUS had its typical reliability of around
.90. All EMO scales, even those based on just two items, had
reliabilities greater than .85.
TABLE 3
Scale Reliabilities
Scale Reliability
SUS 0.90
EMO16 0.94
PRA04 0.92
NRA04 0.87
PPA04 0.96
NPA04 0.95
EMO08 0.88
PRA02 0.92
NRA02 0.86
PPA02 0.92
NPA02 0.88
Note. SUS =System Usability Scale; EMO =Emotional Metric
Outcomes; PRA =Positive Relationship Affect; NRA =Negative
Relationship Affect; PPA =Positive Personal Affect; NPA =Negative
Personal Affect.
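The reliability estimates in Table 3 are coefficient alpha values; a minimal sketch of that computation, assuming a participants-by-items matrix of adjusted ratings, is:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a participants-by-items array of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Example with hypothetical ratings from four participants on two items
print(cronbach_alpha([[9, 8], [7, 7], [3, 4], [6, 5]]))
```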
3.4. Comparison of Full and Short Versions of the EMO
For interchangeable use of the full and short versions of the
EMO, it is desirable that there be little difference between their
mean scores. With large sample sizes, even small differences
can turn out to be statistically significant, so it is important to
also consider what differences would be practically meaningful.
For scales that can range from 0 to 10, a difference of 1 point is
10% of the range and a difference of 0.1 point is 1%. It seems
reasonable that mean differences that are less than 0.1, although
they may be statistically significant when the sample size is
large, would rarely have any practical meaning. Table 4 shows
the mean differences between the full and short EMO scales.
The differences between full and short scales for the overall
EMO and the PRA and NRA component scales were very small
and not statistically significant. For PPA and NPA, although
the differences were statistically significant, they were small
enough to be of little practical significance (<0.1).
TABLE 4
Comparisons of Long and Short Versions of EMO Scales
Comparison    M Difference    t Test
EMO16-EMO08 .00613 t(468) =.38, p=.70
PRA04-PRA02 .01066 t(468) =.37, p=.71
NRA04-NRA02 .02452 t(468) =.55, p=.58
PPA04-PPA02 .06557 t(468) =–2.83, p=.005
NPA04-NPA02 .05490 t(468) =2.12, p=.035
Note. EMO =Emotional Metric Outcomes; PRA =Positive
Relationship Affect; NRA =Negative Relationship Affect;
PPA =Positive Personal Affect; NPA =Negative Personal Affect.
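Comparisons like those in Table 4 can be carried out with paired t tests; a minimal sketch under the assumption that per-participant EMO16 and EMO08 scores are available as arrays (the values shown are hypothetical):

```python
from scipy import stats

# emo16 and emo08 are per-participant scale scores computed as in Section 2.2
# (hypothetical data; the study reported t tests with 468 degrees of freedom).
emo16 = [7.9, 6.3, 8.1, 5.4, 9.0]
emo08 = [8.0, 6.1, 8.3, 5.2, 9.1]

t, p = stats.ttest_rel(emo16, emo08)
mean_diff = sum(a - b for a, b in zip(emo16, emo08)) / len(emo16)
print(f"mean difference = {mean_diff:.3f}, t = {t:.2f}, p = {p:.3f}")
```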
3.5. Regression Analyses
A series of regression analyses were conducted to examine
the strength of relationship among the SUS and EMO scales,
plus a one-item assessment of perceived Effort (assessed using
a 5-point scale item with the text “How difficult or easy was
your experience on this site?”), on the outcome metrics of
Satisfaction and LTR. All of the predictors in isolation signif-
icantly predicted Satisfaction and LTR but not equally well. For
the prediction of Satisfaction, the best model had PPA04, SUS,
and Effort as predictors. For the prediction of LTR, the best
model had PRA04, PPA04, and SUS. Table 5 lists the results
for all models. The last model listed for each outcome variable
contains only the significant predictors, with their beta weights
in parentheses.
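As one example of how such a model can be fit, here is a sketch of the final Satisfaction model (PPA04, SUS, and Effort as predictors) using statsmodels; the data file and column names are assumptions, and the coefficients it prints are unstandardized rather than the standardized beta weights reported in Table 5:

```python
import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to hold one row per participant with the per-participant
# scale scores and outcome ratings (column names are illustrative).
df = pd.read_csv("usability_study_scores.csv")   # hypothetical file

model = smf.ols("Satisfaction ~ PPA04 + SUS + Effort", data=df).fit()
print(model.rsquared)        # compare to the R-squared values in Table 5
print(model.params)          # unstandardized coefficients
print(model.pvalues)         # which predictors are significant at p < .05
```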
3.6. Sensitivity Analyses
Table 6 shows the results of t tests conducted on the set
of predictive and outcome metrics, comparing the means for
participants who were successful with those who were not.
As expected, all t tests were highly significant (p<.0001), with
better outcomes for successful participants.
Due to its method of construction, analyses of variance using
the EMO can treat the subscales either as a single variable with
four levels or as two variables with two levels each (Scope:
Relationship vs. Personal; Tone: Positive vs. Negative). Profile
analysis (mixed analysis of variance with Scope/Tone as
TABLE 6
Results of t Tests Comparing Ratings From Successful and
Unsuccessful Participants
Variable    M Difference    t Test
SUS 28.83 t(47.2) =7.82, p<.0001
Effort 1.57 t(46.1) =8.18, p<.0001
EMO16 2.56 t(47.8) =7.20, p<.0001
PRA04 1.68 t(49.5) =4.56, p<.0001
NRA04 1.60 t(51.6) =3.90, p<.0001
PPA04 3.19 t(46.1) =6.24, p<.0001
NPA04 3.75 t(47.1) =7.63, p<.0001
Satisfaction 1.79 t(46.9) =9.98, p<.0001
LTR 2.35 t(48.6) =5.63, p<.0001
Note. SUS =System Usability Scale; EMO =Emotional Metric
Outcomes; PRA =Positive Relationship Affect; NRA =Negative
Relationship Affect; PPA =Positive Personal Affect; NPA =Negative
Personal Affect; LTR =likelihood-to-recommend.
within-subjects variables and task success as a between-subjects
variable) found a significant main effect of task success, F(1,
467) =106.8, p<.0001, and a significant Success ×Scope
interaction, F(1, 467.0) =60.3, p<.0001—both evidence of
EMO scale sensitivity. As shown in Table 7 and Figure 3, the
interaction is not only significant but interesting. The primary
TABLE 5
Results of Regression Analyses
Prediction of    Model (Predictors)    R2 (%)    Significant Predictors (p < .05)
Satisfaction SUS 39.9 SUS
EMO16 36.0 EMO16
Effort 43.0 Effort
EMO16, SUS 42.4 SUS, EMO16
EMO16, SUS, Effort 54.2 SUS, EMO16, Effort
PRA04, NRA04, PPA04, NPA04 45.2 PPA04, NPA04
PRA04, NRA04, PPA04, NPA04, SUS 49.5 NRA04, PPA04, SUS
PRA04, NRA04, PPA04, NPA04, SUS, Effort 57.7 PPA04, SUS, Effort
PPA04 (.312), SUS (.194), Effort (.380) 57.5 PPA04, SUS, Effort
LTR SUS 40.1 SUS
EMO16 60.9 EMO16
Effort 16.6 Effort
EMO16, SUS 60.9 EMO16
EMO16, SUS, Effort 60.9 EMO16
PRA04, NRA04, PPA04, NPA04 68.7 PRA04, NRA04, PPA04
PRA04, NRA04, PPA04, NPA04, SUS 69.0 PRA04, PPA04, SUS
PRA04, NRA04, PPA04, NPA04, SUS, Effort 69.0 PRA04, PPA04, SUS
PRA04 (.350), PPA04 (.426), SUS (.155) 68.7 PRA04, PPA04, SUS
Note. SUS =System Usability Scale; EMO =Emotional Metric Outcomes; PRA =Positive Relationship Affect; NRA =Negative
Relationship Affect; PPA = Positive Personal Affect; NPA = Negative Personal Affect; LTR = likelihood-to-recommend.
TABLE 7
Success ×Scope Interaction
Outcome Relationship Personal Difference
Success 7.6 8.6 1.0
Failure 5.9 5.1 0.8
Difference 1.7 3.5 1.8
FIG. 3. Interaction between successful task completion and scope.
source of the interaction seems to be the difference between suc-
cessful and unsuccessful participants on the personal aspects of
emotion. The difference for the relationship aspects was fairly
large, but the difference for personal aspects was almost twice
the magnitude.
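A sketch of this kind of profile analysis (mixed ANOVA) in Python, using the pingouin package as one possible tool, follows; the long-format layout and column names are assumptions:

```python
import pandas as pd
import pingouin as pg

# Long-format data: one row per participant per scope, with the averaged
# Relationship (PRA/NRA) and Personal (PPA/NPA) scores (hypothetical columns).
long_df = pd.read_csv("emo_scope_long.csv")
# columns: participant, success (between), scope (within), score

aov = pg.mixed_anova(data=long_df, dv="score", within="scope",
                     subject="participant", between="success")
print(aov)   # look for the success main effect and the interaction term
```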
3.7. Relationship Between System Effectiveness and SUS
Scores
Adding to the investigation of the relationship between
system effectiveness and SUS scores started by Kortum and
Peres (2014), we also found a significant correlation between
successful task completions and SUS scores, r(469) =.486,
p<.0001.
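Because success is a binary indicator computed per participant, this is equivalent to a point-biserial correlation; a minimal sketch with hypothetical data:

```python
from scipy import stats

# success: 1 = completed the quote task, 0 = abandoned (hypothetical data)
success = [1, 1, 0, 1, 0, 1, 1, 0]
sus     = [85.0, 72.5, 40.0, 90.0, 35.0, 77.5, 65.0, 52.5]

r, p = stats.pointbiserialr(success, sus)
print(f"r = {r:.3f}, p = {p:.4f}")
```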
4. DISCUSSION
Recently, Lewis and Mayes (2014) published a standard-
ized instrument for the assessment of emotional outcomes, the
EMO questionnaire. The EMO has four component scales:
PRA, NRA, PPA, and NPA, with scores ranging from 0 (poorest
rating) to 10 (best rating). These component scales can be com-
bined for analysis of Tone (Positive vs. Negative) and Scope
(Relationship vs. Personal) and their interaction. It is also pos-
sible for practitioners to use the full and short versions of the
EMO or, in special circumstances (e.g., management aversion
to asking negative-tone items), to use specific EMO subscales
rather than the full scale. For the first time since its development,
the EMO was included in a usability test. The purpose of the
current research was to investigate the psychometric properties
of the EMO and the standard SUS in the context of a usability
test.
4.1. Key Outcomes
The EMO had the expected factor structure. Both factor
and principal components analysis yielded a four-factor solution
with the expected item-factor alignment. This finding contrasts
with the results from the original EMO analyses (Lewis &
Mayes, 2014), in which all EMO component scales were signif-
icant predictors of one or more outcome variables but the factor
structure tended to yield three- rather than four-factor solutions.
It is interesting that in a context of measurement in which partic-
ipants provided an immediate rather than a retrospective rating
of interaction, the expected factor structure emerged.
The factor structure of the SUS appeared to be driven by item
tone. The data from this usability study suggested that the SUS
was bidimensional, but as a function of item tone (positive or
negative) rather than any interesting structure such as the pre-
viously reported Usable/Learnable dimensions. It is common
for questionnaires composed of mixed-tone items to have such
a structure and, unless the positive–negative structure is inten-
tional (as in the construction of the EMO), it is of no theoretical
interest. In other words, there is no theoretical or practical rea-
son for usability practitioners or researchers to separately report
scores based on the positive- and negative-tone items of the
SUS. Instead, it is reasonable to treat it as a unidimensional
measure. From the literature, there is some justification for par-
titioning SUS scores into Usable and Learnable subscales, but
given the variability in reported factor structures for the SUS,
any such partitioning should be interpreted with caution.
The estimated reliability of the SUS was consistent with
previous estimates. Typically estimated at just over .90, the estimate from these data was exactly .90.
The EMO (both versions) and its subscales were reliable.
The estimates of reliability for the various EMO scales ranged
from .86 to .96, substantially exceeding the typical minimum
criterion of .70 (Nunnally, 1978).
The full and short versions of the EMO scales had sim-
ilar outcomes. In this usability testing data, the differences
between full and short scales for the overall EMO and the PRA
and NRA component scales were very small and not statisti-
cally significant. For PPA and NPA, although the differences
were statistically significant, they were small enough to be of
little practical significance (<0.1 on a scale ranging from 0 to
10). It is important to be cautious in generalizing this result.
Future research using the EMO should, if at all possible, use the
full version and report the differences between full and short
versions to build up enough data to determine if the similarity
of the outcomes of the scales is or is not usually the case.
Regression analysis using SUS, EMO, and Effort as predic-
tors revealed different key drivers for the outcome metrics of
Satisfaction and LTR. All of the predictors in isolation sig-
nificantly predicted Satisfaction and LTR, but not equally well.
For Satisfaction, the significant predictors, in order of their beta
weights, were Effort (.380), PPA04 (.312), and SUS (.194).
For the prediction of LTR, the best predictors, in order of
their beta weights, were PPA04 (.426), PRA04 (.350), and SUS
(.155). Only PPA04 (positive personal affect) was common to
both models. The key difference between the models was that
PRA04 (positive relationship affect) was not a key driver for
Satisfaction but was for LTR, whereas Effort was not a key
driver for LTR but was for Satisfaction.
EMO ratings were sensitive to the manipulation of success-
ful task completion. Profile analysis found a significant main
effect of task success and a significant Success ×Scope inter-
action. The primary source of the interaction seemed to be the
difference between successful and unsuccessful participants on
the personal aspects of emotion. The difference for the rela-
tionship aspects was fairly large, but the difference for personal
aspects had almost twice the magnitude. In other words, being
successful led to a better emotional outcome than being unsuc-
cessful, but the majority of the effect appeared to be in how
participants felt about themselves rather than in how they felt
about their relationship with the enterprise.
The correlation between system effectiveness and SUS rat-
ings was significant and substantial. Consistent with the find-
ings reported by Kortum and Peres (2014), there was a signif-
icant and substantial correlation between system effectiveness
(as measured by successful task completion) and SUS ratings,
r(469) =.49, p<.0001. This value is between the two reported
by Kortum and Peres (.21, .73) and is very close to the mean
of .51 reported by Sauro and Lewis (2009) for the relationship
between task-level satisfaction and successful completion of
that task. This seems appropriate and reasonable given that the
usability study that was the source of the data included one task.
4.2. Conclusions
In conclusion, the EMO performed very well in the context
of a usability study, showing generally excellent psychometric
characteristics. For the effective prediction of Satisfaction and
LTR from usability study data, there also appears to be value in
collecting SUS and Effort scores.
4.3. Recommendations
The key recommendations are as follows:
1. Usability practitioners and researchers should include the
EMO as part of their battery of poststudy standardized ques-
tionnaires, along with the SUS (or similar assessment of
perceived usability).
2. Even though the data in this study indicated practical equiv-
alence of the long and short versions of the EMO, until
other researchers have replicated this finding, future research
using the EMO should, if at all possible, use the full version.
Practitioners and researchers should report the differences
between full and short versions to help determine if the sim-
ilarity of the outcomes of the scales is or is not usually the
case.
3. Report overall SUS scores, but be cautious in reporting SUS subscales such as Usable and Learnable—we are still learning about the conditions under which the SUS does or does not have a meaningful bidimensional structure.
REFERENCES
Albert, B., Tullis, T., & Tedesco, D. (2010). Beyond the usability lab:
Conducting large-scale online user experience studies. Burlington, MA:
Morgan Kaufmann.
Andreasen, M. S., Nielsen, H. V., Schrøder, S. O., & Stage, J. (2007). What
happened to remote usability testing? An empirical study of three methods.
In Proceedings of CHI 2007 (pp. 1405–1414). San Jose, CA: Association
for Computing Machinery.
Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An empirical evaluation
of the System Usability Scale. International Journal of Human–Computer
Interaction,24, 574–594.
Borsci, S., Federici, S., Gnaldi, M., Bacci, S., & Bartolucci, F. (this issue).
Assessing user satisfaction in the era of user experience: An exploratory
analysis of SUS, UMUX and UMUX-LITE. International Journal of
Human–Computer Interaction, XX, xx–xx.
Borsci, S., Federici, S., & Lauriola, M. (2009). On the dimensionality of
the System Usability Scale: A test of alternative measurement models.
Cognitive Processes,10, 193–197.
Brooke, J. (1996). SUS: A “quick and dirty” usability scale. In P. Jordan, B.
Thomas, & B. Weerdmeester (Eds.), Usability evaluation in industry (pp.
189–194). London, UK: Taylor & Francis.
Kortum, P., & Peres, S. C. (2014). The relationship between system effec-
tiveness and subjective usability scores using the System Usability Scale.
International Journal of Human–Computer Interaction,30, 575–584.
Lewis, J. R. (2012). Usability testing. In G. Salvendy (Ed.), Handbook of human
factors and ergonomics (pp. 1267–1312). New York, NY: Wiley.
Lewis, J. R. (2014). Usability: Lessons learned ... and yet to be learned.
International Journal of Human–Computer Interaction,30, 663–684.
Lewis, J. R., & Mayes, D. K. (2014). Development and psychometric evalua-
tion of the Emotional Metric Outcomes (EMO) questionnaire. International
Journal of Human–Computer Interaction,30, 685–702.
Lewis, J. R., & Sauro, J. (2009). The factor structure of the System
Usability Scale. In M. Kurosu (Ed.), Human centered design (pp. 94–103).
Heidelberg, Germany: Springer-Verlag.
Lewis, J. R., Utesch, B. S., & Maher, D. E. (2013). UMUX-LITE—When
there’s no time for the SUS. In Proceedings of CHI 2013 (pp. 2099–2102).
Paris, France: Association for Computing Machinery.
McFadden, E., Hager, D. R., Elie, C. J., & Blackwell, J. M. (2002). Remote
usability evaluation: Overview and case studies. International Journal of
Human–Computer Interaction,14, 489–502.
Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Sauro, J., & Lewis, J. R. (2009). Correlations among prototypical usability met-
rics: Evidence for the construct of usability. In Proceedings of CHI 2009
(pp. 1609–1618). Boston, MA: Association for Computing Machinery.
Sauro, J., & Lewis, J. R. (2011). When designing usability questionnaires,
does it hurt to be positive? In Proceedings of CHI 2011 (pp. 2215–2223).
Vancouver, Canada: Association for Computing Machinery.
Sauro, J., & Lewis, J. R. (2012). Quantifying the user experience: Practical
statistics for user research. Waltham, MA: Morgan Kaufmann.
Tullis, T., Fleischman, S., McNulty, M., Cianchette, C., & Bergel, M. (2002).
An empirical comparison of lab and remote usability testing of web sites. In
Proceedings of the Usability Professionals Association (pp. 1–8). Chicago,
IL: Usability Professionals Association.
West, R., & Lehman, K. R. (2006). Automated summative usability studies: An
empirical evaluation. In Proceedings of CHI 2006 (pp. 631–639). Montreal,
Canada: Association for Computing Machinery.
Zviran, M., Glezer, C., & Avni, I. (2006). User satisfaction from commer-
cial web sites: The effect of design and use. Information & Management, 43, 157–178.
ABOUT THE AUTHORS
James R. Lewis is a senior human factors engineer (at IBM
since 1981), currently focusing on the design/evaluation of
speech applications. He has published influential papers in the
areas of usability testing and measurement. His books include
Practical Speech User Interface Design and (with Jeff Sauro)
Quantifying the User Experience.
Joshua Brown is a senior user experience researcher (at
State Farm since 2013), currently focusing on home and busi-
ness insurance, as well as agency production systems. He has
provided user experience research and consulting to various
projects previously and currently in development at State Farm.
Daniel K. Mayes is a manager in the research department
(at State Farm since 2005), currently focusing on customer seg-
mentation and its application to business initiatives. He has
published groundbreaking papers in the areas of cognitive psy-
chology and emotional research. His previous experience also
includes applying novel right brain constructs to other business-
to-customer applications such as video game design/evaluation.
... In addition, calculation and interpretation of the SUS scores are fairly straightforward and easily comprehensible to even non-experts. Numerous studies have been conducted on the validity, reliability and sensitivity of the SUS, and it has been shown that the scale gives 0.90 and higher results (Bangor et al. 2008;Lewis and Sauro 2009;Lewis et al. 2015a;Lewis et al. 2015b), above the typical criterion of 0.70 (Nunnally 1978). ...
Article
Full-text available
Technology has been a significant driver of sociological change throughout history. The relationship between technology and society is complex, with technology both shaping and being shaped by social forces. Technological developments, which started with the industrial revolution and gained a new dimension with Industry 4.0, have brought about radical changes in the ways technology is used to offer new products and services (Petrasch and Hentschke 2016). In addition, the 2020 Covid-19 pandemic has accelerated digital transformation causing many services and activities to be transferred to the digital environment and education was no exception. Advances in technology, changes in social structure and unforeseen epidemics and natural disasters have brought learning management systems in the center of our lives as a main part of education services. Although there existed several learning management systems for a while, after the 2020 pandemic we started to see a rapid increase in such systems. However, usability of these systems need to be evaluated preferably before and after their release. In this paper, we examined the usability of Bozok Learning Management System (BOYSIS) developed within Yozgat Bozok University and designed in line with the needs of the university. Since every system performs differently depending on the social and cultural environment it belongs, system usability needs to be evaluated from the point of final users. For this purpose, the System Usability Scale (SUS) was employed and we found that the BOYSIS has acceptable level of usability, yet it is below the expected industry standard.
... However, some inconsistent results have been obtained regarding its internal structure. Whereas some investigations replicated the one-factor structure of the original study (Bangor et al., 2008;Kortum & Sorber, 2015;Lewis, Brown, et al., 2015;Lewis, Utesch et al., 2015;Sauro & Lewis, 2011), other studies revealed a two-factor structure (Lewis & Sauro, 2009). In this sense, research showing a two-factor structure (Lewis & Sauro, 2017) supports the idea that the internal structure might be obscured because method effects associated with positively-and negativelytoned items, which might lead to different factors, as shown in previous research with scales using a similar approach (Wolgast, 2014). ...
Article
Full-text available
This study examined the SUS’s psychometric properties with the Spanish population considering the plausible method effects associated with negatively worded items. A short form consisting of either direct or reversed items was also examined. Participants were 1321 Spaniards who completed the SUS. Confirmatory analyses showed that the SUS was a valid measure with a one-factor structure when method errors associated with negatively worded items were considered (CFI = .932, TLI = .898; RMSEA = .055, CI 90% = 0.047, 0.062), and shown evidence of reliability (Cronbach’s alpha = .76). The short version with only positively worded items also showed to be a valid (CFI = .973, TLI = .946; RMSEA = .057, CI 90% = .041, .075) and reliable measure (Cronbach alpha = .77) to assess usability. This is the first study to clarify the effect of negatively worded items in the structure of the SUS and propose a short version of the SUS to be used with Spaniards when a brief version is preferred to test usability.
... Looking back at our analysis of usability, the subjects grouped in label ➀ in Fig. 8b have a dissenting opinion from the rest in terms of usability (U ), while the subjects in group ➂ make the average decrease. Thus, we will discuss a psychometric evaluation of the SUS for these subjects to analyse which factors influenced them the most in the actual usability perceived [81][82][83][84]. 18 https://github.com/angel539/extremo/wiki/Artifacts-Evaluation. ...
Article
Full-text available
Model-driven engineering (MDE) uses models as first-class artefacts during the software development lifecycle. MDE often relies on domain-specific languages (DSLs) to develop complex systems. The construction of a new DSL implies a deep understanding of a domain, whose relevant knowledge may be scattered in heterogeneous artefacts, like XML documents, (meta-)models, and ontologies, among others. This heterogeneity hampers their reuse during (meta-)modelling processes. Under the hypothesis that reusing heterogeneous knowledge helps in building more accurate models, more efficiently, in previous works we built a (meta-)modelling assistant called Extremo. Extremo represents heterogeneous information sources with a common data model, supports its uniform querying and reusing information chunks for building (meta-)models. To understand how and whether modelling assistants—like Extremo—help in designing a new DSL, we conducted an empirical study, which we report in this paper. In the study, participants had to build a meta-model, and we measured the accuracy of the artefacts, the perceived usability and utility and the time to completion of the task. Interestingly, our results show that using assistance did not lead to faster completion times. However, participants using Extremo were more effective and efficient, produced meta-models with higher levels of completeness and correctness, and overall perceived the assistant as useful. The results are not only relevant to Extremo, but we discuss their implications for future modelling assistants.
... In phase 2 and 3, the Dutch version of the System Usability Scale [SUS; (60)] was used; a 10-item questionnaire scored on a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree), measuring the usability and acceptance of an intervention. The reliability of the scale is high (a > 0.90) and the concurrent validity acceptable (61). An example of an item is "I think that I would need the support of a technical person to be able to use this system". ...
Article
Full-text available
Introduction Over 25% of Dutch young people are psychologically unhealthy. Individual and societal consequences that follow from having mental health complaints at this age are substantial. Young people need care which is often unavailable. ENgage YOung people earlY (ENYOY) is a moderated digital social therapy-platform that aims to help youngsters with emerging mental health complaints. Comprehensive research is being conducted into the effects and to optimize and implement the ENYOY-platform throughout the Netherlands. The aim of this study is to explore the usability and user experience of the ENYOY-platform. Methods A user-centered mixed-method design was chosen. 26 young people aged 16–25 with emerging mental health complaints participated. Semi-structured interviews were conducted to explore usability, user-friendliness, impact, accessibility, inclusivity, and connection (Phase 1). Phase 2 assessed usability problems using the concurrent and retrospective Think Aloud-method. User experience and perceived helpfulness were assessed using a 10-point rating scale and semi-structured interviews (Phase 3). The Health Information Technology Usability Evaluation Scale (Health-ITUES; Phase 1) and System Usability Scale (SUS; Phase 2 and 3) were administered. Qualitative data was analyzed using thematic analysis. Task completion rate and time were tracked and usability problems were categorized using the Nielsen's rating scale (Phase 2). Results Adequate to high usability was found (Phase 1 Health-ITUES 4.0( 0.34 ); Phase 2 SUS 69,5( 13,70 ); Phase 3 SUS 71,6( 5,63 )). Findings from Phase 1 ( N = 10) indicated that users viewed ENYOY as a user-friendly, safe, accessible, and inclusive initiative which helped them reduce their mental health complaints and improve quality of life. Phase 2 ( N = 10) uncovered 18 usability problems of which 5 of major severity (e.g. troubles accessing the platform). Findings from Phase 3 ( N = 6) suggested that users perceived the coaching calls the most helpful [9( 0.71 )] followed by the therapy content [6.25( 1.41 )]. Users liked the social networking aspect but rated it least helpful [6( 2.1 )] due to inactivity. Conclusion The ENYOY-platform has been found to have adequate to high usability and positive user experiences were reported. All findings will be transferred to the developmental team to improve the platform. Other evaluation methods and paring these with quantitative outcomes could provide additional insight in future research.
... Total scores are converted (by multiplying the total score by 2.5) to a scale ranging from 0 to 100, where higher scores are indicative of higher platform usability. The SUS is considered a reliable instrument (Cronbach α=.90), with scores higher than 70 indicating "good" usability [48,49]. ...
Article
Full-text available
Background Internet-based interventions can be effective in the treatment of depression. However, internet-based interventions for older adults with depression are scarce, and little is known about their feasibility and effectiveness. Objective To present the design of 2 studies aiming to assess the feasibility of internet-based cognitive behavioral treatment for older adults with depression. We will assess the feasibility of an online, guided version of the Moodbuster platform among depressed older adults from the general population as well as the feasibility of a blended format (combining integrated face-to-face sessions and internet-based modules) in a specialized mental health care outpatient clinic. Methods A single-group, pretest-posttest design will be applied in both settings. The primary outcome of the studies will be feasibility in terms of (1) acceptance and satisfaction (measured with the Client Satisfaction Questionnaire-8), (2) usability (measured with the System Usability Scale), and (3) engagement (measured with the Twente Engagement with eHealth Technologies Scale). Secondary outcomes include (1) the severity of depressive symptoms (measured with the 8-item Patient Health Questionnaire depression scale), (2) participant and therapist experience with the digital technology (measured with qualitative interviews), (3) the working alliance between patients and practitioners (from both perspectives; measured with the Working Alliance Inventory–Short Revised questionnaire), (4) the technical alliance between patients and the platform (measured with the Working Alliance Inventory for Online Interventions–Short Form questionnaire), and (5) uptake, in terms of attempted and completed modules. A total of 30 older adults with mild to moderate depressive symptoms (Geriatric Depression Scale 15 score between 5 and 11) will be recruited from the general population. A total of 15 older adults with moderate to severe depressive symptoms (Geriatric Depression Scale 15 score between 8 and 15) will be recruited from a specialized mental health care outpatient clinic. A mixed methods approach combining quantitative and qualitative analyses will be adopted. Both the primary and secondary outcomes will be further explored with individual semistructured interviews and synthesized descriptively. Descriptive statistics (reported as means and SDs) will be used to examine the primary and secondary outcome measures. Within-group depression severity will be analyzed using a 2-tailed, paired-sample t test to investigate differences between time points. The interviews will be recorded and analyzed using thematic analysis. Results The studies were funded in October 2019. Recruitment started in September 2022. Conclusions The results of these pilot studies will show whether this platform is feasible for use by the older adult population in a blended, guided format in the 2 settings and will represent the first exploration of the size of the effect of Moodbuster in terms of decreased depressive symptoms. International Registered Report Identifier (IRRID) PRR1-10.2196/41445
Article
Perceived clutter is a potentially important but understudied construct in UX research. In this paper we described the development and assessment of a standardized questionnaire for reliable and valid measurement of perceived clutter of websites. Starting with an initial set of 16 items and two hypothesized factors, a series of exploratory analyses led to a final set of five items, two for the hypothesized construct of Content Clutter (too much irrelevant content like ads and videos) and three for the hypothesized construct of Design Clutter (poor design of relevant information like too much text, an unpleasant layout, or too much visual noise). Confirmatory analyses using an independent dataset showed excellent fit statistics for CFA of the five-item questionnaire and good fit for an SEM of the connections between clutter and other UX constructs. Researchers should exercise caution about generalizing results to other contexts and interfaces, but UX practitioners should be able to use this perceived clutter of websites (PCW) questionnaire when assessing consumer websites.
Chapter
Full-text available
Usability and user experience (UX) are important concepts in the design and evaluation of products or systems intended for human use. This chapter introduces the fundamentals of design for usability and UX, focusing on the application of science, art, and craft to their principled design. It reviews the major methods of usability assessment, focusing on usability testing. The concept of UX casts a broad net over all of the experiential aspects of use, primarily subjective experience. User-centered design and design thinking are methods used to produce initial designs, after which they typically use iteration for design improvement. Service design is a relatively new area of design for usability and UX practitioners. The fundamental goal of usability testing is to help developers produce more usable products. The primary activity in diagnostic problem discovery tests is the discovery, prioritization, and resolution of usability problems.
Article
Full-text available
Fatigue is the most reported symptom in patients with sarcoidosis (SPs) and is a significant predictor of decreased quality of life that is strongly associated with stress and negative mood states. Few medications exist for treating fatigue in SPs, and outpatient physical rehabilitation programs are limited by availability and cost. Sarcoidosis in the US predominantly impacts minorities and underserved populations who are of working age and often have limited resources (e.g., financial, transportation, time off work) that may prevent them from attending in-person programs. The use of mobile health (mHealth) is emerging as a viable alternative to provide access to self-management resources to improve quality of life. The Sarcoidosis Patient Assessment and Resource Companion (SPARC) App is a sarcoidosis-specific mHealth App intended to improve fatigue and stress in SPs. It prompts SPs to conduct breathing awareness meditation (BAM) and contains educational modules aimed at improving self-efficacy. Herein we describe the design and methods of a 3-month randomized control trial comparing use of the SPARC App (10-min BAM twice daily) to standard care in 50 SPs with significant fatigue (FAS ≥22). A Fitbit® watch will provide immediate heartrate feedback after BAM sessions to objectively monitor adherence. The primary outcomes are feasibility and usability of the SPARC App (collected monthly). Secondary endpoints include preliminary efficacy at improving fatigue, stress, and quality of life. We expect the SPARC App to be a useable and feasible intervention that has potential to overcome barriers of more traditional in-person programs.
Article
Full-text available
Three models of face shields were evaluated by eight healthcare professionals working in the care of patients with suspected or confirmed COVID-19. Each face shield was evaluated for one week, and at the end of each week the volunteers answered the research questionnaires and submitted their records. The results pointed to recommendations that can contribute to defining design requirements to support the development of artifacts that meet the perceptions and expectations of healthcare professionals.
Chapter
Full-text available
Covers the basics of usability testing plus some statistical topics (sample size estimation, confidence intervals, and standardized usability questionnaires).
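As a concrete example of the standardized usability questionnaires mentioned here, the following is a minimal sketch of standard SUS scoring: ten items rated 1 to 5, alternating positive and negative tone, yielding an overall score on a 0-100 scale. The example responses are made up.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten item responses (each 1-5).

    Odd-numbered items are positively worded (contribution = response - 1);
    even-numbered items are negatively worded (contribution = 5 - response).
    The summed contributions are multiplied by 2.5 to span 0-100.
    """
    if len(responses) != 10:
        raise ValueError("The SUS has exactly 10 items.")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Hypothetical responses to items 1-10:
print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0
```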
Article
Full-text available
This article describes the development and psychometric evaluation of the Emotional Metric Outcomes (EMO) questionnaire—a new questionnaire designed to assess the emotional outcomes of interaction, especially the interaction of customers with service-provider personnel or software. The EMO is a concise multifactor standardized questionnaire that provides an assessment of transaction-driven personal and relationship emotional outcomes, both positive and negative. The primary purpose of the EMO is to move beyond traditional assessment of satisfaction to achieve a more effective measurement of customers’ emotional responses to products and processes. Psychometric evaluation showed that the EMO and its component scales had high reliability and concurrent validity with loyalty and overall experience metrics in a variety of measurement contexts. Concurrent measurement with the System Usability Scale (SUS) indicated that the reported significant correlations of the SUS with likelihood-to-recommend ratings are probably primarily due to emotional rather than utilitarian aspects of the SUS.
Article
Full-text available
The philosopher of science J. W. Grove (1989) once wrote, “There is, of course, nothing strange or scandalous about divisions of opinion among scientists. This is a condition for scientific progress” (p. 133). Over the past 30 years, usability, both as a practice and as an emerging science, has had its share of controversies. It has inherited some from its early roots in experimental psychology, measurement, and statistics. Others have emerged as the field of usability has matured and extended into user-centered design and user experience. In many ways, a field of inquiry is shaped by its controversies. This article reviews some of the persistent controversies in the field of usability, starting with their history, then assessing their current status from the perspective of a pragmatic practitioner. Put another way: Over the past three decades, what are some of the key lessons we have learned, and what remains to be learned? Some of the key lessons learned are:
• When discussing usability, it is important to distinguish between the goals and practices of summative and formative usability.
• There is compelling rational and empirical support for the practice of iterative formative usability testing—it appears to be effective in improving both objective and perceived usability.
• When conducting usability studies, practitioners should use one of the currently available standardized usability questionnaires.
• Because “magic number” rules of thumb for sample size requirements for usability tests are optimal only under very specific conditions, practitioners should use the tools that are available to guide sample size estimation rather than relying on “magic numbers” (see the sketch after this list).
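The sample size point in the last lesson can be illustrated with the standard cumulative problem-discovery model, P(discovered) = 1 - (1 - p)^n. This is a minimal sketch under assumed values for the per-participant problem occurrence probability and the target discovery proportion; it is not a formula taken from this article.

```python
import math

def sample_size_for_discovery(p_occurrence: float, target_discovery: float) -> int:
    """Smallest n such that 1 - (1 - p)^n >= target_discovery.

    p_occurrence: assumed probability that a given problem affects a single user.
    target_discovery: desired proportion of such problems observed at least once.
    """
    n = math.log(1.0 - target_discovery) / math.log(1.0 - p_occurrence)
    return math.ceil(n)

# Illustrative values: problems that affect 31% of users,
# aiming to see 85% of them at least once.
print(sample_size_for_discovery(0.31, 0.85))  # -> 6 participants
```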
Conference Paper
Full-text available
In this paper we present the UMUX-LITE, a two-item questionnaire based on the Usability Metric for User Experience (UMUX) [6]. The UMUX-LITE items are “This system’s capabilities meet my requirements” and “This system is easy to use.” Data from two independent surveys demonstrated adequate psychometric quality of the questionnaire. Estimates of reliability were .82 and .83, excellent for a two-item instrument. Concurrent validity was also high, with significant correlation with the SUS (.81, .81) and with likelihood-to-recommend (LTR) scores (.74, .73). The scores were sensitive to respondents’ frequency-of-use. UMUX-LITE score means were slightly lower than those for the SUS, but easily adjusted using linear regression to match the SUS scores. Due to its parsimony (two items), reliability, validity, structural basis (usefulness and usability), and, after applying the corrective regression formula, its correspondence to SUS scores, the UMUX-LITE appears to be a promising alternative to the SUS when it is not desirable to use a 10-item instrument.
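A minimal sketch of how such a regression adjustment could be applied: the two items are assumed to be 7-point scales scored 1-7 and rescaled to 0-100, and the default slope and intercept below are placeholders that should be replaced with the coefficients reported in the paper before practical use.

```python
def umux_lite_raw(capabilities: int, ease_of_use: int) -> float:
    """Rescale the two 7-point items (each 1-7) to a 0-100 score."""
    return (capabilities + ease_of_use - 2) * (100 / 12)

def umux_lite_adjusted(capabilities: int, ease_of_use: int,
                       slope: float = 0.65, intercept: float = 22.9) -> float:
    """Apply a linear regression adjustment so the score tracks the SUS.

    The default slope and intercept are placeholder values; substitute
    the coefficients reported in the UMUX-LITE paper.
    """
    return slope * umux_lite_raw(capabilities, ease_of_use) + intercept

# Hypothetical ratings of 6 and 7 on the two items:
print(umux_lite_adjusted(6, 7))  # -> about 82.5 with the placeholder coefficients
```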
Book
You're being asked to quantify your usability improvements with statistics. But even with a background in statistics, you may hesitate to analyze your data statistically, unsure which statistical tests to use and struggling to defend small test sample sizes. This book is a practical guide to solving the common quantitative problems that arise in usability testing with statistics. It addresses questions you face every day, such as: Is the current product more usable than our competition? Can we be sure at least 70% of users can complete the task on the first attempt? How long will it take users to purchase products on the website? The book shows you which test to use and provides a foundation for both the statistical theory and best practices in applying it. The authors draw on decades of statistical literature from human factors, industrial engineering, and psychology, as well as their own published research, to provide the best solutions. They provide concrete solutions (Excel formulas, links to their own web calculators) along with an engaging discussion of the statistical reasons the tests work and how to communicate the results effectively.
• Provides practical guidance on solving usability testing problems with statistics for any project, including those using Six Sigma practices
• Shows practitioners which test to use and why it works, with best practices in application and easy-to-use Excel formulas and web calculators for analyzing data
• Recommends ways for practitioners to communicate results to stakeholders in plain English
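As an illustration of one of the questions listed above ("Can we be sure at least 70% of users can complete the task on the first attempt?"), here is a minimal sketch that treats it as a one-sided exact binomial test. The observed counts are hypothetical, and this is one reasonable approach rather than the book's specific method.

```python
from scipy.stats import binomtest

# Hypothetical data: 18 of 20 participants completed the task on the first attempt.
result = binomtest(k=18, n=20, p=0.70, alternative="greater")

print(f"observed completion rate: {18 / 20:.0%}")
print(f"p-value against H0 (true rate <= 70%): {result.pvalue:.3f}")
# A small p-value suggests the true completion rate exceeds the 70% benchmark.
```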
Article
This article examines the relationship between users’ subjective usability assessments, as measured using the System Usability Scale (SUS), and the ISO metric of effectiveness, using task success as the measure. The article reports the results of two studies designed to explore the relationship between SUS scores and user success rates for a variety of interfaces. The first study was a field study in which stereotypical usability assessments were performed on a variety of products and services. The second study was a well-controlled laboratory study in which the level of success that users were able to achieve was controlled. For both studies, the relationship between SUS scores and their attendant performance was examined at both the individual level and the average system level. Although the correlations are far from perfect, there are reliable and reasonably strong positive correlations between subjective usability measures and task success rates, for both the laboratory and field studies, at both the individual and system levels.
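A minimal sketch of the kind of individual-level analysis described: correlating SUS scores with task success rates. The data below are made up for illustration and are not from either study.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical individual-level data: each participant's SUS score (0-100)
# and their task success rate (proportion of tasks completed).
sus_scores = np.array([72.5, 85.0, 55.0, 90.0, 62.5, 77.5, 47.5, 82.5])
success_rates = np.array([0.75, 1.00, 0.50, 1.00, 0.75, 0.75, 0.25, 1.00])

r, p_value = pearsonr(sus_scores, success_rates)
print(f"r = {r:.2f}, p = {p_value:.3f}")
# A positive r mirrors the reported pattern: higher perceived usability tends
# to accompany higher task success, though the relationship is imperfect.
```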
Book
Usability testing and user experience research typically take place in a controlled lab with small groups. While this type of testing is essential to user experience design, more companies are also looking to test large sample sizes so they can compare data across specific user populations and see how experiences differ across user groups. But few usability professionals have experience in setting up these studies, analyzing the data, and presenting it in effective ways. Online usability testing offers a solution by allowing testers to elicit feedback simultaneously from thousands of users. Beyond the Usability Lab offers tried and tested methodologies for conducting online usability studies. It gives practitioners the guidance they need to collect a wealth of data through cost-effective, efficient, and reliable practices. The reader will develop a solid understanding of the capabilities of online usability testing, learn when it is and is not appropriate to use, and learn about the various types of online usability testing techniques.