Short Scales of Satisfaction Assessment:
A Proxy to Involve Disabled Users
in the Usability Testing of Websites
Simone Borsci¹, Stefano Federici²,³, Maria Laura Mele²,³, and Matilde Conti²

¹ Human Factors Research Group, School of Mechanical, Materials and Manufacturing Engineering, The University of Nottingham, Nottingham, UK
simone.borsci@gmail.com
² Department of Philosophy, Social & Human Sciences and Education, University of Perugia, Perugia, Italy
stefano.federici@unipg.it, matildeconti92@hotmail.it, marialaura.mele@gmail.com
³ ECONA, Interuniversity Centre for Research on Cognitive Processing in Natural and Artificial Systems, Sapienza University of Rome, Rome, Italy
Abstract. Short scales of user satisfaction analysis are widely applied in usability studies as part of the measures used to assess the interaction experience of users. Among the traditional tools, the System Usability Scale (SUS), composed of 10 items, is the most widely applied quick evaluation scale. Recently, researchers have proposed two new and shorter scales: the Usability Metric for User Experience (UMUX), composed of four items, and the UMUX-LITE, which consists of only the two positive items of the UMUX. Despite their recent creation, researchers in human-computer interaction (HCI) have already shown that these two tools are reliable and strongly correlated with each other [1-3]. Nevertheless, there are still no studies on the use of these questionnaires with disabled users. As HCI experts claim [4-7], when disabled and elderly users are included in the assessment cohorts, they add alternative and extended perspectives on the usability of a system to the overall analysis. This is particularly relevant for interfaces designed to serve a large population of end-users, such as websites of public administration or public services. Hence, for a practitioner, adding a group of disabled people to the evaluation cohorts may considerably extend the number and types of errors identified during the assessment. One of the major obstacles to creating mixed cohorts is the increase in the time and costs of the evaluation; often, the budget does not cover the inclusion of disabled users in the test. To overcome these hindrances, administering a short questionnaire to disabled users, either after a period of use (expert disabled customers) or after an interaction test performed through a set of scenario-driven tasks (novice disabled users), achieves a good trade-off between a limited effort in terms of time and costs and the advantage of evaluating the satisfaction of disabled people in the use of websites. To date, researchers have analyzed neither the use of the SUS, UMUX, and UMUX-LITE by disabled users, nor the reliability of these tools, nor the relationship among these scales when administered to disabled people.
In this paper, we report a usability test performed with 10 blind and 10 sighted users on the Italian website of public train transportation, to observe the differences between the two evaluation cohorts in terms of: (i) number of identified errors, (ii) average score of the three questionnaires, and (iii) reliability and correlation of the three scales.
The outcomes confirmed that the three scales, when administered to blind or sighted users, are reliable (Cronbach's α > 0.8), although the reliability of the UMUX with blind users was lower than expected (Cronbach's α = .57). Moreover, all the scales are strongly correlated (p < .001), in line with previous studies. Nevertheless, significant differences were identified between sighted and blind participants in terms of (i) the number of errors experienced during the interaction and (ii) the average satisfaction rated through the three questionnaires. Our data show, in agreement with previous studies, that disabled users have divergent perspectives on satisfaction in the use of a website. The insight of disabled users could be a key factor in improving the usability of interfaces that aim to serve a large population, such as websites of public administration and services. In sum, we argue that, in order to preserve the budget and still incorporate disabled users' perspectives in the evaluation reports at minimal cost, practitioners may reliably test satisfaction by administering the SUS and the UMUX or UMUX-LITE to a mixed sample of users with and without disability.

© Springer International Publishing Switzerland 2015
M. Kurosu (Ed.): Human-Computer Interaction, Part III, HCII 2015, LNCS 9171, pp. 35–42, 2015.
DOI: 10.1007/978-3-319-21006-3_4
Keywords: Disabled user interaction · Usability evaluation · Usability Metric for User Experience · System Usability Scale
1 Introduction
Satisfaction is one of the three main components of usability [8], along with effectiveness and efficiency. Practitioners usually test this component through standardized questionnaires after people have gained some experience in the use of a website. In particular, experts apply short scales of satisfaction analysis to reduce the time and costs of the assessment of a website. Among the quick satisfaction scales, the most popular assessment tool is the SUS [9]. The SUS is a free and highly reliable instrument [10-14], composed of only 10 items on a five-point scale (1: Strongly disagree; 5: Strongly agree). To compute the overall SUS score, (1) each item is converted to a 0-4 scale in which higher numbers indicate greater perceived usability, (2) the converted scores are summed, and (3) the sum is multiplied by 2.5. This process produces scores that can range from 0 to 100. Although the SUS was designed to be unidimensional, since 2009 several researchers have shown that the tool has a two-factor structure: Learnability (scores of items 4 and 10) and Usability (scores of items 1-3 and 5-9) [2, 3, 13, 15-17]. Moreover, the growing availability of SUS data from a large number of studies [13, 18] has led to the production of norms for the interpretation of mean SUS scores, e.g., the Curved Grading Scale (CGS) [16]. Using data from 446 studies and over 5,000 individual SUS responses, Sauro and Lewis [16] found the overall mean SUS score to be 68, with a standard deviation of 12.5.
The Sauro and Lewis CGS assigns grades as a function of SUS scores, ranging from F (absolutely unsatisfactory) to A+ (absolutely satisfactory), as follows: Grade F (0–51.7); Grade D (51.8–62.6); Grade C- (62.7–64.9); Grade C (65.0–71.0); Grade C+ (71.1–72.5); Grade B- (72.6–74.0); Grade B (74.1–77.1); Grade B+ (77.2–78.8); Grade A- (78.9–80.7); Grade A (80.8–84.0); Grade A+ (84.1–100).
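The SUS scoring procedure and the CGS bands above can be sketched as follows. This is a minimal illustration in Python; the function names are ours, not part of the cited instruments.

```python
def sus_score(responses):
    """Overall SUS score from ten 1-5 Likert responses.

    Odd (positively worded) items contribute response - 1; even (negatively
    worded) items contribute 5 - response. The sum (0-40) times 2.5 gives 0-100.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


def cgs_grade(score):
    """Map a mean SUS score to the Sauro and Lewis Curved Grading Scale [16]."""
    bands = [(84.1, "A+"), (80.8, "A"), (78.9, "A-"), (77.2, "B+"),
             (74.1, "B"), (72.6, "B-"), (71.1, "C+"), (65.0, "C"),
             (62.7, "C-"), (51.8, "D")]
    for lower, grade in bands:
        if score >= lower:
            return grade
    return "F"
```

For example, a neutral response pattern (all items rated 3) yields a score of 50, which falls in the F band, while the normative mean of 68 falls in the C band.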
Recently, two new scales were proposed as shorter proxies of the SUS [17]: the UMUX, a four-item tool [1, 19], and the UMUX-LITE, composed of only the two positive-tone questions from the UMUX [3]. The UMUX items have seven points (1: Strongly disagree; 7: Strongly agree), and both the UMUX and its reduced version, the UMUX-LITE, are usually interpreted as unidimensional measures. The overall scales of the UMUX and UMUX-LITE range from 0 to 100. Their scoring procedures are:

UMUX: The odd items are scored as [score − 1] and the even items as [7 − score]. The sum of the item scores is then divided by 24 and multiplied by 100 [1].

UMUX-LITE: The two items are scored as [score − 1], and their sum is divided by 12 and multiplied by 100 [3].

As researchers have shown [1, 3, 19], the SUS, UMUX, and UMUX-LITE are reliable (Cronbach's α between .80 and .95) and correlate significantly (p < .001). However, for the UMUX-LITE it is necessary to use the following formula (1) to adjust its scores so that they correspond with the SUS [3]:

UMUX-LITE = .65 × ([Item 1 score] + [Item 2 score]) + 22.9    (1)
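Both scoring procedures and the adjustment of formula (1) can be sketched as below. The helper names are ours; note that, following the scoring convention in Lewis et al. [3], we apply the regression coefficients to the 0-100 UMUX-LITE score, which is one common reading of formula (1).

```python
def umux_score(responses):
    """Overall UMUX score from four 1-7 responses.

    Odd (positive-tone) items contribute response - 1; even (negative-tone)
    items contribute 7 - response. The sum (0-24) is divided by 24 times 100.
    """
    if len(responses) != 4:
        raise ValueError("UMUX has exactly 4 items")
    total = sum((r - 1) if i % 2 == 1 else (7 - r)
                for i, r in enumerate(responses, start=1))
    return total / 24 * 100


def umux_lite_score(item1, item2):
    """Raw UMUX-LITE: each item scored as response - 1, summed,
    divided by 12, and multiplied by 100."""
    return ((item1 - 1) + (item2 - 1)) / 12 * 100


def umux_lite_adjusted(item1, item2):
    """Formula (1) adjustment toward SUS correspondence [3].

    Interpretation note: the .65 slope and 22.9 intercept are applied here
    to the 0-100 UMUX-LITE score, per the Lewis et al. [3] scoring.
    """
    return 0.65 * umux_lite_score(item1, item2) + 22.9
```

With this reading, the adjusted UMUX-LITE is bounded between 22.9 and 87.9, compressing the raw scale toward the typical SUS range.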
Although short scales of satisfaction analysis are quite well known and used in HCI studies, the psychometric properties of these scales have rarely been analyzed when they are applied to test the usability of an interface with disabled users. This is because elderly and disabled people are often excluded from usability evaluation cohorts: they are considered people with "special needs" [20] rather than possible end-users of a product with divergent and alternative modalities of interaction with websites. Nevertheless, as suggested by Borsci and colleagues [21], the experience of disabled users has great value for HCI evaluators and for their clients. Indeed, enriching an evaluation cohort with sub-samples of disabled users could help evaluators run a sort of stress test of an interface [21].
The main complaint of designers regarding the involvement of disabled people in usability evaluation is the cost of testing with disabled users. In fact, testing with disabled users usually requires more time than assessment performed by people without disability. The extra time may be due to the following reasons. First, some disabled users need to interact with a website through a set of assistive technologies, which may require conducting the test in the wild instead of in a lab. Second, evaluators need to set up an adapted assessment protocol for people with cognitive impairment, such as dementia [7]. Nevertheless, these issues can be overcome by adopting specific strategies. For instance, experts could ask a small sample of disabled users who are already customers of a website to perform at home a set of short, scenario-driven interactions with the website. Another approach could be to ask disabled users who are novices in the use of a website to perform a set of tasks at home for a week, while the evaluators remotely monitor their interaction [4]. Independently of the strategy, instead of fully monitoring the usability errors made by disabled users, experts could simply ask these end-users to complete a short scale after their experience with a system, to gather their overall satisfaction. The satisfaction outcomes of the disabled users' cohort could then be aggregated and compared
with the results of the other cohort of people without disability. Therefore, by using short scales of satisfaction evaluation, practitioners could save on costs and, with minimal effort, report to designers the number of errors identified, the level of satisfaction experienced by users without disability, and a comparative analysis of the satisfaction of a mixed cohort of users. Thus, short scales could be powerful tools for including, at minimal cost, the opinions of disabled users in the usability assessment, in order to enhance the reliability of the assessment report for the designers.

Today, the possibility of including a larger sample of users with different kinds of behavior in usability testing is particularly relevant to obtaining a reliable assessment. In the context of ubiquitous computing, people can access and interact with websites through different mobile devices, and a large set of information on public services (such as taxes, education, transport, etc.) is available online. Therefore, for the success of public services websites, it is important to have an interface that is accessible to a wide range of possible users and usable in a satisfactory way.
Despite the growing involvement of disabled users in usability analysis, there are no studies analyzing the psychometric properties of short scales of satisfaction, or the use of these tools to assess the usability of website interfaces as perceived by a sample of disabled users.

The aim of this paper is to propose a preliminary analysis of the use of the SUS, UMUX, and UMUX-LITE with a small sample of users with and without disability. To reach this aim, we involved two different cohorts (blind and sighted users) in a usability assessment, in order to observe the differences between the two samples in terms of the number of errors experienced by the end-users during navigation and the overall scores of the questionnaires. Moreover, we compared the psychometric properties of the SUS, UMUX, and UMUX-LITE, in terms of reliability and scale correlations, when administered to blind and sighted participants.
2 Methodology
Two evaluation cohorts, composed of 10 blind-from-birth users (Age: M = 23.51; SD = 3.12) and 10 sighted users (Age: M = 27.88; SD = 5.63), were enrolled through advertisements among associations of disabled users and among the students of the University of Perugia, Italy. Each participant was asked to perform the following three tasks, presented as scenarios, on the website of the Italian public train company (http://www.trenitalia.it):

– Find and buy online a train ticket from Milan Central station to Rome Termini station.
– Find online and print the location of info-points and ticket offices at the train station of Perugia.
– Use the online claim form to report a problem with a train service.

Participants were asked to verbalize their problems aloud during navigation. In particular, sighted users were tested through a concurrent thinking-aloud protocol, while blind users were tested through a partial concurrent thinking-aloud protocol [7].
After the navigation, each participant filled in the Italian validated versions [14] of the three scales, presented in a random order.
2.1 Data Analysis
Descriptive statistics (mean [M], standard deviation [SD]) were computed for each group of participants. An independent t-test was performed to test the differences between the two evaluation cohorts in terms of the overall scores of the three questionnaires. Moreover, Cronbach's α and Pearson correlation analyses were performed to analyze the psychometric properties of the scales when administered to the different end-users. All analyses were performed using IBM® SPSS 22.
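As an illustration of the two psychometric statistics used here (a minimal sketch, not the SPSS implementation used in the study), Cronbach's α and the Pearson correlation can be computed as:

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a questionnaire.

    items: list of k columns, each holding one item's scores across the
    same n respondents. alpha = k/(k-1) * (1 - sum of item variances /
    variance of the respondents' total scores).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient between two score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den
```

Two perfectly covarying items give α = 1, and two linearly related score lists give r = ±1, which is a quick sanity check for either function.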
3 Results
3.1 Usability Problems and User Satisfaction
The two evaluation cohorts identified, between them, a total of 29 problems: blind users experienced 19 usability issues, while sighted users experienced only 10. Of the 29 issues reported by the two cohorts, eight were identified by both blind and sighted users, two only by sighted users, and 11 only by blind users. Therefore, a set of 21 unique usability issues was identified by testing 20 end-users.
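The overlap arithmetic can be checked with a toy set model; the issue labels below are hypothetical, since the paper reports only the counts.

```python
# Counts from the study: 8 shared issues, 11 blind-only, 2 sighted-only.
shared = {f"shared-{i}" for i in range(8)}
blind = shared | {f"blind-{i}" for i in range(11)}     # 19 issues in the blind cohort
sighted = shared | {f"sighted-{i}" for i in range(2)}  # 10 issues in the sighted cohort

all_unique = blind | sighted  # 21 unique issues across the 20 participants
```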
As reported in Table 1, an independent t-test showed that, for each of the questionnaires, there was a significant difference between the overall satisfaction in use experienced by blind and sighted users.

As can be seen in Table 2, while blind users assessed the website as not usable (Grade F), sighted users judged the interface as having an adequate level of usability (Grades from C- to C). By aggregating the two evaluation cohorts, the website would be judged as a product with a low level of usability (Grade F).
3.2 Psychometric Properties of Questionnaires
The Cronbach's α analysis showed that all the questionnaires are reliable when administered to both sighted and blind users (Table 3). Nevertheless, in the specific case of blind users, the reliability of the UMUX is lower than expected (.568).
Table 1. Differences among the SUS, UMUX, and UMUX-LITE administered to blind and sighted users.

Blind vs. sighted users   Degrees of freedom   t       p
SUS                       17                   6.469   .001
UMUX                      17                   4.876   .001
UMUX-LITE                 17                   4.319   .001
As Table 4 shows, all the questionnaires, independently of the evaluation cohort, are strongly correlated (p < .001).
4 Discussion
Table 2 clearly shows that while sighted users judged the website as a fairly usable interface (Grades from C- to C), blind users assessed the product as not usable (Grade F). This distance between the two evaluation cohorts is perhaps due to the fact that blind users experienced 11 problems that the cohort of sighted participants did not encounter. These results indicate that a practitioner who adds a sample of disabled users to an evaluation cohort may drastically change the results of the overall usability assessment, i.e., the average overall scores of the scales (Table 1).
Table 2. Average score, standard deviation (SD), and average aggregated scores of the SUS, UMUX, and UMUX-LITE for blind and sighted users. For each scale, the Curved Grading Scale (CGS) provided by Sauro and Lewis [16] was also used to define the grade of website usability.

            Sighted             Blind               Av. aggregated scores
SUS         67.75 (SD: 20.83)   15.25 (SD: 11.98)   41.5 (SD: 31.6)
            Grade C             Grade F             Grade F
UMUX        62.02 (SD: 17.91)   32.10 (SD: 11.99)   46.27 (SD: 21.21)
            Grade C-            Grade F             Grade F
UMUX-LITE   68.52 (SD: 27.48)   17.54 (SD: 14.24)   41.66 (SD: 34.27)
            Grade C             Grade F             Grade F
Table 4. Correlations among the SUS, UMUX, and UMUX-LITE for both blind and sighted users.

Types of end-users   Scales      SUS      UMUX
Blind                SUS         1        .948**
                     UMUX        .935**   1
                     UMUX-LITE   .948**   .928**
Sighted              SUS         1        .890**
                     UMUX        .890**   1
                     UMUX-LITE   .820**   .937**

**. Correlation is significant at the 0.01 level (2-tailed).
Table 3. Reliability (Cronbach's α) of the SUS, UMUX, and UMUX-LITE for both blind and sighted users.

            Blind   Sighted
SUS         .837    .915
UMUX        .568    .898
UMUX-LITE   .907    .938
The three scales were highly reliable for both cohorts (Cronbach's α > 0.8; Table 3); however, the UMUX showed low reliability when administered to blind users (Cronbach's α = .568). This low reliability of the UMUX was unexpected, considering also that the UMUX-LITE, composed of only the positive items of the UMUX (items 1 and 3), was highly reliable (Table 3). Perhaps the negative items of the UMUX (items 2 and 4) were perceived by blind users as complex or unnecessary questions, or this effect is an artifact of the randomized presentation of the questionnaires to the participants. Finally, for both cohorts, the three scales were strongly correlated (p < .001; see Table 4).
5 Conclusion
Quick and short questionnaires can be reliably used to assess the usability of a website with blind users. All three tools reliably capture the experience of participants with and without disability, offering practitioners a good set of standardized results about the usability of a website.

Although further studies are needed to clarify the reliability of the UMUX when administered to disabled users, our results suggest that the UMUX-LITE and SUS may be applied by practitioners as good scales of satisfaction analysis. The use of these short scales may help practitioners to involve blind participants in their evaluation cohorts and to compare the website experience of people with and without disability. In fact, practitioners may, at minimal cost, administer the SUS and the UMUX or UMUX-LITE to a mixed sample of users, thus obtaining extra value for their report: the divergent perspectives of the disabled users. This extra value is particularly important for websites of public administration and of services, such as public transport, that have to be accessed by a wide range of people with different levels of functioning.
References

1. Finstad, K.: The Usability Metric for User Experience. Interacting with Computers 22, 323–327 (2010)
2. Lewis, J.R., Sauro, J.: The Factor Structure of the System Usability Scale. In: Kurosu, M. (ed.) HCD 2009. LNCS, vol. 5619, pp. 94–103. Springer, Heidelberg (2009)
3. Lewis, J.R., Utesch, B.S., Maher, D.E.: UMUX-LITE: When There's No Time for the SUS. In: Conference on Human Factors in Computing Systems: CHI '13, pp. 2099–2102 (2013)
4. Petrie, H., Hamilton, F., King, N., Pavan, P.: Remote Usability Evaluations with Disabled People. In: SIGCHI Conference on Human Factors in Computing Systems: CHI '06, pp. 1133–1141 (2006)
5. Power, C., Freire, A., Petrie, H., Swallow, D.: Guidelines Are Only Half of the Story: Accessibility Problems Encountered by Blind Users on the Web. In: Conference on Human Factors in Computing Systems: CHI '12, pp. 433 (2012)
6. Rømen, D., Svanæs, D.: Evaluating Web Site Accessibility: Validating the WAI Guidelines through Usability Testing with Disabled Users. In: 5th Nordic Conference on Human-Computer Interaction: Building Bridges: NordiCHI '08, pp. 535–538 (2008)
7. Federici, S., Borsci, S., Stamerra, G.: Web Usability Evaluation with Screen Reader Users: Implementation of the Partial Concurrent Thinking Aloud Technique. Cogn. Process. 11, 263–272 (2010)
8. ISO: ISO 9241-11:1998 Ergonomic Requirements for Office Work with Visual Display Terminals, Part 11: Guidance on Usability. CEN, Brussels, BE (1998)
9. Brooke, J.: SUS: A Quick and Dirty Usability Scale. In: Jordan, P.W., Thomas, B., Weerdmeester, B.A., McClelland, I.L. (eds.) Usability Evaluation in Industry, pp. 189–194. Taylor & Francis, London (1996)
10. Lewis, J.R.: Usability Testing. In: Salvendy, G. (ed.) Handbook of Human Factors and Ergonomics, pp. 1275–1316. John Wiley & Sons, New York (2006)
11. Sauro, J., Lewis, J.R.: When Designing Usability Questionnaires, Does It Hurt to Be Positive? In: Conference on Human Factors in Computing Systems: CHI '11, pp. 2215–2224 (2011)
12. Zviran, M., Glezer, C., Avni, I.: User Satisfaction from Commercial Web Sites: The Effect of Design and Use. Information & Management 43, 157–178 (2006)
13. Bangor, A., Kortum, P.T., Miller, J.T.: An Empirical Evaluation of the System Usability Scale. International Journal of Human-Computer Interaction 24, 574–594 (2008)
14. McLellan, S., Muddimer, A., Peres, S.C.: The Effect of Experience on System Usability Scale Ratings. Journal of Usability Studies 7, 56–67 (2012)
15. Borsci, S., Federici, S., Lauriola, M.: On the Dimensionality of the System Usability Scale (SUS): A Test of Alternative Measurement Models. Cogn. Process. 10, 193–197 (2009)
16. Sauro, J., Lewis, J.R.: Quantifying the User Experience: Practical Statistics for User Research. Morgan Kaufmann, Burlington (2012)
17. Lewis, J.R.: Usability: Lessons Learned and Yet to Be Learned. International Journal of Human-Computer Interaction 30, 663–684 (2014)
18. Kortum, P.T., Bangor, A.: Usability Ratings for Everyday Products Measured with the System Usability Scale. International Journal of Human-Computer Interaction 29, 67–76 (2012)
19. Finstad, K.: Response to Commentaries on "The Usability Metric for User Experience". Interacting with Computers 25, 327–330 (2013)
20. Biswas, P., Langdon, P.: Towards an Inclusive World: A Simulation Tool to Design Interactive Electronic Systems for Elderly and Disabled Users. In: 2011 Annual SRII Global Conference, pp. 73–82 (2011)
21. Borsci, S., Kurosu, M., Federici, S., Mele, M.L.: Computer Systems Experiences of Users with and without Disabilities: An Evaluation Guide for Professionals. CRC Press, Boca Raton, FL (2013)
... Research Implications. First, as with participants without disabilities (Bangor, Kortum, and Miller, 2008), the SUS has been found to be reliable for users with visual disabilities (Borsci, Federici, Mele, and Conti, 2015) and there is little reason to believe this would be different for users with other types of disabilities. ...
... Because the SUS is only diagnostic for usability (Bangor, Kortum, and Miller, 2008) or usability and learnability (Lewis and Sauro, 2009), accessibility issues encountered will factor into how a participant with a disability responds to the SUS statements and thus be an inseparable part of their SUS score. This combined effect has been found specifically with SUS scores (Borsci et al, 2015) as well as for WAMMI (Schmutz, Sonderegger, and Sauer, 2017). Thus, divergence in SUS scores between users with and without disabilities may be one indicator to a researcher about the impact of accessibility barriers encountered (and hopefully identified) during the study. ...
Article
Full-text available
This panel will discuss the System Usability Scale. Panelists all have extensive experience using the SUS within a broad range of contexts: diverse people (e.g., abilities, languages); different types of products; and different testing scenarios. Members of the audience will have the opportunity to ask questions about new research on the validity of the SUS in different environments as well as about lessons learned from practitioners using it to evaluate commercial products. Topics of specific interest to the authors are detailed within this paper.
... (i) Researchers tend to consider a lack of complaints as an indirect measure of the safety and acceptability of tools. However, safety and acceptability should be assessed with consolidated and comparable methodologies to rule out risks in use [37][38][39]. (ii) Satisfaction, intended as a usability metric, is a different construct from acceptability, and these two constructs should be measured separately with available standardized questionnaires [39,40]. ...
... However, safety and acceptability should be assessed with consolidated and comparable methodologies to rule out risks in use [37][38][39]. (ii) Satisfaction, intended as a usability metric, is a different construct from acceptability, and these two constructs should be measured separately with available standardized questionnaires [39,40]. ...
Conference Paper
Full-text available
People with disabilities or special needs can benefit from AI-based conversational agents, which are used in competence training and well-being management. Assessment of the quality of interactions with these chatbots is key to being able to reduce dissatisfaction with them and to understand their potential long-term benefits. This will in turn help to increase adherence to their use, thereby improving the quality of life of the large population of end-users that they are able to serve. We systematically reviewed the literature on methods of assessing the perceived quality of interactions with chatbots, and identified only 15 of 192 papers on this topic that included people with disabilities or special needs in their assessments. The results also highlighted the lack of a shared theoretical framework for assessing the perceived quality of interactions with chatbots. Systematic procedures based on reliable and valid methodologies continue to be needed in this field. The current lack of reliable tools and systematic methods for assessing chatbots for people with disabilities and special needs is concerning, and may lead to unreliable systems entering the market with disruptive consequences for users. Three major conclusions can be drawn from this systematic analysis: (i) researchers should adopt consolidated and comparable methodologies to rule out risks in use; (ii) the constructs of satisfaction and acceptability are different, and should be measured separately; (iii) dedicated tools and methods for assessing the quality of interaction with chatbots should be developed and used to enable the generation of comparable evidence.
... Accordingly, to fully model the perceived experience of a user, practitioners should include a set of repeated objective and subjective measures in their evaluation protocols to enable satisfaction and benefit analysis as a "subjective sum of the interactive experience" [4]. Several standardized tools have been developed to measure satisfaction, realization of benefit and perceived usability of user with and without disabilities [5][6][7][8][9][10][11]. It is also well known that if the UX of a product is assessed at the end of the design process, product changes are much more expensive than if the same evaluation were conducted throughout the development process (i.e., according to a usercentered design, UCD) [5,12]. ...
Chapter
To fully model the perceived experience of a user, practitioners should include a set of repeated objective and subjective measures in their evaluation protocols to enable satisfaction and benefit analysis as a “subjective sum of the interactive experience.” It is also well known that if the UX of a product is assessed at the end of the design process, product changes are much more expensive than if the same evaluation were conducted throughout the development process. In this study, we aim to present how these concepts of UX and UCD inform the process of selecting and assigning assistive technologies (ATs) for people with disabilities (PWD) according to the Matching Person and Technology (MPT) model and assessments. To make technology the solution to the PWD’s needs, the MPT was developed as an international measure evidence-based tool to assess the best match between person and technology, where the user remains the main actor in all the selection, adaptation, and assignment process (user-driven model). The MPT model and tools assume that the characteristics of the person, environment, and technology should be considered as interacting when selecting the most appropriate AT for a particular person’s use. It has demonstrated good qualitative and quantitative psychometric properties for measuring UX, realization of benefit and satisfaction and, therefore, it is a useful resource to help prevent the needs and preferences of the users from being met and can reduce early technology abandonment and the consequent waste of money and energy.
... Reliability: The Cronbach alpha reliability we obtained is 0.84, 95% CI (0.807, 0.870), which is lower than that of the original English version (0.92) but still exceeds the lower threshold of 0.7. This result is in line with other translated versions (Alghannam et al., 2018; Blažica & Lewis, 2015; Borsci, Federici, Mele, & Conti, 2015). ...
Article
Full-text available
The Chinese version of the System Usability Scale (SUS) was re-translated in this study by adding an interview process and a careful modification and selection of translation results. The revised translation closely matches the linguistic usage of native Chinese speakers, without ambiguity. The revised Chinese version of the instrument is shown to be reliable, effective, and sensitive. We also conducted a within-group comparative study to confirm that the reliability of the cross-culturally adapted version is higher than that of the original version. The questionnaire provides a tested tool for Chinese-language users to help practitioners complete usability assessments.
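The Cronbach alpha reported in the citing context above is a standard internal-consistency statistic. As a reference point, here is a minimal Python sketch of its textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of totals); the data in the example are purely illustrative:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a k-item scale.

    `items` is a list of k lists, each holding one item's scores
    across the same n respondents.
    """
    k = len(items)
    n = len(items[0])

    def var(xs):
        # sample variance with ddof = 1
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum_item_vars / var(totals))

# Two perfectly correlated illustrative items across four respondents
print(cronbach_alpha([[1, 2, 3, 4], [2, 4, 6, 8]]))
```

Values above roughly 0.7, as the snippet above notes, are the conventional threshold for acceptable reliability.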
... The protocol recommends using at least one of three usability assessment questionnaires: (i) the SUS [5,6], (ii) the Us.E. 2.0 questionnaire [7], and (iii) the Usability Metric for User Experience, lite version (UMUX-LITE) [8][9][10]. The second part of eGLU 2.0 involves several in-depth analyses of and extensions to the basic procedure. ...
Chapter
Full-text available
Since 2012, usability testing in Italian public administration (PA) has been guided by the eGLU 2.1 technical protocols, which provide a set of principles and procedures to support specialized usability assessments in a controlled and predictable way. This paper describes a new support tool for usability testing that aims to facilitate the application of eGLU 2.1 and the design of its User eXperience (UX) evaluation methodology. The usability evaluation tool described in this paper is called UTAssistant (Usability Tool Assistant). UTAssistant has been entirely developed as a Web platform, supporting evaluators in designing usability tests, analyzing the data gathered during the test and aiding Web users step-by-step to complete the tasks required by an evaluator. It also provides a library of questionnaires to be administered to Web users at the end of the usability test. The UX evaluation methodology adopted to assess the UTAssistant platform uses both standard and new bio-behavioral evaluation methods. From a technological point of view, UTAssistant is an important step forward in the assessment of Web services in PA, fostering a standardized procedure for usability testing without requiring dedicated devices, unlike existing software and platforms for usability testing.
Article
Full-text available
A new Usability Metric for User Experience (UMUX) is translated and validated for native Chinese speakers in this study. The forward-backward translation method is applied to translate the UMUX. The results are optimized through structure-back interviews to obtain a final, unambiguous UMUX version. The Chinese version of the UMUX questionnaire is proven to have high reliability, sensitivity, and effectiveness. The correspondence between the Chinese UMUX version, UMUX-LITE, and SUS is also investigated. The results of this work may provide Chinese usability practitioners with a standardized scale after rigorous testing, as well as a closer understanding of the relationship between the three questionnaires in assisting users to make sound choices.
Chapter
The advancement and diffusion of Web technology demand that information be of high quality and easily accessible. These requirements have actively engaged the Human-Computer Interaction (HCI) research community in ensuring that IT interfaces can be used on an equal basis by users with disabilities and older users. It is therefore essential that an interface be easy to use and meet the expectations and needs of all users, which confirms the relevance of the interface as the main element of user interaction with information systems. Developing interfaces that satisfy users with different needs, drawing on their motor, perceptual, cultural, and social skills, is not a simple task. According to several experts and pioneers in HCI, interfaces must be built following the principles of user-centered design, with a high level of usability and in compliance with basic accessibility guidelines, so that all aspects of the user experience (UX) in such environments are considered and covered. Interfaces thus remain an important research topic, worth exploring for their potential to create accessible, personalized interactions. That is the aim of this work, which proposes a methodology for assessing the accessibility of dynamic websites that can be applied throughout or at the end of the development phase. The evaluation is based on an open document made available by the World Wide Web Consortium (W3C) regarding accessibility guidelines, a standard intended to ensure the long-term growth of the Web through the Web Accessibility Initiative (WAI).
Conference Paper
Full-text available
This work aims to assess the degree of satisfaction of students of the ‘Contenidos 1’ course of the Master’s in Teacher Training at the UIB during their introduction to the Pearltrees application. It is a descriptive quantitative study. The data-collection technique chosen is a survey based on the System Usability Scale (SUS) usability questionnaire, Brooke (1996). This questionnaire measures the degree of user satisfaction according to ISO 9241-11. The non-probabilistic sample comprises 19 students out of a population of 25 enrolled. The results show an overall mean score of 78.81 points, corresponding to a notable degree of satisfaction, grade ‘B+’ (Lewis & Sauro, 2016). Regarding the gender variable, the results indicate that the degree of satisfaction also falls within the ‘notable-high’ range (men 84.58, grade ‘A+’; women 76.15, grade ‘B+’). The incorporation of the Pearltrees application as a content management and curation tool is valued positively by the students, and it proves useful both personally and for incorporation into their teaching duties.
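The mean of 78.81 reported above comes from the standard SUS scoring rule: odd-numbered items are positively worded and contribute (answer - 1), even-numbered items are negatively worded and contribute (5 - answer), and the summed contributions (0-40) are scaled by 2.5 onto a 0-100 range. A minimal sketch for one respondent follows; note that the letter grades (‘A+’, ‘B+’) come from Sauro and Lewis’s curved grading scale, not from the formula itself:

```python
def sus_score(responses):
    """Standard SUS score (0-100) for one respondent.

    `responses` holds the ten 1-5 Likert answers in questionnaire order.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    contrib = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based: even i = odd item
        for i, r in enumerate(responses)
    )
    return contrib * 2.5

# A maximally favorable respondent scores 100
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```

A study-level SUS score is then the mean of the per-respondent scores.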
Article
Full-text available
Longitudinal studies have to do with testing over time and thus take into consideration previous user experience with a product or product versions. However, it is difficult to conduct these types of studies. Therefore the literature is sparse on examples of the explicit effect of user experience on user satisfaction metrics in industry-standard survey instruments. During a development experience in 2009, we used a cross-sectional method to look at the effects of user profiles on ratings for commercial products that use one such instrument, the System Usability Scale or SUS. Recent research has reported that differences in user ratings could be based on the extent of a user's prior experience with the computer system, a Web site being visited, or a desktop application like Microsoft's Office suite being used. Compared to off-the-shelf office products or personal Web applications, we were curious whether we would find the same experience effect for domain specialists using geosciences products in the course of their daily professional job roles. In fact, from data collected with 262 end users across different geographic locations testing two related oilfield product releases, one Web-based and one desktop-based, we found results quite close to early assessment studies: users with more extensive experience with a product tended to provide higher, more favorable SUS scores than users with either no or limited experience, by as much as 15-16%, regardless of the domain product type. This and other observations found during our product testing have led us to offer some practical how-to's to our internal product analysts responsible for managing product test cycles, administering instruments like the SUS to users, and reporting results to development teams.
Conference Paper
Full-text available
In this paper we present the UMUX-LITE, a two-item questionnaire based on the Usability Metric for User Experience (UMUX) [6]. The UMUX-LITE items are "This system's capabilities meet my requirements" and "This system is easy to use." Data from two independent surveys demonstrated adequate psychometric quality of the questionnaire. Estimates of reliability were .82 and .83 -- excellent for a two-item instrument. Concurrent validity was also high, with significant correlation with the SUS (.81, .81) and with likelihood-to-recommend (LTR) scores (.74, .73). The scores were sensitive to respondents' frequency-of-use. UMUX-LITE score means were slightly lower than those for the SUS, but easily adjusted using linear regression to match the SUS scores. Due to its parsimony (two items), reliability, validity, structural basis (usefulness and usability) and, after applying the corrective regression formula, its correspondence to SUS scores, the UMUX-LITE appears to be a promising alternative to the SUS when it is not desirable to use a 10-item instrument.
Book
Full-text available
The book we propose is not only a classic handbook or a practical guide for evaluation practictioners that presents and discusses one or a set of evaluation techniques for assessing diferent aspects of interaction. Our proposal is at first a new theoretical perspective in the human computer interaction evaluation that aims to integrate, in a multisteps evaluation process, more techniques for obtaining a whole assessment of interaction. Our theorical perspective is supported by an historical and experimental argumentation. Secondary our book by merging a user center perspective with the idea of user experience and with the growing need of disabled users partecipation in the evalaution and in the improvment of the HCI, proposes a reconceptualization of the web, social and portable tecnologies in a new category the “psychotecnologies” with specific properties. The integrated methodology of intercation evalaution is proposed as a framework for practictioners in order to evaluate all the aspects of the interaction from the accessibility (i.e. the more obejective point of view) to the staisfaction (i.e. the most subjective poitn of view). The evalaution techniques we analyse and the evaluation tools we propose in the book are supported by experimental exemplifications and are correlated to their application in the integrated methodology. Our goal is not only to presents the correct application of the techniques, but also to promote a standard evaluation process in which disabled and not disabled peoples are involved in the assessment.
Book
You are being asked to quantify your usability improvements with statistics. But even with a background in statistics, you may be hesitant to analyze your data statistically, often unsure which statistical tests to use and having trouble defending the use of small test sample sizes. This book is a practical guide to solving the common quantitative problems that arise in usability testing with statistics. It addresses common questions you face every day, such as: Is the current product more usable than our competition? Can we be sure at least 70% of users can complete the task on the first attempt? How long will it take users to purchase products on the website? This book shows you which test to use and provides a foundation for both the statistical theory and best practices in applying it. The authors draw on decades of statistical literature from human factors, industrial engineering, and psychology, as well as their own published research, to provide the best solutions. They provide concrete solutions (Excel formulas, links to their own web calculators) along with an engaging discussion of the statistical reasons why the tests work and how to communicate the results effectively. *Provides practical guidance on solving usability testing problems with statistics for any project, including those using Six Sigma practices *Shows practitioners which test to use and why it works, along with best practices in application and easy-to-use Excel formulas and web calculators for analyzing data *Recommends ways for practitioners to communicate results to stakeholders in plain English. © 2012 Jeff Sauro and James R. Lewis. Published by Elsevier Inc. All rights reserved.
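Questions like "Can we be sure at least 70% of users can complete the task on the first attempt?" reduce to an exact binomial calculation: how likely is the observed number of completions if the true rate were only 70%? A minimal, illustrative sketch (not the book's own code):

```python
from math import comb

def prob_at_least(successes, n, p):
    """P(X >= successes) for X ~ Binomial(n, p).

    Interpreted as the chance of seeing this many or more task
    completions in n attempts if the true completion rate were p.
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

# 10 of 10 users completed the task; if the true rate were 0.70,
# this outcome would occur with probability 0.7**10 (about 0.028),
# so a rate above 70% is plausible at the usual 0.05 level.
print(prob_at_least(10, 10, 0.70))
```

With the small samples typical of usability tests, exact calculations like this are preferable to normal approximations.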
Article
The Usability Metric for User Experience (UMUX) is a four-item Likert scale aimed at replicating the psychometric properties of the System Usability Scale (SUS) in a more compact form. As part of a special issue of the journal Interacting with Computers, the UMUX is being examined in terms of purpose, reliability, validity and structure. This response to commentaries addresses concerns with these issues through updated archival research, deeper analysis on the original data and some updated results with an average-scoring system. The new results show the UMUX performs as expected for a wide range of systems and consists of one underlying usability factor.
Article
The philosopher of science J. W. Grove (1989) once wrote, “There is, of course, nothing strange or scandalous about divisions of opinion among scientists. This is a condition for scientific progress” (p. 133). Over the past 30 years, usability, both as a practice and as an emerging science, has had its share of controversies. It has inherited some from its early roots in experimental psychology, measurement, and statistics. Others have emerged as the field of usability has matured and extended into user-centered design and user experience. In many ways, a field of inquiry is shaped by its controversies. This article reviews some of the persistent controversies in the field of usability, starting with their history, then assessing their current status from the perspective of a pragmatic practitioner. Put another way: Over the past three decades, what are some of the key lessons we have learned, and what remains to be learned? Some of the key lessons learned are:
• When discussing usability, it is important to distinguish between the goals and practices of summative and formative usability.
• There is compelling rational and empirical support for the practice of iterative formative usability testing—it appears to be effective in improving both objective and perceived usability.
• When conducting usability studies, practitioners should use one of the currently available standardized usability questionnaires.
• Because “magic number” rules of thumb for sample size requirements for usability tests are optimal only under very specific conditions, practitioners should use the tools that are available to guide sample size estimation rather than relying on “magic numbers.”
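The "magic number" debate mentioned above usually concerns the classic problem-discovery model 1 - (1 - p)^n, where p is the per-participant probability of encountering a given problem. Rather than assuming a fixed headcount, the model can be solved for n under an assumed p and discovery goal; a minimal sketch:

```python
from math import ceil, log

def discovery_sample_size(p, goal=0.85):
    """Smallest n with 1 - (1 - p)**n >= goal.

    That is, the number of test participants needed to observe a
    problem of per-user probability p at least once with the stated
    level of confidence.  p and goal must lie strictly in (0, 1).
    """
    return ceil(log(1 - goal) / log(1 - p))

# For a problem half the users would hit, a 90% discovery goal
# already needs 4 participants; rarer problems need many more.
print(discovery_sample_size(0.5, goal=0.9))  # 4
```

The sensitivity of n to the assumed p is exactly why fixed rules of thumb are optimal only under very specific conditions.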
Article
This paper characterizes the usability of 14 common, everyday products using the System Usability Scale (SUS). Over 1,000 users were queried about the usability of these products using an online survey methodology. The study employed two novel applications of the SUS. First, participants were not asked to perform specific tasks on these products before rating their usability, but were rather asked to assess usability based on their overall integrated experience with a given product. Second, some of the evaluated products were assessed as a class of products (e.g. ‘microwaves’) rather than a specific make and model, as is typically done. The results show clear distinctions among different products and will provide practitioners and researchers with important known benchmarks as they seek to characterize and describe results from their own usability studies.