International Journal of Human–Computer Interaction (Taylor & Francis)
ISSN: 1044-7318 (Print) 1532-7590 (Online)
To cite this article: Yuhui Wang, Tian Lei & Xinxiong Liu (2019): Chinese System Usability Scale: Translation, Revision, Psychological Measurement, International Journal of Human–Computer Interaction, DOI: 10.1080/10447318.2019.1700644
Published online: 10 Dec 2019.
Chinese System Usability Scale: Translation, Revision, Psychological Measurement
Yuhui Wang (a), Tian Lei (b), and Xinxiong Liu (a)
(a) Industrial Design Department, Huazhong University of Science and Technology, Wuhan, China
(b) School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
ABSTRACT
In this study, the Chinese version of the System Usability Scale (SUS) was re-translated with the addition of an interview process and the modification and selection of strict translation results. The revised translation accords closely with the linguistic usage of native Chinese speakers and is free of ambiguity. Psychometric measurement shows the revised Chinese version to be reliable, valid, and sensitive. We also conducted a within-group comparative study to confirm that the reliability of the cross-culturally adapted version is higher than that of the original version. The questionnaire provides a tested tool to help practitioners complete usability assessments with Chinese-language users.
1. Introduction
Usability research may involve inherent, performance-based, or perceived assessments. Perceived usability is the user's direct, self-reported sense of a given system's intra-task coherence, efficiency, organization, user-friendliness, and immediacy (Mcgee, Rich, & Dumas, 2004). Perceived usability is tested to judge the user's intuitive experience of the system (Park, Han, Kim, Cho, & Park, 2013; Tullis & Albert, 2008; Vermeeren et al., 2010). This is very important, because users are more likely to recommend products to others based on intuitive feelings. Higher perceived usability can enhance customer loyalty (Flavián, Guinalíu, & Gurrea, 2006; Sauro, 2010).
Measures of perceived usability typically center on subjective questionnaires (Yang, Linder, & Bolchini, 2012). Usability information and ratings can be obtained through questionnaire scores. Questionnaires used for this purpose include the USE, QUIS, UMUX, PUTQ, and SUS, among which SUS is the most commonly used (Assila, De Oliveira, & Ezzedine, 2016; Lewis, 2018b). SUS contains only 10 items, which can be evaluated quickly (Brooke, 1996). Versions of SUS have been widely utilized to test perceived usability (Brooke, 2013; Lewis, 2018b; Sauro & Lewis, 2009); the questionnaire also provides high reliability (0.91) and sensitivity (Blažica & Lewis, 2015), is free to use, and yields good test results with small samples (Tullis & Stetson, 2004). Everyday products (Kortum & Bangor, 2013), all manner of systems (Konstantina, Nikolaos, & Christos, 2015), mobile applications (Kortum & Sorber, 2015), and web sites (Flavián et al., 2006) have been tested for usability with SUS.
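As background for the analyses that follow, each SUS administration is reduced to a single 0-100 score. A minimal sketch of the standard Brooke (1996) scoring rule (the function name is ours, not from this paper):

```python
def sus_score(responses):
    """Compute a single SUS score from ten 1-5 Likert responses.

    Standard Brooke (1996) scoring: odd-numbered (positive-tone) items
    contribute (r - 1), even-numbered items contribute (5 - r); the sum
    of contributions (0-40) is scaled by 2.5 onto a 0-100 range.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses in the range 1-5")
    total = sum(r - 1 if i % 2 == 0 else 5 - r  # i is 0-based, so even i = odd item
                for i, r in enumerate(responses))
    return total * 2.5

# A respondent answering 4 on every odd item and 2 on every even item:
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # -> 75.0
```

Per-respondent scores computed this way are what the reliability, sensitivity, and validity statistics reported later operate on.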
SUS is commonly deployed for usability evaluations in China, from the evaluation of medical equipment (Yan, Wang, Liu, & Wu, 2012) to health monitoring systems (Jia et al., 2013), ATM interfaces (Wang & Lv, 2017), and in-vehicle information systems (Li, Chen, Sha, & Lu, 2017). In these studies, the SUS was either directly translated or distributed to participants without psychometric evaluation. Scales that are not tested or evaluated by psychological measures may yield inaccurate measurements. A cross-cultural questionnaire translation may also impact the results (Finstad, 2006). The SUS version currently used in mainland China is based on a published Chinese version of "Quantifying the User Experience" (Sauro & Lewis, 2014). In many unpublished studies and interviews with IT companies in China, this version was shown to reflect certain items poorly and to represent notable cross-cultural reading differences. In this study, we re-translated the Chinese SUS version to improve its adaptability and localization.
2. Related work
2.1. Previous SUS translations
SUS, the most widely used tool for measuring perceived usability, should not be limited to those who are fluent in English (Lewis, 2018b). The original SUS may not be suitable in multicultural environments, as non-native English speakers may interpret it differently (Finstad, 2006). To allow for wider usage without sacrificing consistent evaluation results, translation of the questionnaire into local languages is essential. The questionnaire can be roughly translated, but the translation must be accompanied by psychological measurement and localization to ensure it is properly understood. The SUS has already been
strictly translated into Arabic (Alghannam, Albustan, Al-
Hassan, & Albustan, 2018), Slovene (Blažica & Lewis, 2015),
Polish (Borkowska & Jach, 2017), Portuguese (Martinsa, Rosa,
Queirós, & Silva, 2015), Italian (Borsci, Federici, Bacci, Gnaldi, &
Bartolucci, 2015), Malay (Mohamad, Yaacob, & Yaacob, 2018), and other languages. These translations extend SUS to non-English-speaking users and include effective psychological measurement techniques.
Compared to the original English version of SUS (Table 1), certain items are typically modified to accommodate responders from different cultural backgrounds. In existing SUS translations, it is common to substitute words without changing their meanings. The word "cumbersome" in Item 8, for example, can cause confusion in certain translations and may be better replaced by "awkward" (Bangor, Kortum, & Miller, 2008; Finstad, 2006). In the Polish translation (Borkowska & Jach, 2017), "awkward" is replaced with "inconvenient". In the Arabic translation (Alghannam et al., 2018), it is replaced with "strange". In one Chinese version (Sheu, Fu, & Shih, 2017), the translation is "a little troublesome to use". Item 6 does not readily translate into multiple languages, either (Borkowska & Jach, 2017).
Psychological measurement of questionnaires should generally include reliability, construct validity, and sensitivity considerations (Sauro & Lewis, 2011). Translations must ensure appropriate psychological measures. The Cronbach alpha coefficient of existing translations is generally above 0.8, which is higher than the lower limit of 0.7 and usually lower than that of the English version. Construct validity also must be tested to determine the consistency of the questionnaire's underlying item structure. There are two potential factors, ease of learning and usability, which can be measured for this purpose; however, the current versions of SUS are not strictly consistent in this regard, i.e., certain items may be biased. "Sensitivity" mainly refers to an appropriate change in questionnaire response when participants' usage frequency or amount of use of the system has changed (Alghannam et al., 2018; Blažica & Lewis, 2015). It is usually assessed via t-/F-test (Alghannam et al., 2018; Bangor et al., 2008). If the significance level is below the critical value of 0.05, then the questionnaire is sensitive. The translated versions of the existing SUS are generally sensitive (Lewis, 2018b). Tests of concurrent validity have consistently shown that translated versions of the SUS adequately correlate (r > 0.30) with measures such as likelihood-to-recommend, CSUQ, PSSUQ, and UMUX-LITE (Lewis, 2018b).
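Because reliability reporting here centers on the Cronbach alpha coefficient, a compact reference implementation may help readers reproduce the 0.8-plus values quoted above. This is a generic sketch (function name ours), not the analysis script used by any of the cited studies:

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of questionnaire items.

    item_scores: one list of respondent scores per item (all the same length).
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
    """
    k = len(item_scores)
    n = len(item_scores[0])
    totals = [sum(col[j] for col in item_scores) for j in range(n)]
    item_var = sum(variance(col) for col in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Two perfectly correlated items give the maximum alpha of 1.0:
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # -> 1.0
```

Values above 0.7 are conventionally taken as acceptable internal consistency, matching the threshold cited in the paragraph above.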
Although SUS was once considered to have a two-factor structure of "Learnability" and "Usability" (Borsci, Federici, & Lauriola, 2009), more recent scholars have not replicated this finding (Lewis, 2018b). Lewis and Sauro (2017b) recommended treating SUS as a single dimension of perceived usability after completing a large-sample analysis.
To sum up, although some items are difficult to translate in the process of SUS localization, psychological measurements have proven that the translated versions in various languages are reliable and suitable for native-language users. The reliability of the Chinese translation is acceptable.
2.2. Necessity for Chinese re-translation
Usability research began later in China than in its international counterparts, but it has grown very rapidly in recent years (Lei, Xu, Meng, Zhang, & Gong, 2014; Wang, 2003) and continues to produce meaningful results (Gao et al., 2013; Li & Li, 2011; Liu, 2014; Liu, Zhang, Zhang, & Chen, 2011). SUS, as discussed above, is the most widely used usability evaluation questionnaire. Though SUS has been directly translated into Chinese, there is as yet a lack of accompanying psychological measurement, and sizable obstacles to understanding remain. Users of different cultures may evaluate usability differently (Herman, 1996; Noiwan & Norcio, 2006; Rajanen et al., 2017). Several unpublished studies report issues with the understanding of items on the current Chinese SUS version; in other words, the questionnaire does not apply without modifications to localize it. Such modifications are lacking from several key versions of the questionnaire that have been directly translated (Sauro & Lewis, 2014; Sheu et al., 2017; Tullis & Albert, 2008).
User testing is necessary after forward-back translation to judge questionnaire efficacy, as translation experts are usually not users. This link is very important, as it can reveal hidden issues with the translation, help participants better understand the questionnaire, and provide an important reference for subsequent revision of the translated version. Sharfina and Santoso (2017) also pointed out that the literal translation of an original version is insufficient without further cultural adjustments. Blažica and Lewis (2015), Sharfina and Santoso (2017), and Mohamad et al. (2018) reached similar conclusions. The Chinese SUS version currently in use is a temporary translation that has not been localized. Certain items are open to misunderstanding even after the forward-back translation is complete.
Table 1. Forward translation results; item 8 contains two translations.
The original version of SUS Forward translation results
1. I think that I would like to use this system frequently. 1 使.
2. I found the system unnecessarily complex. 2. .
3. I thought the system was easy to use. 3. 使.
4. I think that I would need the support of a technical person to be able to use this system. 4. 使.
5. I found the various functions in this system were well integrated. 5. .
6. I thought there was too much inconsistency in this system. 6. 不一.
7. I would imagine that most people would learn to use this system very quickly. 7. 大多很快会使.
8. I found the system very awkward to use. 8. 使.
使.
9. I felt very confident using the system. 9. 使,.
10. I needed to learn a lot of things before I could get going with this system. 10. 使,.
2.3. Cross-cultural adaptation and methods
Cultural differences can affect the accuracy of instrument translation. We cannot assume that a particular concept has the same relevance across various cultures. Intercultural translation questionnaires are rife with disparity among spoken phrases, word clarity, and word meaning. Simple, verbatim translation does not sufficiently reflect cultural and linguistic differences (Hilton & Skrutkowski, 2002).
Construct bias, method bias, and item bias are known to be caused by cultural differences in cross-cultural research on various instruments. Among them, construct bias and item bias are the factors most significantly affecting instrument translation quality (Van de Vijver & Leung, 1997; Van de Vijver & Poortinga, 1997). "Construct bias" refers to inconsistency in a group's internal construct of a concept (Van de Vijver & Rothmann, 2004). The emergence of such bias may be related to the different cultural backgrounds of different groups, but current research on SUS translation does not account for this. Lewis and Sauro (2017b) studied 9156 samples and posited only one factor structure for SUS, but did not report whether the samples were from English-speaking countries or countries with other cultural backgrounds.
Because perceived usability testing has some commonly
used specifications and testing methods (Lewis, 2006; Rubin
& Chisnell, 2008), and the measurement objects do not
change due to cross-cultural differences, there is almost no
methodological bias in SUS. "Item bias" refers to the fact that individuals of various cultural groups do not score the same single item in the same way (Shepard, Camilli, & Averill, 1981). The occurrence of item deviation is closely related to culture-specific content in an item that is unfamiliar to respondents (Vijver & Tanzer, 2004). For example, Finstad (2006) found that non-native English speakers do not readily understand "cumbersome" but are more familiar with "awkward".
In the process of actual translation, certain instruments
must be cross-culturally adapted to mitigate the bias caused
by cultural differences. Ideally, a preliminary qualitative study
with participants from the target culture will be completed.
The item needs to be understood in parallel terminology in
the target language and the phrasing must be fully clear
(Hilton & Skrutkowski, 2002). In the process of translating
the CSUQ into Turkish (Erdinç & Lewis, 2013), the translator
assessed the semantics, idioms, and concepts of the items
while altering the expression of certain verbs and forms.
They modified a total of 14 items. In a translation of the
PSSUQ into Portuguese (Rosa et al., 2015), the translator
also carried out a cultural adaptation process to ensure the
Portuguese and English versions were equal in semantics and
content.
Pretests or interviews with specific groups can also be useful in terms of translation accuracy (Dianat, Ghanbari, & Asgharijafarabadi, 2014; Sharfina & Santoso, 2017). Questionnaires beyond usability assessment have been treated similarly. For example, in the process of translating the DASH 17+ questionnaire (Cardoso & Capellini, 2018), the translators altered two options, completed a pretest, asked the students to report their understanding of the questionnaire, and asked the participants themselves to help modify the sentences.
Pretests or interviews can also be used to optimize the
expert translations of other scales to Chinese versions. Chen,
Hao, Feng, Zhang, and Huang (2011) selected 20 families for
a pretest on expert translations of the PEDSQL survey; the participants gave feedback including suggestions to revise the scale translations. Pei, Xia, and Yan (2010) completed a pretest via short interviews with 15 participants when translating the FABQ to Chinese. In the process of translating the HAGOS score (Cao et al., 2018), 20 participants also completed a pretest on versions translated by experts. After absorbing their feedback, the authors formed the final Chinese version.
In these studies, pretesting was completed through interviews with user groups to optimize expert translations. The purpose was to eliminate or reduce the bias caused by different levels of knowledge between the measured group and the translation team. However, among the various language versions of SUS, apart from the Indonesian and Persian versions, no other team has completed a pretest in its translation process.
In this study, similar to the published versions, we adopted a forward-back SUS translation process. To better suit native Chinese-speaking participants, we added a structured interview: participants were asked to report the obstacles they encountered in reading the translated version, and we adjusted the translation according to their feedback. After the back-interview, the participants finally chose a translation result that was built into the formal version of the translation.
3. Translation methodology design
The forward-back translation method we used in this study
was also used by Alghannam et al. (2018), Blažica and Lewis
(2015), Borkowska and Jach (2017), and Sharfina and
Santoso (2017).
3.1. Forward translation
We used forward translation to put the original English SUS into Chinese. We adopted the Finstad (2006) version (for non-native speakers), wherein Item 8 uses "awkward". The forward translators were five native Chinese speakers, all of whom have passed TEM-8 (the highest-level English language test given in mainland China). Each person translated the questionnaire independently, then discussed and optimized their work with Chinese language Ph.D. holders. If the majority of the translators provided the same results, their wording was used directly. If there was a substantial difference, the two majority translations were retained (Table 1).
After confirmation by Chinese language experts, we found no difference between Item 8 in our version versus two published Chinese versions (Sauro & Lewis, 2014; Sheu et al., 2017). The verbs (think) and (believe) share similar meaning in Chinese, and translations of certain items with different word orders are the same; for example, "I think most people can use the system quickly by learning" and "I think most users can learn to use the system quickly" have the same meaning.
We found differences among the forward translations for Items 5, 6, and 8. On Item 6, three Chinese translators suggested "inconsistency" while two suggested (contradiction) and (instability). The published version uses "inconsistency". On Item 5, unlike the published version, all translators suggested one expression rather than the other. Their back translations were basically the same, but in Chinese, the first expression is more formal (and accurate) while the second is more colloquial.
Item 8 proved very tricky to translate. Three versions use a phrase meaning "awkward" and two use one meaning "very difficult". Another published version uses "trouble" and yet another "troublesome". This confirms a previous assertion (Bangor et al., 2008; Finstad, 2006) that there are notable differences among non-native English speakers for this item's translation. Three of the translators suggested a word which means "strange" or "obstructed" in Chinese.
3.2. Back translation
Most of our translators suggested that Item 6 retain the phrasing 不一 in the back-translated version and that Item 5 use the chosen expression. Item 8 was translated with two different words, so we provided both back translations to another eight independent translators (also holding TEM-8 certification), whom we divided into two groups. The two groups returned basically the same results, again with the exception of Item 8. In the first group, all four translators suggested "awkward". In the second group, none used the word "awkward" at all. This marks an important departure from previous studies on the Chinese SUS translation, as this Chinese word has never appeared in any published version. The preliminary translation we established after this process is shown in Table 2.
3.3. Structured interviews
The forward translation reported here was obtained by translation experts. It remained unclear whether native Chinese speakers could understand the meaning of the items or whether there might be ambiguity in the translation. We conducted preliminary testing similarly to Finstad (2006) and Sharfina and Santoso (2017), but in a slightly different two-part process. The first part was a structured interview regarding the translated results, wherein a moderator asked the participants to freely report their level of understanding of each item, including any ambiguous items. In the second part, we modified the questionnaire according to the opinions provided by the participants in the first part. We listed several options for any ambiguous item and asked the participants to select the one they felt was most appropriate.
Participants: We selected 31 participants, including professors and graduate students from the School of Industrial Design and employees of local IT companies. The participants had completed multiple perceived usability assessments prior to this study in UMUX, USE, and SUS (directly translated Chinese version) formats. All were aged 20-46 years at the time of the study and all were native Chinese speakers.
Research methods and measures: The purpose of the
survey was to determine how well native Chinese speakers
understand the preliminary SUS translation described above.
Participants were required to report whether they understood
each item and to explain any items they did not understand.
Procedures: Moderators distributed the preliminary Chinese SUS translation to the participants, explaining that it was a newly translated questionnaire to evaluate perceived usability for native Chinese speakers. The moderator asked each participant to read each item aloud before he or she provided a response. Responses included understanding, incomprehension, or ambiguity. Participant responses were recorded when they contained any self-reported misunderstanding.
Somewhat surprisingly, almost all the participants stated that they did not understand or had doubts regarding the translation of Item 6. About half of the participants stated that Item 9 contained ambiguity. The results are shown in Table 3. Chinese language experts suggested that the participants did not understand the directivity of the phrase "inconsistency". On Item 9, the basis for feeling "confident" was unclear to the participants; the original English SUS version does not explain this phrasing. The moderator asked the participants to report further opinions on ambiguous items, including the phrases (i.e., content) that they found problematic. The frequency of problem-content occurrence on Items 6 and 9 is shown in Table 4.
3.4. Modification of preliminary translation according to interview results
We considered the correlation between SUS items (Bangor
et al., 2008) as discussed in earlier studies (Sauro & Lewis,
2011). Based on a large sample, Lewis and Sauro (2017b)
found that SUS is most suitable as a single dimension.
Although the factor loadings of Items 4 and 10 are larger than those of other items, so far there is no evidence that they belong to a second factor. Since there are no esthetic
Table 2. Preliminary Chinese SUS translation.
1使
2
3使
4使
5
6不一
7大多很快会使
8使
9使,
10 使,
Table 3. Participant-reported frequency statistics (items not understood or ambiguous).
Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item 6 | Item 7 | Item 8 | Item 9 | Item 10
0 | 0 | 0 | 0 | 1 | 26 | 0 | 1 | 15 | 0
or visual factors encapsulated in the items, we accordingly abandoned "inconsistencies" in esthetics and interface style in the presentation of Item 6. According to the statistical results, we revised Items 6 and 9, as listed in Table 5.
3.5. Back-interview
Twenty-seven of our 31 participants followed up within one day. They were first asked to read all the alternative options and compare them with the preliminary translation results, then to report whether they found any ambiguity. The participants then reported the expression they thought would best replace Items 6 and 9. Twenty-five of the participants reported no ambiguity, while the other two reported misunderstandings; these were attributable to above-average reading speed, so their results were eliminated after a simple explanation. The frequency results are shown in Table 6.
We built the options with the highest selection frequency into the final Chinese SUS translation. This final version is shown alongside the original English version in Table 7.
4. Experiment I
4.1. Participants
We recruited 217 native Chinese-speaking participants, 116 male and 101 female, via a crowdsourcing platform. The participants ranged in age from 19 to 42 at the time of their participation. We added an educational background restriction (Bachelor's degree or above) to ensure a consistent base-level understanding of Chinese.
4.2. Experimental design
We used assessment methods provided by Kortum and
Bangor (2013) and Blažica and Lewis (2015). We also used
a daily product for online usability testing.
4.3. Method
The questionnaire was completed online. Once the participant
was recruited, he or she scanned a QR code in WeChat and
was directed to the test page. The participants then filled in
the questionnaire online according to the given instructions.
4.4. Materials
The participants evaluated the usability of the Jing Dong (JD)
mobile phone application (app). The mobile JD app, a shopping
platform widely used in China, had a monthly active volume of
about 500 million in 2019. In mainland China, the Taobao
shopping app has the most users of any similar application at
1.01 billion. We selected the JD app as the evaluation object to
prevent ceiling effect (e.g., where the participants may be very
Table 4. Frequency of problem-content occurrence and examples for Items 6 and 9.
Participants' opinion | Frequency | Example
Item 6
Visual expression of interface | 8 | "I think what I see may not be what I want to see"
Esthetic feeling | 10 | "The interface is not nice. For example, I think the button is this function, but actually it is another function"
Function/Expected functions | 19 | "Maybe my expected function is different from the actual result"; "The function doesn't seem to work the way I thought it would"
Expected results | 21 | "I think it is the result I want, but it is different"; "What I expected was different from what I actually saw"
Operation/Operating results | 20 | "When I clicked, I saw something"; "I think the results of the operation I have seen are not what I wanted"
Item 9
Operation results | 6 | "I think it's the satisfaction of the operation that creates confidence"
Master the system/use it skillfully | 13 | "Only when I can use it skillfully can I have confidence"; "I feel confident in my ability to master it"
Achieve a specific goal | 14 | "Maybe I have completed a task before I feel confident"
Proficiency | 17 | "I felt confident after I became very proficient with the system"
Table 5. Participant-informed alternative options for Items 6 and 9.
Item 6
1 I think there is a lot of inconsistency in operations and functions.
2 I think there are a lot of inconsistencies in the various functions of the system.
3 I think there are a lot of inconsistencies in the operation behavior and functions of this system.
4 In using the system, I found that many of the operating results were inconsistent with the expected functions.
5 When I use this system, the result of operation is inconsistent with the function of the system.
6 When I used this system, the results of many operations produced inconsistent functions.
Item 9
1 When I use this system, I feel confident about the results.
2 I felt very confident that I can master the system.
3 When I use this system, I feel confident that I can achieve every goal.
Table 6. Frequency of alternative options for Items 6 and 9.
       | No. 1 | No. 2 | No. 3 | No. 4 | No. 5 | No. 6
Item 6 |   2   |   1   |   3   |  13   |   5   |   3
Item 9 |   7   |  15   |   5   |  NA   |  NA   |  NA
Table 7. Final Chinese SUS version.
1愿意使.
2太复.
3使.
4使.
5.
6使,不一.
7大多很快会使.
8使.
9掌握.
10 使,.
familiar with mobile Taobao). The homepage of the JD app is
shown in Figure 1.
We added a survey item to the questionnaire, the usage frequency of the JD app (general statistics expressed as monthly rate of activity), to prevent interference with the first item. We designed two options and added explanations: 1) almost no use or little use over the past two months and 2) frequent use over nearly one month or multiple uses daily. Items were scored on a five-point scale. The statement "This test does not assess your own abilities; it collects user feedback" was displayed prior to testing to reassure participants that their personal capabilities were not under test. We also replaced the word "system" with "JD APP" throughout the questionnaire.
We added a UMUX-LITE questionnaire (Lewis, Utesch, & Maher, 2013) after the SUS was completed to determine the concurrent validity of our translation. According to previous researchers (Borsci et al., 2015), the correlation between UMUX-LITE and SUS is as high as 0.81. UMUX-LITE is brief and easy to translate. The translated version (by expert) is almost identical to the literal translation provided by Sauro and Lewis (2014). Although we did not report the translation process, no participant reported any questions about any item after pretesting.
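UMUX-LITE itself scores quickly: its two positive-tone 7-point items are rescaled to a 0-100 range, optionally with the regression adjustment from Lewis, Utesch, and Maher (2013) that maps scores onto the SUS scale. A sketch (the function name is ours):

```python
def umux_lite(item1, item2, regression_adjusted=False):
    """UMUX-LITE score from two 1-7 Likert responses (both positive-tone).

    The raw score rescales the two items to 0-100. The regression-adjusted
    variant (0.65 * raw + 22.9, per Lewis, Utesch, & Maher, 2013) maps the
    result onto the SUS scale for direct comparison with SUS means.
    """
    if not (1 <= item1 <= 7 and 1 <= item2 <= 7):
        raise ValueError("expected two responses in the range 1-7")
    raw = (item1 - 1 + item2 - 1) / 12 * 100
    return 0.65 * raw + 22.9 if regression_adjusted else raw

# A respondent answering 7 on both items:
print(umux_lite(7, 7))  # -> 100.0
```

The regression-adjusted form is what makes per-respondent UMUX-LITE means directly comparable to the SUS means reported in the results below.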
4.5. Results
Basic statistics and normative comparison: The mean value for all 217 participants was 72.77 (n = 217, SD = 11.82), which is in line with results reported by Sauro and Lewis (2011). According to the CGS (Sauro & Lewis, 2014), the corresponding grade is B- (72.6-74), and the 95% confidence interval spans C+ (71.19) to B (74.35). Our result is also consistent with previous work by Kortum and Sorber (2015). It falls below the average for popular apps, 76.1, but is in line with our predictions, because the JD app is not the most popular Chinese online shopping platform.
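The grade lookup against the Sauro-Lewis CGS can be expressed as a simple table of cutoffs. The band boundaries below are our transcription of the published scale and should be checked against Sauro and Lewis (2014) before reuse:

```python
def cgs_grade(sus):
    """Letter grade for a SUS score under the Sauro-Lewis curved grading
    scale (CGS). Cutoffs transcribed from Sauro & Lewis (2014); treat the
    exact boundaries here as approximate."""
    bands = [(84.1, "A+"), (80.8, "A"), (78.9, "A-"), (77.2, "B+"),
             (74.1, "B"), (72.6, "B-"), (71.1, "C+"), (65.0, "C"),
             (62.7, "C-"), (51.7, "D")]
    for cutoff, grade in bands:
        if sus >= cutoff:
            return grade
    return "F"

print(cgs_grade(72.77))  # -> B-  (the mean reported above)
```

Note that under this table the confidence interval endpoints 71.19 and 74.35 fall in the C+ and B bands respectively, matching the interval reported in the text.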
The UMUX-LITE mean value for all 217 participants was 72.45 (n = 217, SD = 10.18), 95% CI (71.00, 73.82). Infrequent users (n = 53) scored a mean of 63.27, SD = 11.47, 95% CI (60.20, 66.23); frequent users (n = 164) scored a mean of 75.42, SD = 7.69, 95% CI (74.20, 76.51).
Sensitivity: The Chinese SUS version is sensitive to differences in frequency of use; that is, participants with different usage frequencies rated the app differently in our case. The mean score of infrequent users was 63.87 (n = 53, SD = 13.33), 95% CI (60.19, 67.54), and that of frequent users was 75.64 (n = 164, SD = 9.71), 95% CI (74.14, 77.14). A two-sample independent-groups t-test showed that the difference between the two is statistically significant (t(215) = 6.965, p < .01).
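The sensitivity check above reduces to a two-sample t-test on SUS scores grouped by usage frequency, with df = n1 + n2 - 2 = 215. A minimal pooled-variance sketch (pure Python, names ours; a real analysis would typically use a statistics package that also returns the p-value):

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b):
    """Two-sample independent-groups t statistic with pooled variance,
    as used for sensitivity checks; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Toy illustration with two small made-up groups of SUS scores:
t, df = independent_t([70, 75, 80], [60, 62, 64])
print(round(t, 2), df)  # -> 4.18 4
```

With the two real groups (n = 53 and n = 164), the same computation yields the t(215) statistic quoted above.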
Reliability: The Cronbach alpha reliability we obtained is 0.84, 95% CI (0.807, 0.870), which is lower than that of the original English version (0.92) but exceeds the lower threshold of 0.7. This result is basically in line with other translated versions (Alghannam et al., 2018; Blažica & Lewis, 2015; Borsci, Federici, Mele, & Conti, 2015).
Concurrent validity: The overall correlation between UMUX-LITE and SUS scores was highly significant (r(215) = 0.807, p < .0001) and significantly greater than Nunnally's (1978) minimum criterion of 0.3 (95% CI from 0.755 to 0.848), which is consistent with results reported by Lewis, Utesch, and Maher (2015). For infrequent users the correlation is 0.743; for frequent users it is 0.745, both well above the general 0.3 criterion (Nunnally, 1978).
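The 95% confidence interval for r(215) = 0.807 can be reproduced with the Fisher z-transformation; a minimal sketch (the upper bound differs from the reported 0.848 only in the third decimal, plausibly from rounding of the inputs):

```python
from math import atanh, tanh, sqrt

def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation
    via the Fisher z-transformation."""
    z = atanh(r)                 # map r onto the z scale
    se = 1 / sqrt(n - 3)         # standard error on the z scale
    lo, hi = z - z_crit * se, z + z_crit * se
    return tanh(lo), tanh(hi)    # back-transform to the r scale

lo, hi = pearson_ci(0.807, 217)
print(round(lo, 3), round(hi, 3))  # 0.755 0.849
```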
Construct validity: Our KMO is 0.879, well above 0.7, indicating that the Chinese SUS version we established has good construct validity. Although the varimax-rotated two-factor solution for the Chinese version (Table 8) differs from the factor analysis of the original version, recent studies suggest that the SUS may have only a single factor (Lewis & Sauro, 2017b).
Figure 1. JD app homepage.
5. Experiment II
For an effective comparison against the literal SUS translation from Sauro and Lewis (2014), we designed Experiment II and re-tested the three most changed items, namely Item 6, Item 8, and Item 9. We did not re-test the entire questionnaire because the remaining items did not change substantially, and administering both full questionnaires simultaneously would make it difficult to limit interference from the retest effect.
5.1. Participants
We recruited 151 native Chinese-speaking participants, 83 male and 68 female undergraduate and graduate students, to conduct remote testing through recruitment or crowdsourcing platforms. For those recruited via the Internet survey, we maintained an educational-background restriction (Bachelor's degree or above) to ensure a consistent base-level understanding of Chinese. We did not arrange for the recruited participants to test the JD app, but since the crowdsourcing platform is anonymous, it was not clear whether any of the participants had also taken part in Experiment I.
5.2. Experimental design
The experiment used a within-group design: each participant completed both parts of the questionnaire in a single session. In keeping with Experiment I, we continued to utilize a retrospective assessment of the JD app.
5.3. Method
In line with Experiment I, the questionnaire was completed
online. Once the participant was recruited, he or she scanned
a QR code in WeChat and was directed to the test page. The
participants then filled in the questionnaire online according
to the given instructions.
5.4. Materials
The evaluation target was again the JD app, but the test materials differed from those of Experiment I and were divided into two parts. The first part is the literal translation of SUS, taken from the Chinese edition of Quantifying the User Experience (Sauro & Lewis, 2014); except for replacing the word "system" with "JD app," we did not change the wording of the items. The second part consists of three items, drawn from Item 6, Item 8, and Item 9 of the final version of the questionnaire (Table 7). The purpose of this design was to compare changes in participants' evaluations before and after cultural adaptation within the same group.
Since the experiment was conducted with a within-group design, it was inevitably affected by the retest effect. There are three sources of retest effect (Arendasy & Sommer, 2017); for a short questionnaire, the most likely source is memory of one's earlier answers. The retest effect can be minimized by using alternative forms and by increasing the time interval between administrations (Arendasy & Sommer, 2017). Our modified items are semantically equivalent to the originals but differ in wording, so they meet the requirements of alternative forms. To increase the time interval, after a participant completed the items of the literal SUS translation, we inserted an intelligence-test item as a distraction; the second part of the questionnaire began after this item was completed.
5.5. Results
5.5.1. Basic statistics
The total score on the literal version is 70.54 (n = 151, SD = 11.07), 95% CI (68.77, 72.33), which corresponds to CGS level C (65–71). Compared to the results of Experiment I, the CGS difference spans two grades. The literal version's score is lower, but the difference is not statistically significant (t(366) = 1.818, p = .07). The basic statistics and the paired-sample t-tests between the three items of the literal version and those of our final version (Table 7) are shown in Table 9.
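The item-level comparisons in Table 9 rest on paired-sample t-tests; a minimal sketch on hypothetical paired ratings (not the study's data):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-sample t statistic: mean of the pairwise differences
    divided by the standard error of the differences."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom

# Hypothetical ratings for one item, literal vs. adapted wording
t, df = paired_t([6, 7, 5, 8, 6], [7, 7, 6, 8, 7])
print(round(t, 2), df)  # 2.45 4
```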
5.5.2. Reliability
The literal version has a Cronbach's alpha of 0.785, 95% CI (0.730, 0.833). This is greater than the minimum acceptable value of 0.7 and is not very different from other published translations, whose reliabilities range from 0.79 to 0.84 (Lewis, 2018b).
5.5.3. Degree of correlation
The correlations between Items 6, 8, and 9 and the total score on the literal version are 0.604 (95% CI from 0.456 to 0.713), 0.756 (95% CI from 0.678 to 0.817), and 0.483 (95% CI from 0.312 to 0.615), respectively. In Experiment I, these correlations are 0.683 (95% CI from 0.593 to 0.757), 0.760 (95% CI from 0.682 to 0.815), and 0.633 (95% CI from 0.514 to 0.734), respectively.

Table 8. Varimax-rotated two-factor solution, Chinese SUS version.

          Factor 1   Factor 2
Item 1      .643       .047
Item 2      .676       .384
Item 3      .715       .410
Item 4      .195       .746
Item 5      .761       .089
Item 6      .646       .311
Item 7      .551       .449
Item 8      .605       .491
Item 9      .415       .493
Item 10     .111       .807

Table 9. Comparison of three items: literal Chinese SUS versus final version (cross-cultural adaptation).

          Literal translation version (SD)   Final version (SD)   t (df)        Sig. (2-tailed)
Item 6    6.11 (2.01)                        6.27 (2.04)          0.98 (150)    0.328
Item 8    7.00 (2.21)                        7.04 (2.33)          0.22 (150)    0.826
Item 9    7.25 (1.57)                        7.61 (1.64)          2.86 (150)    0.005
On Item 6, the correlation between the literal version and
the final version is 0.475, 95% CI (0.300, 0.635); the correla-
tion of Item 8 is 0.671, 95% CI (0.525, 0.778) and that of Item
9 is 0.527, 95% CI (0.384, 0.658). The internal correlation of
these three item pairs is consistent with the large sample
inter-item correlations published for other SUS versions
(Bangor et al., 2008; Lewis, 2018b).
6. Discussion
To the best of our knowledge, most user evaluation practi-
tioners in China currently utilize the literal translation of the
SUS or another version established without user feedback.
Given the large number of native Chinese users, our goal in
conducting this study was to establish an unambiguous ver-
sion of SUS that is understandable for native Chinese speak-
ers. We re-translated the Chinese version of SUS accordingly.
We translated Item 8 and Item 5 in a manner unlike any
previously published version, and we added a structured
interview after forward translation. This process is a cross-
cultural adaptation. Participants in the structured interviews
evaluated the translation results and reported any misunder-
standing of the expert translations, proffering alternative sug-
gestions. Structured interviews revealed different opinions
among our participants on Items 6 and 9, so we modified
the preliminary translation results according to the partici-
pantsstatements and SUS dimensions. The final Chinese SUS
version we secured has good reliability and validity. It is also
sensitive regardless of variations in usage frequency and has
high concurrent validity.
In contrast to the literally translated version of SUS, we set
up a comparative study in Experiment II. First, we ran
a comparison between the literal translation version and the
cross-cultural adaptation (the final version we established).
Data analysis shows that the reliability of the cross-cultural
adaptation is 0.84 (95% CI from 0.807 to 0.870), while the
reliability of the literal version is 0.785 (95% CI from 0.730 to
0.833). In effect, the reliability of the literal version is roughly
equal to the reliability of other translated versions (0.79–0.84),
but appears to be somewhat less reliable than the original
SUS, which usually exceeds 0.90. The 95% confidence intervals of the cross-cultural adaptation (0.84) and the literal version (0.785) overlap, which suggests that although the reliability improved after cross-cultural adaptation, the difference between the two versions is not significant. This is not surprising, as we altered only a few items rather than redesigning the entire SUS. Larger samples and verification on different kinds of systems are still needed to support this conclusion.
In Experiment II, we directly compared the changes in the three items. Both the mean and the standard deviation of the three items increased. Although the differences are not significant, we did find changes caused by the cross-cultural adaptation. We also found that the correlations between the three items and the total score differ between the cross-cultural adaptation version and the literal version. In Experiment II, the correlations between the three items and the total score are 0.604, 0.756, and 0.483, respectively; in Experiment I, the correlations between the three culturally adapted items and the total score are 0.683, 0.760, and 0.633. The difference for Item 9 is significant, with no overlap between the 95% confidence intervals; the differences for the other two items are not significant.
The difference in total score between the literal version and the cross-cultural adaptation version is not significant, but on the CGS scale the same difference spans two grade levels. Cross-cultural adaptation does not serve to rewrite or undermine the previous translation, but to allow participants to understand it more easily. If participants understand the meaning of the items accurately and find the test material easier to use, the SUS total score may increase.
There is a possibility that social desirability response bias, the tendency to respond in a way that avoids criticism (Arnold & Feldman, 1981), together with the humility and moderation valued in Chinese culture, drove participants to give milder answers to ambiguous questions. In the absence of strong emotional stimulation, participants tend to give responses near zero or the baseline (Verduyn & Lavrijsen, 2015). If a participant does not know the specific meaning of an item, he or she will provide a conservative assessment to subconsciously avoid making mistakes; such assessments consistently approach an intermediate evaluation.
It is also possible that, for an item that is ambiguous in literal translation, participants speculated about the item's meaning and responded based on that speculation. In the structured interviews, we found that although participants reported ambiguity in some of the literally translated items, each participant's guess or subjective interpretation was basically consistent with the single SUS factor, except for Item 6 (a few participants' guesses deviated from the SUS factor). Even when an item seemed ambiguous in the literal version, participants could guess its general meaning. Therefore, even when such participants use the literally translated SUS, their scores do not differ much from scores on the cross-culturally adapted version.
Comparing the means and standard deviations of the three items, we found that both increased slightly; this indicates that more accurate wording led to more decisive participant evaluations. The scores for the experimental material were generally above 70 points, a clear departure from the intermediate state (50 points, where participants select the median value on every item). If participants had tended to give cautious answers to ambiguous questions, the standard deviation of the data would have decreased; when the presentation of the questionnaire is very accurate, participants can confidently commit to a non-neutral judgment. This leads to greater data dispersion and scores further from the midpoint. As a result, the scores of the three items on our final version varied to a greater extent than the scores of the corresponding items in the literal version.
The paired-sample t-test shows a significant difference for Item 9 between the literal and culturally adapted versions, which may be due to social desirability bias acting on this item. The item seems to reflect participants' ability or confidence in mastering the system; when it is expressed accurately, participants tend to report their achievements more positively, raising the score. This may also be reflected in the increase in the correlation between Item 9 and the total score.
We did not observe significant differences between the scores of the three items in the literal translation and in the culturally adapted translation. There are two possible reasons for this. First, the inherent usability of our experimental material may not be very high; the SUS score is only about 72, so even if participants very confidently make extreme judgments, the difference between the total score and the intermediate score remains relatively small. Second, participants perform subjective analysis as they fill out the questionnaire; because their speculations or subjective understandings roughly match the SUS factor, the total score remains largely unchanged.
In general, it is unclear how precisely each participant treats items that are ambiguous or unclear. In the structured interviews, because our participants were skilled users, they could infer the meaning of ambiguous items relatively accurately. New users, or users who have never used a similar questionnaire, may speculate less accurately. Cross-cultural adaptation is therefore necessary to minimize speculation and improve the accuracy of the measurements.
7. Limitations and future work
We recruited relatively few participants for the purposes of this study. To directly compare the changes before and after cross-cultural adaptation, we adopted a within-group design in Experiment II; to limit the retest effect, we could not ask participants to complete both full questionnaires, the literal translation and the final version, within a short period of time. For the same reason, we tested only three items (6, 8, and 9) in Experiment II, which may have affected the results. A larger sample would allow a more accurate comparison of SUS score changes after cross-cultural adaptation, and a between-group design would further strengthen the results. In short, more research is needed.
We plan to encourage more Chinese users to adopt the cross-cultural adaptation of SUS to measure the usability of products and systems. We will build a large-sample database of usability in the Chinese environment so that we can continue to study the factor structure of Chinese SUS versions. In addition, we plan to study whether the number of items on the Chinese SUS can be further reduced; for the English version, dropping one item has been found not to affect the overall reliability of the survey (Lewis & Sauro, 2017a). We also will report translations of positive SUS versions and of other usability scales such as UMUX and UMUX-LITE in Chinese. Though we used a literal translation in this study, such literal versions require large samples to verify their reliability.
Other researchers (Lewis, 2018a; Lewis et al., 2015) have reported a high correlation between UMUX-LITE and SUS. Future work should include reporting correlations between the cross-cultural adaptation of SUS and other Chinese versions of scales such as UMUX-LITE and UMUX.
References
Alghannam, B. A., Albustan, S. A., Al-Hassan, A. A., & Albustan, L. A. (2018). Towards a standard Arabic system usability scale: Psychometric evaluation using communication disorder app. International Journal of Human–Computer Interaction, 34(9), 799–804. doi:10.1080/10447318.2017.1388099

Arendasy, M. E., & Sommer, M. (2017). Reducing the effect size of the retest effect: Examining different approaches. Intelligence, 62, 89–98. doi:10.1016/j.intell.2017.03.003

Arnold, H. J., & Feldman, D. C. (1981). Social desirability response bias in self-report choice situations. The Academy of Management Journal, 24(2), 377–385.

Assila, A., De Oliveira, K. M., & Ezzedine, H. (2016). Standardized usability questionnaires: Features and quality focus. Electronic Journal of Computer Science & Information Technology, 6(1), 15–31.

Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An empirical evaluation of the System Usability Scale. International Journal of Human–Computer Interaction, 24, 574–594. doi:10.1080/10447310802205776

Blažica, B., & Lewis, J. R. (2015). A Slovene translation of the System Usability Scale: The SUS-SI. International Journal of Human–Computer Interaction, 31(2), 112–117. doi:10.1080/10447318.2014.986634

Borkowska, A., & Jach, K. (2017). Pre-testing of Polish translation of System Usability Scale (SUS). In Information systems architecture and technology: Proceedings of 37th International Conference on Information Systems Architecture and Technology – ISAT 2016, Karpacz, Poland – Part I (pp. 143–153). Springer International.

Borsci, S., Federici, S., Bacci, S., Gnaldi, M., & Bartolucci, F. (2015). Assessing user satisfaction in the era of user experience: Comparison of the SUS, UMUX, and UMUX-LITE as a function of product experience. International Journal of Human–Computer Interaction, 31(8), 484–495. doi:10.1080/10447318.2015.1064648

Borsci, S., Federici, S., & Lauriola, M. (2009). On the dimensionality of the system usability scale: A test of alternative measurement models. Cognitive Processing, 10(3), 193–197. doi:10.1007/s10339-009-0268-9

Borsci, S., Federici, S., Mele, M. L., & Conti, M. (2015). Short scales of satisfaction assessment: A proxy to involve disabled users in the usability testing of websites. In Human-computer interaction: Users and contexts (pp. 35–42). Cham: Springer International Publishing.

Brooke, J. (1996). SUS: A "quick and dirty" usability scale. In P. Jordan, B. Thomas, & B. Weerdmeester (Eds.), Usability evaluation in industry (pp. 189–194). London, UK: Taylor & Francis.

Brooke, J. (2013). SUS: A retrospective. Journal of Usability Studies, 8(2), 29–40.

Cao, S., Cao, J., Li, S., Wang, W., Qian, Q., & Ding, Y. (2018). Cross-cultural adaptation and validation of the simplified Chinese version of Copenhagen Hip and Groin Outcome Score (HAGOS) for total hip arthroplasty. Journal of Orthopaedic Surgery and Research, 13(1), 278. doi:10.1186/s13018-018-0971-2

Cardoso, M. H., & Capellini, S. A. (2018). Translation and cross-cultural adaptation of the Detailed Assessment of Speed of Handwriting 17+ to Brazilian Portuguese: Conceptual, item and semantic equivalence. CoDAS, 30(1), e20170041. doi:10.1590/2317-1782/20182017041

Chen, R., Hao, Y., Feng, L., Zhang, Y., & Huang, Z. (2011). The Chinese version of the Pediatric Quality of Life Inventory (PedsQL) Family Impact Module: Cross-cultural adaptation and psychometric evaluation. Health & Quality of Life Outcomes, 9(1), 16. doi:10.1186/1477-7525-9-16

Dianat, I., Ghanbari, Z., & Asgharijafarabadi, M. (2014). Psychometric properties of the Persian language version of the System Usability Scale. Health Promotion Perspectives, 4(1), 82–89. doi:10.5681/hpp.2014.011

Erdinç, O., & Lewis, J. R. (2013). Psychometric evaluation of the T-CSUQ: The Turkish version of the Computer System Usability Questionnaire. International Journal of Human–Computer Interaction, 29(5), 319–326. doi:10.1080/10447318.2012.711702

Finstad, K. (2006). The System Usability Scale and non-native English speakers. Journal of Usability Studies, 1(4), 185–188.

Flavián, C., Guinalíu, M., & Gurrea, R. (2006). The role played by perceived usability, satisfaction and consumer trust on website loyalty. Information and Management, 43(1), 1–14. doi:10.1016/j.im.2005.01.002

Gao, Q., Zhu, B., Rau, P. L. P., Vyas, S., Chen, C., & Li, H. (2013). User experience with Chinese handwriting input on touch-screen mobile phones. In Cross-cultural design: Methods, practice, and case studies (pp. 384–392). Berlin, Heidelberg: Springer.

Herman, L. (1996). Towards effective usability evaluation in Asia: Cross-cultural differences. In Proceedings of the 6th Australian Conference on Computer–Human Interaction (OZCHI '96), Sydney, Australia (pp. 135–136). Washington, DC: IEEE Computer Society.

Hilton, A., & Skrutkowski, M. (2002). Translating instruments into other languages: Development and testing processes. Cancer Nursing, 25(1), 1–7. doi:10.1097/00002820-200202000-00001

Jia, G., Zhou, J., Yang, P., Lin, C., Cao, X., Hu, H., Ning, G. (2013). Integration of user centered design in the development of health monitoring system for elderly. In 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 1748–1751). Osaka, Japan: IEEE.

Konstantina, O., Nikolaos, T., & Christos, K. (2015). Perceived usability evaluation of learning management systems: Empirical evaluation of the System Usability Scale. The International Review of Research in Open and Distributed Learning, 16(2), 227–246.

Kortum, P., & Sorber, M. (2015). Measuring the usability of mobile applications for phones and tablets. International Journal of Human–Computer Interaction, 31(8), 518–529. doi:10.1080/10447318.2015.1064658

Kortum, P. T., & Bangor, A. (2013). Usability ratings for everyday products measured with the System Usability Scale. International Journal of Human–Computer Interaction, 29(2), 67–76. doi:10.1080/10447318.2012.681221

Lei, J., Xu, L., Meng, Q., Zhang, J., & Gong, Y. (2014). The current status of usability studies of information technologies in China: A systematic study. BioMed Research International, 2014, 568303.

Lewis, J. (2006). Usability testing. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (pp. 1275–1316). Hoboken, NJ: John Wiley and Sons.

Lewis, J., Utesch, B., & Maher, D. (2013). UMUX-LITE: When there's no time for the SUS. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France (pp. 2099–2102).

Lewis, J. R. (2018a). Measuring perceived usability: The CSUQ, SUS, and UMUX. International Journal of Human–Computer Interaction, 34(12), 1148–1156. doi:10.1080/10447318.2017.1418805

Lewis, J. R. (2018b). The System Usability Scale: Past, present, and future. International Journal of Human–Computer Interaction, 34(7), 577–590. doi:10.1080/10447318.2018.1455307

Lewis, J. R., & Sauro, J. (2017a). Can I leave this one out? The effect of dropping an item from the SUS. Journal of Usability Studies, 13(1), 38–46.

Lewis, J. R., & Sauro, J. (2017b). Revisiting the factor structure of the System Usability Scale. Journal of Usability Studies, 12(4), 183–192.

Lewis, J. R., Utesch, B. S., & Maher, D. E. (2015). Investigating the correspondence between UMUX-LITE and SUS scores. In International Conference of Design, User Experience, and Usability (pp. 204–211). Los Angeles, CA: Springer.

Li, F., & Li, Y. (2011). Usability evaluation of e-commerce on B2C websites in China. Procedia Engineering, 15, 5299–5304. doi:10.1016/j.proeng.2011.08.982

Li, R., Chen, Y. V., Sha, C., & Lu, Z. (2017). Effects of interface layout on the usability of in-vehicle information systems and driving safety. Displays, 49, 124–132. doi:10.1016/j.displa.2017.07.008

Liu, Z. (2014). User experience in Asia. Journal of Usability Studies, 9(2), 42–50.

Liu, Z., Zhang, J., Zhang, H., & Chen, J. (2011). Usability in China. In I. Douglas & Z. Liu (Eds.), Global usability (pp. 111–135). London, UK: Springer.

Martins, A. I., Rosa, A. F., Queirós, A., & Silva, A. (2015). European Portuguese validation of the System Usability Scale. Procedia Computer Science, 67, 293–300. doi:10.1016/j.procs.2015.09.273

Mcgee, M., Rich, A., & Dumas, J. (2004). Understanding the usability construct: User-perceived usability. Human Factors & Ergonomics Society Annual Meeting Proceedings, 48(5), 907–911. doi:10.1177/154193120404800535

Mohamad, M. M., Yaacob, N. A., & Yaacob, N. M. (2018). Translation, cross-cultural adaptation, and validation of the Malay version of the System Usability Scale questionnaire for the assessment of mobile apps. JMIR Human Factors, 5(2), e10308. doi:10.2196/10308

Noiwan, J., & Norcio, A. F. (2006). Cultural differences on attention and perceived usability: Investigating color combinations of animated graphics. International Journal of Human-Computer Studies, 64(2), 103–122. doi:10.1016/j.ijhcs.2005.06.004

Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.

Park, J., Han, S. H., Kim, H. K., Cho, Y., & Park, W. (2013). Developing elements of user experience for mobile phones and services: Survey, interview, and observation approaches. Human Factors in Ergonomics & Manufacturing, 23(4), 279–293. doi:10.1002/hfm.20316

Pei, L., Xia, J., & Yan, J. (2010). Cross-cultural adaptation, reliability and validity of the Chinese version of the Fear Avoidance Beliefs Questionnaire. Journal of International Medical Research, 38(6), 1985–1996. doi:10.1177/147323001003800612

Rajanen, D., Clemmensen, T., Iivari, N., Inal, Y., Rızvanoğlu, K., Sivaji, A., Boison, D. (2017). UX professionals' definitions of usability and UX: A comparison between Turkey, Finland, Denmark, France and Malaysia. In Human–Computer Interaction – INTERACT 2017, Mumbai, India (pp. 218–239).

Rosa, A. F., Martins, A. I., Costa, V., Queirós, A., Silva, A., & Rocha, N. P. (2015). European Portuguese validation of the Post-Study System Usability Questionnaire (PSSUQ). In 2015 10th Iberian Conference on Information Systems and Technologies (CISTI) (pp. 17–20). Aveiro, Portugal: IEEE.

Sauro, J. (2010). Does better usability increase customer loyalty? Retrieved from http://www.measuringusability.com/usability-loyalty.php

Sauro, J., & Lewis, J. R. (2009, April 4–9). Correlations among prototypical usability metrics: Evidence for the construct of usability. In Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI 2009. Boston, MA: ACM.

Sauro, J., & Lewis, J. R. (2011). When designing usability questionnaires, does it hurt to be positive? In Proceedings of CHI 2011 (pp. 2215–2223). Vancouver, Canada: Association for Computing Machinery.

Sauro, J., & Lewis, J. R. (2014). Quantifying the user experience. Beijing, China: China Machine Press.

Sharfina, Z., & Santoso, H. B. (2017). An Indonesian adaptation of the System Usability Scale (SUS). In International Conference on Advanced Computer Science & Information Systems (pp. 145–148). Malang: IEEE.

Shepard, L. A., Camilli, G., & Averill, M. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6(4), 317–375.

Sheu, F., Fu, H., & Shih, M. (2017). Pre-testing the Chinese version of the System Usability Scale (C-SUS). In Workshop Proceedings of the 25th International Conference on Computers in Education (pp. 28–34). New Zealand: Asia-Pacific Society for Computers in Education.

Tullis, T., & Albert, W. (2008). Measuring the user experience: Collecting, analyzing, and presenting usability metrics (2nd ed.). Beijing, China: Publishing House of Electronics Industry.

Tullis, T. S., & Stetson, J. N. (2004). A comparison of questionnaires for assessing website usability. Paper presented at the Usability Professionals Association Annual Conference (pp. 7–11). Minneapolis, MN: UPA.

Van de Vijver, A. J. R., & Rothmann, S. (2004). Assessment in multicultural groups: The South African case. SA Journal of Industrial Psychology, 30(4), 1–7. doi:10.4102/sajip.v30i4.169

Van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis of comparative research. In J. W. Berry, Y. H. Poortinga, & J. Pandey (Eds.), Handbook of cross-cultural psychology (Vol. 1, 2nd ed., pp. 257–300). Boston, MA: Allyn & Bacon.

Van de Vijver, F. J. R., & Poortinga, Y. H. (1997). Towards an integrated analysis of bias in cross-cultural assessment. European Journal of Psychological Assessment, 13, 29–37. doi:10.1027/1015-5759.13.1.29

Verduyn, P., & Lavrijsen, S. (2015). Which emotions last longest and why: The role of event importance and rumination. Motivation and Emotion, 39(1), 119–127. doi:10.1007/s11031-014-9445-y

Vermeeren, A. P. O. S., Law, L. C., Roto, V., Obrist, M., Hoonhout, J., & Väänänen-Vainio-Mattila, K. (2010). User experience evaluation methods: Current state and development needs. In Nordic Conference on Human-Computer Interaction (pp. 521–530), Reykjavik, Iceland. New York, NY: ACM.

Vijver, F. V. D., & Tanzer, N. K. (2004). Bias and equivalence in cross-cultural assessment: An overview. Revue Européenne de Psychologie Appliquée/European Review of Applied Psychology, 54(2), 119–135. doi:10.1016/j.erap.2003.12.004

Wang, J. (2003). Human-computer interaction research and practice in China. Interactions, 10(2), 88–96. doi:10.1145/637848

Wang, Y., & Lv, F. (2017). Usability testing of ATM machines interface based on eye tracking data. Chinese Journal of Ergonomics, 23(1), 48–54.

Yan, Y., Wang, G., Liu, S., & Wu, H. (2012). Usability evaluation of infusion pump based on System Usability Scale. China Medical Devices, 27(10), 25–27.

Yang, T., Linder, J., & Bolchini, D. (2012). DEEP: Design-oriented evaluation of perceived usability. International Journal of Human-Computer Interaction, 28(5), 308–346. doi:10.1080/10447318.2011.586320
About the Authors
Yuhui Wang is a researcher in the Industrial Design Department, Huazhong University of Science and Technology. His research interests include HCI design, usability assessment, and product design.

Tian Lei is an associate professor at the School of Mechanical Science and Engineering, Huazhong University of Science and Technology, where he is Secretary. His research covers information visualization in medicine and engineering, usability, and mobile HCI.

Xinxiong Liu is a professor in the Industrial Design Department, Huazhong University of Science and Technology. His research interests include usability and product design.
... 27 In this study, considering cross-cultural differences in mainland China, an adapted version of the SUS by Chinese researchers will be deployed, with formal permission granted, demonstrating a high Cronbach's alpha reliability of 0.84 and a sensitivity of 75.64. 28 This Chinese version of SUS holds 10 questions which are answered using a 5-point (1)(2)(3)(4)(5) Likert scale producing an overall score between 0 and 100, with higher scores indicating better usability and user experience. 28 Participants' satisfaction with the study intervention Participants will be requested to rate their satisfaction with this DT-PLB intervention after completing the 8-week intervention. ...
... 28 This Chinese version of SUS holds 10 questions which are answered using a 5-point (1)(2)(3)(4)(5) Likert scale producing an overall score between 0 and 100, with higher scores indicating better usability and user experience. 28 Participants' satisfaction with the study intervention Participants will be requested to rate their satisfaction with this DT-PLB intervention after completing the 8-week intervention. Only one question listed in the paper sheet will be asked about their satisfaction using a 10-point numeric rating scale, where '1' represents 'very dissatisfied' and '10' means 'very satisfied'. ...
Article
Full-text available
Introduction Effective chronic obstructive pulmonary disease (COPD) interventions require intensive and repetitive exercises, yet their monotonous nature can reduce adherence. Innovative rehabilitation devices that are safe, user-friendly, engaging and cost-effective are crucial. This study introduces a digital gamification-based approach to pursed lip breathing (PLB) exercises, guided by the Behaviour Change Wheel (BCW) framework. The digital platform transforms traditional PLB into an interactive and enjoyable experience, enhancing motivation and adherence. Using a pre-post study design, this feasibility trial aims to assess the safety, feasibility and acceptability of the digital gamification PLB intervention protocol driven by the BCW framework installed on WeChat (DT-PLB) for home-based COPD management. Methods and analysis The methodology of this study is divided into two phases. Phase 1 refers to the development of the DT-PLB system based on research evidence, behavioural analysis from the insight of the BCW and stakeholders’ perspectives, and phase 2 points to present the pre-post trial design for the DT-PLB system consisting of five smartphone-based software interface modules: Ranking, Report, Daily PLB Tasks, Social Community and Mine. Eligible patients with COPD will be recruited from a university hospital in Sichuan Province, Mainland China. The DT-PLB will be conducted in non-hospital settings for patients with COPD for 10 min per session, three times a day on a daily basis for 8 weeks. Data collection will be conducted at two time points: baseline and post-intervention. Demographic data (eg, age, gender and marital status) will be collected only at baseline. The primary outcome measures in this study will be a series of feasibility outcomes involving participant recruitment and completion of the DT-PLB intervention. 
Additionally, several clinical outcomes in terms of the effects of the DT-PLB intervention on dyspnoea, exercise capability, quality of life, and pulmonary function index will be evaluated as secondary outcomes. Ethics and dissemination This study has received Manchester Metropolitan University ethical approval (REC reference 56631) and the Affiliated Hospital of Southwest Medical University ethical approval (REC reference KY2023105). The findings from DT-PLB will be disseminated widely through peer-reviewed publications, scientific conferences and workshops. If successful, DT-PLB will be directly applied to the Affiliated Hospital of Southwest Medical University to manage PLB exercises. Trial registration number NCT06063733 .
... whether it is useful, efficient, and satisfactory). The UMUX is consistent with measurements from the commonly used System Usability Scale (Wang, Lei, and Liu 2020). When raw scores were converted to usability grades using the Sauro-Lewis curved grading scale, the UMUX and SUS showed significant agreement in the assigned grades (Lewis 2018). ...
Article
Maps are increasingly designed to emphasise the aesthetic and emotional experience of the consumer. In this context, humanistic thematic maps play a key role in the cartographic industry. The key to playing these roles is the understanding and enhancement of user experience (UX), which directly affects the innovation of maps. This study explores the impact of maps in terms of usability, personal involvement, and aesthetics by surveying 128 users' preferences across various map designs. Three variables were examined: symbol dimensionality (2D vs. 2.5D), environmental references in the landscape, and users' familiarity with the campus cultural landscape. The findings reveal that 2D maps perform better regarding task completion speed and ease. Maps with environmental references are rated more highly for usability and aesthetic appeal; maps with 2.5D and environmental references also receive higher overall preference ratings. Additionally, users possessing greater familiarity with the campus cultural landscape exhibit more personal involvement.
... Quantitative data were collected from the participants through their responses to the Chinese version of the 10-item System Usability Scale (SUS) (Wang et al., 2019). The SUS was used to measure the usability of the Caspar Health system, with participants rating their level of agreement on a 5-point Likert scale ranging from 1 (fully disagree) to 5 (fully agree) (Lewis, 2018; Peres et al., 2013). ...
Article
Background Patients with post-acute COVID-19 syndrome, also referred to as "long COVID," may face persistent physical, cognitive and psychosocial symptoms which can be challenging to manage given the strict social distancing measures imposed during the pandemic. Telerehabilitation (TR) became increasingly common during the COVID-19 pandemic and has been applied to post-acute COVID-19 conditions in previous clinical studies, which reported that patients' symptoms were alleviated and their overall health improved. This study examined the usability and acceptability of TR delivered by occupational therapists for patients suffering from post-acute COVID-19 in Hong Kong. Methods In this mixed-methods usability study, participants rated items on the System Usability Scale (SUS) and completed a semi-structured questionnaire via audio-recorded telephone calls. Descriptive data were used to summarize the quantitative data, and thematic analysis was applied to analyze the qualitative data. Results Twelve participants (mean age 56.5 years) who had completed a 6-week TR program via the Caspar Health system were recruited for the study. A median SUS score of 56.25 was reported for its usability, although 83% of the participants viewed the TR system as fairly acceptable. Four themes were generated from the participants' interviews: perception of using the TR system, performance expectancy of TR, other psychosocial and environmental factors, and intention to use TR. Most participants reported their willingness to continue using TR and that they would recommend it to other patients. Conclusion Most of the participants were receptive to TR and perceived health benefits from its use. Future research could consider integrating the perspectives of both occupational therapists and patients to generate a more comprehensive understanding of the facilitators of and the barriers to TR for patients who experience long COVID.
... Portuguese validations of the SUS are reported in two distinct studies [33,34]. Other researchers from 30 different countries have also validated and used it in their own cultures [35-37]. ...
Preprint
Background and aim: Considering the application of artificial intelligence beyond the field of computer science, one concern of researchers is providing quality explanations of how AI-based algorithms function and of the data extracted from them. The purpose of the present study is to validate the Italian version of the System Causability Scale (I-SCS) to measure the quality of explanations provided by an explainable AI (xAI) system. Method: The English version, initially provided in 2020 in coordination with the main developer, was utilized. The forward-backward translation method was applied to ensure accuracy. The nine-step process was completed by calculating the content validity index/ratio and conducting cognitive interviews with representative end users. Results: The original version of the questionnaire consisted of 10 questions. However, based on the obtained indexes (CVR below 0.49), one question (Question 8) was removed entirely. After completing the aforementioned steps, the Italian version contained 9 questions. The representative sample of Italian end users fully comprehended the meaning and content of the questions in the Italian version. Conclusion: The Italian version obtained in this study can be used in future research studies as well as in the field by xAI developers. This tool can be used to measure the quality of explanations provided for an xAI system in Italian culture.
... The platform's effectiveness was assessed based on participants' task check-ins, diet and exercise records, reading of educational articles, and health consultations. The Chinese version of the System Usability Scale (SUS) was used to evaluate their satisfaction with the platform [36]. The scale comprises 10 items across two dimensions: usability and learnability. ...
Article
Objectives This study aimed to develop a mobile frailty management platform for Chinese community-dwelling older adults and evaluate its effectiveness, usability and safety. Methods Based on literature research, the research team combined the frailty cycle and integration models, self-determination theory, and technology acceptance models, determined the frailty interventions through expert discussion, and transformed them into multimedia resources; finally, engineers developed the mobile management platform. A cluster sampling, parallel, single-blind, controlled quasi-experimental trial was conducted. Sixty older adults from two community health service centers were recruited from March to August 2023. The control group received routine community care, while the intervention group used the mobile frailty management platform. The incidence of frailty and scores of quality of life, depression, sleep quality, and grip strength within 12 weeks were compared between the two groups, and the usability and safety of the platform were assessed. Results A total of 52 participants completed the study, 27 in the intervention group and 25 in the control group. At 12 weeks after the intervention, the frailty state of the intervention group was reversed to pre-frailty. There were no significant differences in the scores of quality of life, depression, sleep quality, and grip strength between the two groups before and 4 weeks after intervention. At 8 weeks and 12 weeks after the intervention, the quality of life, depression, and grip strength of the intervention group were improved with statistical significance (P < 0.05). Sleep quality was statistically significant only 12 weeks after the intervention (P < 0.05). The System Usability Scale score for the platform was 87.96 ± 5.88, indicating a highly satisfactory user experience. Throughout the intervention, no adverse events were reported among the older adults.
Conclusions The mobile frailty management platform effectively improved frailty status, depressive mood, sleep quality, grip strength, and quality of life for Chinese community-dwelling older adults. It holds clinical application value and is an effective tool for strengthening frailty management among Chinese community-dwelling older adults.
... However, as psychometric properties are sample dependent, it is essential to evaluate them when using a patient-reported outcome measure in new settings or populations [12]. The SUS has been translated into numerous languages such as Chinese [13], Finnish [14], French [15], Hindi [15], Indonesian [16], and Polish [17]. It has undergone psychometric validation [18], including in Arabic [19], Danish [20], Dutch [21], German [22], Italian [23], Malay [24], Persian [25], Portuguese [26], Slovene [27], and Spanish [28]. ...
Article
Background The Swedish health care system is undergoing a transformation. eHealth technologies are increasingly being used. The System Usability Scale is a widely used tool, offering a standardized and reliable measure for assessing the usability of digital health solutions. However, despite the existence of several translations of the System Usability Scale into Swedish, none have undergone psychometric validation. This highlights the urgent need for a validated and standardized Swedish version of the System Usability Scale to ensure accurate and reliable usability evaluations. Objective The aim of the study was to translate and psychometrically evaluate a Swedish version of the System Usability Scale. Methods The study utilized a 2-phase design. The first phase translated the System Usability Scale into Swedish and the second phase tested the scale's psychometric properties. A total of 62 participants generated a total of 82 measurements. Descriptive statistics were used to visualize participants' characteristics. The psychometric evaluation consisted of data quality, scaling assumptions, and acceptability. Construct validity was evaluated by convergent validity, and reliability was evaluated by internal consistency. Results The Swedish version of the System Usability Scale demonstrated high conformity with the original version. The scale showed high internal consistency with a Cronbach α of .852 and corrected item-total correlations ranging from 0.454 to 0.731. The construct validity was supported by a significant positive correlation between the System Usability Scale and domain 5 of the eHealth Literacy Questionnaire (P = .001). Conclusions The Swedish version of the System Usability Scale demonstrated satisfactory psychometric properties. It can be recommended for use in a Swedish context.
The positive correlation with domain 5 of the eHealth Literacy Questionnaire further supports the construct validity of the Swedish version of the System Usability Scale, affirming its suitability for evaluating digital health solutions. Additional tests of the Swedish version of the System Usability Scale, for example, in the evaluation of more complex eHealth technology, would further validate the scale.
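The internal-consistency figure reported above (a Cronbach α of .852) follows the standard formula: alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). A minimal generic sketch of that computation (not code from the study; the function name is chosen here for illustration):

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondent rows,
    each a list of k item scores (sample variances, ddof=1)."""
    k = len(rows[0])
    items = list(zip(*rows))                             # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)   # sum of per-item variances
    total_var = variance([sum(row) for row in rows])     # variance of the summed scale
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Perfectly covarying items yield an alpha of 1, while items that cancel each other out can push alpha to 0 or below, which is why item-total correlations are usually inspected alongside it.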
... Follow-up phone calls will be conducted 2 and 4 weeks after intervention to gather feedback on the prototype and complete brief questionnaire surveys (System Usability Scale) [13,14]. The study will include a 2-month internal pilot RCT, for which roughly 42 patients with COPD (10% of the total sample size) [15] will be recruited from several hospitals in Beijing. The aim of this pilot is to assess the success of participant recruitment and retention, as well as the utilization of EmoEase. ...
Article
Background Mental health problems in patients with chronic obstructive pulmonary disease (COPD) are common and frequently neglected. Digital psychological interventions may reduce mental health problems, but their effectiveness has not been evaluated in the Chinese COPD population. In this study, we will develop an integrated digital psychological intervention (EmoEase) and evaluate its effectiveness and cost-effectiveness in enhancing the mental wellbeing of patients with COPD in China. Methods This study is a multicenter, two-arm, randomized controlled trial (RCT) with a parallel-group design to enroll at least 420 patients with COPD with age over 35 years. Participants will be assigned to receive either usual care (control group) or usual care + EmoEase (intervention group). Assessments will take place at baseline (T0) and 4 weeks (T1), 8 weeks (T2), and 16 weeks (T3) after baseline, and participants will be asked to complete questionnaires and physical measurements. The primary outcome measure will assess mental wellbeing using the Warwick Edinburgh Mental Wellbeing Scale (WEMWBS). Secondary outcome measures will assess mental health, physical health, COPD symptoms, health risk behaviors, socioeconomic indicators, and healthcare utilization and expenditure. Analyses will utilize an intention-to-treat approach. Discussion This is the first RCT to examine the value of EmoEase, a novel digital psychological intervention for patients with COPD. If this intervention is effective and cost-effective, it could be rapidly scaled up to provide mental healthcare for patients with COPD in China. Trial registration ClinicalTrials.gov Identifier: NCT06026709. Date of first submission: 30 August 2023. https://clinicaltrials.gov/study/NCT06026709
Article
Purpose Individualized anticoagulation therapy is a major challenge for patients after heart valve replacement. Mobile applications assisted by artificial intelligence (AI) have great potential to meet the individual needs of patients. The study aimed to develop an AI technology-assisted mobile application (app) for anticoagulation management, understand patients' acceptance of such applications, and determine its feasibility. Methods After using the mobile application for anticoagulation management for 2 weeks, patients, doctors, and nurses rated its usability using the System Usability Scale (SUS). Additionally, semi-structured interviews were conducted with some patients, doctors, and nurses to gain insights into their thoughts and suggestions regarding the procedure. Results The study comprised 80 participants, including 38 patients, 18 doctors, and 24 nurses. The average SUS score for patients was 82.37±5.45; for doctors, it was 84.17±5.82; and for nurses, it was 81.88±6.44. This means that patients, doctors, and nurses rated the app as highly usable. Semi-structured interviews were conducted on the app's usability with 18 participants (six nurses, three physicians, and nine patients). The interview results revealed that patients found the anticoagulation management app simple and convenient, with high expectations for precise dosage recommendations of anticoagulant drugs. Some patients expressed concerns regarding personal information security. Both doctors and nurses believed that elderly patients needed assistance from young family members to use the app and that it could improve patients' anticoagulant self-management ability. Some nurses also mentioned that the use of the app brought great convenience for transitional care. Conclusion This study confirmed the feasibility of using an AI technology-assisted mobile application for anticoagulation management in patients after heart valve replacement.
To further develop this application, challenges lie in continuously improving the accuracy of recommended drug doses, obtaining family support, and ensuring information security.
Article
Background To translate and cross-culturally adapt the Copenhagen Hip and Groin Outcome Score (HAGOS) into a Simplified Chinese version (HAGOS-C) and evaluate the reliability, validity, and responsiveness of the HAGOS-C in total hip arthroplasty (THA) patients. Methods The cross-cultural adaptation was performed according to the internationally recognized guidelines of the American Academy of Orthopaedic Surgeons Outcome Committee. A total of 192 participants were recruited in this study. The intra-class correlation coefficient (ICC) was used to determine reliability. Construct validity was analyzed by evaluating the correlations between the HAGOS-C and the EuroQoL 5-dimension (EQ-5D), as well as the Short Form 36 Health Survey (SF-36). Responsiveness of the HAGOS-C was evaluated according to the standardized response mean (SRM) and effect size (ES) between the first test and the third test (6 months after primary THA). Results The original version of the HAGOS was well cross-culturally adapted and translated into Simplified Chinese. The HAGOS-C was indicated to have excellent reliability (ICC = 0.748–0.936, Cronbach's alpha = 0.787–0.886). Moderate to substantial correlations between subscales of the HAGOS-C and the EQ-5D (r = 0.544–0.751, p < 0.001), as well as the physical function (r = 0.567–0.640, p < 0.001), role physical (r = 0.570–0.613, p < 0.001), bodily pain (r = 0.467–0.604, p < 0.001), and general health (r = 0.387–0.432, p < 0.001) subscales of the SF-36, were observed. An ES of 0.805–1.100 and SRM of 1.408–2.067 revealed high responsiveness of the HAGOS-C. Conclusions The HAGOS-C was demonstrated to have excellent acceptability, reliability, validity, and responsiveness in THA and can be recommended for patients in mainland China.
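For context, the responsiveness statistics named above are simple ratios of mean change to a standard deviation: the ES divides by the SD of the baseline scores, the SRM by the SD of the change scores. A generic sketch (not code from the paper; the function name is illustrative):

```python
from statistics import mean, stdev

def responsiveness(baseline, followup):
    """Effect size (ES) and standardized response mean (SRM)
    for paired pre/post scores (sample SD, ddof=1)."""
    change = [f - b for b, f in zip(baseline, followup)]
    es = mean(change) / stdev(baseline)   # mean change / SD of baseline scores
    srm = mean(change) / stdev(change)    # mean change / SD of the change scores
    return es, srm
```

Because the SD of change is usually smaller than the SD of baseline, SRM values tend to exceed ES values for the same data, as in the ranges reported above.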
Article
Background: A mobile app is a programmed system designed to be used by a target user on a mobile device. The usability of such a system refers not only to the extent to which the product can be used to achieve the tasks it was designed for, but also to its effectiveness and efficiency, as well as user satisfaction. The System Usability Scale is one of the most commonly used questionnaires for assessing the usability of a system. The original 10-item version of the System Usability Scale was developed in English and thus needs to be adapted into local languages to assess the usability of mobile apps developed in other languages. Objective: The aim of this study is to translate and validate (with cross-cultural adaptation) the English System Usability Scale questionnaire into Malay, the main language spoken in Malaysia. The development of a translated version will allow the usability of mobile apps to be assessed in Malay. Methods: Forward and backward translation of the questionnaire was conducted by groups of Malay native speakers who spoke English as their second language. The final version was obtained after reconciliation and cross-cultural adaptation. The content of the Malay System Usability Scale questionnaire for mobile apps was validated by 10 experts in mobile app development. The efficacy of the questionnaire was further probed by testing its face validity on 10 mobile phone users, followed by reliability testing involving 54 mobile phone users. Results: The content validity index was determined to be 0.91, indicating good relevancy of the 10 items used to assess the usability of a mobile app. Calculation of the face validity index resulted in a value of 0.94, indicating that the questionnaire was easily understood by the users. Reliability testing showed a Cronbach alpha value of .85 (95% CI 0.79-0.91), indicating that the translated System Usability Scale questionnaire is a reliable tool for assessing the usability of a mobile app.
Conclusions: The Malay System Usability Scale questionnaire is a valid and reliable tool to assess the usability of mobile apps in Malaysia.
Article
The System Usability Scale (SUS) is the most widely used standardized questionnaire for the assessment of perceived usability. This review of the SUS covers its early history from inception in the 1980s through recent research and its future prospects. From relatively inauspicious beginnings, when its originator described it as a “quick and dirty usability scale,” it has proven to be quick but not “dirty.” It is likely that the SUS will continue to be a popular measurement of perceived usability for the foreseeable future. When researchers and practitioners need a measure of perceived usability, they should strongly consider using the SUS.
Article
Objective: To carry out the cross-cultural adaptation of the Detailed Assessment of Speed of Handwriting 17+ (DASH 17+) for Brazilians. Method: (1) Evaluation of conceptual and item equivalence, and (2) evaluation of semantic equivalence, requiring four translators and a pilot study with 36 students. Results: (1) The concepts and items were found to be equivalent across British and Brazilian cultures. (2) Adaptations were made to the sentence classified as a pangram in English used in the copying tasks, and lowercase cursive writing was chosen for the alphabet-writing task. The pre-test showed that the students accepted and understood the proposed tasks. Conclusion: With the conceptual, item, and semantic equivalence of the DASH 17+ completed, the Brazilian Portuguese version was presented. As a next step, new studies on its psychometric properties should be conducted in order to measure the handwriting speed of young people and adults with greater reliability and validity.
Article
The primary purpose of this research was to investigate the relationship between two widely used questionnaires designed to measure perceived usability: the Computer System Usability Questionnaire (CSUQ) and the System Usability Scale (SUS). The correlation between concurrently collected CSUQ and SUS scores was 0.76 (over 50% shared variance). After converting CSUQ scores to a 0–100-point scale (to match the range of the SUS scores), there was a small but statistically significant difference between CSUQ and SUS means. Although this difference (just under 2 scale points out of a possible 100) was statistically significant, it did not appear to be practically significant. Although usability practitioners should be cautious pending additional independent replication, it appears that CSUQ scores, after conversion to a 0–100-point scale, can be interpreted with the Sauro–Lewis curved grading scale. As a secondary research goal, investigation of variations of the Usability Metric for User Experience (UMUX) replicated previous findings that the regression-adjusted version of the UMUX-LITE (UMUX-LITEr) had the closest correspondence with concurrently collected SUS scores. Thus, even though these three standardized questionnaires were independently developed and have different item content and formats, they largely appear to be measuring the same thing, presumably, perceived usability.
Article
There are times when user experience practitioners might consider using the System Usability Scale (SUS), but there is an item that just doesn't work in their context of measurement. For example, the first item is "I think I would like to use this system frequently." If the system under study is one that would only be used infrequently, then there is a concern that including this item would distort the scores, or at best, distract the participant. The results of the current research show that the mean scores of all 10 possible nine-item variants of the SUS are within one point (out of a hundred) of the mean of the standard SUS. Thus, practitioners can leave out any one of the SUS items without having a practically significant effect on the resulting scores, as long as an appropriate adjustment is made to the multiplier (specifically, multiply the sum of the adjusted item scores by 100/36 instead of the standard 100/40, or 2.5, to compensate for the dropped item).
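The scoring adjustment described above is mechanical: in standard SUS scoring, odd-numbered (positive-tone) items contribute (rating - 1) and even-numbered (negative-tone) items contribute (5 - rating), and the sum is multiplied by 100 divided by four times the number of items administered, which gives the familiar 2.5 (100/40) for 10 items and 100/36 for 9. A minimal sketch (the helper name is chosen here, not taken from the article):

```python
def sus_score(responses, dropped=None):
    """SUS score from 1-5 ratings keyed by item number (1-10).
    If one item was omitted, pass its number as `dropped`;
    the multiplier then becomes 100/36 instead of 100/40 (i.e. 2.5)."""
    items = [i for i in range(1, 11) if i != dropped]
    # Odd items are positive-tone (rating - 1); even items are
    # negative-tone (5 - rating), so every item contributes 0-4.
    total = sum((responses[i] - 1) if i % 2 else (5 - responses[i])
                for i in items)
    return total * 100 / (4 * len(items))
```

A uniformly neutral response set (all 3s) scores 50 with or without a dropped item, which is the point of the adjusted multiplier.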
Conference Paper
Given the many advancements in technology, the use of information and communication technology (ICT) in education to enhance the effectiveness of teaching and learning has become a widely applied and discussed area. Usability is central to the success of any instructional design product or learning material, including educational websites, learning management systems, mobile devices, and wearable technology. The System Usability Scale (SUS) is one of the most commonly used questionnaires for usability rating. Objectives: With the increasing interest in usability studies and user experience research, there is a need to officially translate the SUS into Chinese and to validate the translation. The aim of this paper is to describe the process of translating the original System Usability Scale (SUS) from English into Chinese (C-SUS) and to evaluate its reliability and validity among college students. Methods: This study consisted of two phases. In phase one, the SUS was translated into Chinese by a group of translators and experts in education using Brislin's (1970, 1986) translation and back-translation method. Both semantic equivalence and content validity were assessed. In the second phase, the psychometric properties of the C-SUS were tested in two studies with convenience samples of 125 (study 1) and 104 (study 2) college students recruited from a private university in southern Taiwan. Reliability was assessed by internal consistency, and construct validity was tested using exploratory factor analysis. Data analysis was performed using SPSS 23.0. Results: The semantic equivalence and content validity index of the Chinese version of the SUS were satisfactory. Results also indicated that the Chinese version had a high level of equivalence with the original English version and demonstrated high internal consistency. Exploratory factor analysis revealed the presence of two factors, supporting the conceptual dimension of the original instrument.
Conclusion: The study provides initial psychometric properties of the Chinese version of the SUS and supports it as a reliable and valid instrument for measuring the usability of design products and services for Chinese-speaking individuals.
Article
This research aims to customize the System Usability Scale (SUS) so it can be used as a standard usability measure with native Arabic-language speakers. A process of rigorous translation followed by psychometric evaluation was administered through an Arabic-language questionnaire that reflects a reasonable level of reliability, validity, and sensitivity when compared to the original English-language SUS questionnaire. The final Arabic System Usability Scale (A-SUS) was then applied with students majoring in Communication Disorders Sciences (Kuwait University) to measure the usability of a mobile application. The A-SUS encapsulates the essence of the original SUS and provides an adequate tool for professionals to use with native Arabic-speaking users to evaluate system usability.